Scaling Law: Compute Optimal Curves on a Simple Model
June 07, 2024, 9:00 AM - 9:45 AM
Location:
DIMACS Center
Rutgers University
CoRE Building
96 Frelinghuysen Road
Piscataway, NJ 08854
Courtney Paquette, McGill University
We describe a program of analysis of stochastic gradient methods on high-dimensional random objectives. We present assumptions under which the loss curves are universal, in the sense that they can be completely described in terms of some underlying covariances. Furthermore, we give a description of these loss curves that can be analyzed precisely. We show how this applies to SGD on a simple power-law model: a two-hyperparameter family of optimization problems that displays four distinct phases of loss curves. These phases are determined by the relative complexities of the target and the data distribution, and by whether each is ‘high-dimensional’ (which in this context can be precisely defined). In each phase, we can also give, for a given compute budget, the optimal parameter dimensionality. Joint work with Elliot Paquette (McGill), Jeffrey Pennington (Google DeepMind), and Lechao Xiao (Google DeepMind).
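
To make the setup concrete, the following is a minimal numerical sketch (not taken from the talk) of streaming SGD on a least-squares objective with a power-law data spectrum and a power-law target, embedded into d parameters by a random matrix. The hyperparameters alpha and beta, the dimensions, and the step size are illustrative assumptions; the exact model analyzed in the talk may differ.

    import numpy as np

    rng = np.random.default_rng(0)

    alpha, beta = 1.0, 0.5        # assumed hyperparameters: data-spectrum and target decay
    v = 2000                      # ambient feature dimension
    d = 200                       # parameter dimension (the quantity optimized over)

    lam = np.arange(1, v + 1) ** (-2.0 * alpha)    # power-law covariance eigenvalues
    b = np.arange(1, v + 1) ** (-beta)             # power-law target coefficients
    W = rng.standard_normal((v, d)) / np.sqrt(d)   # random embedding into d parameters

    theta = np.zeros(d)
    lr = 0.5 / lam.sum()          # crude step size chosen for stability
    for t in range(20001):
        x = np.sqrt(lam) * rng.standard_normal(v)  # fresh sample, covariance diag(lam)
        feat = W.T @ x                             # d-dimensional features
        g = (theta @ feat - b @ x) * feat          # gradient of 0.5*(prediction - target)^2
        theta -= lr * g
        if t % 5000 == 0:
            r = W @ theta - b                      # population loss is 0.5 * r^T diag(lam) r
            print(f"step {t:6d}  loss {0.5 * np.sum(lam * r * r):.4e}")

Sweeping d while holding a compute budget (roughly steps times d) fixed, and comparing the resulting final losses, illustrates the kind of compute-optimal trade-off the abstract refers to.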