Scaling Law: Compute Optimal Curves on a Simple Model

June 07, 2024, 9:00 AM - 9:45 AM

Location:

DIMACS Center

Rutgers University

CoRE Building

96 Frelinghuysen Road

Piscataway, NJ 08854

Click here for map.

Courtney Paquette, McGill University

We describe a program of analysis of stochastic gradient methods on high dimensional random objectives.  We illustrate some assumptions under which the loss curves are universal, in that they can completely be described in terms of some underlying covariances.   Furthermore, we give description of these loss curves that can be analyzed precisely.   We show how this can be applied to SGD on a simple power-law model.  This is a simple two-hyperparameter family of optimization problems, which displays 4 distinct phases of loss curves; these phases are determined by the relative complexities of the target, data distribution, and whether these are ‘high-dimensional’ or not (which in context can be precisely defined).  In each phase, we can also give, for a given compute budget, the optimal parameter dimensionality. Joint work with Elliot Paquette (McGill), Jeffrey Pennington (Google Deepmind), and Lechao Xiao  (Google Deepmind).

 

[Video]   [Slides]