DIMACS/TRIPODS Workshop on Optimization and Machine Learning

Learning Over-Parameterized Models with Gradient Descent: An Average-Case Analysis over Quadratic Loss Functions

August 14, 2018, 11:00 AM - 11:30 AM

Location:

Iacocca Hall

Lehigh University

Bethlehem PA

Hossein Mobahi, Google

With recent advancements in distributed computing infrastructure, the era of giga-dimensional optimization is now upon us. State-of-the-art models for high-dimensional problems are typically over-parameterized. Over-parameterization often gives the associated loss function small curvature in many directions, so the condition number of the Hessian matrix can be very large. According to the classic worst-case analysis of gradient descent, such poor conditioning should increase the training time of a model. In practice, however, we observe precisely the opposite behavior: training seems to proceed faster for such models. We suggest that a possible resolution may come from performing an average-case rather than a worst-case analysis. For concreteness and tractability, we focus the analysis on quadratic loss surfaces and establish an upper bound on the iteration complexity via average-case analysis. The result indicates that when most of the eigenvalues of the Hessian are small, training becomes faster. This prediction is confirmed by our experiments on synthetic problems as well as preliminary experiments on deep networks applied to real data.
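The gap between the worst-case and average-case pictures can be illustrated with a small synthetic experiment. The sketch below (not the speaker's code) runs gradient descent on a quadratic loss f(x) = 0.5 x'Hx whose Hessian has a few large eigenvalues and many tiny ones; the dimension, eigenvalue distribution, step size, and iteration budget are all illustrative assumptions. Despite an enormous condition number, most of the initial loss lies along the few large-eigenvalue directions and disappears within a handful of iterations, while the many small-eigenvalue directions contribute little to the loss from a random start.

```python
# Minimal sketch: gradient descent on a quadratic loss with a skewed Hessian
# spectrum (a few large eigenvalues, many tiny ones). All constants below are
# illustrative choices, not the parameters used in the talk's experiments.
import numpy as np

rng = np.random.default_rng(0)
d = 500

# Build H = Q diag(eigs) Q^T with 10 eigenvalues equal to 1 and the rest
# spread between 1e-6 and 1e-2, mimicking an over-parameterized loss surface.
eigs = np.concatenate([np.full(10, 1.0), 10 ** rng.uniform(-6, -2, d - 10)])
Q, _ = np.linalg.qr(rng.standard_normal((d, d)))
H = Q @ np.diag(eigs) @ Q.T

x = rng.standard_normal(d)      # random initialization (average-case setting)
lr = 1.0 / eigs.max()           # step size 1/L, L = largest eigenvalue

def loss(x):
    return 0.5 * x @ H @ x      # quadratic loss f(x) = 0.5 x^T H x

print(f"condition number: {eigs.max() / eigs.min():.1e}")
print(f"initial loss:     {loss(x):.3e}")
for t in range(1, 201):
    x = x - lr * (H @ x)        # gradient step: grad f(x) = H x
    if t % 50 == 0:
        print(f"iter {t:4d}  loss {loss(x):.3e}")

# The worst-case bound suggests on the order of (condition number) * log(1/eps)
# iterations, which is astronomical here; the printed losses nevertheless fall
# quickly, since error along tiny-eigenvalue directions is weighted by those
# tiny eigenvalues and barely affects the loss.
```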