Analyzing Optimization in Deep Learning via Trajectories
April 03, 2019, 11:00 AM - 12:00 PM
Location:
Conference Room 301
Rutgers University
CoRE Building
96 Frelinghuysen Road
Piscataway, NJ 08854
Nadav Cohen, Institute for Advanced Study
The prominent approach for analyzing optimization in deep learning is based on the geometry of loss landscapes. While this approach has led to successful treatments of shallow (two-layer) networks, it suffers from inherent limitations when facing deep (three-or-more-layer) models. In this talk I will argue that a more refined perspective is in order, one that accounts for the specific trajectories taken by the optimizer. I will then demonstrate a manifestation of the latter approach by analyzing the trajectories of gradient descent over arbitrarily deep linear neural networks. We will derive what is, to the best of my knowledge, the most general guarantee to date for efficient convergence to a global minimum of a gradient-based algorithm training a deep network. Moreover, in stark contrast to conventional wisdom, we will see that sometimes gradient descent can train a deep linear network faster than a classic linear model. In other words, depth can accelerate optimization, even without any gain in expressiveness, and despite introducing non-convexity to a formerly convex problem.
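
To make the setting concrete, below is a minimal sketch (not taken from the talk) of the kind of comparison the abstract alludes to: gradient descent on a convex least-squares regression, versus the same predictor overparameterized as a depth-3 linear network. The dimensions, learning rate, and near-identity initialization are illustrative assumptions; whether depth actually speeds up convergence depends on the loss and the initialization, which is the "sometimes" in the claim above.

    import numpy as np

    # Illustrative comparison: a classic linear model vs. a depth-3 linear
    # network with the same end-to-end expressive power.
    rng = np.random.default_rng(0)
    d, n = 10, 100                      # input dimension, number of samples
    X = rng.standard_normal((n, d))
    w_true = rng.standard_normal(d)
    y = X @ w_true                      # noiseless linear targets

    def loss(w):
        r = X @ w - y
        return 0.5 * np.mean(r ** 2)

    def grad(w):
        return X.T @ (X @ w - y) / n

    lr, steps = 0.01, 2000

    # Classic linear model: optimize w directly (a convex problem).
    w = np.zeros(d)
    shallow_losses = []
    for _ in range(steps):
        shallow_losses.append(loss(w))
        w -= lr * grad(w)

    # Depth-3 linear network: the end-to-end map is W3 @ W2 @ w1,
    # equivalent in expressiveness but non-convex in its factors.
    W3 = np.eye(d) + 0.01 * rng.standard_normal((d, d))
    W2 = np.eye(d) + 0.01 * rng.standard_normal((d, d))
    w1 = 0.01 * rng.standard_normal(d)
    deep_losses = []
    for _ in range(steps):
        w_e2e = W3 @ W2 @ w1
        deep_losses.append(loss(w_e2e))
        g = grad(w_e2e)                 # gradient w.r.t. the end-to-end vector
        # Chain rule: backpropagate through the product of factors.
        gW3 = np.outer(g, W2 @ w1)
        gW2 = np.outer(W3.T @ g, w1)
        gw1 = W2.T @ (W3.T @ g)
        W3 -= lr * gW3
        W2 -= lr * gW2
        w1 -= lr * gw1

    print(f"final loss, shallow linear model: {shallow_losses[-1]:.3e}")
    print(f"final loss, depth-3 linear net:   {deep_losses[-1]:.3e}")

The end-to-end map W3 @ W2 @ w1 realizes exactly the same function class as a single linear model, so any difference between the two loss curves is an effect of the optimization trajectory, not of added expressiveness.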