GT Transport optimal - EDP - Machine learning
Fast Equilibrium for SGD
Nov. 13, 2023
Speaker : | Yi Wang |
Institution : | Johns Hopkins |
Time : | 15:15 - 16:15 |
Location : | 2L8 |
In this seminar, we will discuss the fast equilibrium phenomenon for stochastic gradient descent (SGD) in neural network training. Under the assumptions that the critical points are non-degenerate and the stochastic noise is standard Gaussian, we prove that SGD with a constant effective learning rate passes through three stages: descent, diffusion, and tunneling, and we explicitly identify temporary equilibrium states that can be observed within practical training time. This explains the gap between the mixing time in the fast equilibrium conjecture and the previously known upper bound. We will then turn to generic assumptions and show that, with probability close to 1, trajectories do not escape the attracting basin containing their initial position within exponential time.
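As a rough illustration of the setting (not part of the talk's results), the noise model above can be sketched as SGD with additive Gaussian noise on a toy loss. The following is a minimal sketch, assuming a double-well loss with non-degenerate critical points and a square-root step scaling for the noise; the function name `sgd_gaussian` and all parameter values are illustrative choices, not taken from the talk:

```python
import numpy as np

def sgd_gaussian(grad, w0, eta=0.01, sigma=0.3, steps=10_000, seed=0):
    """Noisy SGD: w <- w - eta * grad(w) + sqrt(eta) * sigma * xi,
    with xi ~ N(0, I) standard Gaussian noise (assumed noise model)."""
    rng = np.random.default_rng(seed)
    w = np.asarray(w0, dtype=float)
    traj = [w.copy()]
    for _ in range(steps):
        xi = rng.standard_normal(w.shape)
        w = w - eta * grad(w) + np.sqrt(eta) * sigma * xi
        traj.append(w.copy())
    return np.array(traj)

# Toy double-well loss L(w) = (w^2 - 1)^2 / 4, whose critical points are
# non-degenerate. Trajectories first descend into one basin, then diffuse
# near its bottom, and only tunnel to the other basin on a much longer
# time scale, mirroring the three stages named in the abstract.
grad = lambda w: w * (w**2 - 1.0)
traj = sgd_gaussian(grad, w0=[2.0])
```

Plotting the first coordinate of `traj` against the step index makes the descent and diffusion stages visible within practical horizons, while tunneling events are rare for small `eta` and `sigma`.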