GT Transport optimal - EDP - Machine learning
Fast Equilibrium for SGD
Nov. 13, 2023
Speaker : | Yi Wang |
Institution : | Johns Hopkins |
Time : | 15:15 - 16:15 |
Location : | 2L8 |
In this seminar, we will discuss the fast equilibrium phenomenon for stochastic gradient descent (SGD) in neural network training. Under the assumptions that the critical points are non-degenerate and the stochastic noise is standard Gaussian, we prove that SGD with a constant effective learning rate passes through three stages: descent, diffusion, and tunneling, and we explicitly identify temporary equilibrium states that can be observed within practical training time. This explains the gap between the mixing time in the fast equilibrium conjecture and the previously known upper bound. We will then turn to generic assumptions and show that, with probability close to 1, trajectories do not escape the attracting basin containing their initial position within exponential time.
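As a rough illustration of the setting (not part of the talk's results), the noise model above can be sketched as SGD with additive Gaussian noise on a toy loss. The following is a minimal sketch, assuming a double-well loss with non-degenerate critical points and a square-root step scaling for the noise; the function name `sgd_gaussian` and all parameter values are illustrative choices, not taken from the talk:

```python
import numpy as np

def sgd_gaussian(grad, w0, eta=0.01, sigma=0.3, steps=10_000, seed=0):
    """Noisy SGD: w <- w - eta * grad(w) + sqrt(eta) * sigma * xi,
    with xi ~ N(0, I) standard Gaussian noise (assumed noise model)."""
    rng = np.random.default_rng(seed)
    w = np.asarray(w0, dtype=float)
    traj = [w.copy()]
    for _ in range(steps):
        xi = rng.standard_normal(w.shape)
        w = w - eta * grad(w) + np.sqrt(eta) * sigma * xi
        traj.append(w.copy())
    return np.array(traj)

# Toy double-well loss L(w) = (w^2 - 1)^2 / 4, whose critical points are
# non-degenerate. Trajectories first descend into one basin, then diffuse
# near its bottom, and only tunnel to the other basin on a much longer
# time scale, mirroring the three stages named in the abstract.
grad = lambda w: w * (w**2 - 1.0)
traj = sgd_gaussian(grad, w0=[2.0])
```

Plotting the first coordinate of `traj` against the step index makes the descent and diffusion stages visible within practical horizons, while tunneling events are rare for small `eta` and `sigma`.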