|15h45 - 16h45
The training of neural networks with first-order methods remains poorly understood in theory, despite compelling empirical evidence. Not only is it believed that neural networks converge towards global minimisers, but the implicit bias of optimisation algorithms also makes them converge towards specific minimisers with good generalisation properties. This talk focuses on the early alignment phase that appears in the training dynamics of two-layer networks with small initialisations. During this early alignment phase, the many neurons align towards a small number of key directions, leading to some sparsity in the number of neurons effectively represented. Although we believe this phenomenon to be key to the implicit bias of gradient methods, it also has serious drawbacks: for example, it is at the origin of convergence towards spurious local minima of the network parameters.