Probability and Statistics Seminar
Separation of non-ergodic uniform convergence rates for regularized learning in games
5 Feb. 2026
Speaker: Julien Grand-Clément
Institution: HEC
Time: 14:00 - 15:00
Location: 3L15

Self-play via online learning is a leading paradigm for solving large-scale games and has enabled recent superhuman performance (e.g., in Go and poker). This work shows that different convergence notions in self-play (last iterate, best iterate, and a randomly sampled iterate) can behave fundamentally differently. For a broad class of learning dynamics, including Optimistic Multiplicative Weights Update (OMWU), we prove a separation: even in two-player zero-sum games, last-iterate convergence can be arbitrarily slow and random-iterate convergence can be slower than any polynomial, while best-iterate convergence is polynomial. This departs from much prior theory, in which these notions align. We attribute the gap to OMWU’s insufficient “forgetfulness” and link it to empirical behavior in practical game solving.
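
For readers unfamiliar with the three convergence notions, here is a minimal Python sketch that runs OMWU in self-play on a small zero-sum matrix game and reports the duality gap of the last iterate, the best iterate, and a uniformly sampled iterate (in expectation). Everything here is an illustrative assumption and is not taken from the papers: the payoff matrix (matching pennies), the starting strategies, the step size `eta`, the horizon, and the convention that the first step uses the current gradient as its prediction. Matching pennies is a benign instance, so the sketch only shows how the three quantities are measured, not the separation itself.

```python
import math


def matvec(A, v):
    """Matrix-vector product for a matrix stored as a list of rows."""
    return [sum(a * b for a, b in zip(row, v)) for row in A]


def transpose(A):
    return [list(col) for col in zip(*A)]


def duality_gap(A, x, y):
    """Gap of (x, y) in the game min_x max_y x^T A y; zero at a Nash equilibrium."""
    return max(matvec(transpose(A), x)) - min(matvec(A, y))


def mwu_step(p, signal, eta):
    """Multiplicative weights: reweight p by exp(eta * signal), then renormalize."""
    w = [pi * math.exp(eta * s) for pi, s in zip(p, signal)]
    total = sum(w)
    return [wi / total for wi in w]


def omwu_gaps(A, x0, y0, eta=0.1, steps=2000):
    """Run OMWU in self-play and return the duality gap of every iterate."""
    At = transpose(A)
    x, y = list(x0), list(y0)
    # Assumed convention: the first prediction equals the current gradient.
    g_prev, h_prev = matvec(A, y), matvec(At, x)
    gaps = []
    for _ in range(steps):
        gaps.append(duality_gap(A, x, y))
        g, h = matvec(A, y), matvec(At, x)  # loss of x-player, gain of y-player
        # Optimistic signal: 2 * current gradient - previous gradient.
        x_new = mwu_step(x, [-(2 * gi - gp) for gi, gp in zip(g, g_prev)], eta)
        y_new = mwu_step(y, [2 * hi - hp for hi, hp in zip(h, h_prev)], eta)
        x, y, g_prev, h_prev = x_new, y_new, g, h
    return gaps


# Hypothetical instance: matching pennies, with arbitrary interior starting points.
A = [[1.0, -1.0], [-1.0, 1.0]]
gaps = omwu_gaps(A, [0.7, 0.3], [0.4, 0.6])

last_gap = gaps[-1]                 # last-iterate notion
best_gap = min(gaps)                # best-iterate notion
random_gap = sum(gaps) / len(gaps)  # expected gap of a uniformly sampled iterate

print(f"last: {last_gap:.3e}  best: {best_gap:.3e}  random: {random_gap:.3e}")
```

By construction, the best-iterate gap is never larger than the other two; the abstract's claim concerns how much larger the other two can be on hard instances.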

Based on https://arxiv.org/pdf/2503.02825 and https://arxiv.org/pdf/2406.10631.

 

 
