Reaching centralized performance in decentralized bandits

Probability and Statistics Seminar

27 Jan. 2022

Speaker:	Etienne Boursier
Institution:	EPFL
Time:	15:45 - 16:45

In sequential learning, data is acquired on the fly, and an algorithm learns to behave as well as if it had known the state of nature in hindsight, e.g. the reward distributions. In many real-life scenarios, learning agents are not alone: they interact, or interfere, with many others. As a consequence, their decisions have an impact on the other agents and, by extension, on the reward-generating process.
We aim to study how sequential learning algorithms behave in strategic environments, when facing and interfering with each other. This talk focuses on multiplayer bandits, where learning agents may interfere with each other through collisions.

When agents are cooperative, the difficulty of the problem comes from its decentralized aspect, as the different agents make decisions based solely on their own observations. In this case, we propose algorithms that not only coordinate the agents to avoid negative interference with each other, but also leverage this interference to transfer information between the agents, thus reaching performance similar to that of centralized algorithms.

With competing agents, we propose algorithms with both satisfactory performance guarantees (small regret) and strategic guarantees (e.g. epsilon-Nash equilibria).
