H.B. Keller Colloquium
Algorithms based on Regret Matching+ (RM+) and its variants are among the most popular approaches for solving large-scale games (such as poker) in practice. However, the theoretical understanding of their success is still a mystery --- unlike other algorithms such as optimistic gradient descent, which have been shown to enjoy stability, low regret, and fast convergence when employed by all players, very little is known about RM+ beyond its classical regret guarantee against a worst-case adversary.
In this talk, I will present our recent attempts towards a better understanding of RM+. Specifically, I will start by showing that, contrary to popular belief, RM+ and many of its variants can in fact produce unstable strategies that suffer large regret and diverge even on a simple 3 × 3 game. Then, I will discuss how a couple of simple fixes can help stabilize the algorithms and eventually lead to constant regret and last-iterate convergence. I will conclude with some experimental results to show the advantages of our proposed algorithms.
This talk is based on joint work with Yang Cai, Gabriele Farina, Julien Grand-Clément, Christian Kroer, Chung-Wei Lee, and Weiqiang Zheng.