Special CMX Seminar
Online Event
A Mathematical Perspective on Transformers
Borjan Geshkovski
Junior Researcher
Laboratoire Jacques-Louis Lions at Sorbonne Université
Inria
This talk will report on several results, insights, and perspectives on Transformers obtained jointly by Cyril Letrouit, Yury Polyanskiy, Philippe Rigollet, and me. We model Transformers as interacting particle systems on the unit sphere, in which each particle represents a token and time plays the role of the layer index, with a nonlinear coupling called self-attention. On high-dimensional spheres, we prove that randomly initialized particles converge to a single cluster in the long-time limit. This result can be made quantitative by describing the phase transition between the clustering and non-clustering regimes. The appearance of dynamic metastability will also be discussed.
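To make the particle picture concrete, here is a minimal sketch under stated assumptions. It is not the speakers' code. We assume the simplified self-attention dynamics dx_i/dt = P_{x_i}( (1/Z_i) * sum_j exp(beta * <x_i, x_j>) x_j ), where the query, key, and value matrices are all the identity, P_{x_i} projects onto the tangent space of the sphere at x_i, and Z_i normalizes the softmax weights. We also assume an explicit Euler step followed by renormalization, temperature beta = 1, and n = 8 tokens in d = 16 dimensions. The function names (`attention_step`, `simulate`, `min_cosine`) are hypothetical. Because n <= d, random initial points almost surely lie in an open hemisphere, and the smallest pairwise cosine should approach 1 as the tokens collapse into one cluster.

```python
import math
import random


def normalize(v):
    """Project a vector back onto the unit sphere."""
    norm = math.sqrt(sum(c * c for c in v))
    return [c / norm for c in v]


def dot(u, v):
    return sum(a * b for a, b in zip(u, v))


def attention_step(xs, beta, dt):
    """One explicit Euler step of the simplified self-attention dynamics
    (assumption: Q = K = V = identity), followed by renormalization."""
    new_xs = []
    for xi in xs:
        # Softmax weights exp(beta * <x_i, x_j>) / Z_i over all tokens j (max-shifted for stability).
        scores = [beta * dot(xi, xj) for xj in xs]
        m = max(scores)
        weights = [math.exp(s - m) for s in scores]
        z = sum(weights)
        # Attention-weighted average of all tokens.
        avg = [sum(w * xj[k] for w, xj in zip(weights, xs)) / z for k in range(len(xi))]
        # Tangent projection at x_i: avg - <x_i, avg> x_i.
        radial = dot(xi, avg)
        velocity = [a - radial * c for a, c in zip(avg, xi)]
        # Euler step, then pull the particle back onto the sphere.
        new_xs.append(normalize([c + dt * v for c, v in zip(xi, velocity)]))
    return new_xs


def min_cosine(xs):
    """Smallest pairwise inner product; a value near 1 means a single cluster."""
    n = len(xs)
    return min(dot(xs[i], xs[j]) for i in range(n) for j in range(i + 1, n))


def simulate(n=8, d=16, beta=1.0, dt=0.1, steps=600, seed=0):
    rng = random.Random(seed)
    # Uniform random initialization on the unit sphere: normalized Gaussian vectors.
    xs = [normalize([rng.gauss(0.0, 1.0) for _ in range(d)]) for _ in range(n)]
    start = min_cosine(xs)
    for _ in range(steps):
        xs = attention_step(xs, beta, dt)
    return start, min_cosine(xs)


if __name__ == "__main__":
    before, after = simulate()
    print(f"min pairwise cosine: {before:.3f} -> {after:.6f}")
```

The max-shift inside the softmax only improves numerical stability; it leaves the normalized weights unchanged. Changing `beta`, `n`, or `d` is a simple way to explore the clustering behavior the talk describes, though this toy discretization makes no claim to reproduce the talk's quantitative phase-transition results.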
For more information, please contact Jolene Brink by phone at (626) 395-2813 or by email at [email protected], or visit the Zoom link.
Event Series
CMX Special Seminar Series