Title: Attention

(highlight [1,3,7])

  1. Vaswani, Ashish, et al. "Attention is all you need." Advances in neural information processing systems 30 (2017).
  2. Kaplan, Jared, et al. "Scaling laws for neural language models." arXiv preprint arXiv:2001.08361 (2020).
  3. Geshkovski, Borjan, et al. "A mathematical perspective on transformers." Bulletin of the American Mathematical Society 62.3 (2025): 427-479.
  4. Geshkovski, Borjan, et al. "Dynamic metastability in the self-attention model." arXiv preprint arXiv:2410.06833 (2024).
  5. Lu, Yiping, et al. "Understanding and Improving Transformer From a Multi-Particle Dynamic System Point of View." ICLR 2020.
  6. Vuckovic, James, Aristide Baratin, and Remi Tachet des Combes. "A mathematical theory of attention." arXiv preprint arXiv:2007.02876 (2020).
  7. Kozachkov, Leo, Ksenia V. Kastanenka, and Dmitry Krotov. "Building transformers from neurons and astrocytes." Proceedings of the National Academy of Sciences 120.34 (2023): e2219150120.
  8. Mancas, Matei, et al. From Human Attention to Computational Attention. Vol. 2. New York, NY, USA: Springer, 2016.