Round and Round We Go! What makes Rotary Positional Encodings useful?

Preliminaries

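This strikes me as a rather good empirical paper. Judging from the reviews, the reviewers' main concerns were that the models studied are limited to Gemma and that the positional encodings studied are limited to RoPE.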

Core Ideas

RoPE
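
As a reminder of the mechanism itself, here is a minimal NumPy sketch of RoPE on a single vector. It is not taken from the paper's code: the helper names `rope_frequencies` and `apply_rope` are made up for illustration, and the base of 10000 and the pairing of consecutive dimensions follow the common RoFormer-style convention, which is an assumption here. The final check illustrates the standard property that the rotated dot product depends only on the relative offset between the two positions.

```python
import numpy as np

def rope_frequencies(dim, base=10000.0):
    """Per-pair rotation frequencies theta_i = base^(-2i/dim), i = 0..dim/2-1."""
    i = np.arange(dim // 2)
    return base ** (-2.0 * i / dim)

def apply_rope(x, pos, base=10000.0):
    """Rotate each consecutive pair (x[2i], x[2i+1]) of a 1-D vector by pos * theta_i."""
    theta = rope_frequencies(x.shape[-1], base)
    angle = pos * theta
    cos, sin = np.cos(angle), np.sin(angle)
    x_even, x_odd = x[0::2], x[1::2]
    out = np.empty_like(x)
    out[0::2] = x_even * cos - x_odd * sin
    out[1::2] = x_even * sin + x_odd * cos
    return out

# Hypothetical query/key vectors, only for the demonstration.
rng = np.random.default_rng(0)
q, k = rng.normal(size=8), rng.normal(size=8)

# Same relative offset (3) at two different absolute positions.
a = apply_rope(q, 5) @ apply_rope(k, 2)
b = apply_rope(q, 103) @ apply_rope(k, 100)
print(np.isclose(a, b))
```

Because each pair is only rotated, the norm of the vector is unchanged; all positional information enters through the angles.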

Distance decay in RoPE

[Figure: 20250512203336]

[Figure: 20250512210829]
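
To probe the distance-decay question numerically, the sketch below computes the RoPE attention logit as a function of relative distance for two choices of query and key: all-ones vectors, and vectors with entries drawn from a standard Gaussian. My understanding is that the paper contrasts these two settings, but that reading, the head dimension of 128, the base of 10000, and the helper name `rope_score` are all assumptions for illustration rather than details given in these notes.

```python
import numpy as np

def rope_score(q, k, distance, base=10000.0):
    """Logit q . R(distance) k, i.e. the key is rotated by the relative distance."""
    dim = q.shape[-1]
    theta = base ** (-2.0 * np.arange(dim // 2) / dim)
    angle = distance * theta
    cos, sin = np.cos(angle), np.sin(angle)
    k_rot = np.empty_like(k)
    k_rot[0::2] = k[0::2] * cos - k[1::2] * sin
    k_rot[1::2] = k[0::2] * sin + k[1::2] * cos
    return float(q @ k_rot)

dim = 128
distances = np.arange(0, 2048)

# Setting 1: constant (all-ones) query and key.
ones = np.ones(dim)
const_scores = np.array([rope_score(ones, ones, r) for r in distances])

# Setting 2: Gaussian query and key (fixed seed for reproducibility).
rng = np.random.default_rng(0)
q, k = rng.normal(size=dim), rng.normal(size=dim)
gauss_scores = np.array([rope_score(q, k, r) for r in distances])

# Inspect a few distances; plotting both curves gives the fuller picture.
print(const_scores[[0, 1, 16, 256, 2047]])
print(gauss_scores[[0, 1, 16, 256, 2047]])
```

In the all-ones case each pair contributes `2 * cos(distance * theta_i)`, so the logit equals `dim` at distance 0 and is strictly smaller at every other integer distance. Whether anything comparable holds for non-constant queries and keys is what the second curve lets one check.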

High and low frequencies in RoPE
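
To make "high" and "low" frequency concrete, the sketch below lists the rotation frequency of each dimension pair and the corresponding wavelength, meaning the number of tokens needed for one full rotation. The head dimension of 128 and the base of 10000 are assumed values for illustration, not settings taken from the paper.

```python
import numpy as np

dim, base = 128, 10000.0  # assumed values, for illustration only

# theta_i = base^(-2i/dim): pair 0 rotates fastest, the last pair slowest.
theta = base ** (-2.0 * np.arange(dim // 2) / dim)
wavelength = 2 * np.pi / theta  # tokens per full rotation

print("highest frequency:", theta[0], "wavelength:", wavelength[0])
print("lowest frequency:", theta[-1], "wavelength:", wavelength[-1])
```

Under these assumptions the fastest pair rotates by one radian per token, so it completes a full turn in about 6.3 tokens, while the slowest pair needs tens of thousands of tokens and is therefore nearly constant over short contexts.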

High frequencies

[Figure: 20250512211237]

[Figure: 20250512211548]

Low frequencies

[Figure: 20250513155841]

References

  1. Barbero F., Vitvitskyi A., Perivolaropoulos C., Pascanu R., and Velickovic P. Round and Round We Go! What makes Rotary Positional Encodings useful? ICLR, 2025.