EARN: Efficient Inference Acceleration for LLM-based Generative Recommendation by Register Tokens

Background

EARN

Motivation


Interesting findings


| | Layer Order | Token Position |
| --- | --- | --- |
| NLP | Layer$\uparrow$ $\longrightarrow$ Less Sparse | head attention sink |
| LLMRec | Layer$\uparrow$ $\longrightarrow$ Sparser | head & tail attention sink |
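
The two columns of the comparison above can be checked on an attention map with a few lines of code. The following is a minimal sketch, not the paper's measurement protocol: the function names, the window size `k`, and the threshold `eps` are illustrative assumptions, and the attention row is a hand-made toy example rather than real model output.

```python
def head_share(row, k):
    """Share of a query's attention mass that falls on the first k key tokens."""
    return sum(row[:k]) / sum(row)


def tail_share(row, k):
    """Share of a query's attention mass that falls on the last k key tokens."""
    return sum(row[-k:]) / sum(row)


def sparsity(row, eps):
    """Fraction of key tokens whose attention weight is below eps (assumed threshold)."""
    return sum(1 for w in row if w < eps) / len(row)


# Toy attention row for the last query position (assumption: weights sum to 1).
# Most of the mass sits on the first and last tokens, the "head & tail" pattern.
toy_row = [0.5, 0.02, 0.03, 0.05, 0.4]

head = head_share(toy_row, k=1)      # mass on the first token
tail = tail_share(toy_row, k=1)      # mass on the last token
sparse = sparsity(toy_row, eps=0.1)  # fraction of near-zero weights

print(f"head={head:.2f} tail={tail:.2f} sparsity={sparse:.2f}")
```

Applied layer by layer, `sparsity` tracks the "Layer Order" column, while `head_share` and `tail_share` track the "Token Position" column.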

Method


References

  1. Yang C., Lin X., Wang W., Li Y., Sun T., Han X. and Chua T. EARN: Efficient Inference Acceleration for LLM-based Generative Recommendation by Register Tokens. KDD, 2025. [PDF] [Code]
  2. Barbero F., Arroyo A., Gu X., Perivolaropoulos C., Bronstein M., Velickovic P. and Pascanu R. Why Do LLMs Attend to the First Token? arXiv, 2025. [PDF] [Code]