EARN: Efficient Inference Acceleration for LLM-based Generative Recommendation by Register Tokens

Preliminaries

EARN

Motivation


Interesting Findings


|        | Layer Order | Token Position |
| ------ | ----------- | -------------- |
| NLP    | Layer $\uparrow$ $\longrightarrow$ less sparse | head attention sink |
| LLMRec | Layer $\uparrow$ $\longrightarrow$ sparser | head & tail attention sink |
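In other words, on conventional NLP inputs attention gets denser in deeper layers and sinks mostly at the first token, whereas on LLM-based recommendation inputs attention gets sparser with depth and sinks at both the first and last tokens. Below is a minimal sketch of how per-layer sparsity and head/tail sink mass could be probed on a decoder-only LLM; it assumes a HuggingFace-style model, and the checkpoint name, the `1e-3` threshold, the toy prompt, and the `k = 4` window are illustrative placeholders, not the paper's protocol.

```python
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

# Placeholder checkpoint; any decoder-only LLM that can return attentions works.
MODEL_NAME = "Qwen/Qwen2-0.5B"

tokenizer = AutoTokenizer.from_pretrained(MODEL_NAME)
# "eager" attention so the model actually materializes attention maps.
model = AutoModelForCausalLM.from_pretrained(MODEL_NAME, attn_implementation="eager")
model.eval()

# Toy recommendation-style prompt (illustrative only).
text = "User history: item_42, item_7, item_99. Predict the next item:"
inputs = tokenizer(text, return_tensors="pt")

with torch.no_grad():
    outputs = model(**inputs, output_attentions=True)

k = 4  # how many head/tail positions to count as the "sink" region
for layer_idx, attn in enumerate(outputs.attentions):
    attn = attn[0].mean(dim=0)  # average over heads -> (seq_len, seq_len)
    seq_len = attn.size(0)

    # Sparsity proxy: fraction of near-zero weights among causally valid entries.
    valid = attn[torch.tril(torch.ones(seq_len, seq_len, dtype=torch.bool))]
    sparsity = (valid < 1e-3).float().mean().item()

    # Sink mass: how much the final query (which sees every key) attends to
    # the first k (head) and last k (tail) token positions.
    last_query = attn[-1]
    head_sink = last_query[:k].sum().item()
    tail_sink = last_query[-k:].sum().item()

    print(f"layer {layer_idx:2d} | sparsity {sparsity:.3f} | "
          f"head sink {head_sink:.3f} | tail sink {tail_sink:.3f}")
```

Plotting these three quantities against layer index should reproduce the qualitative trends in the table: rising sparsity and non-trivial tail-sink mass on recommendation-style inputs, versus falling sparsity and a dominant head sink on ordinary text.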

Method


References

  1. Yang C., Lin X., Wang W., Li Y., Sun T., Han X. and Chua T. EARN: Efficient Inference Acceleration for LLM-based Generative Recommendation by Register Tokens. KDD, 2025.
  2. Barbero F., Arroyo A., Gu X., Perivolaropoulos C., Bronstein M., Velickovic P. and Pascanu R. Why Do LLMs Attend to the First Token? arXiv, 2025.