May 13, 2025 · Base of RoPE Bounds Context Length · Discusses how the RoPE base affects the model's ability to distinguish similar tokens · Note, LLM, Positional Encoding, Theoretical, NeurIPS 2024
May 12, 2025 · Round and Round We Go! What makes Rotary Positional Encodings useful? · Understanding the high- and low-frequency components of RoPE · Note, LLM, Positional Encoding, Theoretical, ICLR 2025
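Both RoPE notes above hinge on the per-dimension rotation frequencies, so a minimal sketch may help. It assumes the standard RoPE parameterization θ_i = base^(−2i/d); the function name `rope_frequencies` and the example bases are illustrative, not taken from either paper.

```python
import numpy as np

def rope_frequencies(dim: int, base: float = 10000.0) -> np.ndarray:
    """Per-pair rotation frequencies theta_i = base**(-2i/dim) used by RoPE.

    Small i gives high-frequency (fast-rotating) dimensions; large i gives
    low-frequency dimensions whose rotation period grows with the base,
    which is what ties the choice of base to the usable context length.
    """
    i = np.arange(dim // 2)
    return base ** (-2.0 * i / dim)

# Illustration (values are assumptions, not from the papers): the slowest
# frequency's full period, in tokens, grows with the base, so longer contexts
# call for a larger base before positions start to alias.
for base in (10_000.0, 500_000.0):
    theta = rope_frequencies(128, base)
    print(f"base={base:>9.0f}  slowest period ~ {2 * np.pi / theta[-1]:.0f} tokens")
```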
May 11, 2025 · Transformers need glasses! Information Over-Squashing in Language Tasks · Representational collapse in LLMs · Note, LLM, Representational Collapse, Over-Squashing, Empirical, NeurIPS 2024
April 5, 2025 · Language Representations Can be What Recommenders Need: Findings and Potentials · What next-token embeddings offer collaborative filtering · Note, Collaborative Filtering, LLM, Universal Embedding, Empirical, ICLR 2025
April 2, 2025 · Physics of Language Models: Part 3.1, Knowledge Storage and Extraction · An empirical study of how LLMs store and extract knowledge · Note, LLM, Knowledge, Seminal, Empirical, ICML 2024