ICML
1-bit Adam: Communication Efficient Large-Scale Training with Adam's Convergence Speed
A 1-bit-SGD-style communication compression scheme applied to Adam for pre-training
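The core mechanism is error-compensated 1-bit compression of the momentum: after a warmup phase, Adam's variance term is frozen and only sign-compressed momentum is exchanged between workers. A minimal NumPy sketch of that compression step (the all-reduce plumbing and warmup schedule are omitted, and the function name is illustrative, not the paper's reference code):

```python
import numpy as np

def one_bit_compress(momentum, error):
    """Compress momentum to signs, carrying quantization error forward."""
    corrected = momentum + error          # error feedback from the last step
    scale = np.abs(corrected).mean()      # one shared magnitude per tensor
    compressed = scale * np.sign(corrected)
    new_error = corrected - compressed    # residual reused at the next step
    return compressed, new_error

rng = np.random.default_rng(0)
m, err = rng.normal(size=1000), np.zeros(1000)
for _ in range(3):
    m_hat, err = one_bit_compress(m, err)  # m_hat is what gets communicated
```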
GaLore: Memory-Efficient LLM Training by Gradient Low-Rank Projection
GaLore: projecting gradients into a low-rank subspace and performing the weight update there
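A minimal NumPy sketch of the projection idea, under simplifying assumptions: the optimizer state lives in an r-dimensional subspace spanned by the gradient's top singular vectors, and the update is projected back before being applied. The periodic subspace refresh and the Adam moment updates are simplified away here (plain SGD stands in for Adam):

```python
import numpy as np

def galore_step(weight, grad, rank=4, lr=1e-2):
    # Project the full gradient onto its top-r left singular vectors.
    U, _, _ = np.linalg.svd(grad, full_matrices=False)
    P = U[:, :rank]                  # projection matrix, d_out x r
    low_rank_grad = P.T @ grad       # state is r x d_in instead of d_out x d_in
    # A real implementation keeps Adam moments on low_rank_grad.
    update = P @ low_rank_grad       # project the update back to full size
    return weight - lr * update

rng = np.random.default_rng(0)
W = rng.normal(size=(64, 32))
G = rng.normal(size=(64, 32))
W = galore_step(W, G)
```

The memory saving comes from the optimizer state being r x d_in rather than d_out x d_in, which matters when d_out is large and r is small.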
SWALP: Stochastic Weight Averaging in Low-Precision Training
SWALP: stabilizing low-precision training via stochastic weight averaging (SWA)
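A minimal sketch of the idea, with a crude fixed-point quantizer standing in for real low-precision hardware (the quantizer and schedule below are illustrative assumptions, not the paper's exact setup): all training arithmetic stays in low precision, while a separate full-precision running average of the weights is accumulated and used at test time.

```python
import numpy as np

def quantize(x, bits=8, scale=4.0):
    """Crude fixed-point quantizer simulating low-precision storage."""
    step = scale / (2 ** (bits - 1))
    return np.clip(np.round(x / step) * step, -scale, scale)

rng = np.random.default_rng(0)
w = rng.normal(size=100)
swa, n_avg = np.zeros_like(w), 0
for step in range(1000):
    grad = quantize(rng.normal(size=100))   # low-precision gradient
    w = quantize(w - 0.1 * grad)            # low-precision weight update
    if step % 10 == 0:                      # periodic SWA accumulation
        swa = (swa * n_avg + w) / (n_avg + 1)
        n_avg += 1
# `swa` is kept in full precision; averaging cancels quantization noise.
```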
Physics of Language Models: Part 3.1, Knowledge Storage and Extraction
An experimental study of how LLMs memorize and extract knowledge
Meta-Learning with Memory-Augmented Neural Networks
MANN: an external memory module for meta-learning
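A minimal NumPy sketch of the content-addressed read that MANN-style external memory builds on: attend over memory rows by cosine similarity to a query key and return the weighted sum. The paper's LRUA write rule is omitted, and the names here are illustrative:

```python
import numpy as np

def softmax(x):
    e = np.exp(x - x.max())
    return e / e.sum()

def read_memory(memory, key):
    """Attend over memory rows by cosine similarity to the query key."""
    sims = memory @ key / (np.linalg.norm(memory, axis=1)
                           * np.linalg.norm(key) + 1e-8)
    weights = softmax(sims)          # soft addressing over all slots
    return weights @ memory          # weighted sum of memory rows

rng = np.random.default_rng(0)
M = rng.normal(size=(128, 40))       # 128 slots, 40-dim content each
r = read_memory(M, rng.normal(size=40))
```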