May 7, 2025 | 1-bit Adam: Communication Efficient Large-Scale Training with Adam's Convergence Speed | A 1-bit SGD optimization method with Adam pre-training | Tags: Note, Low-Precision, Quantization, Error Compensation, Optimizer, Theoretical, ICML2021
May 7, 2025 | GaLore: Memory-Efficient LLM Training by Gradient Low-Rank Projection | GaLore: gradient projection and weight updates in a low-rank space | Tags: Note, Lightweight, Low-Precision, Optimizer, SVD, Theoretical, ICML2024
May 6, 2025 | SWALP: Stochastic Weight Averaging in Low-Precision Training | SWALP: stabilizing low-precision training with SWA | Tags: Note, Low-Precision, FQT, SWA, Empirical, ICML2019
April 2, 2025 | Physics of Language Models: Part 3.1, Knowledge Storage and Extraction | An empirical study of how LLMs memorize and extract knowledge | Tags: Note, LLM, Knowledge, Seminal, Empirical, ICML2024
March 30, 2025 | Meta-Learning with Memory-Augmented Neural Networks | MANN: an external memory module | Tags: Note, Memory, Seminal, Empirical, ICML2016