ICML
1-bit Adam: Communication Efficient Large-Scale Training with Adam's Convergence Speed
A 1-bit-SGD-style communication compression scheme applied to Adam for pre-training
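The core mechanism is error-compensated 1-bit compression of the momentum: after a warmup phase, Adam's variance term is frozen and only sign-compressed momentum is exchanged between workers. A minimal NumPy sketch of that compression step (the all-reduce plumbing and warmup schedule are omitted, and the function name is illustrative, not the paper's reference code):

```python
import numpy as np

def one_bit_compress(momentum, error):
    """Compress momentum to signs, carrying quantization error forward."""
    corrected = momentum + error          # error feedback from the last step
    scale = np.abs(corrected).mean()      # one shared magnitude per tensor
    compressed = scale * np.sign(corrected)
    new_error = corrected - compressed    # residual reused at the next step
    return compressed, new_error

rng = np.random.default_rng(0)
m, err = rng.normal(size=1000), np.zeros(1000)
for _ in range(3):
    m_hat, err = one_bit_compress(m, err)  # m_hat is what gets communicated
```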
GaLore: Memory-Efficient LLM Training by Gradient Low-Rank Projection
GaLore: projecting gradients into a low-rank subspace and performing the weight update there
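A minimal NumPy sketch of the projection idea, under simplifying assumptions: the optimizer state lives in an r-dimensional subspace spanned by the gradient's top singular vectors, and the update is projected back before being applied. The periodic subspace refresh and the Adam moment updates are simplified away here (plain SGD stands in for Adam):

```python
import numpy as np

def galore_step(weight, grad, rank=4, lr=1e-2):
    # Project the full gradient onto its top-r left singular vectors.
    U, _, _ = np.linalg.svd(grad, full_matrices=False)
    P = U[:, :rank]                  # projection matrix, d_out x r
    low_rank_grad = P.T @ grad       # state is r x d_in instead of d_out x d_in
    # A real implementation keeps Adam moments on low_rank_grad.
    update = P @ low_rank_grad       # project the update back to full size
    return weight - lr * update

rng = np.random.default_rng(0)
W = rng.normal(size=(64, 32))
G = rng.normal(size=(64, 32))
W = galore_step(W, G)
```

The memory saving comes from the optimizer state being r x d_in rather than d_out x d_in, which matters when d_out is large and r is small.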
SWALP: Stochastic Weight Averaging in Low-Precision Training
SWALP: stabilizing low-precision training via stochastic weight averaging (SWA)
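A minimal sketch of the idea, with a crude fixed-point quantizer standing in for real low-precision hardware (the quantizer and schedule below are illustrative assumptions, not the paper's exact setup): all training arithmetic stays in low precision, while a separate full-precision running average of the weights is accumulated and used at test time.

```python
import numpy as np

def quantize(x, bits=8, scale=4.0):
    """Crude fixed-point quantizer simulating low-precision storage."""
    step = scale / (2 ** (bits - 1))
    return np.clip(np.round(x / step) * step, -scale, scale)

rng = np.random.default_rng(0)
w = rng.normal(size=100)
swa, n_avg = np.zeros_like(w), 0
for step in range(1000):
    grad = quantize(rng.normal(size=100))   # low-precision gradient
    w = quantize(w - 0.1 * grad)            # low-precision weight update
    if step % 10 == 0:                      # periodic SWA accumulation
        swa = (swa * n_avg + w) / (n_avg + 1)
        n_avg += 1
# `swa` is kept in full precision; averaging cancels quantization noise.
```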
Physics of Language Models: Part 3.1, Knowledge Storage and Extraction
An experimental study of how LLMs memorize and extract knowledge
Meta-Learning with Memory-Augmented Neural Networks
MANN: an external memory module for meta-learning
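A minimal NumPy sketch of the content-addressed read that MANN-style external memory builds on: attend over memory rows by cosine similarity to a query key and return the weighted sum. The paper's LRUA write rule is omitted, and the names here are illustrative:

```python
import numpy as np

def softmax(x):
    e = np.exp(x - x.max())
    return e / e.sum()

def read_memory(memory, key):
    """Attend over memory rows by cosine similarity to the query key."""
    sims = memory @ key / (np.linalg.norm(memory, axis=1)
                           * np.linalg.norm(key) + 1e-8)
    weights = softmax(sims)          # soft addressing over all slots
    return weights @ memory          # weighted sum of memory rows

rng = np.random.default_rng(0)
M = rng.normal(size=(128, 40))       # 128 slots, 40-dim content each
r = read_memory(M, rng.normal(size=40))
```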