Scaling Laws

Preliminaries



Proof:


Core Idea


Using the (L-N-D) Scaling Law to Detect Overfitting

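As a rough illustration of how an L(N, D) law can be used to judge overfitting, the sketch below assumes the joint form from Kaplan et al. (reference 3), L(N, D) = [(N_c/N)^(α_N/α_D) + D_c/D]^(α_D), and measures how far the loss at a finite dataset size D sits above the infinite-data limit L(N, ∞) = (N_c/N)^(α_N). The function names and the default constants are assumptions for illustration: the constants are approximately the values reported in that paper, and they depend on the tokenizer and dataset, so they should be refit before the numbers are trusted.

```python
# Illustrative constants (assumed; roughly those reported by Kaplan et al. 2020).
ALPHA_N = 0.076   # exponent for model size N (non-embedding parameters)
ALPHA_D = 0.095   # exponent for dataset size D (tokens)
N_C = 8.8e13      # scale constant for N
D_C = 5.4e13      # scale constant for D


def loss_nd(n_params, n_tokens,
            alpha_n=ALPHA_N, alpha_d=ALPHA_D, n_c=N_C, d_c=D_C):
    """Joint law L(N, D) = [(N_c/N)^(alpha_N/alpha_D) + D_c/D]^alpha_D."""
    model_term = (n_c / n_params) ** (alpha_n / alpha_d)
    data_term = d_c / n_tokens
    return (model_term + data_term) ** alpha_d


def loss_infinite_data(n_params, alpha_n=ALPHA_N, n_c=N_C):
    """Limit of L(N, D) as D goes to infinity: (N_c/N)^alpha_N."""
    return (n_c / n_params) ** alpha_n


def overfitting_gap(n_params, n_tokens):
    """Relative excess loss caused by finite data: L(N, D) / L(N, inf) - 1."""
    return loss_nd(n_params, n_tokens) / loss_infinite_data(n_params) - 1.0


if __name__ == "__main__":
    n = 1e9  # hypothetical model size
    for d in (1e9, 1e10, 1e11):
        print(f"N={n:.0e}, D={d:.0e}, relative gap={overfitting_gap(n, d):.4f}")
```

A large relative gap suggests that the dataset, not the model, is the bottleneck at that (N, D) pair; what threshold counts as "overfitting" is a choice left to the user.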

Using the (L-N-D) Scaling Law for Early Stopping

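One way to turn the L(N, D) law into an early-stopping estimate is the lower bound given by Kaplan et al. (reference 3), S_stop(N, D) ≳ S_c / [L(N, D) − L(N, ∞)]^(1/α_S): the smaller the finite-data gap, the more optimizer steps can be taken before the test loss stops improving. The sketch below is a minimal version under that assumption. The function names are hypothetical, and the constants (including S_c ≈ 2.1e3 and α_S ≈ 0.76) are approximate values from that paper, used here only for illustration.

```python
# Illustrative constants (assumed; roughly those reported by Kaplan et al. 2020).
ALPHA_N, ALPHA_D, ALPHA_S = 0.076, 0.095, 0.76
N_C, D_C, S_C = 8.8e13, 5.4e13, 2.1e3


def loss_nd(n_params, n_tokens):
    """Joint law L(N, D) = [(N_c/N)^(alpha_N/alpha_D) + D_c/D]^alpha_D."""
    return ((N_C / n_params) ** (ALPHA_N / ALPHA_D) + D_C / n_tokens) ** ALPHA_D


def loss_infinite_data(n_params):
    """Infinite-data limit L(N, inf) = (N_c/N)^alpha_N."""
    return (N_C / n_params) ** ALPHA_N


def early_stop_step_lower_bound(n_params, n_tokens):
    """Lower bound on the early-stopping step:
    S_stop >= S_c / [L(N, D) - L(N, inf)]^(1/alpha_S).
    """
    gap = loss_nd(n_params, n_tokens) - loss_infinite_data(n_params)
    if gap <= 0:
        raise ValueError("finite-data loss gap must be positive")
    return S_C / gap ** (1.0 / ALPHA_S)


if __name__ == "__main__":
    n = 1e9  # hypothetical model size
    for d in (1e9, 1e10, 1e11):
        steps = early_stop_step_lower_bound(n, d)
        print(f"N={n:.0e}, D={d:.0e}, S_stop lower bound ~ {steps:.0f} steps")
```

The result is only a lower bound in units of optimizer steps, so it is better read as "do not expect to stop before this point" than as a precise stopping rule.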

References

  1. Hestness J., Narang S., Ardalani N., Diamos G., Jun H., Kianinejad H., Patwary M. M. A., Yang Y. and Zhou Y. Deep Learning Scaling is Predictable, Empirically. arXiv, 2017.
  2. McCandlish S., Kaplan J., Amodei D. and OpenAI Dota Team. An Empirical Model of Large-Batch Training. arXiv, 2018.
  3. Kaplan J., McCandlish S., Henighan T., Brown T. B., Chess B., Child R., Gray S., Radford A., Wu J. and Amodei D. Scaling Laws for Neural Language Models. arXiv, 2020.
  4. Hoffmann J., Borgeaud S., Mensch A., Buchatskaya E., Cai T., Rutherford E., de Las Casas D., Hendricks L. A., Welbl J., Clark A., Hennigan T., Noland E., Millican K., van den Driessche G., Damoc B., Guy A., Osindero S., Simonyan K., Elsen E., Rae J. W., Vinyals O. and Sifre L. Training Compute-Optimal Large Language Models. arXiv, 2022.