Network Quantization

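Network quantization falls into three main directions:
- FQT: Fully Quantized Training, which represents weights, gradients, and activations in low precision during training.
- QAT: Quantization-Aware Training, which adds some training so that the model is easier to quantize for inference.
- PTQ: Post-Training Quantization, which quantizes a trained model for inference without further rounds of training.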
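
All three directions build on the same primitive: mapping floating-point values to a small set of integer levels and back. The sketch below shows one simple variant (symmetric, per-tensor, round-to-nearest). The function names and the 8-bit default are assumptions made for illustration and are not taken from any of the papers listed here.

```python
def quantize_symmetric(values, num_bits=8):
    """Map floats to signed integers using one shared scale (symmetric, per-tensor)."""
    qmax = 2 ** (num_bits - 1) - 1            # 127 for 8 bits
    max_abs = max(abs(v) for v in values)
    scale = max_abs / qmax if max_abs > 0 else 1.0
    # Round to the nearest integer level and clamp to the representable range.
    q = [max(-qmax, min(qmax, round(v / scale))) for v in values]
    return q, scale


def dequantize(q, scale):
    """Recover approximate float values from the integer codes."""
    return [qi * scale for qi in q]


weights = [0.5, -1.0, 0.25, 0.0]
codes, scale = quantize_symmetric(weights)
restored = dequantize(codes, scale)   # each entry is within scale / 2 of the original
```

PTQ applies such a mapping once to a trained model, QAT simulates it in the forward pass during training, and FQT also applies it to gradients.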

2025

Low-Precision Training of Large Language Models: Methods, Challenges, and Opportunities

Survey covering a range of low-precision training methods.

2023

Training Transformers with 4-bit Integers

4-bit FQT: proposes a set of techniques for 4-bit quantization of transformers.

4-bit PTQ: handles activation outliers through complementary scaling of weights and activations.
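
The idea of complementary scaling can be sketched as follows: dividing each input channel of the activations by a factor and multiplying the matching weight row by the same factor leaves the matrix product unchanged, while moving outlier magnitude from activations into weights. The factor formula and the `alpha` parameter below are one common choice and are assumptions here, not necessarily what the paper uses.

```python
def matmul(a, b):
    """Plain matrix product of two lists of rows."""
    return [[sum(a[i][k] * b[k][j] for k in range(len(b)))
             for j in range(len(b[0]))] for i in range(len(a))]


def smoothing_factors(x, w, alpha=0.5):
    """Per-input-channel factors; x is [tokens][in], w is [in][out]."""
    factors = []
    for j in range(len(w)):
        act_max = max(abs(row[j]) for row in x)
        w_max = max(abs(v) for v in w[j])
        if act_max > 0 and w_max > 0:
            factors.append(act_max ** alpha / w_max ** (1 - alpha))
        else:
            factors.append(1.0)   # leave degenerate channels untouched
    return factors


def apply_smoothing(x, w, factors):
    """Divide activation channels and multiply weight rows by the same factors."""
    x_s = [[row[j] / factors[j] for j in range(len(factors))] for row in x]
    w_s = [[v * factors[j] for v in w[j]] for j in range(len(w))]
    return x_s, w_s


x = [[1.0, 100.0], [-2.0, 80.0]]        # channel 1 carries the outliers
w = [[0.5, -0.5], [0.1, 0.2]]
x_s, w_s = apply_smoothing(x, w, smoothing_factors(x, w))
# matmul(x_s, w_s) matches matmul(x, w) up to floating-point error,
# but the largest activation in channel 1 is now much smaller.
```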

2021

4-bit QAT: uses Logarithmic Unbiased Quantization to better fit the logarithmically shaped gradient distribution.
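
A minimal sketch of the core idea, assuming the quantization levels are powers of two and that unbiasedness comes from stochastic rounding between the two neighbouring levels. The function name is hypothetical, and details such as clipping to a 4-bit exponent range or underflow handling are left out.

```python
import math
import random


def log_stochastic_round(x, rng):
    """Round |x| to a neighbouring power of two so that the expectation equals x."""
    if x == 0:
        return 0.0
    sign = 1.0 if x > 0 else -1.0
    mag = abs(x)
    _, exponent = math.frexp(mag)             # mag = m * 2**exponent, 0.5 <= m < 1
    lower = math.ldexp(1.0, exponent - 1)     # largest power of two <= mag
    upper = 2.0 * lower
    # The probability of rounding up is chosen so that E[result] = mag.
    p_up = (mag - lower) / (upper - lower)
    return sign * (upper if rng.random() < p_up else lower)


rng = random.Random(0)
samples = [log_stochastic_round(3.0, rng) for _ in range(20000)]
mean = sum(samples) / len(samples)   # close to 3.0, although every sample is 2.0 or 4.0
```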

Cyclic Precision Training: a precision cycling schedule similar to CosineAnnealingWarmRestarts.
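
By analogy with a cosine learning-rate schedule with warm restarts, such a precision schedule could look like the sketch below: the bitwidth rises from a minimum to a maximum along a cosine curve within each cycle and then restarts. The function name, the bit range, and the cycle length are assumptions for illustration.

```python
import math


def cyclic_precision(step, cycle_length, min_bits=3, max_bits=8):
    """Bitwidth for a given training step under a cosine schedule with restarts."""
    phase = (step % cycle_length) / cycle_length          # position in cycle, in [0, 1)
    bits = min_bits + 0.5 * (max_bits - min_bits) * (1 - math.cos(math.pi * phase))
    return round(bits)


# Bitwidths over two cycles of 100 steps each: low precision at the start of
# every cycle, high precision at the end, then a restart.
schedule = [cyclic_precision(t, cycle_length=100) for t in range(200)]
```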

2020

Ultra-Low Precision 4-bit Training of Deep Neural Networks

4-bit FQT: proposes a special FP4 format and a Grad Scale mechanism to enable 4-bit quantization.

2019

FQT: stabilizes low-precision training with Stochastic Weight Averaging.
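
Stochastic Weight Averaging keeps a running mean of weight snapshots collected along the training trajectory. A minimal sketch of the update follows; the function name is hypothetical, and keeping the average in higher precision than the training weights is an assumption.

```python
def update_swa(swa_weights, new_weights, num_averaged):
    """Fold one more weight snapshot into a running average of num_averaged snapshots."""
    return [(s * num_averaged + w) / (num_averaged + 1)
            for s, w in zip(swa_weights, new_weights)]


snapshots = [[1.0, 2.0], [3.0, 4.0], [5.0, 6.0]]   # weights saved at different steps
swa = snapshots[0]
for n, snapshot in enumerate(snapshots[1:], start=1):
    swa = update_swa(swa, snapshot, n)
# swa now holds the element-wise mean of the three snapshots.
```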

Backprop with Approximate Activations for Memory-efficient Network Training

A dedicated backpropagation scheme for networks that use BatchNorm and ReLU.

QAT: determines the bitwidth each block needs from its top eigenvalue, then recovers accuracy through multi-stage fine-tuning.

2018

Training Deep Neural Networks with 8-bit Floating Point Numbers

8-bit FQT: a special FP8 format + chunk-based accumulation + stochastic rounding.
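
Two of these ingredients are easy to illustrate in isolation. The sketch below emulates half-precision accumulation with the standard-library `struct` module (format `"e"`), which is an assumption chosen for illustration rather than the paper's setup; the chunk size and function names are likewise hypothetical.

```python
import math
import random
import struct


def to_fp16(x):
    """Round a Python float to the nearest IEEE half-precision value."""
    return struct.unpack("e", struct.pack("e", x))[0]


def stochastic_round(x, rng):
    """Round to a neighbouring integer, rounding up with probability equal to the fractional part."""
    lower = math.floor(x)
    return lower + (1 if rng.random() < x - lower else 0)


def naive_sum_fp16(values):
    """Accumulate left to right, rounding the running sum to FP16 after every add."""
    total = 0.0
    for v in values:
        total = to_fp16(total + v)
    return total


def chunked_sum_fp16(values, chunk_size=64):
    """Sum each chunk separately, then sum the per-chunk partial results."""
    partials = [naive_sum_fp16(values[i:i + chunk_size])
                for i in range(0, len(values), chunk_size)]
    return naive_sum_fp16(partials)


ones = [1.0] * 4096
naive = naive_sum_fp16(ones)       # stalls once the running sum is too large to absorb 1.0
chunked = chunked_sum_fp16(ones)   # operands stay comparable in magnitude, so no terms are lost
```

Chunking keeps the running sum and the addend at similar magnitudes, which is why the small terms are no longer swamped. Stochastic rounding addresses a related problem: it is unbiased on average, so small updates are not systematically rounded away.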

Range Batch Normalization

8-bit FQT: addresses the numerical instability of BN in low-precision training.
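
A minimal sketch of the idea, assuming the normalizer replaces the standard deviation with the range of the centered batch scaled by C(n) = 1 / sqrt(2 ln n), where n is the batch size. This avoids the sum of squares, which is sensitive to low-precision arithmetic. The function name and `eps` value are hypothetical, and the learnable scale and shift are omitted.

```python
import math


def range_batch_norm(x, eps=1e-5):
    """Normalize a batch of scalars using its range instead of its variance (needs n >= 2)."""
    n = len(x)
    mean = sum(x) / n
    centered = [v - mean for v in x]
    value_range = max(centered) - min(centered)
    c_n = 1.0 / math.sqrt(2.0 * math.log(n))   # assumed scale factor for Gaussian-like inputs
    return [v / (c_n * value_range + eps) for v in centered]


batch = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0, 7.0, 8.0]
normalized = range_batch_norm(batch)   # zero mean; spread set by the range, not the variance
```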

2017

Mixed Precision Training

FQT: the seminal work on FP16 mixed-precision training.
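
One reason FP16 training needs extra care is that small gradients underflow to zero. Loss scaling, one of the techniques associated with mixed-precision training, multiplies the loss (and therefore the gradients) by a constant before the backward pass and divides it out again in higher precision. The sketch below emulates FP16 with the standard-library `struct` module; the scale value and variable names are assumptions.

```python
import struct


def to_fp16(x):
    """Round a Python float to the nearest IEEE half-precision value."""
    return struct.unpack("e", struct.pack("e", x))[0]


LOSS_SCALE = 1024.0            # hypothetical constant; real setups often adjust it dynamically

true_grad = 1e-8               # smaller than the smallest positive FP16 value

unscaled = to_fp16(true_grad)                # underflows to zero in FP16
scaled = to_fp16(true_grad * LOSS_SCALE)     # survives the cast thanks to the scale
recovered = scaled / LOSS_SCALE              # unscale in higher precision before the update
```

The recovered value is not exact, but it is close to the true gradient instead of being lost entirely.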

QAT: quantizes the network incrementally, step by step.

2015

Deep Learning with Limited Numerical Precision

An early attempt at low-precision training in deep learning; proposes many tricks.