Visual Autoregressive Modeling: Scalable Image Generation via Next-Scale Prediction

Preliminaries

Core Idea


Next-Scale Prediction

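For reference, the factorization that next-scale prediction relies on can be written as follows. The notation is taken from the VAR paper and does not appear elsewhere in these notes: $r_k$ is the token map at scale $k$, with resolution $h_k \times w_k$, and $K$ is the number of scales.

$$
p(r_1, r_2, \ldots, r_K) = \prod_{k=1}^{K} p(r_k \mid r_1, r_2, \ldots, r_{k-1})
$$

Each autoregressive step thus predicts an entire token map $r_k$ conditioned on all coarser maps, rather than a single token.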

Multi-Scale Quantization


$\mathcal{E}$: encoder; $\mathcal{Q}$: quantizer (returns discrete codes); $\phi_k$: an additional convolutional layer.
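The residual-style encoding that these symbols describe can be sketched roughly as below. This is a minimal NumPy sketch under stated assumptions, not the authors' implementation: the feature map `f` stands in for the output of $\mathcal{E}$, `quantize` plays the role of $\mathcal{Q}$ with a nearest-neighbour codebook lookup, `phi` is an identity placeholder for the learned $\phi_k$, and `resize_nearest` replaces whatever interpolation the real model uses. All function names are hypothetical.

```python
import numpy as np

def resize_nearest(x, h, w):
    """Nearest-neighbour resize of an (H, W, C) array to (h, w, C).
    Assumption: a simple stand-in for the model's interpolation."""
    H, W = x.shape[:2]
    rows = (np.arange(h) * H) // h
    cols = (np.arange(w) * W) // w
    return x[rows][:, cols]

def quantize(x, codebook):
    """Stand-in for Q: map each (C,) vector in x to the index of its
    nearest codebook entry. x: (h, w, C), codebook: (V, C)."""
    dist = ((x[:, :, None, :] - codebook[None, None, :, :]) ** 2).sum(-1)
    return dist.argmin(-1)  # (h, w) integer token map

def multi_scale_quantize(f, codebook, scales, phi=lambda z, k: z):
    """Encode feature map f (H, W, C) into one token map per scale.
    `phi` is a placeholder for the extra conv layer phi_k (identity here)."""
    H, W, _ = f.shape
    residual = f.copy()
    tokens = []
    for k, (h, w) in enumerate(scales):
        # Quantize the current residual at resolution (h, w).
        r_k = quantize(resize_nearest(residual, h, w), codebook)
        tokens.append(r_k)
        # Look up embeddings, upsample to full size, subtract from residual.
        z_k = resize_nearest(codebook[r_k], H, W)
        residual = residual - phi(z_k, k)
    return tokens
```

The coarse scales capture the global structure first, and each finer scale only has to encode what the previous ones left unexplained.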

Finally, the predictions that VAR produces at the different scales are passed through the decoder to "recover" the image.
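The recovery step can be sketched as accumulating the upsampled embeddings of every scale into one feature map, which is then handed to the decoder. Again this is a hedged NumPy sketch with assumptions: `phi` is an identity placeholder for the learned $\phi_k$, `resize_nearest` is a simplified interpolation, the token maps are assumed to be integer arrays ordered from coarse to fine, and the decoder itself is left out. The function names are hypothetical.

```python
import numpy as np

def resize_nearest(x, h, w):
    """Nearest-neighbour resize of an (H, W, C) array to (h, w, C).
    Assumption: a simple stand-in for the model's interpolation."""
    H, W = x.shape[:2]
    rows = (np.arange(h) * H) // h
    cols = (np.arange(w) * W) // w
    return x[rows][:, cols]

def reconstruct_features(tokens, codebook, phi=lambda z, k: z):
    """Sum the contributions of all scales into a feature map f_hat.
    tokens: list of (h_k, w_k) integer maps, coarse to fine.
    codebook: (V, C) embedding table."""
    H, W = tokens[-1].shape  # the finest scale sets the output resolution
    f_hat = np.zeros((H, W, codebook.shape[1]))
    for k, r_k in enumerate(tokens):
        z_k = resize_nearest(codebook[r_k], H, W)  # look up, then upsample
        f_hat = f_hat + phi(z_k, k)
    return f_hat  # the decoder would map f_hat back to pixels
```

For example, with `codebook = np.array([[1., 0.], [0., 2.]])` and token maps `np.zeros((1, 1), dtype=int)` and `np.ones((2, 2), dtype=int)`, every position of `f_hat` equals `[1., 2.]`.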

References

  1. Tian K., Jiang Y., Yuan Z., Peng B. and Wang L. Visual Autoregressive Modeling: Scalable Image Generation via Next-Scale Prediction. NeurIPS, 2024.