Taming Transformers for High-Resolution Image Synthesis

Preliminaries

Core Idea

Part 1: Discrete Encoding

To do so, we propose VQGAN, a variant of the original VQVAE, and use a discriminator and perceptual loss to keep good perceptual quality at increased compression rate.
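The quantization step that VQGAN inherits from VQVAE can be illustrated with a small sketch: each continuous latent vector is replaced by its nearest entry in a learned codebook, and the entry's index becomes the discrete code. The function name `quantize`, the toy codebook, and the latent values below are assumptions made for illustration; this is not the authors' implementation, and it leaves out the encoder, decoder, discriminator, and perceptual loss.

```python
import numpy as np

def quantize(z, codebook):
    """Map each latent vector in z (N, D) to its nearest codebook entry (K, D)."""
    # Squared Euclidean distance between every latent and every code: shape (N, K).
    distances = ((z[:, None, :] - codebook[None, :, :]) ** 2).sum(axis=-1)
    # Index of the closest code for each latent vector: shape (N,).
    indices = distances.argmin(axis=1)
    # Quantized latents are the selected codebook rows: shape (N, D).
    z_q = codebook[indices]
    return z_q, indices

# Hypothetical toy codebook with K=3 entries of dimension D=2.
codebook = np.array([[0.0, 0.0], [1.0, 1.0], [-1.0, 2.0]])
# Hypothetical encoder outputs, one row per spatial position.
z = np.array([[0.1, -0.2], [0.9, 1.2], [-0.8, 1.7]])

z_q, indices = quantize(z, codebook)
print(indices)
```

The resulting index sequence is the discrete representation of the image: the decoder reconstructs from `z_q`, and a sequence model can be trained on `indices`.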

Part 2: Transformer Generation

References

  1. Esser P., Rombach R. and Ommer B. Taming Transformers for High-Resolution Image Synthesis. CVPR, 2021.