2021
Temporal Cross-Effects in Knowledge Tracing
Borrows ideas from the Hawkes process to model cross-effects and decay in knowledge mastery
1-bit Adam: Communication Efficient Large-Scale Training with Adam's Convergence Speed
A 1-bit SGD-style optimization method with an Adam pre-training (warm-up) stage
CPT: Efficient Deep Neural Network Training via Cyclic Precision
CPT, a cyclic precision mechanism similar to CosineAnnealingWarmRestarts
Taming Transformers for High-Resolution Image Synthesis
VQGAN, autoregressive image generation
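
As a rough illustration of the CPT note above, a cosine-shaped precision schedule with restarts might look like the sketch below. This is not the paper's implementation: the function name `cyclic_precision`, its parameters, the bit range, and the low-to-high direction within each cycle are all assumptions made for illustration.

```python
import math


def cyclic_precision(step, cycle_len, min_bits, max_bits):
    """Hypothetical helper: bit-width for `step` under a cosine schedule.

    The schedule restarts every `cycle_len` steps. Within a cycle the
    bit-width rises from `min_bits` toward `max_bits` along a half cosine
    (the direction is an assumption, not taken from the notes).
    """
    # Position inside the current cycle, in [0, 1).
    pos = (step % cycle_len) / cycle_len
    # Half-cosine ramp: 0 at the start of the cycle, close to 1 at the end.
    ramp = 0.5 * (1.0 - math.cos(math.pi * pos))
    return int(round(min_bits + (max_bits - min_bits) * ramp))


if __name__ == "__main__":
    # Print the bit-width at a few steps across two cycles of length 100.
    for s in (0, 25, 75, 99, 100, 125):
        print(s, cyclic_precision(s, cycle_len=100, min_bits=3, max_bits=8))
```

The returned bit-width would then be handed to whatever quantizer the training loop uses; that part is left out here because the notes do not describe it.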