MEGABYTE: Predicting Million-byte Sequences with Multiscale Transformers

Preliminaries

Core Idea

References

  1. Yu L., Simig D., Flaherty C., Aghajanyan A., Zettlemoyer L. and Lewis M. MEGABYTE: Predicting Million-byte Sequences with Multiscale Transformers. NeurIPS, 2023.