Infinity $\infty$: Scaling Bitwise AutoRegressive Modeling for High-Resolution Image Synthesis
Note
This repo hosts Infinity's checkpoints. For more details, please refer to
Introduction
We present Infinity, a Bitwise Visual AutoRegressive Model capable of generating high-resolution, photorealistic images. Infinity reformulates visual autoregressive modeling as bitwise token prediction, built on an infinite-vocabulary tokenizer and classifier together with bitwise self-correction. By theoretically scaling the tokenizer vocabulary size to infinity while concurrently scaling the transformer size, our method unlocks strong scaling capabilities. Infinity sets a new record for autoregressive text-to-image models, outperforming top-tier diffusion models such as SD3-Medium and SDXL. Notably, Infinity surpasses SD3-Medium by raising the GenEval score from 0.62 to 0.73 and the ImageReward score from 0.87 to 0.96, achieving a win rate of 66%. Without extra optimization, Infinity generates a high-quality 1024×1024 image in 0.8 seconds, 2.6× faster than SD3-Medium, making it the fastest text-to-image model.
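The bitwise formulation can be illustrated with a small NumPy sketch. It assumes (this README does not spell it out) that the infinite-vocabulary classifier replaces one softmax over all $2^d$ token indices with $d$ independent binary classifiers, one per bit. It also shows only a random bit-flip step of the kind a bitwise self-correction scheme could apply to training inputs, not the full method. All function names, shapes, and sizes are hypothetical.

```python
import numpy as np

# Assumption: the classifier head predicts d bits with d binary classifiers
# instead of one softmax over 2**d token indices. Names are hypothetical.


def softmax_head_params(hidden_dim: int, d: int) -> int:
    """Weights of a conventional head scoring all 2**d token indices."""
    return hidden_dim * (2 ** d)


def bitwise_head_params(hidden_dim: int, d: int) -> int:
    """Weights of d binary heads, each producing 2 logits (bit = 0 or 1)."""
    return hidden_dim * 2 * d


def predict_bits(hidden: np.ndarray, weight: np.ndarray) -> np.ndarray:
    """hidden: (n, hidden_dim); weight: (hidden_dim, d, 2).

    Returns an (n, d) array of 0/1 bits, taking an independent argmax per bit.
    """
    logits = np.einsum("nh,hdk->ndk", hidden, weight)
    return logits.argmax(axis=-1)


def bits_to_index(bits: np.ndarray) -> np.ndarray:
    """Pack (n, d) bits, most significant bit first, into integer token indices."""
    d = bits.shape[-1]
    place_values = np.uint64(1) << np.arange(d - 1, -1, -1, dtype=np.uint64)
    return (bits.astype(np.uint64) * place_values).sum(axis=-1)


def flip_bits(bits: np.ndarray, p: float, rng: np.random.Generator) -> np.ndarray:
    """Flip each bit independently with probability p (training-time corruption).

    Only the corruption step; how corrupted inputs feed back into training
    is outside this sketch.
    """
    mask = rng.random(bits.shape) < p
    return np.where(mask, 1 - bits, bits)


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    hidden = rng.standard_normal((3, 8))      # 3 tokens, toy hidden size 8
    weight = rng.standard_normal((8, 32, 2))  # d = 32 bits per token
    bits = predict_bits(hidden, weight)
    print(bits_to_index(bits))                # three indices in [0, 2**32)
    print(flip_bits(bits, 0.1, rng).shape)    # (3, 32)
```

The head-size functions show why a per-bit head is attractive: with a hypothetical hidden size of 2048 and $d=32$, a full softmax head would need $2048 \cdot 2^{32} \approx 8.8 \times 10^{12}$ weights, while $d$ binary heads need $2048 \cdot 64 = 131{,}072$.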
Infinity Model Zoo
We provide Infinity models for you to play with; they can be downloaded from the following links:
Visual Tokenizer
vocabulary | stride | IN-256 rFID $\downarrow$ | IN-256 PSNR $\uparrow$ | IN-512 rFID $\downarrow$ | IN-512 PSNR $\uparrow$ | HF weights 🤗 |
---|---|---|---|---|---|---|
$V_d=2^{16}$ | 16 | 1.22 | 20.9 | 0.31 | 22.6 | infinity_vae_d16.pth |
$V_d=2^{24}$ | 16 | 0.75 | 22.0 | 0.30 | 23.5 | infinity_vae_d24.pth |
$V_d=2^{32}$ | 16 | 0.61 | 22.7 | 0.23 | 24.4 | infinity_vae_d32.pth |
$V_d=2^{64}$ | 16 | 0.33 | 24.9 | 0.15 | 26.4 | infinity_vae_d64.pth |
$V_d=2^{32}$ | 16 | 0.75 | 21.9 | 0.32 | 23.6 | infinity_vae_d32_reg.pth |
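To read the tokenizer table: $V_d = 2^d$ means each token carries $d$ bits, and, assuming "stride" denotes the spatial downsampling factor, a stride of 16 maps an $H \times W$ image to an $(H/16) \times (W/16)$ token grid. The sketch below computes that arithmetic for the IN-256 and IN-512 resolutions together with the textbook PSNR definition. The evaluation protocol behind the reported numbers (value range, color space, resizing) is not stated here, so this is not necessarily the authors' metric code; the function names are hypothetical.

```python
import numpy as np


def latent_side(resolution: int, stride: int = 16) -> int:
    """Side length of the token grid for a square image at the given stride."""
    if resolution % stride:
        raise ValueError("resolution must be divisible by stride")
    return resolution // stride


def bits_per_image(resolution: int, d: int, stride: int = 16) -> int:
    """Bits needed to store one image's tokens: (H/stride) * (W/stride) * d."""
    side = latent_side(resolution, stride)
    return side * side * d


def psnr(reference, reconstruction, max_value: float = 255.0) -> float:
    """Standard PSNR in dB: 10 * log10(max_value**2 / MSE)."""
    ref = np.asarray(reference, dtype=np.float64)
    rec = np.asarray(reconstruction, dtype=np.float64)
    mse = np.mean((ref - rec) ** 2)
    if mse == 0:
        return float("inf")  # identical images
    return float(10.0 * np.log10(max_value ** 2 / mse))


if __name__ == "__main__":
    for res in (256, 512):
        print(res, latent_side(res), bits_per_image(res, d=32))
    # 256 16 8192
    # 512 32 32768
```

Since PSNR is logarithmic in the mean squared error, the roughly 3.5 dB gap between the $2^{16}$ and $2^{64}$ tokenizers at IN-256 corresponds to a reconstruction MSE about 2.2 times lower.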
Infinity
model | Resolution | GenEval | DPG | HPSv2.1 | HF weights 🤗 |
---|---|---|---|---|---|
Infinity-2B | 1024 | 0.69 / 0.73 $^{\dagger}$ | 83.5 | 32.2 | infinity_2b_reg.pth |
Infinity-20B | 1024 | - | - | - | Coming Soon |
$^{\dagger}$ Result obtained with a prompt rewriter.
You can load these models and generate images with the code in interactive_infer.ipynb. Note: you need to download infinity_vae_d32reg.pth and flan-t5-xl first.
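Before opening interactive_infer.ipynb, the prerequisite files can be fetched with the huggingface_hub client. This is a sketch under several assumptions: the Hub repo id for these checkpoints is not given in this README, so `INFINITY_REPO_ID` is a hypothetical placeholder to fill in; the text encoder is assumed to be the `google/flan-t5-xl` repo; fetching `infinity_2b_reg.pth` alongside the VAE is assumed to be needed; and the VAE filename follows the spelling in the note above (the tokenizer table spells it `infinity_vae_d32_reg.pth`), so check the exact name in the repo listing. Only `hf_hub_download` and `snapshot_download` are real library calls; the helper names are hypothetical.

```python
from pathlib import Path

# Hypothetical placeholder: fill in the Hub repo that hosts these checkpoints.
INFINITY_REPO_ID = "<org>/<infinity-repo>"
# Assumption: the notebook expects the 2B transformer and the d32 VAE.
# Filenames follow this README's spelling; verify them in the repo listing.
REQUIRED_CHECKPOINTS = ("infinity_2b_reg.pth", "infinity_vae_d32reg.pth")
# Assumption: "flan-t5-xl" refers to the google/flan-t5-xl repo on the Hub.
TEXT_ENCODER_REPO_ID = "google/flan-t5-xl"


def missing_checkpoints(directory, names=REQUIRED_CHECKPOINTS):
    """Return the required checkpoint filenames not yet present in `directory`."""
    root = Path(directory)
    return [name for name in names if not (root / name).is_file()]


def download_prerequisites(directory, repo_id=INFINITY_REPO_ID):
    """Download missing checkpoints and the text encoder into `directory`."""
    # Imported lazily so missing_checkpoints() works without the package installed.
    from huggingface_hub import hf_hub_download, snapshot_download

    root = Path(directory)
    root.mkdir(parents=True, exist_ok=True)
    for name in missing_checkpoints(root):
        hf_hub_download(repo_id=repo_id, filename=name, local_dir=root)
    snapshot_download(repo_id=TEXT_ENCODER_REPO_ID, local_dir=root / "flan-t5-xl")
```

Once the repo id is filled in, calling `download_prerequisites("weights")` should leave both checkpoints directly under `weights/`, after which `missing_checkpoints("weights")` returns an empty list; point the notebook's checkpoint paths at that directory.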
Citation
If our work assists your research, feel free to give us a star ⭐ or cite us using:
@misc{han2024infinityscalingbitwiseautoregressive,
title={Infinity: Scaling Bitwise AutoRegressive Modeling for High-Resolution Image Synthesis},
author={Jian Han and Jinlai Liu and Yi Jiang and Bin Yan and Yuqi Zhang and Zehuan Yuan and Bingyue Peng and Xiaobing Liu},
year={2024},
eprint={2412.04431},
archivePrefix={arXiv},
primaryClass={cs.CV},
url={https://arxiv.org/abs/2412.04431},
}