Infinity $\infty$: Scaling Bitwise AutoRegressive Modeling for High-Resolution Image Synthesis

📌 Note

This repo hosts Infinity's checkpoints. For more details, please refer to the code repository.

📖 Introduction

We present Infinity, a Bitwise Visual AutoRegressive Model capable of generating high-resolution, photorealistic images. Infinity redefines the visual autoregressive model under a bitwise token prediction framework with an infinite-vocabulary tokenizer and classifier and bitwise self-correction. By theoretically scaling the tokenizer vocabulary size to infinity while concurrently scaling the transformer size, our method shows markedly stronger scaling capabilities. Infinity sets a new record for autoregressive text-to-image models, outperforming top-tier diffusion models such as SD3-Medium and SDXL. Notably, Infinity surpasses SD3-Medium by improving the GenEval benchmark score from 0.62 to 0.73 and the ImageReward benchmark score from 0.87 to 0.96, achieving a win rate of 66%. Without extra optimization, Infinity generates a high-quality 1024×1024 image in 0.8 seconds, making it 2.6× faster than SD3-Medium and establishing it as the fastest text-to-image model.
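
As a rough illustration of why a bitwise formulation lets the vocabulary grow so large, the sketch below treats one token as a vector of d bits, so the implicit vocabulary size is 2^d. It compares the weight count of a single 2^d-way classifier with that of d parallel binary classifiers. This is one possible reading of bitwise token prediction, not code from the Infinity repository; the hidden size `h = 2048` and all function names are assumptions chosen for illustration.

```python
from typing import List


def index_to_bits(index: int, d: int) -> List[int]:
    """Return the d-bit representation of a token index, most significant bit first."""
    if not 0 <= index < 2 ** d:
        raise ValueError("index out of range for a d-bit token")
    return [(index >> shift) & 1 for shift in range(d - 1, -1, -1)]


def bits_to_index(bits: List[int]) -> int:
    """Inverse of index_to_bits: fold the bits back into an integer index."""
    index = 0
    for bit in bits:
        index = (index << 1) | bit
    return index


def conventional_head_params(h: int, d: int) -> int:
    """Weights of one linear layer mapping h features to 2**d logits (bias ignored)."""
    return h * 2 ** d


def bitwise_head_params(h: int, d: int) -> int:
    """Weights of d binary classifiers, each mapping h features to 2 logits (bias ignored)."""
    return h * 2 * d


if __name__ == "__main__":
    h = 2048  # assumed hidden size, for illustration only
    for d in (16, 24, 32, 64):  # bit widths matching the tokenizer vocabularies 2**d
        print(d, conventional_head_params(h, d), bitwise_head_params(h, d))
```

Under these assumptions, d = 32 gives 8,796,093,022,208 weights for the conventional head and 131,072 for the bitwise one, which shows why predicting bits rather than indices keeps the classifier small as the vocabulary grows.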

Infinity Model ZOO

We provide Infinity models for you to play with. They are hosted on Hugging Face and can be downloaded from the following links:

Visual Tokenizer

| vocabulary | stride | IN-256 rFID $\downarrow$ | IN-256 PSNR $\uparrow$ | IN-512 rFID $\downarrow$ | IN-512 PSNR $\uparrow$ | HF weights 🤗 |
|:---|:---:|:---:|:---:|:---:|:---:|:---|
| $V_d=2^{16}$ | 16 | 1.22 | 20.9 | 0.31 | 22.6 | infinity_vae_d16.pth |
| $V_d=2^{24}$ | 16 | 0.75 | 22.0 | 0.30 | 23.5 | infinity_vae_d24.pth |
| $V_d=2^{32}$ | 16 | 0.61 | 22.7 | 0.23 | 24.4 | infinity_vae_d32.pth |
| $V_d=2^{64}$ | 16 | 0.33 | 24.9 | 0.15 | 26.4 | infinity_vae_d64.pth |
| $V_d=2^{32}$ | 16 | 0.75 | 21.9 | 0.32 | 23.6 | infinity_vae_d32_reg.pth |

Infinity

| model | Resolution | GenEval | DPG | HPSv2.1 | HF weights 🤗 |
|:---|:---:|:---:|:---:|:---:|:---|
| Infinity-2B | 1024 | 0.69 / 0.73 $^{\dagger}$ | 83.5 | 32.2 | infinity_2b_reg.pth |
| Infinity-20B | 1024 | - | - | - | Coming Soon |

$^{\dagger}$ Result obtained with a prompt rewriter.

You can load these models to generate images using the code in interactive_infer.ipynb. Note: you need to download infinity_vae_d32reg.pth and flan-t5-xl first.
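
The download step could be scripted along the following lines. This is a minimal sketch rather than the project's official procedure: the repository id `FoundationVision/Infinity` is an assumption (this README does not state it), `google/flan-t5-xl` is assumed to be the intended text encoder, and `plan_downloads` / `download_all` are hypothetical helper names. The `hf_hub_download` and `snapshot_download` calls come from the `huggingface_hub` package.

```python
from pathlib import Path

# Assumption: the Hugging Face repository id is not given in this README.
REPO_ID = "FoundationVision/Infinity"

# File names as written in the note above. The tokenizer table lists the
# regularized VAE as "infinity_vae_d32_reg.pth", so verify the exact name
# against the repository's file list before running.
CHECKPOINT_FILES = ["infinity_2b_reg.pth", "infinity_vae_d32reg.pth"]

# Assumption: flan-t5-xl refers to the google/flan-t5-xl repository.
TEXT_ENCODER_REPO = "google/flan-t5-xl"


def plan_downloads(weights_dir):
    """Map each required checkpoint file name to its intended local path."""
    weights_dir = Path(weights_dir)
    return {name: weights_dir / name for name in CHECKPOINT_FILES}


def download_all(weights_dir="weights"):
    """Fetch the checkpoints and the text encoder into weights_dir."""
    # Imported lazily so plan_downloads works without huggingface_hub installed.
    from huggingface_hub import hf_hub_download, snapshot_download

    for name in plan_downloads(weights_dir):
        hf_hub_download(repo_id=REPO_ID, filename=name, local_dir=weights_dir)
    snapshot_download(
        repo_id=TEXT_ENCODER_REPO,
        local_dir=Path(weights_dir) / "flan-t5-xl",
    )
```

Calling `download_all("weights")` would place both `.pth` files and a `flan-t5-xl` folder under `weights/`; the paths expected by interactive_infer.ipynb may differ and should be adjusted there.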

📖 Citation

If our work assists your research, feel free to give us a star ⭐ or cite us using:

@misc{han2024infinityscalingbitwiseautoregressive,
    title={Infinity: Scaling Bitwise AutoRegressive Modeling for High-Resolution Image Synthesis}, 
    author={Jian Han and Jinlai Liu and Yi Jiang and Bin Yan and Yuqi Zhang and Zehuan Yuan and Bingyue Peng and Xiaobing Liu},
    year={2024},
    eprint={2412.04431},
    archivePrefix={arXiv},
    primaryClass={cs.CV},
    url={https://arxiv.org/abs/2412.04431}, 
}