PixelFlow: Pixel-Space Generative Models with Flow
Shoufa Chen, Chongjian Ge, Shilong Zhang, Peize Sun, Ping Luo
The University of Hong Kong, Adobe
Introduction
We present PixelFlow, a family of image generation models that operate directly in the raw pixel space, in contrast to the predominant latent-space models. This approach simplifies the image generation process by eliminating the need for a pre-trained Variational Autoencoder (VAE) and making the whole model end-to-end trainable. Through efficient cascade flow modeling, PixelFlow keeps the computation cost of pixel-space generation affordable. It achieves an FID of 1.98 on the 256x256 ImageNet class-conditional image generation benchmark. Qualitative text-to-image results show that PixelFlow excels in image quality, artistry, and semantic control. We hope this new paradigm will inspire and open up new opportunities for next-generation visual generation models.
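The paragraph above credits cascade flow modeling for PixelFlow's affordable pixel-space cost. As a rough illustration, assume the cascade runs early flow steps on low-resolution images and only the final steps at full resolution. The stage resolutions, the patch size of 4, and the equal-length stages below are hypothetical choices, not the published configuration. Because self-attention cost grows with the square of the token count, the low-resolution stages are cheap:

```python
# Illustrative arithmetic only: stage resolutions and patch size are
# hypothetical, not PixelFlow's actual cascade configuration.

def tokens_per_stage(resolutions, patch_size):
    """Number of patch tokens a transformer processes at each resolution."""
    return [(r // patch_size) ** 2 for r in resolutions]

def relative_attention_cost(resolutions, patch_size):
    """Self-attention cost (proportional to tokens**2), averaged over
    equally long stages, relative to running every step at the final
    resolution."""
    tokens = tokens_per_stage(resolutions, patch_size)
    full_res_cost = tokens[-1] ** 2
    return sum(n ** 2 for n in tokens) / (len(tokens) * full_res_cost)

stages = [32, 64, 128, 256]  # hypothetical low-to-high resolution schedule
print(tokens_per_stage(stages, 4))
print(f"{relative_attention_cost(stages, 4):.3f}")
```

Under these assumptions the four-stage schedule spends roughly a quarter of the attention compute of an all-256x256 schedule. MLP layers scale linearly with the token count, so the saving for a full transformer block is smaller than this attention-only estimate.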
Model Zoo
Model | Task | Params | FID | Checkpoint |
---|---|---|---|---|
PixelFlow | class-to-image | 677M | 1.98 | 🤗 |
PixelFlow | text-to-image | 882M | N/A | 🤗 |
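To estimate how much GPU memory the checkpoints need just to hold their weights, multiply the parameter counts in the table by the bytes per parameter. The sketch below assumes the listed counts cover everything loaded at inference and ignores activations, optimizer states, and any separately loaded components:

```python
# Weight-only memory estimate from the parameter counts in the Model Zoo table.

def weight_memory_gib(num_params, bytes_per_param):
    """Memory needed to store the weights, in GiB (2**30 bytes)."""
    return num_params * bytes_per_param / 2**30

MODELS = {"class-to-image": 677e6, "text-to-image": 882e6}
DTYPE_BYTES = {"float32": 4, "bfloat16": 2}

for task, n_params in MODELS.items():
    for dtype, n_bytes in DTYPE_BYTES.items():
        print(f"{task:15s} {dtype:9s} {weight_memory_gib(n_params, n_bytes):.2f} GiB")
```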
Setup
1. Create Environment
conda create -n pixelflow python=3.12
conda activate pixelflow
2. Install Dependencies
- PyTorch 2.6.0: install it according to your system configuration (CUDA version, etc.).
- flash-attention v2.7.4.post1: optional, required only for training.
- Other packages:
pip3 install -r requirements.txt
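After installation, a quick check that the environment matches the versions named above can save debugging time later. This is a sketch using only the standard library; the helper names are hypothetical, and the pip distribution name `flash-attn` is an assumption about how flash-attention was installed. Since flash-attention is only required for training, a "not installed" result for it is fine for inference:

```python
import sys
from importlib import metadata

# Versions named in the setup steps. "flash-attn" as the distribution
# name is an assumption about how flash-attention was installed.
EXPECTED = {"torch": "2.6.0", "flash-attn": "2.7.4.post1"}

def installed_version(dist_name):
    """Return the installed version of a distribution, or None if absent."""
    try:
        return metadata.version(dist_name)
    except metadata.PackageNotFoundError:
        return None

def python_matches(required=(3, 12)):
    """True if the running interpreter has the required major.minor version."""
    return tuple(sys.version_info[:2]) == required

def version_report(expected):
    """Map each distribution to (installed version or None, matches expected)."""
    report = {}
    for name, want in expected.items():
        have = installed_version(name)
        # Drop local build tags such as "+cu124" before comparing.
        matches = have is not None and have.split("+")[0] == want
        report[name] = (have, matches)
    return report

print("python 3.12:", python_matches())
for name, (have, ok) in version_report(EXPECTED).items():
    print(f"{name}: {have or 'not installed'} ({'ok' if ok else 'check'})")
```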
Demo
We provide an online Gradio demo for class-to-image generation.
You can also deploy both the class-to-image and text-to-image demos locally:
python app.py --checkpoint /path/to/checkpoint --class_cond # for class-to-image
or
python app.py --checkpoint /path/to/checkpoint # for text-to-image
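If you launch the demos from another Python script, the two commands above differ only in the `--class_cond` flag. The sketch below (helper names are hypothetical) builds the command from the documented flags alone and uses the current interpreter, so the app runs inside the active `pixelflow` environment:

```python
import subprocess
import sys

def demo_command(checkpoint, class_cond):
    """Build the argv for app.py using only the flags documented above."""
    cmd = [sys.executable, "app.py", "--checkpoint", checkpoint]
    if class_cond:
        # --class_cond selects the class-to-image demo; omit it for text-to-image.
        cmd.append("--class_cond")
    return cmd

def launch_demo(checkpoint, class_cond):
    """Run app.py from the repository root; raise if it exits with an error."""
    subprocess.run(demo_command(checkpoint, class_cond), check=True)
```

For example, `launch_demo("/path/to/checkpoint", class_cond=True)` starts the class-to-image demo, where the path is a placeholder for your downloaded checkpoint.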
Training
1. ImageNet Preparation
- Download the ImageNet dataset from http://www.image-net.org/.
- Use the extract_ILSVRC.sh script to extract the training and validation images and organize them into labeled subfolders.
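Before training, it helps to confirm that each split has the expected labeled-subfolder layout. The sketch below assumes a `<split>/<class_folder>/<image>` structure (for example WordNet IDs such as `n01440764` as folder names) and the 1,000 classes of ImageNet-1k; the function names are hypothetical:

```python
from pathlib import Path

IMAGE_SUFFIXES = {".jpeg", ".jpg", ".png"}

def summarize_split(split_dir):
    """Map each class subfolder name to the number of image files it holds."""
    counts = {}
    for class_dir in sorted(p for p in Path(split_dir).iterdir() if p.is_dir()):
        counts[class_dir.name] = sum(
            1 for f in class_dir.iterdir()
            if f.is_file() and f.suffix.lower() in IMAGE_SUFFIXES
        )
    return counts

def check_imagenet_split(split_dir, expected_classes=1000):
    """Return a list of layout problems; an empty list means the split looks fine."""
    counts = summarize_split(split_dir)
    problems = []
    if len(counts) != expected_classes:
        problems.append(f"expected {expected_classes} class folders, found {len(counts)}")
    empty = [name for name, n in counts.items() if n == 0]
    if empty:
        problems.append(f"class folders with no images: {len(empty)}")
    return problems
```

Run it once per split, e.g. `check_imagenet_split("/path/to/imagenet/train")`, where the path is a placeholder for wherever you extracted the data.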
2. Training Command
torchrun --nnodes=1 --nproc_per_node=8 train.py configs/pixelflow_xl_c2i.yaml
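The command above starts one process per GPU (8 processes on a single node), and torchrun tells each process its position through environment variables such as `RANK`, `LOCAL_RANK`, and `WORLD_SIZE`. The sketch below reads them and computes the effective global batch size. The per-GPU batch size and gradient-accumulation values are hypothetical, since the settings in configs/pixelflow_xl_c2i.yaml are not reproduced here:

```python
import os
from dataclasses import dataclass

@dataclass(frozen=True)
class DistInfo:
    rank: int        # global index of this process across all nodes
    local_rank: int  # index on this node, typically used to pick the GPU
    world_size: int  # total number of processes (nnodes * nproc_per_node)

def read_torchrun_env(env=None):
    """Read the process coordinates torchrun exports; default to a single process."""
    env = os.environ if env is None else env
    return DistInfo(
        rank=int(env.get("RANK", 0)),
        local_rank=int(env.get("LOCAL_RANK", 0)),
        world_size=int(env.get("WORLD_SIZE", 1)),
    )

def global_batch_size(per_gpu_batch, world_size, grad_accum_steps=1):
    """Samples contributing to each optimizer step across all processes."""
    return per_gpu_batch * world_size * grad_accum_steps
```

With `--nnodes=1 --nproc_per_node=8`, a hypothetical per-GPU batch of 32 gives `global_batch_size(32, 8) == 256`; scaling to more nodes multiplies the global batch accordingly, which may call for adjusting the learning rate.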
Evaluation (FID, Inception Score, etc.)
We provide a sample_ddp.py script, adapted from DiT, for generating sample images and saving them both as a folder and as a .npz file. The .npz file is compatible with ADM's TensorFlow evaluation suite, allowing direct computation of FID, Inception Score, and other metrics.
torchrun --nnodes=1 --nproc_per_node=8 sample_ddp.py --pretrained /path/to/checkpoint
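Before running the ADM evaluation suite, you can sanity-check the generated .npz file. DiT's sample_ddp.py, from which this script is adapted, stores the samples as a uint8 array of shape (N, H, W, 3) under the key `arr_0`; we assume the PixelFlow version keeps that convention. The sketch below (helper names are hypothetical) reads only the .npy header inside the archive with the standard library, so the pixel data is never loaded:

```python
import ast
import struct
import zipfile

def npz_array_info(path, key="arr_0"):
    """Return (dtype descriptor, shape) of one array inside an .npz archive.

    An .npz file is a zip archive of .npy files; only the .npy header is
    parsed, so large sample arrays are not read into memory.
    """
    with zipfile.ZipFile(path) as zf, zf.open(f"{key}.npy") as f:
        if f.read(6) != b"\x93NUMPY":
            raise ValueError(f"{key}.npy is not a valid .npy payload")
        major, _minor = f.read(2)
        # Format 1.x stores the header length in 2 bytes, later versions in 4.
        size_fmt = "<H" if major == 1 else "<I"
        (header_len,) = struct.unpack(size_fmt, f.read(struct.calcsize(size_fmt)))
        encoding = "utf8" if major >= 3 else "latin1"
        header = ast.literal_eval(f.read(header_len).decode(encoding))
    return header["descr"], tuple(header["shape"])

def looks_like_adm_batch(descr, shape):
    """Heuristic check: uint8 images laid out as (N, H, W, 3)."""
    return descr == "|u1" and len(shape) == 4 and shape[-1] == 3
```

For a 256x256 run, `npz_array_info("/path/to/samples.npz")` should report `"|u1"` and a shape ending in `(256, 256, 3)`; the path is a placeholder for the file the sampling script writes.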
BibTeX
@article{chen2025pixelflow,
title={PixelFlow: Pixel-Space Generative Models with Flow},
author={Chen, Shoufa and Ge, Chongjian and Zhang, Shilong and Sun, Peize and Luo, Ping},
journal={arXiv preprint arXiv:2504.07963},
year={2025}
}