PixelFlow: Pixel-Space Generative Models with Flow

Shoufa Chen, Chongjian Ge, Shilong Zhang, Peize Sun, Ping Luo
The University of Hong Kong, Adobe

Introduction

We present PixelFlow, a family of image generation models that operate directly in raw pixel space, in contrast to the predominant latent-space models. This approach simplifies image generation by eliminating the need for a pre-trained Variational Autoencoder (VAE) and making the whole model trainable end to end. Through efficient cascade flow modeling, PixelFlow keeps the computation cost in pixel space affordable. It achieves an FID of 1.98 on the 256x256 ImageNet class-conditional image generation benchmark. Qualitative text-to-image results show that PixelFlow excels in image quality, artistry, and semantic control. We hope this new paradigm will inspire further work and open up new opportunities for next-generation visual generation models.
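The introduction mentions flow modeling in pixel space with a cascade. The sketch below is a toy illustration under our own assumptions, not the authors' implementation: it uses a linear (rectified-flow style) interpolation between Gaussian noise and data as the training pair, and a sampler that splits the time interval [0, 1] evenly across stages, integrates each slice with Euler steps, and doubles the resolution with nearest-neighbour upsampling between stages. The even time split, the upsampling method, the lack of re-noising at stage boundaries, and the `velocity_fn(x, t)` signature are all hypothetical; the PixelFlow paper defines its own scheme.

```python
import numpy as np


def flow_matching_pair(noise, data, t):
    """Linear path x_t = (1 - t) * noise + t * data and its velocity target.

    Along this path dx_t/dt = data - noise, which a model would learn to regress.
    """
    x_t = (1.0 - t) * noise + t * data
    v_target = data - noise
    return x_t, v_target


def upsample2x(x):
    """Nearest-neighbour 2x upsampling of a (C, H, W) array (an assumed choice)."""
    return x.repeat(2, axis=1).repeat(2, axis=2)


def euler_integrate(velocity_fn, x, t_start, t_end, steps):
    """Integrate dx/dt = velocity_fn(x, t) from t_start to t_end with fixed Euler steps."""
    dt = (t_end - t_start) / steps
    for i in range(steps):
        t = t_start + i * dt
        x = x + dt * velocity_fn(x, t)
    return x


def cascade_sample(velocity_fn, shape, num_stages, steps_per_stage, seed=0):
    """Toy cascade: start from noise at the lowest resolution; each stage integrates
    an equal slice of [0, 1], then upsamples 2x before the next stage."""
    rng = np.random.default_rng(seed)
    x = rng.standard_normal(shape)
    for stage in range(num_stages):
        t_start = stage / num_stages
        t_end = (stage + 1) / num_stages
        x = euler_integrate(velocity_fn, x, t_start, t_end, steps_per_stage)
        if stage < num_stages - 1:
            x = upsample2x(x)
    return x


if __name__ == "__main__":
    # Placeholder velocity field; a trained network would take its place.
    zero_velocity = lambda x, t: np.zeros_like(x)
    out = cascade_sample(zero_velocity, (3, 8, 8), num_stages=3, steps_per_stage=4)
    print(out.shape)  # (3, 32, 32)
```

In this toy, only the final stage runs at the full 32x32 resolution while earlier stages work on smaller arrays, which is one intuition for why a cascade can reduce the cost of sampling in pixel space.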

Model Zoo

Model      Task            Params  FID   Checkpoint
PixelFlow  class-to-image  677M    1.98  🤗
PixelFlow  text-to-image   882M    N/A   🤗
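To check the Params column against a downloaded checkpoint, one option is to sum the element counts of every tensor in its state dict. The helpers below are a hypothetical sketch: `count_parameters` accepts any mapping whose values expose a `.shape` (PyTorch tensors and NumPy arrays both do), and it assumes the checkpoint holds a single set of model weights; a file that also stores EMA weights or optimizer state would need filtering first.

```python
import math
from collections.abc import Mapping


def count_parameters(state_dict: Mapping) -> int:
    """Total number of elements across all tensors in a state dict."""
    # math.prod(()) == 1, so scalar tensors count as one parameter.
    return sum(math.prod(tensor.shape) for tensor in state_dict.values())


def format_millions(n: int) -> str:
    """Render a parameter count in the table's style, e.g. 677M."""
    return f"{round(n / 1e6)}M"
```

If the checkpoint is a safetensors file, `safetensors.torch.load_file(path)` returns such a mapping of tensor names to tensors; the actual file format and key layout of the released checkpoints are not specified here.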

Setup

1. Create Environment

conda create -n pixelflow python=3.12
conda activate pixelflow

2. Install Dependencies

  • PyTorch 2.6.0: install it according to your system configuration (CUDA version, etc.).
  • flash-attention v2.7.4.post1: optional, required only for training.
  • Other packages: pip3 install -r requirements.txt
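After installing, a quick check can confirm which of these packages Python can see. This is a minimal sketch using only the standard library plus an optional lazy PyTorch import; `flash_attn` is the import name of the flash-attention package, and a missing one is reported but not treated as an error, since the list above marks it as training-only. The script does not verify exact versions or CUDA compatibility.

```python
import importlib.util
import sys


def check_environment():
    """Report the Python version and whether torch / flash_attn are importable."""
    report = {
        "python": f"{sys.version_info.major}.{sys.version_info.minor}",
        "torch": importlib.util.find_spec("torch") is not None,
        "flash_attn": importlib.util.find_spec("flash_attn") is not None,
    }
    if report["torch"]:
        import torch  # imported lazily so the check also runs without PyTorch

        report["torch_version"] = torch.__version__
        report["cuda_available"] = torch.cuda.is_available()
    return report


if __name__ == "__main__":
    info = check_environment()
    for key, value in info.items():
        print(f"{key}: {value}")
    if not info["flash_attn"]:
        print("flash_attn not found: fine for inference, needed for training")
```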

Demo

We provide an online Gradio demo for class-to-image generation.

You can also run both the class-to-image and text-to-image demos locally:

python app.py --checkpoint /path/to/checkpoint --class_cond  # for class-to-image

or

python app.py --checkpoint /path/to/checkpoint  # for text-to-image
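The two invocations differ only in the `--class_cond` flag. If you want to launch the demo from another Python script, a small wrapper can build the argument list. `build_demo_command` and `launch_demo` are hypothetical helpers that use only the two flags shown above; they assume `app.py` is in the current working directory.

```python
import subprocess
import sys


def build_demo_command(checkpoint: str, class_cond: bool) -> list[str]:
    """Return the argv for app.py; --class_cond selects the class-to-image demo."""
    # sys.executable runs app.py with the same interpreter (e.g. the conda env's python).
    cmd = [sys.executable, "app.py", "--checkpoint", checkpoint]
    if class_cond:
        cmd.append("--class_cond")
    return cmd


def launch_demo(checkpoint: str, class_cond: bool) -> subprocess.Popen:
    """Start the demo as a child process and return its handle."""
    return subprocess.Popen(build_demo_command(checkpoint, class_cond))
```

Using `sys.executable` rather than a bare `python` avoids accidentally picking up a different interpreter from `PATH`.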

Training

1. ImageNet Preparation

2. Training Command

torchrun --nnodes=1 --nproc_per_node=8 train.py configs/pixelflow_xl_c2i.yaml
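`torchrun` starts `--nproc_per_node` worker processes on each of `--nnodes` machines, so the command above runs 8 processes on one node. Each worker learns its place from environment variables that torchrun sets (`RANK`, `LOCAL_RANK`, `WORLD_SIZE`). The sketch below shows how a script could read them; it is a generic illustration and does not describe what `train.py` does internally. The per-GPU batch size is a hypothetical parameter used only to show how the global batch scales with the number of processes.

```python
import os
from collections.abc import Mapping


def read_torchrun_env(env: Mapping[str, str] = os.environ) -> dict:
    """Read the rank variables torchrun exports; default to a single process."""
    return {
        "rank": int(env.get("RANK", 0)),
        "local_rank": int(env.get("LOCAL_RANK", 0)),
        "world_size": int(env.get("WORLD_SIZE", 1)),
    }


def global_batch_size(per_gpu_batch: int, world_size: int) -> int:
    """Samples per optimizer step under plain data parallelism (no grad accumulation)."""
    return per_gpu_batch * world_size


if __name__ == "__main__":
    dist = read_torchrun_env()
    print(dist)
    # Under the 8-process command above, WORLD_SIZE is 8, so a hypothetical
    # per-GPU batch of 32 gives 256; run standalone, WORLD_SIZE defaults to 1.
    print(global_batch_size(32, dist["world_size"]))
```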

Evaluation (FID, Inception Score, etc.)

We provide a sample_ddp.py script, adapted from DiT, that generates sample images and saves them both as an image folder and as a .npz file. The .npz file is compatible with ADM's TensorFlow evaluation suite, so FID, Inception Score, and other metrics can be computed from it directly.

torchrun --nnodes=1 --nproc_per_node=8 sample_ddp.py --pretrained /path/to/checkpoint
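The .npz format and the FID metric can each be sketched in a few lines. Following the DiT convention, samples are stacked into one uint8 array of shape (N, H, W, 3) stored under the key `arr_0`, which is the key ADM's evaluator reads for sample batches. FID is the Fréchet distance between Gaussians fitted to Inception-v3 features of real and generated images: ||mu_r - mu_g||^2 + Tr(Sigma_r + Sigma_g - 2 (Sigma_r Sigma_g)^(1/2)). The code below is a minimal NumPy-only sketch, not the ADM implementation: `pack_samples_npz` and `fid_from_stats` are hypothetical names, feature extraction is omitted, and the matrix square root is taken through a symmetric eigendecomposition using the identity Tr((Sigma_r Sigma_g)^(1/2)) = Tr((Sigma_r^(1/2) Sigma_g Sigma_r^(1/2))^(1/2)).

```python
import numpy as np


def pack_samples_npz(samples, npz_path):
    """Stack HxWx3 uint8 images into one (N, H, W, 3) array saved under 'arr_0'."""
    batch = np.stack([np.asarray(s, dtype=np.uint8) for s in samples])
    if batch.ndim != 4 or batch.shape[-1] != 3:
        raise ValueError(f"expected (N, H, W, 3) images, got {batch.shape}")
    np.savez(npz_path, arr_0=batch)
    return batch.shape


def _sqrtm_psd(a):
    """Square root of a symmetric positive semi-definite matrix via eigh."""
    a = (a + a.T) / 2.0  # symmetrize to absorb floating-point asymmetry
    w, v = np.linalg.eigh(a)
    w = np.clip(w, 0.0, None)  # drop tiny negative eigenvalues from round-off
    return (v * np.sqrt(w)) @ v.T


def fid_from_stats(mu1, sigma1, mu2, sigma2):
    """Frechet distance between N(mu1, sigma1) and N(mu2, sigma2)."""
    diff = mu1 - mu2
    s1_half = _sqrtm_psd(sigma1)
    covmean_trace = np.trace(_sqrtm_psd(s1_half @ sigma2 @ s1_half))
    return float(diff @ diff + np.trace(sigma1) + np.trace(sigma2) - 2.0 * covmean_trace)
```

Common FID implementations instead call `scipy.linalg.sqrtm` on the product Sigma_r Sigma_g; the eigendecomposition route above avoids the SciPy dependency and stays real-valued for positive semi-definite inputs.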

BibTeX

@article{chen2025pixelflow,
  title={PixelFlow: Pixel-Space Generative Models with Flow},
  author={Chen, Shoufa and Ge, Chongjian and Zhang, Shilong and Sun, Peize and Luo, Ping},
  journal={arXiv preprint arXiv:2504.07963},
  year={2025}
}