|
--- |
|
license: apache-2.0 |
|
datasets: |
|
- poloclub/diffusiondb |
|
base_model: |
|
- PixArt-alpha/PixArt-Sigma-XL-2-1024-MS |
|
pipeline_tag: text-to-image |
|
library_name: diffusers |
|
--- |
|
# AMD Nitro Diffusion |
|
|
|
|
|
![image/jpeg](https://cdn-uploads.huggingface.co/production/uploads/6355aded9c72a7e742f341a4/AsUvS7acUDLZhKOMRSH37.jpeg) |
|
|
|
## Introduction |
|
AMD Nitro Diffusion is a series of efficient text-to-image generation models that are distilled from popular diffusion models on AMD Instinct™ GPUs. The release consists of: |
|
|
|
* [Stable Diffusion 2.1 Nitro](https://huggingface.co/amd/SD2.1-Nitro): a UNet-based one-step model distilled from [Stable Diffusion 2.1](https://huggingface.co/stabilityai/stable-diffusion-2-1-base). |
|
* [PixArt-Sigma Nitro](https://huggingface.co/amd/PixArt-Sigma-Nitro): a high resolution transformer-based one-step model distilled from [PixArt-Sigma](https://pixart-alpha.github.io/PixArt-sigma-project/). |
|
|
|
⚡️ [Open-source code](https://github.com/AMD-AIG-AIMA/AMD-Diffusion-Distillation)! The models are based on our re-implementation of [Latent Adversarial Diffusion Distillation](https://arxiv.org/abs/2403.12015), the method used to build the popular Stable Diffusion 3 Turbo model. Since the original authors didn't provide training code, we release our re-implementation to help advance further research in the field. |
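
For readers who want a feel for what adversarial distillation means in practice, the toy loop below sketches the core idea under heavy simplification: a one-step student is trained to produce latents that a discriminator cannot tell apart from teacher latents, using a standard hinge GAN loss. The tiny MLPs and random "teacher latents" are placeholders purely for illustration; the actual re-implementation (diffusion-feature discriminator, text conditioning, multi-step support) lives in the GitHub repo linked above.

```python
import torch
import torch.nn as nn

# Toy stand-ins: in LADD the student is the distilled diffusion backbone doing a
# single denoising step in latent space, and the discriminator operates on
# features of a frozen diffusion model. Tiny MLPs keep the loop easy to read.
latent_dim = 16
student = nn.Sequential(nn.Linear(latent_dim, 64), nn.SiLU(), nn.Linear(64, latent_dim))
discriminator = nn.Sequential(nn.Linear(latent_dim, 64), nn.SiLU(), nn.Linear(64, 1))

opt_g = torch.optim.Adam(student.parameters(), lr=1e-4)
opt_d = torch.optim.Adam(discriminator.parameters(), lr=1e-4)

for step in range(100):
    # In the real setup these latents come from encoding teacher-generated images;
    # random tensors keep the toy self-contained and runnable.
    teacher_latents = torch.randn(32, latent_dim)
    noise = torch.randn(32, latent_dim)

    # Discriminator update (hinge loss): teacher latents are "real", student outputs are "fake".
    fake = student(noise).detach()
    d_loss = (torch.relu(1.0 - discriminator(teacher_latents)).mean()
              + torch.relu(1.0 + discriminator(fake)).mean())
    opt_d.zero_grad()
    d_loss.backward()
    opt_d.step()

    # Student update: produce latents in a single step that the discriminator scores as real.
    g_loss = -discriminator(student(noise)).mean()
    opt_g.zero_grad()
    g_loss.backward()
    opt_g.step()
```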
|
|
|
|
|
|
|
## Details |
|
|
|
* **Model architecture**: PixArt-Sigma Nitro has the same architecture as PixArt-Sigma and is compatible with the diffusers pipeline. |
|
* **Inference steps**: This model is distilled to perform inference in just a single step. However, the training code also supports distilling a model for 2, 4 or 8 steps. |
|
* **Hardware**: We use a single node consisting of 4 AMD Instinct™ MI250 GPUs for distilling PixArt-Sigma Nitro. |
|
* **Dataset**: We use 1M prompts from [DiffusionDB](https://huggingface.co/datasets/poloclub/diffusiondb) and generate the corresponding images from the base PixArt-Sigma model; a sketch of this data-generation step is shown after this list.
|
* **Training cost**: The distillation process achieves reasonable results in less than 2 days on a single node. |
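
To make the dataset bullet above concrete, here is a minimal sketch of how prompts could be pulled from DiffusionDB and rendered into teacher images with the base PixArt-Sigma checkpoint. The `2m_text_only` subset name, the fp16 settings, and the tiny prompt count are illustrative assumptions; the actual data-generation and training scripts are in the GitHub repo.

```python
from datasets import load_dataset
from diffusers import PixArtSigmaPipeline
import torch

# Prompt-only DiffusionDB subset (assumed config name; check the dataset card for available subsets).
prompts = load_dataset("poloclub/diffusiondb", "2m_text_only",
                       split="train", trust_remote_code=True)

# Teacher: the original PixArt-Sigma model that the Nitro model is distilled from.
teacher = PixArtSigmaPipeline.from_pretrained(
    "PixArt-alpha/PixArt-Sigma-XL-2-1024-MS", torch_dtype=torch.float16
).to("cuda")

# Render teacher images for a handful of prompts as an illustration.
for i, example in enumerate(prompts.select(range(4))):
    image = teacher(prompt=example["prompt"], num_inference_steps=20).images[0]
    image.save(f"teacher_{i:05d}.png")
```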
|
|
|
|
|
|
|
## Quickstart |
|
|
|
```python
from diffusers import PixArtSigmaPipeline
import torch
from safetensors.torch import load_file

# Load the base PixArt-Sigma pipeline, then swap in the distilled transformer weights.
pipe = PixArtSigmaPipeline.from_pretrained("PixArt-alpha/PixArt-Sigma-XL-2-1024-MS")

ckpt_path = '<path to distilled checkpoint>'
transformer_state_dict = load_file(ckpt_path)
pipe.transformer.load_state_dict(transformer_state_dict)
pipe = pipe.to("cuda")

# Single-step inference: guidance is disabled (guidance_scale=0) and a single
# custom timestep [400] is passed to the pipeline.
image = pipe(prompt='a photo of a cat',
             num_inference_steps=1,
             guidance_scale=0,
             timesteps=[400]).images[0]
```
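
The distilled checkpoint can be downloaded from the [PixArt-Sigma Nitro](https://huggingface.co/amd/PixArt-Sigma-Nitro) repository; one way to fetch it programmatically is sketched below (the `filename` argument is a placeholder, so substitute the actual `.safetensors` file listed in the repo):

```python
from huggingface_hub import hf_hub_download

# Placeholder filename: replace with the checkpoint file actually listed in the model repo.
ckpt_path = hf_hub_download(repo_id="amd/PixArt-Sigma-Nitro",
                            filename="<distilled checkpoint>.safetensors")
```

The returned `ckpt_path` plugs directly into the snippet above, and the resulting `image` is a PIL image that can be written to disk with `image.save("cat.png")`.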
|
|
|
For more details on training and evaluation, please visit the [GitHub repo](https://github.com/AMD-AIG-AIMA/AMD-Diffusion-Distillation).
|
|
|
|
|
|
|
## Results |
|
|
|
|
|
Compared to [PixArt-Sigma](https://pixart-alpha.github.io/PixArt-sigma-project/), our model achieves a 90.9% reduction in FLOPs at the cost of just 3.7% lower CLIP score and 10.5% higher FID. |
|
|
|
| Model | FID ↓ | CLIP ↑ | FLOPs | Latency on AMD Instinct MI250 (sec) |
| :---: | :---: | :---: | :---: | :---: |
| PixArt-Sigma, 20 steps | 34.14 | 0.3289 | 187.96 | 7.46 |
| **PixArt-Sigma Nitro**, 1 step | 37.75 | 0.3167 | 17.04 | 0.53 |
|
|
|
|
|
|
|
## License |
|
Copyright (c) 2018-2024 Advanced Micro Devices, Inc. All Rights Reserved. |
|
Licensed under the Apache License, Version 2.0 (the "License"); |
|
you may not use this file except in compliance with the License. |
|
You may obtain a copy of the License at |
|
http://www.apache.org/licenses/LICENSE-2.0 |
|
Unless required by applicable law or agreed to in writing, software |
|
distributed under the License is distributed on an "AS IS" BASIS, |
|
WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. |
|
See the License for the specific language governing permissions and |
|
limitations under the License. |