---
license: other
tags:
- stable-diffusion
- text-to-image
inference: false
---

# Stable Diffusion

Stable Diffusion is a latent text-to-image diffusion model capable of generating photo-realistic images given any text input.
This model card gives an overview of all available model checkpoints. For more detailed model cards, please have a look at the model repositories listed under [Model Access](#model-access).

**Stable Diffusion V1**

In its first version, 4 model checkpoints are released: **stable-diffusion-v1-1**, **stable-diffusion-v1-2**, **stable-diffusion-v1-3** and **stable-diffusion-v1-4**.
*Higher* versions have been trained for longer and are thus usually better in terms of image generation quality than *lower* versions. More specifically:

- **stable-diffusion-v1-1**: The checkpoint is randomly initialized and has been trained for 237,000 steps at resolution `256x256` on [laion2B-en](https://huggingface.co/datasets/laion/laion2B-en),
  followed by 194,000 steps at resolution `512x512` on [laion-high-resolution](https://huggingface.co/datasets/laion/laion-high-resolution) (170M examples from LAION-5B with resolution `>= 1024x1024`).
- **stable-diffusion-v1-2** (https://huggingface.co/CompVis/stable-diffusion-v1-2): The checkpoint resumes training from `stable-diffusion-v1-1`.
  It was trained for 515,000 steps at resolution `512x512` on "laion-improved-aesthetics" (a subset of laion2B-en,
  filtered to images with an original size `>= 512x512`, an estimated aesthetics score `> 5.0`, and an estimated watermark probability `< 0.5`. The watermark estimate is from the LAION-5B metadata; the aesthetics score is estimated using an [improved aesthetics estimator](https://github.com/christophschuhmann/improved-aesthetic-predictor)).
- **stable-diffusion-v1-3** (https://huggingface.co/CompVis/stable-diffusion-v1-3): The checkpoint resumes training from `stable-diffusion-v1-2`. It was trained for 195,000 steps at resolution `512x512` on "laion-improved-aesthetics" with 10% dropping of the text-conditioning to improve [classifier-free guidance sampling](https://arxiv.org/abs/2207.12598).
- **stable-diffusion-v1-4** (https://huggingface.co/CompVis/stable-diffusion-v1-4): The checkpoint resumes training.
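
The two data-side ideas in the list above, the "laion-improved-aesthetics" filter and the 10% dropping of the text-conditioning, can be illustrated with a small sketch. This is not the training code used for these checkpoints. The metadata keys (`width`, `height`, `aesthetic`, `pwatermark`), the reading of `>= 512x512` as "both sides at least 512 pixels", and the use of an empty string as the dropped caption are all assumptions made for illustration.

```python
import random


def in_improved_aesthetics(meta, min_side=512, min_aesthetic=5.0, max_watermark=0.5):
    """Return True if a sample passes the three filters described in the list.

    The dictionary keys are hypothetical; real LAION metadata may use other names.
    """
    return (
        meta["width"] >= min_side
        and meta["height"] >= min_side
        and meta["aesthetic"] > min_aesthetic
        and meta["pwatermark"] < max_watermark
    )


def drop_text_conditioning(caption, rng, p_drop=0.1):
    """With probability p_drop, replace the caption by an empty prompt.

    Training on some unconditioned samples is what later allows
    classifier-free guidance to contrast conditional and unconditional
    predictions. The empty string is an assumed stand-in for "no text".
    """
    return "" if rng.random() < p_drop else caption


# Made-up metadata records, for illustration only.
samples = [
    {"width": 1024, "height": 768, "aesthetic": 5.6, "pwatermark": 0.1, "caption": "a red fox"},
    {"width": 400, "height": 600, "aesthetic": 6.2, "pwatermark": 0.1, "caption": "too small"},
    {"width": 800, "height": 800, "aesthetic": 4.9, "pwatermark": 0.1, "caption": "low score"},
    {"width": 800, "height": 800, "aesthetic": 5.5, "pwatermark": 0.7, "caption": "watermarked"},
]

kept = [s for s in samples if in_improved_aesthetics(s)]
rng = random.Random(0)  # seeded so repeated runs behave the same way
captions = [drop_text_conditioning(s["caption"], rng) for s in kept]
```

Only the first record passes all three filters. The dropout is applied per sample, so over many training samples roughly one caption in ten is replaced.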

The model can be used either with [🤗's `diffusers` library](https://github.com/huggingface/diffusers) or with the original [Stable Diffusion GitHub repository](https://github.com/CompVis/stable-diffusion).
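
As a rough illustration of the `diffusers` route, the sketch below loads one checkpoint and generates a single image. It assumes that `diffusers` and its dependencies are installed, that access to the repository has already been granted, and that the environment is authenticated with the Hugging Face Hub (for example via `huggingface-cli login`). The helper name `generate` is hypothetical; the individual model repositories remain the authoritative source for usage instructions.

```python
def generate(prompt, repo_id="CompVis/stable-diffusion-v1-4"):
    """Load a checkpoint with diffusers and return one PIL image for `prompt`."""
    # Imported lazily so that defining this helper does not require diffusers.
    from diffusers import StableDiffusionPipeline

    # Downloads the weights on first use and caches them locally.
    pipe = StableDiffusionPipeline.from_pretrained(repo_id)

    # The pipeline output exposes the generated images as a list of PIL images.
    return pipe(prompt).images[0]
```

Calling `generate("a photograph of an astronaut riding a horse")` returns an image that can be written to disk with its `save` method. Without moving the pipeline to a GPU, generation is expected to be slow.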

## Model access

Each checkpoint becomes accessible once you have *"click-requested"* access on its model repository.

**For [🤗's `diffusers`](https://github.com/huggingface/diffusers)**:

- [`stable-diffusion-v1-1`](https://huggingface.co/CompVis/stable-diffusion-v1-1)
- [`stable-diffusion-v1-2`](https://huggingface.co/CompVis/stable-diffusion-v1-2)
- [`stable-diffusion-v1-3`](https://huggingface.co/CompVis/stable-diffusion-v1-3)
- [`stable-diffusion-v1-4`](https://huggingface.co/CompVis/stable-diffusion-v1-4)

**For the original [Stable Diffusion GitHub repository](https://github.com/CompVis/stable-diffusion)**:

- [`stable-diffusion-v-1-1-original`](https://huggingface.co/CompVis/stable-diffusion-v-1-1-original)
- [`stable-diffusion-v-1-2-original`](https://huggingface.co/CompVis/stable-diffusion-v-1-2-original)
- [`stable-diffusion-v-1-3-original`](https://huggingface.co/CompVis/stable-diffusion-v-1-3-original)
- [`stable-diffusion-v-1-4-original`](https://huggingface.co/CompVis/stable-diffusion-v-1-4-original)
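
The two lists use slightly different naming patterns (`v1-4` for `diffusers`, `v-1-4-original` for the original codebase), which is easy to mistype. The small helper below rebuilds the URLs listed above from the checkpoint number. The function name `repo_url` and its parameters are hypothetical and exist only for this illustration.

```python
HUB_PREFIX = "https://huggingface.co/CompVis"


def repo_url(minor, original=False):
    """Return the repository URL for checkpoint v1-<minor>, with minor in 1..4.

    original=False selects the diffusers repository,
    original=True the repository for the original codebase.
    """
    if minor not in (1, 2, 3, 4):
        raise ValueError(f"unknown checkpoint v1-{minor}; expected a minor version in 1..4")
    if original:
        name = f"stable-diffusion-v-1-{minor}-original"
    else:
        name = f"stable-diffusion-v1-{minor}"
    return f"{HUB_PREFIX}/{name}"
```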

## Citation

```bibtex
@InProceedings{Rombach_2022_CVPR,
    author    = {Rombach, Robin and Blattmann, Andreas and Lorenz, Dominik and Esser, Patrick and Ommer, Bj\"orn},
    title     = {High-Resolution Image Synthesis With Latent Diffusion Models},
    booktitle = {Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR)},
    month     = {June},
    year      = {2022},
    pages     = {10684-10695}
}
```

*This model card was written by Robin Rombach and Patrick Esser and is based on the [DALL-E Mini model card](https://huggingface.co/dalle-mini/dalle-mini).*