license: other
tags:
- stable-diffusion
- text-to-image
inference: false
Stable Diffusion
Stable Diffusion is a latent text-to-image diffusion model capable of generating photo-realistic images given any text input. This model card gives an overview of all available model checkpoints. For more detailed model cards, please have a look at the model repositories listed under Model access.
Stable Diffusion Version 1
For the first version, four model checkpoints have been released. Higher versions have been trained for longer and thus usually produce better image generation quality than lower versions. More specifically:
- stable-diffusion-v1-1: The checkpoint is randomly initialized and has been trained for 237,000 steps at resolution 256x256 on laion2B-en, followed by 194,000 steps at resolution 512x512 on laion-high-resolution (170M examples from LAION-5B with resolution >= 1024x1024).
- stable-diffusion-v1-2 (https://huggingface.co/CompVis/stable-diffusion-v1-2): The checkpoint resumed training from stable-diffusion-v1-1. 515,000 steps at resolution 512x512 on "laion-improved-aesthetics" (a subset of laion2B-en, filtered to images with an original size >= 512x512, an estimated aesthetics score > 5.0, and an estimated watermark probability < 0.5. The watermark estimate is from the LAION-5B metadata; the aesthetics score is estimated using an improved aesthetics estimator).
- stable-diffusion-v1-3 (https://huggingface.co/CompVis/stable-diffusion-v1-3): The checkpoint resumed training from stable-diffusion-v1-2. 195,000 steps at resolution 512x512 on "laion-improved-aesthetics", with 10% dropping of the text-conditioning to improve classifier-free guidance sampling.
- stable-diffusion-v1-4 (https://huggingface.co/CompVis/stable-diffusion-v1-4): The checkpoint resumed training.
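The data-side choices in the list above (the "laion-improved-aesthetics" thresholds and the 10% text-conditioning dropping) can be illustrated with a minimal sketch. This is not the actual training code: the `Sample` record and its field names are hypothetical, "original size >= 512x512" is read here as both sides being at least 512 pixels, and "dropping" is modelled as replacing the caption with an empty string.

```python
import random
from dataclasses import dataclass


@dataclass
class Sample:
    """Hypothetical record; field names are illustrative, not LAION's schema."""
    width: int
    height: int
    aesthetics_score: float
    watermark_prob: float
    caption: str


def keep_for_improved_aesthetics(s: Sample) -> bool:
    """Apply the three thresholds quoted for "laion-improved-aesthetics"."""
    return (
        s.width >= 512           # assumption: both sides must reach 512 px
        and s.height >= 512
        and s.aesthetics_score > 5.0
        and s.watermark_prob < 0.5
    )


def maybe_drop_caption(caption: str, rng: random.Random, p_drop: float = 0.10) -> str:
    """With probability p_drop, return an empty caption instead of the real one.

    Assumption: an empty string stands in for the "dropped" text-conditioning.
    """
    return "" if rng.random() < p_drop else caption
```

Training on a mix of captioned and caption-free samples is what lets a single model provide both the conditional and the unconditional prediction that classifier-free guidance combines at sampling time.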
The model can be used either with 🤗's diffusers library or with the original Stable Diffusion GitHub repository.
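As a rough illustration of the diffusers route, the sketch below loads one checkpoint and generates a single image. The repository id pattern is taken from the v1-2 to v1-4 URLs in this card; that v1-1 follows the same pattern is an assumption, as are the helper names `diffusers_repo_id` and `generate`. Depending on a repository's access settings, you may need to be logged in to the Hugging Face Hub before the download succeeds.

```python
def diffusers_repo_id(version: int) -> str:
    """Build a Hub repo id following the URL pattern used in this card."""
    if version not in (1, 2, 3, 4):
        raise ValueError("Stable Diffusion v1 has checkpoints 1 through 4")
    return f"CompVis/stable-diffusion-v1-{version}"


def generate(prompt: str, version: int = 4):
    """Load one checkpoint with diffusers and return a single PIL image."""
    # Imported lazily so the helper above works without diffusers installed.
    from diffusers import StableDiffusionPipeline

    pipe = StableDiffusionPipeline.from_pretrained(diffusers_repo_id(version))
    # Calling the pipeline returns an output object whose .images is a list.
    return pipe(prompt).images[0]
```

Calling `generate("a photograph of an astronaut riding a horse")` downloads the weights on first use, so expect it to take a while and to need several gigabytes of disk space.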
Model access
Each checkpoint can be accessed once you have "click-requested" it on the respective model repository.
For 🤗's diffusers:
For the Stable Diffusion GitHub repository:
- stable-diffusion-v-1-1-original
- stable-diffusion-v-1-2-original
- stable-diffusion-v-1-3-original
- stable-diffusion-v-1-4-original
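If you prefer to fetch a checkpoint file for the original GitHub code programmatically, something along these lines may work. It is a sketch under assumptions: the repositories are assumed to live under the CompVis organization (as the diffusers repositories linked in this card do), the helper names are hypothetical, and the checkpoint filename is deliberately left as a parameter because this card does not state it; look it up on the repository's file listing.

```python
def original_repo_id(version: int) -> str:
    """Build the repo id for the original-format checkpoint of a v1 version."""
    if version not in (1, 2, 3, 4):
        raise ValueError("Stable Diffusion v1 has checkpoints 1 through 4")
    # Assumption: same organization as the diffusers repositories.
    return f"CompVis/stable-diffusion-v-1-{version}-original"


def download_checkpoint(version: int, filename: str) -> str:
    """Download one file from the repository and return its local path."""
    # Imported lazily so the helper above works without huggingface_hub.
    from huggingface_hub import hf_hub_download

    return hf_hub_download(repo_id=original_repo_id(version), filename=filename)
```

The download only succeeds after access to the repository has been granted to your account, as described above.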
Citation
@InProceedings{Rombach_2022_CVPR,
author = {Rombach, Robin and Blattmann, Andreas and Lorenz, Dominik and Esser, Patrick and Ommer, Bj\"orn},
title = {High-Resolution Image Synthesis With Latent Diffusion Models},
booktitle = {Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR)},
month = {June},
year = {2022},
pages = {10684-10695}
}
This model card was written by Robin Rombach and Patrick Esser and is based on the DALL-E Mini model card.