warp-ai/wuerstchen-prior-model-finetuned


kashif (HF staff) committed on Sep 9, 2023

Commit 43dcb47

1 Parent(s): 701141f

Update README.md

Files changed (1)

README.md +10 -10


@@ -4,22 +4,22 @@ license: mit
 <img src="https://cdn-uploads.huggingface.co/production/uploads/634cb5eefb80cc6bcaf63c3e/i-DYpDHw8Pwiy7QBKZVR5.jpeg" width=1500>
 ## Würstchen - Overview
-Würstchen is diffusion model, whose text-conditional model works in a highly compressed latent space of images. Why is this important? Compressing data can reduce
-computational costs for both training and inference by magnitudes. Training on 1024x1024 images, is way more expensive than training at 32x32. Usually, other works make
-use of a relatively small compression, in the range of 4x - 8x spatial compression. Würstchen takes this to an extreme. Through it's novel design, we achieve a 42x spatial
-compression. This was unseen before, because common methods fail to faithfully reconstruct detailed images after 16x spatial compression already. Würstchen employs a
-two-stage compression, what we call Stage A and Stage B. Stage A is a VQGAN and Stage B is a Diffusion Autoencoder (more details can be found in the [paper](https://arxiv.org/abs/2306.00637)).
-A third model, Stage C, is learnt in that highly compressed latent space. This training requires fractions of the compute used for current top-performing models, allowing
 also cheaper and faster inference.
 ## Würstchen - Prior
 The Prior is what we refer to as "Stage C". It is the text-conditional model, operating in the small latent space that Stage A and Stage B encode images into. During
-inference it's job is to generate the image latents given text. These image latents are then sent to Stage A & B to decode the latents into pixel space.
 ### Prior - Model - Finetuned
-This is the fully finetuned checkpoint. We recommend using the [interpolated model](https://huggingface.co/warp-ai/wuerstchen-prior-model-interpolated), as this checkpoint is overfit to being very
 artistic. However, if you are specifically looking for a very artistic checkpoint, go for this one. In the [documentation](https://huggingface.co/docs/diffusers/main/en/api/pipelines/wuerstchen)
-we also give a short overview for the different Prior (Stage C) checkpoints.
 **Note:** This model is only able to generate 1024x1024 images and shows repetitive patterns when sampling at different resolutions as the finetuning was only done on
 1024x1024. The [interpolated model](https://huggingface.co/warp-ai/wuerstchen-prior-model-interpolated) does not have this problem.
@@ -30,7 +30,7 @@ We also observed that the Prior (Stage C) adapts extremely fast to new resolutio
 <img src="https://cdn-uploads.huggingface.co/production/uploads/634cb5eefb80cc6bcaf63c3e/IfVsUDcP15OY-5wyLYKnQ.jpeg" width=1000>
 ## How to run
-This pipeline should be run together with https://huggingface.co/warp-diffusion/wuerstchen:
 ```py
 import torch

 <img src="https://cdn-uploads.huggingface.co/production/uploads/634cb5eefb80cc6bcaf63c3e/i-DYpDHw8Pwiy7QBKZVR5.jpeg" width=1500>
 ## Würstchen - Overview
+Würstchen is a diffusion model, whose text-conditional model works in a highly compressed latent space of images. Why is this important? Compressing data can reduce
+computational costs for both training and inference by magnitudes. Training on 1024x1024 images is way more expensive than training on 32x32. Usually, other works make
+use of a relatively small compression, in the range of 4x - 8x spatial compression. Würstchen takes this to an extreme. Through its novel design, we achieve a 42x spatial
+compression. This was unseen before because common methods fail to faithfully reconstruct detailed images after 16x spatial compression. Würstchen employs a
+two-stage compression, what we call Stage A and Stage B. Stage A is a VQGAN, and Stage B is a Diffusion Autoencoder (more details can be found in the [paper](https://arxiv.org/abs/2306.00637)).
+A third model, Stage C, is learned in that highly compressed latent space. This training requires fractions of the compute used for current top-performing models, allowing
 also cheaper and faster inference.
 ## Würstchen - Prior
 The Prior is what we refer to as "Stage C". It is the text-conditional model, operating in the small latent space that Stage A and Stage B encode images into. During
+inference, its job is to generate the image latents given text. These image latents are then sent to Stages A & B to decode the latents into pixel space.
 ### Prior - Model - Finetuned
+This is the fully finetuned checkpoint. We recommend using the [interpolated model](https://huggingface.co/warp-ai/wuerstchen-prior-model-interpolated), as this checkpoint is overfitted to being very
 artistic. However, if you are specifically looking for a very artistic checkpoint, go for this one. In the [documentation](https://huggingface.co/docs/diffusers/main/en/api/pipelines/wuerstchen)
+We also give a short overview of the different Prior (Stage C) checkpoints.
 **Note:** This model is only able to generate 1024x1024 images and shows repetitive patterns when sampling at different resolutions as the finetuning was only done on
 1024x1024. The [interpolated model](https://huggingface.co/warp-ai/wuerstchen-prior-model-interpolated) does not have this problem.
 <img src="https://cdn-uploads.huggingface.co/production/uploads/634cb5eefb80cc6bcaf63c3e/IfVsUDcP15OY-5wyLYKnQ.jpeg" width=1000>
 ## How to run
+This pipeline should be run together with https://huggingface.co/warp-ai/wuerstchen:
 ```py
 import torch