Update app.py

app.py
CHANGED
@@ -144,47 +144,6 @@ def latents_to_pil(latents):
 
 """We'll use a pic from the web here, but you can load your own instead by uploading it and editing the filename in the next cell."""
 
-# Download a demo Image
-!curl --output macaw.jpg 'https://lafeber.com/pet-birds/wp-content/uploads/2018/06/Scarlet-Macaw-2.jpg'
-
-# Load the image with PIL
-input_image = Image.open('macaw.jpg').resize((512, 512))
-input_image
-
-"""Encoding this into the latent space of the AE with the function defined above looks like this:"""
-
-# Encode to the latent space
-encoded = pil_to_latent(input_image)
-encoded.shape
-
-# Let's visualize the four channels of this latent representation:
-fig, axs = plt.subplots(1, 4, figsize=(16, 4))
-for c in range(4):
-    axs[c].imshow(encoded[0][c].cpu(), cmap='Greys')
-
-"""This 4x64x64 tensor captures lots of information about the image, hopefully enough that when we feed it through the decoder we get back something very close to our input image:"""
-
-# Decode this latent representation back into an image
-decoded = latents_to_pil(encoded)[0]
-decoded
-
-"""You'll see some small differences if you squint! Focus on the eye if you can't spot anything obvious. This is pretty impressive - that 4x64x64 latent seems to hold a lot more information than a 64px image...
-
-This autoencoder has been trained to squish an image down to a smaller representation and then re-create the image from this compressed version.
-
-In this particular case the compression factor is 48: we start with a 3x512x512 (channels x height x width) image, and it gets compressed to a 4x64x64 latent. Each 3x8x8 pixel volume in the input image is compressed down to just 4 numbers (4x1x1). You can find AEs with a higher compression ratio (e.g. f16, like some popular VQGAN models), but at some point they begin to introduce artifacts that we don't want.
-
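The compression arithmetic above is easy to verify in plain Python, with no model required. This is just an illustrative check (the `compression_factor` helper is made up for this sketch, it is not part of the notebook):

```python
# Illustrative helper (not from the notebook): ratio of input elements
# to latent elements for the shapes discussed above.
def compression_factor(image_shape, latent_shape):
    image_elems = image_shape[0] * image_shape[1] * image_shape[2]
    latent_elems = latent_shape[0] * latent_shape[1] * latent_shape[2]
    return image_elems / latent_elems

# 3x512x512 image -> 4x64x64 latent
print(compression_factor((3, 512, 512), (4, 64, 64)))  # → 48.0
# Each 3x8x8 pixel volume -> just 4 numbers
print(compression_factor((3, 8, 8), (4, 1, 1)))        # → 48.0
```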
177 |
-
Why do we even use an autoencoder? We can do diffusion in pixel space - where the model gets all the image data as inputs and produces an output prediction of the same shape. But this means processing a LOT of data, and make high-resolution generation very computationally expensive. Some solutions to this involve doing diffusion at low resolution (64px for eg) and then training a separate model to upscale repeatedly (as with D2/Imagen). But latent diffusion instead does the diffusion process in this 'latent space', using the compressed representations from our AE rather than raw images. These representations are information rich, and can be small enough to handle manageably on consumer hardware. Once we've generated a new 'image' as a latent representation, the autoencoder can take those final latent outputs and turn them into actual pixels.
|
178 |
-
|
179 |
-
# The Scheduler
|
180 |
-
Now we need to talk about adding noise...
|
181 |
-
|
182 |
-
During training, we add some noise to an image an then have the model try to predict the noise. If we always added a ton of noise, the model might not have much to work with. If we only add a tiny amount, the model won't be able to do much with the random starting points we use for sampling. So during training the amount is varied, according to some distribution.
|
183 |
-
|
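As a rough sketch of that training-time noising step, here is the usual DDPM-style mixing rule in plain Python. This mirrors the idea, not the exact diffusers code; the `add_noise` name and `alpha_bar` parameter are illustrative:

```python
import math

def add_noise(x, noise, alpha_bar):
    # alpha_bar near 1 -> mostly the original image (little noise added);
    # alpha_bar near 0 -> mostly noise, little of the image left.
    return [math.sqrt(alpha_bar) * xi + math.sqrt(1.0 - alpha_bar) * ni
            for xi, ni in zip(x, noise)]

clean = [0.5, -0.2, 0.9]
noise = [1.0, 1.0, 1.0]
print(add_noise(clean, noise, 1.0))  # no noise: the image comes back unchanged
print(add_noise(clean, noise, 0.0))  # all noise: the image is gone entirely
```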
184 |
-
During sampling, we want to 'denoise' over a number of steps. How many steps and how much noise we should aim for at each step are going to affect the final result.
|
185 |
-
|
186 |
-
The scheduler is in charge of handling all of these details. For example: `scheduler = LMSDiscreteScheduler(beta_start=0.00085, beta_end=0.012, beta_schedule="scaled_linear", num_train_timesteps=1000)` sets up a scheduler that matches the one used to train this model. When we want to sample over a smaller number of steps, we set this up with `scheduler.set_timesteps`:
|
187 |
-
"""
|
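For intuition, the "scaled_linear" schedule named in that scheduler call interpolates the betas linearly in square-root space and then squares them. A minimal pure-Python sketch of that idea (an approximation for illustration, not the diffusers implementation itself):

```python
def scaled_linear_betas(beta_start, beta_end, num_train_timesteps):
    # Linear in sqrt-space, then squared - the "scaled_linear" idea.
    lo, hi = beta_start ** 0.5, beta_end ** 0.5
    step = (hi - lo) / (num_train_timesteps - 1)
    return [(lo + i * step) ** 2 for i in range(num_train_timesteps)]

betas = scaled_linear_betas(0.00085, 0.012, 1000)
print(len(betas), betas[0], betas[-1])  # 1000 values running from ~0.00085 up to ~0.012
```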
188 |
|
189 |
# Setting the number of sampling steps:
|
190 |
set_timesteps(scheduler, 15)
|
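Conceptually, setting the timesteps picks a small, evenly spaced, descending subset of the 1000 training timesteps to denoise over. A toy sketch of that selection (the `spaced_timesteps` helper and its rounding are illustrative; the real scheduler also recomputes its internal sigmas):

```python
def spaced_timesteps(num_train_timesteps, num_inference_steps):
    # Evenly spaced and descending: sampling starts from the noisiest timestep.
    stride = (num_train_timesteps - 1) / (num_inference_steps - 1)
    return [round((num_inference_steps - 1 - i) * stride)
            for i in range(num_inference_steps)]

print(spaced_timesteps(1000, 15))  # 15 timesteps from 999 down to 0
```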