Spaces: Runtime error
Update app.py
app.py CHANGED
@@ -793,38 +793,12 @@ for i, t in tqdm(enumerate(scheduler.timesteps), total=len(scheduler.timesteps))
 
 #"""The version on the right shows the predicted 'final output' (x0) at each step, and this is what is usually used for progress videos etc. The version on the left is the 'next step'. I found it interesting to compare the two - watching only the progress videos you'd think drastic changes are happening, especially at early stages, but since the changes made per-step are relatively small the actual process is much more gradual.
 
-### Classifier Free Guidance
 
-By default, the model doesn't often do what we ask. If we want it to follow the prompt better, we use a hack called CFG (Classifier-Free Guidance). There's a good explanation in this video (AI Coffee Break, GLIDE).
-
-In the code, this comes down to us doing:
-
-`noise_pred = noise_pred_uncond + guidance_scale * (noise_pred_text - noise_pred_uncond)`
-
-This works surprisingly well :) Explore changing the guidance_scale in the code above and see how this affects the results. How high can you push it before the results get worse?
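For reference, the two predictions in that line normally come from a single UNet call on a doubled batch. A minimal sketch of what that step usually looks like, assuming `unet`, `scheduler`, `latents`, `text_embeddings` and the loop variable `t` from the surrounding loop (names are illustrative, not the exact code removed in this hunk):

```python
import torch

guidance_scale = 7.5  # a typical default; higher values follow the prompt more strongly

# One forward pass on a doubled batch: [unconditional, text-conditioned]
latent_model_input = torch.cat([latents] * 2)
latent_model_input = scheduler.scale_model_input(latent_model_input, t)

with torch.no_grad():
    noise_pred = unet(latent_model_input, t, encoder_hidden_states=text_embeddings).sample

# Split the two predictions and blend them: start from the unconditional prediction
# and move guidance_scale times further in the direction of the text-conditioned one
noise_pred_uncond, noise_pred_text = noise_pred.chunk(2)
noise_pred = noise_pred_uncond + guidance_scale * (noise_pred_text - noise_pred_uncond)
```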
-
-## Sampling
-
-There is still some complexity hidden from us inside `latents = scheduler.step(noise_pred, i, latents)["prev_sample"]`. How exactly does the sampler go from the current noisy latents to a slightly less noisy version? Why don't we just use the model in a single step? Are there other ways to view this?
-
-The model tries to predict the noise in an image. For low noise levels, we assume it does a pretty good job. For higher noise levels, it has a hard task! So instead of producing a perfect image, the results tend to look like a blurry mess - see the start of the video above for a visual! So, samplers use the model prediction to move a small amount towards it (removing some of the noise), then get another prediction based on this marginally-less-rubbish input, and hope that this iteratively improves the result.
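A rough way to see the "predict the final image, then only move a little way towards it" idea in code. This is a DDIM-style illustration, not what the LMS scheduler actually computes; `alpha_bar_t` and `alpha_bar_prev` would come from something like `scheduler.alphas_cumprod` in diffusers:

```python
import torch

def toy_sampler_step(x_t: torch.Tensor, noise_pred: torch.Tensor,
                     alpha_bar_t: float, alpha_bar_prev: float) -> torch.Tensor:
    """One illustrative denoising step (deterministic DDIM-style update)."""
    # The clean image implied by the current noise prediction (often a blurry mess early on)
    x0_pred = (x_t - (1 - alpha_bar_t) ** 0.5 * noise_pred) / alpha_bar_t ** 0.5
    # Don't jump straight to x0: re-noise the estimate at the next, slightly lower noise level
    return alpha_bar_prev ** 0.5 * x0_pred + (1 - alpha_bar_prev) ** 0.5 * noise_pred
```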
-
-Different samplers do this in different ways. You can try to inspect the code for the default LMS sampler with:
-"""
-
-# ??scheduler.step
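The `??` syntax only works in IPython/Jupyter; in a plain script the same inspection can be done with the standard library, assuming `scheduler` is the scheduler instance created earlier in app.py:

```python
import inspect

# Equivalent of ??scheduler.step outside a notebook: print the method's source
print(inspect.getsource(type(scheduler).step))
```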
-
-"""**Time to draw some diagrams!** (Whiteboard/paper interlude)
 
 # Guidance
 
 
-OK, final trick! How can we add some extra control to this generation process?
-
-At each step, we're going to use our model as before to predict the noise component of x. Then we'll use this to produce a predicted output image, and apply some loss function to this image.
 
-This function can be anything, but let's demo with a super simple example. If we want images that have a lot of blue, we can craft a loss function that gives a high loss if pixels have a low blue component:
-"""
 
 def blue_loss(images):
     # How far are the blue channel values from 0.9:
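The hunk ends mid-function, so the body of `blue_loss` and the step that actually uses it aren't shown. A minimal sketch of how this kind of loss-guidance is commonly wired up with a sigma-based scheduler like LMS; the `blue_loss` body here is inferred from the comment, and `guide_latents`, its `loss_scale` default, and the 0.18215 latent scaling factor are illustrative assumptions rather than the code behind this diff:

```python
import torch

def blue_loss(images: torch.Tensor) -> torch.Tensor:
    # How far the blue channel (index 2 in RGB) is from 0.9, averaged over the batch
    return torch.abs(images[:, 2] - 0.9).mean()

def guide_latents(latents, noise_pred, sigma, vae, loss_fn=blue_loss, loss_scale=100.0):
    """Nudge the latents so the *predicted* final image scores lower on loss_fn."""
    latents = latents.detach().requires_grad_()
    latents_x0 = latents - sigma * noise_pred                         # predicted denoised latents
    images = vae.decode((1 / 0.18215) * latents_x0).sample / 2 + 0.5  # decode to images in [0, 1]
    loss = loss_fn(images) * loss_scale
    grad = torch.autograd.grad(loss, latents)[0]                      # d(loss)/d(latents)
    return latents.detach() - grad * sigma ** 2                       # step against the gradient
```

Called every few steps inside the sampling loop (with something like `sigma = scheduler.sigmas[i]`), a step of this shape gently pushes generations towards bluer images without changing the prompt.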