AYYasaswini committed
Commit 9372e74 · verified · 1 Parent(s): 1b31565

Update app.py

Files changed (1)
  1. app.py +0 -26
app.py CHANGED
@@ -793,38 +793,12 @@ for i, t in tqdm(enumerate(scheduler.timesteps), total=len(scheduler.timesteps))
 
  #"""The version on the right shows the predicted 'final output' (x0) at each step, and this is what is usually used for progress videos etc. The version on the left is the 'next step'. I found it interesting to compare the two - watching the progress videos alone, you'd think drastic changes are happening, especially at early stages, but since the changes made per-step are relatively small, the actual process is much more gradual.
 
- ### Classifier Free Guidance
 
- By default, the model often doesn't do what we ask. If we want it to follow the prompt better, we use a hack called CFG. There's a good explanation in this video (AI Coffee Break, on GLIDE).
-
- In the code, this comes down to us doing:
-
- `noise_pred = noise_pred_uncond + guidance_scale * (noise_pred_text - noise_pred_uncond)`
-
- This works surprisingly well :) Explore changing the guidance_scale in the code above and see how this affects the results. How high can you push it before the results get worse?
-
- ## Sampling
-
- There is still some complexity hidden from us inside `latents = scheduler.step(noise_pred, i, latents)["prev_sample"]`. How exactly does the sampler go from the current noisy latents to a slightly less noisy version? Why don't we just use the model in a single step? Are there other ways to view this?
-
- The model tries to predict the noise in an image. For low noise levels, we assume it does a pretty good job. For higher noise levels, it has a hard task! So instead of producing a perfect image, the results tend to look like a blurry mess - see the start of the video above for a visual! So, samplers use the model predictions to move a small amount towards the model prediction (removing some of the noise), then get another prediction based on this marginally-less-rubbish input, in the hope that this iteratively improves the result.
-
- Different samplers do this in different ways. You can try to inspect the code for the default LMS sampler with:
- """
-
- # ??scheduler.step
-
- """**Time to draw some diagrams!** (Whiteboard/paper interlude)
 
  # Guidance
 
 
- OK, final trick! How can we add some extra control to this generation process?
-
- At each step, we're going to use our model as before to predict the noise component of x. Then we'll use this to produce a predicted output image, and apply some loss function to this image.
 
- This function can be anything, but let's demo with a super simple example. If we want images that have a lot of blue, we can craft a loss function that gives a high loss if pixels have a low blue component:
- """
 
  def blue_loss(images):
      # How far are the blue channel values from 0.9:
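For reference, the removed CFG passage boils down to one line of arithmetic. Below is a minimal sketch of how that blend typically sits inside a diffusers-style denoising loop; the variable names (`unet`, `text_embeddings`, `latent_model_input`) and call shapes are assumptions based on the surrounding code and common diffusers usage, not lines from this file.

```python
import torch

# Hedged sketch of classifier-free guidance (CFG) in the denoising loop.
# Assumes `unet`, `scheduler`, `latents` and `text_embeddings` (the unconditional
# and text embeddings concatenated along the batch dim) exist as elsewhere in app.py.
guidance_scale = 7.5  # the knob the removed text suggests experimenting with

for i, t in enumerate(scheduler.timesteps):
    # Two copies of the latents: one paired with the empty-prompt embedding,
    # one with the prompt embedding. (The real loop also scales the input by
    # the current sigma for LMS; omitted here for brevity.)
    latent_model_input = torch.cat([latents] * 2)
    with torch.no_grad():
        noise_pred = unet(latent_model_input, t, encoder_hidden_states=text_embeddings).sample

    # Split the batch back into unconditional and text-conditioned predictions,
    # then push the result guidance_scale times further in the "text" direction
    noise_pred_uncond, noise_pred_text = noise_pred.chunk(2)
    noise_pred = noise_pred_uncond + guidance_scale * (noise_pred_text - noise_pred_uncond)

    # Remove a little noise (API shown matches the removed text; newer diffusers
    # versions take the timestep t and return an object with .prev_sample)
    latents = scheduler.step(noise_pred, i, latents)["prev_sample"]
```

A guidance_scale of 1 reduces this to the plain conditional prediction; values around 7-8 are a common default, and pushing much higher tends to oversaturate the result, which is the trade-off the removed text invites you to explore.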
 
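The removed "Sampling" passage describes samplers as repeatedly taking a small step towards the model's prediction. As a rough illustration (not the actual LMS code, which the deleted `??scheduler.step` line was there to inspect), a single Euler-style update in sigma parameterisation might look like the sketch below; the function name and signature are made up for illustration.

```python
# Illustrative, simplified Euler-style sampler step (a sketch, not library code).
def euler_step(latents, noise_pred, sigma, sigma_next):
    # The fully-denoised image (x0) implied by the current noise prediction
    pred_x0 = latents - sigma * noise_pred
    # Direction from the predicted x0 back towards the current noisy latents
    derivative = (latents - pred_x0) / sigma
    # Move only partway: step to the next, slightly lower noise level
    return latents + derivative * (sigma_next - sigma)
```

Because `sigma_next` is only slightly smaller than `sigma`, each call removes just a little noise, so the blurry early x0 predictions get refined over many iterations - exactly the gradual behaviour the removed text describes.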
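The hunk cuts off just inside `blue_loss`, so the body below is a hedged guess at its continuation, plus a sketch of the loss-guided update the removed "Guidance" text outlines. The 0.9 target comes from the surviving comment; the 0.18215 latent scaling, the `vae.decode` call shape, and the gradient step size are assumptions for illustration.

```python
import torch

def blue_loss(images):
    # How far are the blue channel values from 0.9 (channel index 2 = blue):
    return torch.abs(images[:, 2] - 0.9).mean()

# Hedged sketch of one loss-guided nudge inside the denoising loop. Assumes
# `noise_pred` was computed earlier under torch.no_grad(), and `sigma` is the
# scheduler's current noise level.
latents = latents.detach().requires_grad_()
# Predicted fully-denoised latents (x0) from the current noise prediction
pred_x0 = latents - sigma * noise_pred
# Decode to image space; 0.18215 is the usual SD latent scale (an assumption
# here), and the exact decode return type varies across diffusers versions
images = vae.decode(pred_x0 / 0.18215).sample
images = (images / 2 + 0.5).clamp(0, 1)  # map to [0, 1] so 0.9 means "very blue"
loss = blue_loss(images)
# The gradient w.r.t. the latents points away from "more blue"; step against it
grad = torch.autograd.grad(loss, latents)[0]
latents = latents.detach() - grad * sigma**2  # step size is illustrative
```

In the full technique this nudge is applied every few steps alongside the normal `scheduler.step` update, gently steering generation towards bluer images without changing the prompt.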