The original paper's project page: [_DreamFusion: Text-to-3D using 2D Diffusion_](https://dreamfusion3d.github.io/).

Examples generated from text prompts only:

Exported meshes viewed with MeshLab:

### [Gallery](assets/gallery.md) | [Update Logs](assets/update_logs.md)

# Important Notice

This project is a **work-in-progress** and contains many differences from the paper. Many features are also not implemented yet. The current generation quality cannot match the results from the original paper, and generation still fails badly for many prompts.

## Notable differences from the paper

* Since the Imagen model is not publicly available, we use [Stable Diffusion](https://github.com/CompVis/stable-diffusion) to replace it (implementation from [diffusers](https://github.com/huggingface/diffusers)). Unlike Imagen, Stable Diffusion is a latent diffusion model, which diffuses in a latent space instead of the original image space. Therefore, the loss also needs to propagate back through the VAE's encoder, which introduces extra time cost in training. Currently, 15000 training steps take about 5 hours on a V100.

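Concretely, the score distillation sampling (SDS) gradient from the DreamFusion paper, written here for the latent setting (a sketch in our own notation, not the exact code path), picks up an extra Jacobian through the VAE encoder:

$$
\nabla_\theta \mathcal{L}_{\mathrm{SDS}} = \mathbb{E}_{t,\epsilon}\left[ w(t)\,\big(\hat{\epsilon}_\phi(z_t; y, t) - \epsilon\big)\,\frac{\partial z}{\partial x}\,\frac{\partial x}{\partial \theta} \right],
\qquad x = g(\theta),\quad z = \mathcal{E}(x),
$$

where $g$ is the NeRF renderer with parameters $\theta$, $\mathcal{E}$ is the VAE encoder, $z_t$ is the noised latent at timestep $t$, $y$ is the text prompt, $\hat{\epsilon}_\phi$ is the frozen UNet's noise prediction, and $w(t)$ is a timestep weighting. The $\partial z / \partial x$ factor is the backward pass through the encoder that a pixel-space model like Imagen does not need; the $w(t)\,(\hat{\epsilon}_\phi - \epsilon)$ factor is the `grad` set manually in the snippet under "Code organization" below, with both Jacobians left to autograd.
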
## TODOs

* The normal evaluation & shading part.
* Improve the surface quality.

# Install

```bash
git clone https://github.com/ashawkey/stable-dreamfusion.git
cd stable-dreamfusion
```

**Important**: To download the Stable Diffusion model checkpoint, you should create a file called `TOKEN` under this directory (i.e., `stable-dreamfusion/TOKEN`) and copy your Hugging Face [access token](https://huggingface.co/docs/hub/security-tokens) into it.
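
For reference, here is a minimal, hypothetical sketch of how such a token is typically passed to diffusers when downloading the weights; the model ID, file handling, and variable names are illustrative assumptions rather than the exact code in `nerf/sd.py`:

```python
# Hypothetical sketch (not the repository's exact loading code).
from diffusers import StableDiffusionPipeline

with open('./TOKEN', 'r') as f:
    token = f.read().strip()  # the access token you saved above

pipe = StableDiffusionPipeline.from_pretrained(
    'CompVis/stable-diffusion-v1-4',  # assumed Stable Diffusion v1.x checkpoint on the Hub
    use_auth_token=token,             # authenticates the gated checkpoint download
)
```
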
### Install with pip
```bash
pip install -r requirements.txt
```

We also provide the `setup.py` to build each extension:
```bash
bash scripts/install_ext.sh

# if you want to install manually, here is an example:
pip install ./raymarching # install to python path (you still need the raymarching/ folder, since this only installs the built extension.)
```

### Tested environments

# Usage

```bash
python main_nerf.py --text "a hamburger" --workspace trial_clip -O --guidance clip
python main_nerf.py --text "a hamburger" --workspace trial_clip -O --test --gui --guidance clip
```
# Code organization

* The key SDS loss is located at `./nerf/sd.py > StableDiffusion > train_step`:
```python
# 1. we need to interpolate the NeRF rendering to 512x512 to feed it into SD's VAE.
pred_rgb_512 = F.interpolate(pred_rgb, (512, 512), mode='bilinear', align_corners=False)
# 2. image (512x512) --- VAE --> latents (64x64); this is where SD differs from Imagen.
latents = self.encode_imgs(pred_rgb_512)
... # timestep sampling, noise adding and UNet noise predicting
# 3. the SDS loss: since the UNet part is skipped and cannot simply be autodiffed, we manually set the gradient for the latents.
w = (1 - self.scheduler.alphas_cumprod[t]).to(self.device)
grad = w * (noise_pred - noise)
latents.backward(gradient=grad, retain_graph=True)
```
* Other regularizations are in `./nerf/utils.py > Trainer > train_step`.
* NeRF rendering core function: `./nerf/renderer.py > NeRFRenderer > run_cuda`.
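
For intuition on why the gradient has to flow through the VAE encoder (as noted under "Notable differences"), `encode_imgs` is roughly equivalent to the sketch below. The API and the 0.18215 scaling follow diffusers' `AutoencoderKL` for Stable Diffusion v1.x; treat it as an illustration rather than the repository's exact code.

```python
import torch

def encode_imgs_sketch(vae, imgs: torch.Tensor) -> torch.Tensor:
    # imgs: (B, 3, 512, 512) in [0, 1], still attached to the NeRF's autograd graph.
    imgs = 2 * imgs - 1                       # the VAE expects inputs in [-1, 1]
    posterior = vae.encode(imgs).latent_dist  # diffusers' AutoencoderKL returns a distribution
    latents = posterior.sample() * 0.18215    # SD v1.x latent scaling factor
    return latents                            # (B, 4, 64, 64); backward passes through the encoder
```

Note that `latents.backward(gradient=grad, retain_graph=True)` injects `grad` directly as d(loss)/d(latents); it yields the same parameter gradients as defining the surrogate loss `(grad.detach() * latents).sum()` and calling `.backward()` on it.
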
# Acknowledgement
* The amazing original work: [_DreamFusion: Text-to-3D using 2D Diffusion_](https://dreamfusion3d.github.io/).