<!--Copyright 2024 Marigold authors and The HuggingFace Team. All rights reserved.

Licensed under the Apache License, Version 2.0 (the "License"); you may not use this file except in compliance with
the License. You may obtain a copy of the License at

http://www.apache.org/licenses/LICENSE-2.0

Unless required by applicable law or agreed to in writing, software distributed under the License is distributed on
an "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. See the License for the
specific language governing permissions and limitations under the License.
-->
# Marigold Pipelines for Computer Vision Tasks

[Marigold](../api/pipelines/marigold) is a novel diffusion-based dense prediction approach and a set of pipelines for various computer vision tasks, such as monocular depth estimation.

This guide will show you how to use Marigold to obtain fast and high-quality predictions for images and videos.

Each pipeline supports one computer vision task, which takes an RGB image as input and produces a *prediction* of the modality of interest, such as a depth map of the input image.
Currently, the following tasks are implemented:

| Pipeline | Predicted Modalities | Demos |
|---|---|:---:|
| [MarigoldDepthPipeline](https://github.com/huggingface/diffusers/blob/main/src/diffusers/pipelines/marigold/pipeline_marigold_depth.py) | [Depth](https://en.wikipedia.org/wiki/Depth_map), [Disparity](https://en.wikipedia.org/wiki/Binocular_disparity) | [Fast Demo (LCM)](https://huggingface.co/spaces/prs-eth/marigold-lcm), [Slow Original Demo (DDIM)](https://huggingface.co/spaces/prs-eth/marigold) |
| [MarigoldNormalsPipeline](https://github.com/huggingface/diffusers/blob/main/src/diffusers/pipelines/marigold/pipeline_marigold_normals.py) | [Surface normals](https://en.wikipedia.org/wiki/Normal_mapping) | [Fast Demo (LCM)](https://huggingface.co/spaces/prs-eth/marigold-normals-lcm) |

The original checkpoints can be found under the [PRS-ETH](https://huggingface.co/prs-eth/) Hugging Face organization.
These checkpoints are meant to work with diffusers pipelines and the [original codebase](https://github.com/prs-eth/marigold).
The original code can also be used to train new checkpoints.

| Checkpoint | Modality | Comment |
|---|---|---|
| [prs-eth/marigold-v1-0](https://huggingface.co/prs-eth/marigold-v1-0) | Depth | The first Marigold Depth checkpoint, which predicts *affine-invariant depth* maps. The performance of this checkpoint in benchmarks was studied in the original [paper](https://huggingface.co/papers/2312.02145). Designed to be used with the `DDIMScheduler` at inference, it requires at least 10 steps to get reliable predictions. Affine-invariant depth prediction has a range of values in each pixel between 0 (near plane) and 1 (far plane); both planes are chosen by the model as part of the inference process. See the `MarigoldImageProcessor` reference for visualization utilities. |
| [prs-eth/marigold-depth-lcm-v1-0](https://huggingface.co/prs-eth/marigold-depth-lcm-v1-0) | Depth | The fast Marigold Depth checkpoint, fine-tuned from `prs-eth/marigold-v1-0`. Designed to be used with the `LCMScheduler` at inference, it requires as little as 1 step to get reliable predictions. The prediction reliability saturates at 4 steps and declines after that. |
| [prs-eth/marigold-normals-v0-1](https://huggingface.co/prs-eth/marigold-normals-v0-1) | Normals | A preview checkpoint for the Marigold Normals pipeline. Designed to be used with the `DDIMScheduler` at inference, it requires at least 10 steps to get reliable predictions. The surface normals predictions are unit-length 3D vectors with values in the range from -1 to 1. *This checkpoint will be phased out after the release of the `v1-0` version.* |
| [prs-eth/marigold-normals-lcm-v0-1](https://huggingface.co/prs-eth/marigold-normals-lcm-v0-1) | Normals | The fast Marigold Normals checkpoint, fine-tuned from `prs-eth/marigold-normals-v0-1`. Designed to be used with the `LCMScheduler` at inference, it requires as little as 1 step to get reliable predictions. The prediction reliability saturates at 4 steps and declines after that. *This checkpoint will be phased out after the release of the `v1-0` version.* |

The examples below are mostly given for depth prediction, but they apply equally to the other supported modalities.
We showcase the predictions using the same input image of Albert Einstein generated by Midjourney.
This makes it easier to compare visualizations of the predictions across the various modalities and checkpoints.
<div class="flex gap-4" style="justify-content: center; width: 100%;"> | |
<div style="flex: 1 1 50%; max-width: 50%;"> | |
<img class="rounded-xl" src="https://marigoldmonodepth.github.io/images/einstein.jpg"/> | |
<figcaption class="mt-1 text-center text-sm text-gray-500"> | |
Example input image for all Marigold pipelines | |
</figcaption> | |
</div> | |
</div> | |
## Depth Prediction Quick Start

To get the first depth prediction, load the `prs-eth/marigold-depth-lcm-v1-0` checkpoint into the `MarigoldDepthPipeline`, put the image through the pipeline, and save the predictions:
```python
import diffusers
import torch

pipe = diffusers.MarigoldDepthPipeline.from_pretrained(
    "prs-eth/marigold-depth-lcm-v1-0", variant="fp16", torch_dtype=torch.float16
).to("cuda")

image = diffusers.utils.load_image("https://marigoldmonodepth.github.io/images/einstein.jpg")
depth = pipe(image)

vis = pipe.image_processor.visualize_depth(depth.prediction)
vis[0].save("einstein_depth.png")

depth_16bit = pipe.image_processor.export_depth_to_16bit_png(depth.prediction)
depth_16bit[0].save("einstein_depth_16bit.png")
```

The visualization function for depth [`~pipelines.marigold.marigold_image_processing.MarigoldImageProcessor.visualize_depth`] applies one of [matplotlib's colormaps](https://matplotlib.org/stable/users/explain/colors/colormaps.html) (`Spectral` by default) to map the predicted pixel values from a single-channel `[0, 1]` depth range into an RGB image.
With the `Spectral` colormap, near pixels are painted red and far pixels are assigned a blue color.
The 16-bit PNG file stores the single-channel values mapped linearly from the `[0, 1]` range into `[0, 65535]`.
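
If a downstream task needs the raw values back, this linear mapping can simply be inverted. Below is a minimal sketch, assuming the file was saved with `export_depth_to_16bit_png` as above:

```python
import numpy as np
from PIL import Image

# Load the 16-bit PNG saved above and undo the linear [0, 1] -> [0, 65535] mapping.
depth_16bit = np.asarray(Image.open("einstein_depth_16bit.png"))
depth_values = depth_16bit.astype(np.float32) / 65535.0  # back to the [0, 1] range

print(depth_values.shape, depth_values.min(), depth_values.max())
```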

Below are the raw and the visualized predictions; as can be seen, dark areas (mustache) are easier to distinguish in the visualization:
<div class="flex gap-4"> | |
<div style="flex: 1 1 50%; max-width: 50%;"> | |
<img class="rounded-xl" src="https://huggingface.co/datasets/huggingface/documentation-images/resolve/main/marigold/marigold_einstein_lcm_depth_16bit.png"/> | |
<figcaption class="mt-1 text-center text-sm text-gray-500"> | |
Predicted depth (16-bit PNG) | |
</figcaption> | |
</div> | |
<div style="flex: 1 1 50%; max-width: 50%;"> | |
<img class="rounded-xl" src="https://huggingface.co/datasets/huggingface/documentation-images/resolve/main/marigold/marigold_einstein_lcm_depth.png"/> | |
<figcaption class="mt-1 text-center text-sm text-gray-500"> | |
Predicted depth visualization (Spectral) | |
</figcaption> | |
</div> | |
</div> | |
## Surface Normals Prediction Quick Start

Load the `prs-eth/marigold-normals-lcm-v0-1` checkpoint into the `MarigoldNormalsPipeline`, put the image through the pipeline, and save the predictions:
```python
import diffusers
import torch

pipe = diffusers.MarigoldNormalsPipeline.from_pretrained(
    "prs-eth/marigold-normals-lcm-v0-1", variant="fp16", torch_dtype=torch.float16
).to("cuda")

image = diffusers.utils.load_image("https://marigoldmonodepth.github.io/images/einstein.jpg")
normals = pipe(image)

vis = pipe.image_processor.visualize_normals(normals.prediction)
vis[0].save("einstein_normals.png")
```

The visualization function for normals [`~pipelines.marigold.marigold_image_processing.MarigoldImageProcessor.visualize_normals`] maps the three-dimensional prediction with pixel values in the range `[-1, 1]` into an RGB image.
The visualization function supports flipping the surface normals axes to make the visualization compatible with other choices of the frame of reference.
Conceptually, each pixel is painted according to the surface normal vector in the frame of reference, where the `X` axis points right, the `Y` axis points up, and the `Z` axis points at the viewer.
Below is the visualized prediction:
<div class="flex gap-4" style="justify-content: center; width: 100%;"> | |
<div style="flex: 1 1 50%; max-width: 50%;"> | |
<img class="rounded-xl" src="https://huggingface.co/datasets/huggingface/documentation-images/resolve/main/marigold/marigold_einstein_lcm_normals.png"/> | |
<figcaption class="mt-1 text-center text-sm text-gray-500"> | |
Predicted surface normals visualization | |
</figcaption> | |
</div> | |
</div> | |

In this example, the tip of the nose almost certainly contains a surface point whose normal vector points straight at the viewer, meaning that its coordinates are `[0, 0, 1]`.
This vector maps to the RGB value `[128, 128, 255]`, which corresponds to a violet-blue color.
Similarly, a surface normal on the cheek in the right part of the image has a large `X` component, which increases the red hue.
Points on the shoulders pointing up have a large `Y` component, which promotes the green hue.
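
As a sanity check of this color mapping, the sketch below reproduces the quoted values; the linear `[-1, 1]` to `[0, 255]` mapping is an illustrative assumption that matches the numbers above, not the exact implementation of `visualize_normals`:

```python
import numpy as np

def normal_to_rgb(n):
    """Map a unit surface normal with components in [-1, 1] to an 8-bit RGB color."""
    n = np.asarray(n, dtype=np.float32)
    n = n / np.linalg.norm(n)  # ensure unit length
    return np.round((n + 1.0) / 2.0 * 255.0).astype(np.uint8)

print(normal_to_rgb([0, 0, 1]))  # facing the viewer (nose tip) -> [128 128 255]
print(normal_to_rgb([1, 0, 0]))  # facing right (cheek) -> [255 128 128], strong red hue
print(normal_to_rgb([0, 1, 0]))  # facing up (shoulders) -> [128 255 128], strong green hue
```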

## Speeding up inference

The above quick start snippets are already optimized for speed: they load the LCM checkpoint, use the `fp16` variant of weights and computation, and perform just one denoising diffusion step.
The `pipe(image)` call completes in 280ms on an RTX 3090 GPU.
Internally, the input image is encoded with the Stable Diffusion VAE encoder, then the U-Net performs one denoising step, and finally, the prediction latent is decoded with the VAE decoder into pixel space.
In this case, two out of three module calls are dedicated to converting between the pixel and latent spaces of the LDM.
Because Marigold's latent space is compatible with that of the base Stable Diffusion model, it is possible to speed up the pipeline call by more than 3x (85ms on RTX 3090) by using a [lightweight replacement of the SD VAE](../api/models/autoencoder_tiny):
```diff
import diffusers
import torch

pipe = diffusers.MarigoldDepthPipeline.from_pretrained(
    "prs-eth/marigold-depth-lcm-v1-0", variant="fp16", torch_dtype=torch.float16
).to("cuda")

+ pipe.vae = diffusers.AutoencoderTiny.from_pretrained(
+     "madebyollin/taesd", torch_dtype=torch.float16
+ ).cuda()

image = diffusers.utils.load_image("https://marigoldmonodepth.github.io/images/einstein.jpg")
depth = pipe(image)
```

As suggested in [Optimizations](../optimization/torch2.0#torch.compile), adding `torch.compile` may squeeze extra performance depending on the target hardware:
```diff
import diffusers
import torch

pipe = diffusers.MarigoldDepthPipeline.from_pretrained(
    "prs-eth/marigold-depth-lcm-v1-0", variant="fp16", torch_dtype=torch.float16
).to("cuda")

+ pipe.unet = torch.compile(pipe.unet, mode="reduce-overhead", fullgraph=True)

image = diffusers.utils.load_image("https://marigoldmonodepth.github.io/images/einstein.jpg")
depth = pipe(image)
```
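
To verify such timings on your own hardware, a simple synchronized timing sketch can be appended to either snippet above (it reuses the `pipe` and `image` variables defined there; the 280ms and 85ms figures were measured on an RTX 3090 and will vary):

```python
import time
import torch

# Warm up first (this also triggers compilation when torch.compile is used).
for _ in range(3):
    pipe(image)

torch.cuda.synchronize()
start = time.perf_counter()
depth = pipe(image)
torch.cuda.synchronize()
print(f"pipe(image) took {(time.perf_counter() - start) * 1000:.0f} ms")
```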

## Qualitative Comparison with Depth Anything

With the above speed optimizations, Marigold delivers predictions that are both more detailed and faster than those of [Depth Anything](https://huggingface.co/docs/transformers/main/en/model_doc/depth_anything) with its largest checkpoint [LiheYoung/depth-anything-large-hf](https://huggingface.co/LiheYoung/depth-anything-large-hf):
<div class="flex gap-4"> | |
<div style="flex: 1 1 50%; max-width: 50%;"> | |
<img class="rounded-xl" src="https://huggingface.co/datasets/huggingface/documentation-images/resolve/main/marigold/marigold_einstein_lcm_depth.png"/> | |
<figcaption class="mt-1 text-center text-sm text-gray-500"> | |
Marigold LCM fp16 with Tiny AutoEncoder | |
</figcaption> | |
</div> | |
<div style="flex: 1 1 50%; max-width: 50%;"> | |
<img class="rounded-xl" src="https://huggingface.co/datasets/huggingface/documentation-images/resolve/main/marigold/einstein_depthanything_large.png"/> | |
<figcaption class="mt-1 text-center text-sm text-gray-500"> | |
Depth Anything Large | |
</figcaption> | |
</div> | |
</div> | |

## Maximizing Precision and Ensembling

Marigold pipelines have a built-in ensembling mechanism that combines multiple predictions obtained from different random latents.
This is a brute-force way of improving the precision of predictions, capitalizing on the generative nature of diffusion.
The ensembling path is activated automatically when the `ensemble_size` argument is set greater than `1`.
When aiming for maximum precision, it makes sense to adjust `num_inference_steps` simultaneously with `ensemble_size`.
The recommended values vary across checkpoints but primarily depend on the scheduler type.
The effect of ensembling is particularly visible with surface normals:
```python
import diffusers

model_path = "prs-eth/marigold-normals-v1-0"

model_paper_kwargs = {
    diffusers.schedulers.DDIMScheduler: {
        "num_inference_steps": 10,
        "ensemble_size": 10,
    },
    diffusers.schedulers.LCMScheduler: {
        "num_inference_steps": 4,
        "ensemble_size": 5,
    },
}

image = diffusers.utils.load_image("https://marigoldmonodepth.github.io/images/einstein.jpg")

pipe = diffusers.MarigoldNormalsPipeline.from_pretrained(model_path).to("cuda")
pipe_kwargs = model_paper_kwargs[type(pipe.scheduler)]

normals = pipe(image, **pipe_kwargs)

vis = pipe.image_processor.visualize_normals(normals.prediction)
vis[0].save("einstein_normals.png")
```
<div class="flex gap-4"> | |
<div style="flex: 1 1 50%; max-width: 50%;"> | |
<img class="rounded-xl" src="https://huggingface.co/datasets/huggingface/documentation-images/resolve/main/marigold/marigold_einstein_lcm_normals.png"/> | |
<figcaption class="mt-1 text-center text-sm text-gray-500"> | |
Surface normals, no ensembling | |
</figcaption> | |
</div> | |
<div style="flex: 1 1 50%; max-width: 50%;"> | |
<img class="rounded-xl" src="https://huggingface.co/datasets/huggingface/documentation-images/resolve/main/marigold/marigold_einstein_normals.png"/> | |
<figcaption class="mt-1 text-center text-sm text-gray-500"> | |
Surface normals, with ensembling | |
</figcaption> | |
</div> | |
</div> | |

As can be seen, all areas with fine-grained structure, such as hair, receive more conservative and, on average, more correct predictions.
Such a result is more suitable for precision-sensitive downstream tasks, such as 3D reconstruction.

## Quantitative Evaluation

To evaluate Marigold quantitatively in standard leaderboards and benchmarks (such as NYU, KITTI, and other datasets), follow the evaluation protocol outlined in the paper: load the full precision fp32 model and use appropriate values for `num_inference_steps` and `ensemble_size`.
Optionally seed randomness to ensure reproducibility. Maximizing `batch_size` will deliver maximum device utilization.
```python
import diffusers
import torch

device = "cuda"
seed = 2024
model_path = "prs-eth/marigold-v1-0"

model_paper_kwargs = {
    diffusers.schedulers.DDIMScheduler: {
        "num_inference_steps": 50,
        "ensemble_size": 10,
    },
    diffusers.schedulers.LCMScheduler: {
        "num_inference_steps": 4,
        "ensemble_size": 10,
    },
}

image = diffusers.utils.load_image("https://marigoldmonodepth.github.io/images/einstein.jpg")

generator = torch.Generator(device=device).manual_seed(seed)

pipe = diffusers.MarigoldDepthPipeline.from_pretrained(model_path).to(device)
pipe_kwargs = model_paper_kwargs[type(pipe.scheduler)]

depth = pipe(image, generator=generator, **pipe_kwargs)

# evaluate metrics
```
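
Because Marigold's depth predictions are affine-invariant, they must be aligned to the ground truth before computing metrics. Below is a minimal sketch of a least-squares scale-and-shift alignment followed by the common AbsRel error; the exact protocol, valid-pixel masks, and full metric set are defined in the paper and the original codebase, and `load_ground_truth` is a hypothetical, dataset-specific helper:

```python
import numpy as np

def align_affine_invariant(prediction, target, mask):
    """Least-squares fit of scale and shift so that scale * prediction + shift approximates target."""
    p, t = prediction[mask], target[mask]
    A = np.stack([p, np.ones_like(p)], axis=1)
    scale, shift = np.linalg.lstsq(A, t, rcond=None)[0]
    return scale * prediction + shift

def abs_rel(prediction, target, mask):
    """Mean absolute relative error over valid pixels."""
    return np.mean(np.abs(prediction[mask] - target[mask]) / target[mask])

# pred = np.squeeze(depth.prediction[0])    # (H, W) affine-invariant prediction from above
# gt, valid = load_ground_truth(...)        # metric depth map and validity mask (hypothetical)
# aligned = align_affine_invariant(pred, gt, valid)
# print("AbsRel:", abs_rel(aligned, gt, valid))
```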

## Using Predictive Uncertainty

The ensembling mechanism built into Marigold pipelines combines multiple predictions obtained from different random latents.
As a side effect, it can be used to quantify epistemic (model) uncertainty; simply specify `ensemble_size` greater than 1 and set `output_uncertainty=True`.
The resulting uncertainty will be available in the `uncertainty` field of the output.
It can be visualized as follows:
```python
import diffusers
import torch

pipe = diffusers.MarigoldDepthPipeline.from_pretrained(
    "prs-eth/marigold-depth-lcm-v1-0", variant="fp16", torch_dtype=torch.float16
).to("cuda")

image = diffusers.utils.load_image("https://marigoldmonodepth.github.io/images/einstein.jpg")

depth = pipe(
    image,
    ensemble_size=10,  # any number greater than 1; higher values yield higher precision
    output_uncertainty=True,
)

uncertainty = pipe.image_processor.visualize_uncertainty(depth.uncertainty)
uncertainty[0].save("einstein_depth_uncertainty.png")
```
<div class="flex gap-4"> | |
<div style="flex: 1 1 50%; max-width: 50%;"> | |
<img class="rounded-xl" src="https://huggingface.co/datasets/huggingface/documentation-images/resolve/main/marigold/marigold_einstein_depth_uncertainty.png"/> | |
<figcaption class="mt-1 text-center text-sm text-gray-500"> | |
Depth uncertainty | |
</figcaption> | |
</div> | |
<div style="flex: 1 1 50%; max-width: 50%;"> | |
<img class="rounded-xl" src="https://huggingface.co/datasets/huggingface/documentation-images/resolve/main/marigold/marigold_einstein_normals_uncertainty.png"/> | |
<figcaption class="mt-1 text-center text-sm text-gray-500"> | |
Surface normals uncertainty | |
</figcaption> | |
</div> | |
</div> | |

The interpretation of uncertainty is easy: higher values (white) correspond to pixels where the model struggles to make consistent predictions.
Evidently, the depth model is the least confident around discontinuity edges, where the object depth changes drastically.
The surface normals model is the least confident in fine-grained structures, such as hair, and in dark areas, such as the collar.
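
If a downstream task needs to discard unreliable pixels, the uncertainty map can be thresholded into a validity mask. Below is a minimal sketch that continues the depth snippet above; it assumes `depth.prediction` and `depth.uncertainty` are numpy arrays of matching spatial size, and the threshold of `0.05` is an arbitrary example rather than a recommended value:

```python
import numpy as np

prediction = np.squeeze(depth.prediction[0])    # (H, W) depth prediction
uncertainty = np.squeeze(depth.uncertainty[0])  # (H, W) per-pixel uncertainty

mask = uncertainty < 0.05                          # keep only pixels the ensemble agrees on
masked_depth = np.where(mask, prediction, np.nan)  # invalidate the rest

print(f"kept {mask.mean() * 100:.1f}% of pixels")
```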

## Frame-by-frame Video Processing with Temporal Consistency

Due to Marigold's generative nature, each prediction is unique and defined by the random noise sampled for the latent initialization.
This becomes an obvious drawback compared to traditional end-to-end dense regression networks, as exemplified in the following videos:
<div class="flex gap-4"> | |
<div style="flex: 1 1 50%; max-width: 50%;"> | |
<img class="rounded-xl" src="https://huggingface.co/datasets/huggingface/documentation-images/resolve/main/marigold/marigold_obama.gif"/> | |
<figcaption class="mt-1 text-center text-sm text-gray-500">Input video</figcaption> | |
</div> | |
<div style="flex: 1 1 50%; max-width: 50%;"> | |
<img class="rounded-xl" src="https://huggingface.co/datasets/huggingface/documentation-images/resolve/main/marigold/marigold_obama_depth_independent.gif"/> | |
<figcaption class="mt-1 text-center text-sm text-gray-500">Marigold Depth applied to input video frames independently</figcaption> | |
</div> | |
</div> | |

To address this issue, it is possible to pass the `latents` argument to the pipelines, which defines the starting point of diffusion.
Empirically, we found that a convex combination of the very same starting-point noise latent and the latent corresponding to the previous frame's prediction gives sufficiently smooth results, as implemented in the snippet below:
```python
import imageio
from PIL import Image
from tqdm import tqdm
import diffusers
import torch

device = "cuda"
path_in = "obama.mp4"
path_out = "obama_depth.gif"

pipe = diffusers.MarigoldDepthPipeline.from_pretrained(
    "prs-eth/marigold-depth-lcm-v1-0", variant="fp16", torch_dtype=torch.float16
).to(device)
pipe.vae = diffusers.AutoencoderTiny.from_pretrained(
    "madebyollin/taesd", torch_dtype=torch.float16
).to(device)
pipe.set_progress_bar_config(disable=True)

with imageio.get_reader(path_in) as reader:
    size = reader.get_meta_data()['size']
    last_frame_latent = None
    latent_common = torch.randn(
        (1, 4, 768 * size[1] // (8 * max(size)), 768 * size[0] // (8 * max(size)))
    ).to(device=device, dtype=torch.float16)

    out = []
    for frame_id, frame in tqdm(enumerate(reader), desc="Processing Video"):
        frame = Image.fromarray(frame)

        latents = latent_common
        if last_frame_latent is not None:
            latents = 0.9 * latents + 0.1 * last_frame_latent

        depth = pipe(
            frame, match_input_resolution=False, latents=latents, output_latent=True
        )
        last_frame_latent = depth.latent
        out.append(pipe.image_processor.visualize_depth(depth.prediction)[0])

    diffusers.utils.export_to_gif(out, path_out, fps=reader.get_meta_data()['fps'])
```

Here, the diffusion process starts from the given computed latent.
The pipeline is called with `output_latent=True` to access `depth.latent`, whose contribution to the next frame's latent initialization is then computed.
The result is much more stable now:
<div class="flex gap-4"> | |
<div style="flex: 1 1 50%; max-width: 50%;"> | |
<img class="rounded-xl" src="https://huggingface.co/datasets/huggingface/documentation-images/resolve/main/marigold/marigold_obama_depth_independent.gif"/> | |
<figcaption class="mt-1 text-center text-sm text-gray-500">Marigold Depth applied to input video frames independently</figcaption> | |
</div> | |
<div style="flex: 1 1 50%; max-width: 50%;"> | |
<img class="rounded-xl" src="https://huggingface.co/datasets/huggingface/documentation-images/resolve/main/marigold/marigold_obama_depth_consistent.gif"/> | |
<figcaption class="mt-1 text-center text-sm text-gray-500">Marigold Depth with forced latents initialization</figcaption> | |
</div> | |
</div> | |

## Marigold for ControlNet

A very common application for depth prediction with diffusion models comes in conjunction with ControlNet.
Depth crispness plays a crucial role in obtaining high-quality results from ControlNet.
As seen in the comparisons with other methods above, Marigold excels at that task.
The snippet below demonstrates how to load an image, compute depth, and pass it into ControlNet in a compatible format:
```python
import torch
import diffusers

device = "cuda"
generator = torch.Generator(device=device).manual_seed(2024)
image = diffusers.utils.load_image(
    "https://huggingface.co/datasets/huggingface/documentation-images/resolve/main/diffusers/controlnet_depth_source.png"
)

pipe = diffusers.MarigoldDepthPipeline.from_pretrained(
    "prs-eth/marigold-depth-lcm-v1-0", torch_dtype=torch.float16, variant="fp16"
).to(device)

depth_image = pipe(image, generator=generator).prediction
depth_image = pipe.image_processor.visualize_depth(depth_image, color_map="binary")
depth_image[0].save("motorcycle_controlnet_depth.png")

controlnet = diffusers.ControlNetModel.from_pretrained(
    "diffusers/controlnet-depth-sdxl-1.0", torch_dtype=torch.float16, variant="fp16"
).to(device)
pipe = diffusers.StableDiffusionXLControlNetPipeline.from_pretrained(
    "SG161222/RealVisXL_V4.0", torch_dtype=torch.float16, variant="fp16", controlnet=controlnet
).to(device)
pipe.scheduler = diffusers.DPMSolverMultistepScheduler.from_config(pipe.scheduler.config, use_karras_sigmas=True)

controlnet_out = pipe(
    prompt="high quality photo of a sports bike, city",
    negative_prompt="",
    guidance_scale=6.5,
    num_inference_steps=25,
    image=depth_image,
    controlnet_conditioning_scale=0.7,
    control_guidance_end=0.7,
    generator=generator,
).images
controlnet_out[0].save("motorcycle_controlnet_out.png")
```
<div class="flex gap-4"> | |
<div style="flex: 1 1 33%; max-width: 33%;"> | |
<img class="rounded-xl" src="https://huggingface.co/datasets/huggingface/documentation-images/resolve/main/diffusers/controlnet_depth_source.png"/> | |
<figcaption class="mt-1 text-center text-sm text-gray-500"> | |
Input image | |
</figcaption> | |
</div> | |
<div style="flex: 1 1 33%; max-width: 33%;"> | |
<img class="rounded-xl" src="https://huggingface.co/datasets/huggingface/documentation-images/resolve/main/marigold/motorcycle_controlnet_depth.png"/> | |
<figcaption class="mt-1 text-center text-sm text-gray-500"> | |
Depth in the format compatible with ControlNet | |
</figcaption> | |
</div> | |
<div style="flex: 1 1 33%; max-width: 33%;"> | |
<img class="rounded-xl" src="https://huggingface.co/datasets/huggingface/documentation-images/resolve/main/marigold/motorcycle_controlnet_out.png"/> | |
<figcaption class="mt-1 text-center text-sm text-gray-500"> | |
ControlNet generation, conditioned on depth and prompt: "high quality photo of a sports bike, city" | |
</figcaption> | |
</div> | |
</div> | |

Hopefully, you will find Marigold useful for solving your downstream tasks, whether as part of a broader generative workflow or for a perception task such as 3D reconstruction.