amos1088's picture
uuu
a2919a7

A newer version of the Gradio SDK is available: 5.24.0

Upgrade

DEPTH

Monocular depth estimation is a computer vision task that involves predicting the depth information of a scene from a single image. In other words, it is the process of estimating the distance of objects in a scene from a single camera viewpoint.

Monocular depth estimation has various applications, including 3D reconstruction, augmented reality, autonomous driving, and robotics. It is a challenging task as it requires the model to understand the complex relationships between objects in the scene and the corresponding depth information, which can be affected by factors such as lighting conditions, occlusion, and texture.

Usage

from image_gen_aux import DepthPreprocessor
from image_gen_aux.utils import load_image

input_image = load_image("https://huggingface.co/datasets/OzzyGT/testing-resources/resolve/main/depth/coffee_ship.png")

depth_preprocessor = DepthPreprocessor.from_pretrained("depth-anything/Depth-Anything-V2-Large-hf").to("cuda")
image = depth_preprocessor(input_image)[0]
image.save("depth.png")

Models

The Depth Preprocessor supports any depth estimation model that the transformers library supports that doesn't have a fixed image size restriction, but we mainly recommend and ensure the correct functionality for these models:

Model License Project Page
Depth Anything V2 CC-BY-NC-4.0 https://depth-anything-v2.github.io/
ZoeDepth MIT https://github.com/isl-org/ZoeDepth

Each model has different variations:

Depth Anything V2

Variation Repo ID
Small depth-anything/Depth-Anything-V2-Small-hf
Base depth-anything/Depth-Anything-V2-Base-hf
Large depth-anything/Depth-Anything-V2-Large-hf

ZoeDepth

Variation Repo ID
NYU Intel/zoedepth-nyu
KITTI Intel/zoedepth-kitti
NYU and KITTI Intel/zoedepth-nyu-kitti