# Depth Anything: Unleashing the Power of Large-Scale Unlabeled Data
[**Lihe Yang**](https://liheyoung.github.io/)<sup>1</sup> · [**Bingyi Kang**](https://scholar.google.com/citations?user=NmHgX-wAAAAJ)<sup>2+</sup> · [**Zilong Huang**](http://speedinghzl.github.io/)<sup>2</sup> · [**Xiaogang Xu**](https://xiaogang00.github.io/)<sup>3,4</sup> · [**Jiashi Feng**](https://sites.google.com/site/jshfeng/)<sup>2</sup> · [**Hengshuang Zhao**](https://hszhao.github.io/)<sup>1+</sup>

<sup>1</sup>The University of Hong Kong · <sup>2</sup>TikTok · <sup>3</sup>Zhejiang Lab · <sup>4</sup>Zhejiang University

<sup>+</sup>corresponding authors
**CVPR 2024**
This work presents Depth Anything, a highly practical solution for robust monocular depth estimation by training on a combination of 1.5M labeled images and **62M+ unlabeled images**.
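To illustrate the core idea of pairing a small labeled set with large-scale pseudo-labeled data, here is a minimal, hypothetical PyTorch sketch. The names `teacher`, `student`, `labeled_batch`, and `unlabeled_images` are placeholders for this sketch only and are not modules from this repository; strong perturbations and the auxiliary semantic constraint used in the actual method are omitted.

```python
import torch

def affine_invariant_loss(pred, target, eps=1e-6):
    # Scale-and-shift-invariant loss commonly used for relative depth.
    # Simplified: statistics are computed over the whole batch.
    def norm(d):
        t = d.median()
        s = (d - t).abs().mean() + eps
        return (d - t) / s
    return (norm(pred) - norm(target)).abs().mean()

@torch.no_grad()
def pseudo_label(teacher, images):
    # The frozen teacher's depth predictions serve as targets for unlabeled images.
    teacher.eval()
    return teacher(images)

def train_step(student, teacher, labeled_batch, unlabeled_images, optimizer):
    images, gt_depth = labeled_batch
    loss = affine_invariant_loss(student(images), gt_depth)
    # In the actual method, strong perturbations (e.g. color jitter, CutMix)
    # are applied to the unlabeled images before the student sees them.
    targets = pseudo_label(teacher, unlabeled_images)
    loss = loss + affine_invariant_loss(student(unlabeled_images), targets)
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
    return loss.item()
```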

## News
* **2024-02-27:** Depth Anything is accepted by CVPR 2024.
* **2024-02-05:** [Depth Anything Gallery](./gallery.md) is released. Thanks to all the users!
* **2024-02-02:** Depth Anything serves as the default depth processor for [InstantID](https://github.com/InstantID/InstantID) and [InvokeAI](https://github.com/invoke-ai/InvokeAI/releases/tag/v3.6.1).
* **2024-01-25:** Support [video depth visualization](./run_video.py). An [online demo for video](https://huggingface.co/spaces/JohanDL/Depth-Anything-Video) is also available.
* **2024-01-23:** The new ControlNet based on Depth Anything is integrated into [ControlNet WebUI](https://github.com/Mikubill/sd-webui-controlnet) and [ComfyUI's ControlNet](https://github.com/Fannovel16/comfyui_controlnet_aux).
* **2024-01-23:** Depth Anything [ONNX](https://github.com/fabio-sim/Depth-Anything-ONNX) and [TensorRT](https://github.com/spacewalk01/depth-anything-tensorrt) versions are supported.
* **2024-01-22:** Paper, project page, code, models, and demo ([HuggingFace](https://huggingface.co/spaces/LiheYoung/Depth-Anything), [OpenXLab](https://openxlab.org.cn/apps/detail/yyfan/depth_anything)) are released.
## Features of Depth Anything
***If you need other features, please first check the [existing community support](#community-support).***
- **Relative depth estimation**:
Our foundation models listed [here](https://huggingface.co/spaces/LiheYoung/Depth-Anything/tree/main/checkpoints) robustly provide relative depth estimation for any given image. Please refer [here](#running) for details; a minimal inference sketch is also shown after this list.
- **Metric depth estimation**:
We fine-tune our Depth Anything model with metric depth information from NYUv2 or KITTI. It performs strongly on both in-domain and zero-shot metric depth estimation. Please refer [here](./metric_depth) for details.
- **Better depth-conditioned ControlNet**:
We re-train **a better depth-conditioned ControlNet** based on Depth Anything. It offers more precise synthesis than the previous MiDaS-based ControlNet. Please refer [here](./controlnet/) for details. You can also use our new ControlNet based on Depth Anything in [ControlNet WebUI](https://github.com/Mikubill/sd-webui-controlnet) or [ComfyUI's ControlNet](https://github.com/Fannovel16/comfyui_controlnet_aux).
- **Downstream high-level scene understanding**:
The Depth Anything encoder can be fine-tuned for downstream high-level perception tasks, *e.g.*, semantic segmentation, achieving 86.2 mIoU on Cityscapes and 59.4 mIoU on ADE20K. Please refer [here](./semseg/) for details.
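For a quick taste of relative depth estimation, below is a minimal inference sketch using the Hugging Face `transformers` depth-estimation pipeline. The model id `LiheYoung/depth-anything-small-hf`, the example file names, and the output keys are assumptions based on the Hugging Face ports, not part of this repository's `run.py`.

```python
from PIL import Image
from transformers import pipeline

# Minimal relative-depth inference sketch (assumes the Hugging Face port of
# Depth Anything; the model id below is an assumption, swap in your checkpoint).
pipe = pipeline(task="depth-estimation", model="LiheYoung/depth-anything-small-hf")

image = Image.open("example.jpg")  # any RGB image
result = pipe(image)

# "depth" is a PIL image normalized for visualization;
# "predicted_depth" is the raw prediction tensor.
result["depth"].save("example_depth.png")
print(result["predicted_depth"].shape)
```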
## Performance
Here we compare our Depth Anything with the previously best MiDaS v3.1 BEiT