<div align="center">
<h1>Depth Anything V2</h1>

[**Lihe Yang**](https://liheyoung.github.io/)<sup>1</sup> · [**Bingyi Kang**](https://bingykang.github.io/)<sup>2†</sup> · [**Zilong Huang**](http://speedinghzl.github.io/)<sup>2</sup>
<br>
[**Zhen Zhao**](http://zhaozhen.me/) · [**Xiaogang Xu**](https://xiaogang00.github.io/) · [**Jiashi Feng**](https://sites.google.com/site/jshfeng/)<sup>2</sup> · [**Hengshuang Zhao**](https://hszhao.github.io/)<sup>1*</sup>

<sup>1</sup>HKU&emsp;&emsp;&emsp;<sup>2</sup>TikTok
<br>
†project lead&emsp;*corresponding author

<a href="https://arxiv.org/abs/2406.09414"><img src='https://img.shields.io/badge/arXiv-Depth Anything V2-red' alt='Paper PDF'></a>
<a href='https://depth-anything-v2.github.io'><img src='https://img.shields.io/badge/Project_Page-Depth Anything V2-green' alt='Project Page'></a>
<a href='https://huggingface.co/spaces/depth-anything/Depth-Anything-V2'><img src='https://img.shields.io/badge/%F0%9F%A4%97%20Hugging%20Face-Spaces-blue'></a>
<a href='https://huggingface.co/datasets/depth-anything/DA-2K'><img src='https://img.shields.io/badge/Benchmark-DA--2K-yellow' alt='Benchmark'></a>
</div>

This work presents Depth Anything V2. It significantly outperforms [V1](https://github.com/LiheYoung/Depth-Anything) in fine-grained details and robustness. Compared with SD-based models, it enjoys faster inference speed, fewer parameters, and higher depth accuracy.

![teaser](assets/teaser.png)

## News

- **2024-06-14:** Paper, project page, code, models, demo, and benchmark are all released.

## Pre-trained Models

We provide **four models** of varying scales for robust relative depth estimation:

| Model | Params | Checkpoint |
|:-|-:|:-:|
| Depth-Anything-V2-Small | 24.8M | [Download](https://huggingface.co/depth-anything/Depth-Anything-V2-Small/resolve/main/depth_anything_v2_vits.pth?download=true) |
| Depth-Anything-V2-Base | 97.5M | [Download](https://huggingface.co/depth-anything/Depth-Anything-V2-Base/resolve/main/depth_anything_v2_vitb.pth?download=true) |
| Depth-Anything-V2-Large | 335.3M | [Download](https://huggingface.co/depth-anything/Depth-Anything-V2-Large/resolve/main/depth_anything_v2_vitl.pth?download=true) |
| Depth-Anything-V2-Giant | 1.3B | Coming soon |

### Code snippet to use our models
```python
import cv2
import torch

from depth_anything_v2.dpt import DepthAnythingV2

# take depth-anything-v2-large as an example
model = DepthAnythingV2(encoder='vitl', features=256, out_channels=[256, 512, 1024, 1024])
model.load_state_dict(torch.load('checkpoints/depth_anything_v2_vitl.pth', map_location='cpu'))
model.eval()

raw_img = cv2.imread('your/image/path')
depth = model.infer_image(raw_img) # HxW raw depth map
```
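
The raw output is a relative depth map with an arbitrary scale, so it usually needs rescaling before it can be viewed or saved as an image. Below is a minimal sketch, assuming `depth` is a 2-D NumPy float array like the one returned above. The helper name `depth_to_uint8` is hypothetical and not part of this repository.

```python
import numpy as np


def depth_to_uint8(depth: np.ndarray) -> np.ndarray:
    """Min-max normalize a depth map to the 0-255 range (hypothetical helper)."""
    depth = depth.astype(np.float32)
    d_min, d_max = float(depth.min()), float(depth.max())
    if d_max - d_min < 1e-8:
        # Constant map: avoid division by zero and return all zeros.
        return np.zeros(depth.shape, dtype=np.uint8)
    scaled = (depth - d_min) / (d_max - d_min) * 255.0
    return scaled.astype(np.uint8)


# Stand-in for the model output; replace with `model.infer_image(raw_img)`.
fake_depth = np.array([[0.0, 1.0], [2.0, 4.0]])
gray = depth_to_uint8(fake_depth)
```

The resulting array can be written with `cv2.imwrite('depth.png', gray)` or colorized with `cv2.applyColorMap(gray, cv2.COLORMAP_INFERNO)`; the choice of colormap here is an assumption, not something this README prescribes.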

## Usage

### Installation

```bash
git clone https://github.com/DepthAnything/Depth-Anything-V2
cd Depth-Anything-V2
pip install -r requirements.txt
```

### Running

```bash
python run.py --encoder <vits | vitb | vitl | vitg> --img-path <path> --outdir <outdir> [--input-size <size>] [--pred-only] [--grayscale]
```
Options:
- `--img-path`: You can either 1) point it to an image directory storing all images of interest, 2) point it to a single image, or 3) point it to a text file storing all image paths.
- `--input-size` (optional): By default, we use input size `518` for model inference. **You can increase the size for even more fine-grained results.**
- `--pred-only` (optional): Only save the predicted depth map, without the raw image.
- `--grayscale` (optional): Save the grayscale depth map, without applying a color palette.

For example:
```bash
python run.py --encoder vitl --img-path assets/examples --outdir depth_vis
```
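
The three forms accepted by `--img-path` (directory, single image, text file of paths) could be resolved into a flat list of files roughly as follows. This is a sketch under assumptions, not necessarily how `run.py` does it: the function name `resolve_image_paths` is hypothetical, and the `.txt` suffix check and recursive directory walk are illustrative choices.

```python
import glob
import os
from typing import List


def resolve_image_paths(img_path: str) -> List[str]:
    """Expand an --img-path style argument into a list of file paths."""
    if os.path.isfile(img_path):
        if img_path.endswith(".txt"):
            # Text file: one image path per line; skip blank lines.
            with open(img_path, "r") as f:
                return [line.strip() for line in f if line.strip()]
        # Single image file.
        return [img_path]
    # Directory: collect every regular file beneath it, recursively.
    pattern = os.path.join(img_path, "**", "*")
    return sorted(p for p in glob.glob(pattern, recursive=True) if os.path.isfile(p))
```

Sorting the directory listing keeps the processing order deterministic across runs, which makes the output easier to compare.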

**If you want to use Depth Anything V2 on videos:**

```bash
python run_video.py --encoder vitl --video-path assets/examples_video --outdir video_depth_vis
```

*Please note that our larger models have better temporal consistency on videos.*

### Gradio demo

To use our Gradio demo locally:

```bash
python app.py
```

You can also try our [online demo](https://huggingface.co/spaces/Depth-Anything/Depth-Anything-V2).

**Note:** Compared to V1, we have made a minor modification to the DINOv2-DPT architecture (originating from this [issue](https://github.com/LiheYoung/Depth-Anything/issues/81)). In V1, we *unintentionally* used features from the last four layers of DINOv2 for decoding. In V2, we use [intermediate features](https://github.com/DepthAnything/Depth-Anything-V2/blob/2cbc36a8ce2cec41d38ee51153f112e87c8e42d8/depth_anything_v2/dpt.py#L164-L169) instead. Although this modification did not improve details or accuracy, we decided to follow this common practice.
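
To make the difference concrete, the sketch below contrasts the two ways of picking four encoder blocks to feed the decoder. The helper names and the even-spacing rule are illustrative assumptions. The indices the repository actually uses are defined in the linked `dpt.py` lines and may differ.

```python
from typing import List


def last_n_layers(depth: int, n: int = 4) -> List[int]:
    """Indices of the final n blocks (the V1-style selection described above)."""
    return list(range(depth - n, depth))


def evenly_spaced_layers(depth: int, n: int = 4) -> List[int]:
    """Indices of n blocks spread across the encoder, ending at the last block."""
    return [(i + 1) * depth // n - 1 for i in range(n)]


# Example for a 24-block transformer encoder (the block count is an assumption).
print(last_n_layers(24))         # [20, 21, 22, 23]
print(evenly_spaced_layers(24))  # [5, 11, 17, 23]
```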

## Fine-tuned to Metric Depth Estimation

Please refer to [metric depth estimation](./metric_depth).

## DA-2K Evaluation Benchmark

Please refer to [DA-2K benchmark](./DA-2K.md).

## LICENSE

The Depth-Anything-V2-Small model is under the Apache-2.0 license. The Depth-Anything-V2-Base/Large/Giant models are under the CC-BY-NC-4.0 license.

## Citation

If you find this project useful, please consider citing:

```bibtex
@article{depth_anything_v2,
  title={Depth Anything V2},
  author={Yang, Lihe and Kang, Bingyi and Huang, Zilong and Zhao, Zhen and Xu, Xiaogang and Feng, Jiashi and Zhao, Hengshuang},
  journal={arXiv:2406.09414},
  year={2024}
}

@inproceedings{depth_anything_v1,
  title={Depth Anything: Unleashing the Power of Large-Scale Unlabeled Data},
  author={Yang, Lihe and Kang, Bingyi and Huang, Zilong and Xu, Xiaogang and Feng, Jiashi and Zhao, Hengshuang},
  booktitle={CVPR},
  year={2024}
}
```