File size: 6,134 Bytes
658d54b 966e9a5 4150ea0 658d54b fb2a67c 658d54b 966e9a5 af15e10 4150ea0 af15e10 4150ea0 fb2a67c 4150ea0 |
1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60 61 62 63 64 65 66 67 68 69 70 71 72 73 74 75 76 77 78 79 80 81 82 83 84 85 86 87 88 89 90 91 92 93 94 95 96 97 98 99 100 101 102 103 104 105 106 107 108 109 110 111 112 113 114 115 116 117 118 119 120 121 122 123 124 125 126 127 128 129 130 131 132 133 134 135 136 137 138 139 140 141 142 143 144 145 146 147 148 149 150 151 152 153 154 155 |
---
base_model:
- tencent/DepthCrafter
- stabilityai/stable-video-diffusion-img2vid-xt
language:
- en
library_name: geometry-crafter
license: other
tags:
- video-to-3d
- point-cloud
---
## ___***GeometryCrafter: Consistent Geometry Estimation for Open-world Videos with Diffusion Priors***___
<div align="center">
_**[Tian-Xing Xu<sup>1</sup>](https://scholar.google.com/citations?user=zHp0rMIAAAAJ&hl=zh-CN),
[Xiangjun Gao<sup>3</sup>](https://scholar.google.com/citations?user=qgdesEcAAAAJ&hl=en),
[Wenbo Hu<sup>2 †</sup>](https://wbhu.github.io),
[Xiaoyu Li<sup>2</sup>](https://xiaoyu258.github.io),
[Song-Hai Zhang<sup>1 †</sup>](https://scholar.google.com/citations?user=AWtV-EQAAAAJ&hl=en),
[Ying Shan<sup>2</sup>](https://scholar.google.com/citations?user=4oXBp9UAAAAJ&hl=en)**_
<br>
<sup>1</sup>Tsinghua University
<sup>2</sup>ARC Lab, Tencent PCG
<sup>3</sup>HKUST

<a href='https://arxiv.org/abs/2504.01016'><img src='https://img.shields.io/badge/arXiv-2504.01016-b31b1b.svg'></a>
<a href='https://geometrycrafter.github.io'><img src='https://img.shields.io/badge/Project-Page-Green'></a>
<a href='https://huggingface.co/spaces/TencentARC/GeometryCrafter'><img src='https://img.shields.io/badge/%F0%9F%A4%97%20Hugging%20Face-Demo-blue'></a>
</div>
## π Notice
**GeometryCrafter is still under active development!**
We recommend that everyone use English to communicate on issues, as this helps developers from around the world discuss, share experiences, and answer questions together. For further implementation details, please contact `[email protected]`. For business licensing and other related inquiries, don't hesitate to contact `[email protected]`.
If you find GeometryCrafter useful, **please help β this repo**, which is important to Open-Source projects. Thanks!
## π Introduction
We present GeometryCrafter, a novel approach that estimates temporally consistent, high-quality point maps from open-world videos, facilitating downstream applications such as 3D/4D reconstruction and depth-based video editing or generation. This model is described in detail in the paper [GeometryCrafter: Consistent Geometry Estimation for Open-world Videos with Diffusion Priors](https://arxiv.org/abs/2504.01016).
Release Notes:
- `[01/04/2025]` π₯π₯π₯**GeometryCrafter** is released now, have fun!
## π Quick Start
### Installation
1. Clone this repo:
```bash
git clone --recursive https://github.com/TencentARC/GeometryCrafter
```
2. Install dependencies (please refer to [requirements.txt](requirements.txt)):
```bash
pip install -r requirements.txt
```
### Inference
Run inference code on our provided demo videos at 1.27FPS, which requires a GPU with ~40GB memory for 110 frames with 1024x576 resolution:
```bash
python run.py \
--video_path examples/video1.mp4 \
--save_folder workspace/examples_output \
--height 576 --width 1024
# resize the input video to the target resolution for processing, which should be divided by 64
# the output point maps will be restored to the original resolution before saving
# you can use --downsample_ratio to downsample the input video or reduce --decode_chunk_size to save the memory usage
```
Run inference code with our deterministic variant at 1.50 FPS
```bash
python run.py \
--video_path examples/video1.mp4 \
--save_folder workspace/examples_output \
--height 576 --width 1024 \
--model_type determ
```
Run low-resolution processing at 2.49 FPS, which requires a GPU with ~22GB memory:
```bash
python run.py \
--video_path examples/video1.mp4 \
--save_folder workspace/examples_output \
--height 384 --width 640
```
### Visualization
Visualize the predicted point maps with `Viser`
```bash
python visualize/vis_point_maps.py \
--video_path examples/video1.mp4 \
--data_path workspace/examples_output/video1.npz
```
## π€ Gradio Demo
- Online demo: [**GeometryCrafter**](https://huggingface.co/spaces/TencentARC/GeometryCrafter)
- Local demo:
```bash
gradio app.py
```
## π Dataset Evaluation
Please check the `evaluation` folder.
- To create the dataset we use in the paper, you need to run `evaluation/preprocess/gen_{dataset_name}.py`.
- You need to change `DATA_DIR` and `OUTPUT_DIR` first accordint to your working environment.
- Then you will get the preprocessed datasets containing extracted RGB video and point map npz files. We also provide the catelog of these files.
- Inference for all datasets scripts:
```bash
bash evaluation/run_batch.sh
```
(Remember to replace the `data_root_dir` and `save_root_dir` with your path.)
- Evaluation for all datasets scripts (scale-invariant point map estimation):
```bash
bash evaluation/eval.sh
```
(Remember to replace the `pred_data_root_dir` and `gt_data_root_dir` with your path.)
- Evaluation for all datasets scripts (affine-invariant depth estimation):
```bash
bash evaluation/eval_depth.sh
```
(Remember to replace the `pred_data_root_dir` and `gt_data_root_dir` with your path.)
- We also provide the comparison results of MoGe and the deterministic variant of our method. You can evaluate these methods under the same protocol by uncomment the corresponding lines in `evaluation/run.sh` `evaluation/eval.sh` `evaluation/run_batch.sh` and `evaluation/eval_depth.sh`.
## π€ Contributing
- Welcome to open issues and pull requests.
- Welcome to optimize the inference speed and memory usage, e.g., through model quantization, distillation, or other acceleration techniques.
## π Citation
If you find this work helpful, please consider citing:
```bibtex
@misc{xu2025geometrycrafterconsistentgeometryestimation,
title={GeometryCrafter: Consistent Geometry Estimation for Open-world Videos with Diffusion Priors},
author={Tian-Xing Xu and Xiangjun Gao and Wenbo Hu and Xiaoyu Li and Song-Hai Zhang and Ying Shan},
year={2025},
eprint={2504.01016},
archivePrefix={arXiv},
primaryClass={cs.GR},
url={https://arxiv.org/abs/2504.01016},
}
``` |