Huiwenshi committed on
Commit
b155b2e
1 Parent(s): cb4a765

Upload folder using huggingface_hub

Files changed (43)
  1. .ipynb_checkpoints/README-checkpoint.md +231 -0
  2. .ipynb_checkpoints/app-checkpoint.py +343 -0
  3. .ipynb_checkpoints/app_hg-checkpoint.py +384 -0
  4. .ipynb_checkpoints/main-checkpoint.py +164 -0
  5. .ipynb_checkpoints/requirements-checkpoint.txt +24 -0
  6. README.md +0 -11
  7. app.py +31 -7
  8. infer/.ipynb_checkpoints/__init__-checkpoint.py +32 -0
  9. infer/.ipynb_checkpoints/gif_render-checkpoint.py +79 -0
  10. infer/.ipynb_checkpoints/image_to_views-checkpoint.py +126 -0
  11. infer/.ipynb_checkpoints/removebg-checkpoint.py +101 -0
  12. infer/.ipynb_checkpoints/text_to_image-checkpoint.py +105 -0
  13. infer/.ipynb_checkpoints/utils-checkpoint.py +87 -0
  14. infer/.ipynb_checkpoints/views_to_mesh-checkpoint.py +154 -0
  15. infer/__init__.py +4 -2
  16. infer/__pycache__/__init__.cpython-38.pyc +0 -0
  17. infer/__pycache__/gif_render.cpython-38.pyc +0 -0
  18. infer/__pycache__/image_to_views.cpython-38.pyc +0 -0
  19. infer/__pycache__/removebg.cpython-38.pyc +0 -0
  20. infer/__pycache__/text_to_image.cpython-38.pyc +0 -0
  21. infer/__pycache__/utils.cpython-38.pyc +0 -0
  22. infer/__pycache__/views_to_mesh.cpython-38.pyc +0 -0
  23. infer/gif_render.py +4 -2
  24. infer/image_to_views.py +4 -2
  25. infer/text_to_image.py +4 -2
  26. infer/utils.py +4 -2
  27. infer/views_to_mesh.py +4 -2
  28. main.py +4 -2
  29. mvd/.ipynb_checkpoints/hunyuan3d_mvd_lite_pipeline-checkpoint.py +392 -0
  30. mvd/.ipynb_checkpoints/hunyuan3d_mvd_std_pipeline-checkpoint.py +473 -0
  31. mvd/.ipynb_checkpoints/utils-checkpoint.py +87 -0
  32. mvd/__pycache__/hunyuan3d_mvd_lite_pipeline.cpython-38.pyc +0 -0
  33. mvd/__pycache__/hunyuan3d_mvd_std_pipeline.cpython-38.pyc +0 -0
  34. mvd/hunyuan3d_mvd_lite_pipeline.py +18 -17
  35. mvd/hunyuan3d_mvd_std_pipeline.py +4 -2
  36. mvd/utils.py +4 -2
  37. svrm/.ipynb_checkpoints/predictor-checkpoint.py +152 -0
  38. svrm/__pycache__/predictor.cpython-38.pyc +0 -0
  39. svrm/ldm/.ipynb_checkpoints/util-checkpoint.py +252 -0
  40. svrm/ldm/models/.ipynb_checkpoints/svrm-checkpoint.py +281 -0
  41. svrm/ldm/models/__pycache__/svrm.cpython-38.pyc +0 -0
  42. svrm/ldm/models/svrm.py +9 -3
  43. svrm/predictor.py +4 -2
.ipynb_checkpoints/README-checkpoint.md ADDED
@@ -0,0 +1,231 @@
1
+ <!-- ## **Hunyuan3D-1.0** -->
2
+
3
+ <p align="center">
4
+ <img src="./assets/logo.png" height=200>
5
+ </p>
6
+
7
+ # Tencent Hunyuan3D-1.0: A Unified Framework for Text-to-3D and Image-to-3D Generation
8
+
9
+ <div align="center">
10
+ <a href="https://github.com/tencent/Hunyuan3D-1"><img src="https://img.shields.io/static/v1?label=Code&message=Github&color=blue&logo=github-pages"></a> &ensp;
11
+ <a href="https://3d.hunyuan.tencent.com"><img src="https://img.shields.io/static/v1?label=Homepage&message=Tencent Hunyuan3D&color=blue&logo=github-pages"></a> &ensp;
12
+ <a href="https://arxiv.org/pdf/2411.02293"><img src="https://img.shields.io/static/v1?label=Tech Report&message=Arxiv&color=red&logo=arxiv"></a> &ensp;
13
+ <a href="https://huggingface.co/Tencent/Hunyuan3D-1"><img src="https://img.shields.io/static/v1?label=Checkpoints&message=HuggingFace&color=yellow"></a> &ensp;
14
+ <a href="https://huggingface.co/spaces/Tencent/Hunyuan3D-1"><img src="https://img.shields.io/static/v1?label=Demo&message=HuggingFace&color=yellow"></a> &ensp;
15
+ </div>
16
+
17
+
18
+ ## 🔥🔥🔥 News!!
19
+
20
+ * Nov 5, 2024: 💬 The demo now supports image_to_3d generation. Please check the [script](#using-gradio) below.
21
+ * Nov 5, 2024: 💬 The demo now supports text_to_3d generation. Please check the [script](#using-gradio) below.
22
+
23
+
24
+ ## 📑 Open-source Plan
25
+
26
+ - [x] Inference
27
+ - [x] Checkpoints
28
+ - [ ] Baking related
29
+ - [ ] Training
30
+ - [ ] ComfyUI
31
+ - [ ] Distillation Version
32
+ - [ ] TensorRT Version
33
+
34
+
35
+
36
+ ## **Abstract**
37
+ <p align="center">
38
+ <img src="./assets/teaser.png" height=450>
39
+ </p>
40
+
41
+ While 3D generative models have greatly improved artists' workflows, existing diffusion models for 3D generation suffer from slow generation and poor generalization. To address these issues, we propose a two-stage approach named Hunyuan3D-1.0, available in a lite version and a standard version, both of which support text- and image-conditioned generation.
42
+
43
+ In the first stage, we employ a multi-view diffusion model that efficiently generates multi-view RGB images in approximately 4 seconds. These multi-view images capture rich details of the 3D asset from different viewpoints, relaxing the task from single-view to multi-view reconstruction. In the second stage, we introduce a feed-forward reconstruction model that rapidly and faithfully reconstructs the 3D asset from the generated multi-view images in approximately 7 seconds. The reconstruction network learns to handle the noise and inconsistency introduced by the multi-view diffusion and leverages the available information from the condition image to efficiently recover the 3D structure.
44
+
45
+ Our framework incorporates the text-to-image model Hunyuan-DiT, making it a unified framework that supports both text- and image-conditioned 3D generation. Our standard version has 3x more parameters than our lite version and other existing models. Hunyuan3D-1.0 achieves an impressive balance between speed and quality, significantly reducing generation time while maintaining the quality and diversity of the produced assets.
46
+
47
+
48
+ ## 🎉 **Hunyuan3D-1 Architecture**
49
+
50
+ <p align="center">
51
+ <img src="./assets/overview_3.png" height=400>
52
+ </p>
53
+
54
+
55
+ ## 📈 Comparisons
56
+
57
+ We evaluated Hunyuan3D-1.0 against other open-source 3D generation methods; Hunyuan3D-1.0 received the highest user preference across 5 metrics. Details are shown in the figure on the lower left.
58
+
59
+ The lite model takes around 10 seconds to produce a 3D mesh from a single image on an NVIDIA A100 GPU, while the standard model takes roughly 25 seconds. The plot laid out in the lower right demonstrates that Hunyuan3D-1.0 achieves an optimal balance between quality and efficiency.
60
+
61
+ <p align="center">
62
+ <img src="./assets/radar.png" height=300>
63
+ <img src="./assets/runtime.png" height=300>
64
+ </p>
65
+
66
+ ## Get Started
67
+
68
+ #### Begin by cloning the repository:
69
+
70
+ ```shell
71
+ git clone https://github.com/tencent/Hunyuan3D-1
72
+ cd Hunyuan3D-1
73
+ ```
74
+
75
+ #### Installation Guide for Linux
76
+
77
+ We provide an env_install.sh script for setting up the environment.
78
+
79
+ ```
80
+ # step 1, create conda env
81
+ conda create -n hunyuan3d-1 python=3.10  # python 3.9, 3.10, 3.11, or 3.12 all work
82
+ conda activate hunyuan3d-1
83
+
84
+ # step 2. install torch-related packages
85
+ which pip # check pip corresponds to python
86
+
87
+ # modify the cuda version according to your machine (recommended)
88
+ pip install torch torchvision --index-url https://download.pytorch.org/whl/cu121
89
+
90
+ # step 3. install other packages
91
+ bash env_install.sh
92
+ ```
93
+ <details>
94
+ <summary>💡Other tips for environment installation</summary>
95
+
96
+ Optionally, you can install xformers or flash_attn to accelerate computation:
97
+
98
+ ```
99
+ pip install xformers --index-url https://download.pytorch.org/whl/cu121
100
+ ```
101
+ ```
102
+ pip install flash_attn
103
+ ```
104
+
105
+ Most environment errors are caused by a mismatch between your machine and the package versions. You can try manually specifying versions, as in the following known-good configuration:
106
+ ```
107
+ # python3.9
108
+ pip install torch==2.0.1 torchvision==0.15.2 --index-url https://download.pytorch.org/whl/cu118
109
+ ```
110
+
111
+ When installing pytorch3d, the GCC version should preferably be greater than 9, and the GPU driver should not be too old.
112
+
113
+ </details>
114
+
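+ After installation, an optional sanity check (not part of the original guide) is to confirm that the installed PyTorch build can see your GPU:
+
+ ```shell
+ python3 -c "import torch; print(torch.__version__, torch.cuda.is_available())"
+ ```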
115
+ #### Download Pretrained Models
116
+
117
+ The models are available at [https://huggingface.co/tencent/Hunyuan3D-1](https://huggingface.co/tencent/Hunyuan3D-1):
118
+
119
+ + `Hunyuan3D-1/lite`, lite model for multi-view generation.
120
+ + `Hunyuan3D-1/std`, standard model for multi-view generation.
121
+ + `Hunyuan3D-1/svrm`, sparse-view reconstruction model.
122
+
123
+
124
+ To download the model, first install the huggingface-cli. (Detailed instructions are available [here](https://huggingface.co/docs/huggingface_hub/guides/cli).)
125
+
126
+ ```shell
127
+ python3 -m pip install "huggingface_hub[cli]"
128
+ ```
129
+
130
+ Then download the model using the following commands:
131
+
132
+ ```shell
133
+ mkdir weights
134
+ huggingface-cli download tencent/Hunyuan3D-1 --local-dir ./weights
135
+
136
+ mkdir weights/hunyuanDiT
137
+ huggingface-cli download Tencent-Hunyuan/HunyuanDiT-v1.1-Diffusers-Distilled --local-dir ./weights/hunyuanDiT
138
+ ```
139
+
140
+ #### Inference
141
+ For text-to-3D generation, we support both Chinese and English prompts; you can use the following command for inference.
142
+ ```bash
143
+ python3 main.py \
144
+ --text_prompt "a lovely rabbit" \
145
+ --save_folder ./outputs/test/ \
146
+ --max_faces_num 90000 \
147
+ --do_texture_mapping \
148
+ --do_render
149
+ ```
150
+
151
+ For image-to-3D generation, you can use the following command for inference.
152
+ ```bash
153
+ python3 main.py \
154
+ --image_prompt "/path/to/your/image" \
155
+ --save_folder ./outputs/test/ \
156
+ --max_faces_num 90000 \
157
+ --do_texture_mapping \
158
+ --do_render
159
+ ```
160
+ We list some additional useful configuration options:
161
+
162
+ | Argument | Default | Description |
163
+ |:------------------:|:---------:|:---------------------------------------------------:|
164
+ |`--text_prompt` | None |The text prompt for 3D generation |
165
+ |`--image_prompt` | None |The image prompt for 3D generation |
166
+ |`--t2i_seed` | 0 |The random seed for generating images |
167
+ |`--t2i_steps` | 25 |The number of sampling steps for text-to-image generation |
168
+ |`--gen_seed` | 0 |The random seed for 3D generation |
169
+ |`--gen_steps` | 50 |The number of sampling steps for 3D generation |
170
+ |`--max_faces_num` | 90000 |The maximum number of faces of the generated 3D mesh |
171
+ |`--save_memory` | False |Automatically move modules to the CPU to save GPU memory |
172
+ |`--do_texture_mapping` | False |Change vertex shading to texture shading |
173
+ |`--do_render` | False |Render a GIF of the generated mesh |
174
+
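+ For example, an illustrative command that combines several of these options (the seed value is arbitrary; the image path is one of the demo images shipped with the repo):
+
+ ```bash
+ python3 main.py \
+     --image_prompt ./demos/example_000.png \
+     --save_folder ./outputs/test/ \
+     --gen_seed 42 \
+     --gen_steps 50 \
+     --max_faces_num 90000 \
+     --save_memory \
+     --do_texture_mapping \
+     --do_render
+ ```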
175
+
176
+ We have also prepared scripts with different configurations for reference:
177
+ - Inference with the Std pipeline requires 30 GB VRAM (24 GB with --save_memory).
178
+ - Inference with the Lite pipeline requires 22 GB VRAM (18 GB with --save_memory).
179
+ - Note: --save_memory will increase inference time.
180
+
181
+ ```bash
182
+ bash scripts/text_to_3d_std.sh
183
+ bash scripts/text_to_3d_lite.sh
184
+ bash scripts/image_to_3d_std.sh
185
+ bash scripts/image_to_3d_lite.sh
186
+ ```
187
+
188
+ If your GPU has 16 GB of memory, you can try running the pipeline modules separately:
189
+ ```bash
190
+ bash scripts/text_to_3d_std_separately.sh 'a lovely rabbit' ./outputs/test # >= 16G
191
+ bash scripts/text_to_3d_lite_separately.sh 'a lovely rabbit' ./outputs/test # >= 14G
192
+ bash scripts/image_to_3d_std_separately.sh ./demos/example_000.png ./outputs/test # >= 16G
193
+ bash scripts/image_to_3d_lite_separately.sh ./demos/example_000.png ./outputs/test # >= 10G
194
+ ```
195
+
196
+ #### Using Gradio
197
+
198
+ We have prepared two versions of the multi-view generation demo, std and lite.
199
+
200
+ ```shell
201
+ # std
202
+ python3 app.py
203
+ python3 app.py --save_memory
204
+
205
+ # lite
206
+ python3 app.py --use_lite
207
+ python3 app.py --use_lite --save_memory
208
+ ```
209
+
210
+ The demo can then be accessed at http://0.0.0.0:8080. Note that 0.0.0.0 should be replaced with your server's IP address (X.X.X.X).
211
+
212
+ ## Camera Parameters
213
+
214
+ Output views are a fixed set of camera poses:
215
+
216
+ + Azimuth (relative to input view): `+0, +60, +120, +180, +240, +300`.
217
+
218
+
219
+ ## Citation
220
+
221
+ If you find this repository helpful, please cite our report:
222
+ ```bibtex
223
+ @misc{yang2024tencent,
224
+ title={Tencent Hunyuan3D-1.0: A Unified Framework for Text-to-3D and Image-to-3D Generation},
225
+ author={Xianghui Yang and Huiwen Shi and Bowen Zhang and Fan Yang and Jiacheng Wang and Hongxu Zhao and Xinhai Liu and Xinzhou Wang and Qingxiang Lin and Jiaao Yu and Lifu Wang and Zhuo Chen and Sicong Liu and Yuhong Liu and Yong Yang and Di Wang and Jie Jiang and Chunchao Guo},
226
+ year={2024},
227
+ eprint={2411.02293},
228
+ archivePrefix={arXiv},
229
+ primaryClass={cs.CV}
230
+ }
231
+ ```
.ipynb_checkpoints/app-checkpoint.py ADDED
@@ -0,0 +1,343 @@
1
+ # Open Source Model Licensed under the Apache License Version 2.0
2
+ # and Other Licenses of the Third-Party Components therein:
3
+ # The below Model in this distribution may have been modified by THL A29 Limited
4
+ # ("Tencent Modifications"). All Tencent Modifications are Copyright (C) 2024 THL A29 Limited.
5
+
6
+ # Copyright (C) 2024 THL A29 Limited, a Tencent company. All rights reserved.
7
+ # The below software and/or models in this distribution may have been
8
+ # modified by THL A29 Limited ("Tencent Modifications").
9
+ # All Tencent Modifications are Copyright (C) THL A29 Limited.
10
+
11
+ # Hunyuan 3D is licensed under the TENCENT HUNYUAN NON-COMMERCIAL LICENSE AGREEMENT
12
+ # except for the third-party components listed below.
13
+ # Hunyuan 3D does not impose any additional limitations beyond what is outlined
14
+ # in the respective licenses of these third-party components.
15
+ # Users must comply with all terms and conditions of original licenses of these third-party
16
+ # components and must ensure that the usage of the third party components adheres to
17
+ # all relevant laws and regulations.
18
+
19
+ # For avoidance of doubts, Hunyuan 3D means the large language models and
20
+ # their software and algorithms, including trained model weights, parameters (including
21
+ # optimizer states), machine-learning model code, inference-enabling code, training-enabling code,
22
+ # fine-tuning enabling code and other elements of the foregoing made publicly available
23
+ # by Tencent in accordance with TENCENT HUNYUAN COMMUNITY LICENSE AGREEMENT.
24
+
25
+ import os
26
+ import warnings
27
+ import argparse
28
+ import gradio as gr
29
+ from glob import glob
30
+ import shutil
31
+ import torch
32
+ import numpy as np
33
+ from PIL import Image
34
+ from einops import rearrange
35
+
36
+ from infer import seed_everything, save_gif
37
+ from infer import Text2Image, Removebg, Image2Views, Views2Mesh, GifRenderer
38
+
39
+ warnings.simplefilter('ignore', category=UserWarning)
40
+ warnings.simplefilter('ignore', category=FutureWarning)
41
+ warnings.simplefilter('ignore', category=DeprecationWarning)
42
+
43
+ parser = argparse.ArgumentParser()
44
+ parser.add_argument("--use_lite", default=False, action="store_true")
45
+ parser.add_argument("--mv23d_cfg_path", default="./svrm/configs/svrm.yaml", type=str)
46
+ parser.add_argument("--mv23d_ckt_path", default="weights/svrm/svrm.safetensors", type=str)
47
+ parser.add_argument("--text2image_path", default="weights/hunyuanDiT", type=str)
48
+ parser.add_argument("--save_memory", default=False, action="store_true")
49
+ parser.add_argument("--device", default="cuda:0", type=str)
50
+ args = parser.parse_args()
51
+
52
+ ################################################################
53
+ # initial setting
54
+ ################################################################
55
+
56
+ CONST_PORT = 8080
57
+ CONST_MAX_QUEUE = 1
58
+ CONST_SERVER = '0.0.0.0'
59
+
60
+ CONST_HEADER = '''
61
+ <h2><b>Official 🤗 Gradio Demo</b></h2><h2><a href='https://github.com/tencent/Hunyuan3D-1' target='_blank'><b>Hunyuan3D-1.0: A Unified Framework for Text-to-3D and Image-to-3D
62
+ Generation</b></a></h2>
63
+ Code: <a href='https://github.com/tencent/Hunyuan3D-1' target='_blank'>GitHub</a>. Technical report: <a href='https://arxiv.org/abs/placeholder' target='_blank'>ArXiv</a>.
64
+
65
+ ❗️❗️❗️**Important Notes:**
66
+ - By default, our demo can export a .obj mesh with vertex colors or a .glb mesh.
67
+ - If you select "texture mapping," it will export a .obj mesh with a texture map or a .glb mesh.
68
+ - If you select "render GIF," it will export a GIF image rendering of the .glb file.
69
+ - If the result is unsatisfactory, please try a different seed value (Default: 0).
70
+ '''
71
+
72
+ CONST_CITATION = r"""
73
+ If HunYuan3D-1 is helpful, please help to ⭐ the <a href='https://github.com/tencent/Hunyuan3D-1' target='_blank'>Github Repo</a>. Thanks! [![GitHub Stars](https://img.shields.io/github/stars/tencent/Hunyuan3D-1?style=social)](https://github.com/tencent/Hunyuan3D-1)
74
+ ---
75
+ 📝 **Citation**
76
+ If you find our work useful for your research or applications, please cite using this bibtex:
77
+ ```bibtex
78
+ @misc{yang2024tencent,
79
+ title={Tencent Hunyuan3D-1.0: A Unified Framework for Text-to-3D and Image-to-3D Generation},
80
+ author={Xianghui Yang and Huiwen Shi and Bowen Zhang and Fan Yang and Jiacheng Wang and Hongxu Zhao and Xinhai Liu and Xinzhou Wang and Qingxiang Lin and Jiaao Yu and Lifu Wang and Zhuo Chen and Sicong Liu and Yuhong Liu and Yong Yang and Di Wang and Jie Jiang and Chunchao Guo},
81
+ year={2024},
82
+ eprint={2411.02293},
83
+ archivePrefix={arXiv},
84
+ primaryClass={cs.CV}
85
+ }
86
+ ```
87
+ """
88
+
89
+ ################################################################
90
+ # prepare text examples and image examples
91
+ ################################################################
92
+
93
+ def get_example_img_list():
94
+ print('Loading example img list ...')
95
+ return sorted(glob('./demos/example_*.png'))
96
+
97
+ def get_example_txt_list():
98
+ print('Loading example txt list ...')
99
+ txt_list = list()
100
+ for line in open('./demos/example_list.txt'):
101
+ txt_list.append(line.strip())
102
+ return txt_list
103
+
104
+ example_is = get_example_img_list()
105
+ example_ts = get_example_txt_list()
106
+
107
+ ################################################################
108
+ # initial models
109
+ ################################################################
110
+
111
+ worker_xbg = Removebg()
112
+ print(f"loading {args.text2image_path}")
113
+ worker_t2i = Text2Image(
114
+ pretrain = args.text2image_path,
115
+ device = args.device,
116
+ save_memory = args.save_memory
117
+ )
118
+ worker_i2v = Image2Views(
119
+ use_lite = args.use_lite,
120
+ device = args.device,
121
+ save_memory = args.save_memory
122
+ )
123
+ worker_v23 = Views2Mesh(
124
+ args.mv23d_cfg_path,
125
+ args.mv23d_ckt_path,
126
+ use_lite = args.use_lite,
127
+ device = args.device,
128
+ save_memory = args.save_memory
129
+ )
130
+ worker_gif = GifRenderer(args.device)
131
+
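+ # stage_0_t2i cycles through at most 30 numbered folders under ./outputs/app_output:
+ # it reuses the smallest free id and pre-clears the next slot before writing new results.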
132
+ def stage_0_t2i(text, image, seed, step):
133
+ os.makedirs('./outputs/app_output', exist_ok=True)
134
+ exists = set(int(_) for _ in os.listdir('./outputs/app_output') if not _.startswith("."))
135
+ if len(exists) == 30: shutil.rmtree(f"./outputs/app_output/0");cur_id = 0
136
+ else: cur_id = min(set(range(30)) - exists)
137
+ if os.path.exists(f"./outputs/app_output/{(cur_id + 1) % 30}"):
138
+ shutil.rmtree(f"./outputs/app_output/{(cur_id + 1) % 30}")
139
+ save_folder = f'./outputs/app_output/{cur_id}'
140
+ os.makedirs(save_folder, exist_ok=True)
141
+
142
+ dst = save_folder + '/img.png'
143
+
144
+ if not text:
145
+ if image is None:
146
+ return dst, save_folder
147
+ raise gr.Error("Upload image or provide text ...")
148
+ image.save(dst)
149
+ return dst, save_folder
150
+
151
+ image = worker_t2i(text, seed, step)
152
+ image.save(dst)
153
+ dst = worker_xbg(image, save_folder)
154
+ return dst, save_folder
155
+
156
+ def stage_1_xbg(image, save_folder):
157
+ if isinstance(image, str):
158
+ image = Image.open(image)
159
+ dst = save_folder + '/img_nobg.png'
160
+ rgba = worker_xbg(image)
161
+ rgba.save(dst)
162
+ return dst
163
+
164
+ def stage_2_i2v(image, seed, step, save_folder):
165
+ if isinstance(image, str):
166
+ image = Image.open(image)
167
+ gif_dst = save_folder + '/views.gif'
168
+ res_img, pils = worker_i2v(image, seed, step)
169
+ save_gif(pils, gif_dst)
170
+ views_img, cond_img = res_img[0], res_img[1]
171
+ img_array = np.asarray(views_img, dtype=np.uint8)
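+ # Split the 3x2 grid of generated views into tiles, reorder them with worker_i2v.order,
+ # and stitch them back into a 2x3 grid for display.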
172
+ show_img = rearrange(img_array, '(n h) (m w) c -> (n m) h w c', n=3, m=2)
173
+ show_img = show_img[worker_i2v.order, ...]
174
+ show_img = rearrange(show_img, '(n m) h w c -> (n h) (m w) c', n=2, m=3)
175
+ show_img = Image.fromarray(show_img)
176
+ return views_img, cond_img, show_img
177
+
178
+ def stage_3_v23(
179
+ views_pil,
180
+ cond_pil,
181
+ seed,
182
+ save_folder,
183
+ target_face_count = 30000,
184
+ do_texture_mapping = True,
185
+ do_render =True
186
+ ):
187
+ do_texture_mapping = do_texture_mapping or do_render
188
+ obj_dst = save_folder + '/mesh_with_colors.obj'
189
+ glb_dst = save_folder + '/mesh.glb'
190
+ worker_v23(
191
+ views_pil,
192
+ cond_pil,
193
+ seed = seed,
194
+ save_folder = save_folder,
195
+ target_face_count = target_face_count,
196
+ do_texture_mapping = do_texture_mapping
197
+ )
198
+ return obj_dst, glb_dst
199
+
200
+ def stage_4_gif(obj_dst, save_folder, do_render_gif=True):
201
+ if not do_render_gif: return None
202
+ gif_dst = save_folder + '/output.gif'
203
+ worker_gif(
204
+ save_folder + '/mesh.obj',
205
+ gif_dst_path = gif_dst
206
+ )
207
+ return gif_dst
208
+ # ===============================================================
209
+ # gradio display
210
+ # ===============================================================
211
+ with gr.Blocks() as demo:
212
+ gr.Markdown(CONST_HEADER)
213
+ with gr.Row(variant="panel"):
214
+ with gr.Column(scale=2):
215
+ with gr.Tab("Text to 3D"):
216
+ with gr.Column():
217
+ text = gr.TextArea('一只黑白相间的熊猫在白色背景上居中坐着,呈现出卡通风格和可爱氛围。', lines=1, max_lines=10, label='Input text')
218
+ with gr.Row():
219
+ textgen_seed = gr.Number(value=0, label="T2I seed", precision=0)
220
+ textgen_step = gr.Number(value=25, label="T2I step", precision=0)
221
+ textgen_SEED = gr.Number(value=0, label="Gen seed", precision=0)
222
+ textgen_STEP = gr.Number(value=50, label="Gen step", precision=0)
223
+ textgen_max_faces = gr.Number(value=90000, label="max number of faces", precision=0)
224
+
225
+ with gr.Row():
226
+ textgen_do_texture_mapping = gr.Checkbox(label="texture mapping", value=False, interactive=True)
227
+ textgen_do_render_gif = gr.Checkbox(label="Render gif", value=False, interactive=True)
228
+ textgen_submit = gr.Button("Generate", variant="primary")
229
+
230
+ with gr.Row():
231
+ gr.Examples(examples=example_ts, inputs=[text], label="Txt examples", examples_per_page=10)
232
+
233
+ with gr.Tab("Image to 3D"):
234
+ with gr.Column():
235
+ input_image = gr.Image(label="Input image",
236
+ width=256, height=256, type="pil",
237
+ image_mode="RGBA", sources="upload",
238
+ interactive=True)
239
+ with gr.Row():
240
+ imggen_SEED = gr.Number(value=0, label="Gen seed", precision=0)
241
+ imggen_STEP = gr.Number(value=50, label="Gen step", precision=0)
242
+ imggen_max_faces = gr.Number(value=90000, label="max number of faces", precision=0)
243
+
244
+ with gr.Row():
245
+ imggen_do_texture_mapping = gr.Checkbox(label="texture mapping", value=False, interactive=True)
246
+ imggen_do_render_gif = gr.Checkbox(label="Render gif", value=False, interactive=True)
247
+ imggen_submit = gr.Button("Generate", variant="primary")
248
+ with gr.Row():
249
+ gr.Examples(
250
+ examples=example_is,
251
+ inputs=[input_image],
252
+ label="Img examples",
253
+ examples_per_page=10
254
+ )
255
+
256
+ with gr.Column(scale=3):
257
+ with gr.Row():
258
+ with gr.Column(scale=2):
259
+ rem_bg_image = gr.Image(label="No backgraound image", type="pil",
260
+ image_mode="RGBA", interactive=False)
261
+ with gr.Column(scale=3):
262
+ result_image = gr.Image(label="Multi views", type="pil", interactive=False)
263
+
264
+ with gr.Row():
265
+ result_3dobj = gr.Model3D(
266
+ clear_color=[0.0, 0.0, 0.0, 0.0],
267
+ label="Output Obj",
268
+ show_label=True,
269
+ visible=True,
270
+ camera_position=[90, 90, None],
271
+ interactive=False
272
+ )
273
+
274
+ result_3dglb = gr.Model3D(
275
+ clear_color=[0.0, 0.0, 0.0, 0.0],
276
+ label="Output Glb",
277
+ show_label=True,
278
+ visible=True,
279
+ camera_position=[90, 90, None],
280
+ interactive=False
281
+ )
282
+ result_gif = gr.Image(label="Rendered GIF", interactive=False)
283
+
284
+ with gr.Row():
285
+ gr.Markdown("""
286
+ We recommend downloading and opening Glb with 3D software, such as Blender, MeshLab, etc.
287
+
288
+ Limited by gradio, Obj file here only be shown as vertex shading, but Glb can be texture shading.
289
+ """)
290
+
291
+ #===============================================================
292
+ # gradio running code
293
+ #===============================================================
294
+
295
+ none = gr.State(None)
296
+ save_folder = gr.State()
297
+ cond_image = gr.State()
298
+ views_image = gr.State()
299
+ text_image = gr.State()
300
+
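+ # Text-to-3D chain: text-to-image (with background removal) -> multi-view generation -> mesh reconstruction -> optional GIF render.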
301
+ textgen_submit.click(
302
+ fn=stage_0_t2i, inputs=[text, none, textgen_seed, textgen_step],
303
+ outputs=[rem_bg_image, save_folder],
304
+ ).success(
305
+ fn=stage_2_i2v, inputs=[rem_bg_image, textgen_SEED, textgen_STEP, save_folder],
306
+ outputs=[views_image, cond_image, result_image],
307
+ ).success(
308
+ fn=stage_3_v23, inputs=[views_image, cond_image, textgen_SEED, save_folder,
309
+ textgen_max_faces, textgen_do_texture_mapping,
310
+ textgen_do_render_gif],
311
+ outputs=[result_3dobj, result_3dglb],
312
+ ).success(
313
+ fn=stage_4_gif, inputs=[result_3dglb, save_folder, textgen_do_render_gif],
314
+ outputs=[result_gif],
315
+ ).success(lambda: print('Text_to_3D Done ...'))
316
+
317
+ imggen_submit.click(
318
+ fn=stage_0_t2i, inputs=[none, input_image, textgen_seed, textgen_step],
319
+ outputs=[text_image, save_folder],
320
+ ).success(
321
+ fn=stage_1_xbg, inputs=[text_image, save_folder],
322
+ outputs=[rem_bg_image],
323
+ ).success(
324
+ fn=stage_2_i2v, inputs=[rem_bg_image, imggen_SEED, imggen_STEP, save_folder],
325
+ outputs=[views_image, cond_image, result_image],
326
+ ).success(
327
+ fn=stage_3_v23, inputs=[views_image, cond_image, imggen_SEED, save_folder,
328
+ imggen_max_faces, imggen_do_texture_mapping,
329
+ imggen_do_render_gif],
330
+ outputs=[result_3dobj, result_3dglb],
331
+ ).success(
332
+ fn=stage_4_gif, inputs=[result_3dglb, save_folder, imggen_do_render_gif],
333
+ outputs=[result_gif],
334
+ ).success(lambda: print('Image_to_3D Done ...'))
335
+
336
+ #===============================================================
337
+ # start gradio server
338
+ #===============================================================
339
+
340
+ gr.Markdown(CONST_CITATION)
341
+ demo.queue(max_size=CONST_MAX_QUEUE)
342
+ demo.launch(server_name=CONST_SERVER, server_port=CONST_PORT)
343
+
.ipynb_checkpoints/app_hg-checkpoint.py ADDED
@@ -0,0 +1,384 @@
1
+ # Open Source Model Licensed under the Apache License Version 2.0
2
+ # and Other Licenses of the Third-Party Components therein:
3
+ # The below Model in this distribution may have been modified by THL A29 Limited
4
+ # ("Tencent Modifications"). All Tencent Modifications are Copyright (C) 2024 THL A29 Limited.
5
+
6
+ # Copyright (C) 2024 THL A29 Limited, a Tencent company. All rights reserved.
7
+ # The below software and/or models in this distribution may have been
8
+ # modified by THL A29 Limited ("Tencent Modifications").
9
+ # All Tencent Modifications are Copyright (C) THL A29 Limited.
10
+
11
+ # Hunyuan 3D is licensed under the TENCENT HUNYUAN NON-COMMERCIAL LICENSE AGREEMENT
12
+ # except for the third-party components listed below.
13
+ # Hunyuan 3D does not impose any additional limitations beyond what is outlined
14
+ # in the respective licenses of these third-party components.
15
+ # Users must comply with all terms and conditions of original licenses of these third-party
16
+ # components and must ensure that the usage of the third party components adheres to
17
+ # all relevant laws and regulations.
18
+
19
+ # For avoidance of doubts, Hunyuan 3D means the large language models and
20
+ # their software and algorithms, including trained model weights, parameters (including
21
+ # optimizer states), machine-learning model code, inference-enabling code, training-enabling code,
22
+ # fine-tuning enabling code and other elements of the foregoing made publicly available
23
+ # by Tencent in accordance with TENCENT HUNYUAN COMMUNITY LICENSE AGREEMENT.
24
+ import spaces
25
+ import os
26
+ os.environ['CUDA_HOME'] = '/usr/local/cuda-11*'
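+ # NOTE: this assigns a literal glob pattern; CUDA_HOME is normally a concrete path such as /usr/local/cuda-11.8.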
27
+ import warnings
28
+ import argparse
29
+ import gradio as gr
30
+ from glob import glob
31
+ import shutil
32
+ import torch
33
+ import numpy as np
34
+ from PIL import Image
35
+ from einops import rearrange
36
+ from huggingface_hub import snapshot_download
37
+
38
+ from infer import seed_everything, save_gif
39
+ from infer import Text2Image, Removebg, Image2Views, Views2Mesh, GifRenderer
40
+
41
+ warnings.simplefilter('ignore', category=UserWarning)
42
+ warnings.simplefilter('ignore', category=FutureWarning)
43
+ warnings.simplefilter('ignore', category=DeprecationWarning)
44
+
45
+ parser = argparse.ArgumentParser()
46
+ parser.add_argument("--use_lite", default=False, action="store_true")
47
+ parser.add_argument("--mv23d_cfg_path", default="./svrm/configs/svrm.yaml", type=str)
48
+ parser.add_argument("--mv23d_ckt_path", default="weights/svrm/svrm.safetensors", type=str)
49
+ parser.add_argument("--text2image_path", default="weights/hunyuanDiT", type=str)
50
+ parser.add_argument("--save_memory", default=True) # , action="store_true")
51
+ parser.add_argument("--device", default="cuda:0", type=str)
52
+ args = parser.parse_args()
53
+
54
+ def find_cuda():
55
+ # Check if CUDA_HOME or CUDA_PATH environment variables are set
56
+ cuda_home = os.environ.get('CUDA_HOME') or os.environ.get('CUDA_PATH')
57
+
58
+ if cuda_home and os.path.exists(cuda_home):
59
+ return cuda_home
60
+
61
+ # Search for the nvcc executable in the system's PATH
62
+ nvcc_path = shutil.which('nvcc')
63
+
64
+ if nvcc_path:
65
+ # Remove the 'bin/nvcc' part to get the CUDA installation path
66
+ cuda_path = os.path.dirname(os.path.dirname(nvcc_path))
67
+ return cuda_path
68
+
69
+ return None
70
+
71
+ cuda_path = find_cuda()
72
+
73
+ if cuda_path:
74
+ print(f"CUDA installation found at: {cuda_path}")
75
+ else:
76
+ print("CUDA installation not found")
77
+
78
+
79
+
80
+ def download_models():
81
+ # Create weights directory if it doesn't exist
82
+ os.makedirs("weights", exist_ok=True)
83
+ os.makedirs("weights/hunyuanDiT", exist_ok=True)
84
+
85
+ # Download Hunyuan3D-1 model
86
+ try:
87
+ snapshot_download(
88
+ repo_id="tencent/Hunyuan3D-1",
89
+ local_dir="./weights",
90
+ resume_download=True
91
+ )
92
+ print("Successfully downloaded Hunyuan3D-1 model")
93
+ except Exception as e:
94
+ print(f"Error downloading Hunyuan3D-1: {e}")
95
+
96
+ # Download HunyuanDiT model
97
+ try:
98
+ snapshot_download(
99
+ repo_id="Tencent-Hunyuan/HunyuanDiT-v1.1-Diffusers-Distilled",
100
+ local_dir="./weights/hunyuanDiT",
101
+ resume_download=True
102
+ )
103
+ print("Successfully downloaded HunyuanDiT model")
104
+ except Exception as e:
105
+ print(f"Error downloading HunyuanDiT: {e}")
106
+
107
+ # Download models before starting the app
108
+ download_models()
109
+
110
+ ################################################################
111
+
112
+ CONST_PORT = 8080
113
+ CONST_MAX_QUEUE = 1
114
+ CONST_SERVER = '0.0.0.0'
115
+
116
+ CONST_HEADER = '''
117
+ <h2><b>Official 🤗 Gradio Demo</b></h2><h2><a href='https://github.com/tencent/Hunyuan3D-1' target='_blank'><b>Hunyuan3D-1.0: A Unified Framework for Text-to-3D and Image-to-3D
118
+ Generation</b></a></h2>
119
+ Code: <a href='https://github.com/tencent/Hunyuan3D-1' target='_blank'>GitHub</a>. Technical report: <a href='https://arxiv.org/abs/placeholder' target='_blank'>ArXiv</a>.
120
+
121
+ ❗️❗️❗️**Important Notes:**
122
+ - By default, our demo can export a .obj mesh with vertex colors or a .glb mesh.
123
+ - If you select "texture mapping," it will export a .obj mesh with a texture map or a .glb mesh.
124
+ - If you select "render GIF," it will export a GIF image rendering of the .glb file.
125
+ - If the result is unsatisfactory, please try a different seed value (Default: 0).
126
+ '''
127
+
128
+ CONST_CITATION = r"""
129
+ If HunYuan3D-1 is helpful, please help to ⭐ the <a href='https://github.com/tencent/Hunyuan3D-1' target='_blank'>Github Repo</a>. Thanks! [![GitHub Stars](https://img.shields.io/github/stars/tencent/Hunyuan3D-1?style=social)](https://github.com/tencent/Hunyuan3D-1)
130
+ ---
131
+ 📝 **Citation**
132
+ If you find our work useful for your research or applications, please cite using this bibtex:
133
+ ```bibtex
134
+ @misc{yang2024tencent,
135
+ title={Tencent Hunyuan3D-1.0: A Unified Framework for Text-to-3D and Image-to-3D Generation},
136
+ author={Xianghui Yang and Huiwen Shi and Bowen Zhang and Fan Yang and Jiacheng Wang and Hongxu Zhao and Xinhai Liu and Xinzhou Wang and Qingxiang Lin and Jiaao Yu and Lifu Wang and Zhuo Chen and Sicong Liu and Yuhong Liu and Yong Yang and Di Wang and Jie Jiang and Chunchao Guo},
137
+ year={2024},
138
+ eprint={2411.02293},
139
+ archivePrefix={arXiv},
140
+ primaryClass={cs.CV}
141
+ }
142
+ ```
143
+ """
144
+
145
+ ################################################################
146
+
147
+ def get_example_img_list():
148
+ print('Loading example img list ...')
149
+ return sorted(glob('./demos/example_*.png'))
150
+
151
+ def get_example_txt_list():
152
+ print('Loading example txt list ...')
153
+ txt_list = list()
154
+ for line in open('./demos/example_list.txt'):
155
+ txt_list.append(line.strip())
156
+ return txt_list
157
+
158
+ example_is = get_example_img_list()
159
+ example_ts = get_example_txt_list()
160
+ ################################################################
161
+
162
+ worker_xbg = Removebg()
163
+ print(f"loading {args.text2image_path}")
164
+ worker_t2i = Text2Image(
165
+ pretrain = args.text2image_path,
166
+ device = args.device,
167
+ save_memory = args.save_memory
168
+ )
169
+ worker_i2v = Image2Views(
170
+ use_lite = args.use_lite,
171
+ device = args.device,
172
+ save_memory = args.save_memory
173
+ )
174
+ worker_v23 = Views2Mesh(
175
+ args.mv23d_cfg_path,
176
+ args.mv23d_ckt_path,
177
+ use_lite = args.use_lite,
178
+ device = args.device,
179
+ save_memory = args.save_memory
180
+ )
181
+ worker_gif = GifRenderer(args.device)
182
+
183
+ @spaces.GPU
184
+ def stage_0_t2i(text, image, seed, step):
185
+ os.makedirs('./outputs/app_output', exist_ok=True)
186
+ exists = set(int(_) for _ in os.listdir('./outputs/app_output') if not _.startswith("."))
187
+ if len(exists) == 30: shutil.rmtree(f"./outputs/app_output/0");cur_id = 0
188
+ else: cur_id = min(set(range(30)) - exists)
189
+ if os.path.exists(f"./outputs/app_output/{(cur_id + 1) % 30}"):
190
+ shutil.rmtree(f"./outputs/app_output/{(cur_id + 1) % 30}")
191
+ save_folder = f'./outputs/app_output/{cur_id}'
192
+ os.makedirs(save_folder, exist_ok=True)
193
+
194
+ dst = os.path.join(save_folder, 'img.png')
195
+
196
+ if not text:
197
+ if image is None:
198
+ return dst, save_folder
199
+ raise gr.Error("Upload image or provide text ...")
200
+ image.save(dst)
201
+ return dst, save_folder
202
+
203
+ image = worker_t2i(text, seed, step)
204
+ image.save(dst)
205
+ dst = worker_xbg(image, save_folder)
206
+ return dst, save_folder
207
+
208
+ @spaces.GPU
209
+ def stage_1_xbg(image, save_folder):
210
+ if isinstance(image, str):
211
+ image = Image.open(image)
212
+ dst = save_folder + '/img_nobg.png'
213
+ rgba = worker_xbg(image)
214
+ rgba.save(dst)
215
+ return dst
216
+
217
+ @spaces.GPU
218
+ def stage_2_i2v(image, seed, step, save_folder):
219
+ if isinstance(image, str):
220
+ image = Image.open(image)
221
+ gif_dst = save_folder + '/views.gif'
222
+ res_img, pils = worker_i2v(image, seed, step)
223
+ save_gif(pils, gif_dst)
224
+ views_img, cond_img = res_img[0], res_img[1]
225
+ img_array = np.asarray(views_img, dtype=np.uint8)
226
+ show_img = rearrange(img_array, '(n h) (m w) c -> (n m) h w c', n=3, m=2)
227
+ show_img = show_img[worker_i2v.order, ...]
228
+ show_img = rearrange(show_img, '(n m) h w c -> (n h) (m w) c', n=2, m=3)
229
+ show_img = Image.fromarray(show_img)
230
+ return views_img, cond_img, show_img
231
+
232
+ @spaces.GPU
233
+ def stage_3_v23(
234
+ views_pil,
235
+ cond_pil,
236
+ seed,
237
+ save_folder,
238
+ target_face_count = 30000,
239
+ do_texture_mapping = True,
240
+ do_render =True
241
+ ):
242
+ do_texture_mapping = do_texture_mapping or do_render
243
+ obj_dst = save_folder + '/mesh_with_colors.obj'
244
+ glb_dst = save_folder + '/mesh.glb'
245
+ worker_v23(
246
+ views_pil,
247
+ cond_pil,
248
+ seed = seed,
249
+ save_folder = save_folder,
250
+ target_face_count = target_face_count,
251
+ do_texture_mapping = do_texture_mapping
252
+ )
253
+ return obj_dst, glb_dst
254
+
255
+ @spaces.GPU
256
+ def stage_4_gif(obj_dst, save_folder, do_render_gif=True):
257
+ if not do_render_gif: return None
258
+ gif_dst = save_folder + '/output.gif'
259
+ worker_gif(
260
+ save_folder + '/mesh.obj',
261
+ gif_dst_path = gif_dst
262
+ )
263
+ return gif_dst
264
+
265
+ #===============================================================
266
+ with gr.Blocks() as demo:
267
+ gr.Markdown(CONST_HEADER)
268
+ with gr.Row(variant="panel"):
269
+ with gr.Column(scale=2):
270
+ with gr.Tab("Text to 3D"):
271
+ with gr.Column():
272
+ text = gr.TextArea('一只黑白相间的熊猫在白色背景上居中坐着,呈现出卡通风格和可爱氛围。', lines=1, max_lines=10, label='Input text')
273
+ with gr.Row():
274
+ textgen_seed = gr.Number(value=0, label="T2I seed", precision=0)
275
+ textgen_step = gr.Number(value=25, label="T2I step", precision=0)
276
+ textgen_SEED = gr.Number(value=0, label="Gen seed", precision=0)
277
+ textgen_STEP = gr.Number(value=50, label="Gen step", precision=0)
278
+ textgen_max_faces = gr.Number(value=90000, label="max number of faces", precision=0)
279
+
280
+ with gr.Row():
281
+ textgen_do_texture_mapping = gr.Checkbox(label="texture mapping", value=False, interactive=True)
282
+ textgen_do_render_gif = gr.Checkbox(label="Render gif", value=False, interactive=True)
283
+ textgen_submit = gr.Button("Generate", variant="primary")
284
+
285
+ with gr.Row():
286
+ gr.Examples(examples=example_ts, inputs=[text], label="Txt examples", examples_per_page=10)
287
+
288
+ with gr.Tab("Image to 3D"):
289
+ with gr.Column():
290
+ input_image = gr.Image(label="Input image",
291
+ width=256, height=256, type="pil",
292
+ image_mode="RGBA", sources="upload",
293
+ interactive=True)
294
+ with gr.Row():
295
+ imggen_SEED = gr.Number(value=0, label="Gen seed", precision=0)
296
+ imggen_STEP = gr.Number(value=50, label="Gen step", precision=0)
297
+ imggen_max_faces = gr.Number(value=90000, label="max number of faces", precision=0)
298
+
299
+ with gr.Row():
300
+ imggen_do_texture_mapping = gr.Checkbox(label="texture mapping", value=False, interactive=True)
301
+ imggen_do_render_gif = gr.Checkbox(label="Render gif", value=False, interactive=True)
302
+ imggen_submit = gr.Button("Generate", variant="primary")
303
+ with gr.Row():
304
+ gr.Examples(examples=example_is, inputs=[input_image], label="Img examples", examples_per_page=10)
305
+
306
+ with gr.Column(scale=3):
307
+ with gr.Row():
308
+ with gr.Column(scale=2):
309
+ rem_bg_image = gr.Image(label="No backgraound image", type="pil",
310
+ image_mode="RGBA", interactive=False)
311
+ with gr.Column(scale=3):
312
+ result_image = gr.Image(label="Multi views", type="pil", interactive=False)
313
+
314
+ with gr.Row():
315
+ result_3dobj = gr.Model3D(
316
+ clear_color=[0.0, 0.0, 0.0, 0.0],
317
+ label="Output Obj",
318
+ show_label=True,
319
+ visible=True,
320
+ camera_position=[90, 90, None],
321
+ interactive=False
322
+ )
323
+
324
+ result_3dglb = gr.Model3D(
325
+ clear_color=[0.0, 0.0, 0.0, 0.0],
326
+ label="Output Glb",
327
+ show_label=True,
328
+ visible=True,
329
+ camera_position=[90, 90, None],
330
+ interactive=False
331
+ )
332
+ result_gif = gr.Image(label="Rendered GIF", interactive=False)
333
+
334
+ with gr.Row():
335
+ gr.Markdown("""
336
+ We recommend download and open Glb using 3D software, such as Blender, MeshLab, etc.
337
+ Limited by gradio, Obj file here only be shown as vertex shading, but Glb can be texture shading.
338
+ """)
339
+
340
+ #===============================================================
341
+
342
+ none = gr.State(None)
343
+ save_folder = gr.State()
344
+ cond_image = gr.State()
345
+ views_image = gr.State()
346
+ text_image = gr.State()
347
+
348
+ textgen_submit.click(
349
+ fn=stage_0_t2i, inputs=[text, none, textgen_seed, textgen_step],
350
+ outputs=[rem_bg_image, save_folder],
351
+ ).success(
352
+ fn=stage_2_i2v, inputs=[rem_bg_image, textgen_SEED, textgen_STEP, save_folder],
353
+ outputs=[views_image, cond_image, result_image],
354
+ ).success(
355
+ fn=stage_3_v23, inputs=[views_image, cond_image, textgen_SEED, save_folder, textgen_max_faces, textgen_do_texture_mapping, textgen_do_render_gif],
356
+ outputs=[result_3dobj, result_3dglb],
357
+ ).success(
358
+ fn=stage_4_gif, inputs=[result_3dglb, save_folder, textgen_do_render_gif],
359
+ outputs=[result_gif],
360
+ ).success(lambda: print('Text_to_3D Done ...'))
361
+
362
+ imggen_submit.click(
363
+ fn=stage_0_t2i, inputs=[none, input_image, textgen_seed, textgen_step],
364
+ outputs=[text_image, save_folder],
365
+ ).success(
366
+ fn=stage_1_xbg, inputs=[text_image, save_folder],
367
+ outputs=[rem_bg_image],
368
+ ).success(
369
+ fn=stage_2_i2v, inputs=[rem_bg_image, imggen_SEED, imggen_STEP, save_folder],
370
+ outputs=[views_image, cond_image, result_image],
371
+ ).success(
372
+ fn=stage_3_v23, inputs=[views_image, cond_image, imggen_SEED, save_folder, imggen_max_faces, imggen_do_texture_mapping, imggen_do_render_gif],
373
+ outputs=[result_3dobj, result_3dglb],
374
+ ).success(
375
+ fn=stage_4_gif, inputs=[result_3dglb, save_folder, imggen_do_render_gif],
376
+ outputs=[result_gif],
377
+ ).success(lambda: print('Image_to_3D Done ...'))
378
+
379
+ #===============================================================
380
+
381
+ gr.Markdown(CONST_CITATION)
382
+ demo.queue(max_size=CONST_MAX_QUEUE)
383
+ demo.launch()
384
+
.ipynb_checkpoints/main-checkpoint.py ADDED
@@ -0,0 +1,164 @@
1
+ # Open Source Model Licensed under the Apache License Version 2.0
2
+ # and Other Licenses of the Third-Party Components therein:
3
+ # The below Model in this distribution may have been modified by THL A29 Limited
4
+ # ("Tencent Modifications"). All Tencent Modifications are Copyright (C) 2024 THL A29 Limited.
5
+
6
+ # Copyright (C) 2024 THL A29 Limited, a Tencent company. All rights reserved.
7
+ # The below software and/or models in this distribution may have been
8
+ # modified by THL A29 Limited ("Tencent Modifications").
9
+ # All Tencent Modifications are Copyright (C) THL A29 Limited.
10
+
11
+ # Hunyuan 3D is licensed under the TENCENT HUNYUAN NON-COMMERCIAL LICENSE AGREEMENT
12
+ # except for the third-party components listed below.
13
+ # Hunyuan 3D does not impose any additional limitations beyond what is outlined
14
+ # in the respective licenses of these third-party components.
15
+ # Users must comply with all terms and conditions of original licenses of these third-party
16
+ # components and must ensure that the usage of the third party components adheres to
17
+ # all relevant laws and regulations.
18
+
19
+ # For avoidance of doubts, Hunyuan 3D means the large language models and
20
+ # their software and algorithms, including trained model weights, parameters (including
21
+ # optimizer states), machine-learning model code, inference-enabling code, training-enabling code,
22
+ # fine-tuning enabling code and other elements of the foregoing made publicly available
23
+ # by Tencent in accordance with TENCENT HUNYUAN COMMUNITY LICENSE AGREEMENT.
24
+
25
+ import os
26
+ import warnings
27
+ import torch
28
+ from PIL import Image
29
+ import argparse
30
+
31
+ from infer import Text2Image, Removebg, Image2Views, Views2Mesh, GifRenderer
32
+
33
+ warnings.simplefilter('ignore', category=UserWarning)
34
+ warnings.simplefilter('ignore', category=FutureWarning)
35
+ warnings.simplefilter('ignore', category=DeprecationWarning)
36
+
37
+ def get_args():
38
+ parser = argparse.ArgumentParser()
39
+ parser.add_argument(
40
+ "--use_lite", default=False, action="store_true"
41
+ )
42
+ parser.add_argument(
43
+ "--mv23d_cfg_path", default="./svrm/configs/svrm.yaml", type=str
44
+ )
45
+ parser.add_argument(
46
+ "--mv23d_ckt_path", default="weights/svrm/svrm.safetensors", type=str
47
+ )
48
+ parser.add_argument(
49
+ "--text2image_path", default="weights/hunyuanDiT", type=str
50
+ )
51
+ parser.add_argument(
52
+ "--save_folder", default="./outputs/test/", type=str
53
+ )
54
+ parser.add_argument(
55
+ "--text_prompt", default="", type=str,
56
+ )
57
+ parser.add_argument(
58
+ "--image_prompt", default="", type=str
59
+ )
60
+ parser.add_argument(
61
+ "--device", default="cuda:0", type=str
62
+ )
63
+ parser.add_argument(
64
+ "--t2i_seed", default=0, type=int
65
+ )
66
+ parser.add_argument(
67
+ "--t2i_steps", default=25, type=int
68
+ )
69
+ parser.add_argument(
70
+ "--gen_seed", default=0, type=int
71
+ )
72
+ parser.add_argument(
73
+ "--gen_steps", default=50, type=int
74
+ )
75
+ parser.add_argument(
76
+ "--max_faces_num", default=80000, type=int,
77
+ help="max num of face, suggest 80000 for effect, 10000 for speed"
78
+ )
79
+ parser.add_argument(
80
+ "--save_memory", default=False, action="store_true"
81
+ )
82
+ parser.add_argument(
83
+ "--do_texture_mapping", default=False, action="store_true"
84
+ )
85
+ parser.add_argument(
86
+ "--do_render", default=False, action="store_true"
87
+ )
88
+ return parser.parse_args()
89
+
90
+
91
+ if __name__ == "__main__":
92
+ args = get_args()
93
+
94
+ assert not (args.text_prompt and args.image_prompt), "Only one of --text_prompt and --image_prompt may be given"
95
+ assert args.text_prompt or args.image_prompt, "One of --text_prompt or --image_prompt must be given"
96
+
97
+ # init model
98
+ rembg_model = Removebg()
99
+ image_to_views_model = Image2Views(
100
+ device=args.device,
101
+ use_lite=args.use_lite,
102
+ save_memory=args.save_memory
103
+ )
104
+
105
+ views_to_mesh_model = Views2Mesh(
106
+ args.mv23d_cfg_path,
107
+ args.mv23d_ckt_path,
108
+ args.device,
109
+ use_lite=args.use_lite,
110
+ save_memory=args.save_memory
111
+ )
112
+
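+ # The text-to-image model is only needed, and therefore only loaded, when a text prompt is given.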
113
+ if args.text_prompt:
114
+ text_to_image_model = Text2Image(
115
+ pretrain = args.text2image_path,
116
+ device = args.device,
117
+ save_memory = args.save_memory
118
+ )
119
+ if args.do_render:
120
+ gif_renderer = GifRenderer(device=args.device)
121
+
122
+ # ---- ----- ---- ---- ---- ----
123
+
124
+ os.makedirs(args.save_folder, exist_ok=True)
125
+
126
+ # stage 1, text to image
127
+ if args.text_prompt:
128
+ res_rgb_pil = text_to_image_model(
129
+ args.text_prompt,
130
+ seed=args.t2i_seed,
131
+ steps=args.t2i_steps
132
+ )
133
+ res_rgb_pil.save(os.path.join(args.save_folder, "img.jpg"))
134
+ elif args.image_prompt:
135
+ res_rgb_pil = Image.open(args.image_prompt)
136
+
137
+ # stage 2, remove background
138
+ res_rgba_pil = rembg_model(res_rgb_pil)
139
+ res_rgba_pil.save(os.path.join(args.save_folder, "img_nobg.png"))
140
+
141
+ # stage 3, image to views
142
+ (views_grid_pil, cond_img), view_pil_list = image_to_views_model(
143
+ res_rgba_pil,
144
+ seed = args.gen_seed,
145
+ steps = args.gen_steps
146
+ )
147
+ views_grid_pil.save(os.path.join(args.save_folder, "views.jpg"))
148
+
149
+ # stage 4, views to mesh
150
+ views_to_mesh_model(
151
+ views_grid_pil,
152
+ cond_img,
153
+ seed = args.gen_seed,
154
+ target_face_count = args.max_faces_num,
155
+ save_folder = args.save_folder,
156
+ do_texture_mapping = args.do_texture_mapping
157
+ )
158
+
159
+ # stage 5, render gif
160
+ if args.do_render:
161
+ gif_renderer(
162
+ os.path.join(args.save_folder, 'mesh.obj'),
163
+ gif_dst_path = os.path.join(args.save_folder, 'output.gif'),
164
+ )
.ipynb_checkpoints/requirements-checkpoint.txt ADDED
@@ -0,0 +1,24 @@
1
+ --find-links https://download.pytorch.org/whl/cu118
2
+ torch==2.2.0
3
+ torchvision==0.17.0
4
+ diffusers
5
+ numpy==1.26.4
6
+ transformers
7
+ rembg
8
+ tqdm
9
+ omegaconf
10
+ matplotlib
11
+ opencv-python
12
+ imageio
13
+ jaxtyping
14
+ einops
15
+ SentencePiece
16
+ accelerate
17
+ trimesh
18
+ PyMCubes
19
+ xatlas
20
+ libigl
21
+ git+https://github.com/facebookresearch/pytorch3d@stable
22
+ git+https://github.com/NVlabs/nvdiffrast
23
+ open3d
24
+ ninja
README.md CHANGED
@@ -1,14 +1,3 @@
1
- ---
2
- title: Hunyuan3D-1.0
3
- emoji: 😻
4
- colorFrom: purple
5
- colorTo: red
6
- sdk: gradio
7
- sdk_version: 5.5.0
8
- app_file: app_hg.py
9
- pinned: false
10
- short_description: Text-to-3D and Image-to-3D Generation
11
- ---
12
  <!-- ## **Hunyuan3D-1.0** -->
13
 
14
  <p align="center">
1
  <!-- ## **Hunyuan3D-1.0** -->
2
 
3
  <p align="center">
app.py CHANGED
@@ -1,5 +1,7 @@
1
- # Open Source Model Licensed under the Apache License Version 2.0 and Other Licenses of the Third-Party Components therein:
2
- # The below Model in this distribution may have been modified by THL A29 Limited ("Tencent Modifications"). All Tencent Modifications are Copyright (C) 2024 THL A29 Limited.
 
 
3
 
4
  # Copyright (C) 2024 THL A29 Limited, a Tencent company. All rights reserved.
5
  # The below software and/or models in this distribution may have been
@@ -47,6 +49,8 @@ parser.add_argument("--save_memory", default=False, action="store_true")
47
  parser.add_argument("--device", default="cuda:0", type=str)
48
  args = parser.parse_args()
49
 
 
 
50
  ################################################################
51
 
52
  CONST_PORT = 8080
@@ -82,6 +86,8 @@ If you find our work useful for your research or applications, please cite using
82
  ```
83
  """
84
 
 
 
85
  ################################################################
86
 
87
  def get_example_img_list():
@@ -97,6 +103,9 @@ def get_example_txt_list():
97
 
98
  example_is = get_example_img_list()
99
  example_ts = get_example_txt_list()
 
 
 
100
  ################################################################
101
 
102
  worker_xbg = Removebg()
@@ -196,8 +205,9 @@ def stage_4_gif(obj_dst, save_folder, do_render_gif=True):
196
  gif_dst_path = gif_dst
197
  )
198
  return gif_dst
199
-
200
- #===============================================================
 
201
  with gr.Blocks() as demo:
202
  gr.Markdown(CONST_HEADER)
203
  with gr.Row(variant="panel"):
@@ -236,7 +246,12 @@ with gr.Blocks() as demo:
236
  imggen_do_render_gif = gr.Checkbox(label="Render gif", value=False, interactive=True)
237
  imggen_submit = gr.Button("Generate", variant="primary")
238
  with gr.Row():
239
- gr.Examples(examples=example_is, inputs=[input_image], label="Img examples", examples_per_page=10)
 
 
 
 
 
240
 
241
  with gr.Column(scale=3):
242
  with gr.Row():
@@ -269,9 +284,12 @@ with gr.Blocks() as demo:
269
  with gr.Row():
270
  gr.Markdown("""
271
  We recommend downloading and opening Glb with 3D software, such as Blender, MeshLab, etc.
 
272
  Limited by gradio, Obj file here only be shown as vertex shading, but Glb can be texture shading.
273
  """)
274
 
 
 
275
  #===============================================================
276
 
277
  none = gr.State(None)
@@ -287,7 +305,9 @@ with gr.Blocks() as demo:
287
  fn=stage_2_i2v, inputs=[rem_bg_image, textgen_SEED, textgen_STEP, save_folder],
288
  outputs=[views_image, cond_image, result_image],
289
  ).success(
290
- fn=stage_3_v23, inputs=[views_image, cond_image, textgen_SEED, save_folder, textgen_max_faces, textgen_do_texture_mapping, textgen_do_render_gif],
 
 
291
  outputs=[result_3dobj, result_3dglb],
292
  ).success(
293
  fn=stage_4_gif, inputs=[result_3dglb, save_folder, textgen_do_render_gif],
@@ -304,13 +324,17 @@ with gr.Blocks() as demo:
304
  fn=stage_2_i2v, inputs=[rem_bg_image, imggen_SEED, imggen_STEP, save_folder],
305
  outputs=[views_image, cond_image, result_image],
306
  ).success(
307
- fn=stage_3_v23, inputs=[views_image, cond_image, imggen_SEED, save_folder, imggen_max_faces, imggen_do_texture_mapping, imggen_do_render_gif],
 
 
308
  outputs=[result_3dobj, result_3dglb],
309
  ).success(
310
  fn=stage_4_gif, inputs=[result_3dglb, save_folder, imggen_do_render_gif],
311
  outputs=[result_gif],
312
  ).success(lambda: print('Image_to_3D Done ...'))
313
 
 
 
314
  #===============================================================
315
 
316
  gr.Markdown(CONST_CITATION)
 
1
+ # Open Source Model Licensed under the Apache License Version 2.0
2
+ # and Other Licenses of the Third-Party Components therein:
3
+ # The below Model in this distribution may have been modified by THL A29 Limited
4
+ # ("Tencent Modifications"). All Tencent Modifications are Copyright (C) 2024 THL A29 Limited.
5
 
6
  # Copyright (C) 2024 THL A29 Limited, a Tencent company. All rights reserved.
7
  # The below software and/or models in this distribution may have been
 
49
  parser.add_argument("--device", default="cuda:0", type=str)
50
  args = parser.parse_args()
51
 
52
+ ################################################################
53
+ # initial setting
54
  ################################################################
55
 
56
  CONST_PORT = 8080
 
86
  ```
87
  """
88
 
89
+ ################################################################
90
+ # prepare text examples and image examples
91
  ################################################################
92
 
93
  def get_example_img_list():
 
103
 
104
  example_is = get_example_img_list()
105
  example_ts = get_example_txt_list()
106
+
107
+ ################################################################
108
+ # initial models
109
  ################################################################
110
 
111
  worker_xbg = Removebg()
 
205
  gif_dst_path = gif_dst
206
  )
207
  return gif_dst
208
+ # ===============================================================
209
+ # gradio display
210
+ # ===============================================================
211
  with gr.Blocks() as demo:
212
  gr.Markdown(CONST_HEADER)
213
  with gr.Row(variant="panel"):
 
246
  imggen_do_render_gif = gr.Checkbox(label="Render gif", value=False, interactive=True)
247
  imggen_submit = gr.Button("Generate", variant="primary")
248
  with gr.Row():
249
+ gr.Examples(
250
+ examples=example_is,
251
+ inputs=[input_image],
252
+ label="Img examples",
253
+ examples_per_page=10
254
+ )
255
 
256
  with gr.Column(scale=3):
257
  with gr.Row():
 
284
  with gr.Row():
285
  gr.Markdown("""
286
  We recommend downloading the GLB file and opening it in 3D software such as Blender or MeshLab.
287
+
288
  Due to Gradio limitations, the OBJ file is shown here with vertex shading only, while the GLB file can be shown with texture shading.
289
  """)
290
 
291
+ #===============================================================
292
+ # gradio running code
293
  #===============================================================
294
 
295
  none = gr.State(None)
 
305
  fn=stage_2_i2v, inputs=[rem_bg_image, textgen_SEED, textgen_STEP, save_folder],
306
  outputs=[views_image, cond_image, result_image],
307
  ).success(
308
+ fn=stage_3_v23, inputs=[views_image, cond_image, textgen_SEED, save_folder,
309
+ textgen_max_faces, textgen_do_texture_mapping,
310
+ textgen_do_render_gif],
311
  outputs=[result_3dobj, result_3dglb],
312
  ).success(
313
  fn=stage_4_gif, inputs=[result_3dglb, save_folder, textgen_do_render_gif],
 
324
  fn=stage_2_i2v, inputs=[rem_bg_image, imggen_SEED, imggen_STEP, save_folder],
325
  outputs=[views_image, cond_image, result_image],
326
  ).success(
327
+ fn=stage_3_v23, inputs=[views_image, cond_image, imggen_SEED, save_folder,
328
+ imggen_max_faces, imggen_do_texture_mapping,
329
+ imggen_do_render_gif],
330
  outputs=[result_3dobj, result_3dglb],
331
  ).success(
332
  fn=stage_4_gif, inputs=[result_3dglb, save_folder, imggen_do_render_gif],
333
  outputs=[result_gif],
334
  ).success(lambda: print('Image_to_3D Done ...'))
335
 
336
+ #===============================================================
337
+ # start gradio server
338
  #===============================================================
339
 
340
  gr.Markdown(CONST_CITATION)
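
The chained `.success()` calls in the diff above gate each stage on the previous one finishing without error (stage_1 → stage_2_i2v → stage_3_v23 → stage_4_gif). A minimal, self-contained sketch of the same Gradio pattern; the stage functions and component names here are simplified placeholders, not the repo's actual stages:

```python
import gradio as gr

def stage_a(text):                 # placeholder: text -> intermediate result
    return f"processed: {text}"

def stage_b(intermediate):         # placeholder: intermediate -> final result
    return intermediate.upper()

with gr.Blocks() as demo:
    inp = gr.Textbox(label="input")
    mid = gr.Textbox(label="intermediate")
    out = gr.Textbox(label="result")
    btn = gr.Button("Generate", variant="primary")
    # each .success() only fires if the previous stage completed without raising
    btn.click(fn=stage_a, inputs=[inp], outputs=[mid]).success(
        fn=stage_b, inputs=[mid], outputs=[out]
    ).success(lambda: print("done"))

if __name__ == "__main__":
    demo.launch()
```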
infer/.ipynb_checkpoints/__init__-checkpoint.py ADDED
@@ -0,0 +1,32 @@
1
+ # Open Source Model Licensed under the Apache License Version 2.0
2
+ # and Other Licenses of the Third-Party Components therein:
3
+ # The below Model in this distribution may have been modified by THL A29 Limited
4
+ # ("Tencent Modifications"). All Tencent Modifications are Copyright (C) 2024 THL A29 Limited.
5
+
6
+ # Copyright (C) 2024 THL A29 Limited, a Tencent company. All rights reserved.
7
+ # The below software and/or models in this distribution may have been
8
+ # modified by THL A29 Limited ("Tencent Modifications").
9
+ # All Tencent Modifications are Copyright (C) THL A29 Limited.
10
+
11
+ # Hunyuan 3D is licensed under the TENCENT HUNYUAN NON-COMMERCIAL LICENSE AGREEMENT
12
+ # except for the third-party components listed below.
13
+ # Hunyuan 3D does not impose any additional limitations beyond what is outlined
14
+ # in the respective licenses of these third-party components.
15
+ # Users must comply with all terms and conditions of original licenses of these third-party
16
+ # components and must ensure that the usage of the third party components adheres to
17
+ # all relevant laws and regulations.
18
+
19
+ # For avoidance of doubts, Hunyuan 3D means the large language models and
20
+ # their software and algorithms, including trained model weights, parameters (including
21
+ # optimizer states), machine-learning model code, inference-enabling code, training-enabling code,
22
+ # fine-tuning enabling code and other elements of the foregoing made publicly available
23
+ # by Tencent in accordance with TENCENT HUNYUAN COMMUNITY LICENSE AGREEMENT.
24
+
25
+ from .removebg import Removebg
26
+ from .text_to_image import Text2Image
27
+ from .image_to_views import Image2Views, save_gif
28
+ from .views_to_mesh import Views2Mesh
29
+ from .gif_render import GifRenderer
30
+
31
+ from .utils import seed_everything, auto_amp_inference
32
+ from .utils import get_parameter_number, set_parameter_grad_false
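
The exports above are the building blocks of the text/image-to-3D pipeline. A rough end-to-end sketch, assuming the constructor and call signatures shown in the checkpoint modules below; paths, seeds, and the output mesh filename are illustrative, and `main.py` remains the authoritative wiring:

```python
from infer import Removebg, Text2Image, Image2Views, Views2Mesh, GifRenderer

save_folder = "./outputs/test"

worker_t2i = Text2Image(pretrain="weights/hunyuanDiT", device="cuda:0")
worker_xbg = Removebg()
worker_i2v = Image2Views(device="cuda:0", use_lite=False)
worker_v23 = Views2Mesh("./svrm/configs/svrm.yaml", "weights/svrm/svrm.safetensors",
                        device="cuda:0", use_lite=False)
worker_gif = GifRenderer(device="cuda:0")

rgb = worker_t2i("a cute rabbit", seed=0, steps=25)        # text -> image
rgba = worker_xbg(rgb)                                     # remove background
(views, cond), _ = worker_i2v(rgba, seed=0, steps=50)      # image -> multi-view grid
worker_v23(views, cond, 0, target_face_count=90000,
           do_texture_mapping=True, save_folder=save_folder)  # views -> mesh
obj_path = f"{save_folder}/mesh.obj"   # assumed filename; check save_folder for the real output
worker_gif(obj_path, gif_dst_path=f"{save_folder}/output.gif")
```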
infer/.ipynb_checkpoints/gif_render-checkpoint.py ADDED
@@ -0,0 +1,79 @@
1
+ # Open Source Model Licensed under the Apache License Version 2.0
2
+ # and Other Licenses of the Third-Party Components therein:
3
+ # The below Model in this distribution may have been modified by THL A29 Limited
4
+ # ("Tencent Modifications"). All Tencent Modifications are Copyright (C) 2024 THL A29 Limited.
5
+
6
+ # Copyright (C) 2024 THL A29 Limited, a Tencent company. All rights reserved.
7
+ # The below software and/or models in this distribution may have been
8
+ # modified by THL A29 Limited ("Tencent Modifications").
9
+ # All Tencent Modifications are Copyright (C) THL A29 Limited.
10
+
11
+ # Hunyuan 3D is licensed under the TENCENT HUNYUAN NON-COMMERCIAL LICENSE AGREEMENT
12
+ # except for the third-party components listed below.
13
+ # Hunyuan 3D does not impose any additional limitations beyond what is outlined
14
+ # in the respective licenses of these third-party components.
15
+ # Users must comply with all terms and conditions of original licenses of these third-party
16
+ # components and must ensure that the usage of the third party components adheres to
17
+ # all relevant laws and regulations.
18
+
19
+ # For avoidance of doubts, Hunyuan 3D means the large language models and
20
+ # their software and algorithms, including trained model weights, parameters (including
21
+ # optimizer states), machine-learning model code, inference-enabling code, training-enabling code,
22
+ # fine-tuning enabling code and other elements of the foregoing made publicly available
23
+ # by Tencent in accordance with TENCENT HUNYUAN COMMUNITY LICENSE AGREEMENT.
24
+
25
+ import os, sys
26
+ sys.path.insert(0, f"{os.path.dirname(os.path.dirname(os.path.abspath(__file__)))}")
27
+
28
+ from svrm.ldm.vis_util import render
29
+ from infer.utils import seed_everything, timing_decorator
30
+
31
+ class GifRenderer():
32
+ '''
33
+ render frame(s) of mesh using pytorch3d
34
+ '''
35
+ def __init__(self, device="cuda:0"):
36
+ self.device = device
37
+
38
+ @timing_decorator("gif render")
39
+ def __call__(
40
+ self,
41
+ obj_filename,
42
+ elev=0,
43
+ azim=0,
44
+ resolution=512,
45
+ gif_dst_path='',
46
+ n_views=120,
47
+ fps=30,
48
+ rgb=True
49
+ ):
50
+ render(
51
+ obj_filename,
52
+ elev=elev,
53
+ azim=azim,
54
+ resolution=resolution,
55
+ gif_dst_path=gif_dst_path,
56
+ n_views=n_views,
57
+ fps=fps,
58
+ device=self.device,
59
+ rgb=rgb
60
+ )
61
+
62
+ if __name__ == "__main__":
63
+ import argparse
64
+
65
+ def get_args():
66
+ parser = argparse.ArgumentParser()
67
+ parser.add_argument("--mesh_path", type=str, required=True)
68
+ parser.add_argument("--output_gif_path", type=str, required=True)
69
+ parser.add_argument("--device", default="cuda:0", type=str)
70
+ return parser.parse_args()
71
+
72
+ args = get_args()
73
+
74
+ gif_renderer = GifRenderer(device=args.device)
75
+
76
+ gif_renderer(
77
+ args.mesh_path,
78
+ gif_dst_path = args.output_gif_path
79
+ )
infer/.ipynb_checkpoints/image_to_views-checkpoint.py ADDED
@@ -0,0 +1,126 @@
1
+ # Open Source Model Licensed under the Apache License Version 2.0
2
+ # and Other Licenses of the Third-Party Components therein:
3
+ # The below Model in this distribution may have been modified by THL A29 Limited
4
+ # ("Tencent Modifications"). All Tencent Modifications are Copyright (C) 2024 THL A29 Limited.
5
+
6
+ # Copyright (C) 2024 THL A29 Limited, a Tencent company. All rights reserved.
7
+ # The below software and/or models in this distribution may have been
8
+ # modified by THL A29 Limited ("Tencent Modifications").
9
+ # All Tencent Modifications are Copyright (C) THL A29 Limited.
10
+
11
+ # Hunyuan 3D is licensed under the TENCENT HUNYUAN NON-COMMERCIAL LICENSE AGREEMENT
12
+ # except for the third-party components listed below.
13
+ # Hunyuan 3D does not impose any additional limitations beyond what is outlined
14
+ # in the respective licenses of these third-party components.
15
+ # Users must comply with all terms and conditions of original licenses of these third-party
16
+ # components and must ensure that the usage of the third party components adheres to
17
+ # all relevant laws and regulations.
18
+
19
+ # For avoidance of doubts, Hunyuan 3D means the large language models and
20
+ # their software and algorithms, including trained model weights, parameters (including
21
+ # optimizer states), machine-learning model code, inference-enabling code, training-enabling code,
22
+ # fine-tuning enabling code and other elements of the foregoing made publicly available
23
+ # by Tencent in accordance with TENCENT HUNYUAN COMMUNITY LICENSE AGREEMENT.
24
+
25
+ import os, sys
26
+ sys.path.insert(0, f"{os.path.dirname(os.path.dirname(os.path.abspath(__file__)))}")
27
+
28
+ import time
29
+ import torch
30
+ import random
31
+ import numpy as np
32
+ from PIL import Image
33
+ from einops import rearrange
34
+ from PIL import Image, ImageSequence
35
+
36
+ from infer.utils import seed_everything, timing_decorator, auto_amp_inference
37
+ from infer.utils import get_parameter_number, set_parameter_grad_false, str_to_bool
38
+ from mvd.hunyuan3d_mvd_std_pipeline import HunYuan3D_MVD_Std_Pipeline
39
+ from mvd.hunyuan3d_mvd_lite_pipeline import Hunyuan3d_MVD_Lite_Pipeline
40
+
41
+
42
+ def save_gif(pils, save_path, df=False):
43
+ # save a list of PIL.Image to gif
44
+ spf = 4000 / len(pils)
45
+ os.makedirs(os.path.dirname(save_path), exist_ok=True)
46
+ pils[0].save(save_path, format="GIF", save_all=True, append_images=pils[1:], duration=spf, loop=0)
47
+ return save_path
48
+
49
+
50
+ class Image2Views():
51
+ def __init__(self, device="cuda:0", use_lite=False, save_memory=False):
52
+ self.device = device
53
+ if use_lite:
54
+ self.pipe = Hunyuan3d_MVD_Lite_Pipeline.from_pretrained(
55
+ "./weights/mvd_lite",
56
+ torch_dtype = torch.float16,
57
+ use_safetensors = True,
58
+ )
59
+ else:
60
+ self.pipe = HunYuan3D_MVD_Std_Pipeline.from_pretrained(
61
+ "./weights/mvd_std",
62
+ torch_dtype = torch.float16,
63
+ use_safetensors = True,
64
+ )
65
+ self.pipe = self.pipe.to(device)
66
+ self.order = [0, 1, 2, 3, 4, 5] if use_lite else [0, 2, 4, 5, 3, 1]
67
+ self.save_memory = save_memory
68
+ set_parameter_grad_false(self.pipe.unet)
69
+ print('image2views unet model', get_parameter_number(self.pipe.unet))
70
+
71
+ @torch.no_grad()
72
+ @timing_decorator("image to views")
73
+ @auto_amp_inference
74
+ def __call__(self, *args, **kwargs):
75
+ if self.save_memory:
76
+ self.pipe = self.pipe.to(self.device)
77
+ torch.cuda.empty_cache()
78
+ res = self.call(*args, **kwargs)
79
+ self.pipe = self.pipe.to("cpu")
80
+ else:
81
+ res = self.call(*args, **kwargs)
82
+ torch.cuda.empty_cache()
83
+ return res
84
+
85
+ def call(self, pil_img, seed=0, steps=50, guidance_scale=2.0):
86
+ seed_everything(seed)
87
+ generator = torch.Generator(device=self.device).manual_seed(int(seed))
88
+ res_img = self.pipe(pil_img,
89
+ num_inference_steps=steps,
90
+ guidance_scale=guidance_scale,
91
+ generator=generator).images
92
+ show_image = rearrange(np.asarray(res_img[0], dtype=np.uint8), '(n h) (m w) c -> (n m) h w c', n=3, m=2)
93
+ pils = [res_img[1]]+[Image.fromarray(show_image[idx]) for idx in self.order]
94
+ torch.cuda.empty_cache()
95
+ return res_img, pils
96
+
97
+
98
+ if __name__ == "__main__":
99
+ import argparse
100
+
101
+ def get_args():
102
+ parser = argparse.ArgumentParser()
103
+ parser.add_argument("--rgba_path", type=str, required=True)
104
+ parser.add_argument("--output_views_path", type=str, required=True)
105
+ parser.add_argument("--output_cond_path", type=str, required=True)
106
+ parser.add_argument("--seed", default=0, type=int)
107
+ parser.add_argument("--steps", default=50, type=int)
108
+ parser.add_argument("--device", default="cuda:0", type=str)
109
+ parser.add_argument("--use_lite", default='false', type=str)
110
+ return parser.parse_args()
111
+
112
+ args = get_args()
113
+
114
+ args.use_lite = str_to_bool(args.use_lite)
115
+
116
+ rgba_pil = Image.open(args.rgba_path)
117
+
118
+ assert rgba_pil.mode == "RGBA", "rgba_pil must be RGBA mode"
119
+
120
+ model = Image2Views(device=args.device, use_lite=args.use_lite)
121
+
122
+ (views_pil, cond), _ = model(rgba_pil, seed=args.seed, steps=args.steps)
123
+
124
+ views_pil.save(args.output_views_path)
125
+ cond.save(args.output_cond_path)
126
+
infer/.ipynb_checkpoints/removebg-checkpoint.py ADDED
@@ -0,0 +1,101 @@
1
+ import os, sys
2
+ sys.path.insert(0, f"{os.path.dirname(os.path.dirname(os.path.abspath(__file__)))}")
3
+
4
+ import numpy as np
5
+ from PIL import Image
6
+ from rembg import remove, new_session
7
+ from infer.utils import timing_decorator
8
+
9
+ class Removebg():
10
+ def __init__(self, name="u2net"):
11
+ self.session = new_session(name)
12
+
13
+ @timing_decorator("remove background")
14
+ def __call__(self, rgb_maybe, force=True):
15
+ '''
16
+ args:
17
+ rgb_maybe: PIL.Image, with RGB mode or RGBA mode
18
+ force: bool, if the input is RGBA, convert it to RGB and re-run background removal
19
+ return:
20
+ rgba_img: PIL.Image, with RGBA mode
21
+ '''
22
+ if rgb_maybe.mode == "RGBA":
23
+ if force:
24
+ rgb_maybe = rgb_maybe.convert("RGB")
25
+ rgba_img = remove(rgb_maybe, session=self.session)
26
+ else:
27
+ rgba_img = rgb_maybe
28
+ else:
29
+ rgba_img = remove(rgb_maybe, session=self.session)
30
+
31
+ rgba_img = white_out_background(rgba_img)
32
+
33
+ rgba_img = preprocess(rgba_img)
34
+
35
+ return rgba_img
36
+
37
+
38
+ def white_out_background(pil_img):
39
+ data = pil_img.getdata()
40
+ new_data = []
41
+ for r, g, b, a in data:
42
+ if a < 16: # background
43
+ new_data.append((255, 255, 255, 0)) # fully transparent white
44
+ else:
45
+ is_white = (r>235) and (g>235) and (b>235)
46
+ new_r = 235 if is_white else r
47
+ new_g = 235 if is_white else g
48
+ new_b = 235 if is_white else b
49
+ new_data.append((new_r, new_g, new_b, a))
50
+ pil_img.putdata(new_data)
51
+ return pil_img
52
+
53
+ def preprocess(rgba_img, size=(512,512), ratio=1.15):
54
+ image = np.asarray(rgba_img)
55
+ rgb, alpha = image[:,:,:3] / 255., image[:,:,3:] / 255.
56
+
57
+ # crop
58
+ coords = np.nonzero(alpha > 0.1)
59
+ x_min, x_max = coords[0].min(), coords[0].max()
60
+ y_min, y_max = coords[1].min(), coords[1].max()
61
+ rgb = (rgb[x_min:x_max, y_min:y_max, :] * 255).astype("uint8")
62
+ alpha = (alpha[x_min:x_max, y_min:y_max, 0] * 255).astype("uint8")
63
+
64
+ # padding
65
+ h, w = rgb.shape[:2]
66
+ resize_side = int(max(h, w) * ratio)
67
+ pad_h, pad_w = resize_side - h, resize_side - w
68
+ start_h, start_w = pad_h // 2, pad_w // 2
69
+ new_rgb = np.ones((resize_side, resize_side, 3), dtype=np.uint8) * 255
70
+ new_alpha = np.zeros((resize_side, resize_side), dtype=np.uint8)
71
+ new_rgb[start_h:start_h + h, start_w:start_w + w] = rgb
72
+ new_alpha[start_h:start_h + h, start_w:start_w + w] = alpha
73
+ rgba_array = np.concatenate((new_rgb, new_alpha[:,:,None]), axis=-1)
74
+
75
+ rgba_image = Image.fromarray(rgba_array, 'RGBA')
76
+ rgba_image = rgba_image.resize(size)
77
+ return rgba_image
78
+
79
+
80
+ if __name__ == "__main__":
81
+
82
+ import argparse
83
+
84
+ def get_args():
85
+ parser = argparse.ArgumentParser()
86
+ parser.add_argument("--rgb_path", type=str, required=True)
87
+ parser.add_argument("--output_rgba_path", type=str, required=True)
88
+ parser.add_argument("--force", default=False, action="store_true")
89
+ return parser.parse_args()
90
+
91
+ args = get_args()
92
+
93
+ rgb_maybe = Image.open(args.rgb_path)
94
+
95
+ model = Removebg()
96
+
97
+ rgba_pil = model(rgb_maybe, args.force)
98
+
99
+ rgba_pil.save(args.output_rgba_path)
100
+
101
+
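
For reference, the crop-and-pad arithmetic in the `preprocess` helper above, on illustrative numbers: a 100x200 cut-out object is centered on a 230x230 white canvas and the canvas is then resized to 512x512.

```python
h, w, ratio = 100, 200, 1.15                     # illustrative object size and padding ratio
resize_side = int(max(h, w) * ratio)             # 230: square canvas slightly larger than the object
pad_h, pad_w = resize_side - h, resize_side - w  # 130, 30
start_h, start_w = pad_h // 2, pad_w // 2        # (65, 15): object ends up centered
print(resize_side, (start_h, start_w))           # the canvas is finally resized to (512, 512)
```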
infer/.ipynb_checkpoints/text_to_image-checkpoint.py ADDED
@@ -0,0 +1,105 @@
1
+ # Open Source Model Licensed under the Apache License Version 2.0
2
+ # and Other Licenses of the Third-Party Components therein:
3
+ # The below Model in this distribution may have been modified by THL A29 Limited
4
+ # ("Tencent Modifications"). All Tencent Modifications are Copyright (C) 2024 THL A29 Limited.
5
+
6
+ # Copyright (C) 2024 THL A29 Limited, a Tencent company. All rights reserved.
7
+ # The below software and/or models in this distribution may have been
8
+ # modified by THL A29 Limited ("Tencent Modifications").
9
+ # All Tencent Modifications are Copyright (C) THL A29 Limited.
10
+
11
+ # Hunyuan 3D is licensed under the TENCENT HUNYUAN NON-COMMERCIAL LICENSE AGREEMENT
12
+ # except for the third-party components listed below.
13
+ # Hunyuan 3D does not impose any additional limitations beyond what is outlined
14
+ # in the respective licenses of these third-party components.
15
+ # Users must comply with all terms and conditions of original licenses of these third-party
16
+ # components and must ensure that the usage of the third party components adheres to
17
+ # all relevant laws and regulations.
18
+
19
+ # For avoidance of doubts, Hunyuan 3D means the large language models and
20
+ # their software and algorithms, including trained model weights, parameters (including
21
+ # optimizer states), machine-learning model code, inference-enabling code, training-enabling code,
22
+ # fine-tuning enabling code and other elements of the foregoing made publicly available
23
+ # by Tencent in accordance with TENCENT HUNYUAN COMMUNITY LICENSE AGREEMENT.
24
+ import os , sys
25
+ sys.path.insert(0, f"{os.path.dirname(os.path.dirname(os.path.abspath(__file__)))}")
26
+
27
+ import torch
28
+ from diffusers import HunyuanDiTPipeline, AutoPipelineForText2Image
29
+
30
+ from infer.utils import seed_everything, timing_decorator, auto_amp_inference
31
+ from infer.utils import get_parameter_number, set_parameter_grad_false
32
+
33
+
34
+ class Text2Image():
35
+ def __init__(self, pretrain="weights/hunyuanDiT", device="cuda:0", save_memory=None):
36
+ '''
37
+ save_memory: if set, keep the pipeline on CPU and move it to GPU only while generating (useful when GPU memory is low)
38
+ '''
39
+ self.save_memory = save_memory
40
+ self.device = device
41
+ self.pipe = AutoPipelineForText2Image.from_pretrained(
42
+ pretrain,
43
+ torch_dtype = torch.float16,
44
+ enable_pag = True,
45
+ pag_applied_layers = ["blocks.(16|17|18|19)"]
46
+ )
47
+ set_parameter_grad_false(self.pipe.transformer)
48
+ print('text2image transformer model', get_parameter_number(self.pipe.transformer))
49
+ if not save_memory:
50
+ self.pipe = self.pipe.to(device)
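+ # (translation of the Chinese negative prompt below) "text, close-up, cropped, out of frame,
+ # worst quality, low quality, JPEG artifacts, PGLY, duplicate, morbid, mutilated, extra fingers,
+ # mutated hands, poorly drawn hands, poorly drawn face, mutation, deformed, blurry, dehydrated,
+ # bad anatomy, bad proportions, extra limbs, cloned face, disfigured, gross proportions,
+ # malformed limbs, missing arms, missing legs, extra arms, extra legs, fused fingers,
+ # too many fingers, long neck"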
51
+ self.neg_txt = "文本,特写,裁剪,出框,最差质量,低质量,JPEG伪影,PGLY,重复,病态,残缺,多余的手指,变异的手," \
52
+ "画得不好的手,画得不好的脸,变异,畸形,模糊,脱水,糟糕的解剖学,糟糕的比例,多余的肢体,克隆的脸," \
53
+ "毁容,恶心的比例,畸形的肢体,缺失的手臂,缺失的腿,额外的手臂,额外的腿,融合的手指,手指太多,长脖子"
54
+
55
+ @torch.no_grad()
56
+ @timing_decorator('text to image')
57
+ @auto_amp_inference
58
+ def __call__(self, *args, **kwargs):
59
+ if self.save_memory:
60
+ self.pipe = self.pipe.to(self.device)
61
+ torch.cuda.empty_cache()
62
+ res = self.call(*args, **kwargs)
63
+ self.pipe = self.pipe.to("cpu")
64
+ else:
65
+ res = self.call(*args, **kwargs)
66
+ torch.cuda.empty_cache()
67
+ return res
68
+
69
+ def call(self, prompt, seed=0, steps=25):
70
+ '''
71
+ args:
72
+ prompt: str
73
+ seed: int
74
+ steps: int
75
+ return:
76
+ rgb: PIL.Image
77
+ '''
78
+ print("prompt is:", prompt)
79
+ prompt = prompt + ",白色背景,3D风格,最佳质量"  # append Chinese style suffix: "white background, 3D style, best quality"
80
+ seed_everything(seed)
81
+ generator = torch.Generator(device=self.device)
82
+ if seed is not None: generator = generator.manual_seed(int(seed))
83
+ rgb = self.pipe(prompt=prompt, negative_prompt=self.neg_txt, num_inference_steps=steps,
84
+ pag_scale=1.3, width=1024, height=1024, generator=generator, return_dict=False)[0][0]
85
+ torch.cuda.empty_cache()
86
+ return rgb
87
+
88
+ if __name__ == "__main__":
89
+ import argparse
90
+
91
+ def get_args():
92
+ parser = argparse.ArgumentParser()
93
+ parser.add_argument("--text2image_path", default="weights/hunyuanDiT", type=str)
94
+ parser.add_argument("--text_prompt", default="", type=str)
95
+ parser.add_argument("--output_img_path", default="./outputs/test/img.jpg", type=str)
96
+ parser.add_argument("--device", default="cuda:0", type=str)
97
+ parser.add_argument("--seed", default=0, type=int)
98
+ parser.add_argument("--steps", default=25, type=int)
99
+ return parser.parse_args()
100
+ args = get_args()
101
+
102
+ text2image_model = Text2Image(device=args.device)
103
+ rgb_img = text2image_model(args.text_prompt, seed=args.seed, steps=args.steps)
104
+ rgb_img.save(args.output_img_path)
105
+
infer/.ipynb_checkpoints/utils-checkpoint.py ADDED
@@ -0,0 +1,87 @@
1
+ # Open Source Model Licensed under the Apache License Version 2.0
2
+ # and Other Licenses of the Third-Party Components therein:
3
+ # The below Model in this distribution may have been modified by THL A29 Limited
4
+ # ("Tencent Modifications"). All Tencent Modifications are Copyright (C) 2024 THL A29 Limited.
5
+
6
+ # Copyright (C) 2024 THL A29 Limited, a Tencent company. All rights reserved.
7
+ # The below software and/or models in this distribution may have been
8
+ # modified by THL A29 Limited ("Tencent Modifications").
9
+ # All Tencent Modifications are Copyright (C) THL A29 Limited.
10
+
11
+ # Hunyuan 3D is licensed under the TENCENT HUNYUAN NON-COMMERCIAL LICENSE AGREEMENT
12
+ # except for the third-party components listed below.
13
+ # Hunyuan 3D does not impose any additional limitations beyond what is outlined
14
+ # in the respective licenses of these third-party components.
15
+ # Users must comply with all terms and conditions of original licenses of these third-party
16
+ # components and must ensure that the usage of the third party components adheres to
17
+ # all relevant laws and regulations.
18
+
19
+ # For avoidance of doubts, Hunyuan 3D means the large language models and
20
+ # their software and algorithms, including trained model weights, parameters (including
21
+ # optimizer states), machine-learning model code, inference-enabling code, training-enabling code,
22
+ # fine-tuning enabling code and other elements of the foregoing made publicly available
23
+ # by Tencent in accordance with TENCENT HUNYUAN COMMUNITY LICENSE AGREEMENT.
24
+
25
+ import os
26
+ import time
27
+ import random
28
+ import numpy as np
29
+ import torch
30
+ from torch.cuda.amp import autocast, GradScaler
31
+ from functools import wraps
32
+
33
+ def seed_everything(seed):
34
+ '''
35
+ seed python, numpy and torch for reproducible results
36
+ '''
37
+ random.seed(seed)
38
+ np.random.seed(seed)
39
+ torch.manual_seed(seed)
40
+ os.environ["PL_GLOBAL_SEED"] = str(seed)
41
+
42
+ def timing_decorator(category: str):
43
+ '''
44
+ timing_decorator: print the wall-clock time of each call, tagged with `category`
45
+ '''
46
+ def decorator(func):
47
+ func.call_count = 0
48
+ @wraps(func)
49
+ def wrapper(*args, **kwargs):
50
+ start_time = time.time()
51
+ result = func(*args, **kwargs)
52
+ end_time = time.time()
53
+ elapsed_time = end_time - start_time
54
+ func.call_count += 1
55
+ print(f"[HunYuan3D]-[{category}], cost time: {elapsed_time:.4f}s") # huiwen
56
+ return result
57
+ return wrapper
58
+ return decorator
59
+
60
+ def auto_amp_inference(func):
61
+ '''
62
+ run the wrapped function under torch.cuda.amp.autocast()
63
+ so inference uses automatic mixed precision
64
+ '''
65
+ @wraps(func)
66
+ def wrapper(*args, **kwargs):
67
+ with autocast():
68
+ output = func(*args, **kwargs)
69
+ return output
70
+ return wrapper
71
+
72
+ def get_parameter_number(model):
73
+ total_num = sum(p.numel() for p in model.parameters())
74
+ trainable_num = sum(p.numel() for p in model.parameters() if p.requires_grad)
75
+ return {'Total': total_num, 'Trainable': trainable_num}
76
+
77
+ def set_parameter_grad_false(model):
78
+ for p in model.parameters():
79
+ p.requires_grad = False
80
+
81
+ def str_to_bool(s):
82
+ if s.lower() in ['true', 't', 'yes', 'y', '1']:
83
+ return True
84
+ elif s.lower() in ['false', 'f', 'no', 'n', '0']:
85
+ return False
86
+ else:
87
+ raise ValueError("bool arg must be one of ['true', 't', 'yes', 'y', '1', 'false', 'f', 'no', 'n', '0']")
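
A hedged usage sketch for the helpers above (assumes the repo root is on `PYTHONPATH`; the function and tensors are made up for illustration):

```python
import torch
from infer.utils import seed_everything, timing_decorator, auto_amp_inference

@timing_decorator("matmul demo")   # prints "[HunYuan3D]-[matmul demo], cost time: ..."
@auto_amp_inference                # runs the call under torch.cuda.amp.autocast()
def demo_forward(x, w):
    return x @ w

seed_everything(0)
device = "cuda" if torch.cuda.is_available() else "cpu"
x = torch.randn(8, 16, device=device)
w = torch.randn(16, 4, device=device)
y = demo_forward(x, w)
print(y.shape)  # torch.Size([8, 4])
```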
infer/.ipynb_checkpoints/views_to_mesh-checkpoint.py ADDED
@@ -0,0 +1,154 @@
1
+ # Open Source Model Licensed under the Apache License Version 2.0
2
+ # and Other Licenses of the Third-Party Components therein:
3
+ # The below Model in this distribution may have been modified by THL A29 Limited
4
+ # ("Tencent Modifications"). All Tencent Modifications are Copyright (C) 2024 THL A29 Limited.
5
+
6
+ # Copyright (C) 2024 THL A29 Limited, a Tencent company. All rights reserved.
7
+ # The below software and/or models in this distribution may have been
8
+ # modified by THL A29 Limited ("Tencent Modifications").
9
+ # All Tencent Modifications are Copyright (C) THL A29 Limited.
10
+
11
+ # Hunyuan 3D is licensed under the TENCENT HUNYUAN NON-COMMERCIAL LICENSE AGREEMENT
12
+ # except for the third-party components listed below.
13
+ # Hunyuan 3D does not impose any additional limitations beyond what is outlined
14
+ # in the respective licenses of these third-party components.
15
+ # Users must comply with all terms and conditions of original licenses of these third-party
16
+ # components and must ensure that the usage of the third party components adheres to
17
+ # all relevant laws and regulations.
18
+
19
+ # For avoidance of doubts, Hunyuan 3D means the large language models and
20
+ # their software and algorithms, including trained model weights, parameters (including
21
+ # optimizer states), machine-learning model code, inference-enabling code, training-enabling code,
22
+ # fine-tuning enabling code and other elements of the foregoing made publicly available
23
+ # by Tencent in accordance with TENCENT HUNYUAN COMMUNITY LICENSE AGREEMENT.
24
+
25
+ import os, sys
26
+ sys.path.insert(0, f"{os.path.dirname(os.path.dirname(os.path.abspath(__file__)))}")
27
+
28
+ import time
29
+ import torch
30
+ import random
31
+ import numpy as np
32
+ from PIL import Image
33
+ from einops import rearrange
34
+ from PIL import Image, ImageSequence
35
+
36
+ from infer.utils import seed_everything, timing_decorator, auto_amp_inference
37
+ from infer.utils import get_parameter_number, set_parameter_grad_false, str_to_bool
38
+ from svrm.predictor import MV23DPredictor
39
+
40
+
41
+ class Views2Mesh():
42
+ def __init__(self, mv23d_cfg_path, mv23d_ckt_path,
43
+ device="cuda:0", use_lite=False, save_memory=False):
44
+ '''
45
+ mv23d_cfg_path: config yaml file
46
+ mv23d_ckt_path: path to ckpt
47
+ use_lite: use the lite multi-view model (changes the expected view order)
48
+ save_memory: keep the model on CPU and move it to GPU only during inference
49
+ '''
50
+ self.mv23d_predictor = MV23DPredictor(mv23d_ckt_path, mv23d_cfg_path, device=device)
51
+ self.mv23d_predictor.model.eval()
52
+ self.order = [0, 1, 2, 3, 4, 5] if use_lite else [0, 2, 4, 5, 3, 1]
53
+ self.device = device
54
+ self.save_memory = save_memory
55
+ set_parameter_grad_false(self.mv23d_predictor.model)
56
+ print('view2mesh model', get_parameter_number(self.mv23d_predictor.model))
57
+
58
+ @torch.no_grad()
59
+ @timing_decorator("views to mesh")
60
+ @auto_amp_inference
61
+ def __call__(self, *args, **kwargs):
62
+ if self.save_memory:
63
+ self.mv23d_predictor.model = self.mv23d_predictor.model.to(self.device)
64
+ torch.cuda.empty_cache()
65
+ res = self.call(*args, **kwargs)
66
+ self.mv23d_predictor.model = self.mv23d_predictor.model.to("cpu")
67
+ else:
68
+ res = self.call(*args, **kwargs)
69
+ torch.cuda.empty_cache()
70
+ return res
71
+
72
+ def call(
73
+ self,
74
+ views_pil=None,
75
+ cond_pil=None,
76
+ gif_pil=None,
77
+ seed=0,
78
+ target_face_count = 10000,
79
+ do_texture_mapping = True,
80
+ save_folder='./outputs/test'
81
+ ):
82
+ '''
83
+ set views_pil and cond_pil together, or set gif_pil only
84
+ seed: int
85
+ target_face_count: int
86
+ save_folder: path to save mesh files
87
+ '''
88
+ save_dir = save_folder
89
+ os.makedirs(save_dir, exist_ok=True)
90
+
91
+ if views_pil is not None and cond_pil is not None:
92
+ show_image = rearrange(np.asarray(views_pil, dtype=np.uint8),
93
+ '(n h) (m w) c -> (n m) h w c', n=3, m=2)
94
+ views = [Image.fromarray(show_image[idx]) for idx in self.order]
95
+ image_list = [cond_pil]+ views
96
+ image_list = [img.convert('RGB') for img in image_list]
97
+ elif gif_pil is not None:
98
+ image_list = [img.convert('RGB') for img in ImageSequence.Iterator(gif_pil)]
99
+
100
+ image_input = image_list[0]
101
+ image_list = image_list[1:] + image_list[:1]
102
+
103
+ seed_everything(seed)
104
+ self.mv23d_predictor.predict(
105
+ image_list,
106
+ save_dir = save_dir,
107
+ image_input = image_input,
108
+ target_face_count = target_face_count,
109
+ do_texture_mapping = do_texture_mapping
110
+ )
111
+ torch.cuda.empty_cache()
112
+ return save_dir
113
+
114
+
115
+ if __name__ == "__main__":
116
+
117
+ import argparse
118
+
119
+ def get_args():
120
+ parser = argparse.ArgumentParser()
121
+ parser.add_argument("--views_path", type=str, required=True)
122
+ parser.add_argument("--cond_path", type=str, required=True)
123
+ parser.add_argument("--save_folder", default="./outputs/test/", type=str)
124
+ parser.add_argument("--mv23d_cfg_path", default="./svrm/configs/svrm.yaml", type=str)
125
+ parser.add_argument("--mv23d_ckt_path", default="weights/svrm/svrm.safetensors", type=str)
126
+ parser.add_argument("--max_faces_num", default=90000, type=int,
127
+ help="max num of face, suggest 90000 for effect, 10000 for speed")
128
+ parser.add_argument("--device", default="cuda:0", type=str)
129
+ parser.add_argument("--use_lite", default='false', type=str)
130
+ parser.add_argument("--do_texture_mapping", default='false', type=str)
131
+
132
+ return parser.parse_args()
133
+
134
+ args = get_args()
135
+ args.use_lite = str_to_bool(args.use_lite)
136
+ args.do_texture_mapping = str_to_bool(args.do_texture_mapping)
137
+
138
+ views = Image.open(args.views_path)
139
+ cond = Image.open(args.cond_path)
140
+
141
+ views_to_mesh_model = Views2Mesh(
142
+ args.mv23d_cfg_path,
143
+ args.mv23d_ckt_path,
144
+ device = args.device,
145
+ use_lite = args.use_lite
146
+ )
147
+
148
+ views_to_mesh_model(
149
+ views, cond, 0,
150
+ target_face_count = args.max_faces_num,
151
+ save_folder = args.save_folder,
152
+ do_texture_mapping = args.do_texture_mapping
153
+ )
154
+
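
A small illustration of how the 3x2 view grid produced by the multi-view diffusion model is split back into six single views, using the same `rearrange` pattern and view order as `Image2Views` and `Views2Mesh` above; the grid here is a dummy array:

```python
import numpy as np
from einops import rearrange

grid = np.zeros((3 * 256, 2 * 256, 3), dtype=np.uint8)   # stand-in for the generated grid image
views = rearrange(grid, '(n h) (m w) c -> (n m) h w c', n=3, m=2)
print(views.shape)            # (6, 256, 256, 3)
order = [0, 2, 4, 5, 3, 1]    # std model; the lite model keeps [0, 1, 2, 3, 4, 5]
views = views[order]          # reorder tiles into azimuth order
```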
infer/__init__.py CHANGED
@@ -1,5 +1,7 @@
1
- # Open Source Model Licensed under the Apache License Version 2.0 and Other Licenses of the Third-Party Components therein:
2
- # The below Model in this distribution may have been modified by THL A29 Limited ("Tencent Modifications"). All Tencent Modifications are Copyright (C) 2024 THL A29 Limited.
 
 
3
 
4
  # Copyright (C) 2024 THL A29 Limited, a Tencent company. All rights reserved.
5
  # The below software and/or models in this distribution may have been
 
1
+ # Open Source Model Licensed under the Apache License Version 2.0
2
+ # and Other Licenses of the Third-Party Components therein:
3
+ # The below Model in this distribution may have been modified by THL A29 Limited
4
+ # ("Tencent Modifications"). All Tencent Modifications are Copyright (C) 2024 THL A29 Limited.
5
 
6
  # Copyright (C) 2024 THL A29 Limited, a Tencent company. All rights reserved.
7
  # The below software and/or models in this distribution may have been
infer/__pycache__/__init__.cpython-38.pyc CHANGED
Binary files a/infer/__pycache__/__init__.cpython-38.pyc and b/infer/__pycache__/__init__.cpython-38.pyc differ
 
infer/__pycache__/gif_render.cpython-38.pyc CHANGED
Binary files a/infer/__pycache__/gif_render.cpython-38.pyc and b/infer/__pycache__/gif_render.cpython-38.pyc differ
 
infer/__pycache__/image_to_views.cpython-38.pyc CHANGED
Binary files a/infer/__pycache__/image_to_views.cpython-38.pyc and b/infer/__pycache__/image_to_views.cpython-38.pyc differ
 
infer/__pycache__/removebg.cpython-38.pyc CHANGED
Binary files a/infer/__pycache__/removebg.cpython-38.pyc and b/infer/__pycache__/removebg.cpython-38.pyc differ
 
infer/__pycache__/text_to_image.cpython-38.pyc CHANGED
Binary files a/infer/__pycache__/text_to_image.cpython-38.pyc and b/infer/__pycache__/text_to_image.cpython-38.pyc differ
 
infer/__pycache__/utils.cpython-38.pyc CHANGED
Binary files a/infer/__pycache__/utils.cpython-38.pyc and b/infer/__pycache__/utils.cpython-38.pyc differ
 
infer/__pycache__/views_to_mesh.cpython-38.pyc CHANGED
Binary files a/infer/__pycache__/views_to_mesh.cpython-38.pyc and b/infer/__pycache__/views_to_mesh.cpython-38.pyc differ
 
infer/gif_render.py CHANGED
@@ -1,5 +1,7 @@
1
- # Open Source Model Licensed under the Apache License Version 2.0 and Other Licenses of the Third-Party Components therein:
2
- # The below Model in this distribution may have been modified by THL A29 Limited ("Tencent Modifications"). All Tencent Modifications are Copyright (C) 2024 THL A29 Limited.
 
 
3
 
4
  # Copyright (C) 2024 THL A29 Limited, a Tencent company. All rights reserved.
5
  # The below software and/or models in this distribution may have been
 
1
+ # Open Source Model Licensed under the Apache License Version 2.0
2
+ # and Other Licenses of the Third-Party Components therein:
3
+ # The below Model in this distribution may have been modified by THL A29 Limited
4
+ # ("Tencent Modifications"). All Tencent Modifications are Copyright (C) 2024 THL A29 Limited.
5
 
6
  # Copyright (C) 2024 THL A29 Limited, a Tencent company. All rights reserved.
7
  # The below software and/or models in this distribution may have been
infer/image_to_views.py CHANGED
@@ -1,5 +1,7 @@
1
- # Open Source Model Licensed under the Apache License Version 2.0 and Other Licenses of the Third-Party Components therein:
2
- # The below Model in this distribution may have been modified by THL A29 Limited ("Tencent Modifications"). All Tencent Modifications are Copyright (C) 2024 THL A29 Limited.
 
 
3
 
4
  # Copyright (C) 2024 THL A29 Limited, a Tencent company. All rights reserved.
5
  # The below software and/or models in this distribution may have been
 
1
+ # Open Source Model Licensed under the Apache License Version 2.0
2
+ # and Other Licenses of the Third-Party Components therein:
3
+ # The below Model in this distribution may have been modified by THL A29 Limited
4
+ # ("Tencent Modifications"). All Tencent Modifications are Copyright (C) 2024 THL A29 Limited.
5
 
6
  # Copyright (C) 2024 THL A29 Limited, a Tencent company. All rights reserved.
7
  # The below software and/or models in this distribution may have been
infer/text_to_image.py CHANGED
@@ -1,5 +1,7 @@
1
- # Open Source Model Licensed under the Apache License Version 2.0 and Other Licenses of the Third-Party Components therein:
2
- # The below Model in this distribution may have been modified by THL A29 Limited ("Tencent Modifications"). All Tencent Modifications are Copyright (C) 2024 THL A29 Limited.
 
 
3
 
4
  # Copyright (C) 2024 THL A29 Limited, a Tencent company. All rights reserved.
5
  # The below software and/or models in this distribution may have been
 
1
+ # Open Source Model Licensed under the Apache License Version 2.0
2
+ # and Other Licenses of the Third-Party Components therein:
3
+ # The below Model in this distribution may have been modified by THL A29 Limited
4
+ # ("Tencent Modifications"). All Tencent Modifications are Copyright (C) 2024 THL A29 Limited.
5
 
6
  # Copyright (C) 2024 THL A29 Limited, a Tencent company. All rights reserved.
7
  # The below software and/or models in this distribution may have been
infer/utils.py CHANGED
@@ -1,5 +1,7 @@
1
- # Open Source Model Licensed under the Apache License Version 2.0 and Other Licenses of the Third-Party Components therein:
2
- # The below Model in this distribution may have been modified by THL A29 Limited ("Tencent Modifications"). All Tencent Modifications are Copyright (C) 2024 THL A29 Limited.
 
 
3
 
4
  # Copyright (C) 2024 THL A29 Limited, a Tencent company. All rights reserved.
5
  # The below software and/or models in this distribution may have been
 
1
+ # Open Source Model Licensed under the Apache License Version 2.0
2
+ # and Other Licenses of the Third-Party Components therein:
3
+ # The below Model in this distribution may have been modified by THL A29 Limited
4
+ # ("Tencent Modifications"). All Tencent Modifications are Copyright (C) 2024 THL A29 Limited.
5
 
6
  # Copyright (C) 2024 THL A29 Limited, a Tencent company. All rights reserved.
7
  # The below software and/or models in this distribution may have been
infer/views_to_mesh.py CHANGED
@@ -1,5 +1,7 @@
1
- # Open Source Model Licensed under the Apache License Version 2.0 and Other Licenses of the Third-Party Components therein:
2
- # The below Model in this distribution may have been modified by THL A29 Limited ("Tencent Modifications"). All Tencent Modifications are Copyright (C) 2024 THL A29 Limited.
 
 
3
 
4
  # Copyright (C) 2024 THL A29 Limited, a Tencent company. All rights reserved.
5
  # The below software and/or models in this distribution may have been
 
1
+ # Open Source Model Licensed under the Apache License Version 2.0
2
+ # and Other Licenses of the Third-Party Components therein:
3
+ # The below Model in this distribution may have been modified by THL A29 Limited
4
+ # ("Tencent Modifications"). All Tencent Modifications are Copyright (C) 2024 THL A29 Limited.
5
 
6
  # Copyright (C) 2024 THL A29 Limited, a Tencent company. All rights reserved.
7
  # The below software and/or models in this distribution may have been
main.py CHANGED
@@ -1,5 +1,7 @@
1
- # Open Source Model Licensed under the Apache License Version 2.0 and Other Licenses of the Third-Party Components therein:
2
- # The below Model in this distribution may have been modified by THL A29 Limited ("Tencent Modifications"). All Tencent Modifications are Copyright (C) 2024 THL A29 Limited.
 
 
3
 
4
  # Copyright (C) 2024 THL A29 Limited, a Tencent company. All rights reserved.
5
  # The below software and/or models in this distribution may have been
 
1
+ # Open Source Model Licensed under the Apache License Version 2.0
2
+ # and Other Licenses of the Third-Party Components therein:
3
+ # The below Model in this distribution may have been modified by THL A29 Limited
4
+ # ("Tencent Modifications"). All Tencent Modifications are Copyright (C) 2024 THL A29 Limited.
5
 
6
  # Copyright (C) 2024 THL A29 Limited, a Tencent company. All rights reserved.
7
  # The below software and/or models in this distribution may have been
mvd/.ipynb_checkpoints/hunyuan3d_mvd_lite_pipeline-checkpoint.py ADDED
@@ -0,0 +1,392 @@
1
+ # Open Source Model Licensed under the Apache License Version 2.0
2
+ # and Other Licenses of the Third-Party Components therein:
3
+ # The below Model in this distribution may have been modified by THL A29 Limited
4
+ # ("Tencent Modifications"). All Tencent Modifications are Copyright (C) 2024 THL A29 Limited.
5
+
6
+ # Copyright (C) 2024 THL A29 Limited, a Tencent company. All rights reserved.
7
+ # The below software and/or models in this distribution may have been
8
+ # modified by THL A29 Limited ("Tencent Modifications").
9
+ # All Tencent Modifications are Copyright (C) THL A29 Limited.
10
+
11
+ # Hunyuan 3D is licensed under the TENCENT HUNYUAN NON-COMMERCIAL LICENSE AGREEMENT
12
+ # except for the third-party components listed below.
13
+ # Hunyuan 3D does not impose any additional limitations beyond what is outlined
14
+ # in the respective licenses of these third-party components.
15
+ # Users must comply with all terms and conditions of original licenses of these third-party
16
+ # components and must ensure that the usage of the third party components adheres to
17
+ # all relevant laws and regulations.
18
+
19
+ # For avoidance of doubts, Hunyuan 3D means the large language models and
20
+ # their software and algorithms, including trained model weights, parameters (including
21
+ # optimizer states), machine-learning model code, inference-enabling code, training-enabling code,
22
+ # fine-tuning enabling code and other elements of the foregoing made publicly available
23
+ # by Tencent in accordance with TENCENT HUNYUAN COMMUNITY LICENSE AGREEMENT.
24
+
25
+ import math
26
+ import numpy
27
+ import torch
28
+ import inspect
29
+ import warnings
30
+ from PIL import Image
31
+ from einops import rearrange
32
+ import torch.nn.functional as F
33
+ from diffusers.utils.torch_utils import randn_tensor
34
+ from diffusers.configuration_utils import FrozenDict
35
+ from diffusers.image_processor import VaeImageProcessor
36
+ from typing import Any, Callable, Dict, List, Optional, Union
37
+ from diffusers.models import AutoencoderKL, UNet2DConditionModel
38
+ from diffusers.schedulers import KarrasDiffusionSchedulers
39
+ from diffusers.pipelines.pipeline_utils import DiffusionPipeline
40
+ from diffusers.pipelines.stable_diffusion import StableDiffusionPipelineOutput
41
+ from diffusers import DDPMScheduler, EulerAncestralDiscreteScheduler, ImagePipelineOutput
42
+ from diffusers.loaders import (
43
+ FromSingleFileMixin,
44
+ LoraLoaderMixin,
45
+ TextualInversionLoaderMixin
46
+ )
47
+ from transformers import (
48
+ CLIPImageProcessor,
49
+ CLIPTextModel,
50
+ CLIPTokenizer,
51
+ CLIPVisionModelWithProjection
52
+ )
53
+ from diffusers.models.attention_processor import (
54
+ Attention,
55
+ AttnProcessor,
56
+ XFormersAttnProcessor,
57
+ AttnProcessor2_0
58
+ )
59
+
60
+ from .utils import to_rgb_image, white_out_background, recenter_img
61
+
62
+
63
+ EXAMPLE_DOC_STRING = """
64
+ Examples:
65
+ ```py
66
+ >>> import torch
+ >>> from PIL import Image
67
+ >>> from mvd.hunyuan3d_mvd_lite_pipeline import Hunyuan3d_MVD_Lite_Pipeline
68
+
69
+ >>> pipe = Hunyuan3d_MVD_Lite_Pipeline.from_pretrained(
70
+ ... "weights/mvd_lite", torch_dtype=torch.float16
71
+ ... )
72
+ >>> pipe.to("cuda")
73
+
74
+ >>> img = Image.open("demo.png")
75
+ >>> res_img = pipe(img).images[0]
+ ```
76
+ """
77
+
78
+ def unscale_latents(latents): return latents / 0.75 + 0.22
79
+ def unscale_image (image ): return image / 0.50 * 0.80
80
+
81
+
82
+ def rescale_noise_cfg(noise_cfg, noise_pred_text, guidance_rescale=0.0):
83
+ std_text = noise_pred_text.std(dim=list(range(1, noise_pred_text.ndim)), keepdim=True)
84
+ std_cfg = noise_cfg.std(dim=list(range(1, noise_cfg.ndim)), keepdim=True)
85
+ noise_pred_rescaled = noise_cfg * (std_text / std_cfg)
86
+ noise_cfg = guidance_rescale * noise_pred_rescaled + (1 - guidance_rescale) * noise_cfg
87
+ return noise_cfg
88
+
89
+
90
+
91
+ class ReferenceOnlyAttnProc(torch.nn.Module):
92
+ # reference attention
93
+ def __init__(self, chained_proc, enabled=False, name=None):
94
+ super().__init__()
95
+ self.enabled = enabled
96
+ self.chained_proc = chained_proc
97
+ self.name = name
98
+
99
+ def __call__(self, attn, hidden_states, encoder_hidden_states=None, attention_mask=None, mode="w", ref_dict=None):
100
+ if encoder_hidden_states is None: encoder_hidden_states = hidden_states
101
+ if self.enabled:
102
+ if mode == 'w':
103
+ ref_dict[self.name] = encoder_hidden_states
104
+ elif mode == 'r':
105
+ encoder_hidden_states = torch.cat([encoder_hidden_states, ref_dict.pop(self.name)], dim=1)
106
+ res = self.chained_proc(attn, hidden_states, encoder_hidden_states, attention_mask)
107
+ return res
108
+
109
+
110
+ class RefOnlyNoisedUNet(torch.nn.Module):
111
+ def __init__(self, unet, train_sched, val_sched):
112
+ super().__init__()
113
+ self.unet = unet
114
+ self.train_sched = train_sched
115
+ self.val_sched = val_sched
116
+
117
+ unet_lora_attn_procs = dict()
118
+ for name, _ in unet.attn_processors.items():
119
+ unet_lora_attn_procs[name] = ReferenceOnlyAttnProc(AttnProcessor2_0(),
120
+ enabled=name.endswith("attn1.processor"),
121
+ name=name)
122
+ unet.set_attn_processor(unet_lora_attn_procs)
123
+
124
+ def __getattr__(self, name: str):
125
+ try:
126
+ return super().__getattr__(name)
127
+ except AttributeError:
128
+ return getattr(self.unet, name)
129
+
130
+ def forward(self, sample, timestep, encoder_hidden_states, *args, cross_attention_kwargs, **kwargs):
131
+ cond_lat = cross_attention_kwargs['cond_lat']
132
+ noise = torch.randn_like(cond_lat)
133
+ if self.training:
134
+ noisy_cond_lat = self.train_sched.add_noise(cond_lat, noise, timestep)
135
+ noisy_cond_lat = self.train_sched.scale_model_input(noisy_cond_lat, timestep)
136
+ else:
137
+ noisy_cond_lat = self.val_sched.add_noise(cond_lat, noise, timestep.reshape(-1))
138
+ noisy_cond_lat = self.val_sched.scale_model_input(noisy_cond_lat, timestep.reshape(-1))
139
+
140
+ ref_dict = {}
141
+ self.unet(noisy_cond_lat,
142
+ timestep,
143
+ encoder_hidden_states,
144
+ *args,
145
+ cross_attention_kwargs=dict(mode="w", ref_dict=ref_dict),
146
+ **kwargs)
147
+ return self.unet(sample,
148
+ timestep,
149
+ encoder_hidden_states,
150
+ *args,
151
+ cross_attention_kwargs=dict(mode="r", ref_dict=ref_dict),
152
+ **kwargs)
153
+
154
+
155
+ class Hunyuan3d_MVD_Lite_Pipeline(DiffusionPipeline, TextualInversionLoaderMixin, LoraLoaderMixin, FromSingleFileMixin):
156
+ def __init__(
157
+ self,
158
+ vae: AutoencoderKL,
159
+ text_encoder: CLIPTextModel,
160
+ tokenizer: CLIPTokenizer,
161
+ unet: UNet2DConditionModel,
162
+ scheduler: KarrasDiffusionSchedulers,
163
+ vision_encoder: CLIPVisionModelWithProjection,
164
+ feature_extractor_clip: CLIPImageProcessor,
165
+ feature_extractor_vae: CLIPImageProcessor,
166
+ ramping_coefficients: Optional[list] = None,
167
+ safety_checker=None,
168
+ ):
169
+ DiffusionPipeline.__init__(self)
170
+ self.register_modules(
171
+ vae=vae,
172
+ unet=unet,
173
+ tokenizer=tokenizer,
174
+ scheduler=scheduler,
175
+ text_encoder=text_encoder,
176
+ vision_encoder=vision_encoder,
177
+ feature_extractor_vae=feature_extractor_vae,
178
+ feature_extractor_clip=feature_extractor_clip
179
+ )
180
+ # rewrite the stable diffusion pipeline
181
+ # vae: vae
182
+ # unet: unet
183
+ # tokenizer: tokenizer
184
+ # scheduler: scheduler
185
+ # text_encoder: text_encoder
186
+ # vision_encoder: vision_encoder
187
+ # feature_extractor_vae: feature_extractor_vae
188
+ # feature_extractor_clip: feature_extractor_clip
189
+ self.register_to_config(ramping_coefficients=ramping_coefficients)
190
+ self.vae_scale_factor = 2 ** (len(self.vae.config.block_out_channels) - 1)
191
+ self.image_processor = VaeImageProcessor(vae_scale_factor=self.vae_scale_factor)
192
+
193
+ def prepare_extra_step_kwargs(self, generator, eta):
194
+ extra_step_kwargs = {}
195
+ accepts_eta = "eta" in set(inspect.signature(self.scheduler.step).parameters.keys())
196
+ if accepts_eta: extra_step_kwargs["eta"] = eta
197
+
198
+ accepts_generator = "generator" in set(inspect.signature(self.scheduler.step).parameters.keys())
199
+ if accepts_generator: extra_step_kwargs["generator"] = generator
200
+ return extra_step_kwargs
201
+
202
+ def prepare_latents(self, batch_size, num_channels_latents, height, width, dtype, device, generator, latents=None):
203
+ shape = (batch_size, num_channels_latents, height // self.vae_scale_factor, width // self.vae_scale_factor)
204
+ latents = randn_tensor(shape, generator=generator, device=device, dtype=dtype)
205
+ latents = latents * self.scheduler.init_noise_sigma
206
+ return latents
207
+
208
+ @torch.no_grad()
209
+ def _encode_prompt(
210
+ self,
211
+ prompt,
212
+ device,
213
+ num_images_per_prompt,
214
+ do_classifier_free_guidance,
215
+ negative_prompt=None,
216
+ prompt_embeds: Optional[torch.FloatTensor] = None,
217
+ negative_prompt_embeds: Optional[torch.FloatTensor] = None,
218
+ lora_scale: Optional[float] = None,
219
+ ):
220
+ if lora_scale is not None and isinstance(self, LoraLoaderMixin):
221
+ self._lora_scale = lora_scale
222
+
223
+ if prompt is not None and isinstance(prompt, str):
224
+ batch_size = 1
225
+ elif prompt is not None and isinstance(prompt, list):
226
+ batch_size = len(prompt)
227
+ else:
228
+ batch_size = prompt_embeds.shape[0]
229
+
230
+ if prompt_embeds is None:
231
+ if isinstance(self, TextualInversionLoaderMixin):
232
+ prompt = self.maybe_convert_prompt(prompt, self.tokenizer)
233
+
234
+ text_inputs = self.tokenizer(
235
+ prompt,
236
+ padding="max_length",
237
+ max_length=self.tokenizer.model_max_length,
238
+ truncation=True,
239
+ return_tensors="pt",
240
+ )
241
+ text_input_ids = text_inputs.input_ids
242
+
243
+ if hasattr(self.text_encoder.config, "use_attention_mask") and self.text_encoder.config.use_attention_mask:
244
+ attention_mask = text_inputs.attention_mask.to(device)
245
+ else:
246
+ attention_mask = None
247
+
248
+ prompt_embeds = self.text_encoder(text_input_ids.to(device), attention_mask=attention_mask)[0]
249
+
250
+ if self.text_encoder is not None:
251
+ prompt_embeds_dtype = self.text_encoder.dtype
252
+ elif self.unet is not None:
253
+ prompt_embeds_dtype = self.unet.dtype
254
+ else:
255
+ prompt_embeds_dtype = prompt_embeds.dtype
256
+
257
+ prompt_embeds = prompt_embeds.to(dtype=prompt_embeds_dtype, device=device)
258
+ bs_embed, seq_len, _ = prompt_embeds.shape
259
+ prompt_embeds = prompt_embeds.repeat(1, num_images_per_prompt, 1)
260
+ prompt_embeds = prompt_embeds.view(bs_embed * num_images_per_prompt, seq_len, -1)
261
+
262
+ if do_classifier_free_guidance and negative_prompt_embeds is None:
263
+ uncond_tokens: List[str]
264
+ if negative_prompt is None: uncond_tokens = [""] * batch_size
265
+ elif prompt is not None and type(prompt) is not type(negative_prompt): raise TypeError()
266
+ elif isinstance(negative_prompt, str): uncond_tokens = [negative_prompt]
267
+ elif batch_size != len(negative_prompt): raise ValueError()
268
+ else: uncond_tokens = negative_prompt
269
+ if isinstance(self, TextualInversionLoaderMixin):
270
+ uncond_tokens = self.maybe_convert_prompt(uncond_tokens, self.tokenizer)
271
+
272
+ max_length = prompt_embeds.shape[1]
273
+ uncond_input = self.tokenizer(uncond_tokens,
274
+ padding="max_length",
275
+ max_length=max_length,
276
+ truncation=True,
277
+ return_tensors="pt")
278
+
279
+ if hasattr(self.text_encoder.config, "use_attention_mask") and self.text_encoder.config.use_attention_mask:
280
+ attention_mask = uncond_input.attention_mask.to(device)
281
+ else:
282
+ attention_mask = None
283
+
284
+ negative_prompt_embeds = self.text_encoder(uncond_input.input_ids.to(device), attention_mask=attention_mask)
285
+ negative_prompt_embeds = negative_prompt_embeds[0]
286
+
287
+ if do_classifier_free_guidance:
288
+ seq_len = negative_prompt_embeds.shape[1]
289
+ negative_prompt_embeds = negative_prompt_embeds.to(dtype=prompt_embeds_dtype, device=device)
290
+ negative_prompt_embeds = negative_prompt_embeds.repeat(1, num_images_per_prompt, 1)
291
+ negative_prompt_embeds = negative_prompt_embeds.view(batch_size * num_images_per_prompt, seq_len, -1)
292
+ prompt_embeds = torch.cat([negative_prompt_embeds, prompt_embeds])
293
+
294
+ return prompt_embeds
295
+
296
+ @torch.no_grad()
297
+ def encode_condition_image(self, image: torch.Tensor): return self.vae.encode(image).latent_dist.sample()
298
+
299
+ @torch.no_grad()
300
+ def __call__(self, image=None,
301
+ width=640,
302
+ height=960,
303
+ num_inference_steps=75,
304
+ return_dict=True,
305
+ generator=None,
306
+ **kwargs):
307
+ batch_size = 1
308
+ num_images_per_prompt = 1
309
+ output_type = 'pil'
310
+ do_classifier_free_guidance = True
311
+ guidance_rescale = 0.
312
+ if isinstance(self.unet, UNet2DConditionModel):
313
+ self.unet = RefOnlyNoisedUNet(self.unet, None, self.scheduler).eval()
314
+
315
+ cond_image = recenter_img(image)
316
+ cond_image = to_rgb_image(image)
317
+ image = cond_image
318
+ image_1 = self.feature_extractor_vae(images=image, return_tensors="pt").pixel_values
319
+ image_2 = self.feature_extractor_clip(images=image, return_tensors="pt").pixel_values
320
+ image_1 = image_1.to(device=self.vae.device, dtype=self.vae.dtype)
321
+ image_2 = image_2.to(device=self.vae.device, dtype=self.vae.dtype)
322
+
323
+ cond_lat = self.encode_condition_image(image_1)
324
+ negative_lat = self.encode_condition_image(torch.zeros_like(image_1))
325
+ cond_lat = torch.cat([negative_lat, cond_lat])
326
+ cross_attention_kwargs = dict(cond_lat=cond_lat)
327
+
328
+ global_embeds = self.vision_encoder(image_2, output_hidden_states=False).image_embeds.unsqueeze(-2)
329
+ encoder_hidden_states = self._encode_prompt('', self.device, num_images_per_prompt, False)
330
+ ramp = global_embeds.new_tensor(self.config.ramping_coefficients).unsqueeze(-1)
331
+ prompt_embeds = torch.cat([encoder_hidden_states, encoder_hidden_states + global_embeds * ramp])
332
+
333
+ device = self._execution_device
334
+ self.scheduler.set_timesteps(num_inference_steps, device=device)
335
+ timesteps = self.scheduler.timesteps
336
+ num_channels_latents = self.unet.config.in_channels
337
+ latents = self.prepare_latents(batch_size * num_images_per_prompt,
338
+ num_channels_latents,
339
+ height,
340
+ width,
341
+ prompt_embeds.dtype,
342
+ device,
343
+ generator,
344
+ None)
345
+ extra_step_kwargs = self.prepare_extra_step_kwargs(generator, 0.0)
346
+ num_warmup_steps = len(timesteps) - num_inference_steps * self.scheduler.order
347
+
348
+ # set adaptive cfg
349
+ # the image order is:
350
+ # [0, 60,
351
+ # 120, 180,
352
+ # 240, 300]
353
+ # the per-view cfg scales are 3, 2.5, 2, 1.5, 2, 2.5 (matching the 3x2 latent grid below)
354
+
355
+ tmp_guidance_scale = torch.ones_like(latents)
356
+ tmp_guidance_scale[:, :, :40, :40] = 3
357
+ tmp_guidance_scale[:, :, :40, 40:] = 2.5
358
+ tmp_guidance_scale[:, :, 40:80, :40] = 2
359
+ tmp_guidance_scale[:, :, 40:80, 40:] = 1.5
360
+ tmp_guidance_scale[:, :, 80:120, :40] = 2
361
+ tmp_guidance_scale[:, :, 80:120, 40:] = 2.5
362
+
363
+ with self.progress_bar(total=num_inference_steps) as progress_bar:
364
+ for i, t in enumerate(timesteps):
365
+ latent_model_input = torch.cat([latents] * 2) if do_classifier_free_guidance else latents
366
+ latent_model_input = self.scheduler.scale_model_input(latent_model_input, t)
367
+
368
+ noise_pred = self.unet(latent_model_input, t,
369
+ encoder_hidden_states=prompt_embeds,
370
+ cross_attention_kwargs=cross_attention_kwargs,
371
+ return_dict=False)[0]
372
+
373
+ adaptive_guidance_scale = (2 + 16 * (t / 1000) ** 5) / 3
374
+ if do_classifier_free_guidance:
375
+ noise_pred_uncond, noise_pred_text = noise_pred.chunk(2)
376
+ noise_pred = noise_pred_uncond + \
377
+ tmp_guidance_scale * adaptive_guidance_scale * \
378
+ (noise_pred_text - noise_pred_uncond)
379
+
380
+ if do_classifier_free_guidance and guidance_rescale > 0.0:
381
+ noise_pred = rescale_noise_cfg(noise_pred, noise_pred_text, guidance_rescale=guidance_rescale)
382
+
383
+ latents = self.scheduler.step(noise_pred, t, latents, **extra_step_kwargs, return_dict=False)[0]
384
+ if i==len(timesteps)-1 or ((i+1)>num_warmup_steps and (i+1)%self.scheduler.order==0):
385
+ progress_bar.update()
386
+
387
+ latents = unscale_latents(latents)
388
+ image = unscale_image(self.vae.decode(latents / self.vae.config.scaling_factor, return_dict=False)[0])
389
+ image = self.image_processor.postprocess(image, output_type='pil')[0]
390
+ image = [image, cond_image]
391
+ return ImagePipelineOutput(images=image) if return_dict else (image,)
392
+
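The tail of the lite pipeline above combines a per-view spatial guidance map (one weight per 40x40 latent tile, three rows of two views) with a timestep-dependent factor. A minimal sketch of that adaptive classifier-free guidance, assuming the same 120x80 latent grid as the code above:

```python
import torch

def per_view_guidance_map(latents: torch.Tensor) -> torch.Tensor:
    # One CFG weight per 40x40 latent tile; the grid holds views at 0/60, 120/180, 240/300 degrees.
    g = torch.ones_like(latents)
    g[:, :, :40, :40],    g[:, :, :40, 40:]    = 3.0, 2.5
    g[:, :, 40:80, :40],  g[:, :, 40:80, 40:]  = 2.0, 1.5
    g[:, :, 80:120, :40], g[:, :, 80:120, 40:] = 2.0, 2.5
    return g

def guided_noise(noise_uncond, noise_text, latents, t):
    # Stronger guidance at high noise levels (large t), weaker near the end of sampling.
    adaptive = (2 + 16 * (t / 1000) ** 5) / 3
    return noise_uncond + per_view_guidance_map(latents) * adaptive * (noise_text - noise_uncond)
```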
mvd/.ipynb_checkpoints/hunyuan3d_mvd_std_pipeline-checkpoint.py ADDED
@@ -0,0 +1,473 @@
1
+ # Open Source Model Licensed under the Apache License Version 2.0
2
+ # and Other Licenses of the Third-Party Components therein:
3
+ # The below Model in this distribution may have been modified by THL A29 Limited
4
+ # ("Tencent Modifications"). All Tencent Modifications are Copyright (C) 2024 THL A29 Limited.
5
+
6
+ # Copyright (C) 2024 THL A29 Limited, a Tencent company. All rights reserved.
7
+ # The below software and/or models in this distribution may have been
8
+ # modified by THL A29 Limited ("Tencent Modifications").
9
+ # All Tencent Modifications are Copyright (C) THL A29 Limited.
10
+
11
+ # Hunyuan 3D is licensed under the TENCENT HUNYUAN NON-COMMERCIAL LICENSE AGREEMENT
12
+ # except for the third-party components listed below.
13
+ # Hunyuan 3D does not impose any additional limitations beyond what is outlined
14
+ # in the respective licenses of these third-party components.
15
+ # Users must comply with all terms and conditions of original licenses of these third-party
16
+ # components and must ensure that the usage of the third party components adheres to
17
+ # all relevant laws and regulations.
18
+
19
+ # For avoidance of doubts, Hunyuan 3D means the large language models and
20
+ # their software and algorithms, including trained model weights, parameters (including
21
+ # optimizer states), machine-learning model code, inference-enabling code, training-enabling code,
22
+ # fine-tuning enabling code and other elements of the foregoing made publicly available
23
+ # by Tencent in accordance with TENCENT HUNYUAN COMMUNITY LICENSE AGREEMENT.
24
+
25
+ import inspect
26
+ from typing import Any, Dict, Optional
27
+ from typing import Any, Dict, List, Optional, Tuple, Union
28
+
29
+ import os
30
+ import torch
31
+ import numpy as np
32
+ from PIL import Image
33
+
34
+ import diffusers
35
+ from diffusers.image_processor import VaeImageProcessor
36
+ from diffusers.utils.import_utils import is_xformers_available
37
+ from diffusers.schedulers import KarrasDiffusionSchedulers
38
+ from diffusers.utils.torch_utils import randn_tensor
39
+ from diffusers.utils.import_utils import is_xformers_available
40
+ from diffusers.models.attention_processor import (
41
+ Attention,
42
+ AttnProcessor,
43
+ XFormersAttnProcessor,
44
+ AttnProcessor2_0
45
+ )
46
+ from diffusers import (
47
+ AutoencoderKL,
48
+ DDPMScheduler,
49
+ DiffusionPipeline,
50
+ EulerAncestralDiscreteScheduler,
51
+ UNet2DConditionModel,
52
+ ImagePipelineOutput
53
+ )
54
+ import transformers
55
+ from transformers import (
56
+ CLIPImageProcessor,
57
+ CLIPTextModel,
58
+ CLIPTokenizer,
59
+ CLIPVisionModelWithProjection,
60
+ CLIPTextModelWithProjection
61
+ )
62
+
63
+ from .utils import to_rgb_image, white_out_background, recenter_img
64
+
65
+ EXAMPLE_DOC_STRING = """
66
+ Examples:
67
+ ```py
68
+ >>> import torch
69
+ >>> from here import HunYuan3D_MVD_Std_Pipeline
70
+
71
+ >>> pipe = HunYuan3D_MVD_Std_Pipeline.from_pretrained(
72
+ ... "Tencent-Hunyuan-3D/MVD-XL", torch_dtype=torch.float16
73
+ ... )
74
+ >>> pipe.to("cuda")
75
+
76
+ >>> img = Image.open("demo.png")
77
+ >>> res_img = pipe(img).images[0]
78
+ ```
79
+ """
80
+
81
+
82
+
83
+ def scale_latents(latents): return (latents - 0.22) * 0.75
84
+ def unscale_latents(latents): return (latents / 0.75) + 0.22
85
+ def scale_image(image): return (image - 0.5) / 0.5
86
+ def scale_image_2(image): return (image * 0.5) / 0.8
87
+ def unscale_image(image): return (image * 0.5) + 0.5
88
+ def unscale_image_2(image): return (image * 0.8) / 0.5
89
+
90
+
91
+
92
+
93
+ class ReferenceOnlyAttnProc(torch.nn.Module):
94
+ def __init__(self, chained_proc, enabled=False, name=None):
95
+ super().__init__()
96
+ self.enabled = enabled
97
+ self.chained_proc = chained_proc
98
+ self.name = name
99
+
100
+ def __call__(self, attn, hidden_states, encoder_hidden_states=None, attention_mask=None, mode="w", ref_dict=None):
101
+ encoder_hidden_states = hidden_states if encoder_hidden_states is None else encoder_hidden_states
102
+ if self.enabled:
103
+ if mode == 'w': ref_dict[self.name] = encoder_hidden_states
104
+ elif mode == 'r': encoder_hidden_states = torch.cat([encoder_hidden_states, ref_dict.pop(self.name)], dim=1)
105
+ else: raise Exception(f"mode should not be {mode}")
106
+ return self.chained_proc(attn, hidden_states, encoder_hidden_states, attention_mask)
107
+
108
+
109
+ class RefOnlyNoisedUNet(torch.nn.Module):
110
+ def __init__(self, unet, scheduler) -> None:
111
+ super().__init__()
112
+ self.unet = unet
113
+ self.scheduler = scheduler
114
+
115
+ unet_attn_procs = dict()
116
+ for name, _ in unet.attn_processors.items():
117
+ if torch.__version__ >= '2.0': default_attn_proc = AttnProcessor2_0()
118
+ elif is_xformers_available(): default_attn_proc = XFormersAttnProcessor()
119
+ else: default_attn_proc = AttnProcessor()
120
+ unet_attn_procs[name] = ReferenceOnlyAttnProc(
121
+ default_attn_proc, enabled=name.endswith("attn1.processor"), name=name
122
+ )
123
+ unet.set_attn_processor(unet_attn_procs)
124
+
125
+ def __getattr__(self, name: str):
126
+ try:
127
+ return super().__getattr__(name)
128
+ except AttributeError:
129
+ return getattr(self.unet, name)
130
+
131
+ def forward(
132
+ self,
133
+ sample: torch.FloatTensor,
134
+ timestep: Union[torch.Tensor, float, int],
135
+ encoder_hidden_states: torch.Tensor,
136
+ cross_attention_kwargs: Optional[Dict[str, Any]] = None,
137
+ class_labels: Optional[torch.Tensor] = None,
138
+ down_block_res_samples: Optional[Tuple[torch.Tensor]] = None,
139
+ mid_block_res_sample: Optional[Tuple[torch.Tensor]] = None,
140
+ added_cond_kwargs: Optional[Dict[str, torch.Tensor]] = None,
141
+ return_dict: bool = True,
142
+ **kwargs
143
+ ):
144
+
145
+ dtype = self.unet.dtype
146
+
147
+ # cond_lat add same level noise
148
+ cond_lat = cross_attention_kwargs['cond_lat']
149
+ noise = torch.randn_like(cond_lat)
150
+
151
+ noisy_cond_lat = self.scheduler.add_noise(cond_lat, noise, timestep.reshape(-1))
152
+ noisy_cond_lat = self.scheduler.scale_model_input(noisy_cond_lat, timestep.reshape(-1))
153
+
154
+ ref_dict = {}
155
+
156
+ _ = self.unet(
157
+ noisy_cond_lat,
158
+ timestep,
159
+ encoder_hidden_states = encoder_hidden_states,
160
+ class_labels = class_labels,
161
+ cross_attention_kwargs = dict(mode="w", ref_dict=ref_dict),
162
+ added_cond_kwargs = added_cond_kwargs,
163
+ return_dict = return_dict,
164
+ **kwargs
165
+ )
166
+
167
+ res = self.unet(
168
+ sample,
169
+ timestep,
170
+ encoder_hidden_states,
171
+ class_labels=class_labels,
172
+ cross_attention_kwargs = dict(mode="r", ref_dict=ref_dict),
173
+ down_block_additional_residuals = [
174
+ sample.to(dtype=dtype) for sample in down_block_res_samples
175
+ ] if down_block_res_samples is not None else None,
176
+ mid_block_additional_residual = (
177
+ mid_block_res_sample.to(dtype=dtype)
178
+ if mid_block_res_sample is not None else None),
179
+ added_cond_kwargs = added_cond_kwargs,
180
+ return_dict = return_dict,
181
+ **kwargs
182
+ )
183
+ return res
184
+
185
+
186
+
187
+ class HunYuan3D_MVD_Std_Pipeline(diffusers.DiffusionPipeline):
188
+ def __init__(
189
+ self,
190
+ vae: AutoencoderKL,
191
+ unet: UNet2DConditionModel,
192
+ scheduler: KarrasDiffusionSchedulers,
193
+ feature_extractor_vae: CLIPImageProcessor,
194
+ vision_processor: CLIPImageProcessor,
195
+ vision_encoder: CLIPVisionModelWithProjection,
196
+ vision_encoder_2: CLIPVisionModelWithProjection,
197
+ ramping_coefficients: Optional[list] = None,
198
+ add_watermarker: Optional[bool] = None,
199
+ safety_checker = None,
200
+ ):
201
+ DiffusionPipeline.__init__(self)
202
+
203
+ self.register_modules(
204
+ vae=vae, unet=unet, scheduler=scheduler, safety_checker=None, feature_extractor_vae=feature_extractor_vae,
205
+ vision_processor=vision_processor, vision_encoder=vision_encoder, vision_encoder_2=vision_encoder_2,
206
+ )
207
+ self.register_to_config( ramping_coefficients = ramping_coefficients)
208
+ self.vae_scale_factor = 2 ** (len(self.vae.config.block_out_channels) - 1)
209
+ self.image_processor = VaeImageProcessor(vae_scale_factor=self.vae_scale_factor)
210
+ self.default_sample_size = self.unet.config.sample_size
211
+ self.watermark = None
212
+ self.prepare_init = False
213
+
214
+ def prepare(self):
215
+ assert isinstance(self.unet, UNet2DConditionModel), "unet should be UNet2DConditionModel"
216
+ self.unet = RefOnlyNoisedUNet(self.unet, self.scheduler).eval()
217
+ self.prepare_init = True
218
+
219
+ def encode_image(self, image: torch.Tensor, scale_factor: bool = False):
220
+ latent = self.vae.encode(image).latent_dist.sample()
221
+ return (latent * self.vae.config.scaling_factor) if scale_factor else latent
222
+
223
+ # Copied from diffusers.pipelines.stable_diffusion.pipeline_stable_diffusion.StableDiffusionPipeline.prepare_latents
224
+ def prepare_latents(self, batch_size, num_channels_latents, height, width, dtype, device, generator, latents=None):
225
+ shape = (
226
+ batch_size,
227
+ num_channels_latents,
228
+ int(height) // self.vae_scale_factor,
229
+ int(width) // self.vae_scale_factor,
230
+ )
231
+ if isinstance(generator, list) and len(generator) != batch_size:
232
+ raise ValueError(
233
+ f"You have passed a list of generators of length {len(generator)}, but requested an effective batch"
234
+ f" size of {batch_size}. Make sure the batch size matches the length of the generators."
235
+ )
236
+
237
+ if latents is None:
238
+ latents = randn_tensor(shape, generator=generator, device=device, dtype=dtype)
239
+ else:
240
+ latents = latents.to(device)
241
+
242
+ # scale the initial noise by the standard deviation required by the scheduler
243
+ latents = latents * self.scheduler.init_noise_sigma
244
+ return latents
245
+
246
+ def _get_add_time_ids(
247
+ self, original_size, crops_coords_top_left, target_size, dtype, text_encoder_projection_dim=None
248
+ ):
249
+ add_time_ids = list(original_size + crops_coords_top_left + target_size)
250
+
251
+ passed_add_embed_dim = (
252
+ self.unet.config.addition_time_embed_dim * len(add_time_ids) + text_encoder_projection_dim
253
+ )
254
+ expected_add_embed_dim = self.unet.add_embedding.linear_1.in_features
255
+
256
+ if expected_add_embed_dim != passed_add_embed_dim:
257
+ raise ValueError(
258
+ f"Model expects an added time embedding vector of length {expected_add_embed_dim}, " \
259
+ f"but a vector of {passed_add_embed_dim} was created. The model has an incorrect config." \
260
+ f" Please check `unet.config.time_embedding_type` and `text_encoder_2.config.projection_dim`."
261
+ )
262
+
263
+ add_time_ids = torch.tensor([add_time_ids], dtype=dtype)
264
+ return add_time_ids
265
+
266
+ def prepare_extra_step_kwargs(self, generator, eta):
267
+ # prepare extra kwargs for the scheduler step, since not all schedulers have the same signature
268
+ # eta (η) is only used with the DDIMScheduler, it will be ignored for other schedulers.
269
+ # eta corresponds to η in DDIM paper: https://arxiv.org/abs/2010.02502
270
+ # and should be between [0, 1]
271
+
272
+ accepts_eta = "eta" in set(inspect.signature(self.scheduler.step).parameters.keys())
273
+ extra_step_kwargs = {}
274
+ if accepts_eta: extra_step_kwargs["eta"] = eta
275
+
276
+ # check if the scheduler accepts generator
277
+ accepts_generator = "generator" in set(inspect.signature(self.scheduler.step).parameters.keys())
278
+ if accepts_generator: extra_step_kwargs["generator"] = generator
279
+ return extra_step_kwargs
280
+
281
+ @property
282
+ def guidance_scale(self):
283
+ return self._guidance_scale
284
+
285
+ @property
286
+ def interrupt(self):
287
+ return self._interrupt
288
+
289
+ @property
290
+ def do_classifier_free_guidance(self):
291
+ return self._guidance_scale > 1 and self.unet.config.time_cond_proj_dim is None
292
+
293
+ @torch.no_grad()
294
+ def __call__(
295
+ self,
296
+ image: Image.Image = None,
297
+ guidance_scale = 2.0,
298
+ output_type: Optional[str] = "pil",
299
+ num_inference_steps: int = 50,
300
+ return_dict: bool = True,
301
+ eta: float = 0.0,
302
+ generator: Optional[Union[torch.Generator, List[torch.Generator]]] = None,
303
+ crops_coords_top_left: Tuple[int, int] = (0, 0),
304
+ cross_attention_kwargs: Optional[Dict[str, Any]] = None,
305
+ latent: torch.Tensor = None,
306
+ guidance_curve = None,
307
+ **kwargs
308
+ ):
309
+ if not self.prepare_init:
310
+ self.prepare()
311
+
312
+ here = dict(device=self.vae.device, dtype=self.vae.dtype)
313
+
314
+ batch_size = 1
315
+ num_images_per_prompt = 1
316
+ width, height = 512 * 2, 512 * 3
317
+ target_size = original_size = (height, width)
318
+
319
+ self._guidance_scale = guidance_scale
320
+ self._cross_attention_kwargs = cross_attention_kwargs
321
+ self._interrupt = False
322
+
323
+ device = self._execution_device
324
+
325
+ # Prepare timesteps
326
+ self.scheduler.set_timesteps(num_inference_steps, device=device)
327
+ timesteps = self.scheduler.timesteps
328
+
329
+ # Prepare latent variables
330
+ num_channels_latents = self.unet.config.in_channels
331
+ latents = self.prepare_latents(
332
+ batch_size * num_images_per_prompt,
333
+ num_channels_latents,
334
+ height,
335
+ width,
336
+ self.vae.dtype,
337
+ device,
338
+ generator,
339
+ latents=latent,
340
+ )
341
+
342
+ # Prepare extra step kwargs. TODO: Logic should ideally just be moved out of the pipeline
343
+ extra_step_kwargs = self.prepare_extra_step_kwargs(generator, eta)
344
+
345
+
346
+ # Prepare added time ids & embeddings
347
+ text_encoder_projection_dim = 1280
348
+ add_time_ids = self._get_add_time_ids(
349
+ original_size,
350
+ crops_coords_top_left,
351
+ target_size,
352
+ dtype=self.vae.dtype,
353
+ text_encoder_projection_dim=text_encoder_projection_dim,
354
+ )
355
+ negative_add_time_ids = add_time_ids
356
+
357
+ # hw: preprocess
358
+ cond_image = recenter_img(image)
359
+ cond_image = to_rgb_image(cond_image)
360
+ image_vae = self.feature_extractor_vae(images=cond_image, return_tensors="pt").pixel_values.to(**here)
361
+ image_clip = self.vision_processor(images=cond_image, return_tensors="pt").pixel_values.to(**here)
362
+
363
+ # hw: get cond_lat from cond_img using vae
364
+ cond_lat = self.encode_image(image_vae, scale_factor=False)
365
+ negative_lat = self.encode_image(torch.zeros_like(image_vae), scale_factor=False)
366
+ cond_lat = torch.cat([negative_lat, cond_lat])
367
+
368
+ # hw: get visual global embedding using clip
369
+ global_embeds_1 = self.vision_encoder(image_clip, output_hidden_states=False).image_embeds.unsqueeze(-2)
370
+ global_embeds_2 = self.vision_encoder_2(image_clip, output_hidden_states=False).image_embeds.unsqueeze(-2)
371
+ global_embeds = torch.concat([global_embeds_1, global_embeds_2], dim=-1)
372
+
373
+ ramp = global_embeds.new_tensor(self.config.ramping_coefficients).unsqueeze(-1)
374
+ prompt_embeds = self.uc_text_emb.to(**here)
375
+ pooled_prompt_embeds = self.uc_text_emb_2.to(**here)
376
+
377
+ prompt_embeds = prompt_embeds + global_embeds * ramp
378
+ add_text_embeds = pooled_prompt_embeds
379
+
380
+ if self.do_classifier_free_guidance:
381
+ negative_prompt_embeds = torch.zeros_like(prompt_embeds)
382
+ negative_pooled_prompt_embeds = torch.zeros_like(pooled_prompt_embeds)
383
+ prompt_embeds = torch.cat([negative_prompt_embeds, prompt_embeds], dim=0)
384
+ add_text_embeds = torch.cat([negative_pooled_prompt_embeds, add_text_embeds], dim=0)
385
+ add_time_ids = torch.cat([negative_add_time_ids, add_time_ids], dim=0)
386
+
387
+ prompt_embeds = prompt_embeds.to(device)
388
+ add_text_embeds = add_text_embeds.to(device)
389
+ add_time_ids = add_time_ids.to(device).repeat(batch_size * num_images_per_prompt, 1)
390
+
391
+ # Denoising loop
392
+ num_warmup_steps = max(len(timesteps) - num_inference_steps * self.scheduler.order, 0)
393
+ timestep_cond = None
394
+ self._num_timesteps = len(timesteps)
395
+
396
+ if guidance_curve is None:
397
+ guidance_curve = lambda t: guidance_scale
398
+
399
+ with self.progress_bar(total=num_inference_steps) as progress_bar:
400
+ for i, t in enumerate(timesteps):
401
+ if self.interrupt:
402
+ continue
403
+
404
+ # expand the latents if we are doing classifier free guidance
405
+ latent_model_input = torch.cat([latents] * 2) if self.do_classifier_free_guidance else latents
406
+
407
+ latent_model_input = self.scheduler.scale_model_input(latent_model_input, t)
408
+
409
+ # predict the noise residual
410
+ added_cond_kwargs = {"text_embeds": add_text_embeds, "time_ids": add_time_ids}
411
+
412
+ noise_pred = self.unet(
413
+ latent_model_input,
414
+ t,
415
+ encoder_hidden_states=prompt_embeds,
416
+ timestep_cond=timestep_cond,
417
+ cross_attention_kwargs=dict(cond_lat=cond_lat),
418
+ added_cond_kwargs=added_cond_kwargs,
419
+ return_dict=False,
420
+ )[0]
421
+
422
+ # perform guidance
423
+
424
+ # cur_guidance_scale = self.guidance_scale
425
+ cur_guidance_scale = guidance_curve(t) # 1.5 + 2.5 * ((t/1000)**2)
426
+
427
+ if self.do_classifier_free_guidance:
428
+ noise_pred_uncond, noise_pred_text = noise_pred.chunk(2)
429
+ noise_pred = noise_pred_uncond + cur_guidance_scale * (noise_pred_text - noise_pred_uncond)
430
+
431
+ # cur_guidance_scale_topleft = (cur_guidance_scale - 1.0) * 4 + 1.0
432
+ # noise_pred_top_left = noise_pred_uncond +
433
+ # cur_guidance_scale_topleft * (noise_pred_text - noise_pred_uncond)
434
+ # _, _, h, w = noise_pred.shape
435
+ # noise_pred[:, :, :h//3, :w//2] = noise_pred_top_left[:, :, :h//3, :w//2]
436
+
437
+ # compute the previous noisy sample x_t -> x_t-1
438
+ latents_dtype = latents.dtype
439
+ latents = self.scheduler.step(noise_pred, t, latents, **extra_step_kwargs, return_dict=False)[0]
440
+
441
+ # call the callback, if provided
442
+ if i == len(timesteps) - 1 or ((i + 1) > num_warmup_steps and (i + 1) % self.scheduler.order == 0):
443
+ progress_bar.update()
444
+
445
+ latents = unscale_latents(latents)
446
+
447
+ if output_type=="latent":
448
+ image = latents
449
+ else:
450
+ image = self.vae.decode(latents / self.vae.config.scaling_factor, return_dict=False)[0]
451
+ image = unscale_image(unscale_image_2(image)).clamp(0, 1)
452
+ image = [
453
+ Image.fromarray((image[0]*255+0.5).clamp_(0, 255).permute(1, 2, 0).cpu().numpy().astype("uint8")),
454
+ # self.image_processor.postprocess(image, output_type=output_type)[0],
455
+ cond_image.resize((512, 512))
456
+ ]
457
+
458
+ if not return_dict: return (image,)
459
+ return ImagePipelineOutput(images=image)
460
+
461
+ def save_pretrained(self, save_directory):
462
+ # uc_text_emb.pt and uc_text_emb_2.pt are computed offline and saved in advance
463
+ super().save_pretrained(save_directory)
464
+ torch.save(self.uc_text_emb, os.path.join(save_directory, "uc_text_emb.pt"))
465
+ torch.save(self.uc_text_emb_2, os.path.join(save_directory, "uc_text_emb_2.pt"))
466
+
467
+ @classmethod
468
+ def from_pretrained(cls, pretrained_model_name_or_path, *model_args, **kwargs):
469
+ # uc_text_emb.pt and uc_text_emb_2.pt are computed offline and saved in advance
470
+ pipeline = super().from_pretrained(pretrained_model_name_or_path, *model_args, **kwargs)
471
+ pipeline.uc_text_emb = torch.load(os.path.join(pretrained_model_name_or_path, "uc_text_emb.pt"))
472
+ pipeline.uc_text_emb_2 = torch.load(os.path.join(pretrained_model_name_or_path, "uc_text_emb_2.pt"))
473
+ return pipeline
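For orientation, a hedged usage sketch of the standard pipeline defined in this checkpoint file; the weights directory is an assumed placeholder and must contain the pre-computed uc_text_emb.pt / uc_text_emb_2.pt that `from_pretrained` loads above:

```python
import torch
from PIL import Image
from mvd.hunyuan3d_mvd_std_pipeline import HunYuan3D_MVD_Std_Pipeline

# "./weights/mvd_std" is a placeholder path, not a confirmed layout of this repo.
pipe = HunYuan3D_MVD_Std_Pipeline.from_pretrained("./weights/mvd_std", torch_dtype=torch.float16)
pipe.to("cuda")

cond = Image.open("demo.png")  # ideally RGBA, so recentering can find the foreground
out = pipe(
    cond,
    num_inference_steps=50,
    guidance_scale=2.0,
    # optional: reproduce the commented curve 1.5 + 2.5 * (t / 1000) ** 2
    guidance_curve=lambda t: 1.5 + 2.5 * (t / 1000) ** 2,
)
views_grid, recentered_cond = out.images  # [multi-view grid, preprocessed condition image]
views_grid.save("mvd_std_views.png")
```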
mvd/.ipynb_checkpoints/utils-checkpoint.py ADDED
@@ -0,0 +1,87 @@
1
+ # Open Source Model Licensed under the Apache License Version 2.0
2
+ # and Other Licenses of the Third-Party Components therein:
3
+ # The below Model in this distribution may have been modified by THL A29 Limited
4
+ # ("Tencent Modifications"). All Tencent Modifications are Copyright (C) 2024 THL A29 Limited.
5
+
6
+ # Copyright (C) 2024 THL A29 Limited, a Tencent company. All rights reserved.
7
+ # The below software and/or models in this distribution may have been
8
+ # modified by THL A29 Limited ("Tencent Modifications").
9
+ # All Tencent Modifications are Copyright (C) THL A29 Limited.
10
+
11
+ # Hunyuan 3D is licensed under the TENCENT HUNYUAN NON-COMMERCIAL LICENSE AGREEMENT
12
+ # except for the third-party components listed below.
13
+ # Hunyuan 3D does not impose any additional limitations beyond what is outlined
14
+ # in the respective licenses of these third-party components.
15
+ # Users must comply with all terms and conditions of original licenses of these third-party
16
+ # components and must ensure that the usage of the third party components adheres to
17
+ # all relevant laws and regulations.
18
+
19
+ # For avoidance of doubts, Hunyuan 3D means the large language models and
20
+ # their software and algorithms, including trained model weights, parameters (including
21
+ # optimizer states), machine-learning model code, inference-enabling code, training-enabling code,
22
+ # fine-tuning enabling code and other elements of the foregoing made publicly available
23
+ # by Tencent in accordance with TENCENT HUNYUAN COMMUNITY LICENSE AGREEMENT.
24
+
25
+ import numpy as np
26
+ from PIL import Image
27
+
28
+ def to_rgb_image(maybe_rgba: Image.Image):
29
+ '''
30
+ convert a PIL.Image to rgb mode with white background
31
+ maybe_rgba: PIL.Image
32
+ return: PIL.Image
33
+ '''
34
+ if maybe_rgba.mode == 'RGB':
35
+ return maybe_rgba
36
+ elif maybe_rgba.mode == 'RGBA':
37
+ rgba = maybe_rgba
38
+ img = np.random.randint(255, 256, size=[rgba.size[1], rgba.size[0], 3], dtype=np.uint8)
39
+ img = Image.fromarray(img, 'RGB')
40
+ img.paste(rgba, mask=rgba.getchannel('A'))
41
+ return img
42
+ else:
43
+ raise ValueError("Unsupported image type.", maybe_rgba.mode)
44
+
45
+ def white_out_background(pil_img, is_gray_fg=True):
46
+ data = pil_img.getdata()
47
+ new_data = []
48
+ # convert fore-ground white to gray
49
+ for r, g, b, a in data:
50
+ if a < 16:
51
+ new_data.append((255, 255, 255, 0)) # back-ground to be black
52
+ else:
53
+ is_white = is_gray_fg and (r>235) and (g>235) and (b>235)
54
+ new_r = 235 if is_white else r
55
+ new_g = 235 if is_white else g
56
+ new_b = 235 if is_white else b
57
+ new_data.append((new_r, new_g, new_b, a))
58
+ pil_img.putdata(new_data)
59
+ return pil_img
60
+
61
+ def recenter_img(img, size=512, color=(255,255,255)):
62
+ img = white_out_background(img)
63
+ mask = np.array(img)[..., 3]
64
+ image = np.array(img)[..., :3]
65
+
66
+ H, W, C = image.shape
67
+ coords = np.nonzero(mask)
68
+ x_min, x_max = coords[0].min(), coords[0].max()
69
+ y_min, y_max = coords[1].min(), coords[1].max()
70
+ h = x_max - x_min
71
+ w = y_max - y_min
72
+ if h == 0 or w == 0: raise ValueError
73
+ roi = image[x_min:x_max, y_min:y_max]
74
+
75
+ border_ratio = 0.15 # 0.2
76
+ pad_h = int(h * border_ratio)
77
+ pad_w = int(w * border_ratio)
78
+
79
+ result_tmp = np.full((h + pad_h, w + pad_w, C), color, dtype=np.uint8)
80
+ result_tmp[pad_h // 2: pad_h // 2 + h, pad_w // 2: pad_w // 2 + w] = roi
81
+
82
+ cur_h, cur_w = result_tmp.shape[:2]
83
+ side = max(cur_h, cur_w)
84
+ result = np.full((side, side, C), color, dtype=np.uint8)
85
+ result[(side-cur_h)//2:(side-cur_h)//2+cur_h, (side-cur_w)//2:(side - cur_w)//2+cur_w,:] = result_tmp
86
+ result = Image.fromarray(result)
87
+ return result.resize((size, size), Image.LANCZOS) if size else result
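A short usage sketch of the preprocessing helpers above ("input.png" is a placeholder; `recenter_img` expects an RGBA image whose alpha channel marks the foreground):

```python
from PIL import Image
from mvd.utils import to_rgb_image, recenter_img

rgba = Image.open("input.png").convert("RGBA")
rgb = to_rgb_image(rgba)               # composite the foreground onto a white background
square = recenter_img(rgba, size=512)  # crop to the alpha bbox, add a ~15% border, pad square, resize
```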
mvd/__pycache__/hunyuan3d_mvd_lite_pipeline.cpython-38.pyc CHANGED
Binary files a/mvd/__pycache__/hunyuan3d_mvd_lite_pipeline.cpython-38.pyc and b/mvd/__pycache__/hunyuan3d_mvd_lite_pipeline.cpython-38.pyc differ
 
mvd/__pycache__/hunyuan3d_mvd_std_pipeline.cpython-38.pyc CHANGED
Binary files a/mvd/__pycache__/hunyuan3d_mvd_std_pipeline.cpython-38.pyc and b/mvd/__pycache__/hunyuan3d_mvd_std_pipeline.cpython-38.pyc differ
 
mvd/hunyuan3d_mvd_lite_pipeline.py CHANGED
@@ -1,5 +1,7 @@
1
- # Open Source Model Licensed under the Apache License Version 2.0 and Other Licenses of the Third-Party Components therein:
2
- # The below Model in this distribution may have been modified by THL A29 Limited ("Tencent Modifications"). All Tencent Modifications are Copyright (C) 2024 THL A29 Limited.
 
 
3
 
4
  # Copyright (C) 2024 THL A29 Limited, a Tencent company. All rights reserved.
5
  # The below software and/or models in this distribution may have been
@@ -62,10 +64,10 @@ EXAMPLE_DOC_STRING = """
62
  Examples:
63
  ```py
64
  >>> import torch
65
- >>> from here import Hunyuan3d_MVD_Qing_Pipeline
66
 
67
- >>> pipe = Hunyuan3d_MVD_Qing_Pipeline.from_pretrained(
68
- ... "Tencent-Hunyuan-3D/MVD-Qing", torch_dtype=torch.float16
69
  ... )
70
  >>> pipe.to("cuda")
71
 
@@ -173,18 +175,17 @@ class Hunyuan3d_MVD_Lite_Pipeline(DiffusionPipeline, TextualInversionLoaderMixin
173
  text_encoder=text_encoder,
174
  vision_encoder=vision_encoder,
175
  feature_extractor_vae=feature_extractor_vae,
176
- feature_extractor_clip=feature_extractor_clip)
177
- '''
178
- rewrite the stable diffusion pipeline
179
- vae: vae
180
- unet: unet
181
- tokenizer: tokenizer
182
- scheduler: scheduler
183
- text_encoder: text_encoder
184
- vision_encoder: vision_encoder
185
- feature_extractor_vae: feature_extractor_vae
186
- feature_extractor_clip: feature_extractor_clip
187
- '''
188
  self.register_to_config(ramping_coefficients=ramping_coefficients)
189
  self.vae_scale_factor = 2 ** (len(self.vae.config.block_out_channels) - 1)
190
  self.image_processor = VaeImageProcessor(vae_scale_factor=self.vae_scale_factor)
 
1
+ # Open Source Model Licensed under the Apache License Version 2.0
2
+ # and Other Licenses of the Third-Party Components therein:
3
+ # The below Model in this distribution may have been modified by THL A29 Limited
4
+ # ("Tencent Modifications"). All Tencent Modifications are Copyright (C) 2024 THL A29 Limited.
5
 
6
  # Copyright (C) 2024 THL A29 Limited, a Tencent company. All rights reserved.
7
  # The below software and/or models in this distribution may have been
 
64
  Examples:
65
  ```py
66
  >>> import torch
67
+ >>> from here import Hunyuan3d_MVD_Lite_Pipeline
68
 
69
+ >>> pipe = Hunyuan3d_MVD_Lite_Pipeline.from_pretrained(
70
+ ... "weights/mvd_lite", torch_dtype=torch.float16
71
  ... )
72
  >>> pipe.to("cuda")
73
 
 
175
  text_encoder=text_encoder,
176
  vision_encoder=vision_encoder,
177
  feature_extractor_vae=feature_extractor_vae,
178
+ feature_extractor_clip=feature_extractor_clip
179
+ )
180
+ # rewrite the stable diffusion pipeline
181
+ # vae: vae
182
+ # unet: unet
183
+ # tokenizer: tokenizer
184
+ # scheduler: scheduler
185
+ # text_encoder: text_encoder
186
+ # vision_encoder: vision_encoder
187
+ # feature_extractor_vae: feature_extractor_vae
188
+ # feature_extractor_clip: feature_extractor_clip
 
189
  self.register_to_config(ramping_coefficients=ramping_coefficients)
190
  self.vae_scale_factor = 2 ** (len(self.vae.config.block_out_channels) - 1)
191
  self.image_processor = VaeImageProcessor(vae_scale_factor=self.vae_scale_factor)
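The lite pipeline returns its six views stitched into one grid image (see the "image order" comment in the checkpoint above). A small helper for splitting that grid back into per-view images, assuming 320x320 pixel tiles laid out row-major at azimuths 0/60/120/180/240/300 (40-latent tiles times an assumed VAE scale factor of 8):

```python
from PIL import Image

def split_views(grid: Image.Image, tile: int = 320):
    # Row-major crop of the multi-view grid; the tile size is an assumption, not read from the model.
    cols, rows = grid.width // tile, grid.height // tile
    return [grid.crop((c * tile, r * tile, (c + 1) * tile, (r + 1) * tile))
            for r in range(rows) for c in range(cols)]
```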
mvd/hunyuan3d_mvd_std_pipeline.py CHANGED
@@ -1,5 +1,7 @@
1
- # Open Source Model Licensed under the Apache License Version 2.0 and Other Licenses of the Third-Party Components therein:
2
- # The below Model in this distribution may have been modified by THL A29 Limited ("Tencent Modifications"). All Tencent Modifications are Copyright (C) 2024 THL A29 Limited.
 
 
3
 
4
  # Copyright (C) 2024 THL A29 Limited, a Tencent company. All rights reserved.
5
  # The below software and/or models in this distribution may have been
 
1
+ # Open Source Model Licensed under the Apache License Version 2.0
2
+ # and Other Licenses of the Third-Party Components therein:
3
+ # The below Model in this distribution may have been modified by THL A29 Limited
4
+ # ("Tencent Modifications"). All Tencent Modifications are Copyright (C) 2024 THL A29 Limited.
5
 
6
  # Copyright (C) 2024 THL A29 Limited, a Tencent company. All rights reserved.
7
  # The below software and/or models in this distribution may have been
mvd/utils.py CHANGED
@@ -1,5 +1,7 @@
1
- # Open Source Model Licensed under the Apache License Version 2.0 and Other Licenses of the Third-Party Components therein:
2
- # The below Model in this distribution may have been modified by THL A29 Limited ("Tencent Modifications"). All Tencent Modifications are Copyright (C) 2024 THL A29 Limited.
 
 
3
 
4
  # Copyright (C) 2024 THL A29 Limited, a Tencent company. All rights reserved.
5
  # The below software and/or models in this distribution may have been
 
1
+ # Open Source Model Licensed under the Apache License Version 2.0
2
+ # and Other Licenses of the Third-Party Components therein:
3
+ # The below Model in this distribution may have been modified by THL A29 Limited
4
+ # ("Tencent Modifications"). All Tencent Modifications are Copyright (C) 2024 THL A29 Limited.
5
 
6
  # Copyright (C) 2024 THL A29 Limited, a Tencent company. All rights reserved.
7
  # The below software and/or models in this distribution may have been
svrm/.ipynb_checkpoints/predictor-checkpoint.py ADDED
@@ -0,0 +1,152 @@
1
+ # Open Source Model Licensed under the Apache License Version 2.0
2
+ # and Other Licenses of the Third-Party Components therein:
3
+ # The below Model in this distribution may have been modified by THL A29 Limited
4
+ # ("Tencent Modifications"). All Tencent Modifications are Copyright (C) 2024 THL A29 Limited.
5
+
6
+ # Copyright (C) 2024 THL A29 Limited, a Tencent company. All rights reserved.
7
+ # The below software and/or models in this distribution may have been
8
+ # modified by THL A29 Limited ("Tencent Modifications").
9
+ # All Tencent Modifications are Copyright (C) THL A29 Limited.
10
+
11
+ # Hunyuan 3D is licensed under the TENCENT HUNYUAN NON-COMMERCIAL LICENSE AGREEMENT
12
+ # except for the third-party components listed below.
13
+ # Hunyuan 3D does not impose any additional limitations beyond what is outlined
14
+ # in the repsective licenses of these third-party components.
15
+ # Users must comply with all terms and conditions of original licenses of these third-party
16
+ # components and must ensure that the usage of the third party components adheres to
17
+ # all relevant laws and regulations.
18
+
19
+ # For avoidance of doubts, Hunyuan 3D means the large language models and
20
+ # their software and algorithms, including trained model weights, parameters (including
21
+ # optimizer states), machine-learning model code, inference-enabling code, training-enabling code,
22
+ # fine-tuning enabling code and other elements of the foregoing made publicly available
23
+ # by Tencent in accordance with TENCENT HUNYUAN COMMUNITY LICENSE AGREEMENT.
24
+
25
+ import os
26
+ import math
27
+ import time
28
+ import torch
29
+ import numpy as np
30
+ from tqdm import tqdm
31
+ from PIL import Image, ImageSequence
32
+ from omegaconf import OmegaConf
33
+ from torchvision import transforms
34
+ from safetensors.torch import save_file, load_file
35
+ from .ldm.util import instantiate_from_config
36
+ from .ldm.vis_util import render
37
+
38
+ class MV23DPredictor(object):
39
+ def __init__(self, ckpt_path, cfg_path, elevation=15, number_view=60,
40
+ render_size=256, device="cuda:0") -> None:
41
+ self.device = device
42
+ self.elevation = elevation
43
+ self.number_view = number_view
44
+ self.render_size = render_size
45
+
46
+ self.elevation_list = [0, 0, 0, 0, 0, 0, 0]
47
+ self.azimuth_list = [0, 60, 120, 180, 240, 300, 0]
48
+
49
+ st = time.time()
50
+ self.model = self.init_model(ckpt_path, cfg_path)
51
+ print(f"=====> mv23d model init time: {time.time() - st}")
52
+
53
+ self.input_view_transform = transforms.Compose([
54
+ transforms.Resize(504, interpolation=Image.BICUBIC),
55
+ transforms.ToTensor(),
56
+ ])
57
+ self.final_input_view_transform = transforms.Normalize((0.485, 0.456, 0.406), (0.229, 0.224, 0.225))
58
+
59
+ def init_model(self, ckpt_path, cfg_path):
60
+ config = OmegaConf.load(cfg_path)
61
+ model = instantiate_from_config(config.model)
62
+
63
+ weights = load_file("./weights/svrm/svrm.safetensors")
64
+ model.load_state_dict(weights)
65
+
66
+ model.to(self.device)
67
+ model = model.eval()
68
+ model.render.half()
69
+ print(f'Load model successfully')
70
+ return model
71
+
72
+ def create_camera_to_world_matrix(self, elevation, azimuth, cam_dis=1.5):
73
+ # elevation azimuth are radians
74
+ # Convert elevation and azimuth angles to Cartesian coordinates on a unit sphere
75
+ x = np.cos(elevation) * np.cos(azimuth)
76
+ y = np.cos(elevation) * np.sin(azimuth)
77
+ z = np.sin(elevation)
78
+
79
+ # Calculate camera position, target, and up vectors
80
+ camera_pos = np.array([x, y, z]) * cam_dis
81
+ target = np.array([0, 0, 0])
82
+ up = np.array([0, 0, 1])
83
+
84
+ # Construct view matrix
85
+ forward = target - camera_pos
86
+ forward /= np.linalg.norm(forward)
87
+ right = np.cross(forward, up)
88
+ right /= np.linalg.norm(right)
89
+ new_up = np.cross(right, forward)
90
+ new_up /= np.linalg.norm(new_up)
91
+ cam2world = np.eye(4)
92
+ cam2world[:3, :3] = np.array([right, new_up, -forward]).T
93
+ cam2world[:3, 3] = camera_pos
94
+ return cam2world
95
+
96
+ def refine_mask(self, mask, k=16):
97
+ mask /= 255.0
98
+ border_mask = (mask >= -math.pi / 2.0 / k + 0.5) & (mask <= math.pi / 2.0 / k + 0.5)
99
+ mask[border_mask] = 0.5 * np.sin(k * (mask[border_mask] - 0.5)) + 0.5
100
+ mask[mask < -math.pi / 2.0 / k + 0.5] = 0.0
101
+ mask[mask > math.pi / 2.0 / k + 0.5] = 1.0
102
+ return (mask * 255.0).astype(np.uint8)
103
+
104
+ def load_images_and_cameras(self, input_imgs, elevation_list, azimuth_list):
105
+ input_image_list = []
106
+ input_cam_list = []
107
+ for input_view_image, elevation, azimuth in zip(input_imgs, elevation_list, azimuth_list):
108
+ input_view_image = self.input_view_transform(input_view_image)
109
+ input_image_list.append(input_view_image)
110
+
111
+ input_view_cam_pos = self.create_camera_to_world_matrix(np.radians(elevation), np.radians(azimuth))
112
+ input_view_cam_intrinsic = np.array([35. / 32, 35. /32, 0.5, 0.5])
113
+ input_view_cam = torch.from_numpy(
114
+ np.concatenate([input_view_cam_pos.reshape(-1), input_view_cam_intrinsic], 0)
115
+ ).float()
116
+ input_cam_list.append(input_view_cam)
117
+
118
+ pixels_input = torch.stack(input_image_list, dim=0)
119
+ input_images = self.final_input_view_transform(pixels_input)
120
+ input_cams = torch.stack(input_cam_list, dim=0)
121
+ return input_images, input_cams
122
+
123
+ def load_data(self, intput_imgs):
124
+ assert (6+1) == len(intput_imgs)
125
+
126
+ input_images, input_cams = self.load_images_and_cameras(intput_imgs, self.elevation_list, self.azimuth_list)
127
+ input_cams[-1, :] = 0 # for user input view
128
+
129
+ data = {}
130
+ data["input_view"] = input_images.unsqueeze(0).to(self.device) # 1 4 3 512 512
131
+ data["input_view_cam"] = input_cams.unsqueeze(0).to(self.device) # 1 4 20
132
+ return data
133
+
134
+ @torch.no_grad()
135
+ def predict(
136
+ self,
137
+ intput_imgs,
138
+ save_dir = "outputs/",
139
+ image_input = None,
140
+ target_face_count = 10000,
141
+ do_texture_mapping = True,
142
+ ):
143
+ os.makedirs(save_dir, exist_ok=True)
144
+ print(save_dir)
145
+
146
+ with torch.cuda.amp.autocast():
147
+ self.model.export_mesh_with_uv(
148
+ data = self.load_data(intput_imgs),
149
+ out_dir = save_dir,
150
+ target_face_count = target_face_count,
151
+ do_texture_mapping = do_texture_mapping
152
+ )
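A hedged usage sketch of `MV23DPredictor` as defined above. Paths are placeholders; note that `init_model` hard-codes loading `./weights/svrm/svrm.safetensors` regardless of `ckpt_path`, and `predict` expects exactly six generated views followed by the user's input view:

```python
from PIL import Image
from svrm.predictor import MV23DPredictor

predictor = MV23DPredictor(
    ckpt_path="./weights/svrm/svrm.safetensors",  # placeholder; see init_model above
    cfg_path="./svrm/configs/svrm.yaml",          # placeholder config path
    device="cuda:0",
)

views = [Image.open(f"./outputs/views/view_{i}.png") for i in range(6)]  # azimuths 0..300
views.append(Image.open("./outputs/views/input.png"))                    # user input view goes last
predictor.predict(views, save_dir="./outputs/mesh/",
                  target_face_count=10000, do_texture_mapping=True)
```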
svrm/__pycache__/predictor.cpython-38.pyc CHANGED
Binary files a/svrm/__pycache__/predictor.cpython-38.pyc and b/svrm/__pycache__/predictor.cpython-38.pyc differ
 
svrm/ldm/.ipynb_checkpoints/util-checkpoint.py ADDED
@@ -0,0 +1,252 @@
1
+ import os
2
+ import importlib
3
+ from inspect import isfunction
4
+ import cv2
5
+ import time
6
+ import numpy as np
7
+ from PIL import Image, ImageDraw, ImageFont
8
+ import matplotlib.pyplot as plt
9
+ import torch
10
+ from torch import optim
11
+ import torchvision
12
+
13
+
14
+ def pil_rectangle_crop(im):
15
+ width, height = im.size # Get dimensions
16
+
17
+ if width <= height:
18
+ left = 0
19
+ right = width
20
+ top = (height - width)/2
21
+ bottom = (height + width)/2
22
+ else:
23
+
24
+ top = 0
25
+ bottom = height
26
+ left = (width - height) / 2
27
+ right = (width + height) / 2
28
+
29
+ # Crop the center of the image
30
+ im = im.crop((left, top, right, bottom))
31
+ return im
32
+
33
+
34
+ def add_margin(pil_img, color, size=256):
35
+ width, height = pil_img.size
36
+ result = Image.new(pil_img.mode, (size, size), color)
37
+ result.paste(pil_img, ((size - width) // 2, (size - height) // 2))
38
+ return result
39
+
40
+
41
+ def load_and_preprocess(interface, input_im):
42
+ '''
43
+ :param input_im (PIL Image).
44
+ :return image (H, W, 3) array in [0, 1].
45
+ '''
46
+ # See https://github.com/Ir1d/image-background-remove-tool
47
+ image = input_im.convert('RGB')
48
+
49
+ image_without_background = interface([image])[0]
50
+ image_without_background = np.array(image_without_background)
51
+ est_seg = image_without_background > 127
52
+ image = np.array(image)
53
+ foreground = est_seg[:, : , -1].astype(np.bool_)
54
+ image[~foreground] = [255., 255., 255.]
55
+ x, y, w, h = cv2.boundingRect(foreground.astype(np.uint8))
56
+ image = image[y:y+h, x:x+w, :]
57
+ image = Image.fromarray(np.array(image))
58
+
59
+ # resize image such that long edge is 200
60
+ image.thumbnail([200, 200], Image.Resampling.LANCZOS)
61
+ image = add_margin(image, (255, 255, 255), size=256)
62
+ image = np.array(image)
63
+ return image
64
+
65
+
66
+ def log_txt_as_img(wh, xc, size=10):
67
+ # wh a tuple of (width, height)
68
+ # xc a list of captions to plot
69
+ b = len(xc)
70
+ txts = list()
71
+ for bi in range(b):
72
+ txt = Image.new("RGB", wh, color="white")
73
+ draw = ImageDraw.Draw(txt)
74
+ font = ImageFont.truetype('data/DejaVuSans.ttf', size=size)
75
+ nc = int(40 * (wh[0] / 256))
76
+ lines = "\n".join(xc[bi][start:start + nc] for start in range(0, len(xc[bi]), nc))
77
+
78
+ try:
79
+ draw.text((0, 0), lines, fill="black", font=font)
80
+ except UnicodeEncodeError:
81
+ print("Cant encode string for logging. Skipping.")
82
+
83
+ txt = np.array(txt).transpose(2, 0, 1) / 127.5 - 1.0
84
+ txts.append(txt)
85
+ txts = np.stack(txts)
86
+ txts = torch.tensor(txts)
87
+ return txts
88
+
89
+
90
+ def ismap(x):
91
+ if not isinstance(x, torch.Tensor):
92
+ return False
93
+ return (len(x.shape) == 4) and (x.shape[1] > 3)
94
+
95
+
96
+ def isimage(x):
97
+ if not isinstance(x,torch.Tensor):
98
+ return False
99
+ return (len(x.shape) == 4) and (x.shape[1] == 3 or x.shape[1] == 1)
100
+
101
+
102
+ def exists(x):
103
+ return x is not None
104
+
105
+
106
+ def default(val, d):
107
+ if exists(val):
108
+ return val
109
+ return d() if isfunction(d) else d
110
+
111
+
112
+ def mean_flat(tensor):
113
+ """
114
+ https://github.com/openai/guided-diffusion/blob/27c20a8fab9cb472df5d6bdd6c8d11c8f430b924/guided_diffusion/nn.py#L86
115
+ Take the mean over all non-batch dimensions.
116
+ """
117
+ return tensor.mean(dim=list(range(1, len(tensor.shape))))
118
+
119
+
120
+ def count_params(model, verbose=False):
121
+ total_params = sum(p.numel() for p in model.parameters())
122
+ if verbose:
123
+ print(f"{model.__class__.__name__} has {total_params*1.e-6:.2f} M params.")
124
+ return total_params
125
+
126
+
127
+ def instantiate_from_config(config):
128
+ if not "target" in config:
129
+ if config == '__is_first_stage__':
130
+ return None
131
+ elif config == "__is_unconditional__":
132
+ return None
133
+ raise KeyError("Expected key `target` to instantiate.")
134
+ return get_obj_from_str(config["target"])(**config.get("params", dict()))
135
+
136
+
137
+ def get_obj_from_str(string, reload=False):
138
+ module, cls = string.rsplit(".", 1)
139
+ if reload:
140
+ module_imp = importlib.import_module(module)
141
+ importlib.reload(module_imp)
142
+ return getattr(importlib.import_module(module, package=None), cls)
143
+
144
+
145
+ class AdamWwithEMAandWings(optim.Optimizer):
146
+ # credit to https://gist.github.com/crowsonkb/65f7265353f403714fce3b2595e0b298
147
+ def __init__(self, params, lr=1.e-3, betas=(0.9, 0.999), eps=1.e-8, # TODO: check hyperparameters before using
148
+ weight_decay=1.e-2, amsgrad=False, ema_decay=0.9999, # ema decay to match previous code
149
+ ema_power=1., param_names=()):
150
+ """AdamW that saves EMA versions of the parameters."""
151
+ if not 0.0 <= lr:
152
+ raise ValueError("Invalid learning rate: {}".format(lr))
153
+ if not 0.0 <= eps:
154
+ raise ValueError("Invalid epsilon value: {}".format(eps))
155
+ if not 0.0 <= betas[0] < 1.0:
156
+ raise ValueError("Invalid beta parameter at index 0: {}".format(betas[0]))
157
+ if not 0.0 <= betas[1] < 1.0:
158
+ raise ValueError("Invalid beta parameter at index 1: {}".format(betas[1]))
159
+ if not 0.0 <= weight_decay:
160
+ raise ValueError("Invalid weight_decay value: {}".format(weight_decay))
161
+ if not 0.0 <= ema_decay <= 1.0:
162
+ raise ValueError("Invalid ema_decay value: {}".format(ema_decay))
163
+ defaults = dict(lr=lr, betas=betas, eps=eps,
164
+ weight_decay=weight_decay, amsgrad=amsgrad, ema_decay=ema_decay,
165
+ ema_power=ema_power, param_names=param_names)
166
+ super().__init__(params, defaults)
167
+
168
+ def __setstate__(self, state):
169
+ super().__setstate__(state)
170
+ for group in self.param_groups:
171
+ group.setdefault('amsgrad', False)
172
+
173
+ @torch.no_grad()
174
+ def step(self, closure=None):
175
+ """Performs a single optimization step.
176
+ Args:
177
+ closure (callable, optional): A closure that reevaluates the model
178
+ and returns the loss.
179
+ """
180
+ loss = None
181
+ if closure is not None:
182
+ with torch.enable_grad():
183
+ loss = closure()
184
+
185
+ for group in self.param_groups:
186
+ params_with_grad = []
187
+ grads = []
188
+ exp_avgs = []
189
+ exp_avg_sqs = []
190
+ ema_params_with_grad = []
191
+ state_sums = []
192
+ max_exp_avg_sqs = []
193
+ state_steps = []
194
+ amsgrad = group['amsgrad']
195
+ beta1, beta2 = group['betas']
196
+ ema_decay = group['ema_decay']
197
+ ema_power = group['ema_power']
198
+
199
+ for p in group['params']:
200
+ if p.grad is None:
201
+ continue
202
+ params_with_grad.append(p)
203
+ if p.grad.is_sparse:
204
+ raise RuntimeError('AdamW does not support sparse gradients')
205
+ grads.append(p.grad)
206
+
207
+ state = self.state[p]
208
+
209
+ # State initialization
210
+ if len(state) == 0:
211
+ state['step'] = 0
212
+ # Exponential moving average of gradient values
213
+ state['exp_avg'] = torch.zeros_like(p, memory_format=torch.preserve_format)
214
+ # Exponential moving average of squared gradient values
215
+ state['exp_avg_sq'] = torch.zeros_like(p, memory_format=torch.preserve_format)
216
+ if amsgrad:
217
+ # Maintains max of all exp. moving avg. of sq. grad. values
218
+ state['max_exp_avg_sq'] = torch.zeros_like(p, memory_format=torch.preserve_format)
219
+ # Exponential moving average of parameter values
220
+ state['param_exp_avg'] = p.detach().float().clone()
221
+
222
+ exp_avgs.append(state['exp_avg'])
223
+ exp_avg_sqs.append(state['exp_avg_sq'])
224
+ ema_params_with_grad.append(state['param_exp_avg'])
225
+
226
+ if amsgrad:
227
+ max_exp_avg_sqs.append(state['max_exp_avg_sq'])
228
+
229
+ # update the steps for each param group update
230
+ state['step'] += 1
231
+ # record the step after step update
232
+ state_steps.append(state['step'])
233
+
234
+ optim._functional.adamw(params_with_grad,
235
+ grads,
236
+ exp_avgs,
237
+ exp_avg_sqs,
238
+ max_exp_avg_sqs,
239
+ state_steps,
240
+ amsgrad=amsgrad,
241
+ beta1=beta1,
242
+ beta2=beta2,
243
+ lr=group['lr'],
244
+ weight_decay=group['weight_decay'],
245
+ eps=group['eps'],
246
+ maximize=False)
247
+
248
+ cur_ema_decay = min(ema_decay, 1 - state['step'] ** -ema_power)
249
+ for param, ema_param in zip(params_with_grad, ema_params_with_grad):
250
+ ema_param.mul_(cur_ema_decay).add_(param.float(), alpha=1 - cur_ema_decay)
251
+
252
+ return loss
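`instantiate_from_config` above resolves the dotted `target` path with importlib and forwards `params` as constructor kwargs. A minimal self-contained example (the config values are made up; the SVRM configs follow the same shape):

```python
from omegaconf import OmegaConf
from svrm.ldm.util import instantiate_from_config

cfg = OmegaConf.create({
    "target": "torch.nn.Linear",                      # dotted import path to the class
    "params": {"in_features": 8, "out_features": 4},  # constructor keyword arguments
})
layer = instantiate_from_config(cfg)                  # -> torch.nn.Linear(8, 4)
```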
svrm/ldm/models/.ipynb_checkpoints/svrm-checkpoint.py ADDED
@@ -0,0 +1,281 @@
1
+ # Open Source Model Licensed under the Apache License Version 2.0
2
+ # and Other Licenses of the Third-Party Components therein:
3
+ # The below Model in this distribution may have been modified by THL A29 Limited
4
+ # ("Tencent Modifications"). All Tencent Modifications are Copyright (C) 2024 THL A29 Limited.
5
+
6
+ # Copyright (C) 2024 THL A29 Limited, a Tencent company. All rights reserved.
7
+ # The below software and/or models in this distribution may have been
8
+ # modified by THL A29 Limited ("Tencent Modifications").
9
+ # All Tencent Modifications are Copyright (C) THL A29 Limited.
10
+
11
+ # Hunyuan 3D is licensed under the TENCENT HUNYUAN NON-COMMERCIAL LICENSE AGREEMENT
12
+ # except for the third-party components listed below.
13
+ # Hunyuan 3D does not impose any additional limitations beyond what is outlined
14
+ # in the repsective licenses of these third-party components.
15
+ # Users must comply with all terms and conditions of original licenses of these third-party
16
+ # components and must ensure that the usage of the third party components adheres to
17
+ # all relevant laws and regulations.
18
+
19
+ # For avoidance of doubts, Hunyuan 3D means the large language models and
20
+ # their software and algorithms, including trained model weights, parameters (including
21
+ # optimizer states), machine-learning model code, inference-enabling code, training-enabling code,
22
+ # fine-tuning enabling code and other elements of the foregoing made publicly available
23
+ # by Tencent in accordance with TENCENT HUNYUAN COMMUNITY LICENSE AGREEMENT.
24
+
25
+ import os
26
+ import time
27
+ import math
28
+ import cv2
29
+ import numpy as np
30
+ import itertools
31
+ import shutil
32
+ from tqdm import tqdm
33
+ import torch
34
+ import torch.nn.functional as F
35
+ from einops import rearrange
36
+ try:
37
+ import trimesh
38
+ import mcubes
39
+ import xatlas
40
+ import open3d as o3d
41
+ except:
42
+ raise "failed to import 3d libraries "
43
+
44
+ from ..modules.rendering_neus.mesh import Mesh
45
+ from ..modules.rendering_neus.rasterize import NVDiffRasterizerContext
46
+
47
+ from ..utils.ops import scale_tensor
48
+ from ..util import count_params, instantiate_from_config
49
+ from ..vis_util import render
50
+
51
+
52
+ def unwrap_uv(v_pos, t_pos_idx):
53
+ print("Using xatlas to perform UV unwrapping, may take a while ...")
54
+ atlas = xatlas.Atlas()
55
+ atlas.add_mesh(v_pos, t_pos_idx)
56
+ atlas.generate(xatlas.ChartOptions(), xatlas.PackOptions())
57
+ _, indices, uvs = atlas.get_mesh(0)
58
+ indices = indices.astype(np.int64, casting="same_kind")
59
+ return uvs, indices
60
+
61
+
62
+ def uv_padding(image, hole_mask, uv_padding_size = 2):
63
+ return cv2.inpaint(
64
+ (image.detach().cpu().numpy() * 255).astype(np.uint8),
65
+ (hole_mask.detach().cpu().numpy() * 255).astype(np.uint8),
66
+ uv_padding_size,
67
+ cv2.INPAINT_TELEA
68
+ )
69
+
70
+ def refine_mesh(vtx_refine, faces_refine):
71
+ mesh = o3d.geometry.TriangleMesh(
72
+ vertices=o3d.utility.Vector3dVector(vtx_refine),
73
+ triangles=o3d.utility.Vector3iVector(faces_refine)
74
+ )
75
+
76
+ mesh = mesh.remove_unreferenced_vertices()
77
+ mesh = mesh.remove_duplicated_triangles()
78
+ mesh = mesh.remove_duplicated_vertices()
79
+
80
+ voxel_size = max(mesh.get_max_bound() - mesh.get_min_bound())
81
+
82
+ mesh = mesh.simplify_vertex_clustering(
83
+ voxel_size=0.007, # 0.005
84
+ contraction=o3d.geometry.SimplificationContraction.Average)
85
+
86
+ mesh = mesh.filter_smooth_simple(number_of_iterations=2)
87
+
88
+ vtx_refine = np.asarray(mesh.vertices).astype(np.float32)
89
+ faces_refine = np.asarray(mesh.triangles)
90
+ return vtx_refine, faces_refine, mesh
91
+
92
+
93
+ class SVRMModel(torch.nn.Module):
94
+ def __init__(
95
+ self,
96
+ img_encoder_config,
97
+ img_to_triplane_config,
98
+ render_config,
99
+ device = "cuda:0",
100
+ **kwargs
101
+ ):
102
+ super(SVRMModel, self).__init__()
103
+ self.img_encoder = instantiate_from_config(img_encoder_config).half()
104
+ self.img_to_triplane_decoder = instantiate_from_config(img_to_triplane_config).half()
105
+ self.render = instantiate_from_config(render_config).half()
106
+ self.device = device
107
+ count_params(self, verbose=True)
108
+
109
+
110
+ @torch.no_grad()
111
+ def export_mesh_with_uv(
112
+ self,
113
+ data,
114
+ mesh_size: int = 384,
115
+ ctx = None,
116
+ context_type = 'cuda',
117
+ texture_res = 1024,
118
+ target_face_count = 10000,
119
+ do_texture_mapping = True,
120
+ out_dir = 'outputs/test'
121
+ ):
122
+ """
123
+ color_type: 0 for ray texture, 1 for vertices texture
124
+ """
125
+
126
+ obj_vertext_path = os.path.join(out_dir, 'mesh_with_colors.obj')
127
+ obj_path = os.path.join(out_dir, 'mesh.obj')
128
+ obj_texture_path = os.path.join(out_dir, 'texture.png')
129
+ obj_mtl_path = os.path.join(out_dir, 'texture.mtl')
130
+ glb_path = os.path.join(out_dir, 'mesh.glb')
131
+
132
+ st = time.time()
133
+
134
+ here = {'device': self.device, 'dtype': torch.float16}
135
+ input_view_image = data["input_view"].to(**here) # [b, m, c, h, w]
136
+ input_view_cam = data["input_view_cam"].to(**here) # [b, m, 20]
137
+
138
+ batch_size, input_view_num, *_ = input_view_image.shape
139
+ assert batch_size == 1, "batch size should be 1"
140
+
141
+ input_view_image = rearrange(input_view_image, 'b m c h w -> (b m) c h w')
142
+ input_view_cam = rearrange(input_view_cam, 'b m d -> (b m) d')
143
+ input_view_feat = self.img_encoder(input_view_image, input_view_cam)
144
+ input_view_feat = rearrange(input_view_feat, '(b m) l d -> b (l m) d', m=input_view_num)
145
+
146
+ # -- decoder
147
+ torch.cuda.empty_cache()
148
+ triplane_gen = self.img_to_triplane_decoder(input_view_feat) # [b, 3, tri_dim, h, w]
149
+ del input_view_feat
150
+ torch.cuda.empty_cache()
151
+
152
+ # --- triplane nerf render
153
+
154
+ cur_triplane = triplane_gen[0:1]
155
+
156
+ aabb = torch.tensor([[-0.6, -0.6, -0.6], [0.6, 0.6, 0.6]]).unsqueeze(0).to(**here)
157
+ grid_out = self.render.forward_grid(planes=cur_triplane, grid_size=mesh_size, aabb=aabb)
158
+
159
+ print(f"=====> Triplane forward time: {time.time() - st}")
160
+ st = time.time()
161
+
162
+ vtx, faces = mcubes.marching_cubes(0. - grid_out['sdf'].squeeze(0).squeeze(-1).cpu().float().numpy(), 0)
163
+
164
+ bbox = aabb[0].cpu().numpy()
165
+ vtx = vtx / (mesh_size - 1)
166
+ vtx = vtx * (bbox[1] - bbox[0]) + bbox[0]
167
+
168
+ # refine mesh
169
+ vtx_refine, faces_refine, mesh = refine_mesh(vtx, faces)
170
+
171
+ # reduce faces
172
+ if faces_refine.shape[0] > target_face_count:
173
+ print(f"reduce face: {faces_refine.shape[0]} -> {target_face_count}")
174
+ mesh = o3d.geometry.TriangleMesh(
175
+ vertices = o3d.utility.Vector3dVector(vtx_refine),
176
+ triangles = o3d.utility.Vector3iVector(faces_refine)
177
+ )
178
+
179
+ # Function to simplify mesh using Quadric Error Metric Decimation by Garland and Heckbert
180
+ mesh = mesh.simplify_quadric_decimation(target_face_count, boundary_weight=1.0)
181
+
182
+ mesh = Mesh(
183
+ v_pos = torch.from_numpy(np.asarray(mesh.vertices)).to(self.device),
184
+ t_pos_idx = torch.from_numpy(np.asarray(mesh.triangles)).to(self.device),
185
+ v_rgb = torch.from_numpy(np.asarray(mesh.vertex_colors)).to(self.device)
186
+ )
187
+ vtx_refine = mesh.v_pos.cpu().numpy()
188
+ faces_refine = mesh.t_pos_idx.cpu().numpy()
189
+
190
+ vtx_colors = self.render.forward_points(cur_triplane, torch.tensor(vtx_refine).unsqueeze(0).to(**here))
191
+ vtx_colors = vtx_colors['rgb'].float().squeeze(0).cpu().numpy()
192
+
193
+ color_ratio = 0.8 # increase brightness
194
+ with open(obj_vertext_path, 'w') as fid:
195
+ verts = vtx_refine[:, [1,2,0]]
196
+ for pidx, pp in enumerate(verts):
197
+ color = vtx_colors[pidx]
198
+ color = [color[0]**color_ratio, color[1]**color_ratio, color[2]**color_ratio]
199
+ fid.write('v %f %f %f %f %f %f\n' % (pp[0], pp[1], pp[2], color[0], color[1], color[2]))
200
+ for i, f in enumerate(faces_refine):
201
+ f1 = f + 1
202
+ fid.write('f %d %d %d\n' % (f1[0], f1[1], f1[2]))
203
+
204
+ mesh = trimesh.load_mesh(obj_vertext_path)
205
+ print(f"=====> generate mesh with vertex shading time: {time.time() - st}")
206
+ st = time.time()
207
+
208
+ if not do_texture_mapping:
209
+ shutil.copy(obj_vertext_path, obj_path)
210
+ mesh.export(glb_path, file_type='glb')
211
+ return None
212
+
213
+
214
+ ########## export texture ########
215
+
216
+
217
+ st = time.time()
218
+
219
+ # uv unwrap
220
+ vtx_tex, t_tex_idx = unwrap_uv(vtx_refine, faces_refine)
221
+ vtx_refine = torch.from_numpy(vtx_refine).to(self.device)
222
+ faces_refine = torch.from_numpy(faces_refine).to(self.device)
223
+ t_tex_idx = torch.from_numpy(t_tex_idx).to(self.device)
224
+ uv_clip = torch.from_numpy(vtx_tex * 2.0 - 1.0).to(self.device)
225
+
226
+ # rasterize
227
+ ctx = NVDiffRasterizerContext(context_type, cur_triplane.device) if ctx is None else ctx
228
+ rast = ctx.rasterize_one(
229
+ torch.cat([
230
+ uv_clip,
231
+ torch.zeros_like(uv_clip[..., 0:1]),
232
+ torch.ones_like(uv_clip[..., 0:1])
233
+ ], dim=-1),
234
+ t_tex_idx,
235
+ (texture_res, texture_res)
236
+ )[0]
237
+ hole_mask = ~(rast[:, :, 3] > 0)
238
+
239
+ # Interpolate world space position
240
+ gb_pos = ctx.interpolate_one(vtx_refine, rast[None, ...], faces_refine)[0][0]
241
+
242
+ with torch.no_grad():
243
+ gb_mask_pos_scale = scale_tensor(gb_pos.unsqueeze(0).view(1, -1, 3), (-1, 1), (-1, 1))
244
+
245
+ tex_map = self.render.forward_points(cur_triplane, gb_mask_pos_scale)['rgb']
246
+
247
+ tex_map = tex_map.float().squeeze(0) # (0, 1)
248
+ tex_map = tex_map.view((texture_res, texture_res, 3))
249
+ img = uv_padding(tex_map, hole_mask)
250
+ img = ((img/255.0) ** color_ratio) * 255 # increase brightness
251
+ img = img.clip(0, 255).astype(np.uint8)
252
+
253
+ verts = vtx_refine.cpu().numpy()[:, [1,2,0]]
+ faces = faces_refine.cpu().numpy()
+
+ with open(obj_mtl_path, 'w') as fid:
+ fid.write('newmtl material_0\n')
+ fid.write("Ka 1.000 1.000 1.000\n")
+ fid.write("Kd 1.000 1.000 1.000\n")
+ fid.write("Ks 0.000 0.000 0.000\n")
+ fid.write("d 1.0\n")
+ fid.write("illum 2\n")
+ fid.write(f'map_Kd texture.png\n')
+
+ with open(obj_path, 'w') as fid:
+ fid.write(f'mtllib texture.mtl\n')
+ for pidx, pp in enumerate(verts):
+ fid.write('v %f %f %f\n' % (pp[0], pp[1], pp[2]))
+ for pidx, pp in enumerate(vtx_tex):
+ fid.write('vt %f %f\n' % (pp[0], 1 - pp[1]))
+ fid.write('usemtl material_0\n')
+ for i, f in enumerate(faces):
+ f1 = f + 1
+ f2 = t_tex_idx[i] + 1
+ fid.write('f %d/%d %d/%d %d/%d\n' % (f1[0], f2[0], f1[1], f2[1], f1[2], f2[2],))
+
+ cv2.imwrite(obj_texture_path, img[..., [2, 1, 0]])
+ mesh = trimesh.load_mesh(obj_path)
+ mesh.export(glb_path, file_type='glb')
+ print(f"=====> generate mesh with texture shading time: {time.time() - st}")
+
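Note that `cv2.imwrite` expects BGR, which is why the texture is saved as `img[..., [2, 1, 0]]`. As a purely illustrative check (file names assumed, not taken from the pipeline), the exported OBJ/MTL/PNG triple can be reloaded with trimesh to confirm the UVs and material survive the round trip to GLB:

```python
# Hypothetical sanity check on the exported assets; adjust paths to your output dir.
import trimesh

mesh = trimesh.load("outputs/mesh.obj", process=False)
print("vertices:", mesh.vertices.shape, "faces:", mesh.faces.shape)

# A textured OBJ should come back with per-vertex UVs and a texture image.
if getattr(mesh.visual, "uv", None) is not None:
    print("uv coords:", mesh.visual.uv.shape)

mesh.export("outputs/mesh_check.glb", file_type="glb")
```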
svrm/ldm/models/__pycache__/svrm.cpython-38.pyc CHANGED
Binary files a/svrm/ldm/models/__pycache__/svrm.cpython-38.pyc and b/svrm/ldm/models/__pycache__/svrm.cpython-38.pyc differ
 
svrm/ldm/models/svrm.py CHANGED
@@ -1,5 +1,7 @@
- # Open Source Model Licensed under the Apache License Version 2.0 and Other Licenses of the Third-Party Components therein:
- # The below Model in this distribution may have been modified by THL A29 Limited ("Tencent Modifications"). All Tencent Modifications are Copyright (C) 2024 THL A29 Limited.
+ # Open Source Model Licensed under the Apache License Version 2.0
+ # and Other Licenses of the Third-Party Components therein:
+ # The below Model in this distribution may have been modified by THL A29 Limited
+ # ("Tencent Modifications"). All Tencent Modifications are Copyright (C) 2024 THL A29 Limited.
 
  # Copyright (C) 2024 THL A29 Limited, a Tencent company. All rights reserved.
  # The below software and/or models in this distribution may have been
@@ -68,7 +70,8 @@ def uv_padding(image, hole_mask, uv_padding_size = 2):
  def refine_mesh(vtx_refine, faces_refine):
  mesh = o3d.geometry.TriangleMesh(
  vertices=o3d.utility.Vector3dVector(vtx_refine),
- triangles=o3d.utility.Vector3iVector(faces_refine))
+ triangles=o3d.utility.Vector3iVector(faces_refine)
+ )
 
  mesh = mesh.remove_unreferenced_vertices()
  mesh = mesh.remove_duplicated_triangles()
@@ -235,9 +238,12 @@ class SVRMModel(torch.nn.Module):
 
  # Interpolate world space position
  gb_pos = ctx.interpolate_one(vtx_refine, rast[None, ...], faces_refine)[0][0]
+
  with torch.no_grad():
  gb_mask_pos_scale = scale_tensor(gb_pos.unsqueeze(0).view(1, -1, 3), (-1, 1), (-1, 1))
+
  tex_map = self.render.forward_points(cur_triplane, gb_mask_pos_scale)['rgb']
+
  tex_map = tex_map.float().squeeze(0) # (0, 1)
  tex_map = tex_map.view((texture_res, texture_res, 3))
  img = uv_padding(tex_map, hole_mask)
svrm/predictor.py CHANGED
@@ -1,5 +1,7 @@
- # Open Source Model Licensed under the Apache License Version 2.0 and Other Licenses of the Third-Party Components therein:
- # The below Model in this distribution may have been modified by THL A29 Limited ("Tencent Modifications"). All Tencent Modifications are Copyright (C) 2024 THL A29 Limited.
+ # Open Source Model Licensed under the Apache License Version 2.0
+ # and Other Licenses of the Third-Party Components therein:
+ # The below Model in this distribution may have been modified by THL A29 Limited
+ # ("Tencent Modifications"). All Tencent Modifications are Copyright (C) 2024 THL A29 Limited.
 
  # Copyright (C) 2024 THL A29 Limited, a Tencent company. All rights reserved.
  # The below software and/or models in this distribution may have been