Hooman committed on
Commit
0d56025
·
verified ·
1 Parent(s): 448daca

Upload README.md with huggingface_hub

  1. README.md +124 -5
README.md CHANGED
@@ -1,5 +1,124 @@
1
- ---
2
- license: other
3
- license_name: autodesk-non-commercial-3d-generative-v1.0
4
- license_link: LICENSE
5
- ---
1
+ ---
2
+ language:
3
+ - en
4
+ license: other
5
+ license_name: autodesk-non-commercial-3d-generative-v1.0
6
+ tags:
7
+ - wala
8
+ - unconditional-3d-generation
9
+ ---
10
+
11
+ # Model Card for WaLa-UN-1B
12
+
13
+ This model was released with the Wavelet Latent Diffusion (WaLa) paper. It generates high-quality 3D shapes unconditionally, with detailed geometry and complex structures.
14
+
15
+ ## Model Details
16
+
17
+ ### Model Description
18
+
19
+ WaLa-UN-1B is a large-scale 3D generative model trained on over 10 million publicly available 3D shapes. It generates a wide range of high-quality 3D shapes unconditionally in 2-4 seconds each. The model combines a wavelet-based compact latent encoding with a billion-parameter architecture to achieve strong geometric detail and structural plausibility.
20
+
21
+ - **Developed by:** Aditya Sanghi, Aliasghar Khani, Chinthala Pradyumna Reddy, Arianna Rampini, Derek Cheung, Kamal Rahimi Malekshan, Kanika Madan, Hooman Shayani
22
+ - **Model type:** 3D Generative Model
23
+ - **License:** Autodesk Non-Commercial (3D Generative) v1.0
24
+
25
+ For more information, see the [project page](TBD) and [the paper](TBD).
26
+
27
+ ### Model Sources
28
+
29
+ - **Repository:** [GitHub](https://github.com/AutodeskAILab/WaLa)
30
+ - **Paper:** [arXiv:TBD](TBD)
31
+ - **Demo:** [TBD](TBD)
32
+
33
+ ## Uses
34
+
35
+ ### Direct Use
36
+
37
+ This model is released by Autodesk for academic and research purposes only, specifically for the theoretical exploration and demonstration of the WaLa 3D generative framework. Please see [here](TBD) for inference instructions.
38
+
39
+ ### Out-of-Scope Use
40
+
41
+ The model should not be used for:
42
+
43
+ - Commercial purposes
44
+
45
+ - Creation of load-bearing physical objects the failure of which could cause property damage or personal injury
46
+
47
+ - Any usage not in compliance with the [license](https://huggingface.co/ADSKAILab/WaLa-UN-1B/blob/main/LICENSE.md), in particular, the "Acceptable Use" section.
48
+
49
+ ## Bias, Risks, and Limitations
50
+
51
+ ### Bias
52
+
53
+ - The model may inherit biases present in the publicly available training datasets, which could lead to uneven representation of certain object types or styles.
54
+
55
+ - The model's performance may vary across different object categories, potentially favoring those that are more prevalent in the training data.
56
+
57
+ ### Risks and Limitations
58
+
59
+ - Because the model is unconditional, there is no direct control over the type or characteristics of the generated shapes.
60
+ - The model may occasionally generate implausible or unrealistic shapes. Even theoretically plausible shapes should not be relied upon for real-world structural soundness.
61
+
62
+ ## How to Get Started with the Model
63
+
64
+ Please refer to the instructions [here](TBD).
65
+
66
+ ## Training Details
67
+
68
+ ### Training Data
69
+
70
+ The model was trained on a dataset of over 10 million 3D shapes aggregated from 19 publicly available sub-datasets: ModelNet, ShapeNet, SMPL, Thingi10K, SMAL, COMA, House3D, ABC, Fusion 360, 3D-FUTURE, BuildingNet, DeformingThings4D, FG3D, Toys4K, ABO, Infinigen, Objaverse, and two subsets of ObjaverseXL (Thingiverse and GitHub).
71
+
72
+ ### Training Procedure
73
+
74
+ #### Preprocessing
75
+
76
+ Each 3D shape in the dataset was converted into a truncated signed distance function (TSDF) with a resolution of 256³. The TSDF was then decomposed using a discrete wavelet transform to create the wavelet-tree representation used by the model.
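The two preprocessing steps above can be illustrated with a minimal NumPy sketch. It is not the WaLa pipeline: the model card does not name the wavelet filter or the truncation distance, so this example assumes a single-level orthonormal Haar transform, a truncation of 0.1, an analytic sphere as the input shape, and a 32³ grid instead of 256³ to keep it light. The function names (`sphere_tsdf`, `haar_step`, `haar_dwt3d`) are hypothetical.

```python
import numpy as np

def sphere_tsdf(resolution=32, radius=0.5, truncation=0.1):
    """Truncated signed distance to a sphere, sampled on a cubic grid in [-1, 1]^3."""
    axis = np.linspace(-1.0, 1.0, resolution)
    x, y, z = np.meshgrid(axis, axis, axis, indexing="ij")
    sdf = np.sqrt(x**2 + y**2 + z**2) - radius  # negative inside, positive outside
    # Truncation (assumed value) clamps distances far from the surface.
    return np.clip(sdf, -truncation, truncation)

def haar_step(volume, axis):
    """One orthonormal Haar analysis step along one axis: (low-pass, high-pass)."""
    even = np.take(volume, np.arange(0, volume.shape[axis], 2), axis=axis)
    odd = np.take(volume, np.arange(1, volume.shape[axis], 2), axis=axis)
    return (even + odd) / np.sqrt(2.0), (even - odd) / np.sqrt(2.0)

def haar_dwt3d(volume):
    """Single-level 3D Haar transform; returns 8 subbands keyed 'LLL' .. 'HHH'."""
    bands = {"": volume}
    for axis in range(3):
        next_bands = {}
        for key, band in bands.items():
            low, high = haar_step(band, axis)
            next_bands[key + "L"] = low
            next_bands[key + "H"] = high
        bands = next_bands
    return bands

tsdf = sphere_tsdf()
subbands = haar_dwt3d(tsdf)
coarse = subbands["LLL"]  # half-resolution approximation of the TSDF
print(tsdf.shape, coarse.shape, sorted(subbands))
```

The `LLL` band holds the coarse shape and the other seven bands hold detail coefficients. Applying the transform again to `LLL` would build the multi-level hierarchy that a wavelet-tree representation relies on.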
77
+
78
+ #### Training Hyperparameters
79
+
80
+ - **Training regime:** Please refer to the paper.
81
+
82
+ #### Speeds, Sizes, Times
83
+
84
+ - The model contains approximately 1.1 billion parameters.
85
+ - The model can generate shapes within 2-4 seconds.
86
+
87
+ ## Evaluation
88
+
89
+ ### Testing Data, Factors & Metrics
90
+
91
+ #### Testing Data
92
+
93
+ The model was evaluated on the Google Scanned Objects (GSO) dataset and a validation set from the training data (MAS validation data).
94
+
95
+ #### Factors
96
+
97
+ The evaluation considered various factors such as the quality of generated shapes, the ability to capture fine details and complex structures, and the model's performance across different object categories.
98
+
99
+ #### Metrics
100
+
101
+ The model was evaluated using the following metrics:
102
+ - Intersection over Union (IoU)
103
+ - Light Field Distance (LFD)
104
+ - Chamfer Distance (CD)
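As a rough illustration of two of the metrics listed above, the sketch below computes IoU on boolean occupancy grids and a symmetric Chamfer Distance on point sets. The card does not specify the exact variants used in the paper, so this assumes voxel-based IoU and a Chamfer Distance defined as the sum of mean squared nearest-neighbour distances in both directions. LFD is omitted because it requires rendering. The function names are hypothetical.

```python
import numpy as np

def voxel_iou(occ_a, occ_b):
    """Intersection over Union of two boolean occupancy grids of equal shape."""
    a = np.asarray(occ_a, dtype=bool)
    b = np.asarray(occ_b, dtype=bool)
    union = np.logical_or(a, b).sum()
    if union == 0:
        return 1.0  # convention chosen here: two empty grids match perfectly
    return float(np.logical_and(a, b).sum() / union)

def chamfer_distance(points_a, points_b):
    """Symmetric Chamfer Distance using squared Euclidean distances (assumed variant)."""
    a = np.asarray(points_a, dtype=float)
    b = np.asarray(points_b, dtype=float)
    # Pairwise squared distances, shape (len(a), len(b)); fine for small point sets.
    sq = np.sum((a[:, None, :] - b[None, :, :]) ** 2, axis=-1)
    return float(sq.min(axis=1).mean() + sq.min(axis=0).mean())

# Two overlapping slabs in a 4x4x4 grid.
grid_a = np.zeros((4, 4, 4), dtype=bool)
grid_b = np.zeros((4, 4, 4), dtype=bool)
grid_a[:2] = True
grid_b[1:3] = True
print(voxel_iou(grid_a, grid_b))

# Two single-point clouds one unit apart.
print(chamfer_distance([[0.0, 0.0, 0.0]], [[1.0, 0.0, 0.0]]))
```

Higher IoU is better, while lower Chamfer Distance is better. For large point clouds, a nearest-neighbour structure would replace the dense pairwise matrix.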
105
+
106
+ ### Results
107
+
108
+ [Specific results for the unconditional model to be added when available]
109
+
110
+ ## Technical Specifications
111
+
112
+ ### Model Architecture and Objective
113
+
114
+ The model uses a U-ViT architecture with modifications. It employs a wavelet-based compact latent encoding to effectively capture both coarse and fine details of 3D shapes. Unlike the conditional models, this unconditional model does not use any input conditions and generates 3D shapes solely based on random noise inputs.
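To make "generates 3D shapes solely based on random noise inputs" concrete, here is a toy sketch of unconditional diffusion sampling. It is not the WaLa sampler: the card does not describe the sampling procedure, so this assumes standard DDPM ancestral sampling with a linear beta schedule, a hypothetical latent shape of `(4, 8, 8, 8)`, and a placeholder `zero_noise_predictor` standing in for the trained U-ViT network.

```python
import numpy as np

def ddpm_sample(predict_noise, shape, num_steps=50, seed=0):
    """DDPM ancestral sampling: start from Gaussian noise and denoise step by step."""
    rng = np.random.default_rng(seed)
    betas = np.linspace(1e-4, 0.02, num_steps)  # assumed linear noise schedule
    alphas = 1.0 - betas
    alpha_bars = np.cumprod(alphas)

    x = rng.standard_normal(shape)  # the only input: random noise, no condition
    for t in reversed(range(num_steps)):
        eps = predict_noise(x, t)  # network's estimate of the noise present in x
        mean = (x - betas[t] / np.sqrt(1.0 - alpha_bars[t]) * eps) / np.sqrt(alphas[t])
        if t > 0:
            x = mean + np.sqrt(betas[t]) * rng.standard_normal(shape)
        else:
            x = mean  # no noise is added at the final step
    return x

def zero_noise_predictor(x, t):
    """Placeholder for a trained denoiser; always predicts zero noise."""
    return np.zeros_like(x)

latent = ddpm_sample(zero_noise_predictor, shape=(4, 8, 8, 8))
print(latent.shape)
```

In a latent diffusion setup, the sampled latent would then be decoded back to wavelet coefficients and inverted to a TSDF. Those stages are outside this sketch.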
115
+
116
+ ### Compute Infrastructure
117
+
118
+ #### Hardware
119
+
120
+ [TBD]
121
+
122
+ ## Citation
123
+
124
+ [Citation information to be added after paper publication]