
license: apache-2.0
pipeline_tag: image-to-3d
library_name: 3dtopia-xl
tags:
  - text-to-3d
  - image-to-3d

3DTopia-XL

This repo contains the pretrained weights for 3DTopia-XL: Scaling High-quality 3D Asset Generation via Primitive Diffusion.

Project Page | arXiv | Weights | Code

Introduction

3DTopia-XL scales high-quality 3D asset generation using a Diffusion Transformer (DiT) built on PrimX, an expressive and efficient 3D representation. The denoising process takes 5 seconds to generate a 3D PBR asset from text or image input, and the resulting asset is ready for use in a graphics pipeline.

Model Details

The model is trained on a subset of Objaverse containing roughly 256K objects. For more details, please refer to our paper.

Usage

To download the model:

from huggingface_hub import hf_hub_download
# DiT checkpoint (fp16); the return value is the local path of the cached file
ckpt_path = hf_hub_download(repo_id="frozenburning/3DTopia-XL", filename="model_sview_dit_fp16.pt")
# VAE checkpoint (fp16)
vae_ckpt_path = hf_hub_download(repo_id="frozenburning/3DTopia-XL", filename="model_vae_fp16.pt")

Please refer to our repo for more details on loading and inference.
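If both checkpoints are needed in several places, the two downloads can be wrapped in a small helper. The following is a minimal sketch, not part of the official code: `REPO_ID`, `CHECKPOINT_FILES`, `fetch_checkpoints`, and the `"dit"`/`"vae"` keys are hypothetical names chosen for illustration. Only the repo id, the two filenames, and `hf_hub_download` come from this model card.

```python
from typing import Callable, Dict, Optional

# Repo id and filenames as listed in this model card.
REPO_ID = "frozenburning/3DTopia-XL"
CHECKPOINT_FILES = {
    "dit": "model_sview_dit_fp16.pt",  # hypothetical key for the DiT weights
    "vae": "model_vae_fp16.pt",        # hypothetical key for the VAE weights
}


def fetch_checkpoints(download: Optional[Callable[..., str]] = None) -> Dict[str, str]:
    """Return a mapping from checkpoint name to local file path.

    `download` defaults to huggingface_hub.hf_hub_download; passing a
    different callable (assumed to accept `repo_id` and `filename`
    keyword arguments) makes the helper usable without network access.
    """
    if download is None:
        # Imported lazily so the module loads even without huggingface_hub.
        from huggingface_hub import hf_hub_download
        download = hf_hub_download
    return {
        name: download(repo_id=REPO_ID, filename=filename)
        for name, filename in CHECKPOINT_FILES.items()
    }
```

Calling `fetch_checkpoints()` with no arguments downloads both files through `hf_hub_download` (or reuses the local cache) and returns their paths. How the weights are then loaded and run is defined in the code repo, not here.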

Citation

@article{chen2024primx,
  title={3DTopia-XL: High-Quality 3D PBR Asset Generation via Primitive Diffusion},
  author={Chen, Zhaoxi and Tang, Jiaxiang and Dong, Yuhao and Cao, Ziang and Hong, Fangzhou and Lan, Yushi and Wang, Tengfei and Xie, Haozhe and Wu, Tong and Saito, Shunsuke and Pan, Liang and Lin, Dahua and Liu, Ziwei},
  journal={arXiv preprint arXiv:2409.12957},
  year={2024}
}