Point-E: A System for Generating 3D Point Clouds from Complex Prompts
Abstract
While recent work on text-conditional 3D object generation has shown promising results, the state-of-the-art methods typically require multiple GPU-hours to produce a single sample. This is in stark contrast to state-of-the-art generative image models, which produce samples in a number of seconds or minutes. In this paper, we explore an alternative method for 3D object generation which produces 3D models in only 1-2 minutes on a single GPU. Our method first generates a single synthetic view using a text-to-image diffusion model, and then produces a 3D point cloud using a second diffusion model which conditions on the generated image. While our method still falls short of the state-of-the-art in terms of sample quality, it is one to two orders of magnitude faster to sample from, offering a practical trade-off for some use cases. We release our pre-trained point cloud diffusion models, as well as evaluation code and models, at https://github.com/openai/point-e.