Shap-E

[[open-in-colab]]

Shap-E is a conditional model for generating 3D assets that can be used in video game development, interior design, and architecture. It is trained on a large dataset of 3D assets, which is post-processed to render more views of each object and to produce 16K instead of 4K point clouds. The Shap-E model is trained in two steps:

  1. An encoder accepts the point clouds and rendered views of a 3D asset and outputs the parameters of the implicit functions that represent the asset.
  2. A diffusion model is trained on the latents produced by the encoder to generate either neural radiance fields (NeRFs) or a textured 3D mesh, making it easier to render and use the 3D asset in downstream applications (see the snippet after this list).
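
In diffusers, the second stage is exposed as a ready-made pipeline that pairs the trained diffusion prior with a renderer for decoding its latents. As a quick sanity check (a minimal sketch; the exact component names may differ between diffusers versions), you can list the pieces the pretrained checkpoint bundles:

import torch
from diffusers import ShapEPipeline

pipe = ShapEPipeline.from_pretrained("openai/shap-e", torch_dtype=torch.float16, variant="fp16")

# `components` lists the sub-modules the pipeline was assembled from, e.g. the
# diffusion prior, text encoder, tokenizer, scheduler, and the renderer that
# decodes the generated latents into NeRFs or meshes.
print(list(pipe.components.keys()))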

이 κ°€μ΄λ“œμ—μ„œλŠ” Shap-Eλ₯Ό μ‚¬μš©ν•˜μ—¬ λ‚˜λ§Œμ˜ 3D 에셋을 μƒμ„±ν•˜λŠ” 방법을 λ³΄μž…λ‹ˆλ‹€!

Before you begin, make sure you have the following libraries installed:

# Colabμ—μ„œ ν•„μš”ν•œ 라이브러리λ₯Ό μ„€μΉ˜ν•˜κΈ° μœ„ν•΄ 주석을 μ œμ™Έν•˜μ„Έμš”
#!pip install -q diffusers transformers accelerate trimesh

Text-to-3D

To generate a gif of a 3D object, pass a text prompt to the [ShapEPipeline]. The pipeline generates a list of image frames which are used to create the 3D object.

import torch
from diffusers import ShapEPipeline

device = torch.device("cuda" if torch.cuda.is_available() else "cpu")

pipe = ShapEPipeline.from_pretrained("openai/shap-e", torch_dtype=torch.float16, variant="fp16")
pipe = pipe.to(device)

guidance_scale = 15.0
prompt = ["A firecracker", "A birthday cupcake"]

images = pipe(
    prompt,
    guidance_scale=guidance_scale,
    num_inference_steps=64,
    frame_size=256,
).images

Now use the [~utils.export_to_gif] function to turn the list of image frames into a gif of the 3D object.

from diffusers.utils import export_to_gif

export_to_gif(images[0], "firecracker_3d.gif")
export_to_gif(images[1], "cake_3d.gif")
prompt = "A firecracker"
prompt = "A birthday cupcake"

Image-to-3D

λ‹€λ₯Έ μ΄λ―Έμ§€λ‘œλΆ€ν„° 3D 개체λ₯Ό μƒμ„±ν•˜λ €λ©΄ [ShapEImg2ImgPipeline]을 μ‚¬μš©ν•©λ‹ˆλ‹€. κΈ°μ‘΄ 이미지λ₯Ό μ‚¬μš©ν•˜κ±°λ‚˜ μ™„μ „νžˆ μƒˆλ‘œμš΄ 이미지λ₯Ό 생성할 수 μžˆμŠ΅λ‹ˆλ‹€. Kandinsky 2.1 λͺ¨λΈμ„ μ‚¬μš©ν•˜μ—¬ μƒˆ 이미지λ₯Ό 생성해 λ³΄κ² μŠ΅λ‹ˆλ‹€.

from diffusers import DiffusionPipeline
import torch

prior_pipeline = DiffusionPipeline.from_pretrained("kandinsky-community/kandinsky-2-1-prior", torch_dtype=torch.float16, use_safetensors=True).to("cuda")
pipeline = DiffusionPipeline.from_pretrained("kandinsky-community/kandinsky-2-1", torch_dtype=torch.float16, use_safetensors=True).to("cuda")

prompt = "A cheeseburger, white background"

image_embeds, negative_image_embeds = prior_pipeline(prompt, guidance_scale=1.0).to_tuple()
image = pipeline(
    prompt,
    image_embeds=image_embeds,
    negative_image_embeds=negative_image_embeds,
).images[0]

image.save("burger.png")

Pass the cheeseburger to the [ShapEImg2ImgPipeline] to generate a 3D representation of it.

from PIL import Image
from diffusers import ShapEImg2ImgPipeline
from diffusers.utils import export_to_gif

pipe = ShapEImg2ImgPipeline.from_pretrained("openai/shap-e-img2img", torch_dtype=torch.float16, variant="fp16").to("cuda")

guidance_scale = 3.0
image = Image.open("burger.png").resize((256, 256))

images = pipe(
    image,
    guidance_scale=guidance_scale,
    num_inference_steps=64,
    frame_size=256,
).images

gif_path = export_to_gif(images[0], "burger_3d.gif")

Images: the generated cheeseburger and the 3D cheeseburger gif

λ©”μ‹œ μƒμ„±ν•˜κΈ°

Shap-E is a flexible model that can also generate textured mesh outputs to be rendered for downstream applications. In this example, you'll convert the output into a glb file because the πŸ€— Datasets library supports mesh visualization of glb files, which can be viewed in the Dataset viewer.

You can generate mesh outputs for both the [ShapEPipeline] and [ShapEImg2ImgPipeline] by specifying the output_type parameter as "mesh":

import torch
from diffusers import ShapEPipeline

device = torch.device("cuda" if torch.cuda.is_available() else "cpu")

pipe = ShapEPipeline.from_pretrained("openai/shap-e", torch_dtype=torch.float16, variant="fp16")
pipe = pipe.to(device)

guidance_scale = 15.0
prompt = "A birthday cupcake"

images = pipe(prompt, guidance_scale=guidance_scale, num_inference_steps=64, frame_size=256, output_type="mesh").images

Use the [~utils.export_to_ply] function to save the mesh output as a ply file:

μ„ νƒμ μœΌλ‘œ [~utils.export_to_obj] ν•¨μˆ˜λ₯Ό μ‚¬μš©ν•˜μ—¬ λ©”μ‹œ 좜λ ₯을 obj 파일둜 μ €μž₯ν•  수 μžˆμŠ΅λ‹ˆλ‹€. λ‹€μ–‘ν•œ ν˜•μ‹μœΌλ‘œ λ©”μ‹œ 좜λ ₯을 μ €μž₯ν•  수 μžˆμ–΄ λ‹€μš΄μŠ€νŠΈλ¦Όμ—μ„œ λ”μš± μœ μ—°ν•˜κ²Œ μ‚¬μš©ν•  수 μžˆμŠ΅λ‹ˆλ‹€!

from diffusers.utils import export_to_ply

ply_path = export_to_ply(images[0], "3d_cake.ply")
print(f"Saved to folder: {ply_path}")

κ·Έ λ‹€μŒ trimesh 라이브러리λ₯Ό μ‚¬μš©ν•˜μ—¬ ply νŒŒμΌμ„ glb 파일둜 λ³€ν™˜ν•  수 μžˆμŠ΅λ‹ˆλ‹€:

import trimesh

mesh = trimesh.load("3d_cake.ply")
mesh_export = mesh.export("3d_cake.glb", file_type="glb")

By default, the mesh output is focused from the bottom viewpoint, but you can change the default viewpoint by applying a rotation transform:

import trimesh
import numpy as np

mesh = trimesh.load("3d_cake.ply")
rot = trimesh.transformations.rotation_matrix(-np.pi / 2, [1, 0, 0])
mesh = mesh.apply_transform(rot)
mesh_export = mesh.export("3d_cake.glb", file_type="glb")

λ©”μ‹œ νŒŒμΌμ„ 데이터셋 λ ˆν¬μ§€ν† λ¦¬μ— μ—…λ‘œλ“œν•΄ Dataset viewer둜 μ‹œκ°ν™”ν•˜μ„Έμš”!