
Shap-E

[[open-in-colab]]

Shap-E is a conditional model for generating 3D assets that can be used in video game development, interior design, and architecture. It was trained on a large dataset of 3D assets, which was post-processed to render more views of each object and to produce 16K instead of 4K point clouds. The Shap-E model is trained in two steps:

  1. An encoder accepts the point clouds and rendered views of a 3D asset and outputs the parameters of the implicit functions that represent the asset.
  2. A diffusion model is trained on the latents produced by the encoder to generate either neural radiance fields (NeRFs) or a textured 3D mesh, making it easier to render and use the 3D asset in downstream applications.

This guide will show you how to use Shap-E to start generating your own 3D assets!

Before you begin, make sure you have the following libraries installed:

# uncomment to install the necessary libraries in Colab
#!pip install -q diffusers transformers accelerate trimesh

Text-to-3D

To generate a gif of a 3D object, pass a text prompt to the [ShapEPipeline]. The pipeline generates a list of image frames which are used to create the 3D object.

import torch
from diffusers import ShapEPipeline

device = torch.device("cuda" if torch.cuda.is_available() else "cpu")

pipe = ShapEPipeline.from_pretrained("openai/shap-e", torch_dtype=torch.float16, variant="fp16")
pipe = pipe.to(device)

guidance_scale = 15.0
prompt = ["A firecracker", "A birthday cupcake"]

images = pipe(
    prompt,
    guidance_scale=guidance_scale,
    num_inference_steps=64,
    frame_size=256,
).images

Now use the [~utils.export_to_gif] function to turn the list of image frames into a gif of the 3D object.

from diffusers.utils import export_to_gif

export_to_gif(images[0], "firecracker_3d.gif")
export_to_gif(images[1], "cake_3d.gif")
prompt = "A firecracker"
prompt = "A birthday cupcake"

Image-to-3D

To generate a 3D object from another image, use the [ShapEImg2ImgPipeline]. You can use an existing image or generate an entirely new one. Let's use the Kandinsky 2.1 model to generate a new image.

from diffusers import DiffusionPipeline
import torch

prior_pipeline = DiffusionPipeline.from_pretrained("kandinsky-community/kandinsky-2-1-prior", torch_dtype=torch.float16, use_safetensors=True).to("cuda")
pipeline = DiffusionPipeline.from_pretrained("kandinsky-community/kandinsky-2-1", torch_dtype=torch.float16, use_safetensors=True).to("cuda")

prompt = "A cheeseburger, white background"

image_embeds, negative_image_embeds = prior_pipeline(prompt, guidance_scale=1.0).to_tuple()
image = pipeline(
    prompt,
    image_embeds=image_embeds,
    negative_image_embeds=negative_image_embeds,
).images[0]

image.save("burger.png")
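If you already have an image you'd like to use instead of generating one, you can load it directly. A minimal sketch, assuming the image lives at a hypothetical URL:

from diffusers.utils import load_image

# hypothetical URL, shown only for illustration
image = load_image("https://example.com/burger.png")
image.save("burger.png")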

Pass the cheeseburger to the [ShapEImg2ImgPipeline] to generate a 3D representation of it.

from PIL import Image
from diffusers import ShapEImg2ImgPipeline
from diffusers.utils import export_to_gif

pipe = ShapEImg2ImgPipeline.from_pretrained("openai/shap-e-img2img", torch_dtype=torch.float16, variant="fp16").to("cuda")

guidance_scale = 3.0
image = Image.open("burger.png").resize((256, 256))

images = pipe(
    image,
    guidance_scale=guidance_scale,
    num_inference_steps=64,
    frame_size=256,
).images

gif_path = export_to_gif(images[0], "burger_3d.gif")
[Images: the cheeseburger generated by Kandinsky 2.1 and the 3D cheeseburger generated by Shap-E]

Generate mesh

Shap-E is a flexible model that can also generate textured mesh outputs to be rendered for downstream applications. In this example, you'll convert the output into a glb file because the 🤗 Datasets library supports mesh visualization of glb files, which can be rendered by the Dataset viewer.

You can generate mesh outputs for both the [ShapEPipeline] and [ShapEImg2ImgPipeline] by specifying the output_type parameter as "mesh":

import torch
from diffusers import ShapEPipeline

device = torch.device("cuda" if torch.cuda.is_available() else "cpu")

pipe = ShapEPipeline.from_pretrained("openai/shap-e", torch_dtype=torch.float16, variant="fp16")
pipe = pipe.to(device)

guidance_scale = 15.0
prompt = "A birthday cupcake"

images = pipe(prompt, guidance_scale=guidance_scale, num_inference_steps=64, frame_size=256, output_type="mesh").images

To save the mesh output as a ply file, use the [~utils.export_to_ply] function:

You can optionally save the mesh output as an obj file with the [~utils.export_to_obj] function (a sketch follows the ply example below). Being able to save the mesh output in a variety of formats makes it more flexible for downstream usage!

from diffusers.utils import export_to_ply

ply_path = export_to_ply(images[0], "3d_cake.ply")
print(f"Saved to folder: {ply_path}")

Then you can convert the ply file to a glb file with the trimesh library:

import trimesh

mesh = trimesh.load("3d_cake.ply")
mesh_export = mesh.export("3d_cake.glb", file_type="glb")

By default, the mesh output is focused from the bottom viewpoint, but you can change the default viewpoint by applying a rotation transform:

import trimesh
import numpy as np

mesh = trimesh.load("3d_cake.ply")
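# rotate the mesh -90 degrees around the x-axis to change the default (bottom) viewpoint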
rot = trimesh.transformations.rotation_matrix(-np.pi / 2, [1, 0, 0])
mesh = mesh.apply_transform(rot)
mesh_export = mesh.export("3d_cake.glb", file_type="glb")

Upload the mesh file to your dataset repository to visualize it with the Dataset viewer!
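For example, the glb file can be pushed to a dataset repository with the huggingface_hub library. A minimal sketch, assuming a hypothetical repository named your-username/3d-assets:

from huggingface_hub import HfApi

api = HfApi()
# hypothetical dataset repository, shown only for illustration
api.upload_file(
    path_or_fileobj="3d_cake.glb",
    path_in_repo="3d_cake.glb",
    repo_id="your-username/3d-assets",
    repo_type="dataset",
)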