---
base_model: THUDM/CogVideoX-5b
datasets: finetrainers/cakeify-smol
library_name: diffusers
license: other
license_link: https://huggingface.co/THUDM/CogVideoX-5b/blob/main/LICENSE
instance_prompt: PIKA_CAKEIFY A red tea cup is placed on a wooden surface. Suddenly, a knife appears and slices through the cup, revealing a cake inside. The cake turns into a hyper-realistic prop cake, showcasing the creative transformation of everyday objects into something unexpected and delightful.
widget:
- text: PIKA_CAKEIFY A blue soap is placed on a modern table. Suddenly, a knife appears and slices through the soap, revealing a cake inside. The soap turns into a hyper-realistic prop cake, showcasing the creative transformation of everyday objects into something unexpected and delightful.
  output:
    url: "./assets/output_0.mp4"
- text: PIKA_CAKEIFY On a gleaming glass display stand, a sleek black purse quietly commands attention. Suddenly, a knife appears and slices through the shoe, revealing a fluffy vanilla sponge at its core. Immediately, it turns into a hyper-realistic prop cake, delighting the senses with its playful juxtaposition of the everyday and the extraordinary.
  output:
    url: "./assets/output_1.mp4"
- text: PIKA_CAKEIFY A red tea cup is placed on a wooden surface. Suddenly, a knife appears and slices through the cup, revealing a cake inside. The cake turns into a hyper-realistic prop cake, showcasing the creative transformation of everyday objects into something unexpected and delightful.
  output:
    url: "./assets/output_2.mp4"
tags:
- text-to-video
- diffusers-training
- diffusers
- cogvideox
- cogvideox-diffusers
- template:sd-lora
---

<Gallery />

This is a full fine-tune of the [THUDM/CogVideoX-5b](https://huggingface.co/THUDM/CogVideoX-5b) model on the
[finetrainers/cakeify-smol](https://huggingface.co/datasets/finetrainers/cakeify-smol) dataset. A LoRA variant
of the weights is also provided; see the [LoRA](#lora) section.

Training code: https://github.com/a-r-r-o-w/finetrainers

> [!IMPORTANT]
> This is an experimental checkpoint, and it is known to generalize poorly.

Inference code:

```py
from diffusers import CogVideoXTransformer3DModel, DiffusionPipeline
from diffusers.utils import export_to_video
import torch

# Load the fine-tuned transformer weights from this repository.
transformer = CogVideoXTransformer3DModel.from_pretrained(
    "finetrainers/cakeify-v0", torch_dtype=torch.bfloat16
)
# Build the base pipeline and swap in the fine-tuned transformer.
pipeline = DiffusionPipeline.from_pretrained(
    "THUDM/CogVideoX-5b", transformer=transformer, torch_dtype=torch.bfloat16
).to("cuda")

prompt = """
PIKA_CAKEIFY On a gleaming glass display stand, a sleek black purse quietly commands attention. Suddenly, a knife appears and slices through the shoe, revealing a fluffy vanilla sponge at its core. Immediately, it turns into a hyper-realistic prop cake, delighting the senses with its playful juxtaposition of the everyday and the extraordinary.
"""
negative_prompt = "inconsistent motion, blurry motion, worse quality, degenerate outputs, deformed outputs"

# Generate 81 frames at 768x512 and write them out at 25 fps.
video = pipeline(
    prompt=prompt,
    negative_prompt=negative_prompt,
    num_frames=81,
    height=512,
    width=768,
    num_inference_steps=50,
).frames[0]
export_to_video(video, "output.mp4", fps=25)
```

Training logs are available on WandB [here](https://wandb.ai/diffusion-guidance/finetrainers-cogvideox/runs/q7z660f3/).

## LoRA

We extracted a rank-64 LoRA from the fine-tuned checkpoint (script [here](./create_lora.py)). [This LoRA](./extracted_cakeify_lora_64.safetensors) can be loaded on top of the base model to reproduce a similar effect:

<details>
<summary>Code</summary>

```py
from diffusers import DiffusionPipeline
from diffusers.utils import export_to_video
import torch

# Load the unmodified base pipeline, then apply the extracted LoRA on top of it.
pipeline = DiffusionPipeline.from_pretrained("THUDM/CogVideoX-5b", torch_dtype=torch.bfloat16).to("cuda")
pipeline.load_lora_weights("finetrainers/cakeify-v0", weight_name="extracted_cakeify_lora_64.safetensors")

prompt = """
PIKA_CAKEIFY On a gleaming glass display stand, a sleek black purse quietly commands attention. Suddenly, a knife appears and slices through the shoe, revealing a fluffy vanilla sponge at its core. Immediately, it turns into a hyper-realistic prop cake, delighting the senses with its playful juxtaposition of the everyday and the extraordinary.
"""
negative_prompt = "inconsistent motion, blurry motion, worse quality, degenerate outputs, deformed outputs"

video = pipeline(
    prompt=prompt, 
    negative_prompt=negative_prompt, 
    num_frames=81, 
    height=512,
    width=768,
    num_inference_steps=50
).frames[0]
export_to_video(video, "output_lora.mp4", fps=25)
```
  
</details>
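
For readers curious how a LoRA can be extracted from a full fine-tune, here is a minimal sketch of one common approach: take the difference between the fine-tuned and base weights of a layer and keep its top singular directions via a truncated SVD. This is an illustration only and is not taken from `create_lora.py`; the function name `extract_lora`, the toy matrix sizes, and the choice to split the singular values evenly between the two factors are all assumptions.

```py
import torch


def extract_lora(w_base: torch.Tensor, w_tuned: torch.Tensor, rank: int):
    """Hypothetical helper: return (lora_a, lora_b) with lora_b @ lora_a ~= w_tuned - w_base."""
    # Work in float32 for a numerically stable decomposition.
    delta = (w_tuned - w_base).to(torch.float32)
    u, s, vh = torch.linalg.svd(delta, full_matrices=False)
    # Keep the top-`rank` singular directions and split sqrt(s) between both factors.
    sqrt_s = torch.sqrt(s[:rank])
    lora_b = u[:, :rank] * sqrt_s          # shape: (out_features, rank)
    lora_a = sqrt_s[:, None] * vh[:rank]   # shape: (rank, in_features)
    return lora_a, lora_b


torch.manual_seed(0)
w_base = torch.randn(32, 48)
# Toy "fine-tuned" weight whose update has exactly rank 4 (assumed for the demo).
true_delta = torch.randn(32, 4) @ torch.randn(4, 48)
w_tuned = w_base + true_delta

lora_a, lora_b = extract_lora(w_base, w_tuned, rank=4)
rel_error = torch.linalg.norm(lora_b @ lora_a - true_delta) / torch.linalg.norm(true_delta)
print(f"relative reconstruction error: {rel_error.item():.2e}")
```

In this toy case the update is exactly low-rank, so the reconstruction is nearly exact. The update from a real fine-tune is generally not low-rank, so a rank-64 extraction is an approximation, and the LoRA outputs may differ somewhat from those of the full fine-tune.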

Below is a comparison between the full fine-tune and LoRA outputs (same settings and seed):

<table>
  <thead>
    <tr>
      <th>Full finetune</th>
      <th>LoRA</th>
    </tr>
  </thead>
  <tbody>
    <tr>
      <td>
        <video width="320" height="240" controls>
          <source src="https://huggingface.co/finetrainers/cakeify-v0/resolve/main/comparisons/original_output_0.mp4" type="video/mp4">
          Your browser does not support the video tag.
        </video>
      </td>
      <td>
        <video width="320" height="240" controls>
          <source src="https://huggingface.co/finetrainers/cakeify-v0/resolve/main/comparisons/output_0.mp4" type="video/mp4">
          Your browser does not support the video tag.
        </video>
      </td>
    </tr>
    <tr>
      <td>
        <video width="320" height="240" controls>
          <source src="https://huggingface.co/finetrainers/cakeify-v0/resolve/main/comparisons/original_output_1.mp4" type="video/mp4">
          Your browser does not support the video tag.
        </video>
      </td>
      <td>
        <video width="320" height="240" controls>
          <source src="https://huggingface.co/finetrainers/cakeify-v0/resolve/main/comparisons/output_1.mp4" type="video/mp4">
          Your browser does not support the video tag.
        </video>
      </td>
    </tr>
    <tr>
      <td>
        <video width="320" height="240" controls>
          <source src="https://huggingface.co/finetrainers/cakeify-v0/resolve/main/comparisons/original_output_2.mp4" type="video/mp4">
          Your browser does not support the video tag.
        </video>
      </td>
      <td>
        <video width="320" height="240" controls>
          <source src="https://huggingface.co/finetrainers/cakeify-v0/resolve/main/comparisons/output_2.mp4" type="video/mp4">
          Your browser does not support the video tag.
        </video>
      </td>
    </tr>
  </tbody>
</table>
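
The comparison above relies on both runs starting from the same seed. The snippets in this card do not set one, so here is a minimal sketch of how a seed could be fixed with a `torch.Generator`. The helper name `make_generator` and the seed value `42` are assumptions for illustration, not the values used to produce the videos above.

```py
import torch


def make_generator(seed: int, device: str = "cpu") -> torch.Generator:
    """Hypothetical helper: return a freshly seeded generator."""
    return torch.Generator(device=device).manual_seed(seed)


# Two independent generators with the same seed yield identical noise,
# which is what makes a side-by-side comparison meaningful.
noise_full = torch.randn(2, 4, generator=make_generator(42))
noise_lora = torch.randn(2, 4, generator=make_generator(42))
same_noise = torch.equal(noise_full, noise_lora)

# A different seed gives different noise.
noise_other = torch.randn(2, 4, generator=make_generator(43))
different_noise = not torch.equal(noise_full, noise_other)
```

To apply this to the pipelines in this card, create a fresh generator for each run and pass it as `generator=make_generator(42)` in the `pipeline(...)` call.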