## Potat 1️⃣ 
First Open-Source 1024x576 Text To Video Model 🥳  

### Info
Prototype model <br />
Trained on 1x A100 (40GB) from https://lambdalabs.com ❤ <br />
2197 clips, 68388 frames tagged with [salesforce/blip2-opt-6.7b-coco](https://huggingface.co/Salesforce/blip2-opt-6.7b-coco) <br />
train_steps: 10000 <br />
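
To put the numbers above in perspective, here is a small sketch that derives the average number of tagged frames per clip and the aspect ratio of the 1024x576 output. The constants come straight from this card; the function names are made up for illustration and are not part of any training script.

```python
from fractions import Fraction

# Figures quoted in the Info section of this card.
NUM_CLIPS = 2197
NUM_TAGGED_FRAMES = 68388
WIDTH, HEIGHT = 1024, 576


def average_frames_per_clip(frames: int, clips: int) -> float:
    """Mean number of tagged frames per clip (hypothetical helper)."""
    return frames / clips


def aspect_ratio(width: int, height: int) -> Fraction:
    """Width-to-height ratio reduced to lowest terms (hypothetical helper)."""
    return Fraction(width, height)


avg_frames = average_frames_per_clip(NUM_TAGGED_FRAMES, NUM_CLIPS)
ratio = aspect_ratio(WIDTH, HEIGHT)

print(f"{avg_frames:.2f} tagged frames per clip on average")
print(f"aspect ratio {ratio.numerator}:{ratio.denominator}")
```

This is only arithmetic on the published totals; the actual per-clip frame counts are in the dataset and may vary from clip to clip.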


### Dataset & Config
https://huggingface.co/camenduru/potat1_dataset/tree/main

### Repos
https://github.com/Breakthrough/PySceneDetect <br />
https://github.com/ExponentialML/Video-BLIP2-Preprocessor <br />
https://github.com/ExponentialML/Text-To-Video-Finetuning <br />
https://github.com/camenduru/Text-To-Video-Finetuning-colab <br />

### Base Model
https://huggingface.co/damo-vilab/modelscope-damo-text-to-video-synthesis <br />
https://www.modelscope.cn/models/damo/text-to-video-synthesis <br />

Thanks to ModelScope ❤ ExponentialML ❤ @DiffusersLib ❤ @LambdaAPI ❤ @cerspense ❤ @CiaraRowles1 ❤ @p1atdev_art ❤ <br />

Please try it 🐣 <br />

Potat 2️⃣ is in the oven ♨ <br />