---
title: Video Model Studio
emoji: 🎥
colorFrom: gray
colorTo: gray
sdk: gradio
sdk_version: 5.20.1
app_file: app.py
pinned: true
license: apache-2.0
short_description: All-in-one tool for AI video training
---

# 🎥 Video Model Studio (VMS)

## Presentation

### What is this project?

VMS is a Gradio app that wraps Finetrainers to provide a simple UI for training AI video models on Hugging Face.

You can deploy it to a private space, and start long-running training jobs in the background.

## Funding

VideoModelStudio is a 100% open-source project. I develop and maintain it during both my professional and personal time. If you like it, you can tip! If not, have a good day 🫶

<a href="https://www.buymeacoffee.com/flngr" target="_blank"><img src="https://www.buymeacoffee.com/assets/img/custom_images/orange_img.png" alt="Buy Me A Coffee" style="height: 41px !important;width: 174px !important;box-shadow: 0px 3px 2px 0px rgba(190, 190, 190, 0.5) !important;-webkit-box-shadow: 0px 3px 2px 0px rgba(190, 190, 190, 0.5) !important;" ></a>

## News
- 🔥 **2025-03-12**: VMS now officially supports Wan!
- 🔥 **2025-03-11**: I have added a tab to preview a model!
- 🔥 **2025-03-10**: Various small fixes and improvements
- 🔥 **2025-03-09**: I have added a basic CPU/RAM monitor (no GPU yet)
- 🔥 **2025-03-02**: Made some fixes to improve Finetrainers reliability when working with big datasets
- 🔥 **2025-02-18**: I am working to add better recovery in case of a failed run (this is still in beta)
- 🔥 **2025-02-18**: I have added persistence of UI settings. So if you reload Gradio, you won't lose your settings!

## TODO
- Add `Aya-Vision-8B` for frame analysis (currently we use `Qwen2-VL-7B`)

### See also

#### Internally used project: Finetrainers

VMS uses Finetrainers under the hood: https://github.com/a-r-r-o-w/finetrainers

#### Similar project: diffusion-pipe-ui

I wasn't aware of its existence when I started my project, but there is also this open-source initiative (which is similar in terms of dataset management etc): https://github.com/alisson-anjos/diffusion-pipe-ui

## Features

### Run Finetrainers in the background

The main feature of VMS is the ability to run a Finetrainers training session in the background.

You can start your job, close the web browser tab, and come back the next morning to see the result.

### Automatic scene splitting

VMS uses PySceneDetect to split scenes.
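
To give an idea of what scene splitting looks like, here is a minimal sketch using the public PySceneDetect API (`detect` and `ContentDetector`). It is not the code VMS runs internally: the helper names, the threshold value and the minimum-duration filter are assumptions made for illustration.

```python
def detect_scenes(video_path, threshold=27.0):
    """Return a list of (start_seconds, end_seconds) for each detected scene.

    Requires `pip install scenedetect[opencv]`. The import is done lazily so
    the pure helper below can be used without PySceneDetect installed.
    """
    from scenedetect import detect, ContentDetector

    scene_list = detect(video_path, ContentDetector(threshold=threshold))
    return [(start.get_seconds(), end.get_seconds()) for start, end in scene_list]


def keep_long_scenes(scenes, min_seconds=1.0):
    """Drop scenes shorter than `min_seconds` (hypothetical filter, not a VMS setting)."""
    return [(start, end) for start, end in scenes if end - start >= min_seconds]


if __name__ == "__main__":
    import sys

    # Usage: python split_preview.py my_video.mp4
    if len(sys.argv) > 1:
        for start, end in keep_long_scenes(detect_scenes(sys.argv[1])):
            print(f"scene from {start:.2f}s to {end:.2f}s")
```

Lowering the threshold generally produces more (and shorter) scenes, so a duration filter like the one above can help avoid clips that are too short to caption.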

### Automatic clip captioning

VMS uses `LLaVA-Video-7B-Qwen2` for captioning. You can customize the system prompt if you want to.

### Download your dataset

Not interested in using VMS for training? That's perfectly fine!

You can use VMS for video splitting and captioning, and export the data for training on another platform, e.g. Replicate or Fal.

## Supported models

VMS uses `Finetrainers` under the hood. In theory any model supported by Finetrainers should work in VMS.

In practice, a PR (pull request) will be necessary to adapt the UI a bit to accommodate each model's specifics.


### Wan

I am currently testing Wan LoRA training!

### LTX-Video

I have tested training an LTX-Video LoRA model using videos (not images), on a single A100 instance.
It requires about 18 to 19 GB of VRAM, depending on your settings.

### HunyuanVideo

I have tested training a HunyuanVideo LoRA model using videos (not images), on a single A100 instance.

It requires about 47 to 49 GB of VRAM, depending on your settings.

### CogVideoX

Do you want support for this one? Let me know in the comments!

## Limitations

### No AV1 on A100

If your dataset contains videos encoded using the AV1 codec, you might not be able to decode them (e.g. during scene splitting) if your machine doesn't support hardware decoding.

For instance, the NVIDIA A100 doesn't support hardware AV1 decoding.

It might be possible to convert the videos on the server side, or to use software decoding directly from Python, but I haven't looked into that yet (you can submit a PR if you have an idea).

My recommendation is to make sure your data is encoded in H.264.

You can use FFmpeg to do this, e.g.:

```bash
ffmpeg -i input_video_in_av1.mp4 -vcodec libx264 -acodec aac output_video_in_h264.mp4
```
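
If you have a whole folder of clips, the same conversion can be scripted. The sketch below is only an illustration and is not part of VMS: the function names and the `_h264` output suffix are made up, and it assumes `ffmpeg` and `ffprobe` are available on your `PATH`. It uses `ffprobe` to find files whose first video stream is AV1, then runs the same FFmpeg command as above on each of them.

```python
import subprocess
from pathlib import Path


def probe_video_codec(path):
    """Return the codec name of the first video stream (e.g. "av1", "h264")."""
    result = subprocess.run(
        ["ffprobe", "-v", "error", "-select_streams", "v:0",
         "-show_entries", "stream=codec_name",
         "-of", "default=noprint_wrappers=1:nokey=1", str(path)],
        capture_output=True, text=True, check=True,
    )
    return result.stdout.strip()


def build_h264_command(src, dst):
    """Build the FFmpeg command shown above as an argument list."""
    return ["ffmpeg", "-i", str(src), "-vcodec", "libx264", "-acodec", "aac", str(dst)]


def convert_av1_folder(folder, out_folder):
    """Re-encode every AV1 .mp4 file in `folder` to H.264 inside `out_folder`."""
    out_dir = Path(out_folder)
    out_dir.mkdir(parents=True, exist_ok=True)
    converted = []
    for src in sorted(Path(folder).glob("*.mp4")):
        if probe_video_codec(src) != "av1":
            continue  # already decodable, leave it alone
        dst = out_dir / f"{src.stem}_h264.mp4"
        subprocess.run(build_h264_command(src, dst), check=True)
        converted.append(dst)
    return converted
```

For example, `convert_av1_folder("raw_videos", "converted_videos")` would leave the original files untouched and write the re-encoded copies to a separate folder.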

### One-user-per-space design

Currently VMS supports only one training job at a time, and anybody with access to your Gradio app will be able to upload or delete everything.

This means you have to run VMS in a *PRIVATE* HF Space, or locally if you require full privacy.

## Deployment

VMS is built on top of Finetrainers and Gradio, and designed to run as a Hugging Face Space (but you can deploy it anywhere that has an NVIDIA GPU and supports Docker).

### Full installation at Hugging Face

Easy peasy: create a Space (make sure to use the `Gradio` type/template), and push the repo. No Docker needed!

That said, please see the "Run" section for info about environment variables.

### Dev mode on Hugging Face

Enable dev mode in the Space, then open VS Code (locally or remotely) and run:

```
pip install -r requirements.txt
```

The restart is not automatic, so click on "Restart" in the Space dev mode UI widget afterwards.

### Full installation somewhere else

I haven't tested it, but you can try the provided Dockerfile.

### Prerequisites

About Python:

I haven't tested Python 3.11 or 3.12, but I noticed incompatibilities with Python 3.13 (some dependencies fail to install).

So I recommend installing [pyenv](https://github.com/pyenv/pyenv) to switch between versions of Python.

If you are on macOS, you might already have some versions of Python installed. You can see them by typing:

```bash
% python3.10 --version
Python 3.10.16
% python3.11 --version
Python 3.11.11
% python3.12 --version
Python 3.12.9
% python3.13 --version
Python 3.13.2
```

Once pyenv is installed you can type:

```bash
pyenv install 3.10.16
```

### Full installation in local

The full installation requires:
- Linux
- CUDA 12
- Python 3.10

This is because of flash attention, which is defined in `requirements.txt` using a URL to download a prebuilt wheel expecting this exact configuration (Python bindings for a native library).

```bash
./setup.sh
```
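
Before running the setup script, you may want to check that your machine matches the list above. Here is a small sketch of such a check. It is not shipped with VMS, the function name is hypothetical, and it only covers the operating system and the Python version (it does not try to detect CUDA 12).

```python
import platform
import sys


def find_requirement_problems(system, python_version):
    """Return a list of human-readable problems (empty if everything matches).

    `system` is a value such as platform.system() returns ("Linux", "Darwin", ...)
    and `python_version` is a tuple like sys.version_info.
    """
    problems = []
    if system != "Linux":
        problems.append(f"expected Linux, found {system}")
    if tuple(python_version[:2]) != (3, 10):
        found = f"{python_version[0]}.{python_version[1]}"
        problems.append(f"expected Python 3.10, found {found}")
    return problems


if __name__ == "__main__":
    issues = find_requirement_problems(platform.system(), sys.version_info)
    for issue in issues:
        print(f"warning: {issue}")
    if not issues:
        print("OS and Python version match the full installation requirements")
```

If the check reports a problem, the degraded installation may be a better fit for your machine.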

### Degraded installation in local

If you cannot meet the requirements, you can:

- solution 1: fix requirements.txt to use another prebuilt wheel
- solution 2: manually build/install flash attention
- solution 3: don't use clip captioning

Here is how to do solution 3:
```bash
./setup_no_captions.sh
```

## Run

### Running the Gradio app

Note: please make sure you properly define the environment variables `STORAGE_PATH` (e.g. `/data/`) and `HF_HOME` (e.g. `/data/huggingface/`).

```bash
python3.10 app.py
```
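
As an illustration, the two variables could be resolved and the folders created with a helper like the one below. This is a sketch, not the actual logic in `app.py`: the function names are hypothetical, and the fallback values (`.data` for `STORAGE_PATH`, a `huggingface` subfolder for `HF_HOME`) are assumptions.

```python
import os
from pathlib import Path


def resolve_storage_paths(env):
    """Return (storage_path, hf_home) from a mapping such as os.environ.

    The defaults are assumptions for this example, not documented VMS behavior.
    """
    storage = Path(env.get("STORAGE_PATH", ".data"))
    hf_home = Path(env.get("HF_HOME", str(storage / "huggingface")))
    return storage, hf_home


def ensure_storage_dirs(env):
    """Create both folders if they are missing and return their paths."""
    storage, hf_home = resolve_storage_paths(env)
    storage.mkdir(parents=True, exist_ok=True)
    hf_home.mkdir(parents=True, exist_ok=True)
    return storage, hf_home


if __name__ == "__main__":
    storage, hf_home = resolve_storage_paths(os.environ)
    print(f"STORAGE_PATH -> {storage}")
    print(f"HF_HOME      -> {hf_home}")
```

Running a check like this before launching the app makes it easier to spot a missing or misspelled variable.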

### Running locally

See the remarks above about the environment variables.

By default `run.sh` will store stuff in `.data/` (located inside the current working directory):

```bash
./run.sh
```