jbilcke-hf (HF Staff) committed
Commit 7595521 · 1 Parent(s): 947f205

fix readme

Files changed (2)
  1. README.md +128 -1
  2. README_WIP.md +0 -97
README.md CHANGED
@@ -13,4 +13,131 @@ short_description: All-in-one tool for AI video training
 # 🎥 Video Model Studio (VMS)

- This project is a work in progress, not all features are working yet (there are some issues with the automatic captioning).
## Presentation

### What is this project?

VMS is a Gradio app that wraps around Finetrainers to provide a simple UI for training AI video models on Hugging Face.

You can deploy it to your private space and start long-running training jobs in the background.

### One-user-per-space design

Currently VMS can only support one training job at a time, and anybody with access to your Gradio app will be able to upload or delete everything.

This means you have to run VMS in your own Hugging Face Space, or locally if you require full privacy.

### Similar projects

I wasn't aware of its existence when I started my project, but there is also this open-source initiative: https://github.com/alisson-anjos/diffusion-pipe-ui
## Features

### Run Finetrainers in the background

The main feature of VMS is the ability to run a Finetrainers training session in the background.

You can start your job, close the web browser tab, and come back the next morning to see the result.
### Automatic scene splitting

VMS uses PySceneDetect to split scenes.
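For reference, here is roughly what this step does, expressed with the standalone PySceneDetect CLI (a sketch; VMS drives this internally, and `input.mp4` / `clips/` are placeholder names):

```bash
# detect cuts with content-aware detection, then split the file into one clip per scene
# (split-video requires ffmpeg to be installed)
scenedetect --input input.mp4 --output clips/ detect-content split-video
```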
### Automatic clip captioning

VMS uses `LLaVA-Video-7B-Qwen2` for captioning. You can customize the system prompt if you want to.
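The first captioning run has to download the model weights, which takes a while. If you want, you can pre-fetch them into your `HF_HOME` cache beforehand (a sketch; it assumes the checkpoint is the `lmms-lab/LLaVA-Video-7B-Qwen2` repo on the Hub):

```bash
# pre-download the captioning model into the local Hugging Face cache
huggingface-cli download lmms-lab/LLaVA-Video-7B-Qwen2
```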
### Download your dataset

Not interested in using VMS for training? That's perfectly fine!

You can use VMS for video splitting and captioning, and export the data for training on another platform, e.g. on Replicate or Fal.
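For instance, once exported, you could push the clips and captions to a Hugging Face dataset and pull them from the other platform (a sketch; the repo and folder names are placeholders, and you may need to create the dataset repo first):

```bash
# upload the exported folder to a dataset repo on the Hub
huggingface-cli upload <your-username>/my-video-dataset ./my_exported_dataset/ --repo-type dataset
```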
## Supported models

VMS uses `Finetrainers` under the hood. In theory, any model supported by Finetrainers should work in VMS.

In practice, a PR (pull request) will be necessary to adapt the UI a bit to accommodate each model's specificities.
### LTX-Video

I have tested training a LoRA model using videos, on a single A100 instance.

### HunyuanVideo

I haven't tested it yet, but in theory it should work out of the box.
Please keep in mind that this requires a lot of processing power.

### CogVideoX

Do you want support for this one? Let me know in the comments!
## Deployment

VMS is built on top of Finetrainers and Gradio, and designed to run as a Hugging Face Space (but you can deploy it anywhere that has an NVIDIA GPU and supports Docker).
### Full installation on Hugging Face

Easy peasy: create a Space (make sure to use the `Gradio` type/template), and push the repo. No Docker needed!

That said, please see the "Run" section for info about environment variables.
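Pushing to a Space works like pushing to any other git remote (a sketch; `<your-username>` and `<your-space>` are placeholders for your own values, and the Space must already exist):

```bash
# from your local checkout of the VMS repository
git remote add space https://huggingface.co/spaces/<your-username>/<your-space>
git push space main
```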
### Dev mode on Hugging Face

Enable dev mode in the Space, then open VS Code (locally or remotely) and run:

```bash
pip install -r requirements.txt
```

As this is not automatic, you then need to click "Restart" in the Space's dev mode UI widget.
### Full installation somewhere else

I haven't tested it, but you can try the provided Dockerfile.
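If you go that route, a typical build-and-run sequence might look like this (a sketch, untested like the Dockerfile itself; the image name and host paths are placeholders, and the env variables are described in the "Run" section):

```bash
# build the image from the provided Dockerfile
docker build -t vms .

# run it with GPU access (requires the NVIDIA Container Toolkit),
# persisting data and the Hugging Face cache on the host;
# 7860 is Gradio's default port
docker run --gpus all -p 7860:7860 \
  -v /my/data:/data \
  -e STORAGE_PATH=/data/ \
  -e HF_HOME=/data/huggingface/ \
  vms
```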
### Full local installation

The full installation requires:
- Linux
- CUDA 12
- Python 3.10

This is because of flash attention, which is defined in the `requirements.txt` using a URL to download a prebuilt wheel (Python bindings for a native library).

```bash
./setup.sh
```
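You can quickly check whether your machine matches these prerequisites before running the setup (a sketch; `nvidia-smi` reports the CUDA version supported by your driver):

```bash
python3 --version   # expect 3.10.x
nvidia-smi          # expect a CUDA 12.x driver
```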
### Degraded local installation

If you cannot meet the requirements, you can:

- solution 1: fix `requirements.txt` to use another prebuilt wheel
- solution 2: manually build/install flash attention (see the sketch below)
- solution 3: don't use clip captioning
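For solution 2, the flash attention project documents a source build via pip (a sketch; this can take a long time and requires the CUDA toolkit to be installed):

```bash
# remove the wheel pinned in requirements.txt, then build from source
pip uninstall -y flash-attn
pip install flash-attn --no-build-isolation
```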
Here is how to do solution 3:

```bash
./setup_no_captions.sh
```
## Run

### Running the Gradio app

Note: please make sure you properly define the environment variables for `STORAGE_PATH` (e.g. `/data/`) and `HF_HOME` (e.g. `/data/huggingface/`).

```bash
python app.py
```
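Putting it together, a minimal launch might look like this (a sketch, reusing the example paths above):

```bash
export STORAGE_PATH=/data/
export HF_HOME=/data/huggingface/
python app.py
```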
### Running locally

See the remarks above about the environment variables.

By default `run.sh` will store its files in `.data/` (located inside the current working directory):

```bash
./run.sh
```
README_WIP.md DELETED
@@ -1,97 +0,0 @@
README_WIP.md
---
title: Video Model Studio
emoji: 🎥
colorFrom: gray
colorTo: gray
sdk: gradio
sdk_version: 5.15.0
app_file: app.py
pinned: true
license: apache-2.0
short_description: All-in-one tool for AI video training
---

# 🎥 Video Model Studio (VMS)

## Presentation

VMS is an all-in-one tool to train LoRA models for various open-source AI video models:

- Data collection from various sources
- Splitting videos into short single-camera shots
- Automatic captioning
- Training HunyuanVideo or LTX-Video

## Similar projects

I wasn't aware of it when I started this project, but there is also this: https://github.com/alisson-anjos/diffusion-pipe-ui