jbilcke-hf (HF Staff) committed
Commit 7595521 · 1 Parent(s): 947f205

fix readme

Files changed (2)
  1. README.md +128 -1
  2. README_WIP.md +0 -97
README.md CHANGED
@@ -13,4 +13,131 @@ short_description: All-in-one tool for AI video training
 # 🎥 Video Model Studio (VMS)

- This project is a work in progress, not all features are working yet (there are some issues with the automatic captioning).
## Presentation

### What is this project?

VMS is a Gradio app that wraps around Finetrainers to provide a simple UI for training AI video models on Hugging Face.

You can deploy it to your private space and start long-running training jobs in the background.

### One-user-per-space design

Currently VMS can only support one training job at a time, and anybody with access to your Gradio app will be able to upload or delete everything.

This means you have to run VMS in your own Hugging Face Space, or locally if you require full privacy.

### Similar projects

I wasn't aware of its existence when I started my project, but there is also this open-source initiative: https://github.com/alisson-anjos/diffusion-pipe-ui
## Features

### Run Finetrainers in the background

The main feature of VMS is the ability to run a Finetrainers training session in the background.

You can start your job, close the web browser tab, and come back the next morning to see the result.
### Automatic scene splitting

VMS uses PySceneDetect to split scenes.
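For reference, here is roughly what this step does, expressed with the standalone PySceneDetect CLI (a sketch; VMS drives this internally, and `input.mp4` / `clips/` are placeholder names):

```bash
# detect cuts with content-aware detection, then split the file into one clip per scene
# (split-video requires ffmpeg to be installed)
scenedetect --input input.mp4 --output clips/ detect-content split-video
```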
### Automatic clip captioning

VMS uses `LLaVA-Video-7B-Qwen2` for captioning. You can customize the system prompt if you want to.
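The first captioning run has to download the model weights, which takes a while. If you want, you can pre-fetch them into your `HF_HOME` cache beforehand (a sketch; it assumes the checkpoint is the `lmms-lab/LLaVA-Video-7B-Qwen2` repo on the Hub):

```bash
# pre-download the captioning model into the local Hugging Face cache
huggingface-cli download lmms-lab/LLaVA-Video-7B-Qwen2
```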
### Download your dataset

Not interested in using VMS for training? That's perfectly fine!

You can use VMS for video splitting and captioning, and export the data for training on another platform, e.g. on Replicate or Fal.
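For instance, once exported, you could push the clips and captions to a Hugging Face dataset and pull them from the other platform (a sketch; the repo and folder names are placeholders, and you may need to create the dataset repo first):

```bash
# upload the exported folder to a dataset repo on the Hub
huggingface-cli upload <your-username>/my-video-dataset ./my_exported_dataset/ --repo-type dataset
```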
## Supported models

VMS uses `Finetrainers` under the hood. In theory, any model supported by Finetrainers should work in VMS.

In practice, a PR (pull request) will be necessary to adapt the UI a bit to accommodate each model's specificities.
### LTX-Video

I have tested training a LoRA model using videos, on a single A100 instance.

### HunyuanVideo

I haven't tested it yet, but in theory it should work out of the box.
Please keep in mind that this requires a lot of processing power.

### CogVideoX

Do you want support for this one? Let me know in the comments!
## Deployment

VMS is built on top of Finetrainers and Gradio, and designed to run as a Hugging Face Space (but you can deploy it anywhere that has an NVIDIA GPU and supports Docker).
### Full installation on Hugging Face

Easy peasy: create a Space (make sure to use the `Gradio` type/template), and push the repo. No Docker needed!

That said, please see the "Run" section for info about environment variables.
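Pushing to a Space works like pushing to any other git remote (a sketch; `<your-username>` and `<your-space>` are placeholders for your own values, and the Space must already exist):

```bash
# from your local checkout of the VMS repository
git remote add space https://huggingface.co/spaces/<your-username>/<your-space>
git push space main
```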
### Dev mode on Hugging Face

Enable dev mode in the Space, then open VS Code (locally or remotely) and run:

```bash
pip install -r requirements.txt
```

As this is not automatic, you then need to click "Restart" in the Space's dev mode UI widget.
### Full installation somewhere else

I haven't tested it, but you can try the provided Dockerfile.
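If you go that route, a typical build-and-run sequence might look like this (a sketch, untested like the Dockerfile itself; the image name and host paths are placeholders, and the env variables are described in the "Run" section):

```bash
# build the image from the provided Dockerfile
docker build -t vms .

# run it with GPU access (requires the NVIDIA Container Toolkit),
# persisting data and the Hugging Face cache on the host;
# 7860 is Gradio's default port
docker run --gpus all -p 7860:7860 \
  -v /my/data:/data \
  -e STORAGE_PATH=/data/ \
  -e HF_HOME=/data/huggingface/ \
  vms
```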
### Full local installation

The full installation requires:
- Linux
- CUDA 12
- Python 3.10

This is because of flash attention, which is defined in the `requirements.txt` using a URL to download a prebuilt wheel (Python bindings for a native library).

```bash
./setup.sh
```
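You can quickly check whether your machine matches these prerequisites before running the setup (a sketch; `nvidia-smi` reports the CUDA version supported by your driver):

```bash
python3 --version   # expect 3.10.x
nvidia-smi          # expect a CUDA 12.x driver
```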
### Degraded local installation

If you cannot meet the requirements, you can:

- solution 1: fix `requirements.txt` to use another prebuilt wheel
- solution 2: manually build/install flash attention (see the sketch below)
- solution 3: don't use clip captioning
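For solution 2, the flash attention project documents a source build via pip (a sketch; this can take a long time and requires the CUDA toolkit to be installed):

```bash
# remove the wheel pinned in requirements.txt, then build from source
pip uninstall -y flash-attn
pip install flash-attn --no-build-isolation
```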
Here is how to do solution 3:

```bash
./setup_no_captions.sh
```
## Run

### Running the Gradio app

Note: please make sure you properly define the environment variables for `STORAGE_PATH` (e.g. `/data/`) and `HF_HOME` (e.g. `/data/huggingface/`).

```bash
python app.py
```
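Putting it together, a minimal launch might look like this (a sketch, reusing the example paths above):

```bash
export STORAGE_PATH=/data/
export HF_HOME=/data/huggingface/
python app.py
```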
### Running locally

See the remarks above about the environment variables.

By default `run.sh` will store its files in `.data/` (located inside the current working directory):

```bash
./run.sh
```
README_WIP.md DELETED
@@ -1,97 +0,0 @@
README_WIP.md
---
title: Video Model Studio
emoji: 🎥
colorFrom: gray
colorTo: gray
sdk: gradio
sdk_version: 5.15.0
app_file: app.py
pinned: true
license: apache-2.0
short_description: All-in-one tool for AI video training
---

# 🎥 Video Model Studio (VMS)

## Presentation

VMS is an all-in-one tool to train LoRA models for various open-source AI video models:

- Data collection from various sources
- Splitting videos into short single-camera shots
- Automatic captioning
- Training HunyuanVideo or LTX-Video

## Similar projects

I wasn't aware of it when I started this project, but there is also this: https://github.com/alisson-anjos/diffusion-pipe-ui