We highly recommend proceeding with the VADER-VideoCrafter model first, which performs better than the other two.
⚙️ Installation
Assuming you are in the VADER/ directory, you can create a Conda environment for VADER-VideoCrafter using the following commands:
cd VADER-VideoCrafter
conda create -n vader_videocrafter python=3.10
conda activate vader_videocrafter
conda install pytorch==2.3.0 torchvision==0.18.0 torchaudio==2.3.0 pytorch-cuda=12.1 -c pytorch -c nvidia
conda install xformers -c xformers
pip install -r requirements.txt
git clone https://github.com/tgxs002/HPSv2.git
cd HPSv2/
pip install -e .
cd ..
- We use the pretrained Text-to-Video VideoCrafter2 model from Hugging Face. If the model is not downloaded automatically when you run the inference or training script, you can download it manually and place model.ckpt at VADER/VADER-VideoCrafter/checkpoints/base_512_v2/model.ckpt.
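Before launching a script, it can help to confirm that the checkpoint sits where the note above says it should. The following is a minimal sketch, not part of the VADER codebase: the helper name find_checkpoint is hypothetical, and the only path it assumes is the one stated above, resolved relative to the VADER/ directory.

```python
from pathlib import Path

# Location named in this README, relative to the VADER/ directory.
CKPT_RELATIVE = Path("VADER-VideoCrafter/checkpoints/base_512_v2/model.ckpt")


def find_checkpoint(vader_root="."):
    """Return the checkpoint path if the file exists under vader_root, else None."""
    candidate = Path(vader_root) / CKPT_RELATIVE
    return candidate if candidate.is_file() else None


if __name__ == "__main__":
    found = find_checkpoint(".")
    if found is None:
        print(f"model.ckpt not found; expected it at {CKPT_RELATIVE}")
    else:
        print(f"Found checkpoint: {found} ({found.stat().st_size} bytes)")
```

Run it from the VADER/ directory; a "not found" message suggests the automatic download did not happen and the manual download is needed.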
📺 Inference
As the first step, please run the accelerate config command to configure the accelerator settings. If you are not familiar with the accelerator configuration, you can refer to the VADER-VideoCrafter documentation.
Assuming you are in the VADER/ directory, you can run inference using the following commands:
cd VADER-VideoCrafter
sh scripts/run_text2video_inference.sh
- We have tested on PyTorch 2.3.0 and CUDA 12.1. The inference script runs on a single GPU with 16 GB of VRAM when val_batch_size=1 is set and fp16 mixed precision is used. It should also work with recent PyTorch and CUDA versions.
- VADER/VADER-VideoCrafter/scripts/main/train_t2v_lora.py is the script for running inference on VideoCrafter2 with VADER via LoRA.
  - Most of the arguments are the same as for training. The main difference is that --inference_only should be set to True.
  - Set --lora_ckpt_path to the path of the pretrained LoRA model. Otherwise, the original VideoCrafter model will be used for inference.
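To make the inference-specific arguments concrete, here is a hedged sketch that only assembles a command line and prints it; it does not run anything. The flags --inference_only and --lora_ckpt_path come from the notes above, and --mixed_precision is an option of accelerate launch. The --val_batch_size spelling, the helper name build_inference_command, and the placeholder checkpoint path are assumptions. The shipped scripts/run_text2video_inference.sh remains the authoritative reference and likely passes further arguments that are omitted here.

```python
import shlex


def build_inference_command(lora_ckpt_path, val_batch_size=1, mixed_precision="fp16"):
    """Assemble an argv list for LoRA inference; nothing is executed here."""
    return [
        "accelerate", "launch",
        "--mixed_precision", mixed_precision,      # option of `accelerate launch`
        "scripts/main/train_t2v_lora.py",
        "--inference_only", "True",                # flag named in this README
        "--lora_ckpt_path", str(lora_ckpt_path),   # flag named in this README
        "--val_batch_size", str(val_batch_size),   # assumed flag spelling
    ]


if __name__ == "__main__":
    # "path/to/lora.pt" is a placeholder, not a file shipped with the repository.
    print(shlex.join(build_inference_command("path/to/lora.pt")))
```

Leaving out --lora_ckpt_path would, per the note above, fall back to the original VideoCrafter model.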
🔧 Training
As the first step, please run the accelerate config command to configure the accelerator settings. If you are not familiar with the accelerator configuration, you can refer to the VADER-VideoCrafter documentation.
Assuming you are in the VADER/ directory, you can train the model using the following commands:
cd VADER-VideoCrafter
sh scripts/run_text2video_train.sh
- Our experiments were conducted on PyTorch 2.3.0 and CUDA 12.1 using four A6000 GPUs (48 GB of GPU memory each). It should also work with recent PyTorch and CUDA versions. The training script has been tested on a single GPU with 16 GB of VRAM when train_batch_size=1 and val_batch_size=1 are set and fp16 mixed precision is used.
- VADER/VADER-VideoCrafter/scripts/main/train_t2v_lora.py is also the script for fine-tuning VideoCrafter2 with VADER via LoRA.
  - You can read the VADER-VideoCrafter documentation to understand the usage of the arguments.
💡 Tutorial
This section is a tutorial on implementing the VADER method on VideoCrafter yourself. It walks through the implementation details step by step so that you can easily adapt VADER to later versions of VideoCrafter. The tutorial is based on VideoCrafter2.
Step 1: Install the dependencies
First, install the dependencies required by VideoCrafter, either with the commands below or by following the instructions in the VideoCrafter repository.
conda create -n vader_videocrafter python=3.8.5
conda activate vader_videocrafter
pip install -r requirements.txt
You also need to download the pretrained Text-to-Video VideoCrafter2 model from Hugging Face and place model.ckpt in the downloaded VideoCrafter directory as VideoCrafter/checkpoints/base_512_v2/model.ckpt.
VADER needs several extra dependencies. You can install them by running the following commands.
# Install the HPS
git clone https://github.com/tgxs002/HPSv2.git
cd HPSv2/
pip install -e .
cd ..
# Install the dependencies
pip install albumentations \
peft \
bitsandbytes \
accelerate \
inflect \
wandb \
ipdb \
pytorch_lightning
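After installing, a quick check can confirm that the extra packages are visible to the active environment. This is a minimal sketch using only the standard library. It assumes that each import name equals the pip package name listed above and that the HPSv2 clone installs a module named hpsv2; the helper name missing_modules is hypothetical.

```python
import importlib.util

# Assumed import names: identical to the pip names above, plus "hpsv2" for HPSv2.
VADER_EXTRA_MODULES = [
    "hpsv2", "albumentations", "peft", "bitsandbytes", "accelerate",
    "inflect", "wandb", "ipdb", "pytorch_lightning",
]


def missing_modules(names):
    """Return the top-level module names that cannot be located, without importing them."""
    return [name for name in names if importlib.util.find_spec(name) is None]


if __name__ == "__main__":
    missing = missing_modules(VADER_EXTRA_MODULES)
    if missing:
        print("Missing: " + ", ".join(missing))
    else:
        print("All extra dependencies found.")
```

Because find_spec only locates modules, the check stays fast and does not initialize any GPU libraries.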
Step 2: Transfer VADER scripts
Copy our VADER/VADER-VideoCrafter/scripts/main/train_t2v_lora.py to the VideoCrafter/scripts/evaluation/ directory of VideoCrafter. It is best to copy our run_text2video_train.sh and run_text2video_inference.sh to the VideoCrafter/scripts/ directory as well. Then copy VADER/Core/ and VADER/assets/, with all their files, to the parent directory of VideoCrafter, so that Core/, assets/, and VideoCrafter/ are in the same directory. You should now have a directory structure like:
.
├── Core
│   ├── ...
├── VideoCrafter
│   ├── scripts
│   │   ├── evaluation
│   │   │   ├── train_t2v_lora.py
│   │   ├── run_text2video_train.sh
│   │   ├── run_text2video_inference.sh
│   ├── checkpoints
│   │   ├── base_512_v2
│   │   │   ├── model.ckpt
├── assets
│   ├── ...
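The layout above can be verified with a short script before moving on. This is a sketch under the assumption that it is run from the directory that contains Core/, assets/, and VideoCrafter/; the helper name missing_paths is hypothetical, and the list holds only entries that appear in the tree.

```python
from pathlib import Path

# Entries taken from the directory tree above, relative to the common parent directory.
EXPECTED_PATHS = [
    "Core",
    "assets",
    "VideoCrafter/scripts/evaluation/train_t2v_lora.py",
    "VideoCrafter/scripts/run_text2video_train.sh",
    "VideoCrafter/scripts/run_text2video_inference.sh",
    "VideoCrafter/checkpoints/base_512_v2/model.ckpt",
]


def missing_paths(root="."):
    """Return the expected entries that do not exist under root."""
    base = Path(root)
    return [rel for rel in EXPECTED_PATHS if not (base / rel).exists()]


if __name__ == "__main__":
    missing = missing_paths(".")
    if missing:
        print("Missing entries:")
        for rel in missing:
            print("  " + rel)
    else:
        print("Directory layout matches the expected structure.")
```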
Step 3: Modify the VideoCrafter code
You need to modify the VideoCrafter code to support the VADER method, as follows.
- Modify the batch_ddim_sampling() function in VideoCrafter/scripts/evaluation/funcs.py to match our implementation in VADER/VADER-VideoCrafter/scripts/main/funcs.py.
- Modify the DDIMSampler.__init__(), DDIMSampler.sample(), and DDIMSampler.ddim_sampling() functions in VideoCrafter/lvdm/models/samplers/ddim.py to match our implementation in VADER/VADER-VideoCrafter/lvdm/models/samplers/ddim.py.
- Comment out the @torch.no_grad() decorator before DDIMSampler.sample(), DDIMSampler.ddim_sampling(), and DDIMSampler.p_sample_ddim() in VideoCrafter/lvdm/models/samplers/ddim.py. Also comment out the @torch.no_grad() decorator before LatentDiffusion.decode_first_stage_2DAE() in VideoCrafter/lvdm/models/ddpm3d.py.
- Because these @torch.no_grad() decorators are now commented out, you can add with torch.no_grad(): blocks where needed in VideoCrafter/scripts/evaluation/inference.py to avoid unnecessary gradient computation.
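Presumably the decorators are disabled so that gradients can flow through sampling and decoding during fine-tuning. A leftover decorator is easy to miss, so the sketch below scans Python source for the functions named in this step and reports any that still carry an active no_grad decorator. It is a hypothetical helper, not part of VADER or VideoCrafter. It matches functions by name only, so an unrelated function called sample in the same file would also be reported.

```python
import ast

# Function names from Step 3 whose @torch.no_grad() should be commented out.
TARGETS = {"sample", "ddim_sampling", "p_sample_ddim", "decode_first_stage_2DAE"}


def functions_still_no_grad(source, targets=TARGETS):
    """Return sorted names of target functions still decorated with no_grad."""
    flagged = []
    for node in ast.walk(ast.parse(source)):
        if isinstance(node, (ast.FunctionDef, ast.AsyncFunctionDef)) and node.name in targets:
            for dec in node.decorator_list:
                # Handle both "@torch.no_grad()" (a call) and "@torch.no_grad".
                func = dec.func if isinstance(dec, ast.Call) else dec
                if isinstance(func, ast.Attribute) and func.attr == "no_grad":
                    flagged.append(node.name)
                    break
    return sorted(flagged)


if __name__ == "__main__":
    example = (
        "import torch\n"
        "class DDIMSampler:\n"
        "    @torch.no_grad()\n"
        "    def sample(self): pass\n"
        "    # @torch.no_grad()\n"
        "    def ddim_sampling(self): pass\n"
    )
    print(functions_still_no_grad(example))  # prints ['sample']
```

To check the real files, read VideoCrafter/lvdm/models/samplers/ddim.py and VideoCrafter/lvdm/models/ddpm3d.py as text and pass their contents to the function; an empty list means no active decorator remains on the listed functions. The source is only parsed, never imported, so torch does not need to be installed for the check.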
Step 4: Ready to Train
Now that all the files are in the right place and the VideoCrafter source code is modified, you can start training or inference with the following commands.
cd VideoCrafter
# training
sh scripts/run_text2video_train.sh
# or inference
sh scripts/run_text2video_inference.sh
Acknowledgement
Our codebase is directly built on top of VideoCrafter, Open-Sora, and Animate Anything. We would like to thank the authors for open-sourcing their code.
Citation
If you find this work useful in your research, please cite: