<div align="center">
<!-- TITLE -->
# **VADER-VideoCrafter**
</div>
We **highly recommend** proceeding with the VADER-VideoCrafter model first, which performs better than the other two.
## ⚙️ Installation
Assuming you are in the `VADER/` directory, you can create a Conda environment for VADER-VideoCrafter with the following commands:
```bash
cd VADER-VideoCrafter
conda create -n vader_videocrafter python=3.10
conda activate vader_videocrafter
conda install pytorch==2.3.0 torchvision==0.18.0 torchaudio==2.3.0 pytorch-cuda=12.1 -c pytorch -c nvidia
conda install xformers -c xformers
pip install -r requirements.txt
git clone https://github.com/tgxs002/HPSv2.git
cd HPSv2/
pip install -e .
cd ..
```
- We use the pretrained text-to-video [VideoCrafter2](https://huggingface.co/VideoCrafter/VideoCrafter2/blob/main/model.ckpt) model from Hugging Face. If the model is not downloaded automatically when you run the inference or training script, download it manually and place it at `VADER/VADER-VideoCrafter/checkpoints/base_512_v2/model.ckpt`.
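
If the automatic download fails, the manual step could look like the sketch below. It assumes `wget` is installed and that Hugging Face serves the raw file at the `resolve/` counterpart of the `blob/` link above. `fetch_videocrafter2_ckpt` is a hypothetical helper name, not part of VADER.

```bash
# Hypothetical helper (not part of VADER): fetch model.ckpt into a target directory.
# Assumption: the raw file is served at the "resolve" variant of the "blob" link.
fetch_videocrafter2_ckpt() {
    ckpt_dir="${1:-VADER-VideoCrafter/checkpoints/base_512_v2}"
    ckpt_url="https://huggingface.co/VideoCrafter/VideoCrafter2/resolve/main/model.ckpt"

    # Skip the download if the checkpoint is already in place.
    if [ -f "$ckpt_dir/model.ckpt" ]; then
        echo "Found $ckpt_dir/model.ckpt, nothing to do."
        return 0
    fi

    mkdir -p "$ckpt_dir"
    # Download to a temporary name so an interrupted transfer is never
    # mistaken for a complete checkpoint.
    wget -O "$ckpt_dir/model.ckpt.part" "$ckpt_url" \
        && mv "$ckpt_dir/model.ckpt.part" "$ckpt_dir/model.ckpt"
}
```

Called without arguments from the `VADER/` directory, the function targets the default checkpoint path given above. Pass a different directory as the first argument if your layout differs.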
## 📺 Inference
Run `accelerate config` first to configure the accelerator settings. If you are unfamiliar with accelerator configuration, refer to the VADER-VideoCrafter [documentation](../documentation/VADER-VideoCrafter.md).
Assuming you are in the `VADER/` directory, you can run inference with the following commands:
```bash
cd VADER-VideoCrafter
sh scripts/run_text2video_inference.sh
```
- We have tested on PyTorch 2.3.0 and CUDA 12.1. The inference script runs on a single GPU with 16 GB of VRAM when `val_batch_size=1` is set and `fp16` mixed precision is used. It should also work with recent PyTorch and CUDA versions.
- `VADER/VADER-VideoCrafter/scripts/main/train_t2v_lora.py` is the script for running VideoCrafter2 inference with VADER via LoRA.
- Most arguments are the same as for training. The main difference is that `--inference_only` should be set to `True`.
- `--lora_ckpt_path` must be set to the path of the pretrained LoRA model. Otherwise, the original VideoCrafter model is used for inference.
## 🔧 Training
Run `accelerate config` first to configure the accelerator settings. If you are unfamiliar with accelerator configuration, refer to the VADER-VideoCrafter [documentation](../documentation/VADER-VideoCrafter.md).
Assuming you are in the `VADER/` directory, you can train the model with the following commands:
```bash
cd VADER-VideoCrafter
sh scripts/run_text2video_train.sh
```
- Our experiments were conducted with PyTorch 2.3.0 and CUDA 12.1 on four A6000 GPUs (48 GB each). Training should also work with recent PyTorch and CUDA versions. The training script has been tested on a single GPU with 16 GB of VRAM when `train_batch_size=1 val_batch_size=1` is set and `fp16` mixed precision is used.
- `VADER/VADER-VideoCrafter/scripts/main/train_t2v_lora.py` is also the script for fine-tuning VideoCrafter2 with VADER via LoRA.
- You can read the VADER-VideoCrafter [documentation](../documentation/VADER-VideoCrafter.md) to understand the usage of arguments.
## 💡 Tutorial
This section is a step-by-step tutorial on implementing the VADER method on VideoCrafter yourself. It walks through the implementation details so that you can adapt VADER to later versions of VideoCrafter. The tutorial is based on VideoCrafter2.
### Step 1: Install the dependencies
First, install the dependencies by following the instructions in the [VideoCrafter](https://github.com/AILab-CVC/VideoCrafter) repository, for example:
```bash
conda create -n vader_videocrafter python=3.8.5
conda activate vader_videocrafter
pip install -r requirements.txt
```
You have to download the pretrained text-to-video [VideoCrafter2](https://huggingface.co/VideoCrafter/VideoCrafter2/blob/main/model.ckpt) model from Hugging Face and place `model.ckpt` in the downloaded VideoCrafter directory as `VideoCrafter/checkpoints/base_512_v2/model.ckpt`.
VADER needs a few extra dependencies. You can install them by running the following commands.
```bash
# Install the HPS
git clone https://github.com/tgxs002/HPSv2.git
cd HPSv2/
pip install -e .
cd ..
# Install the dependencies
pip install albumentations \
peft \
bitsandbytes \
accelerate \
inflect \
wandb \
ipdb \
pytorch_lightning
```
### Step 2: Transfer VADER scripts
Copy our `VADER/VADER-VideoCrafter/scripts/main/train_t2v_lora.py` to the `VideoCrafter/scripts/evaluation/` directory of VideoCrafter. We recommend also copying our `run_text2video_train.sh` and `run_text2video_inference.sh` to the `VideoCrafter/scripts/` directory. Then copy `VADER/Core/` and `VADER/assets/`, with all their files, to the parent directory of VideoCrafter, so that `Core/`, `assets/`, and `VideoCrafter/` sit in the same directory. You should now have a directory structure like:
```bash
.
├── Core
│   ├── ...
├── VideoCrafter
│   ├── scripts
│   │   ├── evaluation
│   │   │   ├── train_t2v_lora.py
│   │   ├── run_text2video_train.sh
│   │   ├── run_text2video_inference.sh
│   ├── checkpoints
│   │   ├── base_512_v2
│   │   │   ├── model.ckpt
├── assets
│   ├── ...
```
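
The copy steps described above can be sketched as a small shell function. This is a sketch under assumptions: `transfer_vader_files` is a hypothetical helper, the two launcher scripts are assumed to live in `VADER/VADER-VideoCrafter/scripts/` (where the Inference and Training sections invoke them), and `Core/` and `assets/` are copied next to the VideoCrafter checkout.

```bash
# Hypothetical helper (not part of VADER): copy the VADER files into place.
# $1 = path to the VADER checkout, $2 = path to the VideoCrafter checkout.
transfer_vader_files() {
    vader_dir="$1"
    vc_dir="$2"
    src="$vader_dir/VADER-VideoCrafter/scripts"

    # Training/inference entry point.
    cp "$src/main/train_t2v_lora.py" "$vc_dir/scripts/evaluation/" || return 1

    # Launcher scripts (assumed to sit directly under scripts/ in VADER).
    cp "$src/run_text2video_train.sh" "$src/run_text2video_inference.sh" \
        "$vc_dir/scripts/" || return 1

    # Core/ and assets/ go next to VideoCrafter/, not inside it.
    cp -r "$vader_dir/Core" "$vader_dir/assets" "$vc_dir/.." || return 1
}

# Example call, assuming both checkouts sit in the current directory:
# transfer_vader_files VADER VideoCrafter
```

The function stops at the first failed copy, so a missing source file is reported by `cp` instead of leaving a silently incomplete layout.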
### Step 3: Modify the VideoCrafter code
You need to modify the VideoCrafter code to support the VADER method. Follow the instructions below.
- Modify the `batch_ddim_sampling()` function in `VideoCrafter/scripts/evaluation/funcs.py` to match our implementation in `VADER/VADER-VideoCrafter/scripts/main/funcs.py`.
- Modify the `DDIMSampler.__init__()`, `DDIMSampler.sample()`, and `DDIMSampler.ddim_sampling()` functions in `VideoCrafter/lvdm/models/samplers/ddim.py` to match our implementation in `VADER/VADER-VideoCrafter/lvdm/models/samplers/ddim.py`.
- Comment out the `@torch.no_grad()` decorator before `DDIMSampler.sample()`, `DDIMSampler.ddim_sampling()`, and `DDIMSampler.p_sample_ddim()` in `VideoCrafter/lvdm/models/samplers/ddim.py`, so that gradients can flow through sampling. Also comment out the `@torch.no_grad()` before `LatentDiffusion.decode_first_stage_2DAE()` in `VideoCrafter/lvdm/models/ddpm3d.py`.
- Because these `@torch.no_grad()` decorators are now commented out, you can add `with torch.no_grad():` where appropriate in `VideoCrafter/scripts/evaluation/inference.py` to avoid unnecessary gradient computation.
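
To see what these modifications involve before editing by hand, it may help to diff the stock VideoCrafter files against the VADER versions and list where `@torch.no_grad()` still appears. The helper below is only a sketch: `show_vader_changes` is a hypothetical name, and it relies solely on the file paths named in the list above.

```bash
# Hypothetical helper (not part of VADER): preview the Step 3 modifications.
# $1 = path to the VADER checkout, $2 = path to the VideoCrafter checkout.
show_vader_changes() {
    vader_vc="$1/VADER-VideoCrafter"
    vc_dir="$2"

    # diff exits non-zero when the files differ, which is expected here.
    diff -u "$vc_dir/scripts/evaluation/funcs.py" \
        "$vader_vc/scripts/main/funcs.py" || true
    diff -u "$vc_dir/lvdm/models/samplers/ddim.py" \
        "$vader_vc/lvdm/models/samplers/ddim.py" || true

    # List every @torch.no_grad() occurrence plus the line that follows it,
    # so you can check which decorators still need to be commented out.
    grep -nF -A1 "@torch.no_grad()" \
        "$vc_dir/lvdm/models/samplers/ddim.py" \
        "$vc_dir/lvdm/models/ddpm3d.py" || true
}

# Example call, assuming both checkouts sit in the current directory:
# show_vader_changes VADER VideoCrafter
```

Decorators that are already commented out also show up in the `grep` output (prefixed with `#`), which makes it easy to confirm that all four have been handled.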
### Step 4: Ready to Train
With all the files in place and the VideoCrafter source code modified, you can run training or inference with the following commands.
```bash
cd VideoCrafter
# training
sh scripts/run_text2video_train.sh
# or inference
sh scripts/run_text2video_inference.sh
```
## Acknowledgement
Our codebase is directly built on top of [VideoCrafter](https://github.com/AILab-CVC/VideoCrafter), [Open-Sora](https://github.com/hpcaitech/Open-Sora), and [Animate Anything](https://github.com/alibaba/animate-anything/). We would like to thank the authors for open-sourcing their code.
## Citation
If you find this work useful in your research, please cite:
```bibtex
```