Alibaba-Research-Intelligence-Computing/Tora

@@ -1,16 +1,18 @@
 ---
-license: other
-language:
-- en
 base_model:
 - THUDM/CogVideoX-5b
 tags:
 - video
 - video-generation
 - cogvideox
 - alibaba
-pipeline_tag: text-to-video
 ---
 <div align="center">
 <img src="icon.jpg" width="250"/>
@@ -56,6 +58,21 @@ Recent advancements in Diffusion Transformer (DiT) have demonstrated remarkable
 - `2024/08/27` We released our v2 paper including appendix.
 - `2024/07/31` We submitted our paper on arXiv and released our project page.
 ## 🎞️ Showcases
 https://github.com/user-attachments/assets/949d5e99-18c9-49d6-b669-9003ccd44bf1
@@ -66,6 +83,79 @@ https://github.com/user-attachments/assets/4026c23d-229d-45d7-b5be-6f3eb9e4fd50
 All videos are available in this [Link](https://cloudbook-public-daily.oss-cn-hangzhou.aliyuncs.com/Tora_t2v/showcases.zip)
 ## 🤝 Acknowledgements
 We would like to express our gratitude to the following open-source projects that have been instrumental in the development of our project:

 ---
 base_model:
 - THUDM/CogVideoX-5b
+language:
+- en
+license: other
+pipeline_tag: text-to-video
 tags:
 - video
 - video-generation
 - cogvideox
 - alibaba
+library_name: pytorch
 ---
 <div align="center">
 <img src="icon.jpg" width="250"/>
 - `2024/08/27` We released our v2 paper including appendix.
 - `2024/07/31` We submitted our paper on arXiv and released our project page.
+## 📑 Table of Contents
+- [🎞️ Showcases](#%EF%B8%8F-showcases)
+- [✅ TODO List](#-todo-list)
+- [🧨 Diffusers version](#-diffusers-verision)
+- [🐍 Installation](#-installation)
+- [📦 Model Weights](#-model-weights)
+- [🔄 Inference](#-inference)
+- [🖥️ Gradio Demo](#%EF%B8%8F-gradio-demo)
+- [🧠 Training](#-training)
+- [🎯 Troubleshooting](#-troubleshooting)
+- [🤝 Acknowledgements](#-acknowledgements)
+- [📄 Our previous work](#-our-previous-work)
+- [📚 Citation](#-citation)
 ## 🎞️ Showcases
 https://github.com/user-attachments/assets/949d5e99-18c9-49d6-b669-9003ccd44bf1
 All videos are available in this [Link](https://cloudbook-public-daily.oss-cn-hangzhou.aliyuncs.com/Tora_t2v/showcases.zip)
+## ✅ TODO List
+- [x] Release our inference code and model weights
+- [x] Provide a ModelScope Demo
+- [x] Release our training code
+- [x] Release the diffusers version and optimize GPU memory usage
+- [x] Release the complete version of Tora
+## 📦 Model Weights
+### Folder Structure
+```
+Tora
+└── sat
+    └── ckpts
+        ├── t5-v1_1-xxl
+        │   ├── model-00001-of-00002.safetensors
+        │   └── ...
+        ├── vae
+        │   └── 3d-vae.pt
+        ├── tora
+        │   ├── i2v
+        │   │   └── mp_rank_00_model_states.pt
+        │   └── t2v
+        │       └── mp_rank_00_model_states.pt
+        └── CogVideoX-5b-sat # for training stage 1
+            └── mp_rank_00_model_states.pt
+```
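The layout above can be checked before running inference. The following is a minimal sketch, not part of the Tora codebase: the helper name `missing_checkpoints` and the split into required and training-only files are assumptions made for illustration, and the contents of `t5-v1_1-xxl` are elided in the tree, so only the directory itself is checked.

```python
from pathlib import Path

# Relative paths copied from the folder structure above (relative to Tora/sat/ckpts).
EXPECTED_FILES = [
    "vae/3d-vae.pt",
    "tora/i2v/mp_rank_00_model_states.pt",
    "tora/t2v/mp_rank_00_model_states.pt",
]
# The tree elides the T5 shard list ("..."), so only the directory is checked.
EXPECTED_DIRS = ["t5-v1_1-xxl"]
# Marked "for training stage 1" in the tree; assumed optional for inference.
TRAINING_FILES = ["CogVideoX-5b-sat/mp_rank_00_model_states.pt"]


def missing_checkpoints(ckpt_root, include_training=False):
    """Return the expected relative paths that do not exist under ckpt_root."""
    root = Path(ckpt_root)
    files = EXPECTED_FILES + (TRAINING_FILES if include_training else [])
    missing = [rel for rel in files if not (root / rel).is_file()]
    missing += [rel for rel in EXPECTED_DIRS if not (root / rel).is_dir()]
    return missing
```

Calling `missing_checkpoints("Tora/sat/ckpts")` returns an empty list when every expected path is present. If you only need one of the `i2v` or `t2v` checkpoints, trim `EXPECTED_FILES` accordingly.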
+### Download Links
+*Note: Downloading the `tora` weights requires compliance with the [CogVideoX License](CogVideoX_LICENSE).* The weights are available from HuggingFace, ModelScope, or native links; choose any one.\
+After downloading, place the model weights in the `Tora/sat/ckpts` folder.
+#### HuggingFace
+```bash
+# Optional: hf_transfer can speed up large downloads
+pip install "huggingface_hub[hf_transfer]"
+HF_HUB_ENABLE_HF_TRANSFER=1 huggingface-cli download Alibaba-Research-Intelligence-Computing/Tora --local-dir ckpts
+```
+or
+```bash
+# use git
+git lfs install
+git clone https://huggingface.co/Alibaba-Research-Intelligence-Computing/Tora
+```
+#### ModelScope
+- SDK
+```python
+from modelscope import snapshot_download
+model_dir = snapshot_download('xiaoche/Tora')
+```
+- Git
+```bash
+git clone https://www.modelscope.cn/xiaoche/Tora.git
+```
+#### Native
+- Download the VAE and T5 models following [CogVideo](https://github.com/THUDM/CogVideo/blob/main/sat/README.md#2-download-model-weights):
+    - VAE: https://cloud.tsinghua.edu.cn/f/fdba7608a49c463ba754/?dl=1
+    - T5: [text_encoder](https://huggingface.co/THUDM/CogVideoX-2b/tree/main/text_encoder), [tokenizer](https://huggingface.co/THUDM/CogVideoX-2b/tree/main/tokenizer)
+- Tora t2v model weights: [Link](https://cloudbook-public-daily.oss-cn-hangzhou.aliyuncs.com/Tora_t2v/mp_rank_00_model_states.pt). Downloading these weights requires compliance with the [CogVideoX License](CogVideoX_LICENSE).
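With the native links, the files still have to be placed by hand. As a rough sketch, the t2v checkpoint could be moved into position as follows. The function name `place_t2v_weights` and its arguments are hypothetical; the destination path is inferred from the folder structure given in this README, not from any Tora script.

```python
import shutil
from pathlib import Path


def place_t2v_weights(downloaded_file, repo_root="Tora"):
    """Move a downloaded t2v checkpoint into <repo_root>/sat/ckpts/tora/t2v/."""
    src = Path(downloaded_file)
    if not src.is_file():
        raise FileNotFoundError(f"checkpoint not found: {src}")
    # Destination inferred from the folder structure in this README.
    dest_dir = Path(repo_root) / "sat" / "ckpts" / "tora" / "t2v"
    dest_dir.mkdir(parents=True, exist_ok=True)
    dest = dest_dir / "mp_rank_00_model_states.pt"
    shutil.move(str(src), str(dest))
    return dest
```

For example, `place_t2v_weights("Downloads/mp_rank_00_model_states.pt")` would leave the file at `Tora/sat/ckpts/tora/t2v/mp_rank_00_model_states.pt`. The VAE and T5 files need to be placed under `vae` and `t5-v1_1-xxl` in the same way.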
 ## 🤝 Acknowledgements
 We would like to express our gratitude to the following open-source projects that have been instrumental in the development of our project: