---
license: apache-2.0
datasets:
- TempoFunk/webvid-10M
language:
- en
tags:
- text-to-video
base_model:
- ali-vilab/text-to-video-ms-1.7b
---
# caT text to video

caT is a conditionally augmented text-to-video model. It starts from the pre-trained weights of the ModelScope text-to-video model and adds temporal conditioning transformers, which extend generated clips and create smooth transitions between them.
It also supports prompt interpolation, so the scene can change as a clip is extended.
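
To give a rough idea of what prompt interpolation can look like, here is a minimal sketch that linearly blends two prompt embeddings across a sequence of clip extensions. It is an illustration only: the function names `lerp` and `interpolation_schedule` are hypothetical, the short lists stand in for real text-encoder embeddings, and the repository may implement interpolation differently.

```python
from typing import List, Sequence


def lerp(a: Sequence[float], b: Sequence[float], t: float) -> List[float]:
    """Linearly blend two equal-length vectors; t=0 returns a, t=1 returns b."""
    if len(a) != len(b):
        raise ValueError("embeddings must have the same length")
    return [(1.0 - t) * x + t * y for x, y in zip(a, b)]


def interpolation_schedule(
    start: Sequence[float], end: Sequence[float], num_clips: int
) -> List[List[float]]:
    """Return one blended embedding per clip, moving evenly from start to end.

    Hypothetical helper: the first clip uses the start prompt, the last clip
    uses the end prompt, and the clips in between use evenly spaced blends.
    """
    if num_clips < 1:
        raise ValueError("num_clips must be at least 1")
    if num_clips == 1:
        return [list(start)]
    return [lerp(start, end, i / (num_clips - 1)) for i in range(num_clips)]


if __name__ == "__main__":
    # Toy 2-dimensional vectors standing in for real prompt embeddings.
    forest = [0.0, 2.0]
    beach = [1.0, 0.0]
    for clip_index, embedding in enumerate(interpolation_schedule(forest, beach, 3)):
        print(clip_index, embedding)
```

In this sketch each extended clip would be conditioned on its own blended embedding, so the scene drifts gradually from the first prompt to the second instead of switching abruptly.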

This project was trained at home as a hobby.

## Installation

### Clone the Repository

```bash
git clone https://github.com/motexture/caT-text-to-video.git
cd caT-text-to-video
python3 -m venv venv
source venv/bin/activate  # On Windows use `venv\Scripts\activate`
pip install -r requirements.txt
python3 run.py
```

Once `run.py` starts, open the URL it provides in your browser to use the interface and start generating videos.

Note: Make sure you are on the latest commit, because the positional encodings have been updated since the initial models.