---
license: apache-2.0
datasets:
- TempoFunk/webvid-10M
language:
- en
tags:
- text-to-video
base_model:
- ali-vilab/text-to-video-ms-1.7b
---
# caT text to video

caT (conditionally augmented text-to-video) is a text-to-video model. It starts from the pre-trained weights of the ModelScope text-to-video model and adds temporal conditioning transformers that extend generated clips and create smooth transitions between them. It also supports prompt interpolation, so the scene can change during a clip extension.

This model was trained at home as a hobby, so do not expect high-quality samples.
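
Conceptually, extension works as an autoregressive loop: each new clip is generated conditioned on the tail frames of the previous clip, while the text embedding is blended between two prompts to interpolate the scene. The sketch below illustrates only this control flow; `encode_prompt` and `generate_clip` are hypothetical stand-ins that return random tensors, not the caT API, and the frame counts, conditioning-window size, and embedding shapes are illustrative assumptions. Use `run.py` (see Installation) for actual generation.

```python
from typing import Optional

import torch

def encode_prompt(prompt: str) -> torch.Tensor:
    """Hypothetical stand-in for a real text encoder such as CLIP."""
    torch.manual_seed(abs(hash(prompt)) % (2**31))
    return torch.randn(1, 77, 768)

def generate_clip(text_emb: torch.Tensor,
                  cond_frames: Optional[torch.Tensor] = None,
                  num_frames: int = 16) -> torch.Tensor:
    """Hypothetical stand-in for the diffusion model: a real model would
    denoise video latents conditioned on `text_emb` and, when extending,
    on `cond_frames` through the temporal conditioning transformers."""
    return torch.randn(num_frames, 3, 256, 256)

start_emb = encode_prompt("a sailboat at sunset")
end_emb = encode_prompt("a sailboat in a thunderstorm")

clips = [generate_clip(start_emb)]  # first clip: text conditioning only
num_extensions = 3
for i in range(1, num_extensions + 1):
    t = i / num_extensions                        # interpolation weight in [0, 1]
    text_emb = torch.lerp(start_emb, end_emb, t)  # blend the two prompt embeddings
    tail = clips[-1][-8:]                         # tail frames of the previous clip
    clips.append(generate_clip(text_emb, cond_frames=tail))

video = torch.cat(clips, dim=0)  # stitched frames: (num_frames_total, C, H, W)
print(video.shape)               # torch.Size([64, 3, 256, 256])
```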
## Installation
### Clone the Repository and Run
```bash
git clone https://github.com/motexture/caT-text-to-video.git
cd caT-text-to-video
python3 -m venv venv
source venv/bin/activate  # on Windows: venv\Scripts\activate
pip install -r requirements.txt
python3 run.py
```
Visit the provided URL in your browser to interact with the interface and start generating videos.

Note: ensure that you are on the latest commit, as the positional encodings have been updated compared to the initial models.
## Samples

<video controls autoplay src="https://cdn-uploads.huggingface.co/production/uploads/64a86f7d03835e13f95c3687/qr-NXxvmkquF_mMlx_5P-.mp4"></video>
<video controls autoplay src="https://cdn-uploads.huggingface.co/production/uploads/64a86f7d03835e13f95c3687/32B1RPHAmieomeXWp2XvC.mp4"></video>
<video controls autoplay src="https://cdn-uploads.huggingface.co/production/uploads/64a86f7d03835e13f95c3687/40KrBvzMf8DmPO8VvATfC.mp4"></video>
<video controls autoplay src="https://cdn-uploads.huggingface.co/production/uploads/64a86f7d03835e13f95c3687/SEtFOILcwwNT4M8mXMNWt.mp4"></video>