---
license: apache-2.0
datasets:
- TempoFunk/webvid-10M
language:
- en
tags:
- text-to-video
base_model:
- ali-vilab/text-to-video-ms-1.7b
---

# caT text to video

caT is a conditionally augmented text-to-video model. It uses pre-trained weights from the ModelScope text-to-video model (`ali-vilab/text-to-video-ms-1.7b`), augmented with temporal conditioning transformers that extend generated clips and create smooth transitions between them.

It also supports prompt interpolation, which allows the scene to change during clip extensions.
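
As a rough illustration of the idea, prompt interpolation can be pictured as blending the text embeddings of two prompts across successive clip extensions. The sketch below is an assumption about the general technique, not this repository's implementation: `lerp`, `interpolate_prompts`, the linear blend, and the toy 3-dimensional embeddings are all hypothetical, and real embeddings would come from the model's text encoder.

```python
from typing import List

Vector = List[float]


def lerp(a: Vector, b: Vector, t: float) -> Vector:
    """Linearly blend two equal-length vectors; t=0 gives a, t=1 gives b."""
    if len(a) != len(b):
        raise ValueError("embeddings must have the same length")
    return [(1.0 - t) * x + t * y for x, y in zip(a, b)]


def interpolate_prompts(start: Vector, end: Vector, num_clips: int) -> List[Vector]:
    """Return one blended embedding per clip, moving from start to end.

    Hypothetical helper for illustration only; the actual model may
    schedule or combine prompts differently.
    """
    if num_clips < 1:
        raise ValueError("num_clips must be at least 1")
    if num_clips == 1:
        return [list(start)]
    # Evenly spaced blend weights from 0.0 (first clip) to 1.0 (last clip).
    return [lerp(start, end, i / (num_clips - 1)) for i in range(num_clips)]


# Toy stand-ins for text-encoder outputs (real embeddings are much larger).
forest = [1.0, 0.0, 0.0]
beach = [0.0, 1.0, 0.0]

schedule = interpolate_prompts(forest, beach, num_clips=3)
for i, emb in enumerate(schedule):
    print(f"clip {i}: {emb}")
```

Under this assumption, each blended embedding would condition one clip extension, so the scene drifts gradually from the first prompt to the second rather than cutting abruptly.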

This model was trained at home as a hobby project, so do not expect high-quality samples.

## Installation

### Clone the Repository

```bash
# Clone the repository and enter it
git clone https://github.com/motexture/caT-text-to-video.git
cd caT-text-to-video

# Create and activate a virtual environment
python3 -m venv venv
source venv/bin/activate  # On Windows use `venv\Scripts\activate`

# Install the dependencies and launch the interface
pip install -r requirements.txt
python3 run.py
```


After `run.py` starts, visit the URL it provides in your browser to interact with the interface and start generating videos.

Note: Ensure that you are on the latest commit of the repository, as the positional encodings have been updated since the initial models.


<video controls autoplay src="https://cdn-uploads.huggingface.co/production/uploads/64a86f7d03835e13f95c3687/qr-NXxvmkquF_mMlx_5P-.mp4"></video>

<video controls autoplay src="https://cdn-uploads.huggingface.co/production/uploads/64a86f7d03835e13f95c3687/32B1RPHAmieomeXWp2XvC.mp4"></video>

<video controls autoplay src="https://cdn-uploads.huggingface.co/production/uploads/64a86f7d03835e13f95c3687/40KrBvzMf8DmPO8VvATfC.mp4"></video>

<video controls autoplay src="https://cdn-uploads.huggingface.co/production/uploads/64a86f7d03835e13f95c3687/SEtFOILcwwNT4M8mXMNWt.mp4"></video>