	---
	datasets:
	- nkp37/OpenVid-1M
	base_model:
	- ali-vilab/i2vgen-xl
	- THUDM/CogVideoX-5b
	tags:
	- video super-resolution
	---
	# STAR: Spatial-Temporal Augmentation with Text-to-Video Models for Real-World Video Super-Resolution

	### Code: https://github.com/NJU-PCALab/STAR
	### Paper: https://arxiv.org/abs/2501.02976
	### Project Page: https://nju-pcalab.github.io/projects/STAR
	### Demo Video: https://youtu.be/hx0zrql-SrU


	## ⚙️ Dependencies and Installation
	```
	## git clone this repository
	git clone https://github.com/NJU-PCALab/STAR.git
	cd STAR

	## create an environment
	conda create -n star python=3.10
	conda activate star
	pip install -r requirements.txt
	sudo apt-get update && sudo apt-get install -y ffmpeg libsm6 libxext6
	```
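
Before moving on to inference, it can help to confirm that the tools installed above are reachable. The helper below is a minimal sketch, not part of the STAR repository; `check_tools` is a hypothetical name introduced here for illustration.

```shell
# Hypothetical helper (not part of STAR): report which tools are on PATH.
# Returns 0 if every tool is found, 1 otherwise.
check_tools() {
    missing=0
    for tool in "$@"; do
        if command -v "$tool" >/dev/null 2>&1; then
            echo "found: $tool"
        else
            echo "missing: $tool"
            missing=1
        fi
    done
    return "$missing"
}
```

For example, running `check_tools git python ffmpeg` inside the activated `star` environment prints one line per tool.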

	## 🚀 Inference

	### Model Weight
	| Base Model | Type | URL |
	|--------------|-------------------|-----|
	| I2VGen-XL | Light Degradation | [:link:](https://huggingface.co/SherryX/STAR/resolve/main/I2VGen-XL-based/light_deg.pt?download=true) |
	| I2VGen-XL | Heavy Degradation | [:link:](https://huggingface.co/SherryX/STAR/resolve/main/I2VGen-XL-based/heavy_deg.pt?download=true) |
	| CogVideoX-5B | Heavy Degradation | [:link:](https://huggingface.co/SherryX/STAR/tree/main/CogVideoX-5B-based) |

	### 1. I2VGen-XL-based
	#### Step 1: Download the pretrained STAR model from [HuggingFace](https://huggingface.co/SherryX/STAR).
	We provide two versions of the I2VGen-XL-based model: `heavy_deg.pt` for heavily degraded videos and `light_deg.pt` for lightly degraded videos (e.g., low-resolution videos downloaded from video websites).

	Put the downloaded weights in `pretrained_weight/`.
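
The download can also be scripted from the command line. The sketch below builds the URLs from the Model Weight table and saves the file into `pretrained_weight/`. It assumes `wget` is installed and that you run it from the repository root. The function names `star_weight_url` and `download_star_weight` are hypothetical helpers, not part of STAR.

```shell
# Base URL taken from the Model Weight table in this README.
STAR_BASE_URL="https://huggingface.co/SherryX/STAR/resolve/main/I2VGen-XL-based"

# Hypothetical helper: build the download URL for one variant ("light" or "heavy").
star_weight_url() {
    case "$1" in
        light|heavy)
            echo "${STAR_BASE_URL}/${1}_deg.pt?download=true"
            ;;
        *)
            echo "unknown variant: $1 (expected light or heavy)" >&2
            return 1
            ;;
    esac
}

# Hypothetical helper: download one variant into pretrained_weight/.
# Assumes wget is available and the machine has network access.
download_star_weight() {
    url=$(star_weight_url "$1") || return 1
    mkdir -p pretrained_weight
    wget -O "pretrained_weight/${1}_deg.pt" "$url"
}
```

Calling `download_star_weight light` would fetch `light_deg.pt`, and `download_star_weight heavy` would fetch `heavy_deg.pt`.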

	#### Step 2: Prepare testing data
	Put the test videos in `input/video/`.

	For the prompt, there are three options:
	1. No prompt.
	2. Automatically generate a prompt [using Pllava](https://github.com/hpcaitech/Open-Sora/tree/main/tools/caption#pllava-captioning).
	3. Manually write the prompt.

	Put the prompt txt file in `input/text/`.


	#### Step 3: Change the path
	Change the paths in `video_super_resolution/scripts/inference_sr.sh` to match your local setup, including `video_folder_path`, `txt_file_path`, `model_path`, and `save_dir`.
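
A typo in one of these paths is easy to miss until inference fails, so it can be worth checking them first. The sketch below is a hypothetical helper, not part of STAR. It takes the four values you intend to use and reports the first one that does not exist. How `inference_sr.sh` itself declares these variables is not shown here, so the helper only validates the paths and does not edit the script.

```shell
# Hypothetical helper (not part of STAR): validate the four paths before
# writing them into inference_sr.sh.
# Arguments: video folder, prompt txt file, model weight, output directory.
# Pass "" as the second argument to skip the prompt-file check.
check_star_paths() {
    video_folder_path=$1
    txt_file_path=$2
    model_path=$3
    save_dir=$4

    [ -d "$video_folder_path" ] || { echo "missing video folder: $video_folder_path"; return 1; }
    [ -z "$txt_file_path" ] || [ -f "$txt_file_path" ] || { echo "missing prompt file: $txt_file_path"; return 1; }
    [ -f "$model_path" ] || { echo "missing model weight: $model_path"; return 1; }

    # The output directory is created if it does not exist yet.
    mkdir -p "$save_dir" || return 1
    echo "all paths ok"
}
```

For example, `check_star_paths input/video input/text/prompt.txt pretrained_weight/light_deg.pt results`, where `prompt.txt` and `results` are placeholder names chosen for illustration.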


	#### Step 4: Run the inference command
	```
	bash video_super_resolution/scripts/inference_sr.sh
	```
	If you run out of memory (OOM), set a smaller `frame_length` in `inference_sr.sh`.

	### 2. CogVideoX-based
	Refer to these [instructions](https://github.com/NJU-PCALab/STAR/tree/main/cogvideox-based#cogvideox-based-model-inference) for inference with the CogVideoX-5B-based model.

	Please note that the CogVideoX-5B-based model supports only 720x480 input.
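
If your videos have a different size, one option is to check and rescale them with the `ffmpeg` tools installed earlier. The sketch below assumes "720x480" means width 720 and height 480, and that rescaled input is acceptable for your use case; neither is confirmed by this README. The function names are hypothetical helpers, not part of STAR.

```shell
# Hypothetical helper: succeed if a "WIDTHxHEIGHT" string equals 720x480.
is_720x480() {
    [ "$1" = "720x480" ]
}

# Hypothetical helper: print the first video stream's size as WIDTHxHEIGHT.
# Requires ffprobe (shipped with ffmpeg).
video_resolution() {
    ffprobe -v error -select_streams v:0 \
        -show_entries stream=width,height -of csv=s=x:p=0 "$1"
}

# Hypothetical helper: write a 720x480 copy of $1 to $2.
# Plain scaling ignores the source aspect ratio, so other ratios get distorted.
to_720x480() {
    if is_720x480 "$(video_resolution "$1")"; then
        cp "$1" "$2"
    else
        ffmpeg -i "$1" -vf scale=720:480 -c:a copy "$2"
    fi
}
```

For example, `to_720x480 input/video/clip.mp4 input/video/clip_720x480.mp4`, where both file names are placeholders.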