	---
	license: apache-2.0
	base_model:
	- THUDM/CogVideoX-5b-I2V
	pipeline_tag: image-to-video
	---


	# SkyReels-A1: Expressive Portrait Animation in Video Diffusion Transformers
	<p align="center">
	<img src="assets/logo2.png" alt="Skyreels Logo" width="60%">
	</p>

	<!-- <p align="center">
	<img src="assets/logo.jpg" alt="Skyreels Logo" width="200">
	</p> -->

	<p align="center">
	<a href="https://github.com/SkyworkAI/SkyReels-A1" target="_blank">🌐 GitHub</a> · <a href="https://www.skyreels.ai/home?utm_campaign=huggingface_A1" target="_blank">👋 Playground</a> · <a href="https://discord.gg/PwM6NYtccQ" target="_blank">Discord</a>
	</p>

	This repo contains Diffusers-style model weights for the SkyReels-A1 models.
	You can find the inference code in the [SkyReels-A1](https://github.com/SkyworkAI/SkyReels-A1) repository.
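
As a minimal sketch of fetching these weights locally, the snippet below uses `snapshot_download` from the `huggingface_hub` package. The target directory `checkpoints/SkyReels-A1` and the helper name `download_weights` are hypothetical choices for illustration; the inference code in the GitHub repository may expect a different layout, so check its instructions before relying on this path.

```python
from pathlib import Path

# Repository ID of this model card on the Hugging Face Hub.
REPO_ID = "Skywork/SkyReels-A1"


def download_weights(local_dir: str = "checkpoints/SkyReels-A1") -> Path:
    """Download all files of the repo and return the local directory.

    `local_dir` is a hypothetical default, not a path required by SkyReels-A1.
    """
    # Imported lazily so this module loads even if huggingface_hub is missing.
    from huggingface_hub import snapshot_download

    path = snapshot_download(repo_id=REPO_ID, local_dir=local_dir)
    return Path(path)
```

Calling `download_weights()` requires network access and `pip install huggingface_hub`; it only fetches the files and does not load or run the model.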

	---

	![image/png](https://cdn-uploads.huggingface.co/production/uploads/62e34a12c9bece303d146af8/Ysbe66shplYZw2fjkFUHL.png)
	Overview of the SkyReels-A1 framework. Given an input video sequence and a reference portrait image, we extract facial expression-aware landmarks from the video, which serve as motion descriptors for transferring expressions onto the portrait. Using a conditional video generation framework based on DiT, our approach integrates these landmarks directly into the input latent space. In line with prior research, we employ a pose guidance mechanism built within a VAE architecture. This component encodes the facial expression-aware landmarks as conditional input for the DiT framework, enabling the model to capture essential low-dimensional visual attributes while preserving the semantic integrity of facial features.
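
To give a feel for the latent space mentioned above, here is a small arithmetic sketch of the latent shape produced by the VAE of the base model, CogVideoX-5b-I2V. It assumes that model's published compression factors (4x temporal, 8x spatial, 16 latent channels) and its default 49-frame, 480x720 input. Whether SkyReels-A1 keeps exactly these settings is an assumption, and the function name `cogvideox_latent_shape` is hypothetical.

```python
def cogvideox_latent_shape(
    num_frames: int,
    height: int,
    width: int,
    latent_channels: int = 16,  # assumed: CogVideoX VAE latent channels
    temporal_factor: int = 4,   # assumed: CogVideoX temporal compression
    spatial_factor: int = 8,    # assumed: CogVideoX spatial compression
) -> tuple[int, int, int, int]:
    """Return (latent_frames, channels, latent_height, latent_width)."""
    if (num_frames - 1) % temporal_factor != 0:
        raise ValueError("num_frames must be of the form temporal_factor * k + 1")
    if height % spatial_factor or width % spatial_factor:
        raise ValueError("height and width must be divisible by spatial_factor")
    # The first frame is encoded on its own; the rest are grouped temporally.
    latent_frames = (num_frames - 1) // temporal_factor + 1
    return (
        latent_frames,
        latent_channels,
        height // spatial_factor,
        width // spatial_factor,
    )


shape = cogvideox_latent_shape(num_frames=49, height=480, width=720)
print(shape)  # (13, 16, 60, 90)
```

Under these assumptions, a landmark video encoded to the same temporal and spatial resolution lines up position by position with the video latents, which is what lets it act as a conditional input to the DiT.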


	---
	Some generated results:


	<video controls autoplay src="https://cdn-uploads.huggingface.co/production/uploads/62e34a12c9bece303d146af8/licoAeSaF-K8x7DO7SGUG.mp4"></video>

	<video controls autoplay src="https://cdn-uploads.huggingface.co/production/uploads/62e34a12c9bece303d146af8/5q0p2jyw183fcJoeq0dvF.mp4"></video>


	<video controls autoplay src="https://cdn-uploads.huggingface.co/production/uploads/62e34a12c9bece303d146af8/1aZweOIszlriQLRwSqnGq.mp4"></video>


	<video controls autoplay src="https://cdn-uploads.huggingface.co/production/uploads/62e34a12c9bece303d146af8/5bfjDxGZJf-5WnGpFHppw.mp4"></video>


	## Citation

	If you find SkyReels-A1 useful for your research, please cite our work using the following BibTeX entry:
	```bibtex
	@article{qiu2025skyreels,
	title={SkyReels-A1: Expressive Portrait Animation in Video Diffusion Transformers},
	author={Qiu, Di and Fei, Zhengcong and Wang, Rui and Bai, Jialin and Yu, Changqian and Fan, Mingyuan and Chen, Guibin and Wen, Xiang},
	journal={arXiv preprint arXiv:2502.10841},
	year={2025}
	}
	```