---
license: apache-2.0
base_model:
- THUDM/CogVideoX-5b-I2V
pipeline_tag: image-to-video
---
# SkyReels-A1: Expressive Portrait Animation in Video Diffusion Transformers
<p align="center">
<img src="assets/logo2.png" alt="Skyreels Logo" width="60%">
</p>
<!-- <p align="center">
<img src="assets/logo.jpg" alt="Skyreels Logo" width="200">
</p> -->
<p align="center">
<a href="https://github.com/SkyworkAI/SkyReels-A1" target="_blank">🌐 GitHub</a> · <a href="https://www.skyreels.ai/home?utm_campaign=huggingface_A1" target="_blank">👋 Playground</a> · <a href="https://discord.gg/PwM6NYtccQ" target="_blank">Discord</a>
</p>
This repo contains Diffusers-style model weights for the SkyReels-A1 models.
You can find the inference code in the [SkyReels-A1](https://github.com/SkyworkAI/SkyReels-A1) repository.
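
As a minimal sketch of fetching these weights locally before running the inference code, the snippet below uses `snapshot_download` from the `huggingface_hub` package. The helper names `default_local_dir` and `download_weights` and the `pretrained_models` folder are illustrative assumptions, not part of the official SkyReels-A1 code; pass this model's Hub repo id yourself and check the GitHub repository for the directory layout its scripts expect.

```python
from pathlib import Path


def default_local_dir(repo_id: str, root: str = "pretrained_models") -> Path:
    """Map a Hub repo id such as 'org/name' to a local folder 'root/name'.

    The 'pretrained_models' root is a hypothetical default for this sketch.
    """
    return Path(root) / repo_id.split("/")[-1]


def download_weights(repo_id: str, root: str = "pretrained_models") -> Path:
    """Download every file of a Hub model repo and return the local folder."""
    # Imported lazily so default_local_dir works without the dependency.
    from huggingface_hub import snapshot_download

    local_dir = default_local_dir(repo_id, root)
    snapshot_download(repo_id=repo_id, local_dir=str(local_dir))
    return local_dir


# Usage (requires network access and `pip install huggingface_hub`):
# weights_dir = download_weights("<hub-repo-id-of-this-model>")
```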
---
![image/png](https://cdn-uploads.huggingface.co/production/uploads/62e34a12c9bece303d146af8/Ysbe66shplYZw2fjkFUHL.png)
Overview of the SkyReels-A1 framework. Given an input video sequence and a reference portrait image, we extract facial expression-aware landmarks from the video, which serve as motion descriptors for transferring expressions onto the portrait. Using a conditional video generation framework based on DiT, our approach integrates these landmarks directly into the input latent space. In line with prior research, we employ a pose guidance mechanism built within a VAE architecture. This component encodes the facial expression-aware landmarks as conditional input for the DiT framework, enabling the model to capture essential low-dimensional visual attributes while preserving the semantic integrity of facial features.
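
To make the idea of placing landmark conditions "into the input latent space" concrete, here is a rough shape-only sketch. It assumes a CogVideoX-style VAE (4x temporal and 8x spatial compression, 16 latent channels) and assumes the landmark latents are joined to the noisy video latents by channel-wise concatenation. The overview above does not state the fusion operator, and the clip size of 49 frames at 480x720 is only an illustrative choice, so treat this as one plausible reading rather than the actual implementation.

```python
from typing import Tuple

# Assumed CogVideoX-style VAE factors (not stated in this README).
TEMPORAL_FACTOR = 4
SPATIAL_FACTOR = 8
LATENT_CHANNELS = 16

Shape = Tuple[int, int, int, int]  # (frames, channels, height, width)


def latent_shape(num_frames: int, height: int, width: int) -> Shape:
    """Shape of the VAE latent for a clip of the given pixel dimensions."""
    # The first frame is encoded alone; the rest are grouped temporally.
    latent_frames = (num_frames - 1) // TEMPORAL_FACTOR + 1
    return (
        latent_frames,
        LATENT_CHANNELS,
        height // SPATIAL_FACTOR,
        width // SPATIAL_FACTOR,
    )


def concat_channels(a: Shape, b: Shape) -> Shape:
    """Shape after concatenating two latents along the channel axis."""
    if (a[0], a[2], a[3]) != (b[0], b[2], b[3]):
        raise ValueError("latents must match in frames, height and width")
    return (a[0], a[1] + b[1], a[2], a[3])


# The landmark video is rendered at the same size as the target clip,
# so both latents share frames, height and width.
noisy_video = latent_shape(num_frames=49, height=480, width=720)
landmark_cond = latent_shape(num_frames=49, height=480, width=720)
dit_input = concat_channels(noisy_video, landmark_cond)
```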
---
Some generated results:
<video controls autoplay src="https://cdn-uploads.huggingface.co/production/uploads/62e34a12c9bece303d146af8/licoAeSaF-K8x7DO7SGUG.mp4"></video>
<video controls autoplay src="https://cdn-uploads.huggingface.co/production/uploads/62e34a12c9bece303d146af8/5q0p2jyw183fcJoeq0dvF.mp4"></video>
<video controls autoplay src="https://cdn-uploads.huggingface.co/production/uploads/62e34a12c9bece303d146af8/1aZweOIszlriQLRwSqnGq.mp4"></video>
<video controls autoplay src="https://cdn-uploads.huggingface.co/production/uploads/62e34a12c9bece303d146af8/5bfjDxGZJf-5WnGpFHppw.mp4"></video>
## Citation
If you find SkyReels-A1 useful for your research, please cite our work using the following BibTeX:
```bibtex
@article{qiu2025skyreels,
title={SkyReels-A1: Expressive Portrait Animation in Video Diffusion Transformers},
author={Qiu, Di and Fei, Zhengcong and Wang, Rui and Bai, Jialin and Yu, Changqian and Fan, Mingyuan and Chen, Guibin and Wen, Xiang},
journal={arXiv preprint arXiv:2502.10841},
year={2025}
}
```