---
license: apache-2.0
base_model:
- THUDM/CogVideoX-5b-I2V
pipeline_tag: image-to-video
---
# SkyReels-A1: Expressive Portrait Animation in Video Diffusion Transformers
<p align="center">
<img src="assets/logo2.png" alt="Skyreels Logo" width="60%">
</p>
<!-- <p align="center">
<img src="assets/logo.jpg" alt="Skyreels Logo" width="200">
</p> -->
<p align="center">
<a href="https://github.com/SkyworkAI/SkyReels-A1" target="_blank">🌐 GitHub</a> · <a href="https://www.skyreels.ai/home?utm_campaign=huggingface_A1" target="_blank">👋 Playground</a> · <a href="https://discord.gg/PwM6NYtccQ" target="_blank">Discord</a>
</p>

This repository contains Diffusers-style model weights for the SkyReels-A1 models.
The inference code is available in the [SkyReels-A1](https://github.com/SkyworkAI/SkyReels-A1) repository.

---

Overview of the SkyReels-A1 framework. Given an input video sequence and a reference portrait image, we extract facial expression-aware landmarks from the video; these landmarks serve as motion descriptors for transferring the expressions onto the portrait. Built on a DiT-based conditional video generation framework, our approach integrates these facial expression-aware landmarks directly into the input latent space. Following prior work, we employ a pose guidance module built on a VAE architecture, which encodes the landmarks as conditional input to the DiT. This enables the model to capture essential low-dimensional visual attributes while preserving the semantic integrity of facial features.

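
The conditioning path described above can be illustrated with a small NumPy sketch. It is a toy under stated assumptions, not the SkyReels-A1 implementation: `toy_landmark_encoder` and `condition_latents` are hypothetical names; the encoder replaces the learned VAE-based pose guidance module with strided frame sampling, 8x8 average pooling and a fixed linear projection; the compression factors (4x temporal, 8x spatial, 16 latent channels) are assumed to mirror a CogVideoX-style VAE; and channel-wise concatenation is assumed as the way landmark latents enter the DiT input latent space.

```python
import numpy as np

# Assumed, illustrative compression factors (CogVideoX-style VAE), not taken
# from SkyReels-A1: 4x temporal, 8x spatial, 16 latent channels.
T_STRIDE, S_STRIDE, LATENT_C = 4, 8, 16


def toy_landmark_encoder(frames: np.ndarray, proj: np.ndarray) -> np.ndarray:
    """Hypothetical stand-in for the VAE-based pose guidance encoder.

    frames: (F, 3, H, W) rendered landmark images, with F = 1 + 4k.
    proj:   (LATENT_C, 3) fixed linear map from RGB to latent channels.
    Returns latents of shape (1 + k, LATENT_C, H // 8, W // 8).
    """
    _, c, h, w = frames.shape
    kept = frames[::T_STRIDE]  # frame 0 and every 4th frame -> 1 + k frames
    # Average-pool each non-overlapping 8x8 spatial patch.
    pooled = kept.reshape(
        kept.shape[0], c, h // S_STRIDE, S_STRIDE, w // S_STRIDE, S_STRIDE
    ).mean(axis=(3, 5))
    # Mix the 3 RGB channels into LATENT_C channels per latent frame.
    return np.einsum("oc,tchw->tohw", proj, pooled)


def condition_latents(noisy: np.ndarray, landmark_latents: np.ndarray) -> np.ndarray:
    """Concatenate landmark latents with noisy video latents along channels."""
    if noisy.shape[0] != landmark_latents.shape[0] or noisy.shape[2:] != landmark_latents.shape[2:]:
        raise ValueError("latents must share frame count and spatial size")
    return np.concatenate([noisy, landmark_latents], axis=1)


rng = np.random.default_rng(0)
frames = rng.random((9, 3, 64, 96))            # 9 = 1 + 4 * 2 landmark frames
proj = rng.standard_normal((LATENT_C, 3))
lm = toy_landmark_encoder(frames, proj)         # shape (3, 16, 8, 12)
noisy = rng.standard_normal((3, LATENT_C, 8, 12))
dit_input = condition_latents(noisy, lm)
print(dit_input.shape)                          # (3, 32, 8, 12)
```

Concatenation keeps each landmark latent aligned with the video latent at the same frame and spatial position, which is why both tensors must share frame count and spatial size; element-wise addition would be an equally simple alternative under the same alignment constraint.
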

---
Some generated results:
<video controls autoplay src="https://cdn-uploads.huggingface.co/production/uploads/62e34a12c9bece303d146af8/licoAeSaF-K8x7DO7SGUG.mp4"></video>
<video controls autoplay src="https://cdn-uploads.huggingface.co/production/uploads/62e34a12c9bece303d146af8/5q0p2jyw183fcJoeq0dvF.mp4"></video>
<video controls autoplay src="https://cdn-uploads.huggingface.co/production/uploads/62e34a12c9bece303d146af8/1aZweOIszlriQLRwSqnGq.mp4"></video>
<video controls autoplay src="https://cdn-uploads.huggingface.co/production/uploads/62e34a12c9bece303d146af8/5bfjDxGZJf-5WnGpFHppw.mp4"></video>
## Citation
If you find SkyReels-A1 useful for your research, please cite our work using the following BibTeX entry:
```bibtex
@article{qiu2025skyreels,
title={SkyReels-A1: Expressive Portrait Animation in Video Diffusion Transformers},
author={Qiu, Di and Fei, Zhengcong and Wang, Rui and Bai, Jialin and Yu, Changqian and Fan, Mingyuan and Chen, Guibin and Wen, Xiang},
journal={arXiv preprint arXiv:2502.10841},
year={2025}
}
```