---
license: apache-2.0
base_model:
- THUDM/CogVideoX-5b-I2V
pipeline_tag: image-to-video
---

# SkyReels-A1: Expressive Portrait Animation in Video Diffusion Transformers

<p align="center">
<img src="assets/logo2.png" alt="Skyreels Logo" width="60%">
</p>

<!-- <p align="center">
<img src="assets/logo.jpg" alt="Skyreels Logo" width="200">
</p> -->

<p align="center">
<a href="https://github.com/SkyworkAI/SkyReels-A1" target="_blank">🌐 GitHub</a> · <a href="https://www.skyreels.ai/home?utm_campaign=huggingface_A1" target="_blank">👋 Playground</a> · <a href="https://discord.gg/PwM6NYtccQ" target="_blank">Discord</a>
</p>

This repo contains Diffusers-style model weights for the SkyReels-A1 models.

You can find the inference code in the [SkyReels-A1](https://github.com/SkyworkAI/SkyReels-A1) repository.
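
To use the weights with the inference code, they first need to be on local disk. The following is a minimal sketch of one way to fetch them, assuming the `huggingface_hub` package is installed. The helper name `download_weights` and the default directory `pretrained_models` are hypothetical; check the inference repository for the directory layout it actually expects.

```python
from pathlib import Path


def download_weights(repo_id: str, local_dir: str = "pretrained_models") -> Path:
    """Download all files of a Hub model repo into local_dir and return that path.

    repo_id is the "<owner>/<name>" identifier of the model repository.
    local_dir is a hypothetical default, not a path required by the inference code.
    """
    # Imported lazily so that defining this helper does not require the package.
    from huggingface_hub import snapshot_download  # pip install huggingface_hub

    target = Path(local_dir)
    target.mkdir(parents=True, exist_ok=True)
    # snapshot_download fetches every file in the repository snapshot.
    snapshot_download(repo_id=repo_id, local_dir=str(target))
    return target
```

Call it with the identifier of this model repository, for example `download_weights("<owner>/<name>")`, and point the inference scripts at the returned directory.
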

---

Overview of the SkyReels-A1 framework. Given an input video sequence and a reference portrait image, we extract facial expression-aware landmarks from the video, which serve as motion descriptors for transferring expressions onto the portrait. Using a conditional video generation framework based on DiT, our approach integrates these facial expression-aware landmarks directly into the input latent space. Following prior research, we employ a pose guidance mechanism built within a VAE architecture. This component encodes the facial expression-aware landmarks as conditional input for the DiT framework, enabling the model to capture essential low-dimensional visual attributes while preserving the semantic integrity of facial features.
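
The idea of feeding VAE-encoded landmarks into the DiT input latent space can be illustrated with simple shape bookkeeping. This is only a sketch under stated assumptions: the compression factors (4x temporal, 8x spatial, 16 latent channels) are modeled on the CogVideoX VAE of the base model, and channel-wise concatenation is an assumed way of combining the latents, not a confirmed description of the SkyReels-A1 implementation. The names `LatentShape`, `vae_latent_shape`, and `concat_channels` are hypothetical.

```python
from dataclasses import dataclass

# Assumed compression factors, modeled on the CogVideoX 3D VAE (illustrative only).
TEMPORAL_FACTOR = 4
SPATIAL_FACTOR = 8
LATENT_CHANNELS = 16


@dataclass(frozen=True)
class LatentShape:
    channels: int
    frames: int
    height: int
    width: int


def vae_latent_shape(num_frames: int, height: int, width: int) -> LatentShape:
    """Latent shape a causal video VAE with the factors above would produce."""
    if (num_frames - 1) % TEMPORAL_FACTOR != 0:
        raise ValueError("num_frames must have the form 4k + 1")
    if height % SPATIAL_FACTOR or width % SPATIAL_FACTOR:
        raise ValueError("height and width must be multiples of 8")
    return LatentShape(
        channels=LATENT_CHANNELS,
        frames=(num_frames - 1) // TEMPORAL_FACTOR + 1,
        height=height // SPATIAL_FACTOR,
        width=width // SPATIAL_FACTOR,
    )


def concat_channels(*latents: LatentShape) -> LatentShape:
    """Shape after stacking latents along the channel axis (assumed conditioning)."""
    first = latents[0]
    for other in latents[1:]:
        if (other.frames, other.height, other.width) != (
            first.frames, first.height, first.width
        ):
            raise ValueError("latents must share frames, height, and width")
    return LatentShape(
        channels=sum(latent.channels for latent in latents),
        frames=first.frames,
        height=first.height,
        width=first.width,
    )


# A 49-frame, 480x720 clip: noisy video latent plus encoded landmark video.
noisy = vae_latent_shape(49, 480, 720)
landmarks = vae_latent_shape(49, 480, 720)
dit_input = concat_channels(noisy, landmarks)
print(dit_input)
```

Because the landmark video passes through the same kind of encoder as the target video, its latent has matching frame and spatial dimensions, which is what makes a direct combination in latent space possible.
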

---

Some generated results:

<video controls autoplay src="https://cdn-uploads.huggingface.co/production/uploads/62e34a12c9bece303d146af8/licoAeSaF-K8x7DO7SGUG.mp4"></video>

<video controls autoplay src="https://cdn-uploads.huggingface.co/production/uploads/62e34a12c9bece303d146af8/5q0p2jyw183fcJoeq0dvF.mp4"></video>

<video controls autoplay src="https://cdn-uploads.huggingface.co/production/uploads/62e34a12c9bece303d146af8/1aZweOIszlriQLRwSqnGq.mp4"></video>

<video controls autoplay src="https://cdn-uploads.huggingface.co/production/uploads/62e34a12c9bece303d146af8/5bfjDxGZJf-5WnGpFHppw.mp4"></video>

## Citation

If you find SkyReels-A1 useful for your research, please cite our work using the following BibTeX:

```bibtex
@article{qiu2025skyreels,
  title={SkyReels-A1: Expressive Portrait Animation in Video Diffusion Transformers},
  author={Qiu, Di and Fei, Zhengcong and Wang, Rui and Bai, Jialin and Yu, Changqian and Fan, Mingyuan and Chen, Guibin and Wen, Xiang},
  journal={arXiv preprint arXiv:2502.10841},
  year={2025}
}
```