---
license: apache-2.0
base_model:
- THUDM/CogVideoX-5b-I2V
pipeline_tag: image-to-video
---


# SkyReels-A1: Expressive Portrait Animation in Video Diffusion Transformers
<p align="center">
  <img src="assets/logo2.png" alt="Skyreels Logo" width="60%">
</p>

<!-- <p align="center">
  <img src="assets/logo.jpg" alt="Skyreels Logo" width="200">
</p> -->

<p align="center">
<a href="https://github.com/SkyworkAI/SkyReels-A1" target="_blank">🌐 Github</a> · <a href="https://www.skyreels.ai/home?utm_campaign=huggingface_A1" target="_blank">👋 Playground</a> · <a href="https://discord.gg/PwM6NYtccQ" target="_blank">Discord</a>
</p>

This repository contains Diffusers-style model weights for the SkyReels-A1 models.
The inference code is available in the [SkyReels-A1](https://github.com/SkyworkAI/SkyReels-A1) repository.

---

![image/png](https://cdn-uploads.huggingface.co/production/uploads/62e34a12c9bece303d146af8/Ysbe66shplYZw2fjkFUHL.png)
Overview of the SkyReels-A1 framework. Given an input video sequence and a reference portrait image, we extract facial expression-aware landmarks from the video, which serve as motion descriptors for transferring expressions onto the portrait. Using a conditional video generation framework based on DiT, our approach integrates these facial expression-aware landmarks directly into the input latent space. Following prior work, we employ a pose guidance mechanism built on a VAE architecture. This component encodes the facial expression-aware landmarks as conditional input to the DiT, enabling the model to capture essential low-dimensional visual attributes while preserving the semantic integrity of facial features.
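The overview states that the landmark condition is integrated into the DiT's input latent space but does not say exactly how. The NumPy sketch below illustrates one common way such conditioning can be done, and is an assumption rather than the SkyReels-A1 implementation: the encoded landmark latents are concatenated with the noisy video latents along the channel axis, and the result is split into non-overlapping spatial patches to form DiT input tokens. The function name `fuse_and_patchify`, the patch size of 2, and the example shapes are all hypothetical and chosen for illustration only.

```python
import numpy as np


def fuse_and_patchify(video_latents: np.ndarray,
                      landmark_latents: np.ndarray,
                      patch: int = 2) -> np.ndarray:
    """Hypothetical sketch: inject landmark (pose-guidance) latents into DiT input tokens.

    video_latents:    (T, C, H, W) noisy video latents
    landmark_latents: (T, C_l, H, W) latents produced by a pose-guidance encoder
    Returns tokens of shape (T * (H // patch) * (W // patch), (C + C_l) * patch * patch).
    """
    # The condition must be aligned with the video latents in time and space.
    if (video_latents.shape[0] != landmark_latents.shape[0]
            or video_latents.shape[2:] != landmark_latents.shape[2:]):
        raise ValueError("landmark latents must match video latents in T, H and W")

    # Assumption: the condition is injected by channel-wise concatenation.
    x = np.concatenate([video_latents, landmark_latents], axis=1)
    t, c, h, w = x.shape
    if h % patch or w % patch:
        raise ValueError("H and W must be divisible by the patch size")

    # Split H and W into (grid, patch) pairs, then group each patch with its channels.
    x = x.reshape(t, c, h // patch, patch, w // patch, patch)
    x = x.transpose(0, 2, 4, 1, 3, 5)  # (T, H/p, W/p, C, p, p)
    return x.reshape(t * (h // patch) * (w // patch), c * patch * patch)


# Usage with illustrative (hypothetical) shapes.
rng = np.random.default_rng(0)
video = rng.standard_normal((13, 16, 60, 90))
landmarks = rng.standard_normal((13, 16, 60, 90))
tokens = fuse_and_patchify(video, landmarks)
print(tokens.shape)  # (17550, 128)
```

Concatenating along channels keeps the condition spatially and temporally aligned with the latent it guides, so every token carries both the video content and the landmark signal for the same patch.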


---
Some generated results:


<video controls autoplay src="https://cdn-uploads.huggingface.co/production/uploads/62e34a12c9bece303d146af8/licoAeSaF-K8x7DO7SGUG.mp4"></video>

<video controls autoplay src="https://cdn-uploads.huggingface.co/production/uploads/62e34a12c9bece303d146af8/5q0p2jyw183fcJoeq0dvF.mp4"></video>


<video controls autoplay src="https://cdn-uploads.huggingface.co/production/uploads/62e34a12c9bece303d146af8/1aZweOIszlriQLRwSqnGq.mp4"></video>


<video controls autoplay src="https://cdn-uploads.huggingface.co/production/uploads/62e34a12c9bece303d146af8/5bfjDxGZJf-5WnGpFHppw.mp4"></video> 


## Citation

If you find SkyReels-A1 useful for your research, please cite our work using the following BibTeX:
```bibtex
@article{qiu2025skyreels,
  title={SkyReels-A1: Expressive Portrait Animation in Video Diffusion Transformers},
  author={Qiu, Di and Fei, Zhengcong and Wang, Rui and Bai, Jialin and Yu, Changqian and Fan, Mingyuan and Chen, Guibin and Wen, Xiang},
  journal={arXiv preprint arXiv:2502.10841},
  year={2025}
}
```