---
license: apache-2.0
language:
- en
library_name: diffusers
---

# Model Card for Arc2Face

<div align="center">

[**Project Page**](https://arc2face.github.io/) **|** [**Paper (ArXiv)**]() **|** [**Code**](https://github.com/foivospar/Arc2Face)

</div>

## Introduction

Arc2Face is an ID-conditioned face model that can generate diverse, ID-consistent photos of a person given only their ArcFace ID embedding. It is trained on a restored version of the WebFace42M face recognition database and further fine-tuned on FFHQ and CelebA-HQ.

<div align="center">

<img src='assets/samples_short.jpg'>

</div>

## Model Details

Arc2Face consists of two components, both fine-tuned from [runwayml/stable-diffusion-v1-5](https://huggingface.co/runwayml/stable-diffusion-v1-5):

- encoder: a fine-tuned CLIP ViT-L/14 text encoder, tailored to project ID embeddings into the CLIP latent space.
- arc2face: a fine-tuned UNet that adapts the pre-trained backbone to ID-to-face generation, conditioned solely on ID vectors.

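
This card does not spell out how an ID vector enters the CLIP encoder. The pure-Python sketch below illustrates one plausible mechanism, under stated assumptions: an ArcFace embedding (commonly 512-d) is L2-normalized, zero-padded to the 768-d hidden width of the CLIP ViT-L/14 text encoder, and substituted for the input embedding of a placeholder token in the prompt. The placeholder-token scheme, the normalization, the padding and all function names are illustrative assumptions, not a description of the released code.

```python
import math
from typing import List, Sequence

# Assumption: 768 is the hidden width of the CLIP ViT-L/14 text encoder used by
# Stable Diffusion v1.5; ArcFace embeddings are commonly 512-d.
CLIP_HIDDEN = 768


def l2_normalize(vec: Sequence[float]) -> List[float]:
    """Scale a vector to unit Euclidean length."""
    norm = math.sqrt(sum(x * x for x in vec))
    if norm == 0.0:
        raise ValueError("cannot normalize a zero vector")
    return [x / norm for x in vec]


def pad_to_width(vec: Sequence[float], width: int = CLIP_HIDDEN) -> List[float]:
    """Zero-pad a vector on the right so it matches the encoder's hidden width."""
    if len(vec) > width:
        raise ValueError(f"vector of length {len(vec)} exceeds width {width}")
    return list(vec) + [0.0] * (width - len(vec))


def inject_id_embedding(
    token_embs: Sequence[Sequence[float]],
    token_ids: Sequence[int],
    placeholder_id: int,
    id_emb: Sequence[float],
) -> List[List[float]]:
    """Hypothetical helper: replace the input embedding at every position whose
    token id equals `placeholder_id` with the normalized, zero-padded ID vector.
    All other token embeddings are returned unchanged (as copies)."""
    if len(token_embs) != len(token_ids):
        raise ValueError("token_embs and token_ids must have the same length")
    if not token_embs:
        return []
    width = len(token_embs[0])
    id_vec = pad_to_width(l2_normalize(id_emb), width)
    return [
        list(id_vec) if tid == placeholder_id else list(emb)
        for tid, emb in zip(token_ids, token_embs)
    ]
```

For example, with a toy 2-d ID vector `[3.0, 4.0]` and 4-d token embeddings, the placeholder position becomes `[0.6, 0.8, 0.0, 0.0]` while every other position is left as it was. Zero-padding keeps the ID values intact while matching the encoder width; in a full pipeline the resulting sequence would presumably pass through the fine-tuned encoder before conditioning the UNet.
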
## Usage

The models can be downloaded directly from this repository or with Python:

```python
from huggingface_hub import hf_hub_download

# Each `filename` keeps its subfolder (arc2face/ or encoder/) under `local_dir`,
# so downloading into "./models" yields ./models/arc2face/... and ./models/encoder/...
for filename in [
    "arc2face/config.json",
    "arc2face/diffusion_pytorch_model.safetensors",
    "encoder/config.json",
    "encoder/pytorch_model.bin",
]:
    hf_hub_download(repo_id="FoivosPar/Arc2Face", filename=filename, local_dir="./models")
```

Please check our [GitHub repository](https://github.com/foivospar/Arc2Face) for complete inference instructions.

## Limitations and Bias

- Only one person per image can be generated.
- Poses are constrained to the frontal hemisphere, similar to FFHQ images.
- The model may reflect the biases of the training data or the ID encoder.

## Citation

**BibTeX:**

```bibtex
```