---
|
license: mit |
|
language: |
|
- en |
|
library_name: diffusers |
|
--- |
|
|
|
# Arc2Face Model Card |
|
|
|
<div align="center"> |
|
|
|
[**Project Page**](https://arc2face.github.io/) **|** [**Paper (ArXiv)**](https://arxiv.org/abs/2403.11641) **|** [**Code**](https://github.com/foivospar/Arc2Face) **|** [🤗 **Gradio demo**](https://huggingface.co./spaces/FoivosPar/Arc2Face) |
|
|
|
|
|
|
|
</div> |
|
|
|
## Introduction |
|
|
|
Arc2Face is an ID-conditioned face model that can generate diverse, ID-consistent photos of a person given only their ArcFace ID-embedding.

It is trained on a restored version of the WebFace42M face-recognition database and further fine-tuned on FFHQ and CelebA-HQ.
|
|
|
<div align="center"> |
|
<img src='assets/samples_short.jpg'> |
|
</div> |
|
|
|
## Model Details |
|
|
|
Arc2Face consists of two components:

- `encoder`, a fine-tuned CLIP ViT-L/14 model

- `arc2face`, a fine-tuned UNet model

Both are derived from [runwayml/stable-diffusion-v1-5](https://huggingface.co/runwayml/stable-diffusion-v1-5).
|
The encoder is tailored to projecting ID-embeddings into the CLIP latent space.
|
Arc2Face adapts the pre-trained backbone to the task of ID-to-face generation, conditioned solely on ID vectors. |
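The conditioning can be pictured as follows. This is an illustrative sketch only, not the released encoder: the projection weights are random placeholders, and the real encoder is a fine-tuned CLIP text model in which the ID vector roughly stands in for a token that the encoder then contextualizes.

```python
import numpy as np

# Illustrative sketch (not the actual Arc2Face code): an ArcFace ID-embedding
# is a 512-d vector, conventionally unit-normalized, and the encoder maps it
# into CLIP's 768-d token space (ViT-L/14, context length 77).
rng = np.random.default_rng(0)
id_embedding = rng.standard_normal(512).astype(np.float32)   # stand-in for a real ArcFace output
id_embedding /= np.linalg.norm(id_embedding)                 # ArcFace embeddings are compared by cosine

W = (rng.standard_normal((512, 768)) * 0.02).astype(np.float32)  # placeholder projection weights
id_token = id_embedding @ W                                  # one 768-d pseudo-token

seq = np.zeros((77, 768), dtype=np.float32)                  # CLIP-sized token sequence
seq[1] = id_token                                            # slot the ID token into the sequence
```

The key point is that a single 512-d identity vector, rather than a text prompt, supplies the cross-attention conditioning for the UNet.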
|
|
|
## Usage |
|
|
|
The models can be downloaded directly from this repository or programmatically with `huggingface_hub`:
|
```python |
|
from huggingface_hub import hf_hub_download |
|
|
|
hf_hub_download(repo_id="FoivosPar/Arc2Face", filename="arc2face/config.json", local_dir="./models") |
|
hf_hub_download(repo_id="FoivosPar/Arc2Face", filename="arc2face/diffusion_pytorch_model.safetensors", local_dir="./models") |
|
hf_hub_download(repo_id="FoivosPar/Arc2Face", filename="encoder/config.json", local_dir="./models") |
|
hf_hub_download(repo_id="FoivosPar/Arc2Face", filename="encoder/pytorch_model.bin", local_dir="./models") |
|
``` |
|
|
|
Please check our [GitHub repository](https://github.com/foivospar/Arc2Face) for complete inference instructions. |
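As a minimal sketch, the fine-tuned UNet downloaded above can be loaded with the standard `diffusers` class; the encoder requires the project's custom CLIP wrapper (see the GitHub repository), so it is not loaded here. The helper name is ours, and the code assumes the files already exist under `./models`:

```python
# Sketch only: load the fine-tuned Arc2Face UNet with standard diffusers classes.
# Assumes the files were downloaded to ./models/arc2face as shown above.
def load_arc2face_unet(models_dir="./models"):
    from diffusers import UNet2DConditionModel  # deferred import; weights are loaded from disk
    return UNet2DConditionModel.from_pretrained(models_dir, subfolder="arc2face")
```

For the full pipeline (ID-embedding extraction, encoder, and sampling), follow the GitHub instructions linked above.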
|
|
|
## Limitations and Bias |
|
|
|
- Only one person per image can be generated. |
|
- Poses are constrained to the frontal hemisphere, similar to FFHQ images. |
|
- The model may reflect the biases of the training data or the ID encoder. |
|
|
|
## Citation |
|
|
|
|
|
**BibTeX:** |
|
|
|
```bibtex |
|
@misc{paraperas2024arc2face, |
|
title={Arc2Face: A Foundation Model of Human Faces}, |
|
author={Foivos Paraperas Papantoniou and Alexandros Lattas and Stylianos Moschoglou and Jiankang Deng and Bernhard Kainz and Stefanos Zafeiriou}, |
|
year={2024}, |
|
eprint={2403.11641}, |
|
archivePrefix={arXiv}, |
|
primaryClass={cs.CV} |
|
} |
|
``` |