---
tags:
- text-to-image
- controlnet
---
# M3Face Model Card
We introduce M3Face, a unified multi-modal multilingual framework for controllable face generation and editing. With this framework, users can provide text input alone to automatically generate controlling modalities, such as semantic segmentation masks or facial landmarks, and subsequently generate face images.
## Getting Started
### Installation
1. Clone our repository:
```bash
git clone https://huggingface.co/m3face/m3face
cd m3face
```
2. Install dependencies:
```bash
pip install -r requirements.txt
```
### Resources
- Face generation requires at least 10 GB of VRAM for 512x512 images.
- Face editing requires at least 14 GB of VRAM for 512x512 images.
### Pre-trained Models
You can find the checkpoints for the ControlNet model at [`m3face/FaceControlNet`](https://huggingface.co/m3face/FaceControlNet) and the mask/landmark generator model at [`m3face/FaceConditioning`](https://huggingface.co/m3face/FaceConditioning).
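If you want to use the ControlNet checkpoint outside our scripts, a minimal sketch is shown below. It assumes the checkpoint is stored in diffusers format and that the base model is a Stable Diffusion 1.x variant; neither is confirmed by this card, so adjust to match the actual checkpoint layout.
```python
# Minimal sketch: load FaceControlNet with diffusers.
# Assumptions: diffusers-format checkpoint; SD 1.5 base model (not confirmed here).
import torch
from diffusers import ControlNetModel, StableDiffusionControlNetPipeline

controlnet = ControlNetModel.from_pretrained(
    "m3face/FaceControlNet", torch_dtype=torch.float16
)
pipe = StableDiffusionControlNetPipeline.from_pretrained(
    "runwayml/stable-diffusion-v1-5",  # assumed base model
    controlnet=controlnet,
    torch_dtype=torch.float16,
).to("cuda")
```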
### M3CelebA Dataset
The M3CelebA Dataset is available at [`m3face/M3CelebA`](https://huggingface.co/m3face/M3CelebA). You can view or download it from there.
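If the dataset is compatible with the Hugging Face `datasets` library, you may be able to load it directly; this is a sketch under that assumption, and the split name is also an assumption.
```python
# Sketch: load M3CelebA from the Hub (assumes `datasets` compatibility).
from datasets import load_dataset

ds = load_dataset("m3face/M3CelebA", split="train")  # split name is an assumption
print(ds[0])
```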
## Face Generation
You can generate faces from text, a segmentation mask, facial landmarks, or a combination of these by running the following command:
```bash
python generate.py --seed 1111 \
--condition "landmark" \
--prompt "This attractive woman has narrow eyes, rosy cheeks, and wears heavy makeup." \
--save_condition
```
You can set the type of conditioning modality with `--condition`. By default, our framework generates the conditioning modality automatically and saves it if the `--save_condition` flag is given. Alternatively, you can supply your own condition image with the `--condition_path` argument, as shown below.
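For example, a minimal sketch of conditioning on an existing segmentation mask instead of generating one; the `"mask"` value for `--condition` is our assumption, since this card only demonstrates `"landmark"`.
```bash
# Sketch: condition on a user-supplied segmentation mask.
# The "mask" value is an assumption; only "landmark" is shown in this card.
python generate.py --seed 1111 \
--condition "mask" \
--prompt "This attractive woman has narrow eyes, rosy cheeks, and wears heavy makeup." \
--condition_path "/path/to/mask"
```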
## Face Editing
For face editing, you can run the following command:
```bash
python edit.py --enable_xformers_memory_efficient_attention \
--seed 1111 \
--condition "landmark" \
--prompt "She is a smiling." \
--image_path "/path/to/image" \
--condition_path "/path/to/condition" \
--edit_condition \
--embedding_optimize_it 500 \
--model_finetune_it 1000 \
--alpha 0.7 1 1.1 \
--num_inference_steps 30 \
--unet_layer "2and3"
```
You need to specify the input image and the original conditioning modality. You can edit the face either with your own edit conditioning modality (via `--edit_condition_path`) or by letting our framework edit the original conditioning modality (via `--edit_condition`); a sketch of the first option follows.
The `--unet_layer` argument specifies which layers of the Stable Diffusion UNet to fine-tune.
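As a minimal sketch of the `--edit_condition_path` route: the paths are placeholders, and we assume the hyperparameters omitted here fall back to sensible defaults.
```bash
# Sketch: edit with a hand-edited target condition instead of --edit_condition.
# Paths are placeholders; omitted hyperparameters are assumed to have defaults.
python edit.py --seed 1111 \
--condition "landmark" \
--prompt "She is smiling." \
--image_path "/path/to/image" \
--condition_path "/path/to/original_condition" \
--edit_condition_path "/path/to/edited_condition"
```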
> Note: If you don't have the original conditioning modality, you can generate it using the `plot_mask.py` and `plot_landmark.py` scripts:
```bash
pip install git+https://github.com/mapillary/inplace_abn
python utils/plot_mask.py --image_path "/path/to/image"
python utils/plot_landmark.py --image_path "/path/to/image"
```
## Training
The code and instructions for training our models will be posted soon!