# Distributed ArcFace Training in PyTorch

This is a deep learning library that makes face recognition efficient and effective, and can train tens of millions of identities on a single server.

## Requirements

- Install [PyTorch](http://pytorch.org) (torch>=1.6.0); see our [install.md](docs/install.md).
- `pip install -r requirements.txt`
- Download the dataset from [https://github.com/deepinsight/insightface/tree/master/recognition/_datasets_](https://github.com/deepinsight/insightface/tree/master/recognition/_datasets_).

## How to Train

To train a model, run `train.py` with the path to a config:

### 1. Single node, 8 GPUs:

```shell
python -m torch.distributed.launch --nproc_per_node=8 --nnodes=1 --node_rank=0 --master_addr="127.0.0.1" --master_port=1234 train.py configs/ms1mv3_r50
```
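
The launcher above spawns one worker process per GPU and hands each worker its rank and world size (via environment variables, or a `--local_rank` argument on older PyTorch versions). As a rough sketch of the initialization every worker performs — not the repository's actual code — the following runs standalone as a one-process "world" using the CPU-friendly `gloo` backend (`train.py` itself uses NCCL on GPU):

```python
import os
import torch.distributed as dist

# torch.distributed.launch provides these for every worker; the defaults
# here let the sketch run standalone as a single process.
os.environ.setdefault("MASTER_ADDR", "127.0.0.1")
os.environ.setdefault("MASTER_PORT", "12355")
os.environ.setdefault("RANK", "0")
os.environ.setdefault("WORLD_SIZE", "1")

# gloo so the sketch works without a GPU; real training uses nccl
dist.init_process_group(backend="gloo")
rank, world_size = dist.get_rank(), dist.get_world_size()
print(rank, world_size)  # 0 1
dist.destroy_process_group()
```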

### 2. Multiple nodes, each node 8 GPUs:

Node 0:

```shell
python -m torch.distributed.launch --nproc_per_node=8 --nnodes=2 --node_rank=0 --master_addr="ip1" --master_port=1234 train.py configs/ms1mv3_r50
```

Node 1:

```shell
python -m torch.distributed.launch --nproc_per_node=8 --nnodes=2 --node_rank=1 --master_addr="ip1" --master_port=1234 train.py configs/ms1mv3_r50
```

### 3. Training resnet2060 with 8 GPUs:

```shell
python -m torch.distributed.launch --nproc_per_node=8 --nnodes=1 --node_rank=0 --master_addr="127.0.0.1" --master_port=1234 train.py configs/ms1mv3_r2060.py
```

## Model Zoo

- The models are available for non-commercial research purposes only.
- All models can be found here:
  - [Baidu Yun Pan](https://pan.baidu.com/s/1CL-l4zWqsI1oDuEEYVhj-g) (code: e8pw)
  - [OneDrive](https://1drv.ms/u/s!AswpsDO2toNKq0lWY69vN58GR6mw?e=p9Ov5d)

### Performance on [**ICCV2021-MFR**](http://iccv21-mfr.com/)

The ICCV2021-MFR test set consists of non-celebrities, so it has very little overlap with publicly available face recognition training sets such as MS1M and CASIA, which are mostly collected from online celebrities. As a result, we can evaluate the performance of different algorithms fairly.

For the **ICCV2021-MFR-ALL** set, TAR is measured under the all-to-all 1:1 protocol with FAR less than 0.000001 (1e-6). This globalised multi-racial test set contains 242,143 identities and 1,624,305 images.

For the **ICCV2021-MFR-MASK** set, TAR is measured under the mask-to-nonmask 1:1 protocol with FAR less than 0.0001 (1e-4). The mask test set contains 6,964 identities, 6,964 masked images and 13,928 non-masked images, for a total of 13,928 positive pairs and 96,983,824 negative pairs.

| Datasets  | Backbone      | Training throughput (samples/s) | Size / MB | **ICCV2021-MFR-MASK** | **ICCV2021-MFR-ALL** |
| :---:     | :---          | :---  | :--- | :---      | :---      |
| MS1MV3    | r18           | -     | 91   | **47.85** | **68.33** |
| Glint360k | r18           | 8536  | 91   | **53.32** | **72.07** |
| MS1MV3    | r34           | -     | 130  | **58.72** | **77.36** |
| Glint360k | r34           | 6344  | 130  | **65.10** | **83.02** |
| MS1MV3    | r50           | 5500  | 166  | **63.85** | **80.53** |
| Glint360k | r50           | 5136  | 166  | **70.23** | **87.08** |
| MS1MV3    | r100          | -     | 248  | **69.09** | **84.31** |
| Glint360k | r100          | 3332  | 248  | **75.57** | **90.66** |
| MS1MV3    | mobilefacenet | 12185 | 7.8  | **41.52** | **65.26** |
| Glint360k | mobilefacenet | 11197 | 7.8  | **44.52** | **66.48** |

### Performance on IJB-C and Verification Datasets

| Datasets | Backbone | IJB-C (1e-05) | IJB-C (1e-04) | AgeDB-30 | CFP-FP | LFW | log |
| :---: | :--- | :--- | :--- | :--- |:--- |:--- |:--- |
| MS1MV3 | r18 | 92.07 | 94.66 | 97.77 | 97.73 | 99.77 |[log](https://raw.githubusercontent.com/anxiangsir/insightface_arcface_log/master/ms1mv3_arcface_r18_fp16/training.log)|
| MS1MV3 | r34 | 94.10 | 95.90 | 98.10 | 98.67 | 99.80 |[log](https://raw.githubusercontent.com/anxiangsir/insightface_arcface_log/master/ms1mv3_arcface_r34_fp16/training.log)|
| MS1MV3 | r50 | 94.79 | 96.46 | 98.35 | 98.96 | 99.83 |[log](https://raw.githubusercontent.com/anxiangsir/insightface_arcface_log/master/ms1mv3_arcface_r50_fp16/training.log)|
| MS1MV3 | r100 | 95.31 | 96.81 | 98.48 | 99.06 | 99.85 |[log](https://raw.githubusercontent.com/anxiangsir/insightface_arcface_log/master/ms1mv3_arcface_r100_fp16/training.log)|
| MS1MV3 | **r2060**| 95.34 | 97.11 | 98.67 | 99.24 | 99.87 |[log](https://raw.githubusercontent.com/anxiangsir/insightface_arcface_log/master/ms1mv3_arcface_r2060_fp16/training.log)|
| Glint360k |r18-0.1 | 93.16 | 95.33 | 97.72 | 97.73 | 99.77 |[log](https://raw.githubusercontent.com/anxiangsir/insightface_arcface_log/master/glint360k_cosface_r18_fp16_0.1/training.log)|
| Glint360k |r34-0.1 | 95.16 | 96.56 | 98.33 | 98.78 | 99.82 |[log](https://raw.githubusercontent.com/anxiangsir/insightface_arcface_log/master/glint360k_cosface_r34_fp16_0.1/training.log)|
| Glint360k |r50-0.1 | 95.61 | 96.97 | 98.38 | 99.20 | 99.83 |[log](https://raw.githubusercontent.com/anxiangsir/insightface_arcface_log/master/glint360k_cosface_r50_fp16_0.1/training.log)|
| Glint360k |r100-0.1 | 95.88 | 97.32 | 98.48 | 99.29 | 99.82 |[log](https://raw.githubusercontent.com/anxiangsir/insightface_arcface_log/master/glint360k_cosface_r100_fp16_0.1/training.log)|

[comment]: <> (More details see [model.md](docs/modelzoo.md) in docs.)

## [Speed Benchmark](docs/speed_benchmark.md)

**Arcface Torch** can train large-scale face recognition training sets efficiently and quickly. When the number of classes in the training set is greater than 300K and training is sufficient, the partial FC sampling strategy achieves the same accuracy with several times faster training and a smaller GPU memory footprint.

Partial FC is a sparse variant of the model-parallel architecture for large-scale face recognition. Partial FC uses a sparse softmax, where each batch dynamically samples a subset of class centers for training. In each iteration, only a sparse part of the parameters is updated, which greatly reduces GPU memory usage and computation. With Partial FC, we can scale to a training set of 29 million identities, the largest to date. Partial FC also supports multi-machine distributed training and mixed precision training.



More details: see [speed_benchmark.md](docs/speed_benchmark.md) in docs.
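
The sampling step at the heart of Partial FC can be sketched in a few lines. This is an illustrative toy, not the library's actual API: the function name, the 1000-class/512-d setup and the 10% sample rate are assumptions for the example.

```python
import torch

def sample_class_centers(weight, labels, sample_rate):
    """Keep the centers of classes present in the batch (positives) plus a
    random subset of the remaining centers (negatives), as in Partial FC."""
    num_classes = weight.size(0)
    num_sample = max(int(num_classes * sample_rate), labels.unique().numel())
    # random score per class; positives get a score above the random range,
    # so top-k always keeps them and fills the rest with random negatives
    scores = torch.rand(num_classes)
    scores[labels.unique()] = 2.0
    index = torch.topk(scores, k=num_sample)[1].sort()[0]
    # remap each label to its position inside the sampled subset
    sub_labels = torch.searchsorted(index, labels)
    return weight[index], sub_labels

# toy setup: 1000 class centers of dimension 512, batch of 8 labels
weight = torch.randn(1000, 512)
labels = torch.randint(0, 1000, (8,))
sub_weight, sub_labels = sample_class_centers(weight, labels, sample_rate=0.1)
print(sub_weight.shape)  # torch.Size([100, 512])
```

The softmax and its gradient are then computed only over the 100 sampled centers instead of all 1000, which is where the memory and compute savings in the tables below come from.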

### 1. Training speed of different parallel methods (samples / second), Tesla V100 32GB * 8. (Larger is better)

`-` means training failed because of GPU memory limitations.

| Number of Identities in Dataset | Data Parallel | Model Parallel | Partial FC 0.1 |
| :--- | :--- | :--- | :--- |
| 125000   | 4681     | 4824     | 5004     |
| 1400000  | **1672** | 3043     | 4738     |
| 5500000  | **-**    | **1389** | 3975     |
| 8000000  | **-**    | **-**    | 3565     |
| 16000000 | **-**    | **-**    | 2679     |
| 29000000 | **-**    | **-**    | **1855** |
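
As a quick sanity check on the numbers above: at 1.4 million identities, Partial FC 0.1 is roughly 2.8x faster than plain data parallel and about 1.6x faster than model parallel.

```python
# throughput at 1.4M identities, taken from the table above (samples/second)
data_parallel = 1672
model_parallel = 3043
partial_fc_01 = 4738

print(round(partial_fc_01 / data_parallel, 1))   # 2.8
print(round(partial_fc_01 / model_parallel, 2))  # 1.56
```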

### 2. GPU memory cost of different parallel methods (MB per GPU), Tesla V100 32GB * 8. (Smaller is better)

| Number of Identities in Dataset | Data Parallel | Model Parallel | Partial FC 0.1 |
| :--- | :--- | :--- | :--- |
| 125000   | 7358  | 5306  | 4868  |
| 1400000  | 32252 | 11178 | 6056  |
| 5500000  | **-** | 32188 | 9854  |
| 8000000  | **-** | **-** | 12310 |
| 16000000 | **-** | **-** | 19950 |
| 29000000 | **-** | **-** | 32324 |
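
A back-of-the-envelope calculation shows why the full softmax runs out of memory at this scale. Assuming 512-dimensional embeddings and fp32 weights (our assumptions for illustration), the final FC weight matrix alone for 29 million identities is about 55 GB:

```python
num_identities = 29_000_000
embedding_dim = 512        # assumption: a common face-embedding size
bytes_per_param = 4        # fp32

fc_weight_gb = num_identities * embedding_dim * bytes_per_param / 1024**3
print(round(fc_weight_gb, 1))      # GB for the full FC weight matrix
print(round(fc_weight_gb / 8, 1))  # GB per GPU when split across 8 GPUs
```

Gradients and optimizer state multiply this further, which is why only the sampled 10% of centers per iteration makes the 29M-identity run fit in 32 GB per GPU.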

## Evaluation on ICCV2021-MFR and IJB-C

More details: see [eval.md](docs/eval.md) in docs.

## Test

We have tested many versions of PyTorch. Please create an issue if you run into trouble.

- [x] torch 1.6.0
- [x] torch 1.7.1
- [x] torch 1.8.0
- [x] torch 1.9.0

## Citation

```
@inproceedings{deng2019arcface,
  title={ArcFace: Additive angular margin loss for deep face recognition},
  author={Deng, Jiankang and Guo, Jia and Xue, Niannan and Zafeiriou, Stefanos},
  booktitle={Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition},
  pages={4690--4699},
  year={2019}
}
@inproceedings{an2020partical_fc,
  title={Partial FC: Training 10 Million Identities on a Single Machine},
  author={An, Xiang and Zhu, Xuhan and Xiao, Yang and Wu, Lan and Zhang, Ming and Gao, Yuan and Qin, Bin and Zhang, Debing and Fu, Ying},
  booktitle={Arxiv 2010.05222},
  year={2020}
}
```