---
license: mit
datasets:
- imdb
language:
- en
metrics:
- accuracy
pipeline_tag: object-detection
tags:
- biology
- medical
---

# Intro
Age and gender recognition in real-world scenes is challenging: environmental conditions vary, poses are complex, image quality differs, and faces may be partially or fully occluded. MiVOLO is a straightforward approach to age and gender estimation built on a recent vision transformer. It combines the two tasks in a single dual-input/output model that uses full-body image data in addition to facial information. This improves generalization and lets the model give satisfactory results even when the face is not visible.

The model was evaluated on four popular benchmark datasets, where it achieved state-of-the-art performance while running in real time. A new benchmark was also introduced, built from images in the Open Images dataset; its ground-truth annotations were produced by human annotators, with high accuracy ensured by intelligent aggregation of their votes. Compared against human-level accuracy, the model's age recognition was clearly better than humans across most age ranges. The model is publicly available together with code for validation and inference, as are additional annotations for the datasets used and the new benchmark.

## Demo
<https://www.modelscope.cn/studios/Genius-Society/MiVOLO>

## Usage
```python
from modelscope import snapshot_download

# Download the model files from ModelScope; returns the local directory path
model_dir = snapshot_download("Genius-Society/MiVOLO")
```
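
To check what was downloaded, the directory returned by `snapshot_download` can be walked with the standard library. The sketch below is only an illustration: `list_model_files` is a hypothetical helper, not part of ModelScope or MiVOLO, and it makes no assumption about which files the repository actually contains.

```python
from pathlib import Path


def list_model_files(model_dir):
    """Return the sorted relative paths of all files under model_dir.

    Hypothetical helper for illustration; not part of ModelScope or MiVOLO.
    """
    root = Path(model_dir)
    return sorted(
        p.relative_to(root).as_posix()  # forward slashes on every platform
        for p in root.rglob("*")
        if p.is_file()
    )
```

Calling `list_model_files(model_dir)` with the `model_dir` from the snippet above returns the downloaded file names, which is a quick way to locate the checkpoint before loading it with the inference code from the MiVOLO paper's authors.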

## Mirror
<https://www.modelscope.cn/models/Genius-Society/MiVOLO>

## Reference
[1] <a href="https://arxiv.org/pdf/2307.04616">MiVOLO: Multi-input Transformer for Age and Gender Estimation</a>