---
license: mit
datasets:
  - ai4privacy/open-pii-masking-500k-ai4privacy
language:
  - en
tags:
  - pii
  - redaction
  - anonymisation
  - english
model-index:
  - name: english-anonymiser-openpii-ai4privacy
    results:
    - task:
        type: token-classification
        name: PII Masking
      dataset:
        type: ai4privacy/open-pii-masking-500k-ai4privacy
        name: Open PII Masking 500K
        split: english-validation
      metrics:
      - type: f1
        value: 0.9882
        name: F1 Score
      - type: precision
        value: 0.9882
        name: Precision
      - type: recall
        value: 0.9883
        name: Recall
      - type: accuracy
        value: 0.9917
        name: Accuracy

metrics:
- f1
- precision
- recall
library_name: transformers
pipeline_tag: token-classification
---


# English Anonymiser OpenPII (Ai4Privacy)

This model is designed to **redact Personally Identifiable Information (PII)** from English text. It has been fine-tuned exclusively on the English subset of the [open-pii-masking-500k-ai4privacy](https://huggingface.co/datasets/ai4privacy/open-pii-masking-500k-ai4privacy) dataset.
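
As a rough illustration of how redaction could be applied, the sketch below replaces detected spans with their label. It assumes the model is loaded through the `transformers` token-classification pipeline with `aggregation_strategy="simple"`, which returns dictionaries containing `entity_group`, `start`, and `end`. The `redact` helper and the `MODEL_ID` placeholder are hypothetical and are not part of this repository.

```python
# Hypothetical loading step (not executed here). MODEL_ID is a placeholder,
# not a confirmed repository name:
#
#   from transformers import pipeline
#   tagger = pipeline("token-classification", model=MODEL_ID,
#                     aggregation_strategy="simple")
#   entities = tagger(text)


def redact(text, entities):
    """Replace each detected span with its label, e.g. "[EMAIL]".

    `entities` is assumed to be a list of dicts with "entity_group",
    "start", and "end" keys, as produced by an aggregated
    token-classification pipeline.
    """
    redacted = text
    # Work from the end of the string so earlier offsets stay valid.
    for ent in sorted(entities, key=lambda e: e["start"], reverse=True):
        placeholder = f"[{ent['entity_group']}]"
        redacted = redacted[: ent["start"]] + placeholder + redacted[ent["end"]:]
    return redacted


if __name__ == "__main__":
    sample = "Contact John at john@example.com"
    # Hand-written spans standing in for real model output.
    spans = [
        {"entity_group": "GIVENNAME", "start": 8, "end": 12},
        {"entity_group": "EMAIL", "start": 16, "end": 32},
    ]
    print(redact(sample, spans))  # Contact [GIVENNAME] at [EMAIL]
```

Replacing spans from right to left avoids recomputing character offsets after each substitution.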

---

## Evaluation Metrics

The table below reports evaluation results for each PII label:

| **Label**          | **TP** | **FP** | **FN** | **Accuracy** | **Precision** | **Recall** | **F1 Score**  |
|--------------------|:------:|:------:|:------:|:------------:|:-------------:|:----------:|:-------------:|
| SURNAME            | 3724   | 0      | 26     | 99.31%       | 100.0%        | 99.31%     | 99.65%        |
| O (Non-PII)        | 0      | 368    | 0      | 99.36%       | n/a           | n/a        | n/a           |
| TIME               | 1934   | 0      | 2      | 99.90%       | 100.0%        | 99.90%     | 99.95%        |
| DRIVERLICENSENUM   | 505    | 0      | 2      | 99.61%       | 100.0%        | 99.61%     | 99.80%        |
| PASSPORTNUM        | 566    | 0      | 0      | 100.0%       | 100.0%        | 100.0%     | 100.0%        |
| GIVENNAME          | 7557   | 0      | 163    | 97.89%       | 100.0%        | 97.89%     | 98.93%        |
| TELEPHONENUM       | 3637   | 0      | 4      | 99.89%       | 100.0%        | 99.89%     | 99.95%        |
| BUILDINGNUM        | 418    | 0      | 8      | 98.12%       | 100.0%        | 98.12%     | 99.05%        |
| AGE                | 164    | 0      | 5      | 97.04%       | 100.0%        | 97.04%     | 98.50%        |
| DATE               | 2335   | 0      | 0      | 100.0%       | 100.0%        | 100.0%     | 100.0%        |
| CITY               | 1717   | 0      | 85     | 95.28%       | 100.0%        | 95.28%     | 97.58%        |
| TITLE              | 363    | 0      | 21     | 94.53%       | 100.0%        | 94.53%     | 97.19%        |
| IDCARDNUM          | 2008   | 0      | 12     | 99.41%       | 100.0%        | 99.41%     | 99.70%        |
| GENDER             | 120    | 0      | 1      | 99.17%       | 100.0%        | 99.17%     | 99.59%        |
| CREDITCARDNUMBER   | 555    | 0      | 3      | 99.46%       | 100.0%        | 99.46%     | 99.73%        |
| SEX                | 77     | 0      | 2      | 97.47%       | 100.0%        | 97.47%     | 98.72%        |
| STREET             | 1379   | 0      | 8      | 99.42%       | 100.0%        | 99.42%     | 99.71%        |
| TAXNUM             | 343    | 0      | 14     | 96.08%       | 100.0%        | 96.08%     | 98.00%        |
| EMAIL              | 2607   | 0      | 1      | 99.96%       | 100.0%        | 99.96%     | 99.98%        |
| SOCIALNUM          | 421    | 0      | 1      | 99.76%       | 100.0%        | 99.76%     | 99.88%        |
| ZIPCODE            | 418    | 0      | 8      | 98.12%       | 100.0%        | 98.12%     | 99.05%        |

**Overall Evaluation:**
- **Accuracy:** 99.17%  
- **Precision:** 98.82%  
- **Recall:** 98.83%  
- **F1 Score:** 98.82%

- **Total True Positives (TP):** 30,848  
- **Total False Positives (FP):** 368  
- **Total False Negatives (FN):** 366  
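
The overall precision, recall, and F1 score can be reproduced from the three totals above using the standard definitions. The snippet below is a minimal sketch of that arithmetic; the function name `micro_metrics` is illustrative and not part of the model's code.

```python
def micro_metrics(tp, fp, fn):
    """Return (precision, recall, f1) from pooled counts."""
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    f1 = 2 * precision * recall / (precision + recall)
    return precision, recall, f1


if __name__ == "__main__":
    # Totals reported in this model card.
    precision, recall, f1 = micro_metrics(tp=30_848, fp=368, fn=366)
    print(f"{precision:.2%} {recall:.2%} {f1:.2%}")  # 98.82% 98.83% 98.82%
```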

**Macro-Averaged Metrics:**
- **Accuracy:** 98.56%  
- **Precision:** 95.24%  
- **Recall:** 93.83%  
- **F1 Score:** 94.52%
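
The macro-averaged precision and recall are consistent with averaging over all 21 rows of the table, with the `O (Non-PII)` row contributing 0 to both. That treatment of the `O` row is an inference from the arithmetic, not something the evaluation script is known to do. The sketch below reproduces the two figures from the per-label counts in the table; `safe_div` and `macro_precision_recall` are illustrative names.

```python
# (TP, FP, FN) per label, copied from the table above.
COUNTS = {
    "SURNAME": (3724, 0, 26),
    "O": (0, 368, 0),
    "TIME": (1934, 0, 2),
    "DRIVERLICENSENUM": (505, 0, 2),
    "PASSPORTNUM": (566, 0, 0),
    "GIVENNAME": (7557, 0, 163),
    "TELEPHONENUM": (3637, 0, 4),
    "BUILDINGNUM": (418, 0, 8),
    "AGE": (164, 0, 5),
    "DATE": (2335, 0, 0),
    "CITY": (1717, 0, 85),
    "TITLE": (363, 0, 21),
    "IDCARDNUM": (2008, 0, 12),
    "GENDER": (120, 0, 1),
    "CREDITCARDNUMBER": (555, 0, 3),
    "SEX": (77, 0, 2),
    "STREET": (1379, 0, 8),
    "TAXNUM": (343, 0, 14),
    "EMAIL": (2607, 0, 1),
    "SOCIALNUM": (421, 0, 1),
    "ZIPCODE": (418, 0, 8),
}


def safe_div(numerator, denominator):
    # Assumption: an undefined ratio (zero denominator) counts as 0.
    return numerator / denominator if denominator else 0.0


def macro_precision_recall(counts):
    """Average per-label precision and recall with equal weight per label."""
    precisions = [safe_div(tp, tp + fp) for tp, fp, fn in counts.values()]
    recalls = [safe_div(tp, tp + fn) for tp, fp, fn in counts.values()]
    n = len(counts)
    return sum(precisions) / n, sum(recalls) / n


if __name__ == "__main__":
    precision, recall = macro_precision_recall(COUNTS)
    print(f"{precision:.2%} {recall:.2%}")  # 95.24% 93.83%
```

Because every label carries equal weight here, the single zero-valued `O` row pulls the macro averages well below the overall figures.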

---

## Model Behavior & Limitations

- **Evaluation Focus:**  
  The metrics above reflect performance on the test split of the [open-pii-masking-500k-ai4privacy](https://huggingface.co/datasets/ai4privacy/open-pii-masking-500k-ai4privacy) dataset. Real-world performance may differ, so validate the model on your own data and add further safeguards where needed. For questions, contact support (at) ai4privacy.com.

---

## Disclaimer

This model card details the evaluation metrics for the English anonymiser. **Please note:**  
- The model is provided **as-is** under the MIT License.  
- It is intended solely for redaction and does not perform full PII classification.
- Users should test and evaluate its performance on their own data before deploying it in production environments.

---

*Ai4Privacy – Committed to protecting personal data in the age of AI.*