English Anonymiser OpenPII (Ai4Privacy)

This model is designed to redact Personally Identifiable Information (PII) from English text. It has been fine-tuned exclusively on the English subset of the open-pii-masking-500k-ai4privacy dataset. Note:

The model redacts detected PII tokens by replacing them with generic placeholders (e.g. [SURNAME], [TIME]) without preserving the original class details.
It detects only a limited set of PII label types (the 20 labels listed in the evaluation table below).
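To illustrate the placeholder output format, here is a minimal sketch of span-to-placeholder substitution. It does not call the model. The `redact` helper and the character-offset spans are hypothetical and written by hand, and the sketch assumes the spans do not overlap.

```python
from typing import List, Tuple

# Hypothetical detected span: (start, end, label), character offsets into the
# input text, end-exclusive. In this sketch the spans are hand-written and are
# NOT produced by the model.
Span = Tuple[int, int, str]


def redact(text: str, spans: List[Span]) -> str:
    """Replace each (non-overlapping) span with a generic [LABEL] placeholder."""
    out = []
    cursor = 0
    # Walk the spans left to right so the original offsets stay valid.
    for start, end, label in sorted(spans):
        out.append(text[cursor:start])  # untouched text before the span
        out.append(f"[{label}]")        # placeholder instead of the PII value
        cursor = end
    out.append(text[cursor:])           # untouched text after the last span
    return "".join(out)


text = "Call Smith at 10:30 tomorrow."
spans = [(5, 10, "SURNAME"), (14, 19, "TIME")]
print(redact(text, spans))  # Call [SURNAME] at [TIME] tomorrow.
```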

Evaluation Metrics

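The table below summarizes the evaluation results per label. For every PII label, the Accuracy column equals recall, TP / (TP + FN), and precision is 100% because no false positives are recorded for any PII label: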

Label	TP	FP	FN	Accuracy	Precision	Recall	F1 Score
SURNAME	3724	0	26	99.31%	100.0%	99.31%	99.65%
O (Non-PII)	0	368	0	99.36%	n/a	n/a	n/a
TIME	1934	0	2	99.90%	100.0%	99.90%	99.95%
DRIVERLICENSENUM	505	0	2	99.61%	100.0%	99.61%	99.80%
PASSPORTNUM	566	0	0	100.0%	100.0%	100.0%	100.0%
GIVENNAME	7557	0	163	97.89%	100.0%	97.89%	98.93%
TELEPHONENUM	3637	0	4	99.89%	100.0%	99.89%	99.95%
BUILDINGNUM	418	0	8	98.12%	100.0%	98.12%	99.05%
AGE	164	0	5	97.04%	100.0%	97.04%	98.50%
DATE	2335	0	0	100.0%	100.0%	100.0%	100.0%
CITY	1717	0	85	95.28%	100.0%	95.28%	97.58%
TITLE	363	0	21	94.53%	100.0%	94.53%	97.19%
IDCARDNUM	2008	0	12	99.41%	100.0%	99.41%	99.70%
GENDER	120	0	1	99.17%	100.0%	99.17%	99.59%
CREDITCARDNUMBER	555	0	3	99.46%	100.0%	99.46%	99.73%
SEX	77	0	2	97.47%	100.0%	97.47%	98.72%
STREET	1379	0	8	99.42%	100.0%	99.42%	99.71%
TAXNUM	343	0	14	96.08%	100.0%	96.08%	98.00%
EMAIL	2607	0	1	99.96%	100.0%	99.96%	99.98%
SOCIALNUM	421	0	1	99.76%	100.0%	99.76%	99.88%
ZIPCODE	418	0	8	98.12%	100.0%	98.12%	99.05%
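Per-label precision, recall, and F1 follow the standard definitions from the TP, FP, and FN counts. The sketch below uses a few rows copied from the table and reproduces their reported values; the helper name `label_metrics` is illustrative, and undefined ratios are returned as 0 by assumption.

```python
# (TP, FP, FN) copied from a few rows of the per-label table.
COUNTS = {
    "SURNAME": (3724, 0, 26),
    "GIVENNAME": (7557, 0, 163),
    "CITY": (1717, 0, 85),
    "PASSPORTNUM": (566, 0, 0),
}


def label_metrics(tp: int, fp: int, fn: int):
    """Return (precision, recall, F1); undefined ratios are returned as 0."""
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return precision, recall, f1


for label, (tp, fp, fn) in COUNTS.items():
    p, r, f1 = label_metrics(tp, fp, fn)
    print(f"{label:12s} P={p:.2%} R={r:.2%} F1={f1:.2%}")
```

With no false positives, precision is 1 and F1 reduces to 2R / (1 + R), which is why every PII row's F1 sits just above its recall.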

Overall Evaluation:

Accuracy: 99.17%
Precision: 98.82%
Recall: 98.83%
F1 Score: 98.82%
Total True Positives (TP): 30,848
Total False Positives (FP): 368
Total False Negatives (FN): 366
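The overall precision, recall, and F1 are consistent with micro-averaging, i.e. computing each metric once from the pooled totals. The short check below assumes that definition (the card does not state it). The overall accuracy of 99.17% cannot be derived from these three counts alone, so it is left out.

```python
# Pooled totals from the Overall Evaluation list.
TP, FP, FN = 30_848, 368, 366

precision = TP / (TP + FP)  # micro precision: pooled TP / (pooled TP + FP)
recall = TP / (TP + FN)     # micro recall: pooled TP / (pooled TP + FN)
f1 = 2 * precision * recall / (precision + recall)

print(f"P={precision:.2%} R={recall:.2%} F1={f1:.2%}")
# P=98.82% R=98.83% F1=98.82%
```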

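Macro-Averaged Metrics:

These are unweighted means over all 21 rows of the table, including O (Non-PII). The reported values match counting the n/a precision, recall, and F1 of O (Non-PII) as 0 (e.g. macro precision = 20 × 100% / 21 ≈ 95.24%), which is why they are lower than the per-label figures.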

Accuracy: 98.56%
Precision: 95.24%
Recall: 93.83%
F1 Score: 94.52%
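The macro-averaged precision, recall, and F1 can be reproduced from the per-label counts if all 21 rows are averaged with equal weight and the O (Non-PII) row contributes 0. This is an inference from the numbers, not a definition stated in the card; `safe_div` is an illustrative helper.

```python
# (TP, FP, FN) for every row of the per-label table, including O (Non-PII).
COUNTS = {
    "SURNAME": (3724, 0, 26),
    "O": (0, 368, 0),
    "TIME": (1934, 0, 2),
    "DRIVERLICENSENUM": (505, 0, 2),
    "PASSPORTNUM": (566, 0, 0),
    "GIVENNAME": (7557, 0, 163),
    "TELEPHONENUM": (3637, 0, 4),
    "BUILDINGNUM": (418, 0, 8),
    "AGE": (164, 0, 5),
    "DATE": (2335, 0, 0),
    "CITY": (1717, 0, 85),
    "TITLE": (363, 0, 21),
    "IDCARDNUM": (2008, 0, 12),
    "GENDER": (120, 0, 1),
    "CREDITCARDNUMBER": (555, 0, 3),
    "SEX": (77, 0, 2),
    "STREET": (1379, 0, 8),
    "TAXNUM": (343, 0, 14),
    "EMAIL": (2607, 0, 1),
    "SOCIALNUM": (421, 0, 1),
    "ZIPCODE": (418, 0, 8),
}


def safe_div(num: float, den: float) -> float:
    # A zero denominator yields 0 instead of raising.
    return num / den if den else 0.0


precisions, recalls, f1s = [], [], []
for tp, fp, fn in COUNTS.values():
    # O has TP = 0, so its precision, recall, and F1 all come out as 0 here
    # (the table shows n/a for them).
    p = safe_div(tp, tp + fp)
    r = safe_div(tp, tp + fn)
    precisions.append(p)
    recalls.append(r)
    f1s.append(safe_div(2 * p * r, p + r))

n = len(COUNTS)  # 21 rows, each weighted equally
macro_p = sum(precisions) / n
macro_r = sum(recalls) / n
macro_f1 = sum(f1s) / n
print(f"macro P={macro_p:.2%} R={macro_r:.2%} F1={macro_f1:.2%}")
# macro P=95.24% R=93.83% F1=94.52%
```

Dropping the O row (n = 20) would instead give a macro precision of 100%, so the choice of rows matters when comparing these figures with other reports.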

Model Behavior & Limitations

Redaction-Only:
The model replaces detected PII tokens with generic placeholders (e.g. [SURNAME], [TIME]) without retaining the specific class information.
Evaluation Focus:
The metrics shown above reflect performance on the test split of the open-pii-masking-500k-ai4privacy dataset. Real-world performance may differ and may require additional safeguards. For questions, contact support (at) ai4privacy.com.

Disclaimer

This model card details the evaluation metrics and fine-tuning parameters for the English anonymiser. Please note:

The model is provided as-is under the MIT License.
It is intended solely for redaction purposes and does not perform full PII classification or preserve the original PII class labels in its output.
Users should carefully evaluate its performance on their own data before deploying in production environments.
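When evaluating on your own data, one simple starting point is per-token counting against gold labels. The sketch below assumes you already have gold and predicted labels aligned token by token; deriving predicted labels from the model's redacted output is not covered. `count_errors` is a hypothetical helper using the usual confusion-matrix convention, and the card does not say its own counts were produced this way (its O row, for instance, reports 0 TP).

```python
from collections import Counter
from typing import List, Tuple


def count_errors(gold: List[str], pred: List[str]) -> Tuple[Counter, Counter, Counter]:
    """Per-label TP/FP/FN from token-aligned gold and predicted labels.

    A correct token is a TP for its label; a wrong token is an FP for the
    predicted label and an FN for the gold label. "O" is treated like any
    other label.
    """
    if len(gold) != len(pred):
        raise ValueError("gold and pred must have the same length")
    tp, fp, fn = Counter(), Counter(), Counter()
    for g, p in zip(gold, pred):
        if g == p:
            tp[g] += 1
        else:
            fp[p] += 1
            fn[g] += 1
    return tp, fp, fn


# Toy example: the surname is missed (predicted as O), the time is caught.
gold = ["O", "SURNAME", "O", "TIME"]
pred = ["O", "O", "O", "TIME"]
tp, fp, fn = count_errors(gold, pred)

for label in sorted(set(gold) - {"O"}):
    # Every label here occurs in gold, so tp + fn > 0.
    recall = tp[label] / (tp[label] + fn[label])
    print(label, "recall:", f"{recall:.0%}")
```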

Ai4Privacy – Committed to protecting personal data in the age of AI.

Model: ai4privacy/llama-ai4privacy-english-anonymiser-openpii