English Anonymiser OpenPII (Ai4Privacy)

This model is designed to redact Personally Identifiable Information (PII) from English text. It has been fine-tuned exclusively on the English subset of the open-pii-masking-500k-ai4privacy dataset. Note:

  • The model redacts detected PII tokens by replacing them with generic placeholders (e.g. [SURNAME], [TIME]) without preserving the original class details.
  • It supports only a limited set of PII labels: those listed in the evaluation table below.
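
As a rough illustration of the replacement step described above, the sketch below substitutes bracketed placeholders for detected character spans. It does not call the model: the `Span` type, the `redact` helper, and the example spans are hypothetical stand-ins for whatever a token-classification pipeline would return, and the spans are assumed not to overlap.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Span:
    """Hypothetical detected-PII span: character offsets plus a label."""
    start: int
    end: int
    label: str


def redact(text: str, spans: list[Span]) -> str:
    """Replace each (non-overlapping) span with a placeholder like [SURNAME]."""
    out = []
    cursor = 0
    for span in sorted(spans, key=lambda s: s.start):
        out.append(text[cursor:span.start])  # keep text before the span
        out.append(f"[{span.label}]")        # drop the PII, keep a placeholder
        cursor = span.end
    out.append(text[cursor:])                # keep the tail after the last span
    return "".join(out)


text = "Call Smith at 14:30."
spans = [Span(5, 10, "SURNAME"), Span(14, 19, "TIME")]  # illustrative offsets
redacted = redact(text, spans)
print(redacted)  # Call [SURNAME] at [TIME].
```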

Evaluation Metrics

The table below gives the evaluation results for each PII label:

Label             TP    FP   FN   Accuracy  Precision  Recall   F1 Score
SURNAME           3724  0    26   99.31%    100.0%     99.31%   99.65%
O (Non-PII)       0     368  0    99.36%    n/a        n/a      n/a
TIME              1934  0    2    99.90%    100.0%     99.90%   99.95%
DRIVERLICENSENUM  505   0    2    99.61%    100.0%     99.61%   99.80%
PASSPORTNUM       566   0    0    100.0%    100.0%     100.0%   100.0%
GIVENNAME         7557  0    163  97.89%    100.0%     97.89%   98.93%
TELEPHONENUM      3637  0    4    99.89%    100.0%     99.89%   99.95%
BUILDINGNUM       418   0    8    98.12%    100.0%     98.12%   99.05%
AGE               164   0    5    97.04%    100.0%     97.04%   98.50%
DATE              2335  0    0    100.0%    100.0%     100.0%   100.0%
CITY              1717  0    85   95.28%    100.0%     95.28%   97.58%
TITLE             363   0    21   94.53%    100.0%     94.53%   97.19%
IDCARDNUM         2008  0    12   99.41%    100.0%     99.41%   99.70%
GENDER            120   0    1    99.17%    100.0%     99.17%   99.59%
CREDITCARDNUMBER  555   0    3    99.46%    100.0%     99.46%   99.73%
SEX               77    0    2    97.47%    100.0%     97.47%   98.72%
STREET            1379  0    8    99.42%    100.0%     99.42%   99.71%
TAXNUM            343   0    14   96.08%    100.0%     96.08%   98.00%
EMAIL             2607  0    1    99.96%    100.0%     99.96%   99.98%
SOCIALNUM         421   0    1    99.76%    100.0%     99.76%   99.88%
ZIPCODE           418   0    8    98.12%    100.0%     98.12%   99.05%
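
To make the relationship between the count columns and the percentage columns concrete, here is a minimal sketch that recomputes precision, recall, and F1 for one row. It assumes the standard definitions; the card does not state how the columns were produced, and the `prf` helper (including its choice to return 0.0 for an undefined ratio) is illustrative.

```python
def prf(tp: int, fp: int, fn: int) -> tuple[float, float, float]:
    """Return (precision, recall, F1) as fractions; 0.0 where undefined."""
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    denom = precision + recall
    f1 = 2 * precision * recall / denom if denom else 0.0
    return precision, recall, f1


# SURNAME row: TP=3724, FP=0, FN=26
p, r, f1 = prf(3724, 0, 26)
print(f"{p:.2%} {r:.2%} {f1:.2%}")  # 100.00% 99.31% 99.65%
```

Because every PII label has zero false positives, precision is 100% throughout and the Accuracy column coincides with Recall for those rows.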

Overall Evaluation:

  • Accuracy: 99.17%
  • Precision: 98.82%
  • Recall: 98.83%
  • F1 Score: 98.82%
  • Total True Positives (TP): 30,848
  • Total False Positives (FP): 368
  • Total False Negatives (FN): 366
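
The overall precision, recall, and F1 can be reproduced from the three totals by pooling counts across labels (micro-averaging). This is a sketch under the usual definitions, not a statement of how the figures were actually generated; the variable names are illustrative.

```python
# Totals reported above
tp, fp, fn = 30_848, 368, 366

precision = tp / (tp + fp)
recall = tp / (tp + fn)
# Equivalent closed form of the harmonic mean of precision and recall
f1 = 2 * tp / (2 * tp + fp + fn)

print(f"{precision:.2%} {recall:.2%} {f1:.2%}")  # 98.82% 98.83% 98.82%
```

The overall accuracy cannot be recomputed the same way, since under the usual definition it also depends on the number of true negatives, which the card does not report.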

Macro-Averaged Metrics:

  • Accuracy: 98.56%
  • Precision: 95.24%
  • Recall: 93.83%
  • F1 Score: 94.52%
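
Macro-averaging takes the unweighted mean of the per-label scores, so a single weak label moves the result much more than it does in the pooled figures. The sketch below averages over all 21 rows of the per-label table and, as an assumption, counts the undefined ("n/a") entries of the O row as 0. Under that assumption the results land on the reported precision, recall, and F1, which would explain the gap to the overall figures, but the card does not confirm this is how they were computed.

```python
# (label, TP, FP, FN) copied from the per-label table
ROWS = [
    ("SURNAME", 3724, 0, 26), ("O", 0, 368, 0), ("TIME", 1934, 0, 2),
    ("DRIVERLICENSENUM", 505, 0, 2), ("PASSPORTNUM", 566, 0, 0),
    ("GIVENNAME", 7557, 0, 163), ("TELEPHONENUM", 3637, 0, 4),
    ("BUILDINGNUM", 418, 0, 8), ("AGE", 164, 0, 5), ("DATE", 2335, 0, 0),
    ("CITY", 1717, 0, 85), ("TITLE", 363, 0, 21), ("IDCARDNUM", 2008, 0, 12),
    ("GENDER", 120, 0, 1), ("CREDITCARDNUMBER", 555, 0, 3), ("SEX", 77, 0, 2),
    ("STREET", 1379, 0, 8), ("TAXNUM", 343, 0, 14), ("EMAIL", 2607, 0, 1),
    ("SOCIALNUM", 421, 0, 1), ("ZIPCODE", 418, 0, 8),
]


def safe_div(num: float, den: float) -> float:
    """Assumption: an undefined ratio (zero denominator) counts as 0."""
    return num / den if den else 0.0


def macro(rows):
    """Unweighted mean of per-label precision, recall, and F1."""
    ps, rs, fs = [], [], []
    for _, tp, fp, fn in rows:
        p = safe_div(tp, tp + fp)
        r = safe_div(tp, tp + fn)
        ps.append(p)
        rs.append(r)
        fs.append(safe_div(2 * p * r, p + r))
    n = len(rows)
    return sum(ps) / n, sum(rs) / n, sum(fs) / n


macro_p, macro_r, macro_f1 = macro(ROWS)
print(f"{macro_p:.2%} {macro_r:.2%} {macro_f1:.2%}")
```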

Model Behavior & Limitations

  • Redaction-Only:
    The model replaces detected PII tokens with generic placeholders (e.g. [SURNAME], [TIME]) without retaining the specific class information.
  • Evaluation Focus:
    The metrics shown above reflect performance on the test split of the open-pii-masking-500k-ai4privacy dataset. Real-world performance may vary, and real-world use requires additional measures. For questions, contact support (at) ai4privacy.com.

Disclaimer

This model card details the evaluation metrics and fine-tuning parameters for the English anonymiser. Please note:

  • The model is provided as-is under the MIT License.
  • It is intended solely for redaction purposes and does not perform full PII classification or preserve the original PII class labels in its output.
  • Users should carefully evaluate its performance on their own data before deploying in production environments.
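
One way to run such an evaluation on your own data is to compare predicted spans against hand-labelled ones and tally TP, FP, and FN. The sketch below uses exact matching on hypothetical `(start, end, label)` tuples. The matching rule, the `count_errors` helper, and the sample data are all assumptions, and the card does not say at what granularity its own metrics were computed.

```python
from collections import Counter


def count_errors(gold_docs, pred_docs):
    """Tally exact-match TP/FP/FN over paired documents.

    Each document is a set of (start, end, label) tuples.
    """
    counts = Counter(tp=0, fp=0, fn=0)
    for gold, pred in zip(gold_docs, pred_docs):
        counts["tp"] += len(gold & pred)  # predicted and correct
        counts["fp"] += len(pred - gold)  # predicted but not in gold
        counts["fn"] += len(gold - pred)  # in gold but missed
    return counts


# Two hypothetical documents: hand-labelled spans vs. model output
gold_docs = [{(5, 10, "SURNAME"), (14, 19, "TIME")}, {(0, 6, "CITY")}]
pred_docs = [{(5, 10, "SURNAME")}, {(0, 6, "CITY"), (10, 12, "AGE")}]

counts = count_errors(gold_docs, pred_docs)
print(dict(counts))  # {'tp': 2, 'fp': 1, 'fn': 1}
```

For a redaction tool, false negatives (PII left in the text) are usually the costlier error, so recall on your own data deserves particular attention.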

Ai4Privacy – Committed to protecting personal data in the age of AI.

Model size: 150M params
Tensor type: F32
Format: Safetensors
Model repository: ai4privacy/llama-ai4privacy-english-anonymiser-openpii