	---
	license: mit
	datasets:
	- ai4privacy/open-pii-masking-500k-ai4privacy
	language:
	- en
	tags:
	- pii
	- redaction
	- anonymisation
	- english
	model-index:
	- name: english-anonymiser-openpii-ai4privacy
	  results:
	  - task:
	      type: token-classification
	      name: PII Masking
	    dataset:
	      type: ai4privacy/open-pii-masking-500k-ai4privacy
	      name: Open PII Masking 500K
	      split: english-validation
	    metrics:
	    - type: f1
	      value: 0.9882
	      name: F1 Score
	    - type: precision
	      value: 0.9882
	      name: Precision
	    - type: recall
	      value: 0.9883
	      name: Recall
	    - type: accuracy
	      value: 0.9917
	      name: Accuracy

	metrics:
	- f1
	- precision
	- recall
	library_name: transformers
	pipeline_tag: token-classification
	---


	# English Anonymiser OpenPII (Ai4Privacy)

	This model is designed to redact Personally Identifiable Information (PII) from English text. It has been fine-tuned exclusively on the English subset of the [open-pii-masking-500k-ai4privacy](https://huggingface.co/datasets/ai4privacy/open-pii-masking-500k-ai4privacy) dataset.
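The following is a minimal sketch of how the model might be used for redaction through the `transformers` token-classification pipeline. The repository id in `MODEL_ID` is an assumption based on the model name in the metadata and should be checked against the model page. The `redact` and `anonymise` helpers are hypothetical names introduced here for illustration, and the `[LABEL]` placeholder format is one possible choice, not something this card prescribes.

```python
from typing import Iterable, Mapping

# Assumed repository id (not confirmed by this card); verify on the Hub.
MODEL_ID = "ai4privacy/english-anonymiser-openpii-ai4privacy"


def redact(text: str, entities: Iterable[Mapping]) -> str:
    """Replace each detected span with a [LABEL] placeholder.

    `entities` follows the shape returned by a token-classification
    pipeline with an aggregation strategy: dicts carrying `start`,
    `end`, and `entity_group`.
    """
    out = text
    # Work right to left so earlier character offsets stay valid.
    for ent in sorted(entities, key=lambda e: e["start"], reverse=True):
        out = out[: ent["start"]] + f"[{ent['entity_group']}]" + out[ent["end"]:]
    return out


def anonymise(text: str, model_id: str = MODEL_ID) -> str:
    """Run the model on `text` and return the redacted string."""
    # Imported lazily so `redact` can be used without transformers installed.
    from transformers import pipeline

    tagger = pipeline(
        "token-classification",
        model=model_id,
        aggregation_strategy="simple",  # merge word pieces into whole spans
    )
    return redact(text, tagger(text))
```

Calling `anonymise("...")` downloads the model on first use. Keeping `redact` separate makes it possible to inspect or filter the detected spans (for example by `score`) before masking.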

	---

	## Evaluation Metrics

	The table below summarizes the detailed evaluation results per PII label:

	| Label              |   TP   |   FP   |   FN   |   Accuracy   |   Precision   |   Recall   |   F1 Score    |
	|--------------------|:------:|:------:|:------:|:------------:|:-------------:|:----------:|:-------------:|
	| SURNAME            | 3724   | 0      | 26     | 99.31%       | 100.0%        | 99.31%     | 99.65%        |
	| O (Non-PII)        | 0      | 368    | 0      | 99.36%       | n/a           | n/a        | n/a           |
	| TIME               | 1934   | 0      | 2      | 99.90%       | 100.0%        | 99.90%     | 99.95%        |
	| DRIVERLICENSENUM   | 505    | 0      | 2      | 99.61%       | 100.0%        | 99.61%     | 99.80%        |
	| PASSPORTNUM        | 566    | 0      | 0      | 100.0%       | 100.0%        | 100.0%     | 100.0%        |
	| GIVENNAME          | 7557   | 0      | 163    | 97.89%       | 100.0%        | 97.89%     | 98.93%        |
	| TELEPHONENUM       | 3637   | 0      | 4      | 99.89%       | 100.0%        | 99.89%     | 99.95%        |
	| BUILDINGNUM        | 418    | 0      | 8      | 98.12%       | 100.0%        | 98.12%     | 99.05%        |
	| AGE                | 164    | 0      | 5      | 97.04%       | 100.0%        | 97.04%     | 98.50%        |
	| DATE               | 2335   | 0      | 0      | 100.0%       | 100.0%        | 100.0%     | 100.0%        |
	| CITY               | 1717   | 0      | 85     | 95.28%       | 100.0%        | 95.28%     | 97.58%        |
	| TITLE              | 363    | 0      | 21     | 94.53%       | 100.0%        | 94.53%     | 97.19%        |
	| IDCARDNUM          | 2008   | 0      | 12     | 99.41%       | 100.0%        | 99.41%     | 99.70%        |
	| GENDER             | 120    | 0      | 1      | 99.17%       | 100.0%        | 99.17%     | 99.59%        |
	| CREDITCARDNUMBER   | 555    | 0      | 3      | 99.46%       | 100.0%        | 99.46%     | 99.73%        |
	| SEX                | 77     | 0      | 2      | 97.47%       | 100.0%        | 97.47%     | 98.72%        |
	| STREET             | 1379   | 0      | 8      | 99.42%       | 100.0%        | 99.42%     | 99.71%        |
	| TAXNUM             | 343    | 0      | 14     | 96.08%       | 100.0%        | 96.08%     | 98.00%        |
	| EMAIL              | 2607   | 0      | 1      | 99.96%       | 100.0%        | 99.96%     | 99.98%        |
	| SOCIALNUM          | 421    | 0      | 1      | 99.76%       | 100.0%        | 99.76%     | 99.88%        |
	| ZIPCODE            | 418    | 0      | 8      | 98.12%       | 100.0%        | 98.12%     | 99.05%        |

	Overall Evaluation:
	- Accuracy: 99.17%
	- Precision: 98.82%
	- Recall: 98.83%
	- F1 Score: 98.82%

	- Total True Positives (TP): 30,848
	- Total False Positives (FP): 368
	- Total False Negatives (FN): 366

	Macro-Averaged Metrics:
	- Accuracy: 98.56%
	- Precision: 95.24%
	- Recall: 93.83%
	- F1 Score: 94.52%
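As a worked check, the sketch below recomputes the aggregate figures from the per-label TP/FP/FN counts in the table. It assumes the overall precision, recall, and F1 are micro-averaged over the summed counts, and that the macro averages run over all 21 rows with the `O (Non-PII)` row contributing 0 for precision, recall, and F1. Neither assumption is stated in this card; they are inferred because they reproduce the reported numbers. The helper name `prf` is introduced here for illustration.

```python
# (TP, FP, FN) per label, copied from the evaluation table.
COUNTS = {
    "SURNAME": (3724, 0, 26),
    "O (Non-PII)": (0, 368, 0),
    "TIME": (1934, 0, 2),
    "DRIVERLICENSENUM": (505, 0, 2),
    "PASSPORTNUM": (566, 0, 0),
    "GIVENNAME": (7557, 0, 163),
    "TELEPHONENUM": (3637, 0, 4),
    "BUILDINGNUM": (418, 0, 8),
    "AGE": (164, 0, 5),
    "DATE": (2335, 0, 0),
    "CITY": (1717, 0, 85),
    "TITLE": (363, 0, 21),
    "IDCARDNUM": (2008, 0, 12),
    "GENDER": (120, 0, 1),
    "CREDITCARDNUMBER": (555, 0, 3),
    "SEX": (77, 0, 2),
    "STREET": (1379, 0, 8),
    "TAXNUM": (343, 0, 14),
    "EMAIL": (2607, 0, 1),
    "SOCIALNUM": (421, 0, 1),
    "ZIPCODE": (418, 0, 8),
}


def prf(tp, fp, fn):
    """Precision, recall, and F1 from raw counts; undefined ratios become 0."""
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return precision, recall, f1


# Micro average: sum the counts first, then compute the ratios once.
tp, fp, fn = (sum(c[i] for c in COUNTS.values()) for i in range(3))
micro = prf(tp, fp, fn)

# Macro average: compute the ratios per label, then take the plain mean.
per_label = [prf(*c) for c in COUNTS.values()]
macro = tuple(sum(m[i] for m in per_label) / len(per_label) for i in range(3))

print(f"totals TP={tp} FP={fp} FN={fn}")
print("micro P/R/F1:", [f"{v:.2%}" for v in micro])
print("macro P/R/F1:", [f"{v:.2%}" for v in macro])
```

Under these assumptions the totals come out to 30,848 / 368 / 366, the micro-averaged precision, recall, and F1 round to 98.82%, 98.83%, and 98.82%, and the macro precision is 20/21, or 95.24%. The gap between the micro and macro figures therefore comes almost entirely from the single `O (Non-PII)` row, not from weak performance on any PII label.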

	---

	## Model Behavior & Limitations

	- Evaluation focus: The metrics above reflect performance on the English validation split (`english-validation` in the metadata) of the [open-pii-masking-500k-ai4privacy](https://huggingface.co/datasets/ai4privacy/open-pii-masking-500k-ai4privacy) dataset. Real-world performance may vary, and deployments may need additional safeguards beyond this model. For questions, contact support (at) ai4privacy.com.

	---

	## Disclaimer

	This model card details the evaluation metrics for the English anonymiser. Please note:
	- The model is provided as-is under the MIT License.
	- It is intended solely for redaction and does not perform full PII classification.
	- Users should carefully test and evaluate its performance on their own data before deploying it in production.
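For testing on your own data, one simple starting point is to compare predicted spans against hand-labelled ones. The sketch below counts exact-match true positives, false positives, and false negatives over `(start, end, label)` tuples. The function name `span_counts` and the exact-match criterion are assumptions made here for illustration; the scoring behind the table in this card may be token-level and could give different numbers.

```python
from typing import Iterable, Tuple

Span = Tuple[int, int, str]  # (start offset, end offset, label)


def span_counts(gold: Iterable[Span], pred: Iterable[Span]) -> Tuple[int, int, int]:
    """Return (TP, FP, FN) using exact span-and-label matching."""
    gold_set, pred_set = set(gold), set(pred)
    tp = len(gold_set & pred_set)  # predicted and correct
    fp = len(pred_set - gold_set)  # predicted but not in the gold labels
    fn = len(gold_set - pred_set)  # in the gold labels but missed
    return tp, fp, fn


# Tiny illustrative example with made-up offsets.
gold = [(5, 9, "GIVENNAME"), (13, 21, "TELEPHONENUM")]
pred = [(5, 9, "GIVENNAME"), (0, 4, "TITLE")]
tp, fp, fn = span_counts(gold, pred)
print(f"TP={tp} FP={fp} FN={fn}")
```

For redaction, false negatives (PII left in the text) are usually the costlier error, so it can be worth tracking recall per label on your own samples before relying on the aggregate scores.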

	---

	Ai4Privacy – Committed to protecting personal data in the age of AI.