# Model Card for PII Detection with DeBERTa

This model is a fine-tuned version of [`microsoft/deberta-v3-base`](https://huggingface.co/microsoft/deberta-v3-base) for Named Entity Recognition (NER), specifically designed for detecting Personally Identifiable Information (PII) entities such as names, SSNs, phone numbers, credit card numbers, and addresses.

## Model Details

- **Language(s):** English
- **Use case:** PII detection in text

## Training Details

### Training Data

The model was fine-tuned on a custom dataset containing labeled examples of the following PII entity types:

- NAME
- SSN
- PHONE-NO
- CREDIT-CARD-NO
- BANK-ACCOUNT-NO
- BANK-ROUTING-NO
- ADDRESS

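For token classification, entity types like these are typically expanded into a BIO tag set. The card does not state the exact scheme, so the following is a minimal sketch assuming standard BIO tagging:

```python
# Building a BIO label set for the entity types listed in the card.
# Assumption: a standard BIO tagging scheme (not stated in the card).
entity_types = ["NAME", "SSN", "PHONE-NO", "CREDIT-CARD-NO",
                "BANK-ACCOUNT-NO", "BANK-ROUTING-NO", "ADDRESS"]
labels = ["O"] + [f"{prefix}-{etype}" for etype in entity_types
                  for prefix in ("B", "I")]
label2id = {label: i for i, label in enumerate(labels)}
id2label = {i: label for label, i in label2id.items()}
print(len(labels))  # 15: "O" plus B-/I- variants of the 7 entity types
```

These `label2id`/`id2label` mappings are what a transformers token-classification head would be configured with.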
### Epoch Logs

| Epoch | Train Loss | Val Loss | Precision | Recall | F1     | Accuracy |
|-------|------------|----------|-----------|--------|--------|----------|
| 1     | 0.3672     | 0.1987   | 0.7806    | 0.8114 | 0.7957 | 0.9534   |
| 2     | 0.1149     | 0.1011   | 0.9161    | 0.9772 | 0.9457 | 0.9797   |
| 3     | 0.0795     | 0.0889   | 0.9264    | 0.9825 | 0.9536 | 0.9813   |
| 4     | 0.0708     | 0.0880   | 0.9242    | 0.9842 | 0.9533 | 0.9806   |
| 5     | 0.0626     | 0.0858   | 0.9235    | 0.9851 | 0.9533 | 0.9806   |

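Validation F1 peaks at epoch 3 while validation loss continues to fall slightly, suggesting diminishing returns from later epochs. A minimal sketch of selecting the best checkpoint from these logs (values copied from the epoch table):

```python
# Picking the best checkpoint from the epoch logs by validation F1.
logs = [
    {"epoch": 1, "val_loss": 0.1987, "f1": 0.7957},
    {"epoch": 2, "val_loss": 0.1011, "f1": 0.9457},
    {"epoch": 3, "val_loss": 0.0889, "f1": 0.9536},
    {"epoch": 4, "val_loss": 0.0880, "f1": 0.9533},
    {"epoch": 5, "val_loss": 0.0858, "f1": 0.9533},
]
best = max(logs, key=lambda row: row["f1"])
print(best["epoch"])  # 3: highest validation F1 despite lower loss later
```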
## SeqEval Classification Report

| Label           | Precision | Recall | F1-score | Support |
|-----------------|-----------|--------|----------|---------|
| ADDRESS         | 0.91      | 0.94   | 0.92     | 77      |
| BANK-ACCOUNT-NO | 0.91      | 0.99   | 0.95     | 169     |
| BANK-ROUTING-NO | 0.85      | 0.96   | 0.90     | 104     |
| CREDIT-CARD-NO  | 0.95      | 1.00   | 0.97     | 228     |
| NAME            | 0.98      | 0.97   | 0.97     | 164     |
| PHONE-NO        | 0.94      | 0.99   | 0.96     | 308     |
| SSN             | 0.87      | 1.00   | 0.93     | 90      |

### Summary

- **Micro avg:** 0.95
- **Macro avg:** 0.95
- **Weighted avg:** 0.95

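The weighted average can be sanity-checked by recomputing it from the per-label F1 scores and supports; the numbers below are copied from the SeqEval report:

```python
# Recomputing the support-weighted average F1 from the per-label report.
per_label = {  # label: (f1, support)
    "ADDRESS": (0.92, 77),
    "BANK-ACCOUNT-NO": (0.95, 169),
    "BANK-ROUTING-NO": (0.90, 104),
    "CREDIT-CARD-NO": (0.97, 228),
    "NAME": (0.97, 164),
    "PHONE-NO": (0.96, 308),
    "SSN": (0.93, 90),
}
total_support = sum(s for _, s in per_label.values())
weighted_f1 = sum(f1 * s for f1, s in per_label.values()) / total_support
print(round(weighted_f1, 2))  # 0.95, matching the reported weighted average
```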
## Evaluation

### Testing Data

Evaluation was performed on a held-out split of the same labeled dataset.

### Metrics

- Precision
- Recall
- F1 (via seqeval)
- Entity-wise breakdown
- Token-level accuracy

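Token-level accuracy is the simplest of these metrics; a minimal sketch with illustrative (not real) tag sequences:

```python
# Token-level accuracy: fraction of tokens whose predicted tag matches gold.
# The tag sequences here are illustrative, not from the model's test set.
gold = ["O", "B-NAME", "I-NAME", "O", "B-SSN"]
pred = ["O", "B-NAME", "O",      "O", "B-SSN"]
accuracy = sum(g == p for g, p in zip(gold, pred)) / len(gold)
print(accuracy)  # 0.8: 4 of 5 tokens match
```

Note that seqeval's precision/recall/F1 are stricter: they score whole entity spans, so the missed `I-NAME` above would cost the entire NAME entity, not just one token.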
### Results

- Per-entity F1-scores range from 0.90 to 0.97, with micro, macro, and weighted averages of 0.95, indicating robust PII detection across entity types.

### Recommendations

- Use human review in high-risk environments.
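One way to operationalize human review is to route low-confidence predictions to a reviewer. The example entities and the 0.85 threshold below are illustrative assumptions, not values from this card:

```python
# Routing low-confidence NER predictions to human review.
# Predictions mimic the dict shape of a transformers NER pipeline output;
# the scores and the 0.85 threshold are illustrative assumptions.
predictions = [
    {"entity_group": "NAME", "word": "John Doe", "score": 0.99},
    {"entity_group": "SSN", "word": "123-45-6789", "score": 0.62},
]
THRESHOLD = 0.85
auto_accept = [p for p in predictions if p["score"] >= THRESHOLD]
needs_review = [p for p in predictions if p["score"] < THRESHOLD]
print(len(needs_review))  # 1 prediction flagged for a human reviewer
```

In practice the threshold would be tuned per entity type against the cost of a missed redaction versus reviewer workload.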