# Model Card for PII Detection with DeBERTa

This model is a fine-tuned version of [`microsoft/deberta-v3-base`](https://huggingface.co/microsoft/deberta-v3-base) for Named Entity Recognition (NER), specifically designed for detecting Personally Identifiable Information (PII) entities such as names, SSNs, phone numbers, credit card numbers, and addresses.

## Model Details

- **Language(s):** English
- **Use case:** PII detection in text

## Training Details

### Training Data

The model was fine-tuned on a custom dataset containing labeled examples of the following PII entity types:

- NAME
- SSN
- PHONE-NO
- CREDIT-CARD-NO
- BANK-ACCOUNT-NO
- BANK-ROUTING-NO
- ADDRESS

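For token classification, entity types like these are typically expanded into a BIO tag set. The card does not state the exact scheme, so the following is a minimal sketch assuming standard BIO tagging:

```python
# Building a BIO label set for the entity types listed in the card.
# Assumption: a standard BIO tagging scheme (not stated in the card).
entity_types = ["NAME", "SSN", "PHONE-NO", "CREDIT-CARD-NO",
                "BANK-ACCOUNT-NO", "BANK-ROUTING-NO", "ADDRESS"]
labels = ["O"] + [f"{prefix}-{etype}" for etype in entity_types
                  for prefix in ("B", "I")]
label2id = {label: i for i, label in enumerate(labels)}
id2label = {i: label for label, i in label2id.items()}
print(len(labels))  # 15: "O" plus B-/I- variants of the 7 entity types
```

These `label2id`/`id2label` mappings are what a transformers token-classification head would be configured with.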
### Epoch Logs

| Epoch | Train Loss | Val Loss | Precision | Recall | F1     | Accuracy |
|-------|------------|----------|-----------|--------|--------|----------|
| 1     | 0.3672     | 0.1987   | 0.7806    | 0.8114 | 0.7957 | 0.9534   |
| 2     | 0.1149     | 0.1011   | 0.9161    | 0.9772 | 0.9457 | 0.9797   |
| 3     | 0.0795     | 0.0889   | 0.9264    | 0.9825 | 0.9536 | 0.9813   |
| 4     | 0.0708     | 0.0880   | 0.9242    | 0.9842 | 0.9533 | 0.9806   |
| 5     | 0.0626     | 0.0858   | 0.9235    | 0.9851 | 0.9533 | 0.9806   |

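Validation F1 peaks at epoch 3 while validation loss continues to fall slightly, suggesting diminishing returns from later epochs. A minimal sketch of selecting the best checkpoint from these logs (values copied from the epoch table):

```python
# Picking the best checkpoint from the epoch logs by validation F1.
logs = [
    {"epoch": 1, "val_loss": 0.1987, "f1": 0.7957},
    {"epoch": 2, "val_loss": 0.1011, "f1": 0.9457},
    {"epoch": 3, "val_loss": 0.0889, "f1": 0.9536},
    {"epoch": 4, "val_loss": 0.0880, "f1": 0.9533},
    {"epoch": 5, "val_loss": 0.0858, "f1": 0.9533},
]
best = max(logs, key=lambda row: row["f1"])
print(best["epoch"])  # 3: highest validation F1 despite lower loss later
```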
## SeqEval Classification Report

| Label           | Precision | Recall | F1-score | Support |
|-----------------|-----------|--------|----------|---------|
| ADDRESS         | 0.91      | 0.94   | 0.92     | 77      |
| BANK-ACCOUNT-NO | 0.91      | 0.99   | 0.95     | 169     |
| BANK-ROUTING-NO | 0.85      | 0.96   | 0.90     | 104     |
| CREDIT-CARD-NO  | 0.95      | 1.00   | 0.97     | 228     |
| NAME            | 0.98      | 0.97   | 0.97     | 164     |
| PHONE-NO        | 0.94      | 0.99   | 0.96     | 308     |
| SSN             | 0.87      | 1.00   | 0.93     | 90      |

### Summary

- **Micro avg:** 0.95
- **Macro avg:** 0.95
- **Weighted avg:** 0.95

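The weighted average can be sanity-checked by recomputing it from the per-label F1 scores and supports; the numbers below are copied from the SeqEval report:

```python
# Recomputing the support-weighted average F1 from the per-label report.
per_label = {  # label: (f1, support)
    "ADDRESS": (0.92, 77),
    "BANK-ACCOUNT-NO": (0.95, 169),
    "BANK-ROUTING-NO": (0.90, 104),
    "CREDIT-CARD-NO": (0.97, 228),
    "NAME": (0.97, 164),
    "PHONE-NO": (0.96, 308),
    "SSN": (0.93, 90),
}
total_support = sum(s for _, s in per_label.values())
weighted_f1 = sum(f1 * s for f1, s in per_label.values()) / total_support
print(round(weighted_f1, 2))  # 0.95, matching the reported weighted average
```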
## Evaluation

### Testing Data

Evaluation was performed on a held-out split of the same labeled dataset.

### Metrics

- Precision
- Recall
- F1 (via seqeval)
- Entity-wise breakdown
- Token-level accuracy

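Token-level accuracy is the simplest of these metrics; a minimal sketch with illustrative (not real) tag sequences:

```python
# Token-level accuracy: fraction of tokens whose predicted tag matches gold.
# The tag sequences here are illustrative, not from the model's test set.
gold = ["O", "B-NAME", "I-NAME", "O", "B-SSN"]
pred = ["O", "B-NAME", "O",      "O", "B-SSN"]
accuracy = sum(g == p for g, p in zip(gold, pred)) / len(gold)
print(accuracy)  # 0.8: 4 of 5 tokens match
```

Note that seqeval's precision/recall/F1 are stricter: they score whole entity spans, so the missed `I-NAME` above would cost the entire NAME entity, not just one token.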
### Results

- Per-entity F1-scores range from 0.90 to 0.97, with micro, macro, and weighted averages of 0.95, indicating robust PII detection across entity types.

### Recommendations

- Use human review in high-risk environments.
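One way to operationalize human review is to route low-confidence predictions to a reviewer. The example entities and the 0.85 threshold below are illustrative assumptions, not values from this card:

```python
# Routing low-confidence NER predictions to human review.
# Predictions mimic the dict shape of a transformers NER pipeline output;
# the scores and the 0.85 threshold are illustrative assumptions.
predictions = [
    {"entity_group": "NAME", "word": "John Doe", "score": 0.99},
    {"entity_group": "SSN", "word": "123-45-6789", "score": 0.62},
]
THRESHOLD = 0.85
auto_accept = [p for p in predictions if p["score"] >= THRESHOLD]
needs_review = [p for p in predictions if p["score"] < THRESHOLD]
print(len(needs_review))  # 1 prediction flagged for a human reviewer
```

In practice the threshold would be tuned per entity type against the cost of a missed redaction versus reviewer workload.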