AI-Enthusiast11 committed on
Commit
3e9b998
·
verified ·
1 Parent(s): 988a009

Update README.md

  1. README.md +57 -24
README.md CHANGED
@@ -5,7 +5,7 @@ tags: [token-classification, ner, deberta, privacy, pii-detection]
5
 
6
  # Model Card for PII Detection with DeBERTa
7
 
8
- This model is a fine-tuned version of [`microsoft/deberta`](https://huggingface.co/microsoft/deberta) for Named Entity Recognition (NER), specifically designed for detecting Personally Identifiable Information (PII) entities like names, SSNs, phone numbers, credit card numbers, addresses, and more.
9
 
10
  ## Model Details
11
 
@@ -19,29 +19,62 @@ This transformer-based model is fine-tuned on a custom dataset to detect sensiti
19
  - **Language(s):** English
20
  - **Use case:** PII detection in text
21
 
22
-
23
-
24
-
25
- ## Uses
26
-
27
- ### Direct Use
28
-
29
- This model can be used to identify PII entities in unstructured text. It is suitable for use in privacy compliance systems, redaction tools, or data anonymization pipelines.
30
-
31
- ### Downstream Use
32
-
33
- It can be plugged into pipelines for data privacy, automated redaction in chat logs or documents, or regulatory compliance tools.
34
-
35
- ### Out-of-Scope Use
36
-
37
- - Not recommended for non-English text
38
- - Should not be used as a standalone solution for legal compliance without human verification
39
- - Not suited for medical PII unless fine-tuned further
40
-
41
- ## Bias, Risks, and Limitations
42
-
43
- While the model performs well on the provided dataset, it may miss edge cases or non-standard PII formats. False positives or negatives may occur in unseen contexts or informal text.
44
-
45
  ### Recommendations
46
 
47
  - Use human review in high-risk environments.
 
5
 
6
  # Model Card for PII Detection with DeBERTa
7
 
8
+ This model is a fine-tuned version of [`microsoft/deberta`](https://huggingface.co/microsoft/deberta-v3-base) for Named Entity Recognition (NER), specifically designed for detecting Personally Identifiable Information (PII) entities like names, SSNs, phone numbers, credit card numbers, addresses, and more.
9
 
10
  ## Model Details
11
 
 
19
  - **Language(s):** English
20
  - **Use case:** PII detection in text
21
 
22
+ # Training Details
23
+
24
+ ## Training Data
25
+ The model was fine-tuned on a custom dataset containing labeled examples of the following PII entity types:
26
+
27
+ - NAME
28
+ - SSN
29
+ - PHONE-NO
30
+ - CREDIT-CARD-NO
31
+ - BANK-ACCOUNT-NO
32
+ - BANK-ROUTING-NO
33
+ - ADDRESS
34
+
35
+
36
+ ### Epoch Logs
37
+
38
+ | Epoch | Train Loss | Val Loss | Precision | Recall | F1 | Accuracy |
39
+ |-------|------------|----------|-----------|--------|--------|----------|
40
+ | 1 | 0.3672 | 0.1987 | 0.7806 | 0.8114 | 0.7957 | 0.9534 |
41
+ | 2 | 0.1149 | 0.1011 | 0.9161 | 0.9772 | 0.9457 | 0.9797 |
42
+ | 3 | 0.0795 | 0.0889 | 0.9264 | 0.9825 | 0.9536 | 0.9813 |
43
+ | 4 | 0.0708 | 0.0880 | 0.9242 | 0.9842 | 0.9533 | 0.9806 |
44
+ | 5 | 0.0626 | 0.0858 | 0.9235 | 0.9851 | 0.9533 | 0.9806 |
45
+
46
+ ## SeqEval Classification Report
47
+
48
+ | Label | Precision | Recall | F1-score | Support |
49
+ |------------------|-----------|--------|----------|---------|
50
+ | ADDRESS | 0.91 | 0.94 | 0.92 | 77 |
51
+ | BANK-ACCOUNT-NO | 0.91 | 0.99 | 0.95 | 169 |
52
+ | BANK-ROUTING-NO | 0.85 | 0.96 | 0.90 | 104 |
53
+ | CREDIT-CARD-NO | 0.95 | 1.00 | 0.97 | 228 |
54
+ | NAME | 0.98 | 0.97 | 0.97 | 164 |
55
+ | PHONE-NO | 0.94 | 0.99 | 0.96 | 308 |
56
+ | SSN | 0.87 | 1.00 | 0.93 | 90 |
57
+
58
+ ### Summary
59
+ - **Micro avg:** 0.95
60
+ - **Macro avg:** 0.95
61
+ - **Weighted avg:** 0.95
62
+
63
+ ## Evaluation
64
+
65
+ ### Testing Data
66
+ Evaluation was done on a held-out portion of the same labeled dataset.
67
+
68
+ ### Metrics
69
+ - Precision
70
+ - Recall
71
+ - F1 (via seqeval)
72
+ - Entity-wise breakdown
73
+ - Token-level accuracy
74
+
75
+ ### Results
76
+ - Per-label F1 ranges from 0.90 (BANK-ROUTING-NO) to 0.97 (CREDIT-CARD-NO, NAME); four of the seven labels reach 0.95 or higher.
77
+
78
  ### Recommendations
79
 
80
  - Use human review in high-risk environments.
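
The figures added in this commit can be sanity-checked with a few lines of Python. The sketch below is a minimal illustration, not the author's evaluation code: it recomputes F1 from the precision and recall in the epoch logs, and derives macro and support-weighted averages from the per-label rows of the SeqEval report. The Summary does not say which metric its averages refer to, so treating them as F1 is an assumption, and because the per-label values are rounded to two decimals the recomputed averages may differ from the reported ones in the last digit. The names `REPORT`, `f1_from_pr`, `macro_f1`, and `weighted_f1` are hypothetical helpers introduced only for this example.

```python
from typing import Dict, Tuple

# Per-label (F1, support) pairs copied from the SeqEval classification report
# in the diff above. Values are rounded to two decimals in the source.
REPORT: Dict[str, Tuple[float, int]] = {
    "ADDRESS": (0.92, 77),
    "BANK-ACCOUNT-NO": (0.95, 169),
    "BANK-ROUTING-NO": (0.90, 104),
    "CREDIT-CARD-NO": (0.97, 228),
    "NAME": (0.97, 164),
    "PHONE-NO": (0.94 + 0.02, 308),  # 0.96 in the table
    "SSN": (0.93, 90),
}


def f1_from_pr(precision: float, recall: float) -> float:
    """Return the harmonic mean of precision and recall (0.0 if both are 0)."""
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


def macro_f1(report: Dict[str, Tuple[float, int]]) -> float:
    """Unweighted mean of the per-label F1 scores."""
    return sum(f1 for f1, _ in report.values()) / len(report)


def weighted_f1(report: Dict[str, Tuple[float, int]]) -> float:
    """Mean of the per-label F1 scores, weighted by each label's support."""
    total_support = sum(support for _, support in report.values())
    return sum(f1 * support for f1, support in report.values()) / total_support


if __name__ == "__main__":
    # Epoch 3 row of the epoch logs: precision 0.9264, recall 0.9825.
    print(f"epoch 3 F1:  {f1_from_pr(0.9264, 0.9825):.4f}")
    print(f"macro F1:    {macro_f1(REPORT):.4f}")
    print(f"weighted F1: {weighted_f1(REPORT):.4f}")
```

Running the script prints the three values side by side, which makes it easy to compare them with the Epoch Logs and Summary sections of the README.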