Commit d84aa0b
1 Parent(s): 486a7ed
model upload
- README.md +70 -0
- final_model.h5 +3 -0
- images/category_distribution.png +0 -0
- images/confusion_matrices.png +0 -0
- images/dataset_category_balance.png +0 -0
- images/entity_category_heatmap.png +0 -0
- images/highlighted_sample_1.png +0 -0
- images/highlighted_sample_2.png +0 -0
- images/highlighted_sample_3.png +0 -0
- images/highlighted_sample_4.png +0 -0
- images/highlighted_sample_5.png +0 -0
- images/highlighted_sample_6.png +0 -0
- images/per_category_metrics.png +0 -0
- images/training_history.png +0 -0
- labels.json +16 -0
- training_perf/training_stats.json +178 -0
README.md
CHANGED
@@ -1,3 +1,73 @@
---
license: apache-2.0
---
# Model Card: Redact-V1 PII Detection Model

This model is designed to automatically detect and redact personally identifiable information (PII) in text. It uses a deep learning architecture implemented in TensorFlow and fine-tuned on a curated dataset.

## Overview

The **Redact-V1** model is built for PII detection, with applications in data redaction and privacy preservation. It was trained and evaluated on the [Redact-V1 dataset](https://huggingface.co/datasets/darkmatter2222/redact-v1); in the recorded training history, validation precision reaches about 0.997 and validation recall about 0.990 by the final epoch.

## Model Details

- **Model File:** [final_model.h5](final_model.h5)
- **Training Performance Data:** [training_stats.json](training_perf/training_stats.json)
- **Labels:** [labels.json](labels.json)

The training performance indicators (loss, accuracy, precision, and recall) are recorded per epoch in the training performance file. Visualizations of model performance, including confusion matrices and training history, are available in the [images](images/) folder.
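
The recorded precision and recall can be combined into a per-epoch F1 score. The following is a minimal sketch, not part of the original repository: the helper names `f1_per_epoch` and `load_stats` are hypothetical, and it assumes `training_stats.json` is a JSON object whose `val_precision` and `val_recall` keys hold equal-length lists, as in the file added by this commit. The two inline numbers are the final-epoch validation values copied from that file.

```python
import json


def f1_per_epoch(precision, recall):
    """Return the harmonic mean of precision and recall for each epoch."""
    scores = []
    for p, r in zip(precision, recall):
        # Guard against division by zero when both metrics are 0.
        scores.append(0.0 if p + r == 0 else 2 * p * r / (p + r))
    return scores


def load_stats(path="training_perf/training_stats.json"):
    """Load the per-epoch training history saved alongside the model."""
    with open(path, "r", encoding="utf-8") as f:
        return json.load(f)


# Final-epoch validation values copied from training_stats.json.
final_val_precision = 0.9970367550849915
final_val_recall = 0.990190327167511
final_val_f1 = f1_per_epoch([final_val_precision], [final_val_recall])[0]
print(f"Final-epoch validation F1: {final_val_f1:.4f}")
```

With the full file on disk, `stats = load_stats()` followed by `f1_per_epoch(stats["val_precision"], stats["val_recall"])` would give one value per epoch.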

## Usage

The sample code below loads the model and labels in a Python environment and scores one sentence:

```python
import json

import tensorflow as tf
import tensorflow_hub as hub

# Paths to the model and labels.
MODEL_PATH = "final_model.h5"
LABELS_PATH = "labels.json"


def load_labels(labels_file):
    """Read the list of label names from a JSON file."""
    with open(labels_file, "r", encoding="utf-8") as f:
        return json.load(f)


def main():
    print("Loading model from:", MODEL_PATH)
    # hub.KerasLayer is registered as a custom object so Keras can
    # deserialize the TensorFlow Hub layer stored in the .h5 file.
    model = tf.keras.models.load_model(
        MODEL_PATH, custom_objects={"KerasLayer": hub.KerasLayer}
    )
    print("Model loaded successfully.")

    labels = load_labels(LABELS_PATH)
    print("Loaded labels:", labels)

    # Sample sentence for testing.
    sample_sentence = "John Doe's account number 1234567890 was flagged for review due to unusual activity."
    print("Sample sentence:", sample_sentence)

    # Run prediction. The batch is wrapped in a string tensor because some
    # TensorFlow versions reject a plain Python list of strings as input.
    predictions = model.predict(tf.constant([sample_sentence]))
    print("Predictions:")
    for label, prob in zip(labels, predictions[0]):
        print(f"{label}: {prob:.2f}")


if __name__ == "__main__":
    main()
```
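
To turn the per-label scores printed by the sample into a list of detected PII categories, one option is to apply a cutoff. The sketch below is an illustration only: it assumes the model emits one independent score between 0 and 1 per label (the model card does not state this), the `detect_categories` helper and the 0.5 threshold are hypothetical, and `example_scores` holds made-up placeholder values rather than real model output. The label list is copied from `labels.json`.

```python
# Label names in the order given by labels.json.
LABELS = [
    "People Name", "Card Number", "Account Number", "Social Security Number",
    "Government ID Number", "Date of Birth", "Password", "Tax ID Number",
    "Phone Number", "Residential Address", "Email Address", "IP Number",
    "Passport", "Driver License",
]


def detect_categories(scores, labels, threshold=0.5):
    """Return (label, score) pairs whose score meets the threshold."""
    if len(scores) != len(labels):
        raise ValueError("expected one score per label")
    return [
        (label, float(score))
        for label, score in zip(labels, scores)
        if score >= threshold
    ]


# Placeholder scores for illustration only (not produced by the model).
example_scores = [0.97, 0.02, 0.91] + [0.01] * 11
detected = detect_categories(example_scores, LABELS)
for label, score in detected:
    print(f"{label}: {score:.2f}")
```

In the sample script, `predictions[0]` would take the place of `example_scores`. A lower threshold favors recall (fewer missed PII categories) at the cost of more false positives, so the cutoff is worth tuning for the intended redaction use case.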

## Training Data & Source Code

- **Training Data:** The model was trained on the [Redact-V1 dataset](https://huggingface.co/datasets/darkmatter2222/redact-v1).
- **Source Code:** The training pipeline and preprocessing code can be reviewed in the [NLU-Redact-PII repository](https://github.com/darkmatter2222/NLU-Redact-PII).

## License

This project is licensed under the Apache-2.0 license.
final_model.h5
ADDED
@@ -0,0 +1,3 @@
version https://git-lfs.github.com/spec/v1
oid sha256:885cbc0ab04b60d2be804a66d72d047d2f01c1251d761501743d3b42eac4d5e5
size 1318078752
images/category_distribution.png
ADDED
images/confusion_matrices.png
ADDED
images/dataset_category_balance.png
ADDED
images/entity_category_heatmap.png
ADDED
images/highlighted_sample_1.png
ADDED
images/highlighted_sample_2.png
ADDED
images/highlighted_sample_3.png
ADDED
images/highlighted_sample_4.png
ADDED
images/highlighted_sample_5.png
ADDED
images/highlighted_sample_6.png
ADDED
images/per_category_metrics.png
ADDED
images/training_history.png
ADDED
labels.json
ADDED
@@ -0,0 +1,16 @@
[
  "People Name",
  "Card Number",
  "Account Number",
  "Social Security Number",
  "Government ID Number",
  "Date of Birth",
  "Password",
  "Tax ID Number",
  "Phone Number",
  "Residential Address",
  "Email Address",
  "IP Number",
  "Passport",
  "Driver License"
]
training_perf/training_stats.json
ADDED
@@ -0,0 +1,178 @@
{
  "loss": [
    0.35982558131217957,
    0.15620803833007812,
    0.07565116882324219,
    0.04694047197699547,
    0.033311400562524796,
    0.025203850120306015,
    0.020155949518084526,
    0.01679529808461666,
    0.014130683615803719,
    0.011758599430322647,
    0.010233336128294468,
    0.008858845569193363,
    0.007855775766074657,
    0.0066641164012253284,
    0.0059407246299088,
    0.00519288657233119,
    0.004634241107851267,
    0.004398580640554428,
    0.0034862596075981855,
    0.0034201345406472683
  ],
  "accuracy": [
    0.2826281785964966,
    0.5477352142333984,
    0.6037829518318176,
    0.6236934065818787,
    0.637332022190094,
    0.6406172513961792,
    0.642210066318512,
    0.6451966166496277,
    0.6522648334503174,
    0.6512693166732788,
    0.6581383943557739,
    0.6668989658355713,
    0.6552513837814331,
    0.6564459800720215,
    0.662916898727417,
    0.6625186800956726,
    0.6670980453491211,
    0.6590343713760376,
    0.6610254049301147,
    0.6688899993896484
  ],
  "precision": [
    0.622481644153595,
    0.9491262435913086,
    0.9798324108123779,
    0.9886569380760193,
    0.9905564785003662,
    0.9930499792098999,
    0.9945850968360901,
    0.9945999979972839,
    0.995506763458252,
    0.9961159229278564,
    0.99701327085495,
    0.9976631999015808,
    0.9980111122131348,
    0.9984096884727478,
    0.9986090660095215,
    0.9991061091423035,
    0.9989081621170044,
    0.9991063475608826,
    0.9997023940086365,
    0.9994048476219177
  ],
  "recall": [
    0.16370388865470886,
    0.6965585350990295,
    0.8973508477210999,
    0.9451844692230225,
    0.9660807251930237,
    0.9763802886009216,
    0.9822728633880615,
    0.9849962592124939,
    0.9873731136322021,
    0.9905422329902649,
    0.991780161857605,
    0.9936122894287109,
    0.9939093589782715,
    0.9948006868362427,
    0.9953948855400085,
    0.9962366819381714,
    0.996682345867157,
    0.9964842796325684,
    0.9980688095092773,
    0.9977717399597168
  ],
  "val_loss": [
    0.21013136208057404,
    0.07909079641103745,
    0.04225553944706917,
    0.028197234496474266,
    0.021331094205379486,
    0.01816234365105629,
    0.015387289226055145,
    0.013046079315245152,
    0.011776523664593697,
    0.011295010335743427,
    0.01025811955332756,
    0.00971222948282957,
    0.009616936556994915,
    0.00931770820170641,
    0.008879762142896652,
    0.008778486400842667,
    0.008436054922640324,
    0.00880770105868578,
    0.008562939241528511,
    0.008513949811458588
  ],
  "val_accuracy": [
    0.5103503465652466,
    0.584792971611023,
    0.6345541477203369,
    0.6504777073860168,
    0.6528662443161011,
    0.6612260937690735,
    0.6480891704559326,
    0.6644108295440674,
    0.6660031676292419,
    0.6667993664741516,
    0.6588375568389893,
    0.6691879034042358,
    0.6409235596656799,
    0.6544585824012756,
    0.6544585824012756,
    0.6632165312767029,
    0.6246019005775452,
    0.6281847357749939,
    0.6425158977508545,
    0.6222133636474609
  ],
  "val_precision": [
    0.9616552591323853,
    0.986707329750061,
    0.9932293891906738,
    0.9945641160011292,
    0.9944178462028503,
    0.9960119724273682,
    0.9962226748466492,
    0.9958391189575195,
    0.9964384436607361,
    0.9966376423835754,
    0.9964413046836853,
    0.9970332384109497,
    0.9972304701805115,
    0.9966409802436829,
    0.9966409802436829,
    0.9970361590385437,
    0.9968404173851013,
    0.9972337484359741,
    0.9966429471969604,
    0.9970367550849915
  ],
  "val_recall": [
    0.49695900082588196,
    0.8883656859397888,
    0.9497743844985962,
    0.9691975712776184,
    0.978614866733551,
    0.9799882173538208,
    0.9831273555755615,
    0.9860702157020569,
    0.9880321621894836,
    0.9886207580566406,
    0.9888169765472412,
    0.989013135433197,
    0.989013135433197,
    0.989601731300354,
    0.989601731300354,
    0.9899941086769104,
    0.9903864860534668,
    0.990190327167511,
    0.990190327167511,
    0.990190327167511
  ]
}
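
The `val_loss` series in this file can be used to pick a checkpoint epoch. This is a minimal sketch under stated assumptions: list position is taken to correspond to the epoch number (1-based), which the file itself does not state, and `best_epoch` is a hypothetical helper. The values are the `val_loss` entries copied from `training_stats.json`.

```python
# val_loss entries copied from training_perf/training_stats.json.
VAL_LOSS = [
    0.21013136208057404, 0.07909079641103745, 0.04225553944706917,
    0.028197234496474266, 0.021331094205379486, 0.01816234365105629,
    0.015387289226055145, 0.013046079315245152, 0.011776523664593697,
    0.011295010335743427, 0.01025811955332756, 0.00971222948282957,
    0.009616936556994915, 0.00931770820170641, 0.008879762142896652,
    0.008778486400842667, 0.008436054922640324, 0.00880770105868578,
    0.008562939241528511, 0.008513949811458588,
]


def best_epoch(values):
    """Return (1-based epoch, value) for the smallest entry in the series."""
    if not values:
        raise ValueError("empty metric series")
    index = min(range(len(values)), key=values.__getitem__)
    return index + 1, values[index]


epoch, value = best_epoch(VAL_LOSS)
print(f"Lowest validation loss {value:.6f} at epoch {epoch} of {len(VAL_LOSS)}")
```

Under that epoch-numbering assumption, the validation loss bottoms out a few epochs before the end of the run while the training loss keeps decreasing, which suggests diminishing returns from the last epochs.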