darkmatter2222 committed on
Commit d84aa0b · 1 Parent(s): 486a7ed

model upload

README.md CHANGED
@@ -1,3 +1,73 @@
  ---
  license: apache-2.0
  ---
+ # Model Card: Redact-V1 PII Detection Model
+
+ This model is designed to detect and redact personally identifiable information (PII) in text. It is a TensorFlow (Keras) model fine-tuned on a curated dataset.
+
+ ## Overview
+
+ The **Redact-V1** model is built for PII detection, with applications in data redaction and privacy preservation. It was trained and evaluated on the [Redact-V1 dataset](https://huggingface.co/datasets/darkmatter2222/redact-v1). In the final recorded epoch, validation precision is about 0.997 and validation recall about 0.990.
+
+ ## Model Details
+
+ - **Model File:** [final_model.h5](final_model.h5)
+ - **Training Performance Data:** [training_stats.json](training_perf/training_stats.json)
+ - **Labels:** [labels.json](labels.json)
+
+ Per-epoch training and validation metrics (loss, accuracy, precision, and recall) for 20 epochs are recorded in the training performance file. Visualizations of model performance, including confusion matrices and training history, are available in the [images](images/) folder.
+
+ ## Usage
+
+ Below is sample code to load and use the model in a Python environment:
+
+ ```python
+ import json
+ import tensorflow as tf
+ import tensorflow_hub as hub
+
+ # Paths to the model and labels.
+ MODEL_PATH = r"final_model.h5"
+ LABELS_PATH = r"labels.json"
+
+ def load_labels(labels_file):
+     """Return the list of label names stored in labels.json."""
+     with open(labels_file, 'r', encoding='utf-8') as f:
+         return json.load(f)
+
+ def main():
+     print("Loading model from:", MODEL_PATH)
+     # Register hub.KerasLayer so Keras can deserialize the saved model.
+     model = tf.keras.models.load_model(MODEL_PATH, custom_objects={'KerasLayer': hub.KerasLayer})
+     print("Model loaded successfully.")
+
+     labels = load_labels(LABELS_PATH)
+     print("Loaded labels:", labels)
+
+     # Sample sentence for testing.
+     sample_sentence = "John Doe's account number 1234567890 was flagged for review due to unusual activity."
+     print("Sample sentence:", sample_sentence)
+
+     # Run prediction on a batch of one sentence, passed as a string tensor.
+     predictions = model.predict(tf.constant([sample_sentence]))
+     print("Predictions:")
+     for label, prob in zip(labels, predictions[0]):
+         print(f"{label}: {prob:.2f}")
+
+ if __name__ == "__main__":
+     main()
+ ```
+
+ ## Training Data & Source Code
+
+ - **Training Data:** The model was trained on the [Redact-V1 dataset](https://huggingface.co/datasets/darkmatter2222/redact-v1).
+ - **Source Code:** The training pipeline and preprocessing code can be reviewed in the [NLU-Redact-PII repository](https://github.com/darkmatter2222/NLU-Redact-PII).
+
+ ## License
+
+ This project is licensed under the Apache-2.0 license.
final_model.h5 ADDED
@@ -0,0 +1,3 @@
+ version https://git-lfs.github.com/spec/v1
+ oid sha256:885cbc0ab04b60d2be804a66d72d047d2f01c1251d761501743d3b42eac4d5e5
+ size 1318078752
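The three lines above are a Git LFS pointer, not the model itself: they record the SHA-256 digest and byte size (about 1.3 GB) of the real `final_model.h5`. A minimal sketch of how a downloaded copy could be checked against the pointer follows. The helper names (`parse_lfs_pointer`, `sha256_of_file`, `matches_pointer`) are hypothetical and not part of this repository.

```python
import hashlib

# Pointer text copied from the final_model.h5 diff above.
POINTER_TEXT = """version https://git-lfs.github.com/spec/v1
oid sha256:885cbc0ab04b60d2be804a66d72d047d2f01c1251d761501743d3b42eac4d5e5
size 1318078752
"""


def parse_lfs_pointer(text):
    """Parse a Git LFS pointer into its version, hash algorithm, digest, and size."""
    fields = {}
    for line in text.strip().splitlines():
        key, _, value = line.partition(" ")
        fields[key] = value
    algo, _, digest = fields["oid"].partition(":")
    return {
        "version": fields["version"],
        "algo": algo,
        "oid": digest,
        "size": int(fields["size"]),
    }


def sha256_of_file(path, chunk_size=1 << 20):
    """Hash a file in 1 MiB chunks so a large model never has to fit in memory."""
    digest = hashlib.sha256()
    size = 0
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            digest.update(chunk)
            size += len(chunk)
    return digest.hexdigest(), size


def matches_pointer(path, pointer):
    """Return True when the file's digest and size both match the pointer."""
    oid, size = sha256_of_file(path)
    return oid == pointer["oid"] and size == pointer["size"]
```

With the model downloaded locally, `matches_pointer("final_model.h5", parse_lfs_pointer(POINTER_TEXT))` should return `True` for an intact file. A `False` result suggests a truncated download, or that only the pointer file was fetched.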
images/category_distribution.png ADDED
images/confusion_matrices.png ADDED
images/dataset_category_balance.png ADDED
images/entity_category_heatmap.png ADDED
images/highlighted_sample_1.png ADDED
images/highlighted_sample_2.png ADDED
images/highlighted_sample_3.png ADDED
images/highlighted_sample_4.png ADDED
images/highlighted_sample_5.png ADDED
images/highlighted_sample_6.png ADDED
images/per_category_metrics.png ADDED
images/training_history.png ADDED
labels.json ADDED
@@ -0,0 +1,16 @@
+ [
+   "People Name",
+   "Card Number",
+   "Account Number",
+   "Social Security Number",
+   "Government ID Number",
+   "Date of Birth",
+   "Password",
+   "Tax ID Number",
+   "Phone Number",
+   "Residential Address",
+   "Email Address",
+   "IP Number",
+   "Passport",
+   "Driver License"
+ ]
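The README's usage example zips these 14 labels against `predictions[0]`, which suggests the model emits one score per label in this order. The sketch below turns such a score vector into a list of detected categories. It assumes the scores are independent per-label probabilities. The helper `detected_labels` and the 0.5 threshold are illustrative choices, not values taken from the repository.

```python
# Labels copied from labels.json, in file order.
LABELS = [
    "People Name",
    "Card Number",
    "Account Number",
    "Social Security Number",
    "Government ID Number",
    "Date of Birth",
    "Password",
    "Tax ID Number",
    "Phone Number",
    "Residential Address",
    "Email Address",
    "IP Number",
    "Passport",
    "Driver License",
]


def detected_labels(probabilities, labels=LABELS, threshold=0.5):
    """Return (label, score) pairs at or above the threshold, highest score first.

    Assumes probabilities[i] is the score for labels[i]; the threshold is an
    illustrative default and would need tuning on held-out data.
    """
    if len(probabilities) != len(labels):
        raise ValueError(
            f"expected {len(labels)} scores, got {len(probabilities)}"
        )
    hits = [
        (label, float(score))
        for label, score in zip(labels, probabilities)
        if score >= threshold
    ]
    return sorted(hits, key=lambda pair: pair[1], reverse=True)


# Invented scores for illustration only; these are not real model output.
example_scores = [0.0] * len(LABELS)
example_scores[0] = 0.97  # "People Name"
example_scores[2] = 0.88  # "Account Number"
print(detected_labels(example_scores))
```

Raising the threshold trades recall for precision, so the right value depends on whether a missed PII category or a false alarm is more costly for the application.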
training_perf/training_stats.json ADDED
@@ -0,0 +1,178 @@
+ {
+   "loss": [
+     0.35982558131217957,
+     0.15620803833007812,
+     0.07565116882324219,
+     0.04694047197699547,
+     0.033311400562524796,
+     0.025203850120306015,
+     0.020155949518084526,
+     0.01679529808461666,
+     0.014130683615803719,
+     0.011758599430322647,
+     0.010233336128294468,
+     0.008858845569193363,
+     0.007855775766074657,
+     0.0066641164012253284,
+     0.0059407246299088,
+     0.00519288657233119,
+     0.004634241107851267,
+     0.004398580640554428,
+     0.0034862596075981855,
+     0.0034201345406472683
+   ],
+   "accuracy": [
+     0.2826281785964966,
+     0.5477352142333984,
+     0.6037829518318176,
+     0.6236934065818787,
+     0.637332022190094,
+     0.6406172513961792,
+     0.642210066318512,
+     0.6451966166496277,
+     0.6522648334503174,
+     0.6512693166732788,
+     0.6581383943557739,
+     0.6668989658355713,
+     0.6552513837814331,
+     0.6564459800720215,
+     0.662916898727417,
+     0.6625186800956726,
+     0.6670980453491211,
+     0.6590343713760376,
+     0.6610254049301147,
+     0.6688899993896484
+   ],
+   "precision": [
+     0.622481644153595,
+     0.9491262435913086,
+     0.9798324108123779,
+     0.9886569380760193,
+     0.9905564785003662,
+     0.9930499792098999,
+     0.9945850968360901,
+     0.9945999979972839,
+     0.995506763458252,
+     0.9961159229278564,
+     0.99701327085495,
+     0.9976631999015808,
+     0.9980111122131348,
+     0.9984096884727478,
+     0.9986090660095215,
+     0.9991061091423035,
+     0.9989081621170044,
+     0.9991063475608826,
+     0.9997023940086365,
+     0.9994048476219177
+   ],
+   "recall": [
+     0.16370388865470886,
+     0.6965585350990295,
+     0.8973508477210999,
+     0.9451844692230225,
+     0.9660807251930237,
+     0.9763802886009216,
+     0.9822728633880615,
+     0.9849962592124939,
+     0.9873731136322021,
+     0.9905422329902649,
+     0.991780161857605,
+     0.9936122894287109,
+     0.9939093589782715,
+     0.9948006868362427,
+     0.9953948855400085,
+     0.9962366819381714,
+     0.996682345867157,
+     0.9964842796325684,
+     0.9980688095092773,
+     0.9977717399597168
+   ],
+   "val_loss": [
+     0.21013136208057404,
+     0.07909079641103745,
+     0.04225553944706917,
+     0.028197234496474266,
+     0.021331094205379486,
+     0.01816234365105629,
+     0.015387289226055145,
+     0.013046079315245152,
+     0.011776523664593697,
+     0.011295010335743427,
+     0.01025811955332756,
+     0.00971222948282957,
+     0.009616936556994915,
+     0.00931770820170641,
+     0.008879762142896652,
+     0.008778486400842667,
+     0.008436054922640324,
+     0.00880770105868578,
+     0.008562939241528511,
+     0.008513949811458588
+   ],
+   "val_accuracy": [
+     0.5103503465652466,
+     0.584792971611023,
+     0.6345541477203369,
+     0.6504777073860168,
+     0.6528662443161011,
+     0.6612260937690735,
+     0.6480891704559326,
+     0.6644108295440674,
+     0.6660031676292419,
+     0.6667993664741516,
+     0.6588375568389893,
+     0.6691879034042358,
+     0.6409235596656799,
+     0.6544585824012756,
+     0.6544585824012756,
+     0.6632165312767029,
+     0.6246019005775452,
+     0.6281847357749939,
+     0.6425158977508545,
+     0.6222133636474609
+   ],
+   "val_precision": [
+     0.9616552591323853,
+     0.986707329750061,
+     0.9932293891906738,
+     0.9945641160011292,
+     0.9944178462028503,
+     0.9960119724273682,
+     0.9962226748466492,
+     0.9958391189575195,
+     0.9964384436607361,
+     0.9966376423835754,
+     0.9964413046836853,
+     0.9970332384109497,
+     0.9972304701805115,
+     0.9966409802436829,
+     0.9966409802436829,
+     0.9970361590385437,
+     0.9968404173851013,
+     0.9972337484359741,
+     0.9966429471969604,
+     0.9970367550849915
+   ],
+   "val_recall": [
+     0.49695900082588196,
+     0.8883656859397888,
+     0.9497743844985962,
+     0.9691975712776184,
+     0.978614866733551,
+     0.9799882173538208,
+     0.9831273555755615,
+     0.9860702157020569,
+     0.9880321621894836,
+     0.9886207580566406,
+     0.9888169765472412,
+     0.989013135433197,
+     0.989013135433197,
+     0.989601731300354,
+     0.989601731300354,
+     0.9899941086769104,
+     0.9903864860534668,
+     0.990190327167511,
+     0.990190327167511,
+     0.990190327167511
+   ]
+ }
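The file stores precision and recall but not F1, and it does not mark which epoch had the lowest validation loss. The sketch below derives both from the recorded series. The helpers `f1_scores` and `best_epoch` are hypothetical names, and the embedded lists are only the last five epochs (16 to 20) of the validation series above, copied so the block runs without the file.

```python
def f1_scores(precision, recall):
    """Harmonic mean of precision and recall for each epoch."""
    return [
        2 * p * r / (p + r) if (p + r) > 0 else 0.0
        for p, r in zip(precision, recall)
    ]


def best_epoch(val_loss, first_epoch=1):
    """Return (epoch number, loss) for the lowest validation loss.

    first_epoch is the epoch number of val_loss[0], so a slice of the
    history can still report absolute epoch numbers.
    """
    index = min(range(len(val_loss)), key=val_loss.__getitem__)
    return first_epoch + index, val_loss[index]


# Epochs 16-20 of the validation series in training_stats.json.
VAL_LOSS_TAIL = [
    0.008778486400842667,
    0.008436054922640324,
    0.00880770105868578,
    0.008562939241528511,
    0.008513949811458588,
]
VAL_PRECISION_TAIL = [
    0.9970361590385437,
    0.9968404173851013,
    0.9972337484359741,
    0.9966429471969604,
    0.9970367550849915,
]
VAL_RECALL_TAIL = [
    0.9899941086769104,
    0.9903864860534668,
    0.990190327167511,
    0.990190327167511,
    0.990190327167511,
]

epoch, loss = best_epoch(VAL_LOSS_TAIL, first_epoch=16)
final_f1 = f1_scores(VAL_PRECISION_TAIL, VAL_RECALL_TAIL)[-1]
print(f"lowest val_loss in this slice: epoch {epoch} ({loss:.6f})")
print(f"validation F1 at epoch 20: {final_f1:.4f}")
```

To analyze the full history instead of the slice, load the file with `json.load` and pass `stats["val_loss"]`, `stats["val_precision"]`, and `stats["val_recall"]` to the same helpers. Note that `val_accuracy` stays near 0.62 to 0.67 while precision and recall are above 0.99, so the accuracy series appears to measure something different from per-label correctness and is worth interpreting with care.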