pii_model / README.md
ab-ai's picture
Update README.md
76fcc1a verified
---
license: apache-2.0
base_model: bert-base-cased
tags:
- PII
- NER
- Bert
- Token Classification
datasets:
- generator
metrics:
- precision
- recall
- f1
- accuracy
model-index:
- name: pii_model
results:
- task:
name: Token Classification
type: token-classification
dataset:
name: generator
type: generator
config: default
split: train
args: default
metrics:
- name: Precision
type: precision
value: 0.954751
- name: Recall
type: recall
value: 0.965233
- name: F1
type: f1
value: 0.959964
- name: Accuracy
type: accuracy
value: 0.991199
pipeline_tag: token-classification
language:
- en
---
<!-- This model card has been generated automatically according to the information the Trainer had access to. You
should probably proofread and complete it, then remove this comment. -->
# Personal Identifiable Information (PII Model)
This model is a fine-tuned version of [bert-base-cased](https://huggingface.co/bert-base-cased) on the generator dataset.
It achieves the following results:
- Training Loss: 0.003900
- Validation Loss: 0.051071
- Precision: 95.53%
- Recall: 96.60%
- F1: 96%
- Accuracy:99.11%
## Model description
Meet our digital safeguard, a savvy token classification model with a knack for spotting personally identifiable information (PII) entities. Trained on the illustrious Bert architecture and fine-tuned on a custom dataset, this model is like a superhero for privacy, swiftly detecting names, addresses, dates of birth, and more. With each token it encounters, it acts as a vigilant guardian, ensuring that sensitive information remains shielded from prying eyes, making the digital realm a safer and more secure place to explore.
## Model can Detect Following Entity Group
- ACCOUNTNUMBER
- FIRSTNAME
- ACCOUNTNAME
- PHONENUMBER
- CREDITCARDCVV
- CREDITCARDISSUER
- PREFIX
- LASTNAME
- AMOUNT
- DATE
- DOB
- COMPANYNAME
- BUILDINGNUMBER
- STREET
- SECONDARYADDRESS
- STATE
- EMAIL
- CITY
- CREDITCARDNUMBER
- SSN
- URL
- USERNAME
- PASSWORD
- COUNTY
- PIN
- MIDDLENAME
- IBAN
- GENDER
- AGE
- ZIPCODE
- SEX
### Training hyperparameters
The following hyperparameters were used during training:
| Hyperparameter | Value |
|------------------------------|---------------|
| Learning Rate | 5e-5 |
| Train Batch Size | 16 |
| Eval Batch Size | 16 |
| Number of Training Epochs | 7 |
| Weight Decay | 0.01 |
| Save Strategy | Epoch |
| Load Best Model at End | True |
| Metric for Best Model | F1 |
| Push to Hub | True |
| Evaluation Strategy | Epoch |
| Early Stopping Patience | 3 |
### Training results
| Epoch | Training Loss | Validation Loss | Precision (%) | Recall (%) | F1 Score (%) | Accuracy (%) |
|-------|---------------|-----------------|---------------|------------|--------------|--------------|
| 1 | 0.0443 | 0.038108 | 91.88 | 95.17 | 93.50 | 98.80 |
| 2 | 0.0318 | 0.035728 | 94.13 | 96.15 | 95.13 | 98.90 |
| 3 | 0.0209 | 0.032016 | 94.81 | 96.42 | 95.61 | 99.01 |
| 4 | 0.0154 | 0.040221 | 93.87 | 95.80 | 94.82 | 98.88 |
| 5 | 0.0084 | 0.048183 | 94.21 | 96.06 | 95.13 | 98.93 |
| 6 | 0.0037 | 0.052281 | 94.49 | 96.60 | 95.53 | 99.07 |
### Author
abhijeet[email protected]
### Framework versions
- Transformers 4.38.2
- Pytorch 2.1.0+cu121
- Datasets 2.18.0
- Tokenizers 0.15.2