---
license: apache-2.0
base_model: bert-base-cased
tags:
- PII
- NER
- Bert
- Token Classification
datasets:
- generator
metrics:
- precision
- recall
- f1
- accuracy
model-index:
- name: pii_model
  results:
  - task:
      name: Token Classification
      type: token-classification
    dataset:
      name: generator
      type: generator
      config: default
      split: train
      args: default
    metrics:
    - name: Precision
      type: precision
      value: 0.954751
    - name: Recall
      type: recall
      value: 0.965233
    - name: F1
      type: f1
      value: 0.959964
    - name: Accuracy
      type: accuracy
      value: 0.991199
pipeline_tag: token-classification
language:
- en
---


# Personally Identifiable Information (PII) Model

This model is a fine-tuned version of [bert-base-cased](https://huggingface.co/bert-base-cased) on a custom dataset (named `generator` in the card metadata).
It achieves the following results:

- Training Loss: 0.003900
- Validation Loss: 0.051071
- Precision: 95.53%
- Recall: 96.60%
- F1: 96%
- Accuracy: 99.11%
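
The F1 score is the harmonic mean of precision and recall. As a quick consistency check, the sketch below recomputes it from the precision (0.954751) and recall (0.965233) stored in the card's metadata; the result agrees with the stored F1 of 0.959964 to five decimal places. The helper name `f1_score` is illustrative and not taken from any library.

```python
def f1_score(precision: float, recall: float) -> float:
    """Harmonic mean of precision and recall (both as fractions in [0, 1])."""
    if precision + recall == 0:
        return 0.0  # avoid division by zero when nothing is predicted or recalled
    return 2 * precision * recall / (precision + recall)


# Precision and recall taken from the model-index metadata of this card.
METADATA_F1 = f1_score(0.954751, 0.965233)  # about 0.95996, matching the stored 0.959964
```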

## Model description

Meet our digital safeguard: a token classification model with a knack for spotting personally identifiable information (PII) entities. Built on the BERT architecture (`bert-base-cased`) and fine-tuned on a custom dataset, it detects names, addresses, dates of birth, and the other entity groups listed below. It assigns a label to every token it reads, so the spans it flags can be masked or removed before text is stored or shared, helping keep sensitive information away from prying eyes.
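
Because the model labels individual tokens, detecting and masking PII takes two steps: run the token-classification pipeline, then replace the returned character spans. The sketch below assumes the model is published on the Hugging Face Hub as `ab-ai/pii_model` (taken from the repository path) and that Transformers and PyTorch are installed. `detect_pii`, `redact`, and the 0.5 score cut-off are illustrative choices, not part of the model.

```python
from functools import lru_cache
from typing import Dict, List

MODEL_ID = "ab-ai/pii_model"  # assumed Hub repository id


@lru_cache(maxsize=1)
def _get_pipeline():
    # Imported lazily so `redact` can be used without Transformers installed.
    from transformers import pipeline

    # aggregation_strategy="simple" merges sub-word tokens into entity groups and
    # returns dicts with entity_group, score, word, start and end keys.
    return pipeline("token-classification", model=MODEL_ID, aggregation_strategy="simple")


def detect_pii(text: str) -> List[Dict]:
    """Return the PII entity spans the model detects in `text`."""
    return _get_pipeline()(text)


def redact(text: str, entities: List[Dict], min_score: float = 0.5) -> str:
    """Replace each entity span scoring at least `min_score` with a [GROUP] placeholder."""
    # Replace from the end of the string backwards so earlier offsets stay valid.
    for entity in sorted(entities, key=lambda e: e["start"], reverse=True):
        if entity["score"] >= min_score:
            placeholder = f"[{entity['entity_group']}]"
            text = text[: entity["start"]] + placeholder + text[entity["end"] :]
    return text
```

For instance, `redact(text, detect_pii(text))` returns `text` with every span detected above the cut-off replaced by its group name in brackets, such as `[EMAIL]`; the exact group names depend on the model's `id2label` configuration.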

## Entity groups the model can detect

- ACCOUNTNUMBER
- FIRSTNAME
- ACCOUNTNAME
- PHONENUMBER
- CREDITCARDCVV
- CREDITCARDISSUER
- PREFIX
- LASTNAME
- AMOUNT
- DATE
- DOB
- COMPANYNAME
- BUILDINGNUMBER
- STREET
- SECONDARYADDRESS
- STATE
- EMAIL
- CITY
- CREDITCARDNUMBER
- SSN
- URL
- USERNAME
- PASSWORD
- COUNTY
- PIN
- MIDDLENAME
- IBAN
- GENDER
- AGE
- ZIPCODE
- SEX
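
Token classification assigns a label to each token, and adjacent tokens are then merged into the entity groups listed above. The card does not state the tagging scheme, so the sketch below assumes the common BIO convention (`B-FIRSTNAME` opens an entity, `I-FIRSTNAME` continues it, `O` marks a non-PII token) and word-level tokens. In practice the Transformers pipeline's `aggregation_strategy` option performs this merging, including sub-word handling; `merge_bio` is a hypothetical helper for illustration only.

```python
from typing import List, Optional, Tuple


def merge_bio(tokens: List[str], labels: List[str]) -> List[Tuple[str, str]]:
    """Group word-level BIO labels into (entity_group, text) spans."""
    entities: List[Tuple[str, str]] = []
    current_group: Optional[str] = None
    current_tokens: List[str] = []
    for token, label in zip(tokens, labels):
        prefix, _, group = label.partition("-")  # "B-EMAIL" -> ("B", "-", "EMAIL")
        if prefix == "I" and group == current_group:
            current_tokens.append(token)  # continue the open entity
            continue
        # Any other label closes the open entity, if there is one.
        if current_group is not None:
            entities.append((current_group, " ".join(current_tokens)))
        if prefix in ("B", "I"):
            # A stray I- tag without a matching open entity starts a new one.
            current_group, current_tokens = group, [token]
        else:
            current_group, current_tokens = None, []
    if current_group is not None:
        entities.append((current_group, " ".join(current_tokens)))
    return entities
```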




### Training hyperparameters
The following hyperparameters were used during training:

| Hyperparameter | Value |
|------------------------------|---------------|
| Learning Rate | 5e-5 |
| Train Batch Size | 16 |
| Eval Batch Size | 16 |
| Number of Training Epochs | 7 |
| Weight Decay | 0.01 |
| Save Strategy | Epoch |
| Load Best Model at End | True |
| Metric for Best Model | F1 |
| Push to Hub | True |
| Evaluation Strategy | Epoch |
| Early Stopping Patience | 3 |
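
The table maps onto Transformers' `TrainingArguments` plus an `EarlyStoppingCallback`. The sketch below is a reconstruction, not the author's training script: it assumes the batch sizes are per device, uses argument names as of Transformers 4.38.2 (later releases rename `evaluation_strategy` to `eval_strategy`), and assumes the `compute_metrics` function returns a metric named `f1`. `TRAINING_KWARGS` and `build_training_setup` are hypothetical names.

```python
from typing import Any, Dict

# Values copied from the hyperparameter table above.
TRAINING_KWARGS: Dict[str, Any] = {
    "learning_rate": 5e-5,
    "per_device_train_batch_size": 16,  # assumption: "Train Batch Size" is per device
    "per_device_eval_batch_size": 16,
    "num_train_epochs": 7,
    "weight_decay": 0.01,
    "save_strategy": "epoch",
    "evaluation_strategy": "epoch",  # named `eval_strategy` in later Transformers releases
    "load_best_model_at_end": True,
    "metric_for_best_model": "f1",  # compute_metrics must return an "f1" key
    "push_to_hub": True,  # requires a Hugging Face Hub login before training
}
EARLY_STOPPING_PATIENCE = 3


def build_training_setup(output_dir: str):
    """Create TrainingArguments and an early-stopping callback from the table."""
    # Imported lazily so the settings above can be inspected without Transformers.
    from transformers import EarlyStoppingCallback, TrainingArguments

    args = TrainingArguments(output_dir=output_dir, **TRAINING_KWARGS)
    callback = EarlyStoppingCallback(early_stopping_patience=EARLY_STOPPING_PATIENCE)
    return args, callback
```

The returned `args` and `callback` would be passed to `Trainer(..., args=args, callbacks=[callback])` together with the model, datasets, and metric function.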


### Training results

| Epoch | Training Loss | Validation Loss | Precision (%) | Recall (%) | F1 Score (%) | Accuracy (%) |
|-------|---------------|-----------------|---------------|------------|--------------|--------------|
| 1 | 0.0443 | 0.038108 | 91.88 | 95.17 | 93.50 | 98.80 |
| 2 | 0.0318 | 0.035728 | 94.13 | 96.15 | 95.13 | 98.90 |
| 3 | 0.0209 | 0.032016 | 94.81 | 96.42 | 95.61 | 99.01 |
| 4 | 0.0154 | 0.040221 | 93.87 | 95.80 | 94.82 | 98.88 |
| 5 | 0.0084 | 0.048183 | 94.21 | 96.06 | 95.13 | 98.93 |
| 6 | 0.0037 | 0.052281 | 94.49 | 96.60 | 95.53 | 99.07 |
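
Seven epochs were configured, but the table lists six. Replaying the F1 column through patience-based early stopping (patience 3, F1 as the monitored metric, both from the hyperparameter table) suggests why: under this logic the best F1 is reached at epoch 3, and epochs 4 to 6 do not improve on it. `replay_early_stopping` is a simplified, hypothetical model of that logic with a zero improvement threshold, not Transformers' `EarlyStoppingCallback` itself.

```python
from typing import Optional, Sequence, Tuple


def replay_early_stopping(
    scores: Sequence[float], patience: int, threshold: float = 0.0
) -> Tuple[int, int]:
    """Return (best_epoch, stop_epoch) for a metric where higher is better.

    The wait counter resets whenever a score beats the best so far by more
    than `threshold`; training stops once `patience` consecutive
    evaluations fail to do so.
    """
    best_score: Optional[float] = None
    best_epoch = 0
    wait = 0
    for epoch, score in enumerate(scores, start=1):
        if best_score is None or score - best_score > threshold:
            best_score, best_epoch, wait = score, epoch, 0
        else:
            wait += 1
            if wait >= patience:
                return best_epoch, epoch
    return best_epoch, len(scores)


# F1 Score (%) column from the training-results table, epochs 1 to 6.
F1_BY_EPOCH = [93.50, 95.13, 95.61, 94.82, 95.13, 95.53]
```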






### Author
abhijeet [email protected]

### Framework versions

- Transformers 4.38.2
- PyTorch 2.1.0+cu121
- Datasets 2.18.0
- Tokenizers 0.15.2
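
To reproduce the training environment, it can help to compare installed packages against the versions listed above. The minimal sketch below uses the standard library's `importlib.metadata`; the PyPI distribution names (for example `torch` for PyTorch) are assumptions about how the packages were installed, and `version_mismatches` is a hypothetical helper. The comparison is an exact string match, so a CPU-only `torch` build (a version without `+cu121`) is also reported.

```python
from importlib.metadata import PackageNotFoundError, version
from typing import Dict, Optional

# Versions listed under "Framework versions", keyed by assumed PyPI distribution name.
EXPECTED_VERSIONS: Dict[str, str] = {
    "transformers": "4.38.2",
    "torch": "2.1.0+cu121",
    "datasets": "2.18.0",
    "tokenizers": "0.15.2",
}


def version_mismatches(
    expected: Dict[str, str] = EXPECTED_VERSIONS,
) -> Dict[str, Optional[str]]:
    """Map each package whose installed version differs to that version (None if absent)."""
    mismatches: Dict[str, Optional[str]] = {}
    for package, wanted in expected.items():
        try:
            installed: Optional[str] = version(package)
        except PackageNotFoundError:
            installed = None  # package is not installed at all
        if installed != wanted:
            mismatches[package] = installed
    return mismatches
```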