Model Card: Redact-V1 PII Detection Model

This model is designed to automatically detect and redact personally identifiable information (PII) from text. It leverages a deep learning architecture implemented in TensorFlow and fine-tuned on a curated dataset.

Overview

The Redact-V1 model is built for PII detection, with applications in data redaction and privacy preservation. It was trained and evaluated on the Redact-V1 dataset; the recorded metrics are described under Model Details.

Model Details

Training metrics (loss, accuracy, precision, and recall) are recorded in the training performance file. Visualizations of model performance, including confusion matrices and training history, are available in the images folder.
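
For readers less familiar with the metrics named above, precision and recall can be computed from true-positive, false-positive, and false-negative counts. The sketch below uses the standard definitions; the counts are made-up illustrative values, not results from this model, and `precision_recall` is a hypothetical helper name.

```python
def precision_recall(tp: int, fp: int, fn: int) -> tuple:
    """Compute precision and recall from raw counts (standard definitions)."""
    # Precision: of everything flagged as PII, the fraction that really was PII.
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    # Recall: of all real PII, the fraction that was flagged.
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    return precision, recall


# Illustrative counts only; these are not measurements of Redact-V1.
p, r = precision_recall(tp=80, fp=20, fn=10)
print(f"precision={p:.2f} recall={r:.2f}")
```

For a redaction tool, recall is usually the metric to watch most closely, since a missed entity leaks data while a false positive only over-redacts.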

Supported Classes

The model supports the following PII classes:

  • People Name
  • Card Number
  • Account Number
  • Social Security Number
  • Government ID Number
  • Date of Birth
  • Password
  • Tax ID Number
  • Phone Number
  • Residential Address
  • Email Address
  • IP Number
  • Passport
  • Driver License

Usage

Below is sample code to load and use the model in a Python environment:

import json
import tensorflow as tf
import tensorflow_hub as hub  # provides KerasLayer for load_model's custom_objects

# Paths to the model and labels.
MODEL_PATH = r"final_model.h5"
LABELS_PATH = r"labels.json"

def load_labels(labels_file):
    with open(labels_file, 'r', encoding='utf-8') as f:
        return json.load(f)

def main():
    print("Loading model from:", MODEL_PATH)
    model = tf.keras.models.load_model(MODEL_PATH, custom_objects={'KerasLayer': hub.KerasLayer})
    print("Model loaded successfully.")

    labels = load_labels(LABELS_PATH)
    print("Loaded labels:", labels)

    # Sample sentence for testing.
    sample_sentence = "John Doe's account number 1234567890 was flagged for review due to unusual activity."
    print("Sample sentence:", sample_sentence)

    # Run prediction. Wrapping the sentence in a string tensor makes the batch of one explicit.
    predictions = model.predict(tf.constant([sample_sentence]))
    print("Predictions:")
    for label, prob in zip(labels, predictions[0]):
        print(f"{label}: {prob:.2f}")

if __name__ == "__main__":
    main()
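
The sample above prints one probability per class. A minimal sketch of turning those scores into a set of detected PII classes follows, assuming the model emits an independent probability for each class and that `labels.json` holds a flat list of class names in output order; neither is confirmed by this card. The helper name `detect_classes` and the 0.5 threshold are illustrative choices, not part of the released code.

```python
from typing import Dict, Sequence


def detect_classes(
    labels: Sequence[str],
    probabilities: Sequence[float],
    threshold: float = 0.5,  # assumed cutoff; tune it on validation data
) -> Dict[str, float]:
    """Return the classes whose score meets the threshold, highest first."""
    if len(labels) != len(probabilities):
        raise ValueError("labels and probabilities must have the same length")
    hits = {
        label: float(prob)
        for label, prob in zip(labels, probabilities)
        if prob >= threshold
    }
    # Sort by descending score so the most likely class comes first.
    return dict(sorted(hits.items(), key=lambda item: item[1], reverse=True))


# Hypothetical scores for three of the supported classes.
example_labels = ["People Name", "Account Number", "Email Address"]
example_scores = [0.91, 0.97, 0.03]
detected = detect_classes(example_labels, example_scores)
print(detected)
```

With a real model, `predictions[0]` from the sample would take the place of `example_scores`. A lower threshold trades more false positives for fewer missed entities, which may suit redaction workloads.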

Training Data & Source Code

License

This project is licensed under the Apache-2.0 license.
