PII detection and redaction using an NER model

Here we provide code to:

  • fine-tune an encoder model (like StarEncoder) for the task of PII detection (NER): see folder pii_train_ner
  • run inference with our fine-tuned StarPII for PII detection on multiple GPUs: see folder pii_inference
  • redact/mask PII detected with the model: see folder pii_redaction

This is the code we used to anonymize PII in StarCoderData, an 800GB dataset.
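
To illustrate the redaction step listed above, here is a minimal sketch of how detected spans might be replaced with placeholder tokens. It is not the code in pii_redaction: the function name `redact`, the entity dictionary keys (`start`, `end`, `tag`), and the `<TAG>` placeholder format are all assumptions made for this example, and the entity list is hand-written rather than produced by StarPII.

```python
from typing import Dict, List


def redact(text: str, entities: List[Dict], template: str = "<{tag}>") -> str:
    """Replace each detected character span with a placeholder built from its tag.

    `entities` is a hypothetical format: dicts with `start`, `end` (exclusive)
    and `tag` keys, as an NER model's post-processing might produce.
    """
    pieces = []
    cursor = 0
    # Walk the spans left to right so the untouched text between them is kept.
    for ent in sorted(entities, key=lambda e: e["start"]):
        if ent["start"] < cursor:
            continue  # skip a span that overlaps one already redacted
        pieces.append(text[cursor:ent["start"]])
        pieces.append(template.format(tag=ent["tag"]))
        cursor = ent["end"]
    pieces.append(text[cursor:])
    return "".join(pieces)


# Hand-written example input (not model output).
sample = "contact: jane@example.com, key=abc123"
detected = [
    {"start": 9, "end": 25, "tag": "EMAIL"},
    {"start": 31, "end": 37, "tag": "KEY"},
]
redacted = redact(sample, detected)
print(redacted)
```

Building the output from slices between spans, rather than editing the string in place, keeps the character offsets valid even when a placeholder is longer or shorter than the text it replaces.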