# PII detection and redaction using an NER model
Here we provide code to:
- fine-tune an encoder model (like [StarEncoder](https://huggingface.co/bigcode/starencoder)) for the task of PII detection (NER): see folder `pii_train_ner`
- run inference with our fine-tuned [StarPII](https://huggingface.co/bigcode/starpii) for PII detection on multiple GPUs: see folder `pii_inference`
- redact/mask PII detected with the model: see folder `pii_redaction`

This is the code we used to anonymize PII in the 800 GB [StarCoderData](https://huggingface.co/datasets/bigcode/starcoderdata) dataset.
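
To illustrate the redaction step, here is a minimal sketch of masking character spans that a NER model has already detected. It is not the code in `pii_redaction`: the function name `redact_pii`, the entity dictionary layout (`tag`, `start`, `end`), the tag names, and the `<TAG>` placeholder format are all assumptions made for this example.

```python
from typing import Dict, List


def redact_pii(text: str, entities: List[Dict]) -> str:
    """Replace each detected span with a placeholder such as <EMAIL>.

    `entities` is a hypothetical format: dicts with a `tag` string and
    `start`/`end` character offsets (end exclusive) into `text`.
    """
    pieces = []
    cursor = 0
    # Walk the spans left to right; skip a span that overlaps one already masked.
    for ent in sorted(entities, key=lambda e: e["start"]):
        if ent["start"] < cursor:
            continue
        pieces.append(text[cursor:ent["start"]])  # untouched text before the span
        pieces.append(f"<{ent['tag']}>")          # placeholder instead of the PII
        cursor = ent["end"]
    pieces.append(text[cursor:])                  # remaining text after the last span
    return "".join(pieces)


# Made-up sample input; the offsets stand in for model predictions.
email = "jane@example.com"
ip = "10.0.0.1"
sample = f"Contact {email} or connect to {ip}"
detected = [
    {"tag": "IP_ADDRESS", "start": sample.index(ip), "end": sample.index(ip) + len(ip)},
    {"tag": "EMAIL", "start": sample.index(email), "end": sample.index(email) + len(email)},
]

redacted = redact_pii(sample, detected)
print(redacted)  # Contact <EMAIL> or connect to <IP_ADDRESS>
```

Building the output from slices in a single left-to-right pass keeps the original offsets valid for every span, which would not hold if the text were modified in place span by span.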