ab-ai committed · Commit 02a0a08 · verified · 1 Parent(s): c43e5dc

Create README.md

How to use the model added

Files changed (1):
  1. README.md +60 -0

# PII Guardian PHI3 Mini LORA

This repository contains a fine-tuned model for detecting Personally Identifiable Information (PII). It is built on Phi-3 Mini with LoRA adapters applied to the query, key, and value projections, and it is optimized for accuracy and efficiency across a wide range of PII entity types.
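
For reference, a LoRA setup targeting the attention projections can be expressed with `peft` as in the sketch below. This is a minimal illustration, not the actual training configuration: the rank, alpha, and dropout values are placeholders, and note that Phi-3 Mini fuses the query, key, and value projections into a single `qkv_proj` module.

```python
from peft import LoraConfig

# Hypothetical adapter configuration -- r, lora_alpha, and lora_dropout are
# placeholders, not the values used to train this model.
lora_config = LoraConfig(
    r=16,
    lora_alpha=32,
    lora_dropout=0.05,
    # Phi-3 Mini fuses the query/key/value projections into one qkv_proj module
    target_modules=["qkv_proj"],
    task_type="CAUSAL_LM",
)
```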

## How to Use

You can use the following Python code to load and run the model:
```python
import torch
from peft import PeftModel
from transformers import AutoModelForCausalLM, AutoTokenizer

# Pick the best available device
if torch.cuda.is_available():
    device = "cuda"
elif torch.backends.mps.is_available():
    device = "mps"
else:
    device = "cpu"

print(f"Using device: {device}")

# Check for bf16 support (unsupported dtypes raise TypeError or RuntimeError,
# depending on the backend)
is_bf16_supported = False
try:
    torch.tensor([1.0], dtype=torch.bfloat16, device=device)
    is_bf16_supported = True
    print("bf16 tensors are supported.")
except (TypeError, RuntimeError):
    print("bf16 tensors are not supported.")

# Load the tokenizer for the base model
base_model = "microsoft/Phi-3-mini-4k-instruct"
tokenizer = AutoTokenizer.from_pretrained(base_model)

# Load the base model, then attach the fine-tuned PII-detection LoRA adapter
model = AutoModelForCausalLM.from_pretrained(
    base_model,
    return_dict=True,
    device_map=device,
    torch_dtype=torch.bfloat16 if is_bf16_supported else torch.float16,
)
pii_model = PeftModel.from_pretrained(model, "ab-ai/PII-Guardian-Phi3-Mini-LORA")

# Example input text
input_text = "Hi Abner, just a reminder that your next primary care appointment is on 23/03/1926. Please confirm by replying to this email [email protected]."

# Build the model prompt
model_prompt = f"""### Instruction:
Identify and extract the following PII entities from the text, if present: companyname, pin, currencyname, email, phoneimei, litecoinaddress, currency, eyecolor, street, mac, state, time, vehiclevin, jobarea, date, bic, currencysymbol, currencycode, age, nearbygpscoordinate, amount, ssn, ethereumaddress, zipcode, buildingnumber, dob, firstname, middlename, ordinaldirection, jobtitle, bitcoinaddress, jobtype, phonenumber, height, password, ip, useragent, accountname, city, gender, secondaryaddress, iban, sex, prefix, ipv4, maskednumber, url, username, lastname, creditcardcvv, county, vehiclevrm, ipv6, creditcardissuer, accountnumber, creditcardnumber. Return the output in JSON format.

### Input:
{input_text}

### Output:

"""

# Tokenize the prompt and generate a response; adjust max_new_tokens to fit
# the expected output length, and set do_sample=False for deterministic output
inputs = tokenizer(model_prompt, return_tensors="pt").to(device)
outputs = pii_model.generate(**inputs, do_sample=True, max_new_tokens=120)
response = tokenizer.decode(outputs[0], skip_special_tokens=True)
print(response)
```
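
Since the prompt asks the model to return JSON after the `### Output:` marker, you will usually want to strip the echoed prompt and parse the entities programmatically. Below is a minimal sketch continuing from the snippet above; it assumes the generation follows the prompt template, and guards the parse because the model may still emit malformed JSON.

```python
import json

# The decoded response echoes the prompt; the model's answer follows the
# "### Output:" marker (assuming it respects the prompt template).
raw_output = response.split("### Output:")[-1].strip()

try:
    entities = json.loads(raw_output)
    print(entities)
except json.JSONDecodeError:
    # Fall back to the raw text if the model produced malformed JSON
    print("Could not parse JSON:", raw_output)
```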