---
license: mit
library_name: peft
tags:
- trl
- sft
- generated_from_trainer
base_model: microsoft/Phi-3-mini-4k-instruct
datasets:
- shujatoor/ner_instruct-chat
model-index:
- name: checkpoint_dir
  results: []
---

# phi3nedtuned-ner

This model is a fine-tuned version of [microsoft/Phi-3-mini-4k-instruct](https://huggingface.co/microsoft/Phi-3-mini-4k-instruct), trained on the [shujatoor/ner_instruct-chat](https://huggingface.co/datasets/shujatoor/ner_instruct-chat) dataset.
It achieves the following results on the evaluation set:
- Loss: 0.6568

## Inference

The adapter targets [microsoft/Phi-3-mini-4k-instruct](https://huggingface.co/microsoft/Phi-3-mini-4k-instruct): load the base model first, then attach the fine-tuned adapter.

```python
import torch
from peft import PeftModel
from transformers import AutoModelForCausalLM, AutoTokenizer, pipeline

torch.random.manual_seed(0)

# Load the base model, then attach the fine-tuned adapter.
model = AutoModelForCausalLM.from_pretrained(
    "microsoft/Phi-3-mini-4k-instruct",
    device_map="cuda",
    torch_dtype="auto",
    trust_remote_code=True,
)
model = PeftModel.from_pretrained(model, "shujatoor/phi3nedtuned-ner")
tokenizer = AutoTokenizer.from_pretrained("shujatoor/phi3nedtuned-ner")

# Example: extract an entity from the OCR text of a pharmacy receipt.
text = "Hasan Pharmacy Madina Market Mustafa Chowk.PCsiR Staff Society College Road, Lahore Drug Lic#441-A/AIT No.1023874 24/04/202422:18:03 M/s*CASH SALES-WALKING CUST Remarks: Ref.: Item Name Qty Price Total Advant Tab 16mg 28 37.50 1050.00 Kepra 500mg Tab 30 85.91 2577.30 Kabrokin 200mg 240 10.67 2560.80 Tab Myteka 10mg Tab 14 37.71 527.94 Cipocain Ear/drops 1 168.00 168.00 Medicam T/paste 1 240.00 240.00 100gm Total items:6 Gross Total : 7,124.04 Disc: 523.68 DR.HASAN Net Total. 6,600.00 (Computer Software developed by Abuzar Consultancy Ph 042-37426911-15)."
qs = f"{text} What is the drug license number of the store?"
print("Question:", qs, "\n")

messages = [
    {"role": "user", "content": qs},
]

pipe = pipeline(
    "text-generation",
    model=model,
    tokenizer=tokenizer,
)

generation_args = {
    "max_new_tokens": 512,
    "return_full_text": False,  # return only the newly generated text
    "do_sample": False,         # greedy decoding for deterministic output
}

output = pipe(messages, **generation_args)
print("Answer:", output[0]["generated_text"], "\n")

# Expected answer:
# Answer: 441-A/AIT No.1023874
```
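
If a standalone checkpoint without the PEFT wrapper is preferred (for example, before quantizing or serving), the adapter can be folded into the base weights. Below is a minimal sketch using PEFT's `merge_and_unload`; the output directory name is illustrative:

```python
# Fold the LoRA adapter weights into the base model and drop the PEFT wrapper.
merged = model.merge_and_unload()

# Save the merged model and tokenizer (directory name is illustrative).
merged.save_pretrained("phi3nedtuned-ner-merged")
tokenizer.save_pretrained("phi3nedtuned-ner-merged")
```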

## Intended uses & limitations

Named entity recognition (NER) over noisy, receipt-style text: entities are extracted by posing a natural-language question about the input, as in the inference example above. Limitations beyond those of the base Phi-3-mini model are not documented.

## Training and evaluation data

The model was fine-tuned on the [shujatoor/ner_instruct-chat](https://huggingface.co/datasets/shujatoor/ner_instruct-chat) dataset (see the `datasets` field above); details of how the data was collected and split are not provided.

## Training procedure

### Training hyperparameters

The following hyperparameters were used during training (a reproduction sketch follows the list):
- learning_rate: 0.0002
- train_batch_size: 1
- eval_batch_size: 1
- seed: 0
- optimizer: Adam with betas=(0.9,0.999) and epsilon=1e-08
- lr_scheduler_type: cosine
- lr_scheduler_warmup_ratio: 0.2
- num_epochs: 1
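
Below is a minimal sketch of how these hyperparameters map onto a PEFT + TRL `SFTTrainer` run. The LoRA settings (`r`, `lora_alpha`, `target_modules`) and the dataset formatting are not recorded in this card, so those values are illustrative assumptions:

```python
from datasets import load_dataset
from peft import LoraConfig
from transformers import AutoModelForCausalLM, AutoTokenizer, TrainingArguments
from trl import SFTTrainer

model_id = "microsoft/Phi-3-mini-4k-instruct"
model = AutoModelForCausalLM.from_pretrained(model_id, trust_remote_code=True)
tokenizer = AutoTokenizer.from_pretrained(model_id, trust_remote_code=True)

dataset = load_dataset("shujatoor/ner_instruct-chat")

# Assumed LoRA configuration -- the actual rank/alpha/targets are not documented.
peft_config = LoraConfig(
    r=16,
    lora_alpha=32,
    target_modules=["qkv_proj", "o_proj"],
    task_type="CAUSAL_LM",
)

# Hyperparameters as listed above.
args = TrainingArguments(
    output_dir="checkpoint_dir",
    learning_rate=2e-4,
    per_device_train_batch_size=1,
    per_device_eval_batch_size=1,
    seed=0,
    lr_scheduler_type="cosine",
    warmup_ratio=0.2,
    num_train_epochs=1,
)

trainer = SFTTrainer(
    model=model,
    args=args,
    train_dataset=dataset["train"],
    peft_config=peft_config,
    tokenizer=tokenizer,  # depending on the dataset schema, a formatting_func may be needed
)
trainer.train()
```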

### Training results

The final evaluation loss was 0.6568; per-step training logs are not included in this card.

### Framework versions

- PEFT 0.10.1.dev0
- Transformers 4.41.0.dev0
- Pytorch 2.2.1+cu121
- Datasets 2.19.0
- Tokenizers 0.19.1

### License

The model is licensed under the MIT license.