abdoelsayed committed on
Commit
5984321
·
1 Parent(s): f937d4b

Create README.md

  1. README.md +78 -0
README.md ADDED
@@ -0,0 +1,78 @@
+ ---
+ license: llama2
+ language:
+ - en
+ - ar
+ metrics:
+ - accuracy
+ - f1
+ library_name: transformers
+ ---
+
+ # llama-7b-v2-Receipt-Key-Extraction
+
+ llama-7b-v2-Receipt-Key-Extraction is a 7-billion-parameter model based on LLaMA v1.
+
+ [AMuRD: Annotated Multilingual Receipts Dataset for Cross-lingual Key Information Extraction and Classification](https://arxiv.org/abs/2309.09800)
+
+ ## Uses
+
+ The model is intended for research use only, in English and Arabic, for extracting key information about items on receipts.
+
+ ## How to Get Started with the Model
+
+ Use the code below to get started with the model.
+
+ ```python
+ # pip install -q transformers torch sentencepiece
+
+ import torch
+ from transformers import AutoModelForCausalLM, AutoTokenizer, GenerationConfig
+
+ checkpoint = "abdoelsayed/llama-7b-v2-Receipt-Key-Extraction"
+ device = "cuda" if torch.cuda.is_available() else "cpu"
+
+ tokenizer = AutoTokenizer.from_pretrained(checkpoint, model_max_length=512,
+                                           padding_side="right",
+                                           use_fast=False)
+ model = AutoModelForCausalLM.from_pretrained(checkpoint).to(device)
+
+ def generate_response(instruction, input_text, max_new_tokens=100, temperature=0.1, top_p=0.75, num_beams=4, top_k=40):
+     prompt = f"Below is an instruction that describes a task, paired with an input that provides further context.\n\n### Instruction:\n{instruction}\n\n### Input:\n{input_text}\n\n### Response:"
+     inputs = tokenizer(prompt, return_tensors="pt")
+     input_ids = inputs["input_ids"].to(device)
+     generation_config = GenerationConfig(
+         temperature=temperature,
+         top_p=top_p,  # the 0.75 default is an assumption; the original snippet left top_p undefined
+         top_k=top_k,
+         num_beams=num_beams,
+     )
+     with torch.no_grad():
+         outputs = model.generate(input_ids, generation_config=generation_config, max_new_tokens=max_new_tokens)
+     output = tokenizer.decode(outputs[0])  # generate() returns a tensor of token ids by default
+     return output.split("### Response:")[-1].replace("</s>", "").strip()  # keep only the text after the response marker
+
+ instruction = "Extract the class, Brand, Weight, Number of units, Size of units, Price, T.Price, Pack, Unit from the following sentence"
+ input_text = "Americana Okra zero 400 gm"
+
+ response = generate_response(instruction, input_text)
+ print(response)
+
+ ```
+
+
+
+ ## How to Cite
+
+ Please cite this model using this format.
+
+ ```bibtex
+ @misc{abdallah2023amurd,
+   title={AMuRD: Annotated Multilingual Receipts Dataset for Cross-lingual Key Information Extraction and Classification},
+   author={Abdelrahman Abdallah and Mahmoud Abdalla and Mohamed Elkasaby and Yasser Elbendary and Adam Jatowt},
+   year={2023},
+   eprint={2309.09800},
+   archivePrefix={arXiv},
+   primaryClass={cs.CL}
+ }
+ ```
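
The prompt handling inside `generate_response` in the README above can be checked without loading the 7B checkpoint. What follows is a minimal sketch under stated assumptions: the helper names `build_prompt` and `extract_response` and the constant `PROMPT_TEMPLATE` are hypothetical and are not part of the README or of `transformers`. The template text and the `### Response:` marker are copied from the README snippet.

```python
# Hypothetical helpers: these names are illustrative, not part of the
# README or of the transformers library.
PROMPT_TEMPLATE = (
    "Below is an instruction that describes a task, paired with an input "
    "that provides further context.\n\n"
    "### Instruction:\n{instruction}\n\n"
    "### Input:\n{input_text}\n\n"
    "### Response:"
)


def build_prompt(instruction: str, input_text: str) -> str:
    """Fill the instruction/input template used in the README snippet."""
    return PROMPT_TEMPLATE.format(instruction=instruction, input_text=input_text)


def extract_response(decoded: str) -> str:
    """Keep only the text after the last response marker, without the EOS token."""
    return decoded.split("### Response:")[-1].replace("</s>", "").strip()
```

Separating these two steps from the `model.generate` call means the string handling can be unit-tested on its own, for example by passing `build_prompt(...)` plus a placeholder continuation to `extract_response` and checking that only the continuation comes back.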