---
library_name: transformers
tags: []
---
# Model Card for mychen76/paligemma-receipt-json-3b-mix-448-v2b
Extracts POS receipt image data into a JSON record.
## Model Details
A fine-tuned version of Google's PaliGemma model for extracting receipt image data into a JSON record.

Gradio demo app: https://github.com/minyang-chen/paligemma-receipt-json-v2
### Model Usage
Setup Environment
```
pip install transformers==4.42.2
pip install datasets
pip install peft accelerate bitsandbytes
```
Specify Device
```
import torch
device = torch.device("cuda" if torch.cuda.is_available() else "cpu")
device_map={"":0}
```
Step-1 Load Image Processor
```
from transformers import AutoProcessor
FINETUNED_MODEL_ID = "mychen76/paligemma-receipt-json-3b-mix-448-v2b"
processor = AutoProcessor.from_pretrained(FINETUNED_MODEL_ID)
```
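The Step-2 snippet below expects a `test_image`. As a minimal sketch, load a receipt image with PIL (the file path is just a placeholder):
```
from PIL import Image

# Load a receipt image from disk (replace the path with your own receipt image)
test_image = Image.open("receipt.jpg").convert("RGB")
```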
Step-2 Set Task Prompt
```
TASK_PROMPT = "EXTRACT_JSON_RECEIPT"
MAX_LENGTH = 512
inputs = processor(text=TASK_PROMPT, images=test_image, return_tensors="pt").to(device)
for k, v in inputs.items():
    print(k, v.shape)
```
Step-3 Load Model
```
import torch
from transformers import PaliGemmaForConditionalGeneration
from transformers import BitsAndBytesConfig
from peft import get_peft_model, LoraConfig
# Load Full model
model = PaliGemmaForConditionalGeneration.from_pretrained(FINETUNED_MODEL_ID,device_map={"":0})
```
Or load a quantized model (Q-LoRA):
```
# Q-LoRA: 4-bit quantization config
bnb_config = BitsAndBytesConfig(
    load_in_4bit=True,
    bnb_4bit_quant_type="nf4",
    bnb_4bit_compute_dtype=torch.bfloat16
)

# LoRA config (used when attaching adapters for further fine-tuning; not applied during plain inference)
lora_config = LoraConfig(
    r=8,
    target_modules=["q_proj", "o_proj", "k_proj", "v_proj", "gate_proj", "up_proj", "down_proj"],
    task_type="CAUSAL_LM"
)

model = PaliGemmaForConditionalGeneration.from_pretrained(FINETUNED_MODEL_ID, quantization_config=bnb_config, device_map={"": 0})
```
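The `lora_config` above is only needed if you plan to continue fine-tuning the quantized model. A minimal sketch (illustrative, not required for inference) of attaching fresh LoRA adapters with PEFT:
```
from peft import get_peft_model

# Wrap the quantized base model with trainable LoRA adapters (for further fine-tuning only)
peft_model = get_peft_model(model, lora_config)
peft_model.print_trainable_parameters()
```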
Step-4 Inference
```
# Autoregressively generate using greedy decoding here; for fancier methods see https://huggingface.co/blog/how-to-generate
generated_ids = model.generate(**inputs, max_new_tokens=MAX_LENGTH)

# Turn the predicted token IDs back into a string, chopping off the prompt,
# which consists of the image tokens and the text prompt
image_token_index = model.config.image_token_index
num_image_tokens = len(generated_ids[generated_ids == image_token_index])
num_text_tokens = len(processor.tokenizer.encode(TASK_PROMPT))
num_prompt_tokens = num_image_tokens + num_text_tokens + 2
generated_text = processor.batch_decode(generated_ids[:, num_prompt_tokens:], skip_special_tokens=True, clean_up_tokenization_spaces=False)[0]
print(generated_text)
```
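Greedy decoding is used above. As an illustrative variation (the parameter values are just examples), beam search can be enabled through the standard `generate` arguments:
```
# Illustrative alternative: beam search instead of greedy decoding
generated_ids = model.generate(
    **inputs,
    max_new_tokens=MAX_LENGTH,
    num_beams=4,          # example beam width
    early_stopping=True,  # stop once all beams are finished
)
```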
Result tokens. Note: the raw output also contains the field-delimiter special tokens (e.g. `<s_total>`…`</s_total>`) that `token2json` in Step-5 parses; they are not rendered in the snippet below.
```
'(718)308-1118Brooklyn,NY112112.981NORI2.351TOMATOESPLUM0.971ONIONSVIDALIA2.481HAMBURRN0.991FTRAWBERRY0.991FTRAWBERRY0.571PILSNER'
```
Step-5 Convert Result to JSON (borrowed from the Donut model)
```
import re

# let's turn that into JSON (adapted from the Donut processor)
def token2json(tokens, is_inner_value=False, added_vocab=None):
    """
    Convert a (generated) token sequence into an ordered JSON format.
    """
    if added_vocab is None:
        added_vocab = processor.tokenizer.get_added_vocab()

    output = {}

    while tokens:
        start_token = re.search(r"<s_(.*?)>", tokens, re.IGNORECASE)
        if start_token is None:
            break
        key = start_token.group(1)
        key_escaped = re.escape(key)

        end_token = re.search(rf"</s_{key_escaped}>", tokens, re.IGNORECASE)
        start_token = start_token.group()
        if end_token is None:
            tokens = tokens.replace(start_token, "")
        else:
            end_token = end_token.group()
            start_token_escaped = re.escape(start_token)
            end_token_escaped = re.escape(end_token)
            content = re.search(
                f"{start_token_escaped}(.*?){end_token_escaped}", tokens, re.IGNORECASE | re.DOTALL
            )
            if content is not None:
                content = content.group(1).strip()
                if r"<s_" in content and r"</s_" in content:  # non-leaf node
                    value = token2json(content, is_inner_value=True, added_vocab=added_vocab)
                    if value:
                        if len(value) == 1:
                            value = value[0]
                        output[key] = value
                else:  # leaf nodes
                    output[key] = []
                    for leaf in content.split(r"<sep/>"):
                        leaf = leaf.strip()
                        if leaf in added_vocab and leaf[0] == "<" and leaf[-2:] == "/>":
                            leaf = leaf[1:-2]  # for categorical special tokens
                        output[key].append(leaf)
                    if len(output[key]) == 1:
                        output[key] = output[key][0]

            tokens = tokens[tokens.find(end_token) + len(end_token):].strip()
            if tokens[:6] == r"<sep/>":  # non-leaf nodes
                return [output] + token2json(tokens[6:], is_inner_value=True, added_vocab=added_vocab)

    if len(output):
        return [output] if is_inner_value else output
    else:
        return [] if is_inner_value else {"text_sequence": tokens}

# generated
generated_json = token2json(generated_text)
print(generated_json)
```
Final result in JSON
```
[{'total': '',
'tips': '',
'time': '',
'telephone': '(718)308-1118',
'tax': '',
'subtotal': '',
'store_name': '',
'store_addr': 'Brooklyn,NY11211',
'item_value': '2.98',
'item_quantity': '1',
'item_name': 'NORI',
'item_key': ''},
{'item_value': '2.35',
'item_quantity': '1',
'item_name': 'TOMATOESPLUM',
'item_key': ''},
{'item_value': '0.97',
'item_quantity': '1',
'item_name': 'ONIONSVIDALIA',
'item_key': ''},
{'item_value': '2.48',
'item_quantity': '1',
'item_name': 'HAMBURRN',
'item_key': ''},
{'item_value': '0.99',
'item_quantity': '1',
'item_name': 'FTRAWBERRY',
'item_key': ''},
{'item_value': '0.99',
'item_quantity': '1',
'item_name': 'FTRAWBERRY',
'item_key': ''},
{'item_value': '0.57', 'item_quantity': '1'}]
```
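As an illustrative post-processing step (not part of the original card), the line items in the JSON above can be tallied to sanity-check the receipt:
```
# Illustrative post-processing of the extracted JSON (assumes the list-of-dicts structure shown above)
records = generated_json if isinstance(generated_json, list) else [generated_json]

line_items = [r for r in records if r.get("item_value")]
items_total = sum(float(r["item_value"]) for r in line_items)
print(f"{len(line_items)} line items, summed value: {items_total:.2f}")
```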
This is the model card of a 🤗 transformers model that has been pushed to the Hub. This model card has been automatically generated.
- **Developed by:** mychen76@gmail.com
- **Model type:** Vision Model for Receipt Image Data Extraction
- **Language(s) (NLP):** [More Information Needed]
- **License:** [More Information Needed]
- **Finetuned from model [optional]:** PaliGemma-3b-pt-224
### Model Sources [optional]
- **Repository:** [More Information Needed]
- **Paper [optional]:** [More Information Needed]
- **Demo [optional]:** [More Information Needed]
## Uses
### Direct Use
[More Information Needed]
### Downstream Use [optional]
[More Information Needed]
### Out-of-Scope Use
[More Information Needed]
## Bias, Risks, and Limitations
[More Information Needed]
### Recommendations
Users (both direct and downstream) should be made aware of the risks, biases and limitations of the model. More information needed for further recommendations.
## How to Get Started with the Model
Use the code in the [Model Usage](#model-usage) section above to get started with the model.
## Training Details
### Training Data
See the dataset [mychen76/invoices-and-receipts_ocr_v1](https://huggingface.co/datasets/mychen76/invoices-and-receipts_ocr_v1).
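A minimal sketch of loading this dataset with 🤗 Datasets (the split name is an assumption and may differ):
```
from datasets import load_dataset

# Load the training data referenced above (split name may differ)
ds = load_dataset("mychen76/invoices-and-receipts_ocr_v1", split="train")
print(ds)
```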
### Training Procedure
#### Preprocessing [optional]
[More Information Needed]
#### Training Hyperparameters
- **Training regime:** [More Information Needed]
#### Speeds, Sizes, Times [optional]
[More Information Needed]
## Evaluation
### Testing Data, Factors & Metrics
#### Testing Data
[More Information Needed]
#### Factors
[More Information Needed]
#### Metrics
[More Information Needed]
### Results
[More Information Needed]
#### Summary
## Model Examination [optional]
[More Information Needed]
## Environmental Impact
Carbon emissions can be estimated using the [Machine Learning Impact calculator](https://mlco2.github.io/impact#compute) presented in [Lacoste et al. (2019)](https://arxiv.org/abs/1910.09700).
- **Hardware Type:** [More Information Needed]
- **Hours used:** [More Information Needed]
- **Cloud Provider:** [More Information Needed]
- **Compute Region:** [More Information Needed]
- **Carbon Emitted:** [More Information Needed]
## Technical Specifications [optional]
### Model Architecture and Objective
[More Information Needed]
### Compute Infrastructure
[More Information Needed]
#### Hardware
[More Information Needed]
#### Software
[More Information Needed]
## Citation [optional]
**BibTeX:**
[More Information Needed]
**APA:**
[More Information Needed]
## Glossary [optional]
[More Information Needed]
## More Information [optional]
[More Information Needed]
## Model Card Authors [optional]
[More Information Needed]
## Model Card Contact
[More Information Needed]