---
language:
- en
license: apache-2.0
tags:
- text-generation-inference
- transformers
- mistral
- trl
base_model: unsloth/mistral-7b-instruct-v0.2-bnb-4bit
---
## Format
- chatml
## Usage Notebook
- model_id: mychen76/mistral_ocr2json_v3_chatml
- notebook: https://github.com/minyang-chen/LLM_convert_receipt_image-to-json_or_xml/blob/main/Convert_Receipt_Image-to-Json_using_OCR_to_JSON_v2_ChatML.ipynb

Example `receipt_boxes` input, taken from Paddle-OCR output:
```
ocr boxes:
[
[[[188.0, 54.0], [453.0, 54.0], [453.0, 85.0], [188.0, 85.0]], ('The Lone Pine', 0.9998102188110352)],
[[[194.0, 96.0], [449.0, 98.0], [449.0, 122.0], [194.0, 120.0]], ('43 Manchester Road', 0.9988968372344971)],
[[[228.0, 127.0], [416.0, 130.0], [416.0, 154.0], [228.0, 151.0]], ('12480 Brisbane', 0.9658010601997375)],
[[[267.0, 162.0], [375.0, 162.0], [375.0, 186.0], [267.0, 186.0]], ('Australia', 0.9997145533561707)],
[[[234.0, 193.0], [409.0, 193.0], [409.0, 216.0], [234.0, 216.0]], ('617-3236-6207', 0.9996874332427979)],
[[[46.0, 255.0], [308.0, 255.0], [308.0, 278.0], [46.0, 278.0]], ('Invoice 08000008', 0.9919923543930054)],
[[[466.0, 255.0], [598.0, 255.0], [598.0, 278.0], [466.0, 278.0]], ('09/04/08', 0.9994747042655945)],
[[[42.0, 283.0], [132.0, 283.0], [132.0, 311.0], [42.0, 311.0]], ('Table', 0.9969210624694824)],
[[[174.0, 283.0], [214.0, 283.0], [214.0, 311.0], [174.0, 311.0]], ('25', 0.9997891783714294)],
[[[514.0, 284.0], [601.0, 284.0], [601.0, 311.0], [514.0, 311.0]], ('12:45', 0.9964954257011414)],
[[[67.0, 346.0], [291.0, 349.0], [291.0, 376.0], [67.0, 374.0]], ('2 Carlsberg Bottle', 0.9987921118736267)],
[[[515.0, 346.0], [599.0, 346.0], [599.0, 372.0], [515.0, 372.0]], ('16.00', 0.9999278783798218)],
[
[[69.0, 385.0], [395.0, 387.0], [395.0, 411.0], [69.0, 409.0]],
('3 Heineken Draft Standard.', 0.9832896590232849)
],
[[[515.0, 384.0], [599.0, 384.0], [599.0, 409.0], [515.0, 409.0]], ('24.60', 0.9998160600662231)],
[
[[71.0, 423.0], [391.0, 423.0], [391.0, 446.0], [71.0, 446.0]],
('1 Heineken Draft Half Liter.', 0.9641079306602478)
],
[[[515.0, 421.0], [601.0, 421.0], [601.0, 450.0], [515.0, 450.0]], ('15.20', 0.9998868703842163)],
[
[[69.0, 460.0], [430.0, 461.0], [430.0, 485.0], [69.0, 484.0]],
('2 Carlsberg Bucket (5 bottles).', 0.974445641040802)
],
[[[515.0, 461.0], [599.0, 461.0], [599.0, 486.0], [515.0, 486.0]], ('80.00', 0.9999423027038574)],
[
[[69.0, 498.0], [367.0, 500.0], [367.0, 524.0], [69.0, 522.0]],
('4 Grilled Chicken Breast.', 0.9773013591766357)
],
[[[515.0, 499.0], [599.0, 499.0], [599.0, 524.0], [515.0, 524.0]], ('74.00', 0.9999669194221497)],
[[[68.0, 534.0], [250.0, 537.0], [250.0, 562.0], [68.0, 560.0]], ('3 Sirloin Steak', 0.9997309446334839)],
[[[515.0, 537.0], [599.0, 537.0], [599.0, 561.0], [515.0, 561.0]], ('96.00', 0.9999544024467468)],
[[[67.0, 571.0], [162.0, 574.0], [161.0, 601.0], [67.0, 598.0]], ('1 Coke', 0.9997830390930176)],
[[[530.0, 572.0], [602.0, 572.0], [602.0, 601.0], [530.0, 601.0]], ('3.50', 0.9999455213546753)],
[[[69.0, 609.0], [219.0, 613.0], [218.0, 638.0], [68.0, 634.0]], ('5 Ice Cream', 0.9914276003837585)],
[[[516.0, 611.0], [599.0, 611.0], [599.0, 637.0], [516.0, 637.0]], ('18.00', 0.9999335408210754)],
[[[154.0, 664.0], [288.0, 664.0], [288.0, 688.0], [154.0, 688.0]], ('Subtotal', 0.9990750551223755)],
[[[499.0, 664.0], [599.0, 664.0], [599.0, 688.0], [499.0, 688.0]], ('327.30', 0.9999768137931824)],
[
[[155.0, 701.0], [397.0, 701.0], [397.0, 724.0], [155.0, 724.0]],
('Sales/Gov Tax - 5%', 0.9552016854286194)
],
[[[514.0, 697.0], [601.0, 697.0], [601.0, 724.0], [514.0, 724.0]], ('16.36', 0.999823272228241)],
[
[[155.0, 733.0], [419.0, 733.0], [419.0, 757.0], [155.0, 757.0]],
('Service Charge - 10%', 0.9921379089355469)
],
[[[512.0, 728.0], [601.0, 731.0], [600.0, 759.0], [511.0, 757.0]], ('32.73', 0.9999620318412781)],
[[[154.0, 775.0], [335.0, 775.0], [335.0, 799.0], [154.0, 799.0]], ('GRAND TOTAL', 0.9899482131004333)],
[[[499.0, 778.0], [599.0, 778.0], [599.0, 802.0], [499.0, 802.0]], ('376.40', 0.9999797940254211)],
[[[39.0, 831.0], [223.0, 831.0], [223.0, 859.0], [39.0, 859.0]], ('Thank you and', 0.9922393560409546)],
[[[336.0, 831.0], [407.0, 831.0], [407.0, 860.0], [336.0, 860.0]], ('Cash', 0.9998616576194763)],
[[[499.0, 831.0], [601.0, 831.0], [601.0, 859.0], [499.0, 859.0]], ('400.00', 0.9998554587364197)],
[[[38.0, 866.0], [220.0, 862.0], [220.0, 891.0], [38.0, 895.0]], ('see you again!', 0.9798372983932495)],
[[[336.0, 864.0], [438.0, 869.0], [437.0, 898.0], [335.0, 894.0]], ('Change', 0.9998979568481445)],
[[[515.0, 867.0], [599.0, 867.0], [599.0, 892.0], [515.0, 892.0]], ('23.60', 0.9999337196350098)],
[[[37.0, 901.0], [108.0, 901.0], [108.0, 930.0], [37.0, 930.0]], ('John', 0.9990785717964172)],
[
[[73.0, 962.0], [569.0, 965.0], [569.0, 991.0], [73.0, 989.0]],
('Bring this bill back within the next 10 days', 0.9880552887916565)
],
[
[[50.0, 1000.0], [591.0, 1000.0], [591.0, 1023.0], [50.0, 1023.0]],
("and get 15% discount on that day's food bill..", 0.9851154685020447)
]
]
```
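Each entry in the OCR result above pairs four corner points with a `(text, confidence)` tuple. As a rough illustration of that structure, the sketch below groups boxes into visual rows by their vertical centers, which can help when inspecting the input by hand. It is not part of the model's pipeline (the model receives the raw boxes), and the helper name `group_into_rows` and the `y_tolerance` value are assumptions made for this example.

```python
from statistics import mean

# A few entries copied from the OCR result above:
# [four corner points, (text, confidence)]
sample_boxes = [
    [[[67.0, 346.0], [291.0, 349.0], [291.0, 376.0], [67.0, 374.0]], ('2 Carlsberg Bottle', 0.9987921118736267)],
    [[[515.0, 346.0], [599.0, 346.0], [599.0, 372.0], [515.0, 372.0]], ('16.00', 0.9999278783798218)],
    [[[68.0, 534.0], [250.0, 537.0], [250.0, 562.0], [68.0, 560.0]], ('3 Sirloin Steak', 0.9997309446334839)],
    [[[515.0, 537.0], [599.0, 537.0], [599.0, 561.0], [515.0, 561.0]], ('96.00', 0.9999544024467468)],
]


def group_into_rows(boxes, y_tolerance=12.0):
    """Group boxes whose vertical centers lie within y_tolerance pixels
    of the first box in the current row (hypothetical helper)."""
    entries = []
    for quad, (text, _confidence) in boxes:
        y_center = mean(point[1] for point in quad)
        x_left = min(point[0] for point in quad)
        entries.append((y_center, x_left, text))
    entries.sort()  # top to bottom

    rows = []
    for y_center, x_left, text in entries:
        if rows and abs(y_center - rows[-1]["y"]) <= y_tolerance:
            rows[-1]["cells"].append((x_left, text))
        else:
            rows.append({"y": y_center, "cells": [(x_left, text)]})

    # Order the cells of each row left to right and keep only the text.
    return [[text for _, text in sorted(row["cells"])] for row in rows]


rows = group_into_rows(sample_boxes)
for row in rows:
    print(row)
```

The tolerance of 12 pixels is a guess that suits the line spacing in this receipt; other images would likely need a different value.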
# Prompt
f"""<|im_start|>system
You are POS receipt data expert, parse, detect, recognize and convert following receipt OCR image result into structure receipt data object.
Don't make up value not in the Input. Output must be a well-formed JSON object.```json<|im_end|>
<|im_start|>user
{receipt_boxes}<|im_end|>
<|im_start|>assistant
"""
# Result
```
<|im_start|>assistant
{
"store_name": "The Lone Pine",
"store_address": "43 Manchester Road",
"city": "Brisbane",
"country": "Australia",
"phone": "617-3236-6207",
"invoice_number": "08000008",
"invoice_date": "09/04/08",
"table_number": "25",
"time": "12:45",
"items": [
{
"item_name": "Carlsberg Bottle",
"quantity": "2",
"price": "16.00"
},
{
"item_name": "Heineken Draft Half Liter.",
"quantity": "1",
"price": "15.20"
},
{
"item_name": "Heineken Draft Standard.",
"quantity": "3",
"price": "12.00"
},
{
"item_name": "Sirloin Steak",
"quantity": "3",
"price": "96.00"
},
{
"item_name": "Grilled Chicken Breast",
"quantity": "4",
"price": "74.00"
},
{
"item_name": "Coke",
"quantity": "1",
"price": "3.50"
},
{
"item_name": "Ice Cream",
"quantity": "5",
"price": "18.00"
}
],
"subtotal": "327.30",
"tax": "16.36",
"service_charge": "32.73",
"total": "376.40",
"payment": {
"cash": "400.00",
"change": "23.60"
},
"customer": {
"name": "John"
},
"discount": "15%"
}<|im_end|></s>
```
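The prompt template and the raw result above can be tied together with a few small helpers. The following is a minimal sketch, not the notebook's code: `build_prompt` fills the ChatML template, `extract_json` strips the `<|im_start|>assistant`, `<|im_end|>` and `</s>` markers before parsing, and `items_match_subtotal` checks that the item prices add up to the subtotal. All three function names, and the shortened `raw_output` string, are assumptions made for illustration; the generation step itself is omitted.

```python
import json
from decimal import Decimal

# System message copied verbatim from the prompt template above.
SYSTEM_PROMPT = (
    "You are POS receipt data expert, parse, detect, recognize and convert "
    "following receipt OCR image result into structure receipt data object.\n"
    "Don't make up value not in the Input. Output must be a well-formed JSON object.```json"
)


def build_prompt(receipt_boxes):
    """Fill the ChatML template with the OCR boxes (hypothetical helper)."""
    return (
        f"<|im_start|>system\n{SYSTEM_PROMPT}<|im_end|>\n"
        f"<|im_start|>user\n{receipt_boxes}<|im_end|>\n"
        "<|im_start|>assistant\n"
    )


def extract_json(generated_text):
    """Drop the ChatML markers and parse what remains as JSON."""
    marker = "<|im_start|>assistant"
    text = generated_text
    if marker in text:
        # Keep only what follows the last assistant marker, so this also
        # works when the decoded text still contains the prompt.
        text = text.split(marker)[-1]
    for token in ("<|im_end|>", "</s>"):
        text = text.replace(token, "")
    return json.loads(text.strip())


def items_match_subtotal(receipt):
    """Return True when the item prices sum exactly to the subtotal."""
    total = sum(Decimal(item["price"]) for item in receipt["items"])
    return total == Decimal(receipt["subtotal"])


# Shortened, hypothetical model output used only to exercise the helpers.
raw_output = (
    "<|im_start|>assistant\n"
    '{"items": [{"item_name": "Coke", "quantity": "1", "price": "3.50"}], '
    '"subtotal": "3.50"}<|im_end|></s>'
)

one_box = [[[[67.0, 571.0], [162.0, 574.0], [161.0, 601.0], [67.0, 598.0]], ("1 Coke", 0.9997830390930176)]]
prompt = build_prompt(one_box)
receipt = extract_json(raw_output)
print(items_match_subtotal(receipt))
```

A check like this is worth running on real output. In the sample result above, the item prices sum to 234.70 while the subtotal is 327.30: the "2 Carlsberg Bucket (5 bottles)." line (80.00) is missing, and "Heineken Draft Standard." is listed at 12.00 where the OCR input reads 24.60.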
# Uploaded model
- **Developed by:** mychen76
- **License:** apache-2.0
- **Finetuned from model:** unsloth/mistral-7b-instruct-v0.2-bnb-4bit