---
language:
- ko
pipeline_tag: image-to-text
---

# **deplot_kr**

deplot_kr is a Korean image-to-data model, built on Google's Pix2Struct architecture, that converts a chart image into a data table in text form.
It was fine-tuned from [DePlot](https://huggingface.co/google/deplot) on a dataset of 300,000 Korean chart image-text pairs.

## **How to use**

Pass a chart image to the model, and it predicts the data table shown in the image as text.

```python
from transformers import Pix2StructForConditionalGeneration, Pix2StructImageProcessor, AutoTokenizer, Pix2StructProcessor
from PIL import Image

# Build the processor from a default image processor and the model's tokenizer.
image_processor = Pix2StructImageProcessor()
tokenizer = AutoTokenizer.from_pretrained("brainventures/deplot_kr")
processor = Pix2StructProcessor(image_processor=image_processor, tokenizer=tokenizer)

model = Pix2StructForConditionalGeneration.from_pretrained("brainventures/deplot_kr")

image_path = "IMAGE_PATH"
image = Image.open(image_path)

# Convert the image to flattened patches, then generate and decode the table text.
inputs = processor(images=image, return_tensors="pt")
pred = model.generate(flattened_patches=inputs.flattened_patches, attention_mask=inputs.attention_mask, max_length=1024)
print(processor.batch_decode(pred, skip_special_tokens=True)[0])
```

**Model Input Image**
![model_input_image](./sample.jpg)

**Model Output - Prediction**

๋Œ€์ƒ:     
์ œ๋ชฉ: 2011-2021 ๋ณด๊ฑด๋ณต์ง€ ๋ถ„์•ผ ์ผ์ž๋ฆฌ์˜ <unk>์ฆ    
์œ ํ˜•: ๋‹จ์ผํ˜• ์ผ๋ฐ˜ ์„ธ๋กœ <unk>๋Œ€ํ˜•    
| ๋ณด๊ฑด(์ฒœ ๋ช…) | ๋ณต์ง€(์ฒœ ๋ช…)    
1๋ถ„์œ„ | 29.7 | 178.4    
2๋ถ„์œ„ | 70.8 | 97.3    
3๋ถ„์œ„ | 86.4 | 61.3    
4๋ถ„์œ„ | 28.2 | 16.0    
5๋ถ„์œ„ | 52.3 | 0.9    
     
     

### **Preprocessing**

Following [Liu et al. (2023)](https://arxiv.org/pdf/2212.10505.pdf), data tables are linearized as text:

- markdown format
- | : separates cells (columns)
- \n : separates rows
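
The format above can be turned back into rows and cells with a few lines of Python. This is a minimal sketch, not part of the released code: `parse_linearized_table` is a hypothetical helper name, and it assumes the decoded text contains real newline characters between rows (if the tokenizer emits a different row marker, replace it with "\n" first).

```python
def parse_linearized_table(text: str) -> list[list[str]]:
    """Split linearized table text into rows of stripped cell strings."""
    rows = []
    for line in text.split("\n"):      # "\n" separates rows
        line = line.strip()
        if not line:
            continue                   # skip blank lines
        # "|" separates cells; a leading "|" yields an empty first cell
        rows.append([cell.strip() for cell in line.split("|")])
    return rows


# Rows taken from the sample prediction in this card.
sample = "| 보건(천 명) | 복지(천 명)\n1분위 | 29.7 | 178.4\n2분위 | 70.8 | 97.3"
table = parse_linearized_table(sample)
print(table[1])
```

Lines without a | separator, such as the title line, come back as single-cell rows, so callers can tell metadata from table rows by row length.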


### **Train**

The model was trained in a TPU environment.
- num_warmup_steps : 1,000
- num_training_steps : 40,000 

## **Evaluation Results**

This model achieves the following results:

| Metric | Score (%) |
|:---|---:|
| RNSS (Relative Number Set Similarity)| 99.5483 |
| RMS F1 (Relative Mapping Similarity)| 16.6401 |
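
To give a feel for what the RNSS row measures, here is a minimal sketch assuming the definition in Liu et al. (2023): each predicted number p is matched to a target number t at minimum total cost, with relative distance D(p, t) = min(1, |p - t| / |t|), and RNSS = 1 - (total matched distance) / max(N, M). The function names are hypothetical, the matching is brute force (practical only for a handful of numbers), and the handling of zero targets and empty sets is an assumption. This is not the evaluation script used for the results above, and it returns a value in [0, 1] rather than a percentage.

```python
from itertools import permutations


def relative_distance(pred: float, target: float) -> float:
    """D(p, t) = min(1, |p - t| / |t|); the zero-target case is an assumption."""
    if target == 0:
        return 0.0 if pred == 0 else 1.0
    return min(1.0, abs(pred - target) / abs(target))


def rnss(predicted: list[float], target: list[float]) -> float:
    """Relative Number Set Similarity via brute-force minimum-cost matching."""
    n, m = len(predicted), len(target)
    if n == 0 and m == 0:
        return 1.0   # assumption: two empty sets count as identical
    if n == 0 or m == 0:
        return 0.0   # assumption: nothing can be matched
    if n <= m:
        # Assign every predicted number to a distinct target number.
        costs = (
            sum(relative_distance(p, target[j]) for p, j in zip(predicted, perm))
            for perm in permutations(range(m), n)
        )
    else:
        # Assign every target number to a distinct predicted number.
        costs = (
            sum(relative_distance(predicted[i], t) for i, t in zip(perm, target))
            for perm in permutations(range(n), m)
        )
    return 1.0 - min(costs) / max(n, m)


print(rnss([5.0], [10.0]))
```

Because only the sets of numbers are compared, the score ignores which row and column each number belongs to; RMS F1 is the metric that accounts for table structure.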

## Contact

For questions and comments, please use the discussion tab or email [email protected]