---
language:
- ko
pipeline_tag: image-to-text
---
# **deplot_kr**
deplot_kr is a Korean image-to-data model based on Google's pix2struct architecture. It converts a chart image into a data table in text form.
It was fine-tuned from [DePlot](https://huggingface.co/google/deplot) on a dataset of 300,000 Korean chart image-text pairs.
## **How to use**
To run a prediction, pass a chart image to the model.
The model predicts the data table shown in the image and returns it as text.
```python
from transformers import Pix2StructForConditionalGeneration, Pix2StructImageProcessor, AutoTokenizer, Pix2StructProcessor
from PIL import Image

# Build the processor from a default image processor and the model's tokenizer.
image_processor = Pix2StructImageProcessor()
tokenizer = AutoTokenizer.from_pretrained("brainventures/deplot_kr")
processor = Pix2StructProcessor(image_processor=image_processor, tokenizer=tokenizer)
model = Pix2StructForConditionalGeneration.from_pretrained("brainventures/deplot_kr")

# Load the chart image and convert it to flattened patches.
image_path = "IMAGE_PATH"
image = Image.open(image_path)
inputs = processor(images=image, return_tensors="pt")

# Generate the data table token ids and decode them to text.
pred = model.generate(flattened_patches=inputs.flattened_patches, attention_mask=inputs.attention_mask, max_length=1024)
print(processor.batch_decode(pred, skip_special_tokens=True)[0])
```
**Model Input Image**
![model_input_image](./sample.jpg)
**Model Output - Prediction**
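대상:
제목: 2011-2021 보건복지 분야 일자리의 <unk>증
유형: 단일형 일반 세로 <unk>대형
 | 보건(천 명) | 복지(천 명)
1분위 | 29.7 | 178.4
2분위 | 70.8 | 97.3
3분위 | 86.4 | 61.3
4분위 | 28.2 | 16.0
5분위 | 52.3 | 0.9

English gloss: the subject field (대상) is empty; the title (제목) concerns jobs in the health and welfare sector, 2011-2021; the type (유형) is a single, general, vertical bar chart; the columns are Health (thousand persons) and Welfare (thousand persons); the rows 1분위 through 5분위 are the first through fifth quantile groups. `<unk>` is the tokenizer's unknown token.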
### **Preprocessing**
Following [Liu et al. (2023)](https://arxiv.org/pdf/2212.10505.pdf), the target data tables are linearized as text:
- markdown format
- `|` : separates cells (column separator)
- `\n` : separates rows (row separator)
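The format above can be turned back into rows and cells with plain string splitting. The following is a minimal sketch, not part of the model's own tooling: `parse_linearized_table` is a hypothetical helper name, the English labels in `sample` stand in for the model's Korean output, and it assumes the decoded string separates rows with literal newline characters.

```python
def parse_linearized_table(text):
    """Split a linearized table into a list of rows, each a list of cell strings."""
    rows = []
    for line in text.split("\n"):
        if not line.strip():
            continue  # skip blank lines
        # "|" separates cells; surrounding whitespace is not significant
        rows.append([cell.strip() for cell in line.split("|")])
    return rows


# Illustrative input (assumption): English labels replace the Korean headers.
sample = " | Health | Welfare\nQ1 | 29.7 | 178.4\nQ2 | 70.8 | 97.3"
table = parse_linearized_table(sample)
header, body = table[0], table[1:]
```

Here `header` starts with an empty string because the first header cell (the row-label column) is blank. Metadata lines such as the title and chart type contain no `|` and come back as single-cell rows, so a caller may want to filter on row length.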
### **Train**
The model was trained in a TPU environment.
- num_warmup_steps : 1,000
- num_training_steps : 40,000
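To make the two step counts concrete, the sketch below computes a learning-rate multiplier for a schedule with linear warmup followed by linear decay. The shape of the schedule is an assumption: this card gives only the step counts, not the scheduler type or the peak learning rate, and `lr_multiplier` is a hypothetical helper name.

```python
NUM_WARMUP_STEPS = 1_000     # from this model card
NUM_TRAINING_STEPS = 40_000  # from this model card


def lr_multiplier(step, num_warmup_steps=NUM_WARMUP_STEPS,
                  num_training_steps=NUM_TRAINING_STEPS):
    """Factor in [0, 1] applied to the peak learning rate at a given step.

    Assumes linear warmup then linear decay; the actual schedule is not stated.
    """
    if step < num_warmup_steps:
        return step / num_warmup_steps  # ramp up from 0 to 1
    remaining = num_training_steps - step
    return max(0.0, remaining / (num_training_steps - num_warmup_steps))
```

Under this assumption the multiplier reaches 1.0 at step 1,000 and falls back to 0.0 at step 40,000.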
## **Evaluation Results**
This model achieves the following results:
| Metric | Score (%) |
|:---|---:|
| RNSS (Relative Number Set Similarity)| 99.5483 |
| RMS F1 (Relative Mapping Similarity)| 16.6401 |
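RNSS compares the set of numbers in the predicted table with the set in the reference table, ignoring where they appear. The sketch below follows the definition in Liu et al. (2023) as understood here: each pair gets a relative distance capped at 1, a minimum-cost one-to-one matching is found, and the total cost is divided by the size of the larger set. It is not the evaluation script used for this model. The brute-force matching is a stand-in for an assignment solver and only suits very small sets, and the handling of a zero target is an assumption.

```python
from itertools import permutations


def relative_distance(pred, target):
    """Relative error of pred against target, capped at 1."""
    if target == 0:
        # Assumption: a zero target matches only an exactly zero prediction.
        return 0.0 if pred == 0 else 1.0
    return min(1.0, abs(pred - target) / abs(target))


def rnss(predicted, targets):
    """Relative Number Set Similarity in [0, 1] for two lists of numbers."""
    n, m = len(predicted), len(targets)
    if n == 0 and m == 0:
        return 1.0
    if n == 0 or m == 0:
        return 0.0
    best = float("inf")
    if n <= m:
        # Assign every prediction to a distinct target.
        for perm in permutations(range(m), n):
            cost = sum(relative_distance(predicted[i], targets[j])
                       for i, j in enumerate(perm))
            best = min(best, cost)
    else:
        # Assign every target to a distinct prediction.
        for perm in permutations(range(n), m):
            cost = sum(relative_distance(predicted[i], targets[j])
                       for j, i in enumerate(perm))
            best = min(best, cost)
    return 1.0 - best / max(n, m)
```

The function returns a fraction; multiply by 100 to compare with the percentage in the table. Because RNSS looks only at numbers, it can be high while RMS F1, which also checks how values map to row and column labels, stays low.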
## Contact
For questions and comments, please use the discussion tab or email [email protected]