---
language:
- ko
pipeline_tag: image-to-text
---

# **deplot_kr**

deplot_kr is a Korean image-to-data model based on Google's Pix2Struct architecture. It outputs the data table of a chart image in text form.

It was fine-tuned from [DePlot](https://huggingface.co/google/deplot) on a dataset of 300,000 Korean chart image-text pairs.

## **How to use**

To run a prediction, pass a chart image to the model. The model predicts the chart's underlying data table in text form.

```python
from PIL import Image
from transformers import (
    AutoTokenizer,
    Pix2StructForConditionalGeneration,
    Pix2StructImageProcessor,
    Pix2StructProcessor,
)

# Build the processor from a default image processor and the model's tokenizer.
image_processor = Pix2StructImageProcessor()
tokenizer = AutoTokenizer.from_pretrained("brainventures/deplot_kr")
processor = Pix2StructProcessor(image_processor=image_processor, tokenizer=tokenizer)

model = Pix2StructForConditionalGeneration.from_pretrained("brainventures/deplot_kr")

image_path = "IMAGE_PATH"  # replace with the path to a chart image
image = Image.open(image_path)

# Convert the image into flattened patches plus an attention mask.
inputs = processor(images=image, return_tensors="pt")

# Generate the token IDs of the predicted data table.
pred = model.generate(
    flattened_patches=inputs["flattened_patches"],
    attention_mask=inputs["attention_mask"],
    max_length=1024,
)

# Decode the generated IDs to text and print the first (only) prediction.
print(processor.batch_decode(pred, skip_special_tokens=True)[0])
```

**Model Input Image**

![model_input_image](./sample.jpg)

**Model Output - Prediction**

대상:

제목: 2011-2021 보건복지 분야 일자리의 <unk>증

유형: 단일형 일반 세로 <unk>대형

 | 보건(천 명) | 복지(천 명)

1분위 | 29.7 | 178.4

2분위 | 70.8 | 97.3

3분위 | 86.4 | 61.3

4분위 | 28.2 | 16.0

5분위 | 52.3 | 0.9

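
The decoded prediction is plain text, so it usually has to be split into metadata and table rows before further use. Below is a minimal sketch, assuming the decoded string looks like the sample above: `key: value` lines followed by rows whose cells are separated by `|`, with rows separated by newline characters. The name `parse_prediction` is hypothetical and is not part of the model or of `transformers`. If the decoded string uses a different row separator, replace it with `\n` first.

```python
def parse_prediction(text):
    """Split a decoded prediction into metadata and table rows.

    Assumes `key: value` lines for metadata and `|`-separated cells for rows.
    """
    meta, rows = {}, []
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue  # skip blank lines
        if "|" in line:
            # Table row: split on the cell separator and trim each cell.
            rows.append([cell.strip() for cell in line.split("|")])
        elif ":" in line:
            # Metadata line: everything before the first colon is the key.
            key, _, value = line.partition(":")
            meta[key.strip()] = value.strip()
    return meta, rows


# Shortened copy of the sample prediction shown above.
sample = "\n".join([
    "대상:",
    "제목: 2011-2021 보건복지 분야 일자리의 <unk>증",
    "유형: 단일형 일반 세로 <unk>대형",
    " | 보건(천 명) | 복지(천 명)",
    "1분위 | 29.7 | 178.4",
    "5분위 | 52.3 | 0.9",
])

meta, rows = parse_prediction(sample)
header, data = rows[0], rows[1:]
# Convert the numeric cells (everything after the row label) to floats.
values = [[float(cell) for cell in row[1:]] for row in data]
print(meta)
print(header, values)
```

The first cell of the header row is empty because the header line starts with a separator. The row labels stay as strings in `data`.
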
### **Preprocessing**

According to [Liu et al. (2023)](https://arxiv.org/pdf/2212.10505.pdf), data tables are represented as text in the following format:

- markdown format
- `|` : separates cells (column separator)
- `\n` : separates rows (row separator)

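
To illustrate the format listed above, here is a minimal sketch that linearizes a table into text. `linearize_table` is a hypothetical helper, and the single space around each `|` is an assumption based on the sample output, not a documented requirement of the training data.

```python
def linearize_table(header, rows):
    """Join cells with ' | ' and rows with a newline character."""
    lines = [" | ".join(str(cell) for cell in row) for row in [header, *rows]]
    return "\n".join(lines)


# Values taken from the sample prediction; the first header cell is empty.
header = ["", "보건(천 명)", "복지(천 명)"]
rows = [["1분위", 29.7, 178.4], ["2분위", 70.8, 97.3]]

text = linearize_table(header, rows)
print(text)
```
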

### **Train**

The model was trained in a TPU environment.

- num_warmup_steps : 1,000
- num_training_steps : 40,000

## **Evaluation Results**

This model achieves the following results:

| Metric | Score (%) |
|:---|---:|
| RNSS (Relative Number Set Similarity) | 99.5483 |
| RMS F1 (Relative Mapping Similarity) | 16.6401 |

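
For readers unfamiliar with RNSS, the sketch below applies the definition given in Liu et al. (2023) as written: the relative distance between a predicted number `p` and a target number `t` is `min(1, |p - t| / |t|)`, a minimum-cost matching pairs predicted with target numbers, and the summed cost is divided by the size of the larger set. This is an illustration, not the evaluation code used for the table above. The handling of `t = 0` and of empty sets is an assumption, and the brute-force matching is only practical for a handful of numbers (a Hungarian-algorithm solver such as `scipy.optimize.linear_sum_assignment` would be the usual choice for larger sets).

```python
from itertools import permutations


def relative_distance(p, t):
    """Relative distance min(1, |p - t| / |t|) between prediction p and target t."""
    if t == 0:
        # Assumption: a zero target matches only a zero prediction.
        return 0.0 if p == 0 else 1.0
    return min(1.0, abs(p - t) / abs(t))


def rnss(pred, target):
    """RNSS = 1 - (cost of the minimum-cost matching) / max(N, M)."""
    n, m = len(pred), len(target)
    if n == 0 and m == 0:
        return 1.0  # assumption: two empty sets agree perfectly
    if n == 0 or m == 0:
        return 0.0  # assumption: nothing can be matched
    if n <= m:
        # Assign every prediction to a distinct target.
        costs = (
            sum(relative_distance(p, target[j]) for p, j in zip(pred, perm))
            for perm in permutations(range(m), n)
        )
    else:
        # Assign every target to a distinct prediction.
        costs = (
            sum(relative_distance(pred[i], t) for i, t in zip(perm, target))
            for perm in permutations(range(n), m)
        )
    return 1.0 - min(costs) / max(n, m)


# The order of the numbers does not matter, only the set of values.
print(rnss([29.7, 178.4], [178.4, 29.7]))
# A prediction that is off by half of the target value.
print(rnss([10.0], [20.0]))
```

Because RNSS compares only the numbers and ignores row and column labels, a high RNSS can coexist with a much lower RMS F1, which also checks how values map to their labels.
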

## Contact

For questions and comments, please use the discussion tab or email [email protected]