---
library_name: transformers
language:
- ko
---

# Model Card for Model ID

<!-- Provide a quick summary of what the model is/does. -->

## Model Details

This model recognizes text that contains mathematical expressions, one line at a time.

It is based on the [microsoft TrOCR-large](https://huggingface.co/microsoft/trocr-large-handwritten) model, fine-tuned on a Korean + LaTeX dataset.

A separate detector is required to crop the input image into individual lines.

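Because the model expects one cropped line per image, the output of a line detector has to be turned into crops before inference. The following is a minimal sketch of that step, not part of this model: the detector itself, the box format `(left, top, right, bottom)`, the `margin` value, and the helper names `pad_and_clamp` and `crop_lines` are all assumptions made for illustration.

```python
from typing import Iterable, List, Tuple

# Assumed box format: (left, top, right, bottom) in pixels.
Box = Tuple[int, int, int, int]


def pad_and_clamp(box: Box, image_size: Tuple[int, int], margin: int = 4) -> Box:
    """Grow a box by `margin` pixels on every side, staying inside the image."""
    width, height = image_size
    left, top, right, bottom = box
    return (
        max(0, left - margin),
        max(0, top - margin),
        min(width, right + margin),
        min(height, bottom + margin),
    )


def crop_lines(page, boxes: Iterable[Box], margin: int = 4) -> List:
    """Return one crop per detected line, ordered top to bottom.

    `page` is expected to be a PIL.Image.Image; any object exposing
    `.size` and `.crop(box)` works. `boxes` come from a hypothetical
    line detector that is not provided by this model.
    """
    ordered = sorted(boxes, key=lambda b: (b[1], b[0]))
    return [page.crop(pad_and_clamp(b, page.size, margin)) for b in ordered]


# Hypothetical usage (file name and boxes are placeholders):
# from PIL import Image
# page = Image.open("page.png").convert("RGB")
# img_list = crop_lines(page, [(40, 120, 900, 180), (40, 30, 900, 95)])
```

The resulting list can be passed to the processor in place of images loaded from disk. A small margin is used here on the assumption that tight boxes may clip superscripts or fraction bars; the right value depends on the detector.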
## Uses

<!-- Address questions around how the model is intended to be used, including the foreseeable users of the model and those affected by the model. -->

### Direct Use

```python
import glob

import IPython.display as ipd
import torch
from PIL import Image
from transformers import TrOCRProcessor, VisionEncoderDecoderModel

## Prepare the images (each file should hold a single cropped line)
img_path_list = sorted(glob.glob('images/mathematical_expression_2-*.png'))
img_list = [Image.open(img_path).convert("RGB") for img_path in img_path_list]

## Prepare the model and processor
model_path = 'TeamUNIVA/23MATHQ_TrOCR-large'
processor = TrOCRProcessor.from_pretrained(model_path)
model = VisionEncoderDecoderModel.from_pretrained(model_path)
model.eval()

device = torch.device('cuda' if torch.cuda.is_available() else 'cpu')
model.to(device)

# Use the encoder's image size as the processor's input size
processor.feature_extractor.size = model.config.encoder.image_size

# Beam-search generation settings
gen_config = model.generation_config
gen_config.max_length = 128
gen_config.early_stopping = True
gen_config.no_repeat_ngram_size = 3
gen_config.length_penalty = 2.0
gen_config.num_beams = 4
gen_config.eos_token_id = processor.tokenizer.sep_token_id

## TrOCR inference
pixel_values = processor(img_list, return_tensors="pt").pixel_values
generated_ids = model.generate(pixel_values.to(model.device), pad_token_id=processor.tokenizer.eos_token_id)
generated_text = processor.batch_decode(generated_ids, skip_special_tokens=True)

# Show each line image followed by its recognized text
for img, text in zip(img_list, generated_text):
    ipd.display(img)
    print(text)
```
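The snippet above sends every image through the model in a single call, which may exhaust memory when a page yields many lines. One possible workaround is to run inference in fixed-size batches, as sketched below. The helper names `chunks` and `recognize_in_batches` and the default `batch_size=8` are assumptions for illustration; `processor` and `model` are the objects loaded in the Direct Use snippet.

```python
def chunks(items, size):
    """Yield consecutive slices of `items` with at most `size` elements."""
    if size < 1:
        raise ValueError("size must be at least 1")
    for start in range(0, len(items), size):
        yield items[start:start + size]


def recognize_in_batches(images, processor, model, batch_size=8):
    """Run TrOCR on `images` a few at a time and return the decoded texts.

    `batch_size` is an arbitrary default; tune it to the available memory.
    """
    texts = []
    for batch in chunks(images, batch_size):
        pixel_values = processor(batch, return_tensors="pt").pixel_values
        generated_ids = model.generate(
            pixel_values.to(model.device),
            pad_token_id=processor.tokenizer.eos_token_id,
        )
        texts.extend(processor.batch_decode(generated_ids, skip_special_tokens=True))
    return texts


# Hypothetical usage with the objects from the Direct Use snippet:
# generated_text = recognize_in_batches(img_list, processor, model, batch_size=8)
```

The returned list keeps the input order, so it can still be zipped with the image list for display.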

### Result example

<img src="./images/result.png" alt="Result" width="1200"/>

### BibTeX entry and citation info

```
@misc{li2021trocr,
      title={TrOCR: Transformer-based Optical Character Recognition with Pre-trained Models},
      author={Minghao Li and Tengchao Lv and Lei Cui and Yijuan Lu and Dinei Florencio and Cha Zhang and Zhoujun Li and Furu Wei},
      year={2021},
      eprint={2109.10282},
      archivePrefix={arXiv},
      primaryClass={cs.CL}
}
```