General OCR Theory: Towards OCR-2.0 via a Unified End-to-end Model
Haoran Wei*, Chenglong Liu*, Jinyue Chen, Jia Wang, Lingyu Kong, Yanming Xu, Zheng Ge, Liang Zhao, Jianjian Sun, Yuang Peng, Chunrui Han, Xiangyu Zhang
Usage
Run inference with Hugging Face transformers on NVIDIA GPUs. The requirements below were tested on Python 3.10:
torch==2.0.1
torchvision==0.15.2
transformers==4.37.2
megfile==3.1.2
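To confirm that an environment matches the tested versions above, a small check along these lines may help. This is a sketch only: `check_requirements` and `TESTED_VERSIONS` are hypothetical names introduced here, not part of the GOT-OCR2_0 code, and treating a local build tag such as `+cu118` as the same release is an assumption.

```python
from importlib.metadata import PackageNotFoundError, version

# Versions copied from the requirements list above.
TESTED_VERSIONS = {
    "torch": "2.0.1",
    "torchvision": "0.15.2",
    "transformers": "4.37.2",
    "megfile": "3.1.2",
}


def check_requirements(tested=TESTED_VERSIONS, get_version=version):
    """Return {package: (tested_version, installed_version_or_None)} for each mismatch.

    Hypothetical helper; get_version is injectable so the check can be tested
    without touching the real environment.
    """
    mismatches = {}
    for name, wanted in tested.items():
        try:
            installed = get_version(name)
        except PackageNotFoundError:
            installed = None  # package is not installed at all
        # Assumption: a local build tag ("2.0.1+cu118") counts as the same release.
        if installed is None or installed.split("+")[0] != wanted:
            mismatches[name] = (wanted, installed)
    return mismatches
```

Calling `check_requirements()` with no arguments inspects the current environment; an empty dictionary means every listed package is installed at the tested version.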
from transformers import AutoModel, AutoTokenizer
tokenizer = AutoTokenizer.from_pretrained('ucaslcl/GOT-OCR2_0', trust_remote_code=True)
model = AutoModel.from_pretrained('ucaslcl/GOT-OCR2_0', trust_remote_code=True, low_cpu_mem_usage=True, device_map='cuda', use_safetensors=True, pad_token_id=tokenizer.eos_token_id)
model = model.eval().cuda()
# input your test image
image_file = 'xxx.jpg'
# Each call below returns the recognized text; keep the one you need.
# plain-text OCR
res = model.chat(tokenizer, image_file, ocr_type='ocr')
# formatted-text OCR
res = model.chat(tokenizer, image_file, ocr_type='format')
# fine-grained OCR (fill in ocr_box or ocr_color to select a region)
res = model.chat(tokenizer, image_file, ocr_type='ocr', ocr_box='')
res = model.chat(tokenizer, image_file, ocr_type='format', ocr_box='')
res = model.chat(tokenizer, image_file, ocr_type='ocr', ocr_color='')
res = model.chat(tokenizer, image_file, ocr_type='format', ocr_color='')
# multi-crop OCR
res = model.chat_crop(tokenizer, image_file=image_file)
# render the formatted OCR result to an HTML file
res = model.chat(tokenizer, image_file, ocr_type='format', ocr_box='', ocr_color='', render=True, save_render_file='./demo.html')
print(res)
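For more than one image, the same `model.chat` call can be wrapped in a loop. The following is a minimal sketch, assuming `model` and `tokenizer` were loaded as shown above; `ocr_directory` and `IMAGE_SUFFIXES` are hypothetical names introduced for illustration, and the set of accepted file extensions is an assumption rather than a documented limit of the model.

```python
from pathlib import Path

# Assumption: file extensions treated as images; adjust as needed.
IMAGE_SUFFIXES = {".jpg", ".jpeg", ".png"}


def ocr_directory(model, tokenizer, image_dir, ocr_type="ocr"):
    """Run model.chat on every image file in image_dir.

    Hypothetical helper. Returns {file name: recognized text}, with files
    processed in sorted order and non-image files skipped.
    """
    results = {}
    for path in sorted(Path(image_dir).iterdir()):
        if path.suffix.lower() not in IMAGE_SUFFIXES:
            continue
        # Same call as in the usage above, with the path passed as a string.
        results[path.name] = model.chat(tokenizer, str(path), ocr_type=ocr_type)
    return results
```

For example, `ocr_directory(model, tokenizer, './images', ocr_type='format')` would collect formatted results for each image in a local `./images` folder (the folder name is a placeholder).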