<h1>General OCR Theory: Towards OCR-2.0 via a Unified End-to-end Model
</h1>
[GitHub](https://github.com/Ucas-HaoranWei/GOT-OCR2.0/tree/main)
<h3><a href="https://github.com/Ucas-HaoranWei/GOT-OCR2.0/">General OCR Theory: Towards OCR-2.0 via a Unified End-to-end Model</a></h3>
<a href="https://github.com/Ucas-HaoranWei/GOT-OCR2.0/"><img src="https://img.shields.io/badge/Project-Page-Green"></a>
<a href="https://arxiv.org/abs/2409.01704"><img src="https://img.shields.io/badge/Paper-PDF-orange"></a>
<a href="https://github.com/Ucas-HaoranWei/GOT-OCR2.0/blob/main/assets/wechat.jpg"><img src="https://img.shields.io/badge/Wechat-blue"></a>
<a href="https://github.com/Ucas-HaoranWei/GOT-OCR2.0/blob/main/assets/wechat2.jpg"><img src="https://img.shields.io/badge/Wechat2-blue"></a>
<a href="https://zhuanlan.zhihu.com/p/718163422"><img src="https://img.shields.io/badge/zhihu-red"></a>
[Haoran Wei*](https://scholar.google.com/citations?user=J4naK0MAAAAJ&hl=en), Chenglong Liu*, Jinyue Chen, Jia Wang, Lingyu Kong, Yanming Xu, [Zheng Ge](https://joker316701882.github.io/), Liang Zhao, [Jianjian Sun](https://scholar.google.com/citations?user=MVZrGkYAAAAJ&hl=en), [Yuang Peng](https://scholar.google.com.hk/citations?user=J0ko04IAAAAJ&hl=zh-CN&oi=ao), Chunrui Han, [Xiangyu Zhang](https://scholar.google.com/citations?user=yuB-cfoAAAAJ&hl=en)
<p align="center">
<img src="https://huggingface.co/ucaslcl/GOT-OCR2_0/resolve/main/assets/got_logo.png" style="width: 200px" align=center>
</p>
## Usage
Inference with Hugging Face Transformers on NVIDIA GPUs. Requirements, tested on Python 3.10:
```
torch==2.0.1
torchvision==0.15.2
transformers==4.37.2
megfile==3.1.2
```
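The pinned versions above can be installed in one step; this assumes `pip` in a Python 3.10 environment and that the default index serves a CUDA-enabled `torch==2.0.1` wheel for your platform:

```shell
pip install torch==2.0.1 torchvision==0.15.2 transformers==4.37.2 megfile==3.1.2
```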
```python
from transformers import AutoModel, AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained('ucaslcl/GOT-OCR2_0', trust_remote_code=True)
model = AutoModel.from_pretrained('ucaslcl/GOT-OCR2_0', trust_remote_code=True, low_cpu_mem_usage=True, device_map='cuda', use_safetensors=True, pad_token_id=tokenizer.eos_token_id)
model = model.eval().cuda()

# input your test image
image_file = 'xxx.jpg'

# plain texts OCR
res = model.chat(tokenizer, image_file, ocr_type='ocr')

# format texts OCR:
res = model.chat(tokenizer, image_file, ocr_type='format')

# fine-grained OCR:
res = model.chat(tokenizer, image_file, ocr_type='ocr', ocr_box='')
res = model.chat(tokenizer, image_file, ocr_type='format', ocr_box='')
res = model.chat(tokenizer, image_file, ocr_type='ocr', ocr_color='')
res = model.chat(tokenizer, image_file, ocr_type='format', ocr_color='')

# multi-crop OCR:
res = model.chat_crop(tokenizer, image_file=image_file)

# render the formatted OCR results to an HTML file:
res = model.chat(tokenizer, image_file, ocr_type='format', ocr_box='', ocr_color='', render=True, save_render_file='./demo.html')

print(res)
```
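In the snippet above, `ocr_box` and `ocr_color` are left as empty strings for you to fill in. As a sketch only, assuming the box string follows the `[x1,y1,x2,y2]` pixel-coordinate convention seen in the project's demos (verify against the GOT-OCR2.0 repo before relying on it), a small helper can build that argument:

```python
def make_ocr_box(x1: int, y1: int, x2: int, y2: int) -> str:
    """Hypothetical helper: format pixel coordinates as the box string
    assumed to be accepted by model.chat(..., ocr_box=...) for
    fine-grained (region-restricted) OCR."""
    return f"[{x1},{y1},{x2},{y2}]"

box = make_ocr_box(100, 200, 500, 600)
# then: res = model.chat(tokenizer, image_file, ocr_type='ocr', ocr_box=box)
```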