# OCR with Hugging Face Transformers
This repository demonstrates how to perform Optical Character Recognition (OCR) on images using the Hugging Face Transformers library and a pretrained model.
## Prerequisites
Before you can run the code, install the required libraries with pip. PyTorch (`torch`) is also needed, because the usage code requests PyTorch tensors (`return_tensors="pt"`):

    pip install transformers
    pip install pillow
    pip install torch
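
After installing, it can help to confirm that the packages are importable before loading any model. The following is a minimal sketch using only the standard library. The helper name `missing_packages` and the `REQUIRED` tuple are hypothetical, and listing `torch` is an assumption based on the PyTorch tensors requested in the usage code.

```python
from importlib.util import find_spec

# Import names, not pip names: the "pillow" package is imported as "PIL".
# "torch" is an assumption, based on return_tensors="pt" in the usage code.
REQUIRED = ("transformers", "PIL", "torch")


def missing_packages(names=REQUIRED):
    """Return the import names from `names` that cannot be found."""
    return [name for name in names if find_spec(name) is None]


missing = missing_packages()
if missing:
    print("Missing packages:", ", ".join(missing))
else:
    print("All required packages are importable.")
```
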
## Usage
You can use the provided code to perform OCR on images. Here are the basic steps:
- Import the necessary libraries:

      from transformers import TrOCRProcessor, VisionEncoderDecoderModel
      from PIL import Image
- Load the pretrained OCR model and processor:

      model = VisionEncoderDecoderModel.from_pretrained("vanshp123/ocrmnist")
      processor = TrOCRProcessor.from_pretrained("microsoft/trocr-base-stage1")
- Load an image for OCR. You can replace `"/content/left_digit_section_4.png"` with the path to your image:

      image = Image.open("/content/left_digit_section_4.png").convert("RGB")
- Process the image using the OCR processor and generate the text:

      pixel_values = processor(images=image, return_tensors="pt").pixel_values
      generated_ids = model.generate(pixel_values)
      generated_text = processor.batch_decode(generated_ids, skip_special_tokens=True)[0]

`generated_text` will contain the text recognized from the image.
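
The steps above can be collected into two small functions so they are easier to reuse. This is a sketch, not part of the original repository: the names `recognize_text` and `load_ocr` are hypothetical, and the model and processor identifiers are simply the ones used in the steps above. The `transformers` import is done inside `load_ocr` so that defining the functions does not itself require the library.

```python
def recognize_text(image, model, processor):
    """Run OCR on one RGB PIL image and return the decoded string."""
    # Convert the image into the pixel tensor the encoder expects.
    pixel_values = processor(images=image, return_tensors="pt").pixel_values
    # Generate token ids, then decode them into text.
    generated_ids = model.generate(pixel_values)
    return processor.batch_decode(generated_ids, skip_special_tokens=True)[0]


def load_ocr(model_name="vanshp123/ocrmnist",
             processor_name="microsoft/trocr-base-stage1"):
    """Load the model and processor named in the steps above."""
    # Imported lazily; downloading the weights needs network access.
    from transformers import TrOCRProcessor, VisionEncoderDecoderModel

    model = VisionEncoderDecoderModel.from_pretrained(model_name)
    processor = TrOCRProcessor.from_pretrained(processor_name)
    return model, processor
```

With the libraries installed, calling `model, processor = load_ocr()` once and then `recognize_text(Image.open(path).convert("RGB"), model, processor)` for each image should give the same result as the individual steps.
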
## Example
You can use this code as a starting point for your own OCR projects, adapting it to your specific use case.
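
As one possible adaptation, the sketch below runs OCR over every image in a folder, processing the files in small batches. It assumes `model` and `processor` were loaded as in the Usage section and that the processor accepts a list of images. The names `find_images`, `recognize_folder`, and `IMAGE_SUFFIXES`, and the default `batch_size`, are hypothetical choices for illustration and have not been tuned for this model.

```python
from pathlib import Path

# File extensions treated as images (an assumption; extend as needed).
IMAGE_SUFFIXES = {".png", ".jpg", ".jpeg"}


def find_images(folder):
    """Return image files directly inside `folder`, sorted by path."""
    return sorted(
        p for p in Path(folder).iterdir()
        if p.is_file() and p.suffix.lower() in IMAGE_SUFFIXES
    )


def recognize_folder(folder, model, processor, batch_size=8):
    """Map each image path in `folder` to its recognized text."""
    from PIL import Image  # imported lazily; requires pillow

    results = {}
    paths = find_images(folder)
    for start in range(0, len(paths), batch_size):
        chunk = paths[start:start + batch_size]
        images = [Image.open(p).convert("RGB") for p in chunk]
        # The processor turns the list of images into one batched tensor.
        pixel_values = processor(images=images, return_tensors="pt").pixel_values
        generated_ids = model.generate(pixel_values)
        texts = processor.batch_decode(generated_ids, skip_special_tokens=True)
        results.update(zip(chunk, texts))
    return results
```

Batching keeps memory use bounded for large folders; a smaller `batch_size` trades speed for a lower peak memory footprint.
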
## License
This code uses pretrained models loaded through the Hugging Face Transformers library. Review the licensing and usage terms of each pretrained model before using it.