# OCR with Hugging Face Transformers

This repository demonstrates how to perform Optical Character Recognition (OCR) on images with the Hugging Face Transformers library. The code loads the pretrained `vanshp123/ocrmnist` model and pairs it with the TrOCR processor from `microsoft/trocr-base-stage1`.

## Prerequisites

Before running the code, install the required libraries with pip. PyTorch is needed as well, because the example requests PyTorch tensors (`return_tensors="pt"`):

```shell
pip install transformers
pip install pillow
pip install torch
```
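To confirm the environment before moving on, the installed packages can be checked from Python. The following is a minimal sketch that uses only the standard library; the helper name `missing_packages` is hypothetical and not part of this repository.

```python
from importlib.metadata import PackageNotFoundError, version


def missing_packages(names):
    """Return the distribution names from `names` that are not installed."""
    missing = []
    for name in names:
        try:
            version(name)  # raises PackageNotFoundError if the package is absent
        except PackageNotFoundError:
            missing.append(name)
    return missing


# Distribution names as they are passed to pip.
required = ["transformers", "pillow"]
print("Missing packages:", missing_packages(required) or "none")
```

Any name that appears in the output still needs to be installed with pip.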

## Usage

You can use the provided code to perform OCR on images. Here are the basic steps:

Import the necessary libraries:

```python
from transformers import TrOCRProcessor, VisionEncoderDecoderModel
from PIL import Image
```

Load the pretrained OCR model and processor:

```python
model = VisionEncoderDecoderModel.from_pretrained("vanshp123/ocrmnist")
processor = TrOCRProcessor.from_pretrained("microsoft/trocr-base-stage1")
```

Load an image for OCR. Replace `"/content/left_digit_section_4.png"` with the path to your own image:

```python
# Convert to RGB so the processor receives a three-channel image.
image = Image.open("/content/left_digit_section_4.png").convert("RGB")
```
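When more than one image needs to be recognized, the paths can be gathered first and opened one at a time. The sketch below assumes a hypothetical helper `find_images` and an illustrative set of file extensions; neither comes from this repository.

```python
from pathlib import Path

# Illustrative set of extensions; adjust it to the formats you actually use.
IMAGE_EXTENSIONS = {".png", ".jpg", ".jpeg", ".bmp"}


def find_images(folder):
    """Return a sorted list of image file paths directly inside `folder`."""
    return sorted(
        path
        for path in Path(folder).iterdir()
        if path.is_file() and path.suffix.lower() in IMAGE_EXTENSIONS
    )
```

Each returned path can then be opened with `Image.open(path).convert("RGB")`, exactly as for the single image.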

Process the image using the OCR processor and generate the text:

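```python
# Preprocess the image into a batch of PyTorch tensors.
pixel_values = processor(images=image, return_tensors="pt").pixel_values
# Generate token IDs for the recognized text.
generated_ids = model.generate(pixel_values)
# Decode the IDs into a string, dropping special tokens.
generated_text = processor.batch_decode(generated_ids, skip_special_tokens=True)[0]
```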

`generated_text` contains the text recognized in the image.
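The steps above can be wrapped in small reusable functions. This is a sketch rather than code from this repository: the function names `load_ocr` and `recognize` are hypothetical, and the batch handling assumes that the processor accepts a list of images and that `batch_decode` returns one string per image.

```python
def load_ocr(model_id="vanshp123/ocrmnist",
             processor_id="microsoft/trocr-base-stage1"):
    """Load the model and processor named in this README."""
    # Imported lazily so that `recognize` can be used and tested
    # without importing transformers.
    from transformers import TrOCRProcessor, VisionEncoderDecoderModel

    model = VisionEncoderDecoderModel.from_pretrained(model_id)
    processor = TrOCRProcessor.from_pretrained(processor_id)
    return model, processor


def recognize(images, model, processor):
    """Run OCR on a list of RGB PIL images and return one string per image."""
    pixel_values = processor(images=images, return_tensors="pt").pixel_values
    generated_ids = model.generate(pixel_values)
    texts = processor.batch_decode(generated_ids, skip_special_tokens=True)
    # Strip surrounding whitespace that decoding may leave behind.
    return [text.strip() for text in texts]
```

For a single image, call `recognize([image], model, processor)[0]` with the objects returned by `load_ocr()`. Passing the model and processor as arguments means they are loaded once and reused across calls.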

## Example

You can use this code as a starting point for your own OCR projects. Adapt it to your use case, for example by changing the image path or by processing several images in a loop.

## License

This code relies on pretrained models loaded through the Hugging Face Transformers library (`vanshp123/ocrmnist` and `microsoft/trocr-base-stage1`). Review the license and usage terms of each model before using it.