OCR on image

#28
by glitchyordis - opened

Obtaining key information is quite straightforward, but is there a way to obtain bounding-box (bbox) locations for the detected text?

glitchyordis changed discussion title from OCR text to OCR on image

You can prompt the model to return bbox locations (see this demo: https://huggingface.co/spaces/maxiw/Qwen2-VL-Detection). I also tried the prompt "detect all texts", but the resulting boxes were not very precise.
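Once the model replies with boxes, they still have to be pulled out of the generated text and mapped back onto the image. Below is a minimal sketch that assumes Qwen2-VL's grounding output format, `<|object_ref_start|>label<|object_ref_end|><|box_start|>(x1,y1),(x2,y2)<|box_end|>`, with coordinates normalized to a 0-1000 range. It also assumes the output was decoded with special tokens kept, because decoding with `skip_special_tokens=True` would strip those markers. `parse_boxes` and the sample string are hypothetical and only for illustration.

```python
import re

# Assumed Qwen2-VL grounding format: an optional label wrapped in
# <|object_ref_start|>...<|object_ref_end|>, followed by a box
# <|box_start|>(x1,y1),(x2,y2)<|box_end|> with coordinates in [0, 1000].
BOX_PATTERN = re.compile(
    r"(?:<\|object_ref_start\|>(?P<label>.*?)<\|object_ref_end\|>)?"
    r"<\|box_start\|>\(\s*(?P<x1>\d+)\s*,\s*(?P<y1>\d+)\s*\)\s*,"
    r"\s*\(\s*(?P<x2>\d+)\s*,\s*(?P<y2>\d+)\s*\)<\|box_end\|>"
)


def parse_boxes(text, image_width, image_height, scale=1000):
    """Hypothetical helper: return (label, (x1, y1, x2, y2)) in pixel coordinates.

    label is None when the box has no preceding object_ref span.
    """
    boxes = []
    for m in BOX_PATTERN.finditer(text):
        x1, y1, x2, y2 = (int(m.group(k)) for k in ("x1", "y1", "x2", "y2"))
        # Rescale from the normalized 0-1000 grid to the actual image size.
        pixel_box = (
            x1 * image_width / scale,
            y1 * image_height / scale,
            x2 * image_width / scale,
            y2 * image_height / scale,
        )
        boxes.append((m.group("label"), pixel_box))
    return boxes


if __name__ == "__main__":
    # Made-up model reply for a 640x480 image.
    sample = "<|object_ref_start|>SALE<|object_ref_end|><|box_start|>(100,200),(500,400)<|box_end|>"
    print(parse_boxes(sample, 640, 480))
```

For the made-up reply above, this prints `[('SALE', (64.0, 96.0, 320.0, 192.0))]`. If your checkpoint emits a different box syntax or coordinate range, adjust `BOX_PATTERN` and `scale` accordingly.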

I tried OCR on a screenshot of text that was not very clear, and it worked nearly perfectly. However, the model does not seem good at recognizing distorted or curved text, e.g. words on a bottle.