Handwriting-to-text conversion

by MangoTore1

Hi all. I'm looking for a model that can convert large quantities of handwritten pages (PDF or JPG) to text; I have >10K pages in English and French from a single author that I want to process. Ideally, it could be trained on this author's handwriting style to improve its accuracy. I've found a few services that do this, but for exorbitant per-page fees, and I couldn't find anything relevant on HF. Could anyone here point me to a suitable solution? Thanks!

Owner

Hello, you could try one-shotting with the base Qwen2-VL on an image of the handwriting; it works fairly well without fine-tuning. Otherwise, you can fine-tune with labelled samples of the handwriting (I followed this guide: https://swift.readthedocs.io/en/latest/Multi-Modal/qwen2-vl-best-practice.html#fine-tuning); I can help if need be. There are also some codebases on GitHub for cursive handwriting that use LSTMs. I fine-tuned this model on the IAM handwriting database (https://fki.tic.heia-fr.ch/databases/iam-handwriting-database) and this handwritten-character dataset (https://github.com/skyatmoon/CHoiCe-Dataset).
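
For reference, a minimal zero-shot transcription sketch with the base model, assuming a recent transformers release with Qwen2-VL support; the image path "page.jpg" and the prompt wording are placeholders:

from transformers import AutoProcessor, Qwen2VLForConditionalGeneration
from PIL import Image
import torch

model_id = "Qwen/Qwen2-VL-7B-Instruct"
model = Qwen2VLForConditionalGeneration.from_pretrained(
    model_id, torch_dtype=torch.float16, device_map="auto"
)
processor = AutoProcessor.from_pretrained(model_id)

# "page.jpg" is a placeholder for one scanned handwritten page.
image = Image.open("page.jpg")
messages = [
    {
        "role": "user",
        "content": [
            {"type": "image"},
            {"type": "text", "text": "Transcribe the handwriting in this image verbatim."},
        ],
    }
]
prompt = processor.apply_chat_template(messages, add_generation_prompt=True)
inputs = processor(text=[prompt], images=[image], return_tensors="pt").to(model.device)

output_ids = model.generate(**inputs, max_new_tokens=1024)
# Drop the prompt tokens so only the generated transcription is decoded.
new_tokens = output_ids[:, inputs.input_ids.shape[1]:]
print(processor.batch_decode(new_tokens, skip_special_tokens=True)[0])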

Thanks @alxfgh. I'm looking into these useful ideas.

@alxfgh Hi, I'm trying to load this adapter with Qwen/Qwen2-VL-7B-Instruct, but an error is thrown. Could you provide sample code for running the model with your adapter, please?

@alxfgh Below is what I am using:
from transformers import AutoProcessor, Qwen2VLForConditionalGeneration
from peft import PeftModel
import torch

model_id = "Qwen/Qwen2-VL-7B-Instruct"
model = Qwen2VLForConditionalGeneration.from_pretrained(
    model_id,
    torch_dtype=torch.float16,
    device_map="auto",
)
processor = AutoProcessor.from_pretrained(model_id)

path_to_adapter = "./lora/checkpoint-50"
# The base model loads fine; the error is raised by the next call.
model = PeftModel.from_pretrained(model, path_to_adapter)

but it gives me an error like the one below:

...
(self_attn): Qwen2VLSdpaAttention(
  (q_proj): Linear(in_features=3584, out_features=3584, bias=True)
  (k_proj): Linear(in_features=3584, out_features=512, bias=True)
  (v_proj): Linear(in_features=3584, out_features=512, bias=True)
  (o_proj): Linear(in_features=3584, out_features=3584, bias=False)
  (rotary_emb): Qwen2VLRotaryEmbedding()
)
(mlp): Qwen2MLP(
  (gate_proj): Linear(in_features=3584, out_features=18944, bias=False)
  (up_proj): Linear(in_features=3584, out_features=18944, bias=False)
  (down_proj): Linear(in_features=18944, out_features=3584, bias=False)
  (act_fn): SiLU()
)
(input_layernorm): Qwen2RMSNorm((3584,), eps=1e-06)
(post_attention_layernorm): Qwen2RMSNorm((3584,), eps=1e-06)
...
(norm): Qwen2RMSNorm((3584,), eps=1e-06)
(rotary_emb): Qwen2VLRotaryEmbedding()
) is not supported. Currently, only the following modules are supported: torch.nn.Linear, torch.nn.Embedding, torch.nn.Conv2d, transformers.pytorch_utils.Conv1D.
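
For what it's worth, PEFT typically raises this error when an entry in the adapter's target_modules matches a composite module (such as a whole attention block) rather than a leaf layer, or when the peft/transformers versions differ from those used for training. A hypothetical debugging sketch, reusing the ./lora/checkpoint-50 path from the snippet above, to inspect what the adapter config actually targets:

from peft import PeftConfig

# Load only the adapter's config (adapter_config.json), not its weights.
cfg = PeftConfig.from_pretrained("./lora/checkpoint-50")
print(cfg.peft_type)       # expected: LORA for a LoRA adapter
print(cfg.target_modules)  # PEFT can only wrap leaf layers such as nn.Linear;
                           # an entry matching e.g. "self_attn" causes this error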
