Handwriting-to-text conversion
Hi all. I'm looking for a model that can convert large quantities of handwritten pages (PDF or JPG) to text; I have >10K pages in English and French from a single author that I want to process. Ideally, it could be trained on this author's handwriting style to improve its accuracy. I've found a few services that do this, but for exorbitant per-page fees, and I couldn't find anything relevant on HF. Could anyone here point me to a suitable solution? Thanks!
Hello, you could try one-shotting base Qwen2-VL on an image of the handwriting; it works pretty well without fine-tuning. Otherwise, you can fine-tune it on labelled samples of the handwriting (I followed this guide: https://swift.readthedocs.io/en/latest/Multi-Modal/qwen2-vl-best-practice.html#fine-tuning), and I can help if need be. There are also some codebases on GitHub for cursive handwriting that use LSTMs. I fine-tuned this model on the IAM handwriting database (https://fki.tic.heia-fr.ch/databases/iam-handwriting-database) and this handwritten character dataset (https://github.com/skyatmoon/CHoiCe-Dataset).
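If it helps, here is a minimal inference sketch for one-shotting a single page with the base model (assuming a recent transformers release with Qwen2-VL support; the image path and prompt text are just placeholders):
from transformers import AutoProcessor, Qwen2VLForConditionalGeneration
from PIL import Image
import torch

model_id = "Qwen/Qwen2-VL-7B-Instruct"
model = Qwen2VLForConditionalGeneration.from_pretrained(
    model_id, torch_dtype=torch.float16, device_map="auto"
)
processor = AutoProcessor.from_pretrained(model_id)

# Placeholder path to one scanned page
image = Image.open("page_001.jpg")
messages = [{
    "role": "user",
    "content": [
        {"type": "image"},
        {"type": "text", "text": "Transcribe the handwritten text in this image."},
    ],
}]
prompt = processor.apply_chat_template(messages, tokenize=False, add_generation_prompt=True)
inputs = processor(text=[prompt], images=[image], return_tensors="pt").to(model.device)

generated = model.generate(**inputs, max_new_tokens=512)
# Strip the prompt tokens so only the transcription is decoded
trimmed = generated[:, inputs["input_ids"].shape[1]:]
print(processor.batch_decode(trimmed, skip_special_tokens=True)[0])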
Thanks @alxfgh . I'm looking into these useful ideas.
@alxfgh below is what I am using:
from transformers import AutoProcessor, Qwen2VLForConditionalGeneration
from peft import PeftModel
import torch

# Load the base Qwen2-VL model and its processor
model_id = "Qwen/Qwen2-VL-7B-Instruct"
model = Qwen2VLForConditionalGeneration.from_pretrained(
    model_id,
    torch_dtype=torch.float16,
    device_map="auto",
)
processor = AutoProcessor.from_pretrained(model_id)

# Attach the fine-tuned LoRA adapter to the base model
path_to_adapter = "./lora/checkpoint-50"
print("loading adapter")
model = PeftModel.from_pretrained(model, path_to_adapter)
print("done")
but it gives me an error like the one below:
(self_attn): Qwen2VLSdpaAttention(
(q_proj): Linear(in_features=3584, out_features=3584, bias=True)
(k_proj): Linear(in_features=3584, out_features=512, bias=True)
(v_proj): Linear(in_features=3584, out_features=512, bias=True)
(o_proj): Linear(in_features=3584, out_features=3584, bias=False)
(rotary_emb): Qwen2VLRotaryEmbedding()
)
(mlp): Qwen2MLP(
(gate_proj): Linear(in_features=3584, out_features=18944, bias=False)
(up_proj): Linear(in_features=3584, out_features=18944, bias=False)
(down_proj): Linear(in_features=18944, out_features=3584, bias=False)
(act_fn): SiLU()
)
(input_layernorm): Qwen2RMSNorm((3584,), eps=1e-06)
(post_attention_layernorm): Qwen2RMSNorm((3584,), eps=1e-06)
)
)
(norm): Qwen2RMSNorm((3584,), eps=1e-06)
(rotary_emb): Qwen2VLRotaryEmbedding()
) is not supported. Currently, only the following modules are supported: torch.nn.Linear, torch.nn.Embedding, torch.nn.Conv2d, transformers.pytorch_utils.Conv1D.
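For what it's worth, this error comes from PEFT trying to wrap a module type it cannot inject LoRA into; judging from the dump above, the adapter's target_modules are resolving to a parent module (a whole attention/decoder block) rather than one of the supported leaf layer types listed in the error. This can also happen when the peft/transformers versions differ between training and loading. A minimal sketch of a LoRA config that targets only the Linear projection layers instead (assuming peft's LoraConfig and get_peft_model, and that the adapter would be re-trained with it):
from peft import LoraConfig, get_peft_model

# Hypothetical config: target only the torch.nn.Linear attention projections,
# which are among the module types PEFT can wrap with LoRA
lora_config = LoraConfig(
    r=8,
    lora_alpha=16,
    lora_dropout=0.05,
    target_modules=["q_proj", "k_proj", "v_proj", "o_proj"],
    task_type="CAUSAL_LM",
)
model = get_peft_model(model, lora_config)
model.print_trainable_parameters()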