Finetuning for a different language

#2
by jdchawla - opened

What would be the best way to fine-tune it on Korean? Is there any open source code specifically for the Qwen2-VL backbone?
The tevatron repo has code for Phi-3 but not Qwen...
I have prepared a dataset ~500k rows of image-query pairs and want to fine-tune this model, but I'm not sure what would be the right way to do it.

Owner

Thanks for your interest. I can update the code tomorrow for Qwen and let you know.

Thank you very much!

Owner

Hi @jdchawla , I have updated the basic Qwen code (see the dse/qwen example folder in Tevatron). However, I currently don't have much compute to test it thoroughly, so feel free to open an issue or pull request.

For your specific use case, I suggest the following training procedures:

  1. Encode your images with MrLight/dse-qwen2-2b-mrl-v1 by following the example code for document encoding.
  • Prepare your corpus dataset like Tevatron/wiki-ss-corpus.
  2. Encode all your queries with MrLight/dse-qwen2-2b-mrl-v1 by following the example code for query encoding.
  • Prepare your query dataset like Tevatron/wiki-ss-nq.
  • (No need to deal with positive docs and negative docs for now.)
  3. Run search using the above query and passage representations, by following the search code.

The above steps will give you retrieval results for your queries over your document images using MrLight/dse-qwen2-2b-mrl-v1; the main purpose here is hard negative mining. (I assume our Qwen can do reasonably good zero-shot retrieval on the Korean task.)
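The hard negative mining step above can be sketched as a simple inner-product search over the saved embeddings. This is a minimal illustration with random vectors standing in for real DSE embeddings; the function name and `k` are my own choices, not Tevatron's API, which handles search for you.

```python
# Sketch of hard negative mining: given query and document embeddings
# (e.g. produced by MrLight/dse-qwen2-2b-mrl-v1), retrieve the top-k
# documents per query by inner product. Only the search logic is shown.
import numpy as np

def mine_hard_negatives(query_embs, doc_embs, doc_ids, k=10):
    """Return the top-k doc ids for each query by dot-product score."""
    scores = query_embs @ doc_embs.T           # (num_queries, num_docs)
    topk = np.argsort(-scores, axis=1)[:, :k]  # indices of highest-scoring docs
    return [[doc_ids[j] for j in row] for row in topk]

# Toy example: random unit vectors stand in for real embeddings.
rng = np.random.default_rng(0)
q = rng.normal(size=(3, 16)); q /= np.linalg.norm(q, axis=1, keepdims=True)
d = rng.normal(size=(100, 16)); d /= np.linalg.norm(d, axis=1, keepdims=True)
ids = [f"doc-{i}" for i in range(100)]
results = mine_hard_negatives(q, d, ids, k=5)
```

At 500k corpus scale you would use a library like FAISS rather than a dense matmul, but the ranking logic is the same.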

  4. Now, based on the retrieval results, you can create your training data (query, positive documents, negative documents) in the format of Tevatron/wiki-ss-nq, putting the paired document id into positive passages and the negative document ids into negative passages. (Leave the text field empty; we only need the docid here.)

  5. Train your model by following the training example.
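The training-data construction step can be sketched as turning each query's retrieval results into one JSONL row. The schema below mirrors the Tevatron/wiki-ss-nq layout described above (positive/negative passages identified by docid, text left empty); the exact field names are my assumption, so check them against the dataset.

```python
# Sketch: build one Tevatron-style training row from retrieval results.
# The positive is the query's known paired document; the retrieved docs
# (minus the positive) become hard negatives.
import json

def build_training_row(query, positive_docid, retrieved_docids, num_negatives=8):
    # Drop the positive from the retrieved list so it is not reused as a negative.
    negatives = [d for d in retrieved_docids if d != positive_docid][:num_negatives]
    return {
        "query": query,
        "positive_passages": [{"docid": positive_docid, "title": "", "text": ""}],
        "negative_passages": [{"docid": d, "title": "", "text": ""} for d in negatives],
    }

row = build_training_row("대한민국의 수도는?", "doc-12", ["doc-3", "doc-12", "doc-7", "doc-42"])
line = json.dumps(row, ensure_ascii=False)  # one JSONL line of training data
```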

I will work on making the Tevatron toolkit more friendly for multimodal in the following weeks; meanwhile, feel free to ping me if there are further questions.

Owner

By the way, when I initially fine-tuned Qwen, I turned off backpropagation for the visual encoder to save VRAM. If Qwen itself already performs well enough on Korean document OCR etc., I think it's OK to turn off backpropagation for the visual encoder.
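Freezing the visual encoder amounts to disabling gradients on the vision tower's parameters before training. The sketch below uses a dummy module so it runs standalone; for the real model you would apply the same loop to the vision tower of the Qwen2-VL backbone (exposed as `model.visual` in the transformers implementation, though you should verify the attribute name for your version).

```python
# Sketch of freezing the visual encoder to save VRAM, as described above.
# A tiny stand-in model is used here so the pattern is runnable anywhere.
import torch.nn as nn

class DummyDSE(nn.Module):
    """Stand-in for a Qwen2-VL-style model: `visual` mimics the vision tower."""
    def __init__(self):
        super().__init__()
        self.visual = nn.Linear(8, 8)          # stands in for the vision encoder
        self.language_model = nn.Linear(8, 8)  # stands in for the LM backbone

model = DummyDSE()
for p in model.visual.parameters():
    p.requires_grad = False  # no gradients (or optimizer state) for the vision tower

frozen = sum(1 for p in model.parameters() if not p.requires_grad)
trainable = sum(1 for p in model.parameters() if p.requires_grad)
```

Besides saving activation memory for the vision backward pass, this also shrinks the optimizer state, which matters for Adam-style optimizers.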

Thank you so much for the code and the insights!

Hi @MrLight , thanks for the updated code! How did you train for MRL? In the code I can only see the matmul between the full-sized embeddings.
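For context on the question: MRL (Matryoshka Representation Learning) training is typically done by computing the contrastive loss at several truncated embedding sizes and averaging, so that prefixes of the full embedding remain usable on their own. The sketch below is a generic MRL loss, not the repo's actual code; the truncation dims and temperature are illustrative assumptions.

```python
# Generic MRL sketch: average the in-batch contrastive loss over several
# embedding truncations, so truncated prefixes stay retrieval-ready.
import torch
import torch.nn.functional as F

def mrl_contrastive_loss(q, d, dims=(128, 256, 512), temperature=0.02):
    """q, d: (batch, full_dim) query / positive-doc embeddings, row-aligned."""
    labels = torch.arange(q.size(0))
    losses = []
    for k in dims:
        qk = F.normalize(q[:, :k], dim=-1)  # truncate, then re-normalize
        dk = F.normalize(d[:, :k], dim=-1)
        scores = qk @ dk.T / temperature    # other rows act as in-batch negatives
        losses.append(F.cross_entropy(scores, labels))
    return torch.stack(losses).mean()

q = torch.randn(4, 512, requires_grad=True)
d = q.detach() + 0.01 * torch.randn(4, 512)  # positives close to their queries
loss = mrl_contrastive_loss(q, d)
```

At inference time only the matmul between (possibly truncated) embeddings is needed, which may be why the MRL machinery is not visible in the scoring code.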
