Hi, thanks a lot for sharing. I tried a similar approach to make the VLM point to objects in the image, in x, y coordinates, using the PixMo-Points dataset. But in spite of training on a subset of around 20k examples, the model just produces random x, y values, and the reward stops improving beyond a certain point. I am using a format reward similar to yours, plus the distance between the predicted point and the ground truth as a reward, i.e. exp(-distance). It just doesn't work!! Do you have any insights into why it doesn't work for pointing? I used Qwen2-VL 2B.
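For concreteness, here is a minimal sketch of the two rewards described above. The comment does not say what output format was used or whether distances are in pixels or normalized units, so the `(x, y)` format, the function names, and the `scale` parameter are all assumptions made for illustration, not the setup that was actually trained.

```python
import math
import re

# Assumed output format (hypothetical): "(x, y)" with numeric coordinates.
POINT_RE = re.compile(
    r"^\s*\(\s*(-?\d+(?:\.\d+)?)\s*,\s*(-?\d+(?:\.\d+)?)\s*\)\s*$"
)


def parse_point(text):
    """Return (x, y) as floats, or None if the text is not a valid point."""
    match = POINT_RE.match(text)
    if match is None:
        return None
    return float(match.group(1)), float(match.group(2))


def format_reward(text):
    """1.0 when the completion parses as a point, 0.0 otherwise."""
    return 1.0 if parse_point(text) is not None else 0.0


def distance_reward(text, target, scale=1.0):
    """exp(-distance / scale); 0.0 when the completion does not parse.

    `scale` is an assumed knob: scale=1.0 gives the plain exp(-distance)
    from the comment, larger values flatten the decay.
    """
    point = parse_point(text)
    if point is None:
        return 0.0
    dist = math.hypot(point[0] - target[0], point[1] - target[1])
    return math.exp(-dist / scale)


if __name__ == "__main__":
    target = (100.0, 100.0)
    for completion in ["(100, 100)", "(110, 100)", "(200, 100)", "no point"]:
        print(
            completion,
            format_reward(completion),
            distance_reward(completion, target),
            distance_reward(completion, target, scale=100.0),
        )
```

If the distance is measured in pixels, a prediction that is 10 px off scores exp(-10), roughly 4.5e-5, so almost every sample receives nearly the same near-zero reward. That could be one reason the reward plateaus, though it is only a guess without knowing the units used. Dividing by a scale on the order of the image size, or working in coordinates normalized to [0, 1], keeps the reward informative over a wider range of errors.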