How to fine-tune using DPO?
Hello,
I have a standard DPO dataset with columns for images, chosen points, and rejected points, containing 2D coordinates for GUI visual grounding tasks. What prompt format is needed to correctly train the model using the DPO technique? The paper mentions that a 2D PixMo-Points dataset was used to train the model, but could you clarify the exact approach?
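For reference, here is a minimal sketch of how one preference record for such a dataset could be laid out. This is an assumption, not the confirmed format: the column names (`images`, `prompt`, `chosen`, `rejected`) follow the common TRL DPO convention, and the `<point x="…" y="…">` markup mirrors Molmo's pointing output style; the actual training format may differ.

```python
def make_dpo_record(image_path, instruction, chosen_xy, rejected_xy):
    """Build one hypothetical DPO preference pair for GUI grounding:
    the model should prefer the chosen 2D point over the rejected one
    for the same pointing prompt. Point markup style is assumed."""
    prompt = f"Point to: {instruction}"
    fmt = '<point x="{x:.1f}" y="{y:.1f}">{label}</point>'
    return {
        "images": [image_path],
        "prompt": prompt,
        "chosen": fmt.format(x=chosen_xy[0], y=chosen_xy[1], label=instruction),
        "rejected": fmt.format(x=rejected_xy[0], y=rejected_xy[1], label=instruction),
    }

record = make_dpo_record("screen.png", "the Submit button", (43.5, 71.2), (12.0, 8.4))
print(record["chosen"])
```

A list of such records could then be wrapped with `datasets.Dataset.from_list` and fed to a preference trainer, but whether that matches the official pipeline is exactly what I'd like confirmed.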
Hello @Maverick17 , we are releasing the paper with complete details of the dataset, training, and evaluation shortly.
Hello @amanrangapur , does "shortly" mean by the end of this week or by the end of November? :)
I'm really looking forward to the release of the dataset, training, and evaluation scripts!
Hi @Maverick17 , I mean the last week of November.
Hello @amanrangapur , what is the status of the data release? We are approaching the end of November :)
Hey @Maverick17 , we're planning to release this week. Stay tuned.
@amanrangapur Seems you guys are still not ready...
Hi @Maverick17 , the dataset is out (a subset); check this: https://huggingface.co/collections/allenai/pixmo-674746ea613028006285687b
Training, evals, and checkpoints are here: https://github.com/allenai/molmo