VisualWebInstruct: Scaling up Multimodal Instruction Data through Web Search Paper • 2503.10582 • Published 3 days ago • 16
VisualPRM: An Effective Process Reward Model for Multimodal Reasoning Paper • 2503.10291 • Published 3 days ago • 28
GoT: Unleashing Reasoning Capability of Multimodal Large Language Model for Visual Generation and Editing Paper • 2503.10639 • Published 3 days ago • 37
R1-Omni: Explainable Omni-Multimodal Emotion Recognition with Reinforcing Learning Paper • 2503.05379 • Published 9 days ago • 31
VisualSimpleQA: A Benchmark for Decoupled Evaluation of Large Vision-Language Models in Fact-Seeking Question Answering Paper • 2503.06492 • Published 7 days ago • 9
Gemini Embedding: Generalizable Embeddings from Gemini Paper • 2503.07891 • Published 6 days ago • 28