PaliGemma 2: A Family of Versatile VLMs for Transfer Paper • 2412.03555 • Published 22 days ago
Visual Riddles: a Commonsense and World Knowledge Challenge for Large Vision and Language Models Paper • 2407.19474 • Published Jul 28, 2024
Video-STaR: Self-Training Enables Video Instruction Tuning with Any Supervision Paper • 2407.06189 • Published Jul 8, 2024
DataComp-LM: In search of the next generation of training sets for language models Paper • 2406.11794 • Published Jun 17, 2024
Leaving Reality to Imagination: Robust Classification via Generated Datasets Paper • 2302.02503 • Published Feb 5, 2023
ClimateLearn: Benchmarking Machine Learning for Weather and Climate Modeling Paper • 2307.01909 • Published Jul 4, 2023
Peering Through Preferences: Unraveling Feedback Acquisition for Aligning Large Language Models Paper • 2308.15812 • Published Aug 30, 2023
MathVista: Evaluating Mathematical Reasoning of Foundation Models in Visual Contexts Paper • 2310.02255 • Published Oct 3, 2023
Dynosaur: A Dynamic Growth Paradigm for Instruction-Tuning Data Curation Paper • 2305.14327 • Published May 23, 2023
ConTextual: Evaluating Context-Sensitive Text-Rich Visual Reasoning in Large Multimodal Models Paper • 2401.13311 • Published Jan 24, 2024
VideoCon: Robust Video-Language Alignment via Contrast Captions Paper • 2311.10111 • Published Nov 15, 2023
CleanCLIP: Mitigating Data Poisoning Attacks in Multimodal Contrastive Learning Paper • 2303.03323 • Published Mar 6, 2023
VisIT-Bench: A Benchmark for Vision-Language Instruction Following Inspired by Real-World Use Paper • 2308.06595 • Published Aug 12, 2023