OmniAlign-V: Towards Enhanced Alignment of MLLMs with Human Preference Paper • 2502.18411 • Published 2 days ago • 60
CompassJudger-1: All-in-one Judge Model Helps Model Evaluation and Evolution Paper • 2410.16256 • Published Oct 21, 2024 • 60
Open VLM Video Leaderboard: VLMEvalKit eval results on video understanding benchmarks • Running • 101
HelloBench: Evaluating Long Text Generation Capabilities of Large Language Models Paper • 2409.16191 • Published Sep 24, 2024 • 42
NeedleBench: Can LLMs Do Retrieval and Reasoning in 1 Million Context Window? Paper • 2407.11963 • Published Jul 16, 2024 • 44
MMBench-Video: A Long-Form Multi-Shot Benchmark for Holistic Video Understanding Paper • 2406.14515 • Published Jun 20, 2024 • 33
Prism: A Framework for Decoupling and Assessing the Capabilities of VLMs Paper • 2406.14544 • Published Jun 20, 2024 • 35