MMMU

non-profit

https://mmmu-benchmark.github.io/

MMMU-Benchmark


AI & ML interests

Multimodal Model Evaluation


MMMU's activity

zhangysk

authored 3 papers about 3 hours ago

SimpleVQA: Multimodal Factuality Evaluation for Multimodal Large Language Models

Paper • 2502.13059 • Published 9 days ago

SuperGPQA: Scaling LLM Evaluation across 285 Graduate Disciplines

Paper • 2502.14739 • Published 7 days ago • 91

Can Large Language Models Detect Errors in Long Chain-of-Thought Reasoning?

Paper • 2502.19361 • Published about 18 hours ago • 12

zhangysk

authored 2 papers 2 days ago

CodeCriticBench: A Holistic Code Critique Benchmark for Large Language Models

Paper • 2502.16614 • Published 4 days ago • 22

Audio-FLAN: A Preliminary Release

Paper • 2502.16584 • Published 4 days ago • 30

a43992899

authored a paper 2 days ago

Audio-FLAN: A Preliminary Release

Paper • 2502.16584 • Published 4 days ago • 30

yuanshengni

updated a dataset 4 days ago

MMMU/MMMU_Pro

Viewer • Updated 4 days ago • 5.19k • 6.31k • 22

yuanshengni

authored a paper 6 days ago

SuperGPQA: Scaling LLM Evaluation across 285 Graduate Disciplines

Paper • 2502.14739 • Published 7 days ago • 91

aaabiao

authored a paper 6 days ago

SuperGPQA: Scaling LLM Evaluation across 285 Graduate Disciplines

Paper • 2502.14739 • Published 7 days ago • 91

yuanshengni

in MMMU/MMMU_Pro 14 days ago

ValueError: BuilderConfig 'standard' not found. Available: ['standard (10 options)', 'standard (4 options)', 'vision']

#5 opened 3 months ago by shilinxu
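
The error in this discussion says that `standard` is not itself a config name for MMMU/MMMU_Pro; the available names listed in the message are `standard (10 options)`, `standard (4 options)`, and `vision`. A minimal sketch of one way to handle this, assuming the Hugging Face `datasets` library is in use: the helper names `matching_configs` and `load_mmmu_pro` are hypothetical, and the config list is copied from the error message rather than fetched from the Hub.

```python
# Config names copied verbatim from the error message above.
AVAILABLE_CONFIGS = ["standard (10 options)", "standard (4 options)", "vision"]


def matching_configs(requested, available=AVAILABLE_CONFIGS):
    """Return [requested] on an exact match, else the configs that start with it."""
    if requested in available:
        return [requested]
    return [name for name in available if name.startswith(requested)]


def load_mmmu_pro(config_name):
    """Hypothetical helper: load MMMU/MMMU_Pro only with an exact config name."""
    # Imported lazily so matching_configs works without `datasets` installed.
    from datasets import load_dataset

    candidates = matching_configs(config_name)
    if candidates != [config_name]:
        raise ValueError(
            f"Config {config_name!r} is not an exact match; try one of {candidates}"
        )
    return load_dataset("MMMU/MMMU_Pro", config_name)


# "standard" alone is ambiguous: it is a prefix of two config names.
print(matching_configs("standard"))
# → ['standard (10 options)', 'standard (4 options)']
```

Under these assumptions, calling `load_mmmu_pro("standard (10 options)")` would pass the full config name through to `load_dataset`, while `load_mmmu_pro("standard")` fails early with the candidate names.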

aaabiao

authored a paper 16 days ago

Steel-LLM:From Scratch to Open Source -- A Personal Journey in Building a Chinese-Centric LLM

Paper • 2502.06635 • Published 17 days ago • 4

zhangysk

authored 4 papers 17 days ago

yuexiang96

authored a paper 21 days ago

Demystifying Long Chain-of-Thought Reasoning in LLMs

Paper • 2502.03373 • Published 22 days ago • 54

gneubig

authored a paper 21 days ago

Demystifying Long Chain-of-Thought Reasoning in LLMs

Paper • 2502.03373 • Published 22 days ago • 54

DongfuJiang

authored a paper 22 days ago

ACECODER: Acing Coder RL via Automated Test-Case Synthesis

Paper • 2502.01718 • Published 24 days ago • 28

RLSNLP

authored 2 papers 23 days ago

MMMU: A Massive Multi-discipline Multimodal Understanding and Reasoning Benchmark for Expert AGI

Paper • 2311.16502 • Published Nov 27, 2023 • 35

iDesigner: A High-Resolution and Complex-Prompt Following Text-to-Image Diffusion Model for Interior Design

Paper • 2312.04326 • Published Dec 7, 2023 • 3


Team members 17
