lioushz

Shz

AI & ML interests

None yet

Shz's activity

upvoted a paper 2 days ago

OmniAlign-V: Towards Enhanced Alignment of MLLMs with Human Preference

Paper • 2502.18411 • Published 2 days ago • 60

updated 2 datasets 3 days ago

opencompass/AIME2025

Viewer • Updated 3 days ago • 30 • 1.26k • 7

Shz/aime_tmp

Viewer • Updated 3 days ago • 30 • 14

published a dataset 3 days ago

Shz/aime_tmp

Viewer • Updated 3 days ago • 30 • 14

published a model 5 days ago

Shz/DeepSeek-R1-Distill-Qwen-1.5B-GRPO

Updated 5 days ago

liked a dataset 20 days ago

opencompass/AIME2025

Viewer • Updated 3 days ago • 30 • 1.26k • 7

published a dataset 20 days ago

opencompass/AIME2025

Viewer • Updated 3 days ago • 30 • 1.26k • 7

liked a dataset about 2 months ago

opencompass/LiveMathBench

Viewer • Updated 1 day ago • 283 • 524 • 4

upvoted a paper 2 months ago

Are Your LLMs Capable of Stable Reasoning?

Paper • 2412.13147 • Published Dec 17, 2024 • 92

updated a dataset 4 months ago

opencompass/mmmlu_lite

Viewer • Updated Nov 1, 2024 • 20k • 213 • 2

liked a dataset 4 months ago

opencompass/mmmlu_lite

Viewer • Updated Nov 1, 2024 • 20k • 213 • 2

upvoted a paper 4 months ago

CompassJudger-1: All-in-one Judge Model Helps Model Evaluation and Evolution

Paper • 2410.16256 • Published Oct 21, 2024 • 60

liked a Space 4 months ago

101

Open VLM Video Leaderboard

🌎

VLMEvalKit Eval Results in video understanding benchmark

upvoted a paper 5 months ago

HelloBench: Evaluating Long Text Generation Capabilities of Large Language Models

Paper • 2409.16191 • Published Sep 24, 2024 • 42

liked a dataset 7 months ago

MU-NLPC/Calc-gsm8k

Viewer • Updated Oct 30, 2023 • 17.6k • 464 • 5

upvoted a paper 8 months ago

NeedleBench: Can LLMs Do Retrieval and Reasoning in 1 Million Context Window?

Paper • 2407.11963 • Published Jul 16, 2024 • 44

liked a Space 8 months ago

4.34k

OpenGPT 4o

🔥

GPT 4o like bot.

upvoted 2 papers 8 months ago

MMBench-Video: A Long-Form Multi-Shot Benchmark for Holistic Video Understanding

Paper • 2406.14515 • Published Jun 20, 2024 • 33

Prism: A Framework for Decoupling and Assessing the Capabilities of VLMs

Paper • 2406.14544 • Published Jun 20, 2024 • 35

liked a model about 2 years ago

valhalla/bart-large-finetuned-squadv1

Question Answering • Updated Jun 14, 2021 • 801 • 7