Towards the Aha Moment of Vision-Language Models

Multi-modal Multilingual Instruction
university
Collections: 1
Models: 9

MMInstruction/Qwen2-VL-72B-Video-T3 • 5
MMInstruction/Giraffe • 17 • 2
MMInstruction/LongVA-7B-Video-T3 • 14
MMInstruction/Qwen-VL-ArXivCap • Text Generation • 24 • 4
MMInstruction/Qwen-VL-ArXivQA • Text Generation • 32 • 4
MMInstruction/Silkie • Text Generation • 28 • 12
MMInstruction/YingVLM • 26 • 1
MMInstruction/YingVLM-zh • 10
MMInstruction/YingVLM-Video • 10

Datasets: 15

MMInstruction/Video-T3-QA • 162k • 213 • 1
MMInstruction/SuperClevr_Val • 5k • 187
MMInstruction/Clevr_CoGenT_TrainA_R1 • 37.8k • 2.79k • 38
MMInstruction/Clevr_CoGenT_TrainA_70K_Complex • 70k • 2.29k • 3
MMInstruction/Clevr_CoGenT_ValB • 5k • 81 • 1
MMInstruction/Clevr_CoGenT_ValA • 5k • 87
MMInstruction/Clevr_CoAgent_TrainA_R1 • 2.5k • 55
MMInstruction/VL-RewardBench • 1.25k • 1.12k • 6
MMInstruction/RedTeamingVLM • 3.4k • 14
MMInstruction/VLFeedback • 80.3k • 655 • 46