Daily Papers

by AK and the research community

Apr 11

Submitted by

teowu

Kimi-VL Technical Report

·
92 authors

2

Submitted by

aditi184

DeepSeek-R1 Thoughtology: Let's <think> about LLM Reasoning

·
17 authors

4

Submitted by

zhoutianyi

C3PO: Critical-Layer, Core-Expert, Collaborative Pathway Optimization for Test-Time Expert Re-Mixing

·
3 authors

2

Submitted by

Lin-Chen

VCR-Bench: A Comprehensive Evaluation Framework for Video Chain-of-Thought Reasoning

·
10 authors

2

Submitted by

lzyhha

VisualCloze: A Universal Image Generation Framework via Visual In-Context Learning

·
8 authors

2

Submitted by

ChrisDing1105

MM-IFEngine: Towards Multimodal Instruction Following

·
10 authors

2

Submitted by

yhyang-myron

HoloPart: Generative 3D Part Amodal Segmentation

·
8 authors

2

Submitted by

akhaliq

Scaling Laws for Native Multimodal Models

·
6 authors

Submitted by

salmannyu

MOSAIC: Modeling Social AI for Content Dissemination and Regulation in Multi-Agent Simulations

·
5 authors

2

Submitted by

russwang

SoTA with Less: MCTS-Guided Sample Selection for Data-Efficient Visual Reasoning Self-Improvement

·
9 authors

2

Submitted by

Franck-Dernoncourt

Towards Visual Text Grounding of Multimodal Large Language Model

·
9 authors

2

Submitted by

jzr99

Geo4D: Leveraging Video Generators for Geometric 4D Scene Reconstruction

·
5 authors

2

Submitted by

RishubhPar

MonoPlace3D: Learning 3D-Aware Object Placement for 3D Monocular Detection

·
5 authors

2

Submitted by

RishubhPar

Compass Control: Multi Object Orientation Control for Text-to-Image Generation

·
4 authors

2

Submitted by

nielsr

TAPNext: Tracking Any Point (TAP) as Next Token Prediction

·
10 authors