DeepSeek-R1: Incentivizing Reasoning Capability in LLMs via Reinforcement Learning Paper • 2501.12948 • Published Jan 22 • 346
SynthLight: Portrait Relighting with Diffusion Model by Learning to Re-render Synthetic Faces Paper • 2501.09756 • Published Jan 16 • 19
MotionBench: Benchmarking and Improving Fine-grained Video Motion Understanding for Vision Language Models Paper • 2501.02955 • Published Jan 6 • 40
VITA-1.5: Towards GPT-4o Level Real-Time Vision and Speech Interaction Paper • 2501.01957 • Published Jan 3 • 42
SDPO: Segment-Level Direct Preference Optimization for Social Agents Paper • 2501.01821 • Published Jan 3 • 18
CORAL: Benchmarking Multi-turn Conversational Retrieval-Augmentation Generation Paper • 2410.23090 • Published Oct 30, 2024 • 54
F5-TTS: A Fairytaler that Fakes Fluent and Faithful Speech with Flow Matching Paper • 2410.06885 • Published Oct 9, 2024 • 44