Jifeng Dai

daijifeng

https://jifengdai.org/

AI & ML interests

None yet

Organizations

None yet

daijifeng's activity

authored a paper 9 days ago

SynerGen-VL: Towards Synergistic Image Understanding and Generation with Vision Experts and Token Folding

Paper • 2412.09604 • Published 13 days ago • 35

authored a paper 16 days ago

Expanding Performance Boundaries of Open-Source Multimodal Models with Model, Data, and Test-Time Scaling

Paper • 2412.05271 • Published 19 days ago • 121

authored a paper about 1 month ago

Enhancing the Reasoning Ability of Multimodal Large Language Models via Mixed Preference Optimization

Paper • 2411.10442 • Published Nov 15 • 68

authored a paper 2 months ago

PUMA: Empowering Unified MLLM with Multi-granular Visual Generation

Paper • 2410.13861 • Published Oct 17 • 52

authored a paper 5 months ago

MMIU: Multimodal Multi-image Understanding for Evaluating Large Vision-Language Models

Paper • 2408.02718 • Published Aug 5 • 60

authored 2 papers 6 months ago

Flash-VStream: Memory-Based Real-Time Understanding for Long Video Streams

Paper • 2406.08085 • Published Jun 12 • 13

InternLM-XComposer-2.5: A Versatile Large Vision Language Model Supporting Long-Contextual Input and Output

Paper • 2407.03320 • Published Jul 3 • 93

authored a paper 8 months ago

Bounding Box Stability against Feature Dropout Reflects Detector Generalization across Environments

Paper • 2403.13803 • Published Mar 20

liked a model 8 months ago

OpenGVLab/InternVL-Chat-V1-5

Image-Text-to-Text • Updated 7 days ago • 2.47k • 405

authored a paper 11 months ago

Motion-I2V: Consistent and Controllable Image-to-Video Generation with Explicit Motion Modeling

Paper • 2401.15977 • Published Jan 29 • 37

authored a paper about 1 year ago

ControlLLM: Augment Language Models with Tools by Searching on Graphs

Paper • 2310.17796 • Published Oct 26, 2023 • 17

authored 4 papers over 1 year ago

The All-Seeing Project: Towards Panoptic Visual Recognition and Understanding of the Open World

Paper • 2308.01907 • Published Aug 3, 2023 • 11

Ghost in the Minecraft: Generally Capable Agents for Open-World Environments via Large Language Models with Text-based Knowledge and Memory

Paper • 2305.17144 • Published May 25, 2023 • 2

InternChat: Solving Vision-Centric Tasks by Interacting with Chatbots Beyond Language

Paper • 2305.05662 • Published May 9, 2023 • 4

VisionLLM: Large Language Model is also an Open-Ended Decoder for Vision-Centric Tasks

Paper • 2305.11175 • Published May 18, 2023 • 3