Bringing MLLMs into Embodied World
VideoRefer x VideoLLaMA3
Frontier Foundation Models for Video Understanding
VideoLLaMA2-AV
Media understanding