InternVideo2: Scaling Video Foundation Models for Multimodal Video Understanding Paper • 2403.15377 • Published Mar 22, 2024 • 22
InternVideo: General Video Foundation Models via Generative and Discriminative Learning Paper • 2212.03191 • Published Dec 6, 2022
Memory-and-Anticipation Transformer for Online Action Understanding Paper • 2308.07893 • Published Aug 15, 2023
InternVL: Scaling up Vision Foundation Models and Aligning for Generic Visual-Linguistic Tasks Paper • 2312.14238 • Published Dec 21, 2023 • 18
MVBench: A Comprehensive Multi-modal Video Understanding Benchmark Paper • 2311.17005 • Published Nov 28, 2023 • 2
Video Mamba Suite: State Space Model as a Versatile Alternative for Video Understanding Paper • 2403.09626 • Published Mar 14, 2024 • 13