VideoGLaMM: A Large Multimodal Model for Pixel-Level Visual Grounding in Videos Paper • 2411.04923 • Published Nov 2024 • 20
PG-Video-LLaVA: Pixel Grounding Large Video-Language Models Paper • 2311.13435 • Published Nov 22, 2023 • 16
TokenFlow: Consistent Diffusion Features for Consistent Video Editing Paper • 2307.10373 • Published Jul 19, 2023 • 56