matlok's Collections
Papers - Video
Video as the New Language for Real-World Decision Making
Paper
•
2402.17139
•
Published
•
18
VideoCrafter1: Open Diffusion Models for High-Quality Video Generation
Paper
•
2310.19512
•
Published
•
15
VideoMamba: State Space Model for Efficient Video Understanding
Paper
•
2403.06977
•
Published
•
27
VideoCrafter2: Overcoming Data Limitations for High-Quality Video Diffusion Models
Paper
•
2401.09047
•
Published
•
13
V3D: Video Diffusion Models are Effective 3D Generators
Paper
•
2403.06738
•
Published
•
28
DragAnything: Motion Control for Anything using Entity Representation
Paper
•
2403.07420
•
Published
•
13
BLIP: Bootstrapping Language-Image Pre-training for Unified Vision-Language Understanding and Generation
Paper
•
2201.12086
•
Published
•
3
Video Editing via Factorized Diffusion Distillation
Paper
•
2403.09334
•
Published
•
21
VisionGPT-3D: A Generalized Multimodal Agent for Enhanced 3D Vision Understanding
Paper
•
2403.09530
•
Published
•
8
3D-VLA: A 3D Vision-Language-Action Generative World Model
Paper
•
2403.09631
•
Published
•
7
Generic 3D Diffusion Adapter Using Controlled Multi-View Editing
Paper
•
2403.12032
•
Published
•
14
Vid2Robot: End-to-end Video-conditioned Policy Learning with Cross-Attention Transformers
Paper
•
2403.12943
•
Published
•
14
FRESCO: Spatial-Temporal Correspondence for Zero-Shot Video Translation
Paper
•
2403.12962
•
Published
•
7
Efficient Video Diffusion Models via Content-Frame Motion-Latent Decomposition
Paper
•
2403.14148
•
Published
•
18
VidToMe: Video Token Merging for Zero-Shot Video Editing
Paper
•
2312.10656
•
Published
•
10
StreamingT2V: Consistent, Dynamic, and Extendable Long Video Generation from Text
Paper
•
2403.14773
•
Published
•
10
TC4D: Trajectory-Conditioned Text-to-4D Generation
Paper
•
2403.17920
•
Published
•
16
Improving Automatic VQA Evaluation Using Large Language Models
Paper
•
2310.02567
•
Published
•
3
Octree-GS: Towards Consistent Real-time Rendering with LOD-Structured 3D Gaussians
Paper
•
2403.17898
•
Published
•
14
Lumiere: A Space-Time Diffusion Model for Video Generation
Paper
•
2401.12945
•
Published
•
86
Garment3DGen: 3D Garment Stylization and Texture Generation
Paper
•
2403.18816
•
Published
•
21
Zero-shot Prompt-based Video Encoder for Surgical Gesture Recognition
Paper
•
2403.19786
•
Published
•
2
CameraCtrl: Enabling Camera Control for Text-to-Video Generation
Paper
•
2404.02101
•
Published
•
22
Visual Autoregressive Modeling: Scalable Image Generation via Next-Scale Prediction
Paper
•
2404.02905
•
Published
•
65
MiniGPT4-Video: Advancing Multimodal LLMs for Video Understanding with Interleaved Visual-Textual Tokens
Paper
•
2404.03413
•
Published
•
25
Ctrl-Adapter: An Efficient and Versatile Framework for Adapting Diverse Controls to Any Diffusion Model
Paper
•
2404.09967
•
Published
•
20
Dynamic Typography: Bringing Words to Life
Paper
•
2404.11614
•
Published
•
44
Pegasus-v1 Technical Report
Paper
•
2404.14687
•
Published
•
30
Align Your Steps: Optimizing Sampling Schedules in Diffusion Models
Paper
•
2404.14507
•
Published
•
21
PLLaVA : Parameter-free LLaVA Extension from Images to Videos for Video Dense Captioning
Paper
•
2404.16994
•
Published
•
35
Capabilities of Gemini Models in Medicine
Paper
•
2404.18416
•
Published
•
23
StoryDiffusion: Consistent Self-Attention for Long-Range Image and Video Generation
Paper
•
2405.01434
•
Published
•
53
Lighting Every Darkness with 3DGS: Fast Training and Real-Time Rendering for HDR View Synthesis
Paper
•
2406.06216
•
Published
•
19
EvTexture: Event-driven Texture Enhancement for Video Super-Resolution
Paper
•
2406.13457
•
Published
•
16
What Matters in Detecting AI-Generated Videos like Sora?
Paper
•
2406.19568
•
Published
•
13
Movie Gen: A Cast of Media Foundation Models
Paper
•
2410.13720
•
Published
•
91
Adaptive Caching for Faster Video Generation with Diffusion Transformers
Paper
•
2411.02397
•
Published
•
23
CAT4D: Create Anything in 4D with Multi-View Video Diffusion Models
Paper
•
2411.18613
•
Published
•
50
Apollo: An Exploration of Video Understanding in Large Multimodal Models
Paper
•
2412.10360
•
Published
•
136