Multimodal - a markredito Collection

Multimodal

updated Sep 7

Compositional Foundation Models for Hierarchical Planning
Paper • 2309.08587 • Published Sep 15, 2023 • 9

DreamLLM: Synergistic Multimodal Comprehension and Creation
Paper • 2309.11499 • Published Sep 20, 2023 • 58

VideoDirectorGPT: Consistent Multi-scene Video Generation via LLM-Guided Planning
Paper • 2309.15091 • Published Sep 26, 2023 • 32

Context-Aware Meta-Learning
Paper • 2310.10971 • Published Oct 17, 2023 • 16

Set-of-Mark Prompting Unleashes Extraordinary Visual Grounding in GPT-4V
Paper • 2310.11441 • Published Oct 17, 2023 • 26

MiniGPT-v2: large language model as a unified interface for vision-language multi-task learning
Paper • 2310.09478 • Published Oct 14, 2023 • 19

VisionLLaMA: A Unified LLaMA Interface for Vision Tasks
Paper • 2403.00522 • Published Mar 1 • 44

Building and better understanding vision-language models: insights and future directions
Paper • 2408.12637 • Published Aug 22 • 124