VLMS - a Deping Collection

Updated Sep 22

PsiPi/liuhaotian_llava-v1.5-13b-GGUF
Image-Text-to-Text • Updated Mar 11 • 992 • 36

TRI-ML/prismatic-vlms
Image-to-Text • Updated May 6 • 15

bczhou/tiny-llava-v1-hf
Image-Text-to-Text • Updated Aug 17 • 1.91k • 55

ViGoR: Improving Visual Grounding of Large Vision Language Models with Fine-Grained Reward Modeling
Paper • 2402.06118 • Published Feb 9 • 13

LEGO: Language Enhanced Multi-modal Grounding Model
Paper • 2401.06071 • Published Jan 11 • 10

Mini-Gemini: Mining the Potential of Multi-modality Vision Language Models
Paper • 2403.18814 • Published Mar 27 • 45

Visual CoT: Unleashing Chain-of-Thought Reasoning in Multi-Modal Language Models
Paper • 2403.16999 • Published Mar 25 • 4

Salesforce/instructblip-vicuna-7b
Image-Text-to-Text • Updated Nov 21 • 249k • 85

Pegasus-v1 Technical Report
Paper • 2404.14687 • Published Apr 23 • 30

List Items One by One: A New Data Source and Learning Paradigm for Multimodal LLMs
Paper • 2404.16375 • Published Apr 25 • 16

Needle In A Multimodal Haystack
Paper • 2406.07230 • Published Jun 11 • 53