VLM 👁️👁️ - a thehandsomefrog4825 Collection

thehandsomefrog4825 's Collections

Object detection 🔍

VLM 👁️👁️

Object segmentation 🧩

Reinforce learning 🔃

GAN

Robotic 🤖🔧

TTI ⌨️➡️🖼️

TTS ⌨️➡️🗣️

TTV 📝➡️📺

Generative 🎨

VLM 👁️👁️

updated 13 days ago

2.5 Years in Class: A Multimodal Textbook for Vision-Language Pretraining

Paper • 2501.00958 • Published Jan 1 • 99
OS-Genesis: Automating GUI Agent Trajectory Construction via Reverse Task Synthesis

Paper • 2412.19723 • Published Dec 27, 2024 • 82
VisionZip: Longer is Better but Not Necessary in Vision Language Models

Paper • 2412.04467 • Published Dec 5, 2024 • 107
PaliGemma 2: A Family of Versatile VLMs for Transfer

Paper • 2412.03555 • Published Dec 4, 2024 • 128
ShowUI: One Vision-Language-Action Model for GUI Visual Agent

Paper • 2411.17465 • Published Nov 26, 2024 • 80
Are VLMs Ready for Autonomous Driving? An Empirical Study from the Reliability, Data, and Metric Perspectives

Paper • 2501.04003 • Published Jan 7 • 25
VideoRefer Suite: Advancing Spatial-Temporal Object Understanding with Video LLM

Paper • 2501.00599 • Published Dec 31, 2024 • 41
LlamaV-o1: Rethinking Step-by-step Visual Reasoning in LLMs

Paper • 2501.06186 • Published Jan 10 • 61
DRIVINGVQA: Analyzing Visual Chain-of-Thought Reasoning of Vision Language Models in Real-World Scenarios with Driving Theory Tests

Paper • 2501.04671 • Published Jan 8
The Hidden Life of Tokens: Reducing Hallucination of Large Vision-Language Models via Visual Information Steering

Paper • 2502.03628 • Published 18 days ago • 11