OpenGVLab

https://github.com/opengvlab

AI & ML interests

Computer Vision

Recent Activity

huiserwang updated a dataset 5 days ago

OpenGVLab/MMBench-GUI

wjn922 authored a paper 12 days ago

VisionLLM v2: An End-to-End Generalist Multimodal Large Language Model for Hundreds of Vision-Language Tasks

wjn922 authored a paper 12 days ago

Language as Queries for Referring Video Object Segmentation

Organization Card

Welcome to OpenGVLab! We are a research group at Shanghai AI Lab focused on vision-centric AI research. The "GV" in OpenGVLab stands for general vision: a general understanding of vision that lets models adapt to new vision-based tasks with little effort.

Models

InternVL: a pioneering open-source alternative to GPT-4V.
InternImage: a large-scale vision foundation model with deformable convolutions.
InternVideo: large-scale video foundation models for multimodal understanding.
VideoChat: an end-to-end chat assistant for video comprehension.
All-Seeing-Project: towards panoptic visual recognition and understanding of the open world.

Datasets

ShareGPT4o: a groundbreaking large-scale resource that we plan to open-source, comprising 200K meticulously annotated images, 10K videos with highly descriptive captions, and 10K audio files with detailed descriptions.
InternVid: a large-scale video-text dataset for multimodal understanding and generation.
MMPR: a high-quality, large-scale multimodal preference dataset.

Benchmarks

MVBench: a comprehensive benchmark for multimodal video understanding.
CRPE: a benchmark covering all elements of the relation triplets (subject, predicate, object), providing a systematic platform for the evaluation of relation comprehension ability.
MM-NIAH: a comprehensive benchmark for long multimodal document comprehension.
GMAI-MMBench: a comprehensive multimodal evaluation benchmark towards general medical AI.

Collections 26

Spaces 12

InternVideo2.5

Hierarchical Compression for Long-Context Video Modeling

InternVL

Chat with an AI that understands text and images

MVBench Leaderboard

Submit model evaluation and view leaderboard

InternVideo2 Chat 8B HD

Upload a video to chat about its contents

ControlLLM

Display maintenance message for ControlLLM

Models 224

OpenGVLab/InternVideo2_5_Chat_8B

Video-Text-to-Text • 8B • Updated 16 days ago • 31.4k • 78

OpenGVLab/OpenCUA_Env

Updated 20 days ago • 2

OpenGVLab/InternVideo2-Stage2_6B-224p-f4

Updated 21 days ago • 6

OpenGVLab/Mono-InternVL-2B

Image-Text-to-Text • 3B • Updated 29 days ago • 9.98k • 36

OpenGVLab/Mono-InternVL-2B-S1-3

Image-Text-to-Text • 3B • Updated 29 days ago • 85 • 1

OpenGVLab/Mono-InternVL-2B-S1-2

Image-Text-to-Text • 3B • Updated 29 days ago • 12

OpenGVLab/Mono-InternVL-2B-S1-1

Image-Text-to-Text • 3B • Updated 29 days ago • 4

OpenGVLab/Docopilot-8B

Image-Text-to-Text • 8B • Updated Jul 20 • 31 • 3

OpenGVLab/Docopilot-2B

Image-Text-to-Text • 2B • Updated Jul 20 • 67 • 8

OpenGVLab/ZeroGUI-OSWorld-7B

Image-Text-to-Text • 8B • Updated Jun 20 • 18 • 4

Datasets 45

OpenGVLab/MMBench-GUI

Preview • Updated 5 days ago • 229 • 35

OpenGVLab/VRBench

Preview • Updated 16 days ago • 61 • 3

OpenGVLab/GUI-Odyssey

Viewer • Updated 16 days ago • 7.74k • 9.5k • 25

OpenGVLab/LORIS

Updated 23 days ago • 160 • 3

OpenGVLab/OpenCUA_Env

Updated 26 days ago • 23

OpenGVLab/Doc-750K

Preview • Updated 29 days ago • 6.05k • 12

OpenGVLab/Mono-InternVL-2B-Synthetic-Data

Viewer • Updated 29 days ago • 3.05k • 83 • 2

OpenGVLab/VideoChat-Flash-Training-Data

Viewer • Updated Jun 24 • 87k • 9.44k • 12

OpenGVLab/VisualPRM400K-v1.1

Preview • Updated May 29 • 16.6k • 7

OpenGVLab/MMPR-v1.2-prompts

Updated May 29 • 8.88k • 1
