---
title: README
emoji: 
colorFrom: blue
colorTo: red
sdk: static
pinned: false
---
  • [2024-08] 🤞🤞 We present LLaVA-OneVision, a family of open large multimodal models (LMMs) developed by consolidating our insights into data, models, and visual representations in the LLaVA-NeXT blog series.

    GitHub | Blog

  • [2024-06] 🧑‍🎨🧑‍🎨 We release LLaVA-NeXT-Interleave, an all-around LMM that extends model capabilities to new real-world settings: Multi-image, Multi-frame (videos), and Multi-view (3D), while maintaining performance in the Multi-patch (single-image) scenario.

    GitHub | Blog

  • [2024-06] 🚀🚀 We release LongVA, a long-context LMM with state-of-the-art performance on video understanding tasks.

    GitHub | Blog

  • [2024-06] 🎬🎬 lmms-eval/v0.2 has been upgraded to support video evaluations for models such as LLaVA-NeXT Video and Gemini 1.5 Pro across tasks including EgoSchema, PerceptionTest, VideoMME, and more.

    GitHub | Blog

  • [2024-05] 🚀🚀 We release LLaVA-NeXT Video, a video model that achieves state-of-the-art results and reaches Google's Gemini-level performance on diverse video understanding tasks.

    GitHub | Blog

  • [2024-05] 🚀🚀 We release LLaVA-NeXT, with state-of-the-art, near-GPT-4V performance on multiple multimodal benchmarks. The LLaVA model family now scales to 72B and 110B parameters.

    GitHub | Blog

  • [2024-03] We release lmms-eval, a toolkit for holistic evaluation across 50+ multimodal datasets and 10+ models (a minimal usage sketch follows this list).

    GitHub | Blog
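
For readers who want to try lmms-eval directly, here is a minimal sketch of launching one evaluation run from Python by shelling out to the lmms-eval CLI. The model identifier (`llava`), task name (`mme`), and output path are illustrative assumptions, and the flag names follow the lm-evaluation-harness-style interface that lmms-eval builds on; consult the lmms-eval GitHub README for the exact options supported by your installed version.

```python
# Minimal sketch: run one lmms-eval evaluation by invoking its CLI module.
# The model name, task, and output path below are illustrative assumptions;
# the flag names mirror the lm-evaluation-harness-style interface lmms-eval
# builds on -- check the lmms-eval README for the exact CLI of your version.
import subprocess

subprocess.run(
    [
        "python", "-m", "lmms_eval",
        "--model", "llava",          # assumed model identifier
        "--tasks", "mme",            # assumed benchmark/task name
        "--batch_size", "1",
        "--output_path", "./logs/",  # directory where results are written
    ],
    check=True,  # raise if the evaluation run fails
)
```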