---
title: README
emoji: 
colorFrom: blue
colorTo: red
sdk: static
pinned: false
---

- **[2024-11]** 🤯🤯 We introduce **Multimodal SAE**, the first framework for interpreting the learned features of large-scale multimodal models with Sparse Autoencoders. We use LLaVA-OneVision-72B to analyze and explain the SAE-derived features of LLaVA-NeXT-LLaMA3-8B, and we show that clamping specific features can steer model behavior to alleviate hallucinations and avoid safety-related issues.
    
    [GitHub](https://github.com/EvolvingLMMs-Lab/multimodal-sae) | [Paper](https://arxiv.org/abs/2411.14982)

- **[2024-10]** 🔥🔥 We present **`LLaVA-Critic`**, the first open-source large multimodal model designed as a generalist evaluator, assessing LMM-generated responses across diverse multimodal tasks and scenarios.
    
    [GitHub](https://github.com/LLaVA-VL/LLaVA-NeXT) | [Blog](https://llava-vl.github.io/blog/2024-10-03-llava-critic/)
  
- **[2024-10]** 🎬🎬 Introducing **`LLaVA-Video`**, a family of open large multimodal models designed specifically for advanced video understanding. We're open-sourcing **LLaVA-Video-178K**, a high-quality, synthetic dataset for video instruction tuning.
    
    [GitHub](https://github.com/LLaVA-VL/LLaVA-NeXT) | [Blog](https://github.com/LLaVA-VL/LLaVA-NeXT)

- **[2024-08]** 🤞🤞 We present **`LLaVA-OneVision`**, a family of LMMs developed by consolidating insights into data, models, and visual representations (a minimal loading sketch follows this list).

    [GitHub](https://github.com/LLaVA-VL/LLaVA-NeXT) | [Blog](https://llava-vl.github.io/blog/2024-08-05-llava-onevision/)

- **[2024-06]** 🧑‍🎨🧑‍🎨 We release **`LLaVA-NeXT-Interleave`**, an LMM extending capabilities to real-world settings: Multi-image, Multi-frame (videos), Multi-view (3D), and Multi-patch (single-image).

    [GitHub](https://github.com/LLaVA-VL/LLaVA-NeXT) | [Blog](https://llava-vl.github.io/blog/2024-06-16-llava-next-interleave/)

- **[2024-06]** 🚀🚀 We release **`LongVA`**, a long-context large multimodal model with state-of-the-art video understanding performance.

    [GitHub](https://github.com/EvolvingLMMs-Lab/LongVA) | [Blog](https://lmms-lab.github.io/posts/longva/)
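
As noted in the LLaVA-OneVision entry above, the sketch below shows one way to run the model for single-image inference. It assumes the community `llava-hf/llava-onevision-qwen2-7b-ov-hf` conversion and a recent Hugging Face Transformers release that includes `LlavaOnevisionForConditionalGeneration`; the official LLaVA-NeXT repository ships its own inference code, so treat this only as a minimal illustration.

```python
# Minimal single-image inference sketch for LLaVA-OneVision via Hugging Face Transformers.
# The checkpoint name is an assumed community conversion; adjust to the model you actually use.
import torch
from PIL import Image
from transformers import AutoProcessor, LlavaOnevisionForConditionalGeneration

model_id = "llava-hf/llava-onevision-qwen2-7b-ov-hf"
processor = AutoProcessor.from_pretrained(model_id)
model = LlavaOnevisionForConditionalGeneration.from_pretrained(
    model_id, torch_dtype=torch.float16, device_map="auto"
)

image = Image.open("example.jpg")  # any local test image
conversation = [
    {"role": "user", "content": [
        {"type": "image"},
        {"type": "text", "text": "Describe this image in one sentence."},
    ]},
]
prompt = processor.apply_chat_template(conversation, add_generation_prompt=True)
inputs = processor(images=image, text=prompt, return_tensors="pt").to(model.device, torch.float16)

output = model.generate(**inputs, max_new_tokens=64)
print(processor.decode(output[0], skip_special_tokens=True))
```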

<details>
  <summary>Older Updates (2024-06 and earlier)</summary>

- **[2024-06]** 🎬🎬 The **`lmms-eval/v0.2`** toolkit now supports video evaluations for models like LLaVA-NeXT Video and Gemini 1.5 Pro.

    [GitHub](https://github.com/EvolvingLMMs-Lab/lmms-eval) | [Blog](https://lmms-lab.github.io/posts/lmms-eval-0.2/)

- **[2024-05]** 🚀🚀 We release **`LLaVA-NeXT Video`**, a model performing at Google's Gemini level on video understanding tasks.

    [GitHub](https://github.com/LLaVA-VL/LLaVA-NeXT) | [Blog](https://llava-vl.github.io/blog/2024-04-30-llava-next-video/)

- **[2024-05]** 🚀🚀 The **`LLaVA-NeXT`** model family reaches near GPT-4V performance on multimodal benchmarks, with models up to 110B parameters.

    [GitHub](https://github.com/LLaVA-VL/LLaVA-NeXT) | [Blog](https://llava-vl.github.io/blog/2024-05-10-llava-next-stronger-llms/)

- **[2024-03]** We release **`lmms-eval`**, a toolkit for the holistic evaluation of LMMs, covering 50+ multimodal datasets and 10+ models.

    [GitHub](https://github.com/EvolvingLMMs-Lab/lmms-eval) | [Blog](https://lmms-lab.github.io/posts/lmms-eval-0.1/)
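
The snippet below sketches a typical `lmms-eval` run, invoked from Python for convenience; it mirrors the toolkit's command-line interface. The model name, checkpoint, and task list are illustrative placeholders, so check the lmms-eval README for the options supported by your installed version.

```python
# Illustrative lmms-eval invocation; equivalent to running `python -m lmms_eval ...` in a shell.
# Model, checkpoint, and task names below are placeholders; see the upstream README for full lists.
import subprocess

cmd = [
    "python", "-m", "lmms_eval",
    "--model", "llava",
    "--model_args", "pretrained=liuhaotian/llava-v1.5-7b",
    "--tasks", "mme",
    "--batch_size", "1",
    "--log_samples",
    "--output_path", "./logs/",
]
subprocess.run(cmd, check=True)
```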
</details>