---
title: README
emoji: ⚡
colorFrom: blue
colorTo: red
sdk: static
pinned: false
---
- **[2024-11]** 🔔🔔 We are excited to introduce **LMMs-Eval/v0.3.0**, focusing on audio understanding. Building upon LMMs-Eval/v0.2.0, we have added audio models and tasks. Now, LMMs-Eval provides a consistent evaluation toolkit across image, video, and audio modalities (see the usage sketch after this list).
[GitHub](https://github.com/EvolvingLMMs-Lab/lmms-eval) | [Documentation](https://github.com/EvolvingLMMs-Lab/lmms-eval/blob/main/docs/lmms-eval-0.3.md)
- **[2024-11]** 🤯🤯 We introduce **Multimodal SAE**, the first framework designed to interpret learned features in large-scale multimodal models using Sparse Autoencoders. Through our approach, we leverage LLaVA-OneVision-72B to analyze and explain the SAE-derived features of LLaVA-NeXT-LLaMA3-8B. Furthermore, we demonstrate the ability to steer model behavior by clamping specific features to alleviate hallucinations and avoid safety-related issues.
[GitHub](https://github.com/EvolvingLMMs-Lab/multimodal-sae) | [Paper](https://arxiv.org/abs/2411.14982)
- **[2024-10]** 🔥🔥 We present **`LLaVA-Critic`**, the first open-source large multimodal model as a generalist evaluator for assessing LMM-generated responses across diverse multimodal tasks and scenarios.
[GitHub](https://github.com/LLaVA-VL/LLaVA-NeXT) | [Blog](https://llava-vl.github.io/blog/2024-10-03-llava-critic/)
- **[2024-10]** 🎬🎬 Introducing **`LLaVA-Video`**, a family of open large multimodal models designed specifically for advanced video understanding. We're open-sourcing **LLaVA-Video-178K**, a high-quality, synthetic dataset for video instruction tuning.
[GitHub](https://github.com/LLaVA-VL/LLaVA-NeXT) | [Blog](https://github.com/LLaVA-VL/LLaVA-NeXT)
- **[2024-08]** 🤞🤞 We present **`LLaVA-OneVision`**, a family of LMMs developed by consolidating insights into data, models, and visual representations.
[GitHub](https://github.com/LLaVA-VL/LLaVA-NeXT) | [Blog](https://llava-vl.github.io/blog/2024-08-05-llava-onevision/)
- **[2024-06]** 🧑‍🎨🧑‍🎨 We release **`LLaVA-NeXT-Interleave`**, an LMM extending capabilities to real-world settings: Multi-image, Multi-frame (videos), Multi-view (3D), and Multi-patch (single-image).
[GitHub](https://github.com/LLaVA-VL/LLaVA-NeXT) | [Blog](https://llava-vl.github.io/blog/2024-06-16-llava-next-interleave/)
- **[2024-06]** 🚀🚀 We release **`LongVA`**, a long-context multimodal model with state-of-the-art video understanding performance.
[GitHub](https://github.com/EvolvingLMMs-Lab/LongVA) | [Blog](https://lmms-lab.github.io/posts/longva/)
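
As a quick reference, below is a minimal sketch of how an `lmms-eval` run might be launched from Python by invoking its command-line interface. The model type, checkpoint, and task names are illustrative placeholders, not a prescribed recipe; please check the lmms-eval documentation linked above for the model types, tasks, and flags supported by your installed version.

```python
# Minimal sketch (not an official recipe): launching an lmms-eval evaluation
# by shelling out to its CLI. The model type, checkpoint, and task names are
# placeholders for illustration; consult the lmms-eval docs for supported values.
import subprocess

cmd = [
    "python", "-m", "lmms_eval",
    "--model", "llava_onevision",                                        # assumed model type
    "--model_args", "pretrained=lmms-lab/llava-onevision-qwen2-7b-ov",   # example checkpoint
    "--tasks", "mme,videomme",                                           # comma-separated task names
    "--batch_size", "1",
    "--log_samples",                                                     # dump per-sample outputs
    "--output_path", "./logs/",
]
subprocess.run(cmd, check=True)
```

For multi-GPU runs, the lmms-eval repository typically launches the same module through `accelerate launch` rather than plain `python -m`; see its README for the recommended invocation.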
<details>
<summary>Older Updates (2024-06 and earlier)</summary>

- **[2024-06]** 🎬🎬 The **`lmms-eval/v0.2`** toolkit now supports video evaluations for models like LLaVA-NeXT Video and Gemini 1.5 Pro.
[GitHub](https://github.com/EvolvingLMMs-Lab/lmms-eval) | [Blog](https://lmms-lab.github.io/posts/lmms-eval-0.2/)
- **[2024-05]** 🚀🚀 We release **`LLaVA-NeXT Video`**, a model performing at Google's Gemini level on video understanding tasks.
[GitHub](https://github.com/LLaVA-VL/LLaVA-NeXT) | [Blog](https://llava-vl.github.io/blog/2024-04-30-llava-next-video/)
- **[2024-05]** 🚀🚀 The **`LLaVA-NeXT`** model family reaches near GPT-4V performance on multimodal benchmarks, with models up to 110B parameters.
[GitHub](https://github.com/LLaVA-VL/LLaVA-NeXT) | [Blog](https://llava-vl.github.io/blog/2024-05-10-llava-next-stronger-llms/)
- **[2024-03]** We release **`lmms-eval`**, a toolkit for holistic evaluations with 50+ multimodal datasets and 10+ models.
[GitHub](https://github.com/EvolvingLMMs-Lab/lmms-eval) | [Blog](https://lmms-lab.github.io/posts/lmms-eval-0.1/)

</details>