
Jingkang

Update README.md

621fdf3 verified 16 days ago


	---
	title: README
	emoji: ⚡
	colorFrom: blue
	colorTo: red
	sdk: static
	pinned: true
	---

	- [2025-03] 👓👓 Introducing `EgoLife`: Towards Egocentric Life Assistant. For one week, six individuals lived together, capturing every moment through AI glasses and creating the EgoLife dataset. Based on this dataset, we build models and benchmarks to drive the future of AI life assistants that are capable of recalling past events, tracking habits, and providing personalized, long-context assistance to enhance daily life.

	[Homepage](https://egolife-ai.github.io) \| [GitHub](https://github.com/EvolvingLMMs-Lab/EgoLife) \| [Blog](https://egolife-ai.github.io/blog) \| [Paper](https://huggingface.co/papers/2503.03803) \| [Demo](https://egolife.lmms-lab.com/)

	- [2025-01] 🎬🎬 Introducing `VideoMMMU`: Evaluating Knowledge Acquisition from Professional Videos. Spanning 6 professional disciplines (Art, Business, Science, Medicine, Humanities, Engineering) and 30 diverse subjects, Video-MMMU challenges models to learn and apply college-level knowledge from videos.

	[Homepage](https://videommmu.github.io) \| [GitHub](https://github.com/videommmu/VideoMMMU) \| [Paper](https://arxiv.org/abs/2501.13826)

	- [2024-11] 🔔🔔 We are excited to introduce LMMs-Eval/v0.3.0, focusing on audio understanding. Building upon LMMs-Eval/v0.2.0, we have added audio models and tasks. Now, LMMs-Eval provides a consistent evaluation toolkit across image, video, and audio modalities.

	[GitHub](https://github.com/EvolvingLMMs-Lab/lmms-eval) \| [Documentation](https://github.com/EvolvingLMMs-Lab/lmms-eval/blob/main/docs/lmms-eval-0.3.md)

	- [2024-11] 🤯🤯 We introduce Multimodal SAE, the first framework designed to interpret learned features in large-scale multimodal models using Sparse Autoencoders. We use LLaVA-OneVision-72B to analyze and explain the SAE-derived features of LLaVA-NeXT-LLaMA3-8B. We also demonstrate that clamping specific features can steer model behavior to alleviate hallucinations and avoid safety-related issues.

	[GitHub](https://github.com/EvolvingLMMs-Lab/multimodal-sae) \| [Paper](https://arxiv.org/abs/2411.14982)

	- [2024-10] 🔥🔥 We present `LLaVA-Critic`, the first open-source large multimodal model designed as a generalist evaluator for assessing LMM-generated responses across diverse multimodal tasks and scenarios.

	[GitHub](https://github.com/LLaVA-VL/LLaVA-NeXT) \| [Blog](https://llava-vl.github.io/blog/2024-10-03-llava-critic/)

	- [2024-10] 🎬🎬 Introducing `LLaVA-Video`, a family of open large multimodal models designed specifically for advanced video understanding. We're open-sourcing LLaVA-Video-178K, a high-quality, synthetic dataset for video instruction tuning.

	[GitHub](https://github.com/LLaVA-VL/LLaVA-NeXT) \| [Blog](https://github.com/LLaVA-VL/LLaVA-NeXT)

	- [2024-08] 🤞🤞 We present `LLaVA-OneVision`, a family of LMMs developed by consolidating insights into data, models, and visual representations.

	[GitHub](https://github.com/LLaVA-VL/LLaVA-NeXT) \| [Blog](https://llava-vl.github.io/blog/2024-08-05-llava-onevision/)

	- [2024-06] 🧑‍🎨🧑‍🎨 We release `LLaVA-NeXT-Interleave`, an LMM extending capabilities to real-world settings: Multi-image, Multi-frame (videos), Multi-view (3D), and Multi-patch (single-image).

	[GitHub](https://github.com/LLaVA-VL/LLaVA-NeXT) \| [Blog](https://llava-vl.github.io/blog/2024-06-16-llava-next-interleave/)

	- [2024-06] 🚀🚀 We release `LongVA`, a long-context model with state-of-the-art video understanding performance.

	[GitHub](https://github.com/EvolvingLMMs-Lab/LongVA) \| [Blog](https://lmms-lab.github.io/posts/longva/)

	<details>
	<summary>Older Updates (2024-06 and earlier)</summary>

	- [2024-06] 🎬🎬 The `lmms-eval/v0.2` toolkit now supports video evaluations for models like LLaVA-NeXT Video and Gemini 1.5 Pro.

	[GitHub](https://github.com/EvolvingLMMs-Lab/lmms-eval) \| [Blog](https://lmms-lab.github.io/posts/lmms-eval-0.2/)

	- [2024-05] 🚀🚀 We release `LLaVA-NeXT Video`, a model performing at Google's Gemini level on video understanding tasks.

	[GitHub](https://github.com/LLaVA-VL/LLaVA-NeXT) \| [Blog](https://llava-vl.github.io/blog/2024-04-30-llava-next-video/)

	- [2024-05] 🚀🚀 The `LLaVA-NeXT` model family reaches near-GPT-4V performance on multimodal benchmarks, with models of up to 110B parameters.

	[GitHub](https://github.com/LLaVA-VL/LLaVA-NeXT) \| [Blog](https://llava-vl.github.io/blog/2024-05-10-llava-next-stronger-llms/)

	- [2024-03] We release `lmms-eval`, a toolkit for holistic evaluations with 50+ multimodal datasets and 10+ models.

	[GitHub](https://github.com/EvolvingLMMs-Lab/lmms-eval) \| [Blog](https://lmms-lab.github.io/posts/lmms-eval-0.1/)
	</details>
