MBZUAI
/

VideoGPT-plus_LLaMA3-8B-8k

	---
	{}
	---

	[![CODE](https://img.shields.io/badge/GitHub-Repository-<COLOR>)](https://github.com/mbzuai-oryx/LLaVA-pp)

	# LLaMA-3-V: Extending the Visual Capabilities of LLaVA with Meta-Llama-3-8B-Instruct

	## Repository Overview

	This repository hosts the vision-to-language projector obtained by pretraining LLaVA v1.5 with Meta-Llama-3-8B-Instruct as the LLM. The integration pairs LLaVA's visual pipeline with Llama 3's language modeling to support vision-language understanding.

	## Training Strategy
	- Only the vision-to-language projector is trained; the rest of the model is frozen.
	- Note: This repository contains only the projector weights.
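
The projector-only strategy above comes down to choosing which parameters receive gradients. The following is a minimal, framework-free sketch of that selection. It assumes the projector's parameters can be recognized by the substring `mm_projector` in their names; the parameter names listed are hypothetical and a real checkpoint will differ. In PyTorch, the resulting flags would typically be applied by setting `requires_grad` on each parameter.

```python
# Hypothetical parameter names, for illustration only.
PARAM_NAMES = [
    "model.vision_tower.encoder.layers.0.weight",
    "model.mm_projector.0.weight",
    "model.mm_projector.0.bias",
    "model.mm_projector.2.weight",
    "model.mm_projector.2.bias",
    "model.layers.0.self_attn.q_proj.weight",
    "lm_head.weight",
]


def trainable_plan(names, projector_key="mm_projector"):
    """Map each parameter name to True (train) or False (freeze).

    Assumption: projector parameters contain `projector_key` in their name.
    """
    return {name: projector_key in name for name in names}


plan = trainable_plan(PARAM_NAMES)

# Print the plan; with PyTorch one would instead set
# param.requires_grad = plan[name] for each named parameter.
for name, trainable in plan.items():
    print(("train " if trainable else "freeze"), name)
```

Selecting by name keeps the rule in one place, so the vision encoder and the LLM stay frozen without listing their layers individually.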

	## Key Components

	- Base Large Language Model (LLM): [Meta-Llama-3-8B-Instruct](https://huggingface.co/meta-llama/Meta-Llama-3-8B-Instruct)
	- Base Large Multimodal Model (LMM): [LLaVA-v1.5](https://github.com/haotian-liu/LLaVA)

	## Training Data

	- Pretraining Dataset: [LCS-558K](https://huggingface.co/datasets/liuhaotian/LLaVA-Pretrain)

	## Download

	```
	git lfs install
	git clone https://huggingface.co/MBZUAI/LLaVA-Meta-Llama-3-8B-Instruct-pretrain
	```

	---


	## Contributions

	Contributions are welcome! Please 🌟 our repository [LLaVA++](https://github.com/mbzuai-oryx/LLaVA-pp) if you find this model useful.

	---