---
license: mit
---

[![CODE](https://img.shields.io/badge/GitHub-Repository-blue)](https://github.com/mbzuai-oryx/LLaVA-pp)

# Phi-3-V: Extending the Visual Capabilities of LLaVA with Phi-3

## Repository Overview

This repository features LLaVA v1.5 trained with the Phi-3-mini-3.8B LLM. The integration combines the strengths of both models to offer advanced vision-language understanding.

## Training Strategy

- **Pretraining:** Only the vision-to-language projector is trained; the rest of the model is frozen.
- **Fine-tuning:** The LLM is LoRA fine-tuned; only the vision backbone (CLIP) is kept frozen.
- **Note:** This repository contains the projector and LoRA weights. A minimal sketch of the two-stage setup is shown below.
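
The sketch below illustrates the freezing scheme described above; it is not the repository's actual training script. The module names (`mm_projector`, `vision_tower`) are assumed from the upstream LLaVA codebase, and the LoRA hyperparameters are illustrative placeholders.

```python
import torch
from peft import LoraConfig, get_peft_model  # pip install peft

def configure_pretraining(model: torch.nn.Module) -> torch.nn.Module:
    """Stage 1: freeze everything except the vision-to-language projector."""
    for param in model.parameters():
        param.requires_grad = False
    # Module name assumed from upstream LLaVA; adapt to the actual model class.
    for param in model.mm_projector.parameters():
        param.requires_grad = True
    return model

def configure_finetuning(model: torch.nn.Module):
    """Stage 2: LoRA-tune the LLM while keeping the CLIP vision tower frozen."""
    for param in model.vision_tower.parameters():  # vision backbone, kept frozen
        param.requires_grad = False
    lora_cfg = LoraConfig(
        r=128,                # illustrative rank, not the repository's setting
        lora_alpha=256,
        lora_dropout=0.05,
        target_modules=["q_proj", "k_proj", "v_proj", "o_proj"],
        task_type="CAUSAL_LM",
    )
    return get_peft_model(model, lora_cfg)
```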

## Key Components

- **Base Large Language Model (LLM):** [Phi-3-mini-4k-instruct](https://huggingface.co/microsoft/Phi-3-mini-4k-instruct)
- **Base Large Multimodal Model (LMM):** [LLaVA-v1.5](https://github.com/haotian-liu/LLaVA)

## Training Data

- **Pretraining Dataset:** [LCS-558K](https://huggingface.co/datasets/liuhaotian/LLaVA-Pretrain)
- **Fine-tuning Dataset:** [LLaVA-Instruct-665K](https://huggingface.co/datasets/liuhaotian/LLaVA-Instruct-150K/blob/main/llava_v1_5_mix665k.json)

## Download

```bash
git lfs install
git clone https://huggingface.co/MBZUAI/LLaVA-Phi-3-mini-4k-instruct-lora
```
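
After downloading, the checkpoint can be loaded through the LLaVA codebase. The snippet below is a usage sketch under two assumptions: that LLaVA++ exposes the same `load_pretrained_model` helper as upstream LLaVA, and that (as upstream) a `model_name` containing "lora" triggers merging the LoRA weights into the `model_base` LLM.

```python
from llava.model.builder import load_pretrained_model  # assumed from upstream LLaVA

tokenizer, model, image_processor, context_len = load_pretrained_model(
    model_path="MBZUAI/LLaVA-Phi-3-mini-4k-instruct-lora",
    model_base="microsoft/Phi-3-mini-4k-instruct",   # base LLM listed under Key Components
    model_name="llava-phi-3-mini-4k-instruct-lora",  # "lora" in the name selects the LoRA path
)
```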

---

## License

This project is available under the MIT License.

## Contributions

Contributions are welcome! Please ⭐ our repository [LLaVA++](https://github.com/mbzuai-oryx/LLaVA-pp) if you find this model useful.

---
|