---
license: mit
---
[![CODE](https://img.shields.io/badge/GitHub-Repository-blue)](https://github.com/mbzuai-oryx/LLaVA-pp)
# Phi-3-V: Extending the Visual Capabilities of LLaVA with Phi-3
## Repository Overview
This repository provides LoRA weights for LLaVA v1.5 trained with the Phi-3-mini-3.8B LLM. The integration pairs LLaVA's visual instruction tuning with Phi-3's compact, capable language model for vision-language understanding.
## Key Components
- **Base Large Language Model (LLM):** [Phi-3-mini-4k-instruct](https://huggingface.co/microsoft/Phi-3-mini-4k-instruct)
- **Base Large Multimodal Model (LMM):** [LLaVA-v1.5](https://github.com/haotian-liu/LLaVA)
## Training Data
- **Pretraining Dataset:** [LCS-558K](https://huggingface.co/datasets/liuhaotian/LLaVA-Pretrain)
- **Fine-tuning Dataset:** [LLaVA-Instruct-665K](https://huggingface.co/datasets/liuhaotian/LLaVA-Instruct-150K/blob/main/llava_v1_5_mix665k.json)
## Download
```bash
git lfs install
git clone https://huggingface.co/MBZUAI/LLaVA-Phi-3-mini-4k-instruct-lora
```
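## Usage
Since this checkpoint contains LoRA weights, they must be merged onto the base Phi-3 LLM before inference. The snippet below is a minimal sketch using the upstream LLaVA loader, assuming the LLaVA-pp codebase (which adds Phi-3 support) is installed; module paths follow the upstream LLaVA layout and may differ in your installation.
```python
# Minimal sketch: merging the LoRA checkpoint onto the Phi-3 base model.
# Assumes the LLaVA-pp codebase is installed; the module paths below follow
# the upstream LLaVA layout (https://github.com/haotian-liu/LLaVA).
from llava.model.builder import load_pretrained_model
from llava.mm_utils import get_model_name_from_path

model_path = "MBZUAI/LLaVA-Phi-3-mini-4k-instruct-lora"
model_base = "microsoft/Phi-3-mini-4k-instruct"  # base LLM the LoRA was trained on

# Because "lora" appears in the model name and a base model is supplied,
# the loader merges the adapter weights into the base model at load time.
tokenizer, model, image_processor, context_len = load_pretrained_model(
    model_path=model_path,
    model_base=model_base,
    model_name=get_model_name_from_path(model_path),
)
```
The returned `image_processor` prepares input images for the vision tower, and `context_len` reflects the base model's context window (4k tokens here).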
---
## License
This project is available under the MIT License.
## Contributions
Contributions are welcome! Please 🌟 our repository [LLaVA++](https://github.com/mbzuai-oryx/LLaVA-pp) if you find this model useful.
---