---
license: apache-2.0
language:
- en
base_model:
- meta-llama/Meta-Llama-3.1-8B-Instruct
---

# 🦙 Llama3.1-8b-instruct-vision Model Card

## Model Details

This repository contains a reproduced version of the [LLaVA](https://github.com/haotian-liu/LLaVA) model, trained from the [Llama 3.1-8B-Instruct](https://huggingface.co/meta-llama/Meta-Llama-3.1-8B-Instruct) foundation model with the [PKU-Alignment/align-anything](https://github.com/PKU-Alignment/align-anything) library.

> **NOTE:** The reproduced version of LLaVA differs from the original [LLaVA](https://github.com/haotian-liu/LLaVA) model in a few implementation details:
>
> 1. The reproduced LLaVA uses a different conversation template than the original [LLaVA](https://github.com/haotian-liu/LLaVA) model (a sketch of the prompt format is shown below).
> 2. The initial model weights are loaded from the Llama 3.1-8B-Instruct model ([meta-llama/Meta-Llama-3.1-8B-Instruct](https://huggingface.co/meta-llama/Meta-Llama-3.1-8B-Instruct)) rather than from [lmsys/vicuna-7b-v1.5](https://huggingface.co/lmsys/vicuna-7b-v1.5).

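The conversation template follows the Llama 3.1 header-token style rather than the Vicuna-style template of the original LLaVA. The sketch below is a minimal illustration of that format, mirroring the prompt string used in the usage example at the end of this card; the `build_prompt` helper is a hypothetical name, not an API provided by align-anything.

```python
# Minimal sketch of the Llama 3.1-style conversation template used by this
# reproduction (as seen in the usage example below). `build_prompt` is a
# hypothetical helper introduced here for illustration only.
def build_prompt(user_message: str, include_image: bool = True) -> str:
    image_token = "<image> " if include_image else ""
    return (
        f"<|start_header_id|>user<|end_header_id|>: {image_token}{user_message}\n"
        f"<|start_header_id|>assistant<|end_header_id|>: "
    )

# Example: reproduces the prompt string shown in the usage section.
prompt = build_prompt("Give an overview of what's in the image.")
```
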
- **Developed by:** the [PKU-Alignment](https://github.com/PKU-Alignment) Team.
- **Model Type:** An auto-regressive language model based on the transformer architecture.
- **License:** Non-commercial license.
- **Fine-tuned from model:** [meta-llama/Meta-Llama-3.1-8B-Instruct](https://huggingface.co/meta-llama/Meta-Llama-3.1-8B-Instruct).

## Model Sources

- **Repository:** <https://github.com/PKU-Alignment/align-anything>
- **Dataset:**
  - <https://huggingface.co/datasets/liuhaotian/LLaVA-Instruct-150K>
  - <https://huggingface.co/datasets/OpenGVLab/ShareGPT-4o>
  - <https://huggingface.co/datasets/HuggingFaceM4/A-OKVQA>
  - <https://huggingface.co/datasets/Multimodal-Fatima/OK-VQA_train>
  - <https://huggingface.co/datasets/howard-hou/OCR-VQA>
  - <https://huggingface.co/datasets/HuggingFaceM4/VQAv2>

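The entries above are standard Hugging Face dataset repositories. As a small sketch (not part of the card's own instructions), one of them could be pulled and inspected with the `datasets` library; note that split names and fields differ between the listed datasets, so check each dataset card before use.

```python
# Hedged sketch: inspect one of the training datasets listed above with the
# Hugging Face `datasets` library. Splits and fields vary per dataset.
from datasets import load_dataset

aokvqa = load_dataset("HuggingFaceM4/A-OKVQA")
print(aokvqa)  # shows the available splits and their sizes
```
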
## How to use the model (reprod.)

- Using `transformers`

```python
from transformers import (
    LlavaForConditionalGeneration,
    AutoProcessor,
)
from PIL import Image

# Path to the downloaded model weights (placeholder).
path = "<path_to_model_dir>"
processor = AutoProcessor.from_pretrained(path)
model = LlavaForConditionalGeneration.from_pretrained(path)

# The prompt follows the Llama 3.1 conversation template used by this
# reproduction; the <image> token marks where the image features are inserted.
prompt = "<|start_header_id|>user<|end_header_id|>: <image> Give an overview of what's in the image.\n<|start_header_id|>assistant<|end_header_id|>: "
image_path = "align-anything/assets/test_image.webp"
image = Image.open(image_path)

inputs = processor(text=prompt, images=image, return_tensors="pt")
outputs = model.generate(**inputs, max_new_tokens=1024)
print(processor.decode(outputs[0], skip_special_tokens=True))
```
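
For faster inference, the same model can be loaded in half precision and run on a GPU. The snippet below is a sketch continuing from the example above; it assumes `torch` is installed and a CUDA device is available, and uses a generic `transformers` pattern rather than anything specific to this checkpoint.

```python
# Optional sketch, continuing from the snippet above: half-precision inference
# on a GPU. Assumes `torch` is installed and a CUDA device is available.
import torch

model = LlavaForConditionalGeneration.from_pretrained(
    path, torch_dtype=torch.bfloat16
).to("cuda")

inputs = processor(text=prompt, images=image, return_tensors="pt").to("cuda")
outputs = model.generate(**inputs, max_new_tokens=1024)
print(processor.decode(outputs[0], skip_special_tokens=True))
```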