|
--- |
|
base_model: unsloth/Llama-3.2-11B-Vision-Instruct |
|
tags: |
|
- text-generation-inference |
|
- transformers |
|
- unsloth |
|
- mllama |
|
license: apache-2.0 |
|
language: |
|
- en |
|
model-index: |
|
- name: DocumentCogito |
|
results: |
|
- task: |
|
type: text-generation |
|
name: Text Generation |
|
dataset: |
|
name: IFEval (0-Shot) |
|
type: wis-k/instruction-following-eval |
|
split: train |
|
args: |
|
num_few_shot: 0 |
|
metrics: |
|
- type: inst_level_strict_acc and prompt_level_strict_acc |
|
value: 50.64 |
|
name: averaged accuracy |
|
source: |
|
url: https://huggingface.co/spaces/open-llm-leaderboard/open_llm_leaderboard#/?search=Daemontatox%2FDocumentCogito |
|
name: Open LLM Leaderboard |
|
- task: |
|
type: text-generation |
|
name: Text Generation |
|
dataset: |
|
name: BBH (3-Shot) |
|
type: SaylorTwift/bbh |
|
split: test |
|
args: |
|
num_few_shot: 3 |
|
metrics: |
|
- type: acc_norm |
|
value: 29.79 |
|
name: normalized accuracy |
|
source: |
|
url: https://huggingface.co/spaces/open-llm-leaderboard/open_llm_leaderboard#/?search=Daemontatox%2FDocumentCogito |
|
name: Open LLM Leaderboard |
|
- task: |
|
type: text-generation |
|
name: Text Generation |
|
dataset: |
|
name: MATH Lvl 5 (4-Shot) |
|
type: lighteval/MATH-Hard |
|
split: test |
|
args: |
|
num_few_shot: 4 |
|
metrics: |
|
- type: exact_match |
|
value: 16.24 |
|
name: exact match |
|
source: |
|
url: https://huggingface.co/spaces/open-llm-leaderboard/open_llm_leaderboard#/?search=Daemontatox%2FDocumentCogito |
|
name: Open LLM Leaderboard |
|
- task: |
|
type: text-generation |
|
name: Text Generation |
|
dataset: |
|
name: GPQA (0-shot) |
|
type: Idavidrein/gpqa |
|
split: train |
|
args: |
|
num_few_shot: 0 |
|
metrics: |
|
- type: acc_norm |
|
value: 8.84 |
|
name: acc_norm |
|
source: |
|
url: https://huggingface.co/spaces/open-llm-leaderboard/open_llm_leaderboard#/?search=Daemontatox%2FDocumentCogito |
|
name: Open LLM Leaderboard |
|
- task: |
|
type: text-generation |
|
name: Text Generation |
|
dataset: |
|
name: MuSR (0-shot) |
|
type: TAUR-Lab/MuSR |
|
args: |
|
num_few_shot: 0 |
|
metrics: |
|
- type: acc_norm |
|
value: 8.6 |
|
name: acc_norm |
|
source: |
|
url: https://huggingface.co/spaces/open-llm-leaderboard/open_llm_leaderboard#/?search=Daemontatox%2FDocumentCogito |
|
name: Open LLM Leaderboard |
|
- task: |
|
type: text-generation |
|
name: Text Generation |
|
dataset: |
|
name: MMLU-PRO (5-shot) |
|
type: TIGER-Lab/MMLU-Pro |
|
config: main |
|
split: test |
|
args: |
|
num_few_shot: 5 |
|
metrics: |
|
- type: acc |
|
value: 31.14 |
|
name: accuracy |
|
source: |
|
url: https://huggingface.co/spaces/open-llm-leaderboard/open_llm_leaderboard#/?search=Daemontatox%2FDocumentCogito |
|
name: Open LLM Leaderboard |
|
--- |
|
|
|
# **unsloth/Llama-3.2-11B-Vision-Instruct (Fine-Tuned)** |
|
|
|
## **Model Overview** |
|
This model, fine-tuned from the `unsloth/Llama-3.2-11B-Vision-Instruct` base, is optimized for vision-language tasks with enhanced instruction-following capabilities. Fine-tuning was completed 2x faster using the [Unsloth](https://github.com/unslothai/unsloth) framework combined with Hugging Face's TRL library, ensuring efficient training while maintaining high performance. |
|
|
|
## **Key Information** |
|
- **Developed by:** Daemontatox |
|
- **Base Model:** `unsloth/Llama-3.2-11B-Vision-Instruct` |
|
- **License:** Apache-2.0 |
|
- **Language:** English (`en`) |
|
- **Frameworks Used:** Hugging Face Transformers, Unsloth, and TRL |
|
|
|
## **Performance and Use Cases** |
|
This model is ideal for applications involving: |
|
- Vision-based text generation and description tasks |
|
- Instruction-following in multimodal contexts |
|
- General-purpose text generation with enhanced reasoning |
|
|
|
### **Features** |
|
- **2x Faster Training:** Leveraging the Unsloth framework for accelerated fine-tuning. |
|
- **Multimodal Capabilities:** Enhanced to handle vision-language interactions. |
|
- **Instruction Optimization:** Tailored for improved comprehension and execution of instructions. |
|
|
|
|
|
## **How to Use** |
|
|
|
### **Inference Example (Hugging Face Transformers)** |
|
|
|
```python |
|
from transformers import AutoTokenizer, AutoModelForCausalLM |
|
|
|
tokenizer = AutoTokenizer.from_pretrained("Daemontatox/finetuned-llama-3.2-vision-instruct") |
|
model = AutoModelForCausalLM.from_pretrained("Daemontatox/finetuned-llama-3.2-vision-instruct") |
|
|
|
input_text = "Describe the image showing a sunset over mountains." |
|
inputs = tokenizer(input_text, return_tensors="pt") |
|
outputs = model.generate(**inputs, max_length=100) |
|
|
|
print(tokenizer.decode(outputs[0], skip_special_tokens=True)) |
|
# [Open LLM Leaderboard Evaluation Results](https://huggingface.co/spaces/open-llm-leaderboard/open_llm_leaderboard) |
|
Detailed results can be found [here](https://huggingface.co/datasets/open-llm-leaderboard/Daemontatox__DocumentCogito-details)! |
|
Summarized results can be found [here](https://huggingface.co/datasets/open-llm-leaderboard/contents/viewer/default/train?q=Daemontatox%2FDocumentCogito&sort[column]=Average%20%E2%AC%86%EF%B8%8F&sort[direction]=desc)! |
|
|
|
| Metric |Value (%)| |
|
|-------------------|--------:| |
|
|**Average** | 24.21| |
|
|IFEval (0-Shot) | 50.64| |
|
|BBH (3-Shot) | 29.79| |
|
|MATH Lvl 5 (4-Shot)| 16.24| |
|
|GPQA (0-shot) | 8.84| |
|
|MuSR (0-shot) | 8.60| |
|
|MMLU-PRO (5-shot) | 31.14| |
|
|
|
|