Mamba2InLlama3.2-3B
Mamba2 models distilled from Llama-3.2-3B-Instruct. Paper: The Mamba in the Llama: Distilling and Accelerating Hybrid Models (https://arxiv.org/abs/2408.15237).
Zero-shot results using Llama-3.1-70B-Instruct as the teacher model and Llama-3.2-3B-Instruct as the initialization model:
Model | Llama-3.2-3B-Instruct | Llama-3.2-Mamba2-0.5-3B-sft | Llama-3.2-Mamba2-0.5-3B-dpo |
---|---|---|---|
Initialization Model | N/A | Llama-3.2-3B-Instruct | Llama-3.2-3B-Instruct |
Teacher Model | N/A | Llama-3.1-70B-Instruct | Llama-3.1-70B-Instruct |
arc_challenge | 0.459 | 0.4667 | 0.541 |
arc_easy | 0.7407 | 0.7668 | 0.8026 |
hellaswag | 0.7043 | 0.6913 | 0.7445 |
mmlu | 0.6043 | 0.5271 | 0.5247 |
openbookqa | 0.36 | 0.388 | 0.424 |
piqa | 0.7568 | 0.7601 | 0.7769 |
pubmedqa | 0.696 | 0.638 | 0.654 |
race | 0.4067 | 0.3981 | 0.4344 |
winogrande | 0.6748 | 0.6606 | 0.6732 |
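
To make the table easier to read, the following is a minimal sketch that computes each distilled model's per-task difference from the Llama-3.2-3B-Instruct baseline. The scores are copied from the table above; the names `scores` and `deltas` are illustrative and not part of any released code.

```python
# Zero-shot scores copied from the table above.
# Tuple order: (Llama-3.2-3B-Instruct, ...-Mamba2-0.5-3B-sft, ...-Mamba2-0.5-3B-dpo)
scores = {
    "arc_challenge": (0.459, 0.4667, 0.541),
    "arc_easy": (0.7407, 0.7668, 0.8026),
    "hellaswag": (0.7043, 0.6913, 0.7445),
    "mmlu": (0.6043, 0.5271, 0.5247),
    "openbookqa": (0.36, 0.388, 0.424),
    "piqa": (0.7568, 0.7601, 0.7769),
    "pubmedqa": (0.696, 0.638, 0.654),
    "race": (0.4067, 0.3981, 0.4344),
    "winogrande": (0.6748, 0.6606, 0.6732),
}


def deltas(column):
    """Per-task score difference between the given column and the baseline (column 0)."""
    return {task: round(vals[column] - vals[0], 4) for task, vals in scores.items()}


dpo_delta = deltas(2)
improved = sorted(task for task, d in dpo_delta.items() if d > 0)
regressed = sorted(task for task, d in dpo_delta.items() if d < 0)

for task, d in dpo_delta.items():
    print(f"{task:15s} {d:+.4f}")
```

On these numbers, the DPO model scores above the baseline on six of the nine tasks and below it on mmlu, pubmedqa, and winogrande.
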
@article{junxiongdaniele2024mambainllama,
title = {The Mamba in the Llama: Distilling and Accelerating Hybrid Models},
author = {Junxiong Wang and Daniele Paliotta and Avner May and Alexander M. Rush and Tri Dao},
journal = {arXiv preprint arXiv:2408.15237},
year = {2024}
}