---
license: apache-2.0
---

Zero-shot results using [Llama-3.1-70B-Instruct](https://huggingface.co/meta-llama/Llama-3.1-70B-Instruct) as the teacher model and [Llama-3.2-3B-Instruct](https://huggingface.co/meta-llama/Llama-3.2-3B-Instruct) as the initialization for the distilled model.

| Task          | Llama-3.2-3B-Instruct | Llama3.2-Mamba-3B-distill |
|---------------|-----------------------|---------------------------|
| arc_challenge | 0.459                 | 0.4838                    |
| arc_easy      | 0.7407                | 0.7765                    |
| hellaswag     | 0.7043                | 0.7037                    |
| mmlu          | 0.6043                | 0.5448                    |
| openbookqa    | 0.36                  | 0.394                     |
| piqa          | 0.7568                | 0.7731                    |
| pubmedqa      | 0.696                 | 0.664                     |
| race          | 0.4067                | 0.4029                    |
| winogrande    | 0.6748                | 0.6732                    |

```
@article{junxiongdaniele2024mambainllama,
  title   = {The Mamba in the Llama: Distilling and Accelerating Hybrid Models},
  author  = {Junxiong Wang and Daniele Paliotta and Avner May and Alexander M. Rush and Tri Dao},
  journal = {arXiv preprint arXiv:2408.15237},
  year    = {2024}
}
```
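
The tasks above are standard zero-shot benchmarks from EleutherAI's lm-evaluation-harness. Below is a minimal sketch of how such an evaluation could be run with the harness's Python API. The model path `JunxiongWang/Llama3.2-Mamba-3B-distill` and the use of the plain `hf` backend are assumptions; the hybrid Mamba layers may instead require the evaluation wrappers from the MambaInLlama codebase.

```python
# Minimal sketch: zero-shot evaluation with lm-evaluation-harness (pip install lm-eval).
# Assumption: the distilled checkpoint loads through the standard "hf" backend;
# the repo id below is illustrative and may differ for your checkpoint.
import lm_eval

results = lm_eval.simple_evaluate(
    model="hf",
    model_args="pretrained=JunxiongWang/Llama3.2-Mamba-3B-distill,dtype=bfloat16",
    tasks=[
        "arc_challenge", "arc_easy", "hellaswag", "mmlu",
        "openbookqa", "piqa", "pubmedqa", "race", "winogrande",
    ],
    num_fewshot=0,      # zero-shot, matching the table above
    batch_size="auto",
)

# Print the per-task metrics (accuracy and friends).
for task, metrics in results["results"].items():
    print(task, metrics)
```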