JunxiongWang/Llama3.2-Mamba-3B-distill

license: apache-2.0

Zero-shot results using Llama-3.1-70B-Instruct as the teacher model and Llama-3.2-3B-Instruct as the initialization model:

| Task | Llama-3.2-3B-Instruct | Llama3.2-Mamba-3B-distill |
|---|---|---|
| arc_challenge | 0.459 | 0.4838 |
| arc_easy | 0.7407 | 0.7765 |
| hellaswag | 0.7043 | 0.7037 |
| mmlu | 0.6043 | 0.5448 |
| openbookqa | 0.36 | 0.394 |
| piqa | 0.7568 | 0.7731 |
| pubmedqa | 0.696 | 0.664 |
| race | 0.4067 | 0.4029 |
| winogrande | 0.6748 | 0.6732 |
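
To make the per-task differences easier to read, the scores above can be compared with a short script. This is only a sketch: the numbers are copied from the table, the variable names are illustrative, and the unweighted mean across tasks is a convenience summary rather than a metric reported by the authors. The table does not say which metric each task uses, so the values are treated as plain scores.

```python
# Scores copied from the results table above:
# task -> (Llama-3.2-3B-Instruct, Llama3.2-Mamba-3B-distill)
scores = {
    "arc_challenge": (0.459, 0.4838),
    "arc_easy": (0.7407, 0.7765),
    "hellaswag": (0.7043, 0.7037),
    "mmlu": (0.6043, 0.5448),
    "openbookqa": (0.36, 0.394),
    "piqa": (0.7568, 0.7731),
    "pubmedqa": (0.696, 0.664),
    "race": (0.4067, 0.4029),
    "winogrande": (0.6748, 0.6732),
}

# Per-task difference (distilled minus base), rounded to the table's precision.
deltas = {
    task: round(distilled - base, 4)
    for task, (base, distilled) in scores.items()
}

# Tasks where the distilled model scores higher than the base model.
improved = [task for task, delta in deltas.items() if delta > 0]

# Unweighted mean over tasks (an assumption of this sketch, not a reported number).
mean_base = sum(base for base, _ in scores.values()) / len(scores)
mean_distilled = sum(distilled for _, distilled in scores.values()) / len(scores)

for task, delta in deltas.items():
    print(f"{task:<14}{delta:+.4f}")
print(f"improved on {len(improved)} of {len(scores)} tasks")
print(f"mean base: {mean_base:.4f}  mean distilled: {mean_distilled:.4f}")
```

On these numbers the distilled model scores higher on arc_challenge, arc_easy, openbookqa, and piqa, while the largest drop is on mmlu.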

@article{junxiongdaniele2024mambainllama,
  title   = {The Mamba in the Llama: Distilling and Accelerating Hybrid Models},
  author  = {Junxiong Wang and Daniele Paliotta and Avner May and Alexander M. Rush and Tri Dao},
  journal = {arXiv preprint arXiv:2408.15237},
  year    = {2024}
}