---
license: apache-2.0
---

Zero-shot results using Llama-3.1-70B-Instruct as the teacher model and Llama-3.2-3B-Instruct as the initialization:

| Task          | Llama-3.2-3B-Instruct | Llama3.2-Mamba-3B-distill |
|---------------|-----------------------|---------------------------|
| arc_challenge | 0.459                 | 0.4838                    |
| arc_easy      | 0.7407                | 0.7765                    |
| hellaswag     | 0.7043                | 0.7037                    |
| mmlu          | 0.6043                | 0.5448                    |
| openbookqa    | 0.36                  | 0.394                     |
| piqa          | 0.7568                | 0.7731                    |
| pubmedqa      | 0.696                 | 0.664                     |
| race          | 0.4067                | 0.4029                    |
| winogrande    | 0.6748                | 0.6732                    |
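For reference, here is a minimal sketch of how zero-shot numbers like these can be obtained with EleutherAI's lm-evaluation-harness. The repo id, dtype, and batch size below are assumptions, not confirmed by this card; substitute the actual checkpoint path.

```python
# Sketch: zero-shot evaluation via EleutherAI's lm-evaluation-harness (v0.4+).
# The model repo id below is a guess -- replace it with the real checkpoint.
import lm_eval

results = lm_eval.simple_evaluate(
    model="hf",
    model_args="pretrained=JunxiongWang/Llama3.2-Mamba-3B-distill,dtype=bfloat16",
    tasks=[
        "arc_challenge", "arc_easy", "hellaswag", "mmlu",
        "openbookqa", "piqa", "pubmedqa", "race", "winogrande",
    ],
    num_fewshot=0,  # zero-shot, matching the table above
    batch_size=8,   # assumed; tune to your GPU memory
)

# Print the accuracy reported for each task.
for task, metrics in results["results"].items():
    print(task, metrics.get("acc,none"))
```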
If you use this model, please cite:

```bibtex
@article{junxiongdaniele2024mambainllama,
  title   = {The Mamba in the Llama: Distilling and Accelerating Hybrid Models},
  author  = {Junxiong Wang and Daniele Paliotta and Avner May and Alexander M. Rush and Tri Dao},
  journal = {arXiv preprint arXiv:2408.15237},
  year    = {2024}
}
```