---
license: apache-2.0
---
|
|
|
Zero-shot results using [Llama-3.1-70B-Instruct](https://huggingface.co/meta-llama/Llama-3.1-70B-Instruct) as the teacher model and [Llama-3.2-3B-Instruct](https://huggingface.co/meta-llama/Llama-3.2-3B-Instruct) to initialize the distilled model.
|
|
|
| Task          | Llama-3.2-3B-Instruct | Llama3.2-Mamba-3B-distill |
|---------------|-----------------------|---------------------------|
| arc_challenge | 0.459                 | 0.4838                    |
| arc_easy      | 0.7407                | 0.7765                    |
| hellaswag     | 0.7043                | 0.7037                    |
| mmlu          | 0.6043                | 0.5448                    |
| openbookqa    | 0.36                  | 0.394                     |
| piqa          | 0.7568                | 0.7731                    |
| pubmedqa      | 0.696                 | 0.664                     |
| race          | 0.4067                | 0.4029                    |
| winogrande    | 0.6748                | 0.6732                    |
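
Scores like those above are typically produced with EleutherAI's lm-evaluation-harness. Below is a minimal sketch of how such a zero-shot run could look; the repo id `JunxiongWang/Llama3.2-Mamba-3B-distill` and the use of the harness's stock `hf` backend are assumptions, and the hybrid Mamba checkpoint may instead require the custom loading code from the MambaInLlama repository.

```python
# Sketch: zero-shot evaluation with lm-evaluation-harness (v0.4+).
# Assumption: the checkpoint loads through the standard `hf` backend;
# the hybrid model may need the MambaInLlama repo's own loader instead.
import json

import lm_eval  # pip install lm-eval

TASKS = [
    "arc_challenge", "arc_easy", "hellaswag", "mmlu", "openbookqa",
    "piqa", "pubmedqa", "race", "winogrande",
]

results = lm_eval.simple_evaluate(
    model="hf",
    # Hypothetical repo id; substitute the actual checkpoint path.
    model_args="pretrained=JunxiongWang/Llama3.2-Mamba-3B-distill,dtype=bfloat16",
    tasks=TASKS,
    num_fewshot=0,  # zero-shot, matching the table above
    batch_size=8,
)

# Per-task metrics (accuracy, normalized accuracy, etc.)
print(json.dumps(results["results"], indent=2, default=str))
```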
|
|
|
|
|
```
@article{junxiongdaniele2024mambainllama,
  title   = {The Mamba in the Llama: Distilling and Accelerating Hybrid Models},
  author  = {Junxiong Wang and Daniele Paliotta and Avner May and Alexander M. Rush and Tri Dao},
  journal = {arXiv preprint arXiv:2408.15237},
  year    = {2024}
}
```