---
license: apache-2.0
---

Zero-shot results when using [Llama-3.1-70B-Instruct](https://huggingface.co/meta-llama/Llama-3.1-70B-Instruct) as the teacher model and [Llama-3.2-3B-Instruct](https://huggingface.co/meta-llama/Llama-3.2-3B-Instruct) as the initialized model.

| Model          | [Llama-3.2-3B-Instruct](https://huggingface.co/meta-llama/Llama-3.2-3B-Instruct) | [Llama-3.2-Mamba2-0.5-3B-sft](https://huggingface.co/JunxiongWang/Mamba2InLlama3B_Half)       | [Llama-3.2-Mamba2-0.5-3B-dpo](https://huggingface.co/JunxiongWang/Mamba2InLlama3B_Half_DPO)       |
|---------------|---------------------------------------------------------------------------------|-----------------------------------|-----------------------------------|
| Initialization Model | N/A                                                                             | Llama-3.2-3B-Instruct             | Llama-3.2-3B-Instruct             |
| Teacher Model | N/A                                                                             | Llama-3.1-8B-Instruct             | Llama-3.1-8B-Instruct             |
| arc_challenge   | 0.459                                                                           | 0.4667                                                            | 0.541                                                                 |
| arc_easy        | 0.7407                                                                          | 0.7668                                                            | 0.8026                                                                |
| hellaswag       | 0.7043                                                                          | 0.6913                                                            | 0.7445                                                                |
| mmlu            | 0.6043                                                                          | 0.5271                                                            | 0.5247                                                                |
| openbookqa      | 0.36                                                                            | 0.388                                                             | 0.424                                                                 |
| piqa            | 0.7568                                                                          | 0.7601                                                            | 0.7769                                                                |
| pubmedqa        | 0.696                                                                           | 0.638                                                             | 0.654                                                                 |
| race            | 0.4067                                                                          | 0.3981                                                            | 0.4344                                                                |
| winogrande      | 0.6748                                                                          | 0.6606                                                            | 0.6732                                                                |
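
The benchmark names above match zero-shot tasks in EleutherAI's lm-evaluation-harness, so the baseline column can likely be reproduced with its Python API. Below is a minimal sketch, assuming lm-evaluation-harness v0.4+ (`pip install lm-eval`); it scores only the vanilla Llama-3.2-3B-Instruct checkpoint, since the distilled hybrid Mamba2 checkpoints presumably need the loading code from the MambaInLlama repository rather than the plain `hf` backend used here.

```python
# Hypothetical reproduction sketch (not the authors' evaluation script).
# Assumes lm-evaluation-harness >= 0.4 is installed; only the baseline
# Llama-3.2-3B-Instruct column is evaluated with the standard "hf" backend.
import lm_eval

results = lm_eval.simple_evaluate(
    model="hf",
    model_args="pretrained=meta-llama/Llama-3.2-3B-Instruct,dtype=bfloat16",
    tasks=[
        "arc_challenge", "arc_easy", "hellaswag", "mmlu",
        "openbookqa", "piqa", "pubmedqa", "race", "winogrande",
    ],
    num_fewshot=0,
    batch_size=8,
)

# In recent harness versions each task reports accuracy under the "acc,none" key.
for task, metrics in results["results"].items():
    print(f"{task}: {metrics.get('acc,none')}")
```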


```
@article{junxiongdaniele2024mambainllama,
  title   = {The Mamba in the Llama: Distilling and Accelerating Hybrid Models},
  author  = {Junxiong Wang and Daniele Paliotta and Avner May and Alexander M. Rush and Tri Dao},
  journal = {arXiv preprint arXiv:2408.15237},
  year    = {2024}
}
```