Directly distill from Llama without doing SFT and DPO
Junxiong Wang (JunxiongWang)

AI & ML interests: Attention Free Model / Subquadratic Language Models
Collections: 6

Models: 33
JunxiongWang/Llama3.1-Mamba-8B-distill
JunxiongWang/Llama3.2-Mamba-3B-distill
JunxiongWang/Llama3.2-Mamba2-3B-distill • 66 downloads
JunxiongWang/Llama3.2-Mamba2-3B-dpo • 77 downloads
JunxiongWang/Llama3.1-Mamba2-8B-distill
JunxiongWang/MambaByte_Stories • Text Generation • 212 downloads • 1 like
JunxiongWang/MambaByte_Arxiv • Text Generation • 19 downloads • 3 likes
JunxiongWang/MambaByte_PG19_353M • Text Generation • 12 downloads
JunxiongWang/MambaByte_Books • Text Generation • 674 downloads • 2 likes
JunxiongWang/MambaByte_Code • Text Generation • 426 downloads • 2 likes
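
The checkpoints above are published on the Hugging Face Hub, so a generic causal-LM loading pattern is the natural starting point. The sketch below is only an illustration of that pattern, not a verified recipe for these repos: the chosen repo_id, dtype, and prompt are placeholders, and some of these distilled Mamba and MambaByte checkpoints may require the authors' own code (for example a Mamba-specific package such as mamba_ssm) instead of plain transformers.

import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

# Illustrative choice; any repo id from the list above could be substituted.
repo_id = "JunxiongWang/Llama3.2-Mamba2-3B-distill"

# trust_remote_code=True lets transformers use any custom modeling code
# shipped with the repo, if present; whether that suffices for these
# checkpoints is an assumption, not something the listing confirms.
tokenizer = AutoTokenizer.from_pretrained(repo_id)
model = AutoModelForCausalLM.from_pretrained(
    repo_id,
    torch_dtype=torch.bfloat16,
    trust_remote_code=True,
)

prompt = "Subquadratic language models are useful because"
inputs = tokenizer(prompt, return_tensors="pt")
outputs = model.generate(**inputs, max_new_tokens=64)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))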