	---
	license: apache-2.0
	language:
	- en
	tags:
	- MoE
	---
	# LLaMA-MoE-v1-3.5B (4/16) SFT

	[[💻 Code]](https://github.com/pjlab-sys4nlp/llama-moe) | [[📜 Technical Report]](https://github.com/pjlab-sys4nlp/llama-moe/blob/main/docs/LLaMA_MoE.pdf)

	This model is [LLaMA-MoE-v1-3_5B-4_16](https://huggingface.co/llama-moe/LLaMA-MoE-v1-3_5B-4_16) after supervised fine-tuning (SFT) on [Deita-6k](https://huggingface.co/datasets/hkust-nlp/deita-6k-v0) for 2 epochs.


	| Model | #Activated Experts | #Experts | #Activated Params | Foundation Model | SFT Model |
	| :------------------------ | :-----------------: | :-------: | :----------------: | :---------------------------------------------------------------: | :------------------------------------------------------------------: |
	| LLaMA-MoE-3.0B | 2 | 16 | 3.0B | [🤗 base](https://huggingface.co/llama-moe/LLaMA-MoE-v1-3_0B-2_16) | [🤗 SFT](https://huggingface.co/llama-moe/LLaMA-MoE-v1-3_0B-2_16-sft) |
	| LLaMA-MoE-3.5B (4/16) | 4 | 16 | 3.5B | [🤗 base](https://huggingface.co/llama-moe/LLaMA-MoE-v1-3_5B-4_16) | [🤗 SFT](https://huggingface.co/llama-moe/LLaMA-MoE-v1-3_5B-4_16-sft) |
	| LLaMA-MoE-3.5B (2/8) | 2 | 8 | 3.5B | [🤗 base](https://huggingface.co/llama-moe/LLaMA-MoE-v1-3_5B-2_8) | [🤗 SFT](https://huggingface.co/llama-moe/LLaMA-MoE-v1-3_5B-2_8-sft) |
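
For readers new to the "#Activated Experts" column: in a sparse MoE layer, a router scores every expert for each token and only the highest-scoring few are run. The snippet below is a minimal pure-Python sketch of that top-k selection, not the llama-moe implementation. The function name `top_k_route`, the made-up gate logits, and the choice to renormalise with a softmax over the selected experts are all assumptions for illustration.

```python
import math

def top_k_route(gate_logits, k):
    """Hypothetical helper: pick the k highest-scoring experts for one token.

    Returns {expert_index: weight}, where the weights are a softmax over the
    selected experts only (one common convention, assumed here).
    """
    if not 0 < k <= len(gate_logits):
        raise ValueError("k must be between 1 and the number of experts")
    # Indices of the k largest gate scores
    top = sorted(range(len(gate_logits)), key=lambda i: gate_logits[i], reverse=True)[:k]
    # Numerically stable softmax restricted to the selected experts
    peak = max(gate_logits[i] for i in top)
    exps = {i: math.exp(gate_logits[i] - peak) for i in top}
    total = sum(exps.values())
    return {i: e / total for i, e in exps.items()}

# Made-up gate logits for a single token over 16 experts (the 4/16 setting)
logits = [0.1 * ((7 * i) % 16) for i in range(16)]
weights = top_k_route(logits, k=4)
print(sorted(weights))  # indices of the 4 experts that would run for this token
```

Only the selected experts' feed-forward blocks are evaluated for the token, which is why the activated parameter count in the table is lower than the total parameter count.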


	## 🚀 QuickStart

	```python
	# python>=3.10

	import torch
	from transformers import AutoTokenizer, AutoModelForCausalLM

	model_dir = "llama-moe/LLaMA-MoE-v1-3_5B-4_16-sft"
	# trust_remote_code=True lets transformers load the custom model code shipped in the model repo
	tokenizer = AutoTokenizer.from_pretrained(model_dir, trust_remote_code=True)
	model = AutoModelForCausalLM.from_pretrained(model_dir, torch_dtype=torch.bfloat16, trust_remote_code=True)
	model.eval()
	model.cuda()  # requires a CUDA-capable GPU

	# Prompt format: system message, then a "human:" turn, ending with "gpt:" to cue the reply
	input_text = "A chat between a curious user and an artificial intelligence assistant. The assistant gives helpful, detailed, and polite answers to the user's questions. human: Give me a three-day plan in Suzhou. gpt:"
	inputs = tokenizer(input_text, return_tensors="pt")
	input_ids = inputs["input_ids"].cuda()

	# max_length counts the prompt tokens plus the newly generated tokens
	pred = model.generate(input_ids, max_length=100, temperature=1.0, do_sample=True, use_cache=True)
	print(tokenizer.decode(pred.cpu()[0], skip_special_tokens=True))
	"""
	Sure, I can provide you with a three-day itinerary in Suzhou. Here's what we can do:

	Day 1:

	* Visit Suzhou Industrial Park, a major commercial and manufacturing district ...
	"""
	```
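
The QuickStart prompt follows a fixed pattern: a system message, a `human:` turn, and a trailing `gpt:` that cues the reply. The sketch below rebuilds that string programmatically. `build_prompt` and `SYSTEM_PROMPT` are hypothetical names, not part of llama-moe or transformers, and the multi-turn layout (alternating `human:` / `gpt:` segments joined by single spaces) is an assumption extrapolated from the single-turn example above.

```python
# Hypothetical helper, not part of llama-moe or transformers.
SYSTEM_PROMPT = (
    "A chat between a curious user and an artificial intelligence assistant. "
    "The assistant gives helpful, detailed, and polite answers to the user's questions."
)

def build_prompt(turns, system=SYSTEM_PROMPT):
    """Build a prompt from (role, text) pairs, where role is "human" or "gpt".

    The result always ends with "gpt:" so the model continues as the assistant.
    Joining multi-turn history with single spaces is an assumption.
    """
    parts = [system]
    for role, text in turns:
        if role not in ("human", "gpt"):
            raise ValueError(f"unknown role: {role!r}")
        parts.append(f"{role}: {text}")
    parts.append("gpt:")
    return " ".join(parts)

# Reproduces the single-turn prompt used in the QuickStart
prompt = build_prompt([("human", "Give me a three-day plan in Suzhou.")])
print(prompt)
```

The resulting `prompt` can be passed to the tokenizer in place of `input_text`.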

	## 📊 Performance

	| Model | MMLU | ARC-c | HellaSwag | TruthfulQA | MT-Bench |
	| :------------------------------------- | :---: | :---: | :-------: | :--------: | :------: |
	| Sheared LLaMA-2.7B ShareGPT | 28.41 | 41.04 | 71.21 | 47.65 | 3.79 |
	| Sheared LLaMA-2.7B Deita6K (Our Impl.) | 25.24 | 43.69 | 71.70 | 49.00 | 4.06 |
	| LLaMA-MoE-v1-3.0B (2/16) | 23.61 | 43.43 | 72.28 | 44.24 | 4.15 |
	| LLaMA-MoE-v1-3.5B (4/16) | 26.49 | 48.29 | 75.10 | 45.91 | 4.60 |
	| LLaMA-MoE-v1-3.5B (2/8) | 25.53 | 45.99 | 74.95 | 44.39 | 4.72 |

	## 📃 Citation

	```bibtex
	@article{llama-moe,
	title={LLaMA-MoE: Building Mixture-of-Experts from LLaMA with Continual Pre-training},
	author={Tong Zhu and Xiaoye Qu and Daize Dong and Jiacheng Ruan and Jingqi Tong and Conghui He and Yu Cheng},
	journal={arXiv preprint arXiv:2406.16554},
	year={2024},
	url={https://arxiv.org/abs/2406.16554},
	}
	```