---
license: apache-2.0
language:
- en
tags:
- MoE
---

# LLaMA-MoE-v1-3.5B (4/16) SFT

[[💻 Code]](https://github.com/pjlab-sys4nlp/llama-moe) | [[📃 Technical Report]](https://github.com/pjlab-sys4nlp/llama-moe/blob/main/docs/LLaMA_MoE.pdf)

This is the supervised fine-tuned (SFT) version of [LLaMA-MoE-v1-3_5B-4_16](https://huggingface.co/llama-moe/LLaMA-MoE-v1-3_5B-4_16), trained on [Deita-6k](https://huggingface.co/datasets/hkust-nlp/deita-6k-v0) for 2 epochs.
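
For reference, the SFT data can be inspected straight from the Hub. Below is a minimal sketch using the 🤗 `datasets` library (illustrative only, not the original training script; it assumes the default `train` split):

```python
from datasets import load_dataset

# Deita-6k: ~6k conversations selected by the Deita data-selection pipeline.
deita = load_dataset("hkust-nlp/deita-6k-v0", split="train")
print(deita)     # dataset size and column names
print(deita[0])  # one conversation sample
```
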
| Model                     | \#Activated Experts | \#Experts | \#Activated Params | Foundation Model | SFT Model |
| :------------------------ | :-----------------: | :-------: | :----------------: | :--------------: | :-------: |
| **LLaMA-MoE-3.0B**        | 2 | 16 | 3.0B | [🤗 base](https://huggingface.co/llama-moe/LLaMA-MoE-v1-3_0B-2_16) | [🤗 SFT](https://huggingface.co/llama-moe/LLaMA-MoE-v1-3_0B-2_16-sft) |
| **LLaMA-MoE-3.5B (4/16)** | 4 | 16 | 3.5B | [🤗 base](https://huggingface.co/llama-moe/LLaMA-MoE-v1-3_5B-4_16) | [🤗 SFT](https://huggingface.co/llama-moe/LLaMA-MoE-v1-3_5B-4_16-sft) |
| **LLaMA-MoE-3.5B (2/8)**  | 2 | 8  | 3.5B | [🤗 base](https://huggingface.co/llama-moe/LLaMA-MoE-v1-3_5B-2_8)  | [🤗 SFT](https://huggingface.co/llama-moe/LLaMA-MoE-v1-3_5B-2_8-sft)  |

## 🚀 QuickStart

```python
# python>=3.10

import torch
from transformers import AutoTokenizer, AutoModelForCausalLM

# Load the tokenizer and the MoE model (trust_remote_code is required for the custom MoE modeling code).
model_dir = "llama-moe/LLaMA-MoE-v1-3_5B-4_16-sft"
tokenizer = AutoTokenizer.from_pretrained(model_dir, trust_remote_code=True)
model = AutoModelForCausalLM.from_pretrained(model_dir, torch_dtype=torch.bfloat16, trust_remote_code=True)
model.eval()
model.cuda()

# Single-turn prompt with a system preamble and human/gpt turns.
input_text = "A chat between a curious user and an artificial intelligence assistant. The assistant gives helpful, detailed, and polite answers to the user's questions. human: Give me a three-day plan in Suzhou. gpt:"
inputs = tokenizer(input_text, return_tensors="pt")
input_ids = inputs["input_ids"].cuda()

# Sample a completion (max_length counts the prompt tokens as well).
pred = model.generate(input_ids, max_length=100, temperature=1.0, do_sample=True, use_cache=True)
print(tokenizer.decode(pred.cpu()[0], skip_special_tokens=True))
"""
Sure, I can provide you with a three-day itinerary in Suzhou. Here's what we can do:

Day 1:

* Visit Suzhou Industrial Park, a major commercial and manufacturing district ...
"""
```
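
For interactive use, the same prompt format can be wrapped in a small helper with token streaming. This is only a sketch: `build_prompt` and `chat` are illustrative names, not part of the released code, and `max_new_tokens=256` is an arbitrary choice.

```python
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer, TextStreamer

model_dir = "llama-moe/LLaMA-MoE-v1-3_5B-4_16-sft"
tokenizer = AutoTokenizer.from_pretrained(model_dir, trust_remote_code=True)
model = AutoModelForCausalLM.from_pretrained(model_dir, torch_dtype=torch.bfloat16, trust_remote_code=True)
model.eval()
model.cuda()

SYSTEM = (
    "A chat between a curious user and an artificial intelligence assistant. "
    "The assistant gives helpful, detailed, and polite answers to the user's questions."
)

def build_prompt(user_message: str) -> str:
    # Single-turn prompt in the same format as the QuickStart example above.
    return f"{SYSTEM} human: {user_message} gpt:"

def chat(user_message: str, max_new_tokens: int = 256) -> None:
    input_ids = tokenizer(build_prompt(user_message), return_tensors="pt").input_ids.cuda()
    # Print tokens as they are generated, without echoing the prompt.
    streamer = TextStreamer(tokenizer, skip_prompt=True, skip_special_tokens=True)
    with torch.no_grad():
        model.generate(
            input_ids,
            max_new_tokens=max_new_tokens,
            temperature=1.0,
            do_sample=True,
            streamer=streamer,
        )

chat("Give me a three-day plan in Suzhou.")
```
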
## 📊 Performance

| Model                                  | MMLU  | ARC-c | HellaSwag | TruthfulQA | MT-Bench |
| :------------------------------------- | :---: | :---: | :-------: | :--------: | :------: |
| Sheared LLaMA-2.7B ShareGPT            | 28.41 | 41.04 | 71.21     | 47.65      | 3.79     |
| Sheared LLaMA-2.7B Deita6K (Our Impl.) | 25.24 | 43.69 | 71.70     | 49.00      | 4.06     |
| LLaMA-MoE-v1-3.0B (2/16)               | 23.61 | 43.43 | 72.28     | 44.24      | 4.15     |
| LLaMA-MoE-v1-3.5B (4/16)               | 26.49 | 48.29 | 75.10     | 45.91      | 4.60     |
| LLaMA-MoE-v1-3.5B (2/8)                | 25.53 | 45.99 | 74.95     | 44.39      | 4.72     |

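
As a rough way to spot-check the benchmark numbers above, one option is EleutherAI's `lm-evaluation-harness` (v0.4+). The sketch below is not the authors' evaluation protocol, so scores may not match the table exactly, and MT-Bench requires a separate judge-based pipeline.

```python
# pip install lm-eval   (EleutherAI lm-evaluation-harness, v0.4+)
import lm_eval

results = lm_eval.simple_evaluate(
    model="hf",
    model_args=(
        "pretrained=llama-moe/LLaMA-MoE-v1-3_5B-4_16-sft,"
        "dtype=bfloat16,trust_remote_code=True"
    ),
    tasks=["arc_challenge", "hellaswag", "truthfulqa_mc2", "mmlu"],
    batch_size=8,
)
print(results["results"])
```
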
## 📖 Citation

```bibtex
@article{llama-moe,
  title={LLaMA-MoE: Building Mixture-of-Experts from LLaMA with Continual Pre-training},
  author={Tong Zhu and Xiaoye Qu and Daize Dong and Jiacheng Ruan and Jingqi Tong and Conghui He and Yu Cheng},
  journal={arXiv preprint arXiv:2406.16554},
  year={2024},
  url={https://arxiv.org/abs/2406.16554},
}
```