---
license: apache-2.0
language:
- en
tags:
- MoE
---
# LLaMA-MoE-v1-3.5B (4/16) SFT

[[💻 Code]](https://github.com/pjlab-sys4nlp/llama-moe) | [[📜 Technical Report]](https://github.com/pjlab-sys4nlp/llama-moe/blob/main/docs/LLaMA_MoE.pdf)

This is the supervised fine-tuned version of [LLaMA-MoE-v1-3_5B-4_16](https://huggingface.co/llama-moe/LLaMA-MoE-v1-3_5B-4_16) on [Deita-6k](https://huggingface.co/datasets/hkust-nlp/deita-6k-v0) for 2 epochs.


| Model                     | \#Activated Experts | \#Experts | \#Activated Params |                         Foundation Model                          |                              SFT Model                               |
| :------------------------ | :-----------------: | :-------: | :----------------: | :---------------------------------------------------------------: | :------------------------------------------------------------------: |
| **LLaMA-MoE-3.0B**        |          2          |    16     |        3.0B        | [🤗 base](https://huggingface.co/llama-moe/LLaMA-MoE-v1-3_0B-2_16) | [🤗 SFT](https://huggingface.co/llama-moe/LLaMA-MoE-v1-3_0B-2_16-sft) |
| **LLaMA-MoE-3.5B (4/16)** |          4          |    16     |        3.5B        | [🤗 base](https://huggingface.co/llama-moe/LLaMA-MoE-v1-3_5B-4_16) | [🤗 SFT](https://huggingface.co/llama-moe/LLaMA-MoE-v1-3_5B-4_16-sft) |
| **LLaMA-MoE-3.5B (2/8)**  |          2          |     8     |        3.5B        | [🤗 base](https://huggingface.co/llama-moe/LLaMA-MoE-v1-3_5B-2_8)  | [🤗 SFT](https://huggingface.co/llama-moe/LLaMA-MoE-v1-3_5B-2_8-sft)  |


## 🚀 QuickStart

```python
# python>=3.10

import torch
from transformers import AutoTokenizer, AutoModelForCausalLM

model_dir = "llama-moe/LLaMA-MoE-v1-3_5B-4_16-sft"
tokenizer = AutoTokenizer.from_pretrained(model_dir, trust_remote_code=True)
model = AutoModelForCausalLM.from_pretrained(model_dir, torch_dtype=torch.bfloat16, trust_remote_code=True)
model.eval()
model.cuda()

input_text = "A chat between a curious user and an artificial intelligence assistant. The assistant gives helpful, detailed, and polite answers to the user's questions. human: Give me a three-day plan in Suzhou. gpt:"
inputs = tokenizer(input_text, return_tensors="pt")
input_ids = inputs["input_ids"].cuda()

pred = model.generate(input_ids, max_length=100, temperature=1.0, do_sample=True, use_cache=True)
print(tokenizer.decode(pred.cpu()[0], skip_special_tokens=True))
"""
Sure, I can provide you with a three-day itinerary in Suzhou. Here's what we can do:

Day 1:

* Visit Suzhou Industrial Park, a major commercial and manufacturing district ...
"""
```

## 📊 Performance

| Model                                  | MMLU  | ARC-c | HellaSeag | TruthfulQA | MT-Bench |
| :------------------------------------- | :---: | :---: | :-------: | :--------: | :------: |
| Sheared LLaMA-2.7B ShareGPT            | 28.41 | 41.04 |   71.21   |   47.65    |   3.79   |
| Sheared LLaMA-2.7B Deita6K (Our Impl.) | 25.24 | 43.69 |   71.70   |   49.00    |   4.06   |
| LLaMA-MoE-v1-3.0B (2/16)               | 23.61 | 43.43 |   72.28   |   44.24    |   4.15   |
| LLaMA-MoE-v1-3.5B (4/16)               | 26.49 | 48.29 |   75.10   |   45.91    |   4.60   |
| LLaMA-MoE-v1-3.5B (2/8)                | 25.53 | 45.99 |   74.95   |   44.39    |   4.72   |

## 📃 Citation

```bibtex
@misc{llama-moe2023,
  title={LLaMA-MoE: Building Mixture-of-Experts from LLaMA with Continual Pre-training},
  author={LLaMA-MoE Team},
  year={2023},
  publisher={Dec}
}
```