LLaMA-MoE-v1-3.5B (4/16) SFT

[πŸ’» Code] | [πŸ“œ Technical Report]

This is the supervised fine-tuned version of LLaMA-MoE-v1-3_5B-4_16 on Deita-6k for 2 epochs.

Model #Activated Experts #Experts #Activated Params Foundation Model SFT Model
LLaMA-MoE-3.0B 2 16 3.0B πŸ€— base πŸ€— SFT
LLaMA-MoE-3.5B (4/16) 4 16 3.5B πŸ€— base πŸ€— SFT
LLaMA-MoE-3.5B (2/8) 2 8 3.5B πŸ€— base πŸ€— SFT

πŸš€ QuickStart

# python>=3.10

import torch
from transformers import AutoTokenizer, AutoModelForCausalLM

model_dir = "llama-moe/LLaMA-MoE-v1-3_5B-4_16-sft"
tokenizer = AutoTokenizer.from_pretrained(model_dir, trust_remote_code=True)
model = AutoModelForCausalLM.from_pretrained(model_dir, torch_dtype=torch.bfloat16, trust_remote_code=True)
model.eval()
model.cuda()

input_text = "A chat between a curious user and an artificial intelligence assistant. The assistant gives helpful, detailed, and polite answers to the user's questions. human: Give me a three-day plan in Suzhou. gpt:"
inputs = tokenizer(input_text, return_tensors="pt")
input_ids = inputs["input_ids"].cuda()

pred = model.generate(input_ids, max_length=100, temperature=1.0, do_sample=True, use_cache=True)
print(tokenizer.decode(pred.cpu()[0], skip_special_tokens=True))
"""
Sure, I can provide you with a three-day itinerary in Suzhou. Here's what we can do:

Day 1:

* Visit Suzhou Industrial Park, a major commercial and manufacturing district ...
"""

πŸ“Š Performance

Model MMLU ARC-c HellaSeag TruthfulQA MT-Bench
Sheared LLaMA-2.7B ShareGPT 28.41 41.04 71.21 47.65 3.79
Sheared LLaMA-2.7B Deita6K (Our Impl.) 25.24 43.69 71.70 49.00 4.06
LLaMA-MoE-v1-3.0B (2/16) 23.61 43.43 72.28 44.24 4.15
LLaMA-MoE-v1-3.5B (4/16) 26.49 48.29 75.10 45.91 4.60
LLaMA-MoE-v1-3.5B (2/8) 25.53 45.99 74.95 44.39 4.72

πŸ“ƒ Citation

@article{llama-moe,
  title={LLaMA-MoE: Building Mixture-of-Experts from LLaMA with Continual Pre-training},
  author={Tong Zhu and Xiaoye Qu and Daize Dong and Jiacheng Ruan and Jingqi Tong and Conghui He and Yu Cheng},
  journal={arXiv preprint arXiv:2406.16554},
  year={2024},
  url={https://arxiv.org/abs/2406.16554},
}
Downloads last month
20
Safetensors
Model size
6.74B params
Tensor type
BF16
Β·
Inference Examples
Inference API (serverless) does not yet support model repos that contain custom code.