UPDATE: a GGUF version is available at cloudyu/Yi-34Bx2-MoE-60B-GGUF
Yi-based 2x34B MoE with the Mixtral architecture
Highest-scoring model on the Open LLM Leaderboard (as of 2024-01-11)
This is an English and Chinese MoE model. It differs slightly from cloudyu/Mixtral_34Bx2_MoE_60B and is also based on:
- [jondurbin/bagel-dpo-34b-v0.2]
- [SUSTech/SUS-Chat-34B]
Open LLM Leaderboard Evaluation Results
Detailed results can be found here
Metric | Value |
---|---|
Avg. | 76.72 |
AI2 Reasoning Challenge (25-Shot) | 71.08 |
HellaSwag (10-Shot) | 85.23 |
MMLU (5-Shot) | 77.47 |
TruthfulQA (0-shot) | 66.19 |
Winogrande (5-shot) | 84.85 |
GSM8k (5-shot) | 75.51 |
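As a quick sanity check, the "Avg." row appears to be the plain arithmetic mean of the six benchmark scores in the table. The sketch below recomputes it; the dictionary name `scores` and its keys are illustrative labels, not part of any leaderboard API.

```python
# Benchmark scores copied from the table above (keys are illustrative labels).
scores = {
    "ARC (25-shot)": 71.08,
    "HellaSwag (10-shot)": 85.23,
    "MMLU (5-shot)": 77.47,
    "TruthfulQA (0-shot)": 66.19,
    "Winogrande (5-shot)": 84.85,
    "GSM8k (5-shot)": 75.51,
}

# Assumption: the leaderboard average is an unweighted mean of the six scores.
average = sum(scores.values()) / len(scores)
print(f"{average:.2f}")
```

Rounded to two decimals, the result matches the reported average of 76.72.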
GPU code example
import torch
from transformers import AutoTokenizer, AutoModelForCausalLM
# v2 models
model_path = "cloudyu/Yi-34Bx2-MoE-60B"
tokenizer = AutoTokenizer.from_pretrained(model_path, use_default_system_prompt=False)
# load_in_4bit=True loads the weights 4-bit quantized (requires bitsandbytes);
# device_map='auto' places the layers on the available devices.
model = AutoModelForCausalLM.from_pretrained(
    model_path, torch_dtype=torch.float32, device_map='auto', local_files_only=False, load_in_4bit=True
)
print(model)
# Keep generating until the user enters an empty prompt.
prompt = input("please input prompt:")
while len(prompt) > 0:
    input_ids = tokenizer(prompt, return_tensors="pt").input_ids.to("cuda")
    generation_output = model.generate(
        input_ids=input_ids, max_new_tokens=500, repetition_penalty=1.2
    )
    print(tokenizer.decode(generation_output[0]))
    prompt = input("please input prompt:")
CPU example
import torch
from transformers import AutoTokenizer, AutoModelForCausalLM
# v2 models
model_path = "cloudyu/Yi-34Bx2-MoE-60B"
tokenizer = AutoTokenizer.from_pretrained(model_path, use_default_system_prompt=False)
# Load the unquantized weights in bfloat16 and keep everything on the CPU.
model = AutoModelForCausalLM.from_pretrained(
    model_path, torch_dtype=torch.bfloat16, device_map='cpu'
)
print(model)
# Keep generating until the user enters an empty prompt.
prompt = input("please input prompt:")
while len(prompt) > 0:
    input_ids = tokenizer(prompt, return_tensors="pt").input_ids
    generation_output = model.generate(
        input_ids=input_ids, max_new_tokens=500, repetition_penalty=1.2
    )
    print(tokenizer.decode(generation_output[0]))
    prompt = input("please input prompt:")
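The GPU example loads the weights in 4-bit and the CPU example in bfloat16, which leads to very different memory needs. The following is a rough back-of-the-envelope sketch, not a measurement: `ASSUMED_PARAMS` is taken from the "60B" in the model name (the exact parameter count may differ), `weight_memory_gb` is a hypothetical helper, and the estimate covers weights only, ignoring activations, the KV cache, and quantization overhead.

```python
# Assumption: ~60 billion parameters, read off the "60B" in the model name.
ASSUMED_PARAMS = 60e9

# Bits used to store one weight for each loading mode discussed above.
BITS_PER_PARAM = {"float32": 32, "bfloat16": 16, "4bit": 4}


def weight_memory_gb(n_params: float, bits_per_param: int) -> float:
    """Approximate size of the weights alone, in gigabytes (10**9 bytes)."""
    return n_params * bits_per_param / 8 / 1e9


for name, bits in BITS_PER_PARAM.items():
    print(f"{name}: ~{weight_memory_gb(ASSUMED_PARAMS, bits):.0f} GB")
```

Under these assumptions, the bfloat16 CPU path needs roughly four times the memory of the 4-bit GPU path for the weights alone, so the CPU example is only practical on machines with a large amount of RAM.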