---
language:
- ko
- en
library_name: transformers
base_model:
- moreh/Llama-3-Motif-102B
---
## Introduction

We introduce Llama-3-Motif, a new language model family from Moreh, specialized in Korean and English.
Llama-3-Motif-102B-Instruct is a chat model tuned from the base model Llama-3-Motif-102B.
## Training Platform

The Llama-3-Motif-102B model family is trained on the MoAI platform; refer to this link for more information.
## Quick Usage

You can chat directly with our Llama-3-Motif model through our Model hub.
## Details

More details will be provided in the upcoming technical report.
The effective context length is 32K (average score of 81) based on the RULER benchmark.
## Release Date

2024.12.02
## Benchmark Results

| Provider | Model | kmmlu_direct score | Korean capability claim |
|---|---|---|---|
| Moreh | Llama-3-Motif-102B | 64.74 | + |
| Moreh | Llama-3-Motif-102B-Instruct | 64.81 | + |
| Meta | Llama3-70B-instruct | 54.5* | |
| Meta | Llama3.1-70B-instruct | 52.1* | |
| Meta | Llama3.1-405B-instruct | 65.8* | |
| Alibaba | Qwen2-72B-instruct | 64.1* | |
| OpenAI | GPT-4-0125-preview | 59.95* | |
| OpenAI | GPT-4o-2024-05-13 | 64.11** | |
| Google | Gemini Pro | 50.18* | |
| LG | EXAONE 3.0 | 44.5* | + |
| Naver | HyperCLOVA X | 53.4* | + |
| Upstage | SOLAR-10.7B | 41.65* | + |

- `*`: Community report
- `**`: Measured by Moreh
- `+`: Claimed to have better capability in Korean
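The kmmlu_direct column follows the task naming used by EleutherAI's lm-evaluation-harness. The snippet below is a rough sketch of how such a score could be reproduced with that harness; the task name, model arguments, and batch size are assumptions and may need to be adjusted for your installed harness version.

```python
import lm_eval

# Evaluate the instruct model on KMMLU (direct answer) with lm-evaluation-harness.
# Task name and model_args are assumptions; check your harness version for the exact names.
results = lm_eval.simple_evaluate(
    model="hf",
    model_args="pretrained=moreh/Llama-3-Motif-102B-Instruct,dtype=bfloat16,parallelize=True",
    tasks=["kmmlu_direct"],
    batch_size=8,
)
print(results["results"])
```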
## How to use

### Use with vLLM

Refer to this link to install vLLM.
```python
from transformers import AutoTokenizer
from vllm import LLM, SamplingParams

# Change tensor_parallel_size to the number of GPUs you can afford
model = LLM("moreh/Llama-3-Motif-102B-Instruct", tensor_parallel_size=4)
tokenizer = AutoTokenizer.from_pretrained("moreh/Llama-3-Motif-102B-Instruct")

messages = [
    {"role": "system", "content": "You are a helpful assistant"},
    # "Explain the concept of the Big Bang theory to a kindergartener"
    {"role": "user", "content": "유치원생에게 빅뱅 이론의 개념을 설명해보세요"},
]
messages_batch = [tokenizer.apply_chat_template(conversation=messages, add_generation_prompt=True, tokenize=False)]

# vLLM does not read the HF generation_config, so set the sampling parameters explicitly
sampling_params = SamplingParams(max_tokens=512, temperature=0, repetition_penalty=1.0, stop_token_ids=[tokenizer.eos_token_id])

responses = model.generate(messages_batch, sampling_params=sampling_params)
print(responses[0].outputs[0].text)
```
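If you prefer serving the model over HTTP instead of calling `LLM` directly, vLLM also ships an OpenAI-compatible server. The sketch below assumes such a server is already running locally (for example via `vllm serve moreh/Llama-3-Motif-102B-Instruct --tensor-parallel-size 4`; the exact launch command depends on your vLLM version) and uses the `openai` Python client; the port and `api_key` value are placeholders.

```python
from openai import OpenAI

# Assumes an OpenAI-compatible vLLM server is already running locally, e.g.:
#   vllm serve moreh/Llama-3-Motif-102B-Instruct --tensor-parallel-size 4
client = OpenAI(base_url="http://localhost:8000/v1", api_key="EMPTY")

response = client.chat.completions.create(
    model="moreh/Llama-3-Motif-102B-Instruct",
    messages=[
        {"role": "system", "content": "You are a helpful assistant"},
        # "Explain the concept of the Big Bang theory to a kindergartener"
        {"role": "user", "content": "유치원생에게 빅뱅 이론의 개념을 설명해보세요"},
    ],
    max_tokens=512,
    temperature=0,
)
print(response.choices[0].message.content)
```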
### Use with transformers
```python
from transformers import AutoModelForCausalLM, AutoTokenizer
import torch

model_id = "moreh/Llama-3-Motif-102B-Instruct"

# All generation configs are set in generation_config.json
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    torch_dtype=torch.bfloat16,  # load in bf16; full precision will not fit on typical GPUs
    device_map="auto",           # shard the 102B model across all available GPUs
)
tokenizer = AutoTokenizer.from_pretrained(model_id)

messages = [
    {"role": "system", "content": "You are a helpful assistant"},
    # "Explain the concept of the Big Bang theory to a kindergartener"
    {"role": "user", "content": "유치원생에게 빅뱅 이론의 개념을 설명해보세요"},
]
prompt = tokenizer.apply_chat_template(conversation=messages, add_generation_prompt=True, tokenize=False)

input_ids = tokenizer(prompt, padding=True, return_tensors="pt")["input_ids"].to(model.device)
outputs = model.generate(input_ids)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```
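For interactive use you may want to stream tokens as they are generated rather than waiting for the full completion. Below is a minimal sketch using the `TextStreamer` utility from transformers, reusing the `model`, `tokenizer`, and `input_ids` defined above.

```python
from transformers import TextStreamer

# Prints generated tokens to stdout as they are produced, skipping the prompt itself
streamer = TextStreamer(tokenizer, skip_prompt=True, skip_special_tokens=True)
_ = model.generate(input_ids, max_new_tokens=512, streamer=streamer)
```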