---
language:
  - ko
  - en
library_name: transformers
base_model:
  - moreh/Llama-3-Motif-102B
---


## Introduction

We introduce Llama-3-Motif, a new language model family from Moreh specialized in Korean and English.
Llama-3-Motif-102B-Instruct is a chat model fine-tuned from the base model Llama-3-Motif-102B.

## Training Platform

- The Llama-3-Motif-102B model family is trained on Moreh's MoAI platform; refer to the link for more information.

## Quick Usage

You can chat directly with Llama-3-Motif through our Model hub.

## Details

More details will be provided in an upcoming technical report.
The effective context length is 32K (average score 81) on the RULER benchmark.
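
If you serve the model with vLLM, you may want to cap the serving context at this effective length. The snippet below is an illustrative sketch only: it assumes the 32K figure corresponds to 32,768 tokens and uses vLLM's standard `max_model_len` argument; it is not an official serving configuration.

```python
from vllm import LLM

# Illustrative sketch: limit the serving context to the effective length reported above.
# 32,768 tokens for "32K" is an assumption; set tensor_parallel_size to your GPU count.
model = LLM(
    "moreh/Llama-3-Motif-102B-Instruct",
    max_model_len=32768,
    tensor_parallel_size=4,
)
```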

## Release Date

2024.12.02

## Benchmark Results

| Provider | Model | KMMLU-direct score |
|----------|-------|--------------------|
| Moreh | Llama-3-Motif-102B | 64.74 + |
| Moreh | Llama-3-Motif-102B-Instruct | 64.81 + |
| Meta | Llama3-70B-instruct | 54.5* |
| Meta | Llama3.1-70B-instruct | 52.1* |
| Meta | Llama3.1-405B-instruct | 65.8* |
| Alibaba | Qwen2-72B-instruct | 64.1* |
| OpenAI | GPT-4-0125-preview | 59.95* |
| OpenAI | GPT-4o-2024-05-13 | 64.11** |
| Google | Gemini Pro | 50.18* |
| LG | EXAONE 3.0 | 44.5* + |
| Naver | HyperCLOVA X | 53.4* + |
| Upstage | SOLAR-10.7B | 41.65* + |

- `*` : community-reported score
- `**` : measured by Moreh
- `+` : claimed to have strong Korean capability
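
For context on how a KMMLU-direct number like those above could be reproduced, here is a hedged sketch using EleutherAI's lm-evaluation-harness. The `kmmlu_direct` task name, dtype, and batch size are assumptions rather than details from this card; check the task list of your installed harness version before running.

```python
# Hedged sketch: evaluate the instruct model on KMMLU (direct) with lm-evaluation-harness.
# The task name "kmmlu_direct" is an assumption; consult the harness documentation
# for the exact task/group names available in your version.
import lm_eval

results = lm_eval.simple_evaluate(
    model="hf",
    model_args="pretrained=moreh/Llama-3-Motif-102B-Instruct,dtype=bfloat16",
    tasks=["kmmlu_direct"],
    batch_size=8,
)
print(results["results"])
```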

## How to use

### Use with vLLM

- Refer to this link to install vLLM.
```python
from transformers import AutoTokenizer
from vllm import LLM, SamplingParams

model_id = "moreh/Llama-3-Motif-102B-Instruct"

# Set tensor_parallel_size to the number of GPUs you have available.
model = LLM(model_id, tensor_parallel_size=4)
tokenizer = AutoTokenizer.from_pretrained(model_id)
messages = [
    {"role": "system", "content": "You are a helpful assistant"},
    {"role": "user", "content": "์œ ์น˜์›์ƒ์—๊ฒŒ ๋น…๋ฑ… ์ด๋ก ์˜ ๊ฐœ๋…์„ ์„ค๋ช…ํ•ด๋ณด์„ธ์š”"},
]

messages_batch = [tokenizer.apply_chat_template(conversation=messages, add_generation_prompt=True, tokenize=False)]

# vLLM does not apply the Hugging Face generation_config, so set the sampling parameters explicitly.
sampling_params = SamplingParams(max_tokens=512, temperature=0, repetition_penalty=1.0, stop_token_ids=[tokenizer.eos_token_id])
responses = model.generate(messages_batch, sampling_params=sampling_params)

print(responses[0].outputs[0].text)
```
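
The example above runs a single conversation. Since `LLM.generate` accepts a list of prompts, several conversations can be batched in one call; the conversations below are hypothetical samples, reusing `tokenizer`, `model`, and `sampling_params` from the snippet above.

```python
# Sketch: batch several independent conversations through one generate() call.
conversations = [
    [{"role": "user", "content": "๋Œ€ํ•œ๋ฏผ๊ตญ์˜ ์ˆ˜๋„๋Š” ์–ด๋””์ธ๊ฐ€์š”?"}],  # "What is the capital of South Korea?"
    [{"role": "user", "content": "Summarize the plot of Hamlet in two sentences."}],
]
messages_batch = [
    tokenizer.apply_chat_template(conversation=c, add_generation_prompt=True, tokenize=False)
    for c in conversations
]
responses = model.generate(messages_batch, sampling_params=sampling_params)
for response in responses:
    print(response.outputs[0].text)
```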

### Use with transformers

```python
from transformers import AutoModelForCausalLM, AutoTokenizer
import torch

model_id = "moreh/Llama-3-Motif-102B-Instruct"

# All generation configs are read from the model's generation_config.json.
# Load the weights in bfloat16; adjust dtype and device placement to your hardware.
model = AutoModelForCausalLM.from_pretrained(model_id, torch_dtype=torch.bfloat16).cuda()
tokenizer = AutoTokenizer.from_pretrained(model_id)
messages = [
    {"role": "system", "content": "You are a helpful assistant"},
    {"role": "user", "content": "์œ ์น˜์›์ƒ์—๊ฒŒ ๋น…๋ฑ… ์ด๋ก ์˜ ๊ฐœ๋…์„ ์„ค๋ช…ํ•ด๋ณด์„ธ์š”"},
]

messages_batch = tokenizer.apply_chat_template(conversation=messages, add_generation_prompt=True, tokenize=False)
input_ids = tokenizer(messages_batch, padding=True, return_tensors='pt')['input_ids'].cuda()

outputs = model.generate(input_ids)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```
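
A 102B-parameter model requires multiple high-memory GPUs even in bfloat16. If you only want to experiment on smaller hardware, the following is a hedged sketch of 4-bit loading with bitsandbytes; the quantization settings are assumptions, not an official recommendation, and quantized output quality has not been validated here.

```python
from transformers import AutoModelForCausalLM, AutoTokenizer, BitsAndBytesConfig
import torch

model_id = "moreh/Llama-3-Motif-102B-Instruct"

# Assumed quantization settings; accuracy of the 4-bit model is not validated by this card.
bnb_config = BitsAndBytesConfig(
    load_in_4bit=True,
    bnb_4bit_compute_dtype=torch.bfloat16,
)

model = AutoModelForCausalLM.from_pretrained(
    model_id,
    quantization_config=bnb_config,
    device_map="auto",  # spread layers across the available GPUs
)
tokenizer = AutoTokenizer.from_pretrained(model_id)
```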