|
--- |
|
language: |
|
- ko |
|
- en |
|
library_name: transformers |
|
base_model: |
|
- moreh/Llama-3-Motif-102B |
|
--- |
|
|
|
![image/png](https://cdn-uploads.huggingface.co/production/uploads/64c0c845a04a514ba62bcd1a/RFpsPxlc_3cK0kmWj-tYR.png) |
|
|
|
# **Introduction** |
|
We introduce Llama-3-Motif, a new language model family from [**Moreh**](https://moreh.io/) specialized in Korean and English.
|
Llama-3-Motif-102B-Instruct is a chat model fine-tuned from the base model [Llama-3-Motif-102B](https://huggingface.co/moreh/Motif-102B).
|
|
|
## Training Platform |
|
- The Llama-3-Motif-102B model family is trained on Moreh's [**MoAI platform**](https://moreh.io/product); see the link for more information.
|
|
|
## Quick Usage |
|
You can chat with Llama-3-Motif directly through our [Model hub](https://model-hub.moreh.io/).
|
|
|
## Details |
|
More details will be provided in the upcoming technical report. |
|
The effective context length is 32k (average score 81) on the [RULER](https://github.com/NVIDIA/RULER) benchmark.
|
|
|
### Release Date |
|
2024.12.02 |
|
|
|
### Benchmark Results |
|
|
|
|Provider|Model|kmmlu_direct score|Notes|
|---|---|---|---|
|Moreh|Llama-3-Motif-102B|64.74|+|
|Moreh|**Llama-3-Motif-102B-Instruct**|**64.81**|+|
|Meta|Llama-3-70B-Instruct|54.5*||
|Meta|Llama-3.1-70B-Instruct|52.1*||
|Meta|Llama-3.1-405B-Instruct|65.8*||
|Alibaba|Qwen2-72B-Instruct|64.1*||
|OpenAI|GPT-4-0125-preview|59.95*||
|OpenAI|GPT-4o-2024-05-13|64.11**||
|Google|Gemini Pro|50.18*||
|LG|EXAONE 3.0|44.5*|+|
|Naver|HyperCLOVA X|53.4*|+|
|Upstage|SOLAR-10.7B|41.65*|+|
|
|
|
\* : Community report |
|
\*\* : Measured by Moreh |
|
\+ : Claimed to have better capability in Korean |
|
|
|
|
|
## How to use |
|
|
|
### Use with vLLM |
|
- Refer to this [link](https://github.com/vllm-project/vllm) to install vLLM (e.g. `pip install vllm`).
|
```python
from transformers import AutoTokenizer
from vllm import LLM, SamplingParams

model_id = "moreh/Llama-3-Motif-102B-Instruct"

# Set tensor_parallel_size to the number of GPUs you have available
model = LLM(model_id, tensor_parallel_size=4)
tokenizer = AutoTokenizer.from_pretrained(model_id)

messages = [
    {"role": "system", "content": "You are a helpful assistant"},
    # "Explain the concept of the Big Bang theory to a kindergartener"
    {"role": "user", "content": "유치원생에게 빅뱅 이론의 개념을 설명해보세요"},
]

messages_batch = [tokenizer.apply_chat_template(conversation=messages, add_generation_prompt=True, tokenize=False)]

# vLLM does not read the Hugging Face generation_config, so set the sampling parameters explicitly
sampling_params = SamplingParams(max_tokens=512, temperature=0, repetition_penalty=1.0, stop_token_ids=[tokenizer.eos_token_id])
responses = model.generate(messages_batch, sampling_params=sampling_params)

print(responses[0].outputs[0].text)
```
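
For serving rather than offline batch inference, vLLM also ships an OpenAI-compatible server. The sketch below is a minimal example, not an official recipe: it assumes a recent vLLM version where the server is started separately with `vllm serve moreh/Llama-3-Motif-102B-Instruct --tensor-parallel-size 4` (default port 8000) and that the `openai` Python client is installed.

```python
from openai import OpenAI

# vLLM's OpenAI-compatible server does not require a real API key by default
client = OpenAI(base_url="http://localhost:8000/v1", api_key="EMPTY")

response = client.chat.completions.create(
    model="moreh/Llama-3-Motif-102B-Instruct",
    messages=[
        {"role": "system", "content": "You are a helpful assistant"},
        {"role": "user", "content": "Explain the concept of the Big Bang theory to a kindergartener"},
    ],
    max_tokens=512,
    temperature=0,
)
print(response.choices[0].message.content)
```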
|
|
|
### Use with transformers |
|
```python
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "moreh/Llama-3-Motif-102B-Instruct"

# All generation configs are set in generation_config.json
# Load in bfloat16 and shard across available GPUs (the 102B model does not fit on a single device)
model = AutoModelForCausalLM.from_pretrained(model_id, torch_dtype=torch.bfloat16, device_map="auto")
tokenizer = AutoTokenizer.from_pretrained(model_id)

messages = [
    {"role": "system", "content": "You are a helpful assistant"},
    # "Explain the concept of the Big Bang theory to a kindergartener"
    {"role": "user", "content": "유치원생에게 빅뱅 이론의 개념을 설명해보세요"},
]

prompt = tokenizer.apply_chat_template(conversation=messages, add_generation_prompt=True, tokenize=False)
input_ids = tokenizer(prompt, return_tensors="pt")["input_ids"].to(model.device)

outputs = model.generate(input_ids, max_new_tokens=512)
print(tokenizer.decode(outputs[0][input_ids.shape[-1]:], skip_special_tokens=True))
```
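
If you want tokens printed as they are generated rather than after the full completion, a minimal sketch using the `transformers` `TextStreamer` (reusing `model`, `tokenizer`, and `input_ids` from the example above) looks like this:

```python
from transformers import TextStreamer

# Streams decoded tokens to stdout as they are generated;
# skip_prompt avoids echoing the input prompt back
streamer = TextStreamer(tokenizer, skip_prompt=True, skip_special_tokens=True)
_ = model.generate(input_ids, max_new_tokens=512, streamer=streamer)
```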