|
--- |
|
language: |
|
- ko |
|
- en |
|
library_name: transformers |
|
base_model: |
|
- moreh/Llama-3-Motif-102B |
|
--- |
|
|
|
![image/png](https://cdn-uploads.huggingface.co/production/uploads/64c0c845a04a514ba62bcd1a/RFpsPxlc_3cK0kmWj-tYR.png) |
|
|
|
# **Introduction** |
|
We introduce Llama-3-Motif, a new language model family from [**Moreh**](https://moreh.io/) specialized in Korean and English.
|
Llama-3-Motif-102B-Instruct is a chat model fine-tuned from the base model [Llama-3-Motif-102B](https://huggingface.co/moreh/Motif-102B).
|
|
|
## Training Platform |
|
- The Llama-3-Motif-102B model family is trained on Moreh's [**MoAI platform**](https://moreh.io/product); see the link for more information.
|
|
|
## Quick Usage |
|
You can chat with Llama-3-Motif directly through our [Model hub](https://model-hub.moreh.io/).
|
|
|
## Details |
|
More details will be provided in the upcoming technical report. |
|
The effective context length is 32k (average score 81) on the [RULER](https://github.com/NVIDIA/RULER) benchmark.
|
|
|
### Release Date |
|
2024.12.02 |
|
|
|
### Benchmark Results |
|
|
|
|Provider|Model|kmmlu_direct score|Notes|
|---|---|---|---|
|Moreh|Llama-3-Motif-102B|64.74|+|
|Moreh|**Llama-3-Motif-102B-Instruct**|**64.81**|+|
|Meta|Llama-3-70B-Instruct|54.5*||
|Meta|Llama-3.1-70B-Instruct|52.1*||
|Meta|Llama-3.1-405B-Instruct|65.8*||
|Alibaba|Qwen2-72B-Instruct|64.1*||
|OpenAI|GPT-4-0125-preview|59.95*||
|OpenAI|GPT-4o-2024-05-13|64.11**||
|Google|Gemini Pro|50.18*||
|LG|EXAONE 3.0|44.5*|+|
|Naver|HyperCLOVA X|53.4*|+|
|Upstage|SOLAR-10.7B|41.65*|+|
|
|
|
\* : Community report |
|
\*\* : Measured by Moreh |
|
\+ : Claimed to have better capability in Korean |
|
|
|
|
|
## How to use |
|
|
|
### Use with vLLM |
|
- Refer to this [link](https://github.com/vllm-project/vllm) to install vLLM (e.g. `pip install vllm`).
|
```python
from transformers import AutoTokenizer
from vllm import LLM, SamplingParams

model_id = "moreh/Llama-3-Motif-102B-Instruct"

# Set tensor_parallel_size to the number of GPUs you have available
model = LLM(model_id, tensor_parallel_size=4)
tokenizer = AutoTokenizer.from_pretrained(model_id)

messages = [
    {"role": "system", "content": "You are a helpful assistant"},
    # "Explain the concept of the Big Bang theory to a kindergartener"
    {"role": "user", "content": "유치원생에게 빅뱅 이론의 개념을 설명해보세요"},
]

messages_batch = [tokenizer.apply_chat_template(conversation=messages, add_generation_prompt=True, tokenize=False)]

# vLLM does not read the Hugging Face generation_config, so set the sampling parameters explicitly
sampling_params = SamplingParams(max_tokens=512, temperature=0, repetition_penalty=1.0, stop_token_ids=[tokenizer.eos_token_id])
responses = model.generate(messages_batch, sampling_params=sampling_params)

print(responses[0].outputs[0].text)
```
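
For serving rather than offline batch inference, vLLM also ships an OpenAI-compatible server. The sketch below is a minimal example, not an official recipe: it assumes a recent vLLM version where the server is started separately with `vllm serve moreh/Llama-3-Motif-102B-Instruct --tensor-parallel-size 4` (default port 8000) and that the `openai` Python client is installed.

```python
from openai import OpenAI

# vLLM's OpenAI-compatible server does not require a real API key by default
client = OpenAI(base_url="http://localhost:8000/v1", api_key="EMPTY")

response = client.chat.completions.create(
    model="moreh/Llama-3-Motif-102B-Instruct",
    messages=[
        {"role": "system", "content": "You are a helpful assistant"},
        {"role": "user", "content": "Explain the concept of the Big Bang theory to a kindergartener"},
    ],
    max_tokens=512,
    temperature=0,
)
print(response.choices[0].message.content)
```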
|
|
|
### Use with transformers |
|
```python
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "moreh/Llama-3-Motif-102B-Instruct"

# All generation configs are set in generation_config.json
# Load in bfloat16 and shard across available GPUs (the 102B model does not fit on a single device)
model = AutoModelForCausalLM.from_pretrained(model_id, torch_dtype=torch.bfloat16, device_map="auto")
tokenizer = AutoTokenizer.from_pretrained(model_id)

messages = [
    {"role": "system", "content": "You are a helpful assistant"},
    # "Explain the concept of the Big Bang theory to a kindergartener"
    {"role": "user", "content": "유치원생에게 빅뱅 이론의 개념을 설명해보세요"},
]

prompt = tokenizer.apply_chat_template(conversation=messages, add_generation_prompt=True, tokenize=False)
input_ids = tokenizer(prompt, return_tensors="pt")["input_ids"].to(model.device)

outputs = model.generate(input_ids, max_new_tokens=512)
print(tokenizer.decode(outputs[0][input_ids.shape[-1]:], skip_special_tokens=True))
```
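
If you want tokens printed as they are generated rather than after the full completion, a minimal sketch using the `transformers` `TextStreamer` (reusing `model`, `tokenizer`, and `input_ids` from the example above) looks like this:

```python
from transformers import TextStreamer

# Streams decoded tokens to stdout as they are generated;
# skip_prompt avoids echoing the input prompt back
streamer = TextStreamer(tokenizer, skip_prompt=True, skip_special_tokens=True)
_ = model.generate(input_ids, max_new_tokens=512, streamer=streamer)
```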