komt: Korean multi-task instruction tuning model

Figure: multi-task instruction tuning

Following the success of ChatGPT, many large language models (LLMs) have been released in an effort to match its capabilities. In Korean, however, many of these models still struggle to answer accurately or to generate fluent Korean text. This work addresses the problem with a multi-task instruction tuning technique, in which supervised datasets from a variety of tasks are converted into instruction-style training data for LLMs.
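The conversion step can be sketched as follows. This is a minimal illustration, assuming each supervised dataset yields (text, label) pairs. The task names, the instruction wording in `TASK_INSTRUCTIONS`, and the helper names are hypothetical and are not komt's actual templates (the real training data is Korean).

```python
from dataclasses import dataclass
from typing import Dict, List


@dataclass
class Example:
    """One supervised example: input text plus its gold answer or label."""
    text: str
    label: str


# Hypothetical instruction templates, one per source task (not komt's real ones)
TASK_INSTRUCTIONS = {
    "sentiment": "Classify the sentiment of the following sentence as positive or negative.\n{text}",
    "summarization": "Summarize the following passage.\n{text}",
}


def to_instruction(task: str, ex: Example) -> Dict[str, str]:
    """Wrap a supervised example in its task's instruction; the gold label becomes the target response."""
    prompt = TASK_INSTRUCTIONS[task].format(text=ex.text)
    return {"task": task, "prompt": prompt, "response": ex.label}


def build_multitask_dataset(datasets: Dict[str, List[Example]]) -> List[Dict[str, str]]:
    """Convert every task's examples and pool them into a single instruction-tuning set."""
    return [to_instruction(task, ex) for task, examples in datasets.items() for ex in examples]


data = build_multitask_dataset({
    "sentiment": [Example("The food was great.", "positive")],
    "summarization": [Example("A long news article ...", "A one-line summary.")],
})
print(data[0]["prompt"])
```

In practice the pooled records from different tasks would typically be shuffled together before training, so that each batch mixes tasks.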

Model Details

  • Model developer: davidkim (Changyeon Kim)
  • Repository: https://github.com/davidkim205/komt
  • Model architecture: komt-mistral-7b-v1 is a fine-tuned version of Mistral-7B-Instruct-v0.1.

Dataset

Korean multi-task instruction dataset

Hardware and Software

  • NVIDIA driver: 535.54.03
  • CUDA version: 12.2

Training

Refer to https://github.com/davidkim205/komt.

Prompt template: Mistral

<s>[INST] {prompt} [/INST]</s>
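The two uses of this template can be sketched as simple string builders. `inference_prompt` mirrors the prompt string in the Usage code below, where the tokenizer prepends `<s>` itself. `training_text` assumes, as in Mistral-Instruct's format, that the target response is placed between `[/INST]` and `</s>`; both helper names are hypothetical.

```python
BOS, EOS = "<s>", "</s>"


def inference_prompt(prompt: str) -> str:
    """Prompt string passed to the tokenizer at generation time (the tokenizer adds BOS)."""
    return f"[INST]{prompt} [/INST]"


def training_text(prompt: str, response: str) -> str:
    """Full training sequence; assumption: the response sits between [/INST] and </s>."""
    return f"{BOS}[INST] {prompt} [/INST] {response}{EOS}"


print(training_text("What is the capital of South Korea?", "Seoul."))
```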

Usage

from transformers import AutoTokenizer, AutoModelForCausalLM
from transformers import TextStreamer, GenerationConfig

model_name = 'davidkim205/komt-mistral-7b-v1'
model = AutoModelForCausalLM.from_pretrained(model_name, device_map="auto")
tokenizer = AutoTokenizer.from_pretrained(model_name)
# Prints tokens (prompt included) to stdout as they are generated
streamer = TextStreamer(tokenizer)

def gen(x):
    # Sampling settings; early_stopping is omitted because it only affects beam search
    generation_config = GenerationConfig(
        temperature=0.8,
        top_p=0.8,
        top_k=100,
        max_new_tokens=1024,
        do_sample=True,
    )
    # Mistral instruction format; the tokenizer prepends <s> (BOS) itself
    q = f"[INST]{x} [/INST]"
    inputs = tokenizer(
        q,
        return_tensors='pt',
        return_token_type_ids=False
    ).to(model.device)  # follow the device chosen by device_map="auto"
    gened = model.generate(
        **inputs,
        generation_config=generation_config,
        pad_token_id=tokenizer.eos_token_id,
        eos_token_id=tokenizer.eos_token_id,
        streamer=streamer,
    )
    # Decode only the newly generated tokens, i.e. the model's response
    prompt_len = inputs['input_ids'].shape[1]
    result_str = tokenizer.decode(gened[0][prompt_len:], skip_special_tokens=True)
    return result_str.strip()

# "I'm planning a solo 1-night, 2-day trip to Jeju Island. Please put together an itinerary for me."
print(gen('제주도를 1박2일로 혼자 여행하려고 하는데 여행 코스를 만들어줘'))

Output (translated from Korean)

<s> [INST]I'm planning a solo 1-night, 2-day trip to Jeju Island. Please put together an itinerary for me. [/INST]1-night, 2-day itinerary
The Jeju itinerary is as follows:

Day 1:
* Morning: Head to Silla-ro in Jeju City and arrive at Hallasan National Park. At the right-hand footprint plaza, find an eye-catching beer shop, drink a bottle of beer, then climb the stairs to the summit of Hallasan.
* Lunch: Come down the stairs from the summit and enjoy a meal in an alley located halfway down. Unwind while tasting delicious Jeju specialties such as jeotgal (salted seafood), gomtang (beef bone soup), and abalone.
* Afternoon: From the halfway point, head south to southwestern Jeju, an area of beautiful scenery with the island's natural landscapes, white sand beaches, and a beautiful sea. Stay overnight at your accommodation.

Day 2:
* Morning: Travel from southwestern Jeju to northwestern Jeju, a beautiful area where green grassy beaches and clear water stretch out. Have breakfast at your accommodation, then head to the sea and relax on the beach.
* Lunch: Enjoy a seafood meal while looking out over the cool sea off northwestern Jeju, and try seasonal seafood dishes.
* Afternoon: Travel from southwestern Jeju to western Jeju. Western Jeju has unique landscapes such as cliffs, rocky beaches, and a mysterious stone garden. Here, visit Cheongnyangsa, a representative attraction of western Jeju, and take in the area's beautiful scenery.
* Evening: Travel from western Jeju to Jeju City, taste a meal of Jeju specialties, and enjoy dinner downtown.
* Night: Enjoy nighttime activities in downtown Jeju City to wrap up your 1-night, 2-day trip.

Traveling solo around Jeju for 1 night and 2 days this way lets you experience Jeju's beautiful scenery, green grassy beaches, clear sea, and more.

Evaluation

For objective model evaluation, we initially used EleutherAI's lm-evaluation-harness but found its results unsatisfactory. We therefore used ChatGPT as the evaluator, following the approaches described in Self-Alignment with Instruction Backtranslation and Three Ways of Using Large Language Models to Evaluate Chat.
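The ChatGPT-as-evaluator setup can be illustrated with a pair of helpers. This is a hypothetical sketch: the prompt wording, the "Score: N" reply format, the function names, and the assumption that a model's total is the sum of its per-question scores are illustrations, not the procedure actually used for the table below. The call to ChatGPT itself is left out.

```python
import re
from typing import Optional

# Hypothetical judge prompt; not the prompt used to produce the table below
JUDGE_TEMPLATE = (
    "Rate the following answer to the question on an integer scale from 0 to 5.\n"
    "Question: {question}\n"
    "Answer: {answer}\n"
    "Reply in the form 'Score: N'."
)


def build_judge_prompt(question: str, answer: str) -> str:
    """Fill the judge template with one question and one model answer."""
    return JUDGE_TEMPLATE.format(question=question, answer=answer)


def parse_score(judge_reply: str) -> Optional[int]:
    """Extract a 0-5 score from the judge's reply; None if no valid score is found."""
    match = re.search(r"Score:\s*([0-5])\b", judge_reply)
    return int(match.group(1)) if match else None


# Assumption: a model's total is the sum of its per-question scores;
# replies without a parseable score are skipped here
replies = ["Score: 4", "Score: 5", "Score: 3"]
total = sum(s for s in map(parse_score, replies) if s is not None)
print(total)  # 12
```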

| Model | Total score | Average (0-5) | Percentage |
|---|---|---|---|
| gpt-3.5-turbo (closed) | 147 | 3.97 | 79.45% |
| naver Cue (closed) | 140 | 3.78 | 75.67% |
| clova X (closed) | 136 | 3.67 | 73.51% |
| WizardLM-13B-V1.2 (open) | 96 | 2.59 | 51.89% |
| Llama-2-7b-chat-hf (open) | 67 | 1.81 | 36.21% |
| Llama-2-13b-chat-hf (open) | 73 | 1.91 | 38.37% |
| nlpai-lab/kullm-polyglot-12.8b-v2 (open) | 70 | 1.89 | 37.83% |
| kfkas/Llama-2-ko-7b-Chat (open) | 96 | 2.59 | 51.89% |
| beomi/KoAlpaca-Polyglot-12.8B (open) | 100 | 2.70 | 54.05% |
| komt-llama2-7b-v1 (open, ours) | 117 | 3.16 | 63.24% |
| komt-llama2-13b-v1 (open, ours) | 129 | 3.48 | 69.72% |
| komt-llama-30b-v1 (open, ours) | 129 | 3.16 | 63.24% |
| komt-mistral-7b-v1 (open, ours) | 131 | 3.54 | 70.81% |
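The three numeric columns appear to be related by simple arithmetic. The sketch below assumes 37 evaluation questions, each scored 0 to 5 (inferred from 147 / 3.97 ≈ 37, not stated in this card), and truncates rather than rounds to two decimals, which is what the published figures appear to do. The constant and function names are hypothetical.

```python
import math
from typing import Tuple

NUM_QUESTIONS = 37  # assumption: inferred from 147 / 3.97, not stated in the source
MAX_SCORE_PER_QUESTION = 5


def truncate2(x: float) -> float:
    """Truncate (not round) to two decimal places."""
    return math.floor(x * 100) / 100


def summarize(total: int) -> Tuple[float, float]:
    """Return (average score on the 0-5 scale, percentage of the maximum possible total)."""
    average = total / NUM_QUESTIONS
    percentage = 100 * total / (NUM_QUESTIONS * MAX_SCORE_PER_QUESTION)
    return truncate2(average), truncate2(percentage)


print(summarize(147))  # (3.97, 79.45)
```

Under these assumptions the function reproduces, for example, the gpt-3.5-turbo row (147 → 3.97, 79.45%) and the komt-mistral-7b-v1 row (131 → 3.54, 70.81%). It does not reproduce the Llama-2-13b-chat-hf row (73 → 1.97, 39.45% rather than 1.91, 38.37%) or the komt-llama-30b-v1 row (129 → 3.48, 69.72% rather than 3.16, 63.24%), so one value in each of those two rows may be a transcription error.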