language:
- ko
- en
license: cc-by-sa-4.0
model-index:
- name: kiqu-70b
results:
- task:
type: text-generation
name: Text Generation
dataset:
name: AI2 Reasoning Challenge (25-Shot)
type: ai2_arc
config: ARC-Challenge
split: test
args:
num_few_shot: 25
metrics:
- type: acc_norm
value: 72.1
name: normalized accuracy
source:
url: >-
https://huggingface.co/spaces/HuggingFaceH4/open_llm_leaderboard?query=maywell/kiqu-70b
name: Open LLM Leaderboard
- task:
type: text-generation
name: Text Generation
dataset:
name: HellaSwag (10-Shot)
type: hellaswag
split: validation
args:
num_few_shot: 10
metrics:
- type: acc_norm
value: 87.94
name: normalized accuracy
source:
url: >-
https://huggingface.co/spaces/HuggingFaceH4/open_llm_leaderboard?query=maywell/kiqu-70b
name: Open LLM Leaderboard
- task:
type: text-generation
name: Text Generation
dataset:
name: MMLU (5-Shot)
type: cais/mmlu
config: all
split: test
args:
num_few_shot: 5
metrics:
- type: acc
value: 74.93
name: accuracy
source:
url: >-
https://huggingface.co/spaces/HuggingFaceH4/open_llm_leaderboard?query=maywell/kiqu-70b
name: Open LLM Leaderboard
- task:
type: text-generation
name: Text Generation
dataset:
name: TruthfulQA (0-shot)
type: truthful_qa
config: multiple_choice
split: validation
args:
num_few_shot: 0
metrics:
- type: mc2
value: 63.48
source:
url: >-
https://huggingface.co/spaces/HuggingFaceH4/open_llm_leaderboard?query=maywell/kiqu-70b
name: Open LLM Leaderboard
- task:
type: text-generation
name: Text Generation
dataset:
name: Winogrande (5-shot)
type: winogrande
config: winogrande_xl
split: validation
args:
num_few_shot: 5
metrics:
- type: acc
value: 84.85
name: accuracy
source:
url: >-
https://huggingface.co/spaces/HuggingFaceH4/open_llm_leaderboard?query=maywell/kiqu-70b
name: Open LLM Leaderboard
- task:
type: text-generation
name: Text Generation
dataset:
name: GSM8k (5-shot)
type: gsm8k
config: main
split: test
args:
num_few_shot: 5
metrics:
- type: acc
value: 68.46
name: accuracy
source:
url: >-
https://huggingface.co/spaces/HuggingFaceH4/open_llm_leaderboard?query=maywell/kiqu-70b
name: Open LLM Leaderboard
kiqu-70b (Arena Leaderboard)
kiqu-70b is a SFT+DPO trained model based on Miqu-70B-Alpaca-DPO using Korean datasets.
Since this model is finetune of miqu-1-70b using it on commercial purposes is at your own risk. โ leaked early version Mistral-Medium
๋ณธ ๋ชจ๋ธ kiqu-70b๋ Miqu-70B-Alpaca-DPO ๋ชจ๋ธ์ ๊ธฐ๋ฐ์ผ๋ก ํ๊ตญ์ด ๋ฐ์ดํฐ์ ์ ์ฌ์ฉํ์ฌ SFT+DPO ํ๋ จ์ ์งํํ์ฌ ์ ์๋์์ต๋๋ค.
๋ฒ ์ด์ค ๋ชจ๋ธ์ธ miqu-1-70b ๋ชจ๋ธ์ด ๋ฏธ์คํธ๋-๋ฏธ๋์์ ์ด๊ธฐ ์ ์ถ ๋ฒ์ ์ด๊ธฐ์ ์์ ์ ์ฌ์ฉ์ ๋ํ risk๋ ๋ณธ์ธ์๊ฒ ์์ต๋๋ค.
Beside that this model follows cc-by-sa-4.0
๋ณธ ๋ชจ๋ธ ์์ฒด๋ก์๋ cc-by-sa-4.0์ ๋ฐ๋ฆ ๋๋ค.
Model Details
Base Model
miqu-1-70b (Early Mistral-Medium)
Instruction format
It follows Mistral format. Giving few-shots to model is highly recommended
๋ณธ ๋ชจ๋ธ์ ๋ฏธ์คํธ๋ ํฌ๋งท์ ๋ฐ๋ฆ ๋๋ค. few-shot ์ฌ์ฉ์ ์ ๊ทน ๊ถ์ฅํฉ๋๋ค.
[INST] {instruction}
[/INST] {output}
Multi-shot
[INST] {instruction}
[/INST] {output}
[INST] {instruction}
[/INST] {output}
[INST] {instruction}
[/INST] {output}
.
.
.
Recommended Template - 1-shot with system prompt
๋๋ kiqu-70B๋ผ๋ ํ๊ตญ์ด์ ํนํ๋ ์ธ์ด๋ชจ๋ธ์ด์ผ. ๊น๋ํ๊ณ ์์ฐ์ค๋ฝ๊ฒ ๋๋ตํด์ค!
[INST] ์๋
?
[/INST] ์๋
ํ์ธ์! ๋ฌด์์ ๋์๋๋ฆด๊น์? ์ง๋ฌธ์ด๋ ๊ถ๊ธํ ์ ์ด ์๋ค๋ฉด ์ธ์ ๋ ์ง ๋ง์ํด์ฃผ์ธ์.
[INST] {instruction}
[/INST]
Trailing space after [/INST] can affect models performance in significant margin. So, when doing inference it is recommended to not include trailing space in chat template.
[/INST] ๋ค์ ๋์ด์ฐ๊ธฐ๋ ๋ชจ๋ธ ์ฑ๋ฅ์ ์ ์๋ฏธํ ์ํฅ์ ๋ฏธ์นฉ๋๋ค. ๋ฐ๋ผ์, ์ธํผ๋ฐ์ค(์ถ๋ก )๊ณผ์ ์์๋ ์ฑ ํ ํ๋ฆฟ์ ๋์ด์ฐ๊ธฐ๋ฅผ ์ ์ธํ๋ ๊ฒ์ ์ ๊ทน ๊ถ์ฅํฉ๋๋ค.
Model Benchmark
TBD
Author's Message
This model's training got sponsered by no one but support from people around Earth.
Contact Me on Discord - is.maywell
Follow me on twitter - https://twitter.com/stablefluffy
Open LLM Leaderboard Evaluation Results
Detailed results can be found here
Metric | Value |
---|---|
Avg. | 75.29 |
AI2 Reasoning Challenge (25-Shot) | 72.10 |
HellaSwag (10-Shot) | 87.94 |
MMLU (5-Shot) | 74.93 |
TruthfulQA (0-shot) | 63.48 |
Winogrande (5-shot) | 84.85 |
GSM8k (5-shot) | 68.46 |