	---
language:
- ko
- en
license: cc-by-sa-4.0
model-index:
- name: kiqu-70b
  results:
  - task:
      type: text-generation
      name: Text Generation
    dataset:
      name: AI2 Reasoning Challenge (25-Shot)
      type: ai2_arc
      config: ARC-Challenge
      split: test
      args:
        num_few_shot: 25
    metrics:
    - type: acc_norm
      value: 72.1
      name: normalized accuracy
    source:
      url: https://huggingface.co/spaces/HuggingFaceH4/open_llm_leaderboard?query=maywell/kiqu-70b
      name: Open LLM Leaderboard
  - task:
      type: text-generation
      name: Text Generation
    dataset:
      name: HellaSwag (10-Shot)
      type: hellaswag
      split: validation
      args:
        num_few_shot: 10
    metrics:
    - type: acc_norm
      value: 87.94
      name: normalized accuracy
    source:
      url: https://huggingface.co/spaces/HuggingFaceH4/open_llm_leaderboard?query=maywell/kiqu-70b
      name: Open LLM Leaderboard
  - task:
      type: text-generation
      name: Text Generation
    dataset:
      name: MMLU (5-Shot)
      type: cais/mmlu
      config: all
      split: test
      args:
        num_few_shot: 5
    metrics:
    - type: acc
      value: 74.93
      name: accuracy
    source:
      url: https://huggingface.co/spaces/HuggingFaceH4/open_llm_leaderboard?query=maywell/kiqu-70b
      name: Open LLM Leaderboard
  - task:
      type: text-generation
      name: Text Generation
    dataset:
      name: TruthfulQA (0-shot)
      type: truthful_qa
      config: multiple_choice
      split: validation
      args:
        num_few_shot: 0
    metrics:
    - type: mc2
      value: 63.48
    source:
      url: https://huggingface.co/spaces/HuggingFaceH4/open_llm_leaderboard?query=maywell/kiqu-70b
      name: Open LLM Leaderboard
  - task:
      type: text-generation
      name: Text Generation
    dataset:
      name: Winogrande (5-shot)
      type: winogrande
      config: winogrande_xl
      split: validation
      args:
        num_few_shot: 5
    metrics:
    - type: acc
      value: 84.85
      name: accuracy
    source:
      url: https://huggingface.co/spaces/HuggingFaceH4/open_llm_leaderboard?query=maywell/kiqu-70b
      name: Open LLM Leaderboard
  - task:
      type: text-generation
      name: Text Generation
    dataset:
      name: GSM8k (5-shot)
      type: gsm8k
      config: main
      split: test
      args:
        num_few_shot: 5
    metrics:
    - type: acc
      value: 68.46
      name: accuracy
    source:
      url: https://huggingface.co/spaces/HuggingFaceH4/open_llm_leaderboard?query=maywell/kiqu-70b
      name: Open LLM Leaderboard
	---

	# kiqu-70b [(Arena Leaderboard)](https://huggingface.co/spaces/instructkr/ko-chatbot-arena-leaderboard)
	<img src="./kiqu.webp" alt="kiqu-70B" width="390"/>

kiqu-70b is an SFT+DPO model trained on Korean datasets, based on Miqu-70B-Alpaca-DPO.

Because the base model, miqu-1-70b, is a leaked early version of Mistral-Medium, commercial use of this model is at your own risk.

Apart from that, the model itself is released under cc-by-sa-4.0.

	# Model Details

	Base Model
	miqu-1-70b (Early Mistral-Medium)

	Instruction format

The model follows the Mistral instruction format.
Providing few-shot examples to the model is highly recommended.
	```
	[INST] {instruction}
	[/INST] {output}
	```

	Multi-shot
	```
	[INST] {instruction}
	[/INST] {output}

	[INST] {instruction}
	[/INST] {output}

	[INST] {instruction}
	[/INST] {output}
	.
	.
	.
	```

	Recommended Template - 1-shot with system prompt
	```
	너는 kiqu-70B라는 한국어에 특화된 언어모델이야. 깔끔하고 자연스럽게 대답해줘!
	[INST] 안녕?
	[/INST] 안녕하세요! 무엇을 도와드릴까요? 질문이나 궁금한 점이 있다면 언제든지 말씀해주세요.

	[INST] {instruction}
	[/INST]
	```
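The templates above can also be assembled in code. The following is a minimal sketch using plain string formatting; `build_prompt` and its parameters are hypothetical names introduced for illustration and are not part of any library. It assumes the layout exactly as printed in this README (system prompt on the first line, a blank line between turns).

```python
from typing import List, Tuple


def build_prompt(shots: List[Tuple[str, str]], instruction: str, system: str = "") -> str:
    """Render few-shot pairs plus a final instruction in the README's layout.

    Hypothetical helper: the name and signature are illustrative only.
    """
    parts = []
    for shot_instruction, shot_output in shots:
        # A completed turn: "[INST] ...", newline, then "[/INST] ...".
        parts.append(f"[INST] {shot_instruction}\n[/INST] {shot_output}")
    # The open turn ends right at [/INST], with nothing after it.
    parts.append(f"[INST] {instruction}\n[/INST]")
    body = "\n\n".join(parts)
    # The optional system prompt sits on its own line before the first turn.
    return f"{system}\n{body}" if system else body


if __name__ == "__main__":
    print(build_prompt(
        shots=[("안녕?", "안녕하세요! 무엇을 도와드릴까요?")],
        instruction="{instruction}",
        system="너는 kiqu-70B라는 한국어에 특화된 언어모델이야. 깔끔하고 자연스럽게 대답해줘!",
    ))
```

Passing an empty `shots` list yields the single-turn format, and adding more pairs yields the multi-shot format.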

A trailing space after `[/INST]` can significantly affect the model's performance. For inference, the chat template should therefore not include a trailing space.
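One way to guard against an accidental trailing space is to normalize the prompt just before it is sent to the model. This is only a sketch: `finalize_prompt` is a hypothetical helper, and it assumes the prompt is meant to end with an open `[/INST]` turn, as in the templates in this README.

```python
def finalize_prompt(prompt: str) -> str:
    """Drop trailing whitespace so generation starts right after [/INST].

    Hypothetical helper for illustration; not part of any library.
    """
    cleaned = prompt.rstrip()  # removes trailing spaces and newlines
    if not cleaned.endswith("[/INST]"):
        # The prompt does not end with an open turn; fail loudly instead of guessing.
        raise ValueError("prompt should end with an open [/INST] turn")
    return cleaned


if __name__ == "__main__":
    print(repr(finalize_prompt("[INST] 안녕?\n[/INST] ")))
```

Some chat templates append a space after the closing tag, so running a check like this on the rendered string may be worthwhile before inference.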

	# Model Benchmark
	TBD


	# Author's Message

This model's training had no sponsor; it was made possible by support from people around the world.

	[Support Me](https://www.buymeacoffee.com/mwell)

	[Discord Server](https://discord.gg/MrBt3PXdXc)

	Contact Me on Discord - is.maywell

Follow me on Twitter - https://twitter.com/stablefluffy

# [Open LLM Leaderboard Evaluation Results](https://huggingface.co/spaces/HuggingFaceH4/open_llm_leaderboard)
	Detailed results can be found [here](https://huggingface.co/datasets/open-llm-leaderboard/details_maywell__kiqu-70b)

| Metric                            | Value |
|-----------------------------------|------:|
| Avg.                              | 75.29 |
| AI2 Reasoning Challenge (25-Shot) | 72.10 |
| HellaSwag (10-Shot)               | 87.94 |
| MMLU (5-Shot)                     | 74.93 |
| TruthfulQA (0-shot)               | 63.48 |
| Winogrande (5-shot)               | 84.85 |
| GSM8k (5-shot)                    | 68.46 |
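
The `Avg.` row appears to be the unweighted mean of the six benchmark scores. A quick check, assuming that is how the leaderboard computed it (the dictionary below just copies the values from the table):

```python
# Scores copied from the table above.
scores = {
    "AI2 Reasoning Challenge (25-Shot)": 72.10,
    "HellaSwag (10-Shot)": 87.94,
    "MMLU (5-Shot)": 74.93,
    "TruthfulQA (0-shot)": 63.48,
    "Winogrande (5-shot)": 84.85,
    "GSM8k (5-shot)": 68.46,
}

# Unweighted mean across the six benchmarks.
average = sum(scores.values()) / len(scores)
print(f"{average:.2f}")  # prints 75.29
```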