README.md · sequelbox/Llama3.1-8B-MOTH at refs/pr/2

Llama3.1-8B-MOTH / README.md

leaderboard-pr-bot

Adding Evaluation Results

b606698 verified 5 months ago

preview code

raw

history blame

4.11 kB

	---
	language:
	- en
	license: other
	tags:
	- supernova
	- moth
	- llama
	- llama-3.1
	- llama-3.1-instruct
	- llama-3.1-instruct-8b
	- llama-3
	- llama-3-instruct
	- llama-3-instruct-8b
	- 8b
	- general
	- conversational
	- chat
	- instruct
	base_model: meta-llama/Meta-Llama-3.1-8B-Instruct
	datasets:
	- sequelbox/Supernova
	pipeline_tag: text-generation
	model_type: llama
	model-index:
	- name: Llama3.1-8B-MOTH
	results:
	- task:
	type: text-generation
	name: Text Generation
	dataset:
	name: IFEval (0-Shot)
	type: HuggingFaceH4/ifeval
	args:
	num_few_shot: 0
	metrics:
	- type: inst_level_strict_acc and prompt_level_strict_acc
	value: 52.08
	name: strict accuracy
	source:
	url: https://huggingface.co/spaces/open-llm-leaderboard/open_llm_leaderboard?query=sequelbox/Llama3.1-8B-MOTH
	name: Open LLM Leaderboard
	- task:
	type: text-generation
	name: Text Generation
	dataset:
	name: BBH (3-Shot)
	type: BBH
	args:
	num_few_shot: 3
	metrics:
	- type: acc_norm
	value: 26.45
	name: normalized accuracy
	source:
	url: https://huggingface.co/spaces/open-llm-leaderboard/open_llm_leaderboard?query=sequelbox/Llama3.1-8B-MOTH
	name: Open LLM Leaderboard
	- task:
	type: text-generation
	name: Text Generation
	dataset:
	name: MATH Lvl 5 (4-Shot)
	type: hendrycks/competition_math
	args:
	num_few_shot: 4
	metrics:
	- type: exact_match
	value: 11.86
	name: exact match
	source:
	url: https://huggingface.co/spaces/open-llm-leaderboard/open_llm_leaderboard?query=sequelbox/Llama3.1-8B-MOTH
	name: Open LLM Leaderboard
	- task:
	type: text-generation
	name: Text Generation
	dataset:
	name: GPQA (0-shot)
	type: Idavidrein/gpqa
	args:
	num_few_shot: 0
	metrics:
	- type: acc_norm
	value: 2.57
	name: acc_norm
	source:
	url: https://huggingface.co/spaces/open-llm-leaderboard/open_llm_leaderboard?query=sequelbox/Llama3.1-8B-MOTH
	name: Open LLM Leaderboard
	- task:
	type: text-generation
	name: Text Generation
	dataset:
	name: MuSR (0-shot)
	type: TAUR-Lab/MuSR
	args:
	num_few_shot: 0
	metrics:
	- type: acc_norm
	value: 3.79
	name: acc_norm
	source:
	url: https://huggingface.co/spaces/open-llm-leaderboard/open_llm_leaderboard?query=sequelbox/Llama3.1-8B-MOTH
	name: Open LLM Leaderboard
	- task:
	type: text-generation
	name: Text Generation
	dataset:
	name: MMLU-PRO (5-shot)
	type: TIGER-Lab/MMLU-Pro
	config: main
	split: test
	args:
	num_few_shot: 5
	metrics:
	- type: acc
	value: 25.48
	name: accuracy
	source:
	url: https://huggingface.co/spaces/open-llm-leaderboard/open_llm_leaderboard?query=sequelbox/Llama3.1-8B-MOTH
	name: Open LLM Leaderboard
	---

	- MOTH is a general chat AI.
	- MOTH is finetuned on [high quality synthetic data.](https://huggingface.co/datasets/sequelbox/Supernova)
	- MOTH is trained on a variety of skills and specialties.
	- This version of MOTH is trained on the [Llama 3.1 Instruct format.](https://huggingface.co/meta-llama/Meta-Llama-3.1-8B-Instruct)
	- MOTH is also available for [Gemma 2;](https://huggingface.co/sequelbox/gemma-2-9B-MOTH) more MOTH finetunes for other models to follow.
	- MOTH has not been manually tested and uses automatically generated datasets.
	- Do as you will.



	(uses llama 3.1 license available at https://huggingface.co/meta-llama/Meta-Llama-3.1-8B-Instruct)
	# [Open LLM Leaderboard Evaluation Results](https://huggingface.co/spaces/open-llm-leaderboard/open_llm_leaderboard)
	Detailed results can be found [here](https://huggingface.co/datasets/open-llm-leaderboard/details_sequelbox__Llama3.1-8B-MOTH)

	\| Metric \|Value\|
	\|-------------------\|----:\|
	\|Avg. \|20.37\|
	\|IFEval (0-Shot) \|52.08\|
	\|BBH (3-Shot) \|26.45\|
	\|MATH Lvl 5 (4-Shot)\|11.86\|
	\|GPQA (0-shot) \| 2.57\|
	\|MuSR (0-shot) \| 3.79\|
	\|MMLU-PRO (5-shot) \|25.48\|