---
license: apache-2.0
pipeline_tag: text-generation
---

# wizard-mistral-v0.1

*(Header image: DALL·E photo of a dark forest with tall, ancient trees whose branches and leaves reveal a portal of swirling magical energy.)*

## Overview

This model is a finetune of Mistral-7B on cleaned data from WizardLM Evol Instruct v2 196k. Most RLHF-style refusal responses were removed from the dataset, so this should be treated as an uncensored model, although it is not fully uncensored.
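
Below is a minimal usage sketch with Hugging Face `transformers`. The Vicuna-style `USER:`/`ASSISTANT:` prompt is an assumption (common for WizardLM-style finetunes); the card does not document a prompt format.

```python
# Minimal usage sketch (standard transformers text generation).
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "unaidedelf87777/wizard-mistral-v0.1"
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    torch_dtype=torch.float16,  # half precision so the 7B model fits on one GPU
    device_map="auto",
)

# Assumed prompt format; adjust if the finetune used a different template.
prompt = "USER: Explain instruction finetuning in two sentences.\nASSISTANT:"
inputs = tokenizer(prompt, return_tensors="pt").to(model.device)
outputs = model.generate(**inputs, max_new_tokens=128, do_sample=True, temperature=0.7)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```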

## Benchmarks

Wizard Mistral was finetuned only on >200k rows of Evol Instruct multi-turn data, yet it achieves competitive results when evaluated. Below are its benchmark scores compared with the most popular Mistral-7B finetunes (a reproduction sketch follows the table).

| Model | Average | ARC | HellaSwag | MMLU | TruthfulQA |
|---|---|---|---|---|---|
| unaidedelf87777/wizard-mistral-v0.1 | 64.18 | 61.77 | 83.51 | 63.99 | 47.46 |
| Undi95/Mistral-11B-TestBench11 | 67.21 | 64.42 | 83.93 | 63.82 | 56.68 |
| Undi95/Mistral-11B-TestBench9 | 67.13 | 64.08 | 84.24 | 64.00 | 56.19 |
| ehartford/dolphin-2.1-mistral-7b | 67.06 | 64.42 | 84.92 | 63.32 | 55.56 |
| ehartford/dolphin-2.1-mistral-7b (duplicate?) | 67.00 | 63.99 | 85.00 | 63.44 | 55.57 |
| Undi95/Mistral-11B-TestBench10 | 66.99 | 64.25 | 84.24 | 63.90 | 55.57 |
| teknium/CollectiveCognition-v1.1-Mistral-7B | 66.56 | 62.12 | 84.17 | 62.35 | 57.62 |
| Weyaxi/SlimOpenOrca-Mistral-7B | 66.54 | 62.97 | 83.49 | 62.30 | 57.39 |
| teknium/CollectiveCognition-v1-Mistral-7B | 66.28 | 62.37 | 85.50 | 62.76 | 54.48 |
| ehartford/samantha-1-2-mistral-7b | 65.87 | 64.08 | 85.08 | 63.91 | 50.40 |
| Open-Orca/Mistral-7B-SlimOrca | 65.85 | 62.54 | 83.86 | 62.77 | 54.23 |
| Open-Orca/Mistral-7B-OpenOrca | 65.84 | 64.08 | 83.99 | 62.24 | 53.05 |
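
These scores come from EleutherAI's lm-evaluation-harness, which powers the Open LLM Leaderboard. The sketch below shows how one could reproduce the ARC number locally; it assumes the harness's v0.4+ Python API, while the leaderboard pins a specific older revision, so exact numbers may differ slightly.

```python
# Hedged sketch: score ARC (25-shot) with lm-evaluation-harness.
# Assumes `pip install lm-eval` (v0.4+); the leaderboard's exact harness
# revision and generation settings are not documented here.
import lm_eval

results = lm_eval.simple_evaluate(
    model="hf",
    model_args="pretrained=unaidedelf87777/wizard-mistral-v0.1,dtype=float16",
    tasks=["arc_challenge"],  # the leaderboard scores ARC at 25-shot
    num_fewshot=25,
    batch_size=8,
)
print(results["results"]["arc_challenge"])
```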

## Open LLM Leaderboard Evaluation Results

Detailed results can be found here.

| Metric | Value |
|---|---|
| Avg. | 51.58 |
| ARC (25-shot) | 61.77 |
| HellaSwag (10-shot) | 83.51 |
| MMLU (5-shot) | 63.99 |
| TruthfulQA (0-shot) | 47.46 |
| Winogrande (5-shot) | 78.30 |
| GSM8K (5-shot) | 19.03 |
| DROP (3-shot) | 7.01 |
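
Note that the 51.58 average here differs from the 64.18 in the Benchmarks table: the leaderboard averages all seven metrics, while the table above averages only the first four. A quick check:

```python
# Both averages follow from the same per-task scores.
scores = {
    "ARC": 61.77, "HellaSwag": 83.51, "MMLU": 63.99, "TruthfulQA": 47.46,
    "Winogrande": 78.30, "GSM8K": 19.03, "DROP": 7.01,
}
four = ["ARC", "HellaSwag", "MMLU", "TruthfulQA"]
print(round(sum(scores[k] for k in four) / 4, 2))  # 64.18 (Benchmarks table)
print(round(sum(scores.values()) / 7, 2))          # 51.58 (leaderboard Avg.)
```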