---
license: apache-2.0
pipeline_tag: text-generation
---
# Overview
This model is a finetune of Mistral-7B on cleaned data from WizardLM Evol Instruct V2 196k. Most instances of RLHF-style alignment data were removed from the dataset, so this should be treated as an uncensored model, although it is not fully uncensored.
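Below is a minimal sketch of loading the model for text generation with the `transformers` library. The Vicuna-style prompt template is an assumption based on the model's WizardLM lineage; check the tokenizer's chat template before relying on it.

```python
def build_prompt(instruction: str) -> str:
    """Wrap a user instruction in a Vicuna-style prompt (assumed format)."""
    return f"USER: {instruction} ASSISTANT:"


def generate(instruction: str,
             model_id: str = "unaidedelf87777/wizard-mistral-v0.1") -> str:
    # Imported lazily so build_prompt stays usable without transformers installed.
    from transformers import AutoModelForCausalLM, AutoTokenizer

    tokenizer = AutoTokenizer.from_pretrained(model_id)
    model = AutoModelForCausalLM.from_pretrained(model_id, device_map="auto")

    inputs = tokenizer(build_prompt(instruction), return_tensors="pt").to(model.device)
    output = model.generate(**inputs, max_new_tokens=256)
    return tokenizer.decode(output[0], skip_special_tokens=True)
```

The lazy import and `device_map="auto"` are convenience choices, not requirements; any standard causal-LM loading pattern works.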
# Benchmarks
Wizard Mistral was finetuned on fewer than 200k rows of Evol Instruct multi-turn data, yet it achieves competitive results when evaluated. Below are Wizard Mistral's benchmark scores compared to the most popular Mistral-7B finetunes.
| Model | Average | ARC | HellaSwag | MMLU | TruthfulQA |
|---|---|---|---|---|---|
| unaidedelf87777/wizard-mistral-v0.1 | 64.18 | 61.77 | 83.51 | 63.99 | 47.46 |
| Undi95/Mistral-11B-TestBench11 | 67.21 | 64.42 | 83.93 | 63.82 | 56.68 |
| Undi95/Mistral-11B-TestBench9 | 67.13 | 64.08 | 84.24 | 64.00 | 56.19 |
| ehartford/dolphin-2.1-mistral-7b | 67.06 | 64.42 | 84.92 | 63.32 | 55.56 |
| ehartford/dolphin-2.1-mistral-7b (Duplicate?) | 67.00 | 63.99 | 85.00 | 63.44 | 55.57 |
| Undi95/Mistral-11B-TestBench10 | 66.99 | 64.25 | 84.24 | 63.90 | 55.57 |
| teknium/CollectiveCognition-v1.1-Mistral-7B | 66.56 | 62.12 | 84.17 | 62.35 | 57.62 |
| Weyaxi/SlimOpenOrca-Mistral-7B | 66.54 | 62.97 | 83.49 | 62.30 | 57.39 |
| teknium/CollectiveCognition-v1-Mistral-7B | 66.28 | 62.37 | 85.50 | 62.76 | 54.48 |
| ehartford/samantha-1-2-mistral-7b | 65.87 | 64.08 | 85.08 | 63.91 | 50.40 |
| Open-Orca/Mistral-7B-SlimOrca | 65.85 | 62.54 | 83.86 | 62.77 | 54.23 |
| Open-Orca/Mistral-7B-OpenOrca | 65.84 | 64.08 | 83.99 | 62.24 | 53.05 |
# Open LLM Leaderboard Evaluation Results
Detailed results can be found here.
| Metric | Value |
|---|---|
| Avg. | 51.58 |
| ARC (25-shot) | 61.77 |
| HellaSwag (10-shot) | 83.51 |
| MMLU (5-shot) | 63.99 |
| TruthfulQA (0-shot) | 47.46 |
| Winogrande (5-shot) | 78.30 |
| GSM8K (5-shot) | 19.03 |
| DROP (3-shot) | 7.01 |