---
license: apache-2.0
pipeline_tag: text-generation
---

# wizard-mistral-v0.1

*(Header image: DALL·E photo of a dark forest with tall, ancient trees whose branches and leaves reveal a portal of swirling magical energy.)*

## Overview

This model is a finetune of Mistral-7B on cleaned data from WizardLM Evol Instruct v2 196k. Most RLHF-style refusal responses were removed from the dataset, so this should be treated as an uncensored model, although it is not fully uncensored.
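
Below is a minimal usage sketch with Hugging Face `transformers`. The Vicuna-style `USER:`/`ASSISTANT:` prompt is an assumption (common for WizardLM-style finetunes); the card does not document a prompt format.

```python
# Minimal usage sketch (standard transformers text generation).
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "unaidedelf87777/wizard-mistral-v0.1"
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    torch_dtype=torch.float16,  # half precision so the 7B model fits on one GPU
    device_map="auto",
)

# Assumed prompt format; adjust if the finetune used a different template.
prompt = "USER: Explain instruction finetuning in two sentences.\nASSISTANT:"
inputs = tokenizer(prompt, return_tensors="pt").to(model.device)
outputs = model.generate(**inputs, max_new_tokens=128, do_sample=True, temperature=0.7)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```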

## Benchmarks

Wizard Mistral was finetuned only on >200k rows of Evol Instruct multi-turn data, yet it achieves competitive results when evaluated. Below are its benchmark scores compared with the most popular Mistral-7B finetunes (a reproduction sketch follows the table).

| Model | Average | ARC | HellaSwag | MMLU | TruthfulQA |
|---|---|---|---|---|---|
| unaidedelf87777/wizard-mistral-v0.1 | 64.18 | 61.77 | 83.51 | 63.99 | 47.46 |
| Undi95/Mistral-11B-TestBench11 | 67.21 | 64.42 | 83.93 | 63.82 | 56.68 |
| Undi95/Mistral-11B-TestBench9 | 67.13 | 64.08 | 84.24 | 64.00 | 56.19 |
| ehartford/dolphin-2.1-mistral-7b | 67.06 | 64.42 | 84.92 | 63.32 | 55.56 |
| ehartford/dolphin-2.1-mistral-7b (duplicate?) | 67.00 | 63.99 | 85.00 | 63.44 | 55.57 |
| Undi95/Mistral-11B-TestBench10 | 66.99 | 64.25 | 84.24 | 63.90 | 55.57 |
| teknium/CollectiveCognition-v1.1-Mistral-7B | 66.56 | 62.12 | 84.17 | 62.35 | 57.62 |
| Weyaxi/SlimOpenOrca-Mistral-7B | 66.54 | 62.97 | 83.49 | 62.30 | 57.39 |
| teknium/CollectiveCognition-v1-Mistral-7B | 66.28 | 62.37 | 85.50 | 62.76 | 54.48 |
| ehartford/samantha-1-2-mistral-7b | 65.87 | 64.08 | 85.08 | 63.91 | 50.40 |
| Open-Orca/Mistral-7B-SlimOrca | 65.85 | 62.54 | 83.86 | 62.77 | 54.23 |
| Open-Orca/Mistral-7B-OpenOrca | 65.84 | 64.08 | 83.99 | 62.24 | 53.05 |
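
These scores come from EleutherAI's lm-evaluation-harness, which powers the Open LLM Leaderboard. The sketch below shows how one could reproduce the ARC number locally; it assumes the harness's v0.4+ Python API, while the leaderboard pins a specific older revision, so exact numbers may differ slightly.

```python
# Hedged sketch: score ARC (25-shot) with lm-evaluation-harness.
# Assumes `pip install lm-eval` (v0.4+); the leaderboard's exact harness
# revision and generation settings are not documented here.
import lm_eval

results = lm_eval.simple_evaluate(
    model="hf",
    model_args="pretrained=unaidedelf87777/wizard-mistral-v0.1,dtype=float16",
    tasks=["arc_challenge"],  # the leaderboard scores ARC at 25-shot
    num_fewshot=25,
    batch_size=8,
)
print(results["results"]["arc_challenge"])
```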

## Open LLM Leaderboard Evaluation Results

Detailed results can be found here.

| Metric | Value |
|---|---|
| Avg. | 51.58 |
| ARC (25-shot) | 61.77 |
| HellaSwag (10-shot) | 83.51 |
| MMLU (5-shot) | 63.99 |
| TruthfulQA (0-shot) | 47.46 |
| Winogrande (5-shot) | 78.30 |
| GSM8K (5-shot) | 19.03 |
| DROP (3-shot) | 7.01 |
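
Note that the 51.58 average here differs from the 64.18 in the Benchmarks table: the leaderboard averages all seven metrics, while the table above averages only the first four. A quick check:

```python
# Both averages follow from the same per-task scores.
scores = {
    "ARC": 61.77, "HellaSwag": 83.51, "MMLU": 63.99, "TruthfulQA": 47.46,
    "Winogrande": 78.30, "GSM8K": 19.03, "DROP": 7.01,
}
four = ["ARC", "HellaSwag", "MMLU", "TruthfulQA"]
print(round(sum(scores[k] for k in four) / 4, 2))  # 64.18 (Benchmarks table)
print(round(sum(scores.values()) / 7, 2))          # 51.58 (leaderboard Avg.)
```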