---
language:
  - en
license: apache-2.0
tags:
  - text-generation-inference
  - transformers
  - unsloth
  - mistral
  - trl
  - sft
base_model: alpindale/Mistral-7B-v0.2
---

# Mistral-7B-v0.2-OpenHermes


## SFT Training Params

- Learning rate: 2e-4
- Batch size: 8
- Gradient accumulation steps: 4
- Dataset: teknium/OpenHermes-2.5 (the 200k split used here carries a slight bias toward roleplay and theory-of-life prompts)
- LoRA r: 16
- LoRA alpha: 16

Training time: 13 hours on an A100. A sketch of this setup is shown below.
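The following is a minimal sketch of that run using Unsloth with TRL's `SFTTrainer`, matching the card's footer. The hyperparameters (learning rate, batch size, gradient accumulation, r, alpha) come from the list above; the sequence length, target modules, epoch count, and the ChatML formatting helper are assumptions, not the author's exact script.

```python
from datasets import load_dataset
from transformers import TrainingArguments
from trl import SFTTrainer
from unsloth import FastLanguageModel

# Load the base model with Unsloth's fast loader (4-bit loading is assumed).
model, tokenizer = FastLanguageModel.from_pretrained(
    model_name="alpindale/Mistral-7B-v0.2",
    max_seq_length=4096,  # assumed
    load_in_4bit=True,    # assumed
)

# Attach LoRA adapters with the r / alpha reported above.
model = FastLanguageModel.get_peft_model(
    model,
    r=16,
    lora_alpha=16,
    target_modules=["q_proj", "k_proj", "v_proj", "o_proj",
                    "gate_proj", "up_proj", "down_proj"],  # common choice, assumed
)

# OpenHermes-2.5 stores ShareGPT-style "conversations"; flatten each one
# into a single ChatML string so SFTTrainer can consume it as plain text.
def to_chatml(example):
    role_map = {"system": "system", "human": "user", "gpt": "assistant"}
    turns = [
        f"<|im_start|>{role_map[t['from']]}\n{t['value']}<|im_end|>"
        for t in example["conversations"]
    ]
    return {"text": "\n".join(turns)}

dataset = load_dataset("teknium/OpenHermes-2.5", split="train[:200000]")  # "200k split", slicing assumed
dataset = dataset.map(to_chatml)

trainer = SFTTrainer(
    model=model,
    tokenizer=tokenizer,
    train_dataset=dataset,
    dataset_text_field="text",
    max_seq_length=4096,
    args=TrainingArguments(
        per_device_train_batch_size=8,
        gradient_accumulation_steps=4,
        learning_rate=2e-4,
        num_train_epochs=1,  # assumed
        output_dir="outputs",
    ),
)
trainer.train()
```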

## Prompt Template

This model uses ChatML:

```
<|im_start|>system
You are a helpful assistant.<|im_end|>
<|im_start|>user
What's the capital of France?<|im_end|>
<|im_start|>assistant
Paris.
```
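For inference, the prompt can be built by hand as above or with transformers' chat templating, assuming this repo's tokenizer ships a ChatML `chat_template` (an assumption; fall back to manual formatting otherwise):

```python
from transformers import AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("macadeliccc/Mistral-7B-v0.2-OpenHermes")

messages = [
    {"role": "system", "content": "You are a helpful assistant."},
    {"role": "user", "content": "What's the capital of France?"},
]

# add_generation_prompt=True appends the opening "<|im_start|>assistant" turn
# so the model continues with its reply.
prompt = tokenizer.apply_chat_template(
    messages, tokenize=False, add_generation_prompt=True
)
print(prompt)
```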

## Quantizations

- GGUF
- AWQ
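As one hedged example of using a quant, a GGUF file can be run with llama-cpp-python; the repo id and filename below are hypothetical placeholders, since the card's quantization links are not reproduced here:

```python
from llama_cpp import Llama

# Repo id and filename are hypothetical placeholders for the linked GGUF quant.
llm = Llama.from_pretrained(
    repo_id="macadeliccc/Mistral-7B-v0.2-OpenHermes-GGUF",
    filename="*Q4_K_M.gguf",
    n_ctx=4096,
)

out = llm.create_chat_completion(messages=[
    {"role": "system", "content": "You are a helpful assistant."},
    {"role": "user", "content": "What's the capital of France?"},
])
print(out["choices"][0]["message"]["content"])
```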

## Evaluations

Thanks to Maxime Labonne for the evaluation:

| Model | AGIEval | GPT4All | TruthfulQA | Bigbench | Average |
|---|---:|---:|---:|---:|---:|
| Mistral-7B-v0.2-OpenHermes | 35.57 | 67.15 | 42.06 | 36.27 | 45.26 |

### AGIEval

| Task | Version | Metric | Value | Stderr |
|---|---:|---|---:|---:|
| agieval_aqua_rat | 0 | acc | 24.02 | ± 2.69 |
| | | acc_norm | 21.65 | ± 2.59 |
| agieval_logiqa_en | 0 | acc | 28.11 | ± 1.76 |
| | | acc_norm | 34.56 | ± 1.87 |
| agieval_lsat_ar | 0 | acc | 27.83 | ± 2.96 |
| | | acc_norm | 23.48 | ± 2.80 |
| agieval_lsat_lr | 0 | acc | 33.73 | ± 2.10 |
| | | acc_norm | 33.14 | ± 2.09 |
| agieval_lsat_rc | 0 | acc | 48.70 | ± 3.05 |
| | | acc_norm | 39.78 | ± 2.99 |
| agieval_sat_en | 0 | acc | 67.48 | ± 3.27 |
| | | acc_norm | 64.56 | ± 3.34 |
| agieval_sat_en_without_passage | 0 | acc | 38.83 | ± 3.40 |
| | | acc_norm | 37.38 | ± 3.38 |
| agieval_sat_math | 0 | acc | 32.27 | ± 3.16 |
| | | acc_norm | 30.00 | ± 3.10 |

Average: 35.57%

### GPT4All

| Task | Version | Metric | Value | Stderr |
|---|---:|---|---:|---:|
| arc_challenge | 0 | acc | 45.05 | ± 1.45 |
| | | acc_norm | 48.46 | ± 1.46 |
| arc_easy | 0 | acc | 77.27 | ± 0.86 |
| | | acc_norm | 73.78 | ± 0.90 |
| boolq | 1 | acc | 68.62 | ± 0.81 |
| hellaswag | 0 | acc | 59.63 | ± 0.49 |
| | | acc_norm | 79.66 | ± 0.40 |
| openbookqa | 0 | acc | 31.40 | ± 2.08 |
| | | acc_norm | 43.40 | ± 2.22 |
| piqa | 0 | acc | 80.25 | ± 0.93 |
| | | acc_norm | 82.05 | ± 0.90 |
| winogrande | 0 | acc | 74.11 | ± 1.23 |

Average: 67.15%

### TruthfulQA

| Task | Version | Metric | Value | Stderr |
|---|---:|---|---:|---:|
| truthfulqa_mc | 1 | mc1 | 27.54 | ± 1.56 |
| | | mc2 | 42.06 | ± 1.44 |

Average: 42.06%

### Bigbench

| Task | Version | Metric | Value | Stderr |
|---|---:|---|---:|---:|
| bigbench_causal_judgement | 0 | multiple_choice_grade | 56.32 | ± 3.61 |
| bigbench_date_understanding | 0 | multiple_choice_grade | 66.40 | ± 2.46 |
| bigbench_disambiguation_qa | 0 | multiple_choice_grade | 45.74 | ± 3.11 |
| bigbench_geometric_shapes | 0 | multiple_choice_grade | 10.58 | ± 1.63 |
| | | exact_str_match | 0.00 | ± 0.00 |
| bigbench_logical_deduction_five_objects | 0 | multiple_choice_grade | 25.00 | ± 1.94 |
| bigbench_logical_deduction_seven_objects | 0 | multiple_choice_grade | 17.71 | ± 1.44 |
| bigbench_logical_deduction_three_objects | 0 | multiple_choice_grade | 37.33 | ± 2.80 |
| bigbench_movie_recommendation | 0 | multiple_choice_grade | 29.40 | ± 2.04 |
| bigbench_navigate | 0 | multiple_choice_grade | 50.00 | ± 1.58 |
| bigbench_reasoning_about_colored_objects | 0 | multiple_choice_grade | 42.50 | ± 1.11 |
| bigbench_ruin_names | 0 | multiple_choice_grade | 39.06 | ± 2.31 |
| bigbench_salient_translation_error_detection | 0 | multiple_choice_grade | 12.93 | ± 1.06 |
| bigbench_snarks | 0 | multiple_choice_grade | 69.06 | ± 3.45 |
| bigbench_sports_understanding | 0 | multiple_choice_grade | 49.80 | ± 1.59 |
| bigbench_temporal_sequences | 0 | multiple_choice_grade | 26.50 | ± 1.40 |
| bigbench_tracking_shuffled_objects_five_objects | 0 | multiple_choice_grade | 21.20 | ± 1.16 |
| bigbench_tracking_shuffled_objects_seven_objects | 0 | multiple_choice_grade | 16.06 | ± 0.88 |
| bigbench_tracking_shuffled_objects_three_objects | 0 | multiple_choice_grade | 37.33 | ± 2.80 |

Average: 36.27%

Average score: 45.26%

Elapsed time: 01:49:22

- Developed by: macadeliccc
- License: apache-2.0
- Finetuned from model: alpindale/Mistral-7B-v0.2

This Mistral model was trained 2x faster with Unsloth and Hugging Face's TRL library.