---
language:
  - en
license: apache-2.0
tags:
  - text-generation-inference
  - transformers
  - unsloth
  - mistral
  - trl
  - sft
base_model: alpindale/Mistral-7B-v0.2
---

# Mistral-7B-v0.2-OpenHermes


## SFT Training Params

- Learning rate: 2e-4
- Batch size: 8
- Gradient accumulation steps: 4
- Dataset: teknium/OpenHermes-2.5 (the 200k split used here carries a slight bias toward roleplay and theory-of-life prompts)
- LoRA r: 16
- LoRA alpha: 16

Training time: 13 hours on an A100. A sketch of this setup is shown below.
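The following is a minimal sketch of that run using Unsloth with TRL's `SFTTrainer`, matching the card's footer. The hyperparameters (learning rate, batch size, gradient accumulation, r, alpha) come from the list above; the sequence length, target modules, epoch count, and the ChatML formatting helper are assumptions, not the author's exact script.

```python
from datasets import load_dataset
from transformers import TrainingArguments
from trl import SFTTrainer
from unsloth import FastLanguageModel

# Load the base model with Unsloth's fast loader (4-bit loading is assumed).
model, tokenizer = FastLanguageModel.from_pretrained(
    model_name="alpindale/Mistral-7B-v0.2",
    max_seq_length=4096,  # assumed
    load_in_4bit=True,    # assumed
)

# Attach LoRA adapters with the r / alpha reported above.
model = FastLanguageModel.get_peft_model(
    model,
    r=16,
    lora_alpha=16,
    target_modules=["q_proj", "k_proj", "v_proj", "o_proj",
                    "gate_proj", "up_proj", "down_proj"],  # common choice, assumed
)

# OpenHermes-2.5 stores ShareGPT-style "conversations"; flatten each one
# into a single ChatML string so SFTTrainer can consume it as plain text.
def to_chatml(example):
    role_map = {"system": "system", "human": "user", "gpt": "assistant"}
    turns = [
        f"<|im_start|>{role_map[t['from']]}\n{t['value']}<|im_end|>"
        for t in example["conversations"]
    ]
    return {"text": "\n".join(turns)}

dataset = load_dataset("teknium/OpenHermes-2.5", split="train[:200000]")  # "200k split", slicing assumed
dataset = dataset.map(to_chatml)

trainer = SFTTrainer(
    model=model,
    tokenizer=tokenizer,
    train_dataset=dataset,
    dataset_text_field="text",
    max_seq_length=4096,
    args=TrainingArguments(
        per_device_train_batch_size=8,
        gradient_accumulation_steps=4,
        learning_rate=2e-4,
        num_train_epochs=1,  # assumed
        output_dir="outputs",
    ),
)
trainer.train()
```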

## Prompt Template

This model uses ChatML:

```
<|im_start|>system
You are a helpful assistant.<|im_end|>
<|im_start|>user
What's the capital of France?<|im_end|>
<|im_start|>assistant
Paris.
```
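For inference, the prompt can be built by hand as above or with transformers' chat templating, assuming this repo's tokenizer ships a ChatML `chat_template` (an assumption; fall back to manual formatting otherwise):

```python
from transformers import AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("macadeliccc/Mistral-7B-v0.2-OpenHermes")

messages = [
    {"role": "system", "content": "You are a helpful assistant."},
    {"role": "user", "content": "What's the capital of France?"},
]

# add_generation_prompt=True appends the opening "<|im_start|>assistant" turn
# so the model continues with its reply.
prompt = tokenizer.apply_chat_template(
    messages, tokenize=False, add_generation_prompt=True
)
print(prompt)
```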

## Quantizations

- GGUF
- AWQ
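As one hedged example of using a quant, a GGUF file can be run with llama-cpp-python; the repo id and filename below are hypothetical placeholders, since the card's quantization links are not reproduced here:

```python
from llama_cpp import Llama

# Repo id and filename are hypothetical placeholders for the linked GGUF quant.
llm = Llama.from_pretrained(
    repo_id="macadeliccc/Mistral-7B-v0.2-OpenHermes-GGUF",
    filename="*Q4_K_M.gguf",
    n_ctx=4096,
)

out = llm.create_chat_completion(messages=[
    {"role": "system", "content": "You are a helpful assistant."},
    {"role": "user", "content": "What's the capital of France?"},
])
print(out["choices"][0]["message"]["content"])
```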

## Evaluations

Thanks to Maxime Labonne for the evaluation:

| Model | AGIEval | GPT4All | TruthfulQA | Bigbench | Average |
|---|---:|---:|---:|---:|---:|
| Mistral-7B-v0.2-OpenHermes | 35.57 | 67.15 | 42.06 | 36.27 | 45.26 |

### AGIEval

| Task | Version | Metric | Value | Stderr |
|---|---:|---|---:|---:|
| agieval_aqua_rat | 0 | acc | 24.02 | ± 2.69 |
| | | acc_norm | 21.65 | ± 2.59 |
| agieval_logiqa_en | 0 | acc | 28.11 | ± 1.76 |
| | | acc_norm | 34.56 | ± 1.87 |
| agieval_lsat_ar | 0 | acc | 27.83 | ± 2.96 |
| | | acc_norm | 23.48 | ± 2.80 |
| agieval_lsat_lr | 0 | acc | 33.73 | ± 2.10 |
| | | acc_norm | 33.14 | ± 2.09 |
| agieval_lsat_rc | 0 | acc | 48.70 | ± 3.05 |
| | | acc_norm | 39.78 | ± 2.99 |
| agieval_sat_en | 0 | acc | 67.48 | ± 3.27 |
| | | acc_norm | 64.56 | ± 3.34 |
| agieval_sat_en_without_passage | 0 | acc | 38.83 | ± 3.40 |
| | | acc_norm | 37.38 | ± 3.38 |
| agieval_sat_math | 0 | acc | 32.27 | ± 3.16 |
| | | acc_norm | 30.00 | ± 3.10 |

Average: 35.57%

### GPT4All

| Task | Version | Metric | Value | Stderr |
|---|---:|---|---:|---:|
| arc_challenge | 0 | acc | 45.05 | ± 1.45 |
| | | acc_norm | 48.46 | ± 1.46 |
| arc_easy | 0 | acc | 77.27 | ± 0.86 |
| | | acc_norm | 73.78 | ± 0.90 |
| boolq | 1 | acc | 68.62 | ± 0.81 |
| hellaswag | 0 | acc | 59.63 | ± 0.49 |
| | | acc_norm | 79.66 | ± 0.40 |
| openbookqa | 0 | acc | 31.40 | ± 2.08 |
| | | acc_norm | 43.40 | ± 2.22 |
| piqa | 0 | acc | 80.25 | ± 0.93 |
| | | acc_norm | 82.05 | ± 0.90 |
| winogrande | 0 | acc | 74.11 | ± 1.23 |

Average: 67.15%

### TruthfulQA

| Task | Version | Metric | Value | Stderr |
|---|---:|---|---:|---:|
| truthfulqa_mc | 1 | mc1 | 27.54 | ± 1.56 |
| | | mc2 | 42.06 | ± 1.44 |

Average: 42.06%

### Bigbench

| Task | Version | Metric | Value | Stderr |
|---|---:|---|---:|---:|
| bigbench_causal_judgement | 0 | multiple_choice_grade | 56.32 | ± 3.61 |
| bigbench_date_understanding | 0 | multiple_choice_grade | 66.40 | ± 2.46 |
| bigbench_disambiguation_qa | 0 | multiple_choice_grade | 45.74 | ± 3.11 |
| bigbench_geometric_shapes | 0 | multiple_choice_grade | 10.58 | ± 1.63 |
| | | exact_str_match | 0.00 | ± 0.00 |
| bigbench_logical_deduction_five_objects | 0 | multiple_choice_grade | 25.00 | ± 1.94 |
| bigbench_logical_deduction_seven_objects | 0 | multiple_choice_grade | 17.71 | ± 1.44 |
| bigbench_logical_deduction_three_objects | 0 | multiple_choice_grade | 37.33 | ± 2.80 |
| bigbench_movie_recommendation | 0 | multiple_choice_grade | 29.40 | ± 2.04 |
| bigbench_navigate | 0 | multiple_choice_grade | 50.00 | ± 1.58 |
| bigbench_reasoning_about_colored_objects | 0 | multiple_choice_grade | 42.50 | ± 1.11 |
| bigbench_ruin_names | 0 | multiple_choice_grade | 39.06 | ± 2.31 |
| bigbench_salient_translation_error_detection | 0 | multiple_choice_grade | 12.93 | ± 1.06 |
| bigbench_snarks | 0 | multiple_choice_grade | 69.06 | ± 3.45 |
| bigbench_sports_understanding | 0 | multiple_choice_grade | 49.80 | ± 1.59 |
| bigbench_temporal_sequences | 0 | multiple_choice_grade | 26.50 | ± 1.40 |
| bigbench_tracking_shuffled_objects_five_objects | 0 | multiple_choice_grade | 21.20 | ± 1.16 |
| bigbench_tracking_shuffled_objects_seven_objects | 0 | multiple_choice_grade | 16.06 | ± 0.88 |
| bigbench_tracking_shuffled_objects_three_objects | 0 | multiple_choice_grade | 37.33 | ± 2.80 |

Average: 36.27%

Average score: 45.26%

Elapsed time: 01:49:22

- Developed by: macadeliccc
- License: apache-2.0
- Finetuned from model: alpindale/Mistral-7B-v0.2

This Mistral model was trained 2x faster with Unsloth and Hugging Face's TRL library.