speechless-code-mistral-orca-7b-v1.0

The following datasets were used to fine-tune Open-Orca/Mistral-7B-OpenOrca in order to improve the model's reasoning and planning abilities.

Total 201,981 samples.

  • jondurbin/airoboros-2.2: Categories related to coding, reasoning and planning. 23,462 samples.
  • Open-Orca/OpenOrca: The 'cot' category filtered from the 1M GPT-4 dataset. 74,440 samples.
  • garage-bAInd/Open-Platypus: 100%, 24,926 samples.
  • WizardLM/WizardLM_evol_instruct_V2_196k: Coding conversation part. 30,185 samples.
  • TokenBender/python_eval_instruct_51k: Samples with "python" in the output. 40,309 samples.
  • Spider: 8,659 samples.
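The per-dataset counts above can be cross-checked against the stated total. A minimal sketch; the dictionary name is illustrative and not taken from the speechless repository:

```python
# Sample counts exactly as listed in this card.
dataset_samples = {
    "jondurbin/airoboros-2.2": 23_462,
    "Open-Orca/OpenOrca": 74_440,
    "garage-bAInd/Open-Platypus": 24_926,
    "WizardLM/WizardLM_evol_instruct_V2_196k": 30_185,
    "TokenBender/python_eval_instruct_51k": 40_309,
    "Spider": 8_659,
}

total = sum(dataset_samples.values())

# Print each dataset with its share of the mix, largest first.
for name, count in sorted(dataset_samples.items(), key=lambda kv: kv[1], reverse=True):
    print(f"{name:45s} {count:>7,d} {count / total:6.1%}")
print(f"{'total':45s} {total:>7,d}")
```

The counts sum to the 201,981 samples reported above.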

Code: https://github.com/uukuguy/speechless

HumanEval

Metric Value
humaneval-python 47.561

Big Code Models Leaderboard

CodeLlama-34B-Python: 53.29

CodeLlama-34B-Instruct: 50.79

CodeLlama-13B-Instruct: 50.6

CodeLlama-34B: 45.11

CodeLlama-13B-Python: 42.89

CodeLlama-13B: 35.07

lm-evaluation-harness

Open LLM Leaderboard

Metric Value
ARC 59.64
HellaSwag 82.25
MMLU 61.33
TruthfulQA 48.45
Average 62.92

Parameters

lr 2e-4
lr_scheduler_type cosine
weight_decay 0.0
optim paged_adamw_8bit
flash_attention True
rerope False
max_new_tokens 4096
num_train_epochs 2
bits 4
lora_r 64
lora_alpha 16
lora_dropout 0.05
double_quant True
quant_type nf4
dataset_format airoboros
mini_batch_size 2
gradient_accumulation_steps 32
bf16 True
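For readers who want to set up something similar, the quantization and LoRA rows above line up with the keyword arguments of `BitsAndBytesConfig` (transformers) and `LoraConfig` (peft). The sketch below uses plain dictionaries so that it runs without those libraries. The mapping is an assumption, not the speechless training script, and treating bf16 as the 4-bit compute dtype is a guess based on the `bf16 True` row.

```python
# Assumed mapping of the table rows onto BitsAndBytesConfig keyword arguments.
bnb_kwargs = {
    "load_in_4bit": True,                  # bits 4
    "bnb_4bit_quant_type": "nf4",          # quant_type nf4
    "bnb_4bit_use_double_quant": True,     # double_quant True
    "bnb_4bit_compute_dtype": "bfloat16",  # assumption derived from bf16 True
}

# Assumed mapping of the table rows onto LoraConfig keyword arguments.
lora_kwargs = {
    "r": 64,               # lora_r
    "lora_alpha": 16,      # lora_alpha
    "lora_dropout": 0.05,  # lora_dropout
}

# Standard LoRA scales the low-rank update by alpha / r.
lora_scaling = lora_kwargs["lora_alpha"] / lora_kwargs["r"]

# Samples contributing to one optimizer update on a single process.
# The global batch also depends on the number of data-parallel processes,
# which this card does not state.
mini_batch_size = 2
accumulation_steps = 32
samples_per_update_per_process = mini_batch_size * accumulation_steps

print(f"LoRA scaling: {lora_scaling}")
print(f"samples per update per process: {samples_per_update_per_process}")
```

With these values the adapter update is scaled by 0.25, and each process accumulates 64 samples per optimizer update.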

A100-40G x 4

epoch 2.0
train_loss 0.4708
train_runtime 12:12:53.64
train_samples_per_second 9.002
train_steps_per_second 0.07
eval_loss 0.4851
eval_runtime 0:00:10.31
eval_samples_per_second 19.385
eval_steps_per_second 4.846
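The reported training runtime and throughput can be used as a rough sanity check on how many samples the run processed. A minimal sketch; the helper name is hypothetical and the result is approximate because the reported throughput is rounded:

```python
def runtime_to_seconds(text: str) -> float:
    """Convert an 'H:MM:SS.ss' runtime string into seconds."""
    hours, minutes, seconds = text.split(":")
    return int(hours) * 3600 + int(minutes) * 60 + float(seconds)

# Values copied from the training statistics above.
runtime_seconds = runtime_to_seconds("12:12:53.64")
samples_per_second = 9.002
epochs = 2.0

samples_processed = runtime_seconds * samples_per_second
samples_per_epoch = samples_processed / epochs

print(f"runtime: {runtime_seconds:.2f} s")
print(f"samples processed: {samples_processed:,.0f}")
print(f"samples per epoch: {samples_per_epoch:,.0f}")
```

This gives about 198k samples per epoch, close to but slightly below the 201,981 samples in the dataset mix. The card does not say what accounts for the difference; a held-out evaluation split or length filtering would be plausible, but that is an assumption.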

Open LLM Leaderboard Evaluation Results

Detailed results can be found here

Metric Value
Avg. 55.33
ARC (25-shot) 59.64
HellaSwag (10-shot) 82.25
MMLU (5-shot) 61.33
TruthfulQA (0-shot) 48.45
Winogrande (5-shot) 77.51
GSM8K (5-shot) 8.26
DROP (3-shot) 49.89
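The two averages in this card differ because they cover different benchmark sets: 62.92 is the mean of four scores, while 55.33 is the mean of all seven. A short sketch that recomputes both from the scores listed here; the variable names are illustrative:

```python
# Scores as reported in this card.
scores = {
    "ARC (25-shot)": 59.64,
    "HellaSwag (10-shot)": 82.25,
    "MMLU (5-shot)": 61.33,
    "TruthfulQA (0-shot)": 48.45,
    "Winogrande (5-shot)": 77.51,
    "GSM8K (5-shot)": 8.26,
    "DROP (3-shot)": 49.89,
}

# The earlier table averages only the first four benchmarks.
four_benchmarks = ["ARC (25-shot)", "HellaSwag (10-shot)", "MMLU (5-shot)", "TruthfulQA (0-shot)"]

avg_four = sum(scores[name] for name in four_benchmarks) / len(four_benchmarks)
avg_seven = sum(scores.values()) / len(scores)

print(f"4-benchmark average: {avg_four:.2f}")
print(f"7-benchmark average: {avg_seven:.2f}")
```

The low GSM8K score accounts for most of the drop in the seven-benchmark average.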