---
license: cc-by-nc-4.0
datasets:
  - KnutJaegersberg/WizardLM_evol_instruct_V2_196k_instruct_format
---

Prompt example:

```
### Instruction:
How do you fine tune a large language model?
### Response:
```
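A minimal sketch of using this prompt format for generation with the `transformers` library. The model ID below is a placeholder, not this model's actual repository name, and the generation parameters are illustrative assumptions:

```python
from transformers import AutoModelForCausalLM, AutoTokenizer

# Placeholder model ID -- substitute this model's actual repository name.
model_id = "your-namespace/your-model"

tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(model_id)

# Build the prompt in the ### Instruction / ### Response format shown above.
prompt = (
    "### Instruction:\n"
    "How do you fine tune a large language model?\n"
    "### Response:\n"
)

inputs = tokenizer(prompt, return_tensors="pt")
outputs = model.generate(**inputs, max_new_tokens=256)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```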

# Open LLM Leaderboard Evaluation Results

Detailed results can be found here.

| Metric              | Value |
|---------------------|------:|
| Avg.                | 26.63 |
| ARC (25-shot)       | 26.37 |
| HellaSwag (10-shot) | 38.39 |
| MMLU (5-shot)       | 23.6  |
| TruthfulQA (0-shot) | 41.19 |
| Winogrande (5-shot) | 52.33 |
| GSM8K (5-shot)      | 0.0   |
| DROP (3-shot)       | 4.54  |