---
license: cc-by-4.0
---

Uses the same models as Stheno, but merges them with the SLERP method instead. This is a 13B model.
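
For context, SLERP (spherical linear interpolation) merges two checkpoints by interpolating along the arc between their weight vectors rather than along the straight line, which tends to preserve the geometry of each parent model's weights better than plain averaging. Below is a minimal per-tensor sketch in PyTorch, assuming two checkpoints with identical architectures; the function name `slerp`, the interpolation factor `t`, and the parallel-vector fallback threshold are illustrative, not the actual recipe used for this model:

```python
import torch

def slerp(t: float, w_a: torch.Tensor, w_b: torch.Tensor, eps: float = 1e-8) -> torch.Tensor:
    """Spherically interpolate between two weight tensors at factor t in [0, 1]."""
    a = w_a.flatten().float()
    b = w_b.flatten().float()

    # Angle between the two weight vectors (via normalized dot product)
    a_n = a / (a.norm() + eps)
    b_n = b / (b.norm() + eps)
    omega = torch.arccos(torch.clamp(a_n @ b_n, -1.0, 1.0))
    so = torch.sin(omega)

    if so.abs() < eps:
        # Nearly parallel vectors: fall back to linear interpolation
        merged = (1.0 - t) * a + t * b
    else:
        # Standard SLERP formula along the arc between a and b
        merged = (torch.sin((1.0 - t) * omega) / so) * a + (torch.sin(t * omega) / so) * b

    return merged.reshape(w_a.shape).to(w_a.dtype)
```

In a full merge this would be applied tensor by tensor across both state dicts; the interpolation factor may also vary per layer, and the card does not specify the weighting that was used here.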

# Open LLM Leaderboard Evaluation Results

Detailed results can be found here

| Metric | Value |
|-----------------------|-------|
| Avg. | 52.5 |
| ARC (25-shot) | 61.69 |
| HellaSwag (10-shot) | 84.1 |
| MMLU (5-shot) | 56.77 |
| TruthfulQA (0-shot) | 48.05 |
| Winogrande (5-shot) | 76.4 |
| GSM8K (5-shot) | 12.51 |
| DROP (3-shot) | 28.0 |