File size: 1,748 Bytes
d919e00
 
 
 
 
 
 
 
4637d13
d919e00
 
d204a15
 
b1ef5ad
d204a15
e292511
 
799bf9e
 
fa76e5f
 
 
 
 
 
 
 
 
 
 
 
 
 
72ab9f1
 
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
---
license: other
datasets:
- euclaise/MiniCoT
- euclaise/SciCoT
- euclaise/symtune_mini
- euclaise/gsm8k_self_correct
- euclaise/mathoverflow-accepted
- euirim/goodwiki
---

A pre-finetuning finetuned version of Mistral 7B 0.1, focused on CoT reasoning tasks.

Probably decent at reasoning, but also probably not great as a chat assistant- it's designed to be finetuned further to give it a friendlier style. As such, it is intentionally somewhat undertrained.

Current benchmarks aren't great for instruct models, so I've temporarily omitted them.  I'm working on a benchmark suite for instruct models though, and will update this with scores when that is released.

Uses ChatML prompt formatting.

I reserve no rights to the model.  To the extent possible under law, I release it as public domain.  However, the datasets used have various licenses that may impact how the model may be used in your jurisdiction.
# [Open LLM Leaderboard Evaluation Results](https://huggingface.co/spaces/HuggingFaceH4/open_llm_leaderboard)
Detailed results can be found [here](https://huggingface.co/datasets/open-llm-leaderboard/details_euclaise__Ferret-7B)

| Metric                | Value                     |
|-----------------------|---------------------------|
| Avg.                  | 47.81   |
| ARC (25-shot)         | 62.2          |
| HellaSwag (10-shot)   | 81.75    |
| MMLU (5-shot)         | 60.82         |
| TruthfulQA (0-shot)   | 40.94   |
| Winogrande (5-shot)   | 77.35   |
| GSM8K (5-shot)        | 5.76        |
| DROP (3-shot)         | 5.87         |

I'm not sure what's going on with GSM8K.  Since GSK8K (train split) data was included in the Ferret dataset, I suspect that either it is over-correcting itself or the eval is broken.