results update
README.md CHANGED

@@ -15,7 +15,7 @@ model-index:
         num_few_shot: 0
     metrics:
     - type: inst_level_strict_acc and prompt_level_strict_acc
-      value:
+      value: 56.37
       name: strict accuracy
     source:
       url: >-
@@ -31,7 +31,7 @@ model-index:
         num_few_shot: 3
     metrics:
     - type: acc_norm
-      value:
+      value: 7.21
       name: normalized accuracy
     source:
       url: >-
@@ -47,7 +47,7 @@ model-index:
         num_few_shot: 4
     metrics:
     - type: exact_match
-      value:
+      value: 4.83
       name: exact match
     source:
       url: >-
@@ -63,7 +63,7 @@ model-index:
         num_few_shot: 0
     metrics:
     - type: acc_norm
-      value:
+      value: 1.01
       name: acc_norm
     source:
       url: >-
@@ -79,7 +79,7 @@ model-index:
         num_few_shot: 0
     metrics:
     - type: acc_norm
-      value:
+      value: 3.02
       name: acc_norm
     source:
       url: >-
@@ -97,7 +97,7 @@ model-index:
         num_few_shot: 5
     metrics:
     - type: acc
-      value:
+      value: 6.03
       name: accuracy
     source:
       url: >-
@@ -153,14 +153,14 @@ As the model is still in training, performance and capabilities may vary. Users
 The Model is designed to be used as a smart assistant but not as a knowledge source within your applications, systems, or environments. It is not intended to provide 100% accurate answers, especially in scenarios where high precision and accuracy are crucial.
 
 # [Open LLM Leaderboard Evaluation Results](https://huggingface.co/spaces/open-llm-leaderboard/open_llm_leaderboard)
-Detailed results can be found [here](https://huggingface.co/datasets/open-llm-leaderboard/
+Detailed results can be found [here](https://huggingface.co/datasets/open-llm-leaderboard/meditsolutions__Llama-3.2-SUN-2.4B-v1.0.0-details)
 
 | Metric             |Value|
 |-------------------|----:|
-|Avg.               |
-|IFEval (0-Shot)    |
-|BBH (3-Shot)       |
-|MATH Lvl 5 (4-Shot)|
-|GPQA (0-shot)      |
-|MuSR (0-shot)      |
-|MMLU-PRO (5-shot)  |
+|Avg.               |13.08|
+|IFEval (0-Shot)    |56.37|
+|BBH (3-Shot)       | 7.21|
+|MATH Lvl 5 (4-Shot)| 4.83|
+|GPQA (0-shot)      | 1.01|
+|MuSR (0-shot)      | 3.02|
+|MMLU-PRO (5-shot)  | 6.03|
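The `Avg.` row added in this commit appears to be the plain arithmetic mean of the six benchmark scores. A quick sanity check (assuming that definition of the average; the values are taken verbatim from the diff above):

```python
# Reproduce the leaderboard "Avg." from the six per-benchmark scores
# added in this commit. Assumes Avg. is an unweighted arithmetic mean.
scores = {
    "IFEval (0-Shot)": 56.37,
    "BBH (3-Shot)": 7.21,
    "MATH Lvl 5 (4-Shot)": 4.83,
    "GPQA (0-shot)": 1.01,
    "MuSR (0-shot)": 3.02,
    "MMLU-PRO (5-shot)": 6.03,
}

avg = round(sum(scores.values()) / len(scores), 2)
print(avg)  # 13.08, matching the Avg. row in the table
```

The rounded mean matches the 13.08 reported in the table, which supports the unweighted-mean interpretation.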