mkurman committed on
Commit a81b50a · verified · 1 Parent(s): f230161

results update

Files changed (1):
  1. README.md +14 -14
README.md CHANGED
@@ -15,7 +15,7 @@ model-index:
         num_few_shot: 0
     metrics:
     - type: inst_level_strict_acc and prompt_level_strict_acc
-      value: 53.89
+      value: 56.37
       name: strict accuracy
     source:
       url: >-
@@ -31,7 +31,7 @@ model-index:
         num_few_shot: 3
     metrics:
     - type: acc_norm
-      value: 6.46
+      value: 7.21
       name: normalized accuracy
     source:
       url: >-
@@ -47,7 +47,7 @@ model-index:
         num_few_shot: 4
     metrics:
     - type: exact_match
-      value: 3.25
+      value: 4.83
       name: exact match
     source:
       url: >-
@@ -63,7 +63,7 @@ model-index:
         num_few_shot: 0
     metrics:
     - type: acc_norm
-      value: 0
+      value: 1.01
       name: acc_norm
     source:
       url: >-
@@ -79,7 +79,7 @@ model-index:
         num_few_shot: 0
     metrics:
    - type: acc_norm
-      value: 2.38
+      value: 3.02
       name: acc_norm
     source:
       url: >-
@@ -97,7 +97,7 @@ model-index:
         num_few_shot: 5
     metrics:
     - type: acc
-      value: 5.91
+      value: 6.03
       name: accuracy
     source:
       url: >-
@@ -153,14 +153,14 @@ As the model is still in training, performance and capabilities may vary. Users
 The Model is designed to be used as a smart assistant but not as a knowledge source within your applications, systems, or environments. It is not intended to provide 100% accurate answers, especially in scenarios where high precision and accuracy are crucial.
 
 # [Open LLM Leaderboard Evaluation Results](https://huggingface.co/spaces/open-llm-leaderboard/open_llm_leaderboard)
-Detailed results can be found [here](https://huggingface.co/datasets/open-llm-leaderboard/details_meditsolutions__Llama-3.2-SUN-2.4B-v1.0.0)
+Detailed results can be found [here](https://huggingface.co/datasets/open-llm-leaderboard/meditsolutions__Llama-3.2-SUN-2.4B-v1.0.0-details)
 
 | Metric |Value|
 |-------------------|----:|
-|Avg. |11.98|
-|IFEval (0-Shot) |53.89|
-|BBH (3-Shot) | 6.46|
-|MATH Lvl 5 (4-Shot)| 3.25|
-|GPQA (0-shot) | 0.00|
-|MuSR (0-shot) | 2.38|
-|MMLU-PRO (5-shot) | 5.91|
+|Avg. |13.08|
+|IFEval (0-Shot) |56.37|
+|BBH (3-Shot) | 7.21|
+|MATH Lvl 5 (4-Shot)| 4.83|
+|GPQA (0-shot) | 1.01|
+|MuSR (0-shot) | 3.02|
+|MMLU-PRO (5-shot) | 6.03|
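
The updated Avg. row is consistent with the six benchmark scores: assuming it is the plain unweighted arithmetic mean (as the Open LLM Leaderboard reports it), a quick sanity check in Python:

```python
# Scores from the updated leaderboard table above.
scores = {
    "IFEval (0-Shot)": 56.37,
    "BBH (3-Shot)": 7.21,
    "MATH Lvl 5 (4-Shot)": 4.83,
    "GPQA (0-shot)": 1.01,
    "MuSR (0-shot)": 3.02,
    "MMLU-PRO (5-shot)": 6.03,
}

# Unweighted mean across the six benchmarks.
average = sum(scores.values()) / len(scores)
print(f"Avg. {average:.2f}")  # -> Avg. 13.08, matching the table
```

The same check against the old values (53.89, 6.46, 3.25, 0.00, 2.38, 5.91) reproduces the previous Avg. of 11.98.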