MaziyarPanahi leaderboard-pr-bot committed on
Commit
15fc8b0
1 Parent(s): 2c841db

Adding Evaluation Results (#4)


- Adding Evaluation Results (dbf7b24deb10527df449eaa41e2c5819c9cbd463)


Co-authored-by: Open LLM Leaderboard PR Bot <[email protected]>

Files changed (1)
  1. README.md +116 -8
README.md CHANGED
@@ -2,7 +2,8 @@
 language:
 - fr
 - en
-pipeline_tag: text-generation
+license: llama3.2
+library_name: transformers
 tags:
 - chat
 - llama
@@ -11,15 +12,109 @@ tags:
 - french
 - legal
 - loi
-library_name: transformers
-inference: false
-model_creator: MaziyarPanahi
-quantized_by: MaziyarPanahi
 base_model: meta-llama/Llama-3.2-3B
-model_name: calme-3.1-llamaloi-3b
 datasets:
 - MaziyarPanahi/calme-legalkit-v0.2
-license: llama3.2
+model_name: calme-3.1-llamaloi-3b
+pipeline_tag: text-generation
+inference: false
+model_creator: MaziyarPanahi
+quantized_by: MaziyarPanahi
+model-index:
+- name: calme-3.1-llamaloi-3b
+  results:
+  - task:
+      type: text-generation
+      name: Text Generation
+    dataset:
+      name: IFEval (0-Shot)
+      type: HuggingFaceH4/ifeval
+      args:
+        num_few_shot: 0
+    metrics:
+    - type: inst_level_strict_acc and prompt_level_strict_acc
+      value: 73.75
+      name: strict accuracy
+    source:
+      url: https://huggingface.co/spaces/open-llm-leaderboard/open_llm_leaderboard?query=MaziyarPanahi/calme-3.1-llamaloi-3b
+      name: Open LLM Leaderboard
+  - task:
+      type: text-generation
+      name: Text Generation
+    dataset:
+      name: BBH (3-Shot)
+      type: BBH
+      args:
+        num_few_shot: 3
+    metrics:
+    - type: acc_norm
+      value: 23.77
+      name: normalized accuracy
+    source:
+      url: https://huggingface.co/spaces/open-llm-leaderboard/open_llm_leaderboard?query=MaziyarPanahi/calme-3.1-llamaloi-3b
+      name: Open LLM Leaderboard
+  - task:
+      type: text-generation
+      name: Text Generation
+    dataset:
+      name: MATH Lvl 5 (4-Shot)
+      type: hendrycks/competition_math
+      args:
+        num_few_shot: 4
+    metrics:
+    - type: exact_match
+      value: 16.77
+      name: exact match
+    source:
+      url: https://huggingface.co/spaces/open-llm-leaderboard/open_llm_leaderboard?query=MaziyarPanahi/calme-3.1-llamaloi-3b
+      name: Open LLM Leaderboard
+  - task:
+      type: text-generation
+      name: Text Generation
+    dataset:
+      name: GPQA (0-shot)
+      type: Idavidrein/gpqa
+      args:
+        num_few_shot: 0
+    metrics:
+    - type: acc_norm
+      value: 4.14
+      name: acc_norm
+    source:
+      url: https://huggingface.co/spaces/open-llm-leaderboard/open_llm_leaderboard?query=MaziyarPanahi/calme-3.1-llamaloi-3b
+      name: Open LLM Leaderboard
+  - task:
+      type: text-generation
+      name: Text Generation
+    dataset:
+      name: MuSR (0-shot)
+      type: TAUR-Lab/MuSR
+      args:
+        num_few_shot: 0
+    metrics:
+    - type: acc_norm
+      value: 1.11
+      name: acc_norm
+    source:
+      url: https://huggingface.co/spaces/open-llm-leaderboard/open_llm_leaderboard?query=MaziyarPanahi/calme-3.1-llamaloi-3b
+      name: Open LLM Leaderboard
+  - task:
+      type: text-generation
+      name: Text Generation
+    dataset:
+      name: MMLU-PRO (5-shot)
+      type: TIGER-Lab/MMLU-Pro
+      config: main
+      split: test
+      args:
+        num_few_shot: 5
+    metrics:
+    - type: acc
+      value: 24.5
+      name: accuracy
+    source:
+      url: https://huggingface.co/spaces/open-llm-leaderboard/open_llm_leaderboard?query=MaziyarPanahi/calme-3.1-llamaloi-3b
+      name: Open LLM Leaderboard
 ---
 
 <img src="./calme_3.png" alt="Calme-3 Models" width="800" style="margin-left:'auto' margin-right:'auto' display:'block'"/>
@@ -86,4 +181,17 @@ model = AutoModelForCausalLM.from_pretrained("MaziyarPanahi/calme-3.1-llamaloi-3
 
 # Ethical Considerations
 
-As with any large language model, users should be aware of potential biases and limitations. We recommend implementing appropriate safeguards and human oversight when deploying this model in production environments.
+As with any large language model, users should be aware of potential biases and limitations. We recommend implementing appropriate safeguards and human oversight when deploying this model in production environments.
+# [Open LLM Leaderboard Evaluation Results](https://huggingface.co/spaces/open-llm-leaderboard/open_llm_leaderboard)
+Detailed results can be found [here](https://huggingface.co/datasets/open-llm-leaderboard/details_MaziyarPanahi__calme-3.1-llamaloi-3b)
+
+| Metric |Value|
+|-------------------|----:|
+|Avg. |24.01|
+|IFEval (0-Shot) |73.75|
+|BBH (3-Shot) |23.77|
+|MATH Lvl 5 (4-Shot)|16.77|
+|GPQA (0-shot) | 4.14|
+|MuSR (0-shot) | 1.11|
+|MMLU-PRO (5-shot) |24.50|
+
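For reference, the Avg. row in the results table is just the arithmetic mean of the six benchmark scores. A quick sketch to check it (plain Python, no external dependencies):

```python
# Benchmark scores reported by the Open LLM Leaderboard for
# calme-3.1-llamaloi-3b (same values as the table above).
scores = {
    "IFEval (0-Shot)": 73.75,
    "BBH (3-Shot)": 23.77,
    "MATH Lvl 5 (4-Shot)": 16.77,
    "GPQA (0-shot)": 4.14,
    "MuSR (0-shot)": 1.11,
    "MMLU-PRO (5-shot)": 24.50,
}

# Unweighted mean across the six benchmarks, rounded to two decimals.
avg = round(sum(scores.values()) / len(scores), 2)
print(avg)  # 24.01
```

The mean of the six values is 24.0066…, which rounds to the 24.01 shown in the Avg. row.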
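The metadata changes above all live in the README's YAML front matter, which sits between the two `---` delimiters at the top of the file. A minimal sketch of splitting a model card into front matter and Markdown body (the helper name and the sample card string are illustrative, not part of the repository):

```python
# Hypothetical helper: split a model-card README into its YAML front
# matter and the Markdown body, using the "---" delimiter convention.
def split_front_matter(text: str):
    if not text.startswith("---\n"):
        return "", text  # no front matter present
    # The front matter ends at the next line that is exactly "---".
    end = text.index("\n---\n", 3)
    return text[4:end + 1], text[end + 5:]

# Illustrative miniature of the card in this commit.
card = """---
license: llama3.2
base_model: meta-llama/Llama-3.2-3B
---

# Ethical Considerations
"""

meta, body = split_front_matter(card)
print(meta.splitlines()[0])  # license: llama3.2
```

A YAML parser (e.g. PyYAML) would then turn `meta` into the structured mapping that the Hub reads the `model-index` results from.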