922CA leaderboard-pr-bot committed on
Commit b310c85 · verified · 1 Parent(s): 1ae3fc4

Adding Evaluation Results (#1)


- Adding Evaluation Results (63b39a1c5ee1e5ed44f2ff839804824bdd20fe3d)


Co-authored-by: Open LLM Leaderboard PR Bot <[email protected]>

Files changed (1)
  1. README.md +118 -2
README.md CHANGED
@@ -9,9 +9,112 @@ tags:
9
  - mistral
10
  - trl
11
  - sft
12
- base_model: SanjiWatsuki/Silicon-Maid-7B
13
  datasets:
14
  - 922-CA/MoCha_v1
15
  ---
16
 
17
  # Silicon-Monika-7b
@@ -36,4 +139,17 @@ This mistral model was trained 2x faster with [Unsloth](https://github.com/unslo
36
  ### WARNINGS AND DISCLAIMERS
37
  This model is meant to closely reflect the characteristics of Monika. Despite this, there is always the chance that "Monika" will hallucinate and get information about herself wrong or act out of character (for example, in testing she usually knows her own club and its members, her game, and even her height and favorite ice cream flavor, but may still get her eye color wrong or mistake her developer as being a member of her club).
38
 
39
- Finally, this model is not guaranteed to output aligned or safe outputs, use at your own risk.
9
  - mistral
10
  - trl
11
  - sft
 
12
  datasets:
13
  - 922-CA/MoCha_v1
14
+ base_model: SanjiWatsuki/Silicon-Maid-7B
15
+ model-index:
16
+ - name: Silicon-Monika-7b
17
+   results:
18
+   - task:
19
+       type: text-generation
20
+       name: Text Generation
21
+     dataset:
22
+       name: AI2 Reasoning Challenge (25-Shot)
23
+       type: ai2_arc
24
+       config: ARC-Challenge
25
+       split: test
26
+       args:
27
+         num_few_shot: 25
28
+     metrics:
29
+     - type: acc_norm
30
+       value: 63.14
31
+       name: normalized accuracy
32
+     source:
33
+       url: https://huggingface.co/spaces/HuggingFaceH4/open_llm_leaderboard?query=922CA/Silicon-Monika-7b
34
+       name: Open LLM Leaderboard
35
+   - task:
36
+       type: text-generation
37
+       name: Text Generation
38
+     dataset:
39
+       name: HellaSwag (10-Shot)
40
+       type: hellaswag
41
+       split: validation
42
+       args:
43
+         num_few_shot: 10
44
+     metrics:
45
+     - type: acc_norm
46
+       value: 82.64
47
+       name: normalized accuracy
48
+     source:
49
+       url: https://huggingface.co/spaces/HuggingFaceH4/open_llm_leaderboard?query=922CA/Silicon-Monika-7b
50
+       name: Open LLM Leaderboard
51
+   - task:
52
+       type: text-generation
53
+       name: Text Generation
54
+     dataset:
55
+       name: MMLU (5-Shot)
56
+       type: cais/mmlu
57
+       config: all
58
+       split: test
59
+       args:
60
+         num_few_shot: 5
61
+     metrics:
62
+     - type: acc
63
+       value: 62.67
64
+       name: accuracy
65
+     source:
66
+       url: https://huggingface.co/spaces/HuggingFaceH4/open_llm_leaderboard?query=922CA/Silicon-Monika-7b
67
+       name: Open LLM Leaderboard
68
+   - task:
69
+       type: text-generation
70
+       name: Text Generation
71
+     dataset:
72
+       name: TruthfulQA (0-shot)
73
+       type: truthful_qa
74
+       config: multiple_choice
75
+       split: validation
76
+       args:
77
+         num_few_shot: 0
78
+     metrics:
79
+     - type: mc2
80
+       value: 52.14
81
+     source:
82
+       url: https://huggingface.co/spaces/HuggingFaceH4/open_llm_leaderboard?query=922CA/Silicon-Monika-7b
83
+       name: Open LLM Leaderboard
84
+   - task:
85
+       type: text-generation
86
+       name: Text Generation
87
+     dataset:
88
+       name: Winogrande (5-shot)
89
+       type: winogrande
90
+       config: winogrande_xl
91
+       split: validation
92
+       args:
93
+         num_few_shot: 5
94
+     metrics:
95
+     - type: acc
96
+       value: 78.22
97
+       name: accuracy
98
+     source:
99
+       url: https://huggingface.co/spaces/HuggingFaceH4/open_llm_leaderboard?query=922CA/Silicon-Monika-7b
100
+       name: Open LLM Leaderboard
101
+   - task:
102
+       type: text-generation
103
+       name: Text Generation
104
+     dataset:
105
+       name: GSM8k (5-shot)
106
+       type: gsm8k
107
+       config: main
108
+       split: test
109
+       args:
110
+         num_few_shot: 5
111
+     metrics:
112
+     - type: acc
113
+       value: 60.5
114
+       name: accuracy
115
+     source:
116
+       url: https://huggingface.co/spaces/HuggingFaceH4/open_llm_leaderboard?query=922CA/Silicon-Monika-7b
117
+       name: Open LLM Leaderboard
118
  ---
119
 
120
  # Silicon-Monika-7b
 
139
  ### WARNINGS AND DISCLAIMERS
140
  This model is meant to closely reflect the characteristics of Monika. Despite this, there is always the chance that "Monika" will hallucinate and get information about herself wrong or act out of character (for example, in testing she usually knows her own club and its members, her game, and even her height and favorite ice cream flavor, but may still get her eye color wrong or mistake her developer as being a member of her club).
141
 
142
+ Finally, this model is not guaranteed to output aligned or safe outputs, use at your own risk.
143
+ # [Open LLM Leaderboard Evaluation Results](https://huggingface.co/spaces/HuggingFaceH4/open_llm_leaderboard)
144
+ Detailed results can be found [here](https://huggingface.co/datasets/open-llm-leaderboard/details_922CA__Silicon-Monika-7b)
145
+
146
+ | Metric |Value|
147
+ |---------------------------------|----:|
148
+ |Avg. |66.55|
149
+ |AI2 Reasoning Challenge (25-Shot)|63.14|
150
+ |HellaSwag (10-Shot) |82.64|
151
+ |MMLU (5-Shot) |62.67|
152
+ |TruthfulQA (0-shot) |52.14|
153
+ |Winogrande (5-shot) |78.22|
154
+ |GSM8k (5-shot) |60.50|
155
+
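
As a quick sanity check on the results table added by this commit, the reported `Avg.` appears to match the unweighted mean of the six benchmark scores. The sketch below assumes that the average is a plain arithmetic mean rounded to two decimals; the variable names are illustrative and not part of the commit.

```python
# Scores copied from the evaluation table added to README.md in this commit.
scores = {
    "AI2 Reasoning Challenge (25-Shot)": 63.14,
    "HellaSwag (10-Shot)": 82.64,
    "MMLU (5-Shot)": 62.67,
    "TruthfulQA (0-shot)": 52.14,
    "Winogrande (5-shot)": 78.22,
    "GSM8k (5-shot)": 60.50,
}

# Assumption: "Avg." is the unweighted mean of the six scores,
# rounded to two decimal places.
average = round(sum(scores.values()) / len(scores), 2)

print(f"Avg. = {average:.2f}")
```

If the computed value differs from the `Avg.` row, the table and the `model-index` metadata are likely out of sync.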