mpasila committed on
Commit
14acb73
1 Parent(s): 45bbaf1

Update README.md

  1. README.md +40 -3
README.md CHANGED
@@ -31,8 +31,8 @@ It uses Alpaca format but with a translated instruction at the start:

  | Model | Size | Type | FIN-bench (score) |
  |-------|------|------|-------|
- | **mpasila/Finnish-Alpaca-Small-7B** | 7B | Instruct | |
- | mpasila/Finnish-Alpaca-Tiny-V2-7B | 7B | Instruct | **0.4654** |
  | [mpasila/Alpacazord-Viking-7B](https://huggingface.co/mpasila/Alpacazord-Viking-7B) | 7B | Instruct | 0.4123 |
  | [mpasila/NordicAlpaca-Finnish-V1-7B](https://huggingface.co/mpasila/NordicAlpaca-Finnish-V1-7B) | 7B | Instruct | 0.3891 |
  | [mpasila/Finnish-Viking-Alpaca-V1-7B](https://huggingface.co/mpasila/Finnish-Viking-Alpaca-V1-7B) | 7B | Instruct | 0.3943 |
@@ -46,7 +46,44 @@ It uses Alpaca format but with a translated instruction at the start:

  #### FIN-bench scores:

- To be added.

  # Uploaded model


  | Model | Size | Type | FIN-bench (score) |
  |-------|------|------|-------|
+ | **mpasila/Finnish-Alpaca-Small-7B** | 7B | Instruct | **0.4677** |
+ | [mpasila/Finnish-Alpaca-Tiny-V2-7B](https://huggingface.co/mpasila/Finnish-Alpaca-Tiny-V2-7B) | 7B | Instruct | 0.4654 |
  | [mpasila/Alpacazord-Viking-7B](https://huggingface.co/mpasila/Alpacazord-Viking-7B) | 7B | Instruct | 0.4123 |
  | [mpasila/NordicAlpaca-Finnish-V1-7B](https://huggingface.co/mpasila/NordicAlpaca-Finnish-V1-7B) | 7B | Instruct | 0.3891 |
  | [mpasila/Finnish-Viking-Alpaca-V1-7B](https://huggingface.co/mpasila/Finnish-Viking-Alpaca-V1-7B) | 7B | Instruct | 0.3943 |
 

  #### FIN-bench scores:

+ | Task |Version| Metric |Value | |Stderr|
+ |------------------------------------------------|------:|---------------------|-----:|---|-----:|
+ |bigbench_analogies | 0|multiple_choice_grade|0.5308|± |0.0439|
+ |bigbench_arithmetic_1_digit_addition | 0|multiple_choice_grade|0.5000|± |0.0503|
+ |bigbench_arithmetic_1_digit_division | 0|multiple_choice_grade|0.8261|± |0.0808|
+ |bigbench_arithmetic_1_digit_multiplication | 0|multiple_choice_grade|0.4600|± |0.0501|
+ |bigbench_arithmetic_1_digit_subtraction | 0|multiple_choice_grade|0.6000|± |0.0492|
+ |bigbench_arithmetic_2_digit_addition | 0|multiple_choice_grade|0.3800|± |0.0488|
+ |bigbench_arithmetic_2_digit_division | 0|multiple_choice_grade|0.5200|± |0.0502|
+ |bigbench_arithmetic_2_digit_multiplication | 0|multiple_choice_grade|0.2800|± |0.0451|
+ |bigbench_arithmetic_2_digit_subtraction | 0|multiple_choice_grade|0.5100|± |0.0502|
+ |bigbench_arithmetic_3_digit_addition | 0|multiple_choice_grade|0.5600|± |0.0499|
+ |bigbench_arithmetic_3_digit_division | 0|multiple_choice_grade|0.3800|± |0.0488|
+ |bigbench_arithmetic_3_digit_multiplication | 0|multiple_choice_grade|0.2700|± |0.0446|
+ |bigbench_arithmetic_3_digit_subtraction | 0|multiple_choice_grade|0.5400|± |0.0501|
+ |bigbench_arithmetic_4_digit_addition | 0|multiple_choice_grade|0.5400|± |0.0501|
+ |bigbench_arithmetic_4_digit_division | 0|multiple_choice_grade|0.4000|± |0.0492|
+ |bigbench_arithmetic_4_digit_multiplication | 0|multiple_choice_grade|0.3300|± |0.0473|
+ |bigbench_arithmetic_4_digit_subtraction | 0|multiple_choice_grade|0.6100|± |0.0490|
+ |bigbench_arithmetic_5_digit_addition | 0|multiple_choice_grade|0.6500|± |0.0479|
+ |bigbench_arithmetic_5_digit_division | 0|multiple_choice_grade|0.3300|± |0.0473|
+ |bigbench_arithmetic_5_digit_multiplication | 0|multiple_choice_grade|0.3200|± |0.0469|
+ |bigbench_arithmetic_5_digit_subtraction | 0|multiple_choice_grade|0.6500|± |0.0479|
+ |bigbench_cause_and_effect_one_sentence | 0|multiple_choice_grade|0.5490|± |0.0704|
+ |bigbench_cause_and_effect_one_sentence_no_prompt| 0|multiple_choice_grade|0.6471|± |0.0676|
+ |bigbench_cause_and_effect_two_sentences | 0|multiple_choice_grade|0.4314|± |0.0700|
+ |bigbench_emotions | 0|multiple_choice_grade|0.3500|± |0.0378|
+ |bigbench_empirical_judgments | 0|multiple_choice_grade|0.3131|± |0.0468|
+ |bigbench_general_knowledge | 0|multiple_choice_grade|0.2429|± |0.0516|
+ |bigbench_hhh_alignment_harmless | 0|multiple_choice_grade|0.3793|± |0.0643|
+ |bigbench_hhh_alignment_helpful | 0|multiple_choice_grade|0.3559|± |0.0629|
+ |bigbench_hhh_alignment_honest | 0|multiple_choice_grade|0.3559|± |0.0629|
+ |bigbench_hhh_alignment_other | 0|multiple_choice_grade|0.5581|± |0.0766|
+ |bigbench_intent_recognition | 0|multiple_choice_grade|0.2240|± |0.0159|
+ |bigbench_misconceptions | 0|multiple_choice_grade|0.5373|± |0.0432|
+ |bigbench_paraphrase | 0|multiple_choice_grade|0.5000|± |0.0354|
+ |bigbench_sentence_ambiguity | 0|multiple_choice_grade|0.4833|± |0.0651|
+ |bigbench_similarities_abstraction | 0|multiple_choice_grade|0.7237|± |0.0516|

  # Uploaded model
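
As a sanity check on this change: the headline FIN-bench score added for `mpasila/Finnish-Alpaca-Small-7B` (0.4677) is consistent with the unweighted mean of the 36 per-task values in the new table. The sketch below reproduces that check. It assumes plain averaging over tasks, which the README does not state explicitly, and the names `TABLE`, `parse_values`, `scores`, and `mean_score` are illustrative only.

```python
# Per-task rows copied from the FIN-bench table added in this commit
# (diff "+" prefixes removed). Assumption: the headline score is the plain,
# unweighted mean of the "Value" column; the README does not say so.
TABLE = """
|bigbench_analogies | 0|multiple_choice_grade|0.5308|± |0.0439|
|bigbench_arithmetic_1_digit_addition | 0|multiple_choice_grade|0.5000|± |0.0503|
|bigbench_arithmetic_1_digit_division | 0|multiple_choice_grade|0.8261|± |0.0808|
|bigbench_arithmetic_1_digit_multiplication | 0|multiple_choice_grade|0.4600|± |0.0501|
|bigbench_arithmetic_1_digit_subtraction | 0|multiple_choice_grade|0.6000|± |0.0492|
|bigbench_arithmetic_2_digit_addition | 0|multiple_choice_grade|0.3800|± |0.0488|
|bigbench_arithmetic_2_digit_division | 0|multiple_choice_grade|0.5200|± |0.0502|
|bigbench_arithmetic_2_digit_multiplication | 0|multiple_choice_grade|0.2800|± |0.0451|
|bigbench_arithmetic_2_digit_subtraction | 0|multiple_choice_grade|0.5100|± |0.0502|
|bigbench_arithmetic_3_digit_addition | 0|multiple_choice_grade|0.5600|± |0.0499|
|bigbench_arithmetic_3_digit_division | 0|multiple_choice_grade|0.3800|± |0.0488|
|bigbench_arithmetic_3_digit_multiplication | 0|multiple_choice_grade|0.2700|± |0.0446|
|bigbench_arithmetic_3_digit_subtraction | 0|multiple_choice_grade|0.5400|± |0.0501|
|bigbench_arithmetic_4_digit_addition | 0|multiple_choice_grade|0.5400|± |0.0501|
|bigbench_arithmetic_4_digit_division | 0|multiple_choice_grade|0.4000|± |0.0492|
|bigbench_arithmetic_4_digit_multiplication | 0|multiple_choice_grade|0.3300|± |0.0473|
|bigbench_arithmetic_4_digit_subtraction | 0|multiple_choice_grade|0.6100|± |0.0490|
|bigbench_arithmetic_5_digit_addition | 0|multiple_choice_grade|0.6500|± |0.0479|
|bigbench_arithmetic_5_digit_division | 0|multiple_choice_grade|0.3300|± |0.0473|
|bigbench_arithmetic_5_digit_multiplication | 0|multiple_choice_grade|0.3200|± |0.0469|
|bigbench_arithmetic_5_digit_subtraction | 0|multiple_choice_grade|0.6500|± |0.0479|
|bigbench_cause_and_effect_one_sentence | 0|multiple_choice_grade|0.5490|± |0.0704|
|bigbench_cause_and_effect_one_sentence_no_prompt| 0|multiple_choice_grade|0.6471|± |0.0676|
|bigbench_cause_and_effect_two_sentences | 0|multiple_choice_grade|0.4314|± |0.0700|
|bigbench_emotions | 0|multiple_choice_grade|0.3500|± |0.0378|
|bigbench_empirical_judgments | 0|multiple_choice_grade|0.3131|± |0.0468|
|bigbench_general_knowledge | 0|multiple_choice_grade|0.2429|± |0.0516|
|bigbench_hhh_alignment_harmless | 0|multiple_choice_grade|0.3793|± |0.0643|
|bigbench_hhh_alignment_helpful | 0|multiple_choice_grade|0.3559|± |0.0629|
|bigbench_hhh_alignment_honest | 0|multiple_choice_grade|0.3559|± |0.0629|
|bigbench_hhh_alignment_other | 0|multiple_choice_grade|0.5581|± |0.0766|
|bigbench_intent_recognition | 0|multiple_choice_grade|0.2240|± |0.0159|
|bigbench_misconceptions | 0|multiple_choice_grade|0.5373|± |0.0432|
|bigbench_paraphrase | 0|multiple_choice_grade|0.5000|± |0.0354|
|bigbench_sentence_ambiguity | 0|multiple_choice_grade|0.4833|± |0.0651|
|bigbench_similarities_abstraction | 0|multiple_choice_grade|0.7237|± |0.0516|
"""


def parse_values(table: str) -> dict[str, float]:
    """Map each task name to the number in its "Value" column."""
    values = {}
    for line in table.strip().splitlines():
        # Drop the outer pipes, then split into the six cells:
        # task, version, metric, value, "±", stderr
        cells = [c.strip() for c in line.strip().strip("|").split("|")]
        values[cells[0]] = float(cells[3])
    return values


scores = parse_values(TABLE)
mean_score = sum(scores.values()) / len(scores)
print(f"{len(scores)} tasks, mean = {mean_score:.4f}")
```

Run as-is, this prints `36 tasks, mean = 0.4677`, which agrees with the value entered in the model comparison table. The same check cannot be repeated here for the other models, since their per-task tables are not part of this commit.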