Commit 5aa3a09 by qiyang-zhao
1 parent: 4d9a9de

Update README.md

Files changed (1)
  1. README.md +55 -2
README.md CHANGED
@@ -71,8 +71,61 @@ python run_inference.py -m models/Falcon3-1B-1.58bit/ggml-model-i2_s.gguf -p "Yo
  ```
 
  # Evaluation
-
- Coming soon ..
+ We report in the following table our internal pipeline benchmarks:
+
+ **Note evaluation results are normalized score from v2 leaderboard tasks - reported results of original models in the blogpost are raw scores**
+
+ <table border="1" style="width: 100%; text-align: center; border-collapse: collapse;">
+ <colgroup>
+ <col style="width: 10%;">
+ <col style="width: 10%;">
+ <col style="background-color: rgba(80, 15, 213, 0.5); width: 7%;">
+ </colgroup>
+ <thead>
+ <tr>
+ <th>Benchmark</th>
+ <th>Llama3-8B-1.58-100B-tokens</th>
+ <th>Falcon3-1B-Instruct-1.58bit</th>
+ </tr>
+ </thead>
+ <tbody>
+ <tr>
+ <td>IFEval</td>
+ <td>17.91</td>
+ <td><b>44.5</b></td>
+ </tr>
+ <tr>
+ <td>MUSR</td>
+ <td>4.87</td>
+ <td><b>2.78</b></td>
+ </tr>
+ <tr>
+ <td>GPQA</td>
+ <td>1.83</td>
+ <td><b>0</b></td>
+ </tr>
+ <tr>
+ <td>BBH</td>
+ <td><b>5.36</b></td>
+ <td>2.24</td>
+ </tr>
+ <tr>
+ <td>MMLU-PRO</td>
+ <td><b>2.78</b></td>
+ <td>1.93</td>
+ </tr>
+ <tr>
+ <td>MATH</td>
+ <td>0.26</td>
+ <td><b>0.17</b></td>
+ </tr>
+ <tr>
+ <td>Average</td>
+ <td>5.5</td>
+ <td><b>8.6</b></td>
+ </tr>
+ </tbody>
+ </table>
 
  ## Useful links
  - View our [release blogpost](https://huggingface.co/blog/falcon3).
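
The Average row added in this commit can be cross-checked with a few lines of Python. The sketch below is not part of the commit. It copies the six benchmark scores from the table and assumes the average is a plain unweighted mean rounded to one decimal, and that a higher normalized score is better. The helper names `column_averages` and `higher_scoring` are hypothetical, chosen only for this illustration.

```python
# Scores copied from the table added in this commit:
# (Llama3-8B-1.58-100B-tokens, Falcon3-1B-Instruct-1.58bit)
MODELS = ("Llama3-8B-1.58-100B-tokens", "Falcon3-1B-Instruct-1.58bit")

SCORES = {
    "IFEval": (17.91, 44.5),
    "MUSR": (4.87, 2.78),
    "GPQA": (1.83, 0.0),
    "BBH": (5.36, 2.24),
    "MMLU-PRO": (2.78, 1.93),
    "MATH": (0.26, 0.17),
}


def column_averages(table):
    """Unweighted mean per model, rounded to one decimal (assumed convention)."""
    n = len(table)
    return tuple(
        round(sum(row[i] for row in table.values()) / n, 1)
        for i in range(len(MODELS))
    )


def higher_scoring(table):
    """Map each benchmark to the model with the higher score (assumes higher is better)."""
    return {
        bench: MODELS[0] if llama > falcon else MODELS[1]
        for bench, (llama, falcon) in table.items()
    }


if __name__ == "__main__":
    print(dict(zip(MODELS, column_averages(SCORES))))
    for bench, model in higher_scoring(SCORES).items():
        print(f"{bench}: {model}")
```

Under those assumptions the computed means match the Average row (5.5 and 8.6). The per-benchmark output can be compared against the `<b>` markup in the table to see which cells the bolding highlights.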