qiyang-zhao committed on
Commit 4b91549 (1 parent: d46cb97)

Update README.md

Files changed (1)
  1. README.md +55 -2
README.md CHANGED
@@ -72,8 +72,61 @@ python run_inference.py -m models/Falcon3-10B-1.58bit/ggml-model-i2_s.gguf -p "Y
  ```
 
  # Evaluation
-
- Coming soon ..
+ The following table reports benchmark results from our internal evaluation pipeline:
+
+ **Note: evaluation results are normalized scores on the v2 leaderboard tasks; the results reported for the original models in the blogpost are raw scores.**
+
+ <table border="1" style="width: 100%; text-align: center; border-collapse: collapse;">
+ <colgroup>
+ <col style="width: 10%;">
+ <col style="width: 10%;">
+ <col style="background-color: rgba(80, 15, 213, 0.5); width: 7%;">
+ </colgroup>
+ <thead>
+ <tr>
+ <th>Benchmark</th>
+ <th>Llama3-8B-1.58-100B-tokens</th>
+ <th>Falcon3-10B-Instruct-1.58bit</th>
+ </tr>
+ </thead>
+ <tbody>
+ <tr>
+ <td>IFEval</td>
+ <td>17.91</td>
+ <td><b>54.37</b></td>
+ </tr>
+ <tr>
+ <td>MUSR</td>
+ <td><b>4.87</b></td>
+ <td>2.57</td>
+ </tr>
+ <tr>
+ <td>GPQA</td>
+ <td>1.83</td>
+ <td><b>4.27</b></td>
+ </tr>
+ <tr>
+ <td>BBH</td>
+ <td>5.36</td>
+ <td><b>6.59</b></td>
+ </tr>
+ <tr>
+ <td>MMLU-PRO</td>
+ <td>2.78</td>
+ <td><b>6.62</b></td>
+ </tr>
+ <tr>
+ <td>MATH</td>
+ <td>0.26</td>
+ <td><b>2.44</b></td>
+ </tr>
+ <tr>
+ <td>Average</td>
+ <td>5.5</td>
+ <td><b>12.81</b></td>
+ </tr>
+ </tbody>
+ </table>
 
  ## Useful links
  - View our [release blogpost](https://huggingface.co/blog/falcon3).
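The Average row in the added table appears to be the unweighted arithmetic mean of the six benchmark scores. The numbers are consistent with this, but the commit does not state it explicitly. The sketch below recomputes that mean from the table values. It also includes a hypothetical `normalize` helper that illustrates the kind of rescaling the note refers to. Under that helper, a raw accuracy in [0, 1] is mapped so that the task's random baseline scores 0 and a perfect result scores 100, and below-chance results are clipped to 0. This reflects our understanding of the v2 leaderboard normalization and is not code from this repository. The per-task random baselines are not given in this commit.

```python
# Sketch only: recompute the "Average" row from the per-benchmark scores,
# ASSUMING it is the unweighted arithmetic mean of the six benchmarks.
from statistics import mean

# Normalized scores copied from the table in this commit.
SCORES = {
    "Llama3-8B-1.58-100B-tokens": {
        "IFEval": 17.91, "MUSR": 4.87, "GPQA": 1.83,
        "BBH": 5.36, "MMLU-PRO": 2.78, "MATH": 0.26,
    },
    "Falcon3-10B-Instruct-1.58bit": {
        "IFEval": 54.37, "MUSR": 2.57, "GPQA": 4.27,
        "BBH": 6.59, "MMLU-PRO": 6.62, "MATH": 2.44,
    },
}


def normalize(raw: float, random_baseline: float) -> float:
    """HYPOTHETICAL helper (an assumption, not this repo's code).

    Rescales a raw accuracy in [0, 1] so that the random baseline maps to 0
    and a perfect score maps to 100. Below-chance results are clipped to 0.
    """
    if not 0.0 <= random_baseline < 1.0:
        raise ValueError("random_baseline must be in [0, 1)")
    return max(0.0, (raw - random_baseline) / (1.0 - random_baseline)) * 100.0


def average(scores: dict[str, float]) -> float:
    """Unweighted mean of the benchmark scores, rounded to 2 decimals."""
    return round(mean(scores.values()), 2)


if __name__ == "__main__":
    for model, scores in SCORES.items():
        print(f"{model}: {average(scores)}")
```

Running it prints 5.5 for Llama3-8B-1.58-100B-tokens and 12.81 for Falcon3-10B-Instruct-1.58bit, which matches the Average row.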