Update README.md
README.md
@@ -27,6 +27,7 @@ This is a preview model, with the stable version set to be released soon.
 
 ## Benchmark
 
+All evaluations are conducted in a zero-shot setting.
 
 | Metric | Kyara-3b-it | Llama3.2-3b-it |
 |--------------------------|----------|-------------|
@@ -37,5 +38,6 @@ This is a preview model, with the stable version set to be released soon.
 |  - Social-Science | **44.16** | 41.98 |
 | **[MMLU-Redux](https://github.com/yuchenlin/ZeroEval)** | **57.24**| 56.91 |
 | **[GSM8K](https://github.com/yuchenlin/ZeroEval)** | **54.21**| 51.63 |
+| **[MATH-L5](https://github.com/yuchenlin/ZeroEval)** | **19.97**| 16.23 |
 | **[CRUX](https://github.com/yuchenlin/ZeroEval)** | **31.25**| 25.25 |
 | **[AlpacaEval](https://github.com/tatsu-lab/alpaca_eval)** | **23.87**| 19.35 |