Arabic benchmark evaluations on [Arabic MMLU](https://github.com/FreedomIntelligence/AceGPT) are conducted using accuracy scores as metrics, following the evaluation framework available at https://github.com/FreedomIntelligence/AceGPT/tree/main (a minimal sketch of the accuracy metric follows the table).

|                   | Arabic-trans MMLU | ArabicMMLU (Koto et al.) | Arabic EXAMS | Arabic ACVA clean | Arabic ACVA all | Arabic AraTrust | Arabic ARC-C | Arabic Avg. |
| ----------------- | :-----------------: | :------------------------: | :------------: | :-----------------: | :---------------: | :---------------: | :------------: | :------------: |
| Qwen1.5-7B        | 42.14 | 46.41 | 38.34 | 75.17 | 75.88 | 54.21 | 45.56 | 53.96 |
| Jais-30B-v3       | 43.42 | 44.47 | 45.78 | 83.39 | 79.51 | 62.64 | 45.56 | 57.82 |
| Llama3-8B         | 47.22 | 45.78 | 46.34 | 77.49 | 76.68 | 67.82 | 47.53 | 58.41 |
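
The metric referenced above is plain accuracy over multiple-choice answers. Below is a minimal, hypothetical sketch of that metric only; the linked AceGPT repository is the authoritative harness, and the function and variable names here are illustrative, not taken from it.

```python
def accuracy(predictions: list[str], references: list[str]) -> float:
    """Exact-match accuracy (%) over predicted multiple-choice answer letters."""
    if not predictions or len(predictions) != len(references):
        raise ValueError("predictions and references must be non-empty and equal length")
    correct = sum(p.strip().upper() == r.strip().upper()
                  for p, r in zip(predictions, references))
    return 100.0 * correct / len(references)

# Hypothetical example: 3 of 4 predicted letters match the gold letters -> 75.0
print(accuracy(["A", "C", "B", "D"], ["A", "B", "B", "D"]))
```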

Benchmarks for English and Chinese are conducted using the [OpenCompass](https://github.com/open-compass/OpenCompass/) framework.

|                   | MMLU  | RACE  | English Avg. | CMMLU | CEval | Chinese Avg. | Avg.  |
| ----------------- | :------------: | :------------: | :------------: | :-------------: | :-------------: | :------------: | :------------: |
| Jais-30B-v3       | 42.53 | 30.96 | 36.75 | 25.26 | 22.17 | 23.72 | 30.23 |
| **AceGPT-v2-8B**  | 65.48 | 60.49 | 62.99 | 53.44 | 50.37 | 51.91 | 57.45 |
| Llama3-8B         | 66.57 | 65.92 | 66.25 | 50.70 | 49.78 | 50.24 | 58.24 |
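
The "Avg." columns in both tables are consistent with unweighted means of the per-benchmark scores, rounded half up. The averaging convention is inferred from the numbers rather than stated by the authors; the short check below reproduces the Llama3-8B row under that assumption.

```python
from decimal import Decimal, ROUND_HALF_UP

def mean(scores):
    """Unweighted mean of a list of Decimal scores."""
    return sum(scores) / len(scores)

def round2(x):
    """Round to two decimal places, half up (matches the tables)."""
    return x.quantize(Decimal("0.01"), rounding=ROUND_HALF_UP)

# Llama3-8B per-benchmark scores from the English/Chinese table above.
english = [Decimal("66.57"), Decimal("65.92")]  # MMLU, RACE
chinese = [Decimal("50.70"), Decimal("49.78")]  # CMMLU, CEval

print(round2(mean(english)))            # 66.25 -> English Avg.
print(round2(mean(chinese)))            # 50.24 -> Chinese Avg.
print(round2(mean(english + chinese)))  # 58.24 -> Avg.
```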