JuhaoLiang committed
Commit 451c0bd · verified · 1 Parent(s): 0026c4e

Update README.md

Files changed (1):
  README.md +4 -4
README.md CHANGED
@@ -28,8 +28,8 @@ Models output text only.
 
 Arabic Benchmark evaluations on [Arabic MMLU](https://github.com/FreedomIntelligence/AceGPT) are conducted using accuracy scores as metrics, following the evaluation framework available at https://github.com/FreedomIntelligence/AceGPT/tree/main.
 
-| Models | Arabic-trans MMLU | ArabicMMLU (koto et al.) | Arabic EXAMS | Arabic ACVA clean | Arabic ACVA all | Arabic AraTrust | Arabic ARC-C | Arabic Avg. |
-| ----------------- | ----------------- | ------------------------ | ------------ | ----------------- | --------------- | --------------- | ------------ | ------------ |
+| | Arabic-trans MMLU | ArabicMMLU (koto et al.) | Arabic EXAMS | Arabic ACVA clean | Arabic ACVA all | Arabic AraTrust | Arabic ARC-C | Arabic Avg. |
+| ----------------- | :-----------------: | :------------------------: | :------------: | :-----------------: | :---------------: | :---------------: | :------------: | :------------: |
 | Qwen1.5-7B | 42.14 | 46.41 | 38.34 | 75.17 | 75.88 | 54.21 | 45.56 | 53.96 |
 | Jais-30B-v3 | 43.42 | 44.47 | 45.78 | 83.39 | 79.51 | 62.64 | 45.56 | 57.82 |
 | Llama3-8B | 47.22 | 45.78 | 46.34 | 77.49 | 76.68 | 67.82 | 47.53 | 58.41 |
@@ -44,8 +44,8 @@ Arabic Benchmark evaluations on [Arabic MMLU](https://github.com/FreedomIntellig
 
 Benchmarks for English and Chinese are conducted using the [OpenCompass](https://github.com/open-compass/OpenCompass/) framework.
 
-| Models | MMLU | RACE | English Avg. | CMMLU | CEval | Chinese Avg. | Avg. |
-| ----------------- | ------------ | ------------ | ------------ | ------------- | ------------- | ------------ | ------------ |
+| | MMLU | RACE | English Avg. | CMMLU | CEval | Chinese Avg. | Avg. |
+| ----------------- | :------------: | :------------: | :------------: | :-------------: | :-------------: | :------------: | :------------: |
 | Jais-30B-v3 | 42.53 | 30.96 | 36.75 | 25.26 | 22.17 | 23.72 |30.23|
 | **AceGPT-v2-8B** | 65.48 | 60.49 | 62.99 | 53.44 | 50.37 | 51.91 |57.45|
 | Llama3-8B | 66.57 | 65.92 | 66.25 | 50.70 | 49.78 | 50.24 |58.24|
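The README text above says the Arabic benchmarks are scored by plain accuracy. As a point of reference, here is a minimal sketch of that metric, assuming each benchmark item reduces to one gold letter choice and one predicted letter; the function below is illustrative only and is not taken from the AceGPT evaluation framework.

```python
# Minimal sketch of the accuracy metric the README describes. Assumes each
# benchmark item yields one predicted letter choice and one gold letter;
# illustrative only, not code from the AceGPT evaluation framework.
def accuracy(predictions: list[str], references: list[str]) -> float:
    """Percentage of items where the predicted choice matches the gold one."""
    if len(predictions) != len(references):
        raise ValueError("predictions and references must be the same length")
    correct = sum(p.strip().upper() == r.strip().upper()
                  for p, r in zip(predictions, references))
    return 100.0 * correct / len(references)

# A model answering 3 of 4 items correctly scores 75.00.
print(f"{accuracy(['A', 'C', 'B', 'D'], ['A', 'C', 'B', 'A']):.2f}")  # 75.00
```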
 
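The averaged columns in both tables are consistent with simple means of the per-benchmark scores. A quick check against the Llama3-8B row of the English/Chinese table, with the numbers taken directly from the table above:

```python
# Sanity check: the Avg. columns match plain means of the row's scores.
# Numbers are the Llama3-8B row from the English/Chinese table above.
mmlu, race = 66.57, 65.92       # English benchmarks
cmmlu, ceval = 50.70, 49.78     # Chinese benchmarks

english_avg = (mmlu + race) / 2                  # 66.245, table reports 66.25
chinese_avg = (cmmlu + ceval) / 2                # 50.24, matches the table
overall_avg = (mmlu + race + cmmlu + ceval) / 4  # 58.2425, table reports 58.24

print(english_avg, chinese_avg, overall_avg)
```

The Arabic table's Arabic Avg. column checks out the same way: for Qwen1.5-7B, the mean of its seven benchmark scores is 53.96, as reported.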