Update README.md
README.md CHANGED
|
|
language:
- en
---

# <b>MgGPT</b>

MgGPT is a collection of fully fine-tuned generative text models with a particular focus on the Arabic language domain.
This is the repository for version 2 of the 70B pre-trained model, developed from [Meta-Llama-3-70B](https://huggingface.co/meta-llama/Meta-Llama-3-70B).

---
## Model Details
We have released the MgGPT family of large language models, a collection of fully fine-tuned generative text models ranging from 8B to 70B parameters. The family comes in two main categories: MgGPT and MgGPT-chat, where MgGPT-chat is a variant optimized specifically for dialogue applications. Our models outperform all currently available open-source Arabic dialogue models across multiple benchmarks, and in our human evaluations they achieve satisfaction levels comparable to some closed-source models, such as ChatGPT, in Arabic.
## Model Developers
We are from the King Abdullah University of Science and Technology (KAUST), the Chinese University of Hong Kong, Shenzhen (CUHKSZ), and the Shenzhen Research Institute of Big Data (SRIBD).
## Variations
MgGPT comes in a range of parameter sizes: 8B, 13B, 32B, and 70B. Each size is available as a base model and a -chat model.
<!-- ## Paper -->
<!-- The paper can be accessed at [link](https://huggingface.co/FreedomIntelligence/AceGPT-v2-70B-Chat/blob/main/Alignment_at_Pre_training__a_Case_Study_of_Aligning_LLMs_in_Arabic.pdf). -->
## Input
Models input text only.
## Output
Models output text only.
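
For reference, the sketch below shows one plausible way to load the base model for plain text completion with Hugging Face `transformers`. The repo id is a placeholder (this card does not state the final checkpoint name), and the dtype/device settings are illustrative assumptions for a 70B checkpoint.

```python
# Minimal completion sketch with Hugging Face transformers.
# NOTE: "FreedomIntelligence/MgGPT-70B" is a placeholder repo id, not a
# confirmed checkpoint name; substitute the actual id of this repository.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "FreedomIntelligence/MgGPT-70B"  # placeholder

tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    torch_dtype=torch.bfloat16,  # 70B weights are typically run in bf16
    device_map="auto",           # shard across available GPUs
)

# Base (pre-trained) models do plain next-token completion, no chat template.
prompt = "السؤال: ما هي عاصمة المملكة العربية السعودية؟\nالجواب:"
# ("Question: What is the capital of Saudi Arabia? Answer:")
inputs = tokenizer(prompt, return_tensors="pt").to(model.device)
outputs = model.generate(**inputs, max_new_tokens=64)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```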
## Model Evaluation Results

<!-- Arabic Benchmark evaluations on [Arabic MMLU](https://github.com/FreedomIntelligence/AceGPT) are conducted using accuracy scores as metrics, following the evaluation framework available at https://github.com/FreedomIntelligence/AceGPT/tree/main. -->
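
The benchmarks below are multiple-choice tasks reported as accuracy, i.e., the percentage of questions answered with the correct option. A generic sketch of the metric, not the evaluation framework's actual code:

```python
# Accuracy over multiple-choice predictions: percentage of exact matches
# between predicted and gold option letters.
def accuracy(predictions: list[str], references: list[str]) -> float:
    assert len(predictions) == len(references)
    correct = sum(p == r for p, r in zip(predictions, references))
    return 100.0 * correct / len(references)

print(accuracy(["A", "C", "D", "B"], ["A", "B", "D", "B"]))  # 75.0
```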

| Model | Arabic-trans MMLU | ArabicMMLU (Koto et al.) | Arabic EXAMS | Arabic ACVA clean | Arabic ACVA all | Arabic AraTrust | Arabic ARC-C | Arabic Avg. |
| ----------------- | :-----------------: | :------------------------: | :------------: | :-----------------: | :---------------: | :---------------: | :------------: | :------------: |
| Qwen1.5-7B | 42.14 | 46.41 | 38.34 | 75.17 | 75.88 | 54.21 | 45.56 | 53.96 |
| Jais-30B-v3 | 43.42 | 44.47 | 45.78 | 83.39 | 79.51 | 62.64 | 45.56 | 57.82 |
| Llama3-8B | 47.22 | 45.78 | 46.34 | 77.49 | 76.68 | 67.82 | 47.53 | 58.41 |
| **MgGPT-8B** | 48.41 | 50.17 | 46.15 | 80.14 | 78.84 | 65.90 | 49.91 | 59.93 |
| ChatGPT 3.5 Turbo | 49.07 | 57.70 | 45.93 | 74.45 | 76.88 | 65.13 | 60.24 | 61.34 |
| Qwen1.5-32B | 55.90 | 55.94 | 52.84 | 78.91 | 80.07 | 69.34 | 67.66 | 65.81 |
| Qwen1.5-72B | 60.24 | 61.23 | 54.41 | 82.98 | <u>81.20</u> | 75.93 | 76.79 | 70.40 |
| **MgGPT-32B** | 58.71 | 65.67 | 52.74 | 82.66 | 81.04 | 80.46 | 71.69 | 70.42 |
| Llama3-70B | <u>65.16</u> | 65.67 | 54.78 | 83.48 | **82.92** | 74.84 | 77.30 | 72.02 |
| **MgGPT-70B** | **65.19** | <u>67.71</u> | <u>56.19</u> | **84.79** | 80.93 | <u>80.93</u> | <u>80.93</u> | <u>73.81</u> |
| GPT-4 | 65.06 | **72.50** | **57.76** | <u>84.06</u> | 79.43 | **90.04** | **85.67** | **76.36** |

Benchmarks for English and Chinese are conducted using the [OpenCompass](https://github.com/open-compass/OpenCompass/) framework.

| Model | MMLU | RACE | English Avg. | CMMLU | CEval | Chinese Avg. | Avg. |
| ----------------- | :------------: | :------------: | :------------: | :-------------: | :-------------: | :------------: | :------------: |
| Jais-30B-v3 | 42.53 | 30.96 | 36.75 | 25.26 | 22.17 | 23.72 | 30.23 |
| **MgGPT-8B** | 65.48 | 60.49 | 62.99 | 53.44 | 50.37 | 51.91 | 57.45 |
| Llama3-8B | 66.57 | 65.92 | 66.25 | 50.70 | 49.78 | 50.24 | 58.24 |
| ChatGPT 3.5 Turbo | 69.03 | 83.00 | 76.02 | 53.90 | 52.50 | 53.20 | 64.60 |
| Qwen1.5-7B | 62.15 | 82.19 | 72.17 | 71.79 | 73.61 | 72.70 | 72.44 |
| **MgGPT-70B** | 76.71 | 80.48 | 78.60 | 68.97 | 66.87 | 67.92 | 73.26 |
| GPT-4 | **83.00** | **91.00** | **87.00** | 71.00 | 69.90 | 70.45 | 78.73 |
| Llama3-70B | <u>79.34</u> | 84.76 | <u>82.05</u> | 68.29 | 67.21 | 67.75 | 74.90 |
| Qwen1.5-32B | 75.10 | 83.29 | 79.20 | **83.12** | <u>82.68</u> | <u>82.90</u> | 81.05 |
| **MgGPT-32B** | 74.52 | <u>88.68</u> | 81.60 | 81.36 | 82.41 | 81.89 | <u>81.74</u> |
| Qwen1.5-72B | 75.78 | 88.23 | 82.01 | <u>83.11</u> | **83.04** | **83.08** | **82.54** |
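
The Avg. columns in both tables appear to be unweighted means of the per-benchmark scores, rounded to two decimals; a quick spot-check against the Llama3 rows above:

```python
# Spot-check of the reported averages (scores copied from the tables above).
arabic_llama3_8b = [47.22, 45.78, 46.34, 77.49, 76.68, 67.82, 47.53]
print(f"{sum(arabic_llama3_8b) / len(arabic_llama3_8b):.2f}")  # 58.41 -> Arabic Avg.

llama3_70b = {"MMLU": 79.34, "RACE": 84.76, "CMMLU": 68.29, "CEval": 67.21}
english_avg = (llama3_70b["MMLU"] + llama3_70b["RACE"]) / 2    # 82.05
chinese_avg = (llama3_70b["CMMLU"] + llama3_70b["CEval"]) / 2  # 67.75
overall_avg = sum(llama3_70b.values()) / len(llama3_70b)       # 74.90
print(f"{english_avg:.2f} {chinese_avg:.2f} {overall_avg:.2f}")
```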