Upload README.md
README.md
CHANGED
@@ -587,15 +587,11 @@ All models are evaluated in chat mode (e.g. with the respective conversation tem
 
 *: Grok results are reported by [X.AI](https://x.ai/).
 
-<div>
-<
-5-shot:
+<div align="center">
+<h2> 中文评估结果 / Chinese Evaluations </h2>
 </div>
 
-
-|----------|-------|------------|----------------|-------|---------------|-------|
-| ChatGPT | 47.81 | 55.68 | 56.5 | 62.66 | 50.69 | 55.51 |
-| OpenChat | 38.7 | 45.99 | 48.32 | 50.23 | 43.27 | 45.85 |
+⚠️ Note that this model was not explicitly trained in Chinese (only < 0.1% of the data is in Chinese). 请注意本模型没有针对性训练中文(中文数据占比小于0.1%)。
 
 <div>
 <h3>Multi-Level Multi-Discipline Chinese Evaluation Suite (CEVAL)</h3>
@@ -606,6 +602,14 @@ All models are evaluated in chat mode (e.g. with the respective conversation tem
 | ChatGPT | 54.4 | 52.9 | 61.8 | 50.9 | 53.6 |
 | OpenChat | 47.29 | 45.22 | 52.49 | 48.52 | 45.08 |
 
+<div>
+<h3>Massive Multitask Language Understanding in Chinese (CMMLU, 5-shot)</h3>
+</div>
+
+| Models | STEM | Humanities | SocialSciences | Other | ChinaSpecific | Avg |
+|----------|-------|------------|----------------|-------|---------------|-------|
+| ChatGPT | 47.81 | 55.68 | 56.5 | 62.66 | 50.69 | 55.51 |
+| OpenChat | 38.7 | 45.99 | 48.32 | 50.23 | 43.27 | 45.85 |
 
 <div align="center">
 <h2> Limitations </h2>