Update README.md
README.md CHANGED
@@ -9,6 +9,9 @@ language:
 - vi
 - th
 - ms
+tags:
+- sea
+- multilingual
 ---

 # *SeaLLM3* - Large Language Models for Southeast Asia
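The first hunk extends the model card's YAML front matter: `vi`, `th`, and `ms` stay in the `language` list, and a new `tags` list adds `sea` and `multilingual`. As a quick illustration of what this metadata does, it can be read back from the Hub with `huggingface_hub`'s `ModelCard` API; a minimal sketch, assuming the commit above has been merged:

```python
# Read the model card's YAML front matter back from the Hub.
from huggingface_hub import ModelCard

card = ModelCard.load("SeaLLMs/SeaLLM3-7B-Chat")
print(card.data.language)  # should include "vi", "th", "ms"
print(card.data.tags)      # should include "sea", "multilingual" after this change
```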
@@ -17,17 +20,21 @@ language:
 <p align="center">
 <a href="https://damo-nlp-sg.github.io/SeaLLMs/" target="_blank" rel="noopener">Website</a>

-<a href="https://huggingface.co/SeaLLMs/
+<a href="https://huggingface.co/SeaLLMs/SeaLLM3-7B-Chat" target="_blank" rel="noopener"> 🤗 Tech Memo</a>

-<a href="https://huggingface.co/spaces/SeaLLMs/SeaLLM-
+<a href="https://huggingface.co/spaces/SeaLLMs/SeaLLM-Chat" target="_blank" rel="noopener"> 🤗 DEMO</a>

 <a href="https://github.com/DAMO-NLP-SG/SeaLLMs" target="_blank" rel="noopener">Github</a>

 <a href="https://arxiv.org/pdf/2312.00738.pdf" target="_blank" rel="noopener">Technical Report</a>
 </p>

-We introduce **SeaLLM3**, the latest series of the SeaLLMs (Large Language Models for Southeast Asian languages) family. It achieves state-of-the-art performance among models with similar sizes, excelling across a diverse array of tasks such as world knowledge, mathematical reasoning, translation, and instruction following. In the meantime, it was specifically enhanced to be more trustworthy, exhibiting reduced hallucination and providing safe responses, particularly in queries
+We introduce **SeaLLM3**, the latest series of the SeaLLMs (Large Language Models for Southeast Asian languages) family. It achieves state-of-the-art performance among models of similar size, excelling across a diverse array of tasks such as world knowledge, mathematical reasoning, translation, and instruction following. At the same time, it was specifically enhanced to be more trustworthy, exhibiting reduced hallucination and providing safe responses, particularly for queries closely related to Southeast Asian culture.

+## 🔥 Highlights
+- State-of-the-art performance compared to open-source models of similar sizes, evaluated across various dimensions such as human exam questions, instruction-following, mathematics, and translation.
+- Significantly enhanced instruction-following capability, especially in multi-turn settings.
+- Safer in usage, with significantly reduced instances of hallucination and improved sensitivity to local contexts.

 ## Uses

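The Uses section itself is unchanged and not shown in this diff. For orientation, here is a minimal sketch of chatting with the model through Hugging Face `transformers`, assuming the standard `AutoModelForCausalLM` chat-template workflow; the prompt, dtype, and generation settings are illustrative assumptions, not the README's official snippet:

```python
# Minimal sketch: load SeaLLM3-7B-Chat and run one chat turn.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "SeaLLMs/SeaLLM3-7B-Chat"  # model repo linked above
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id, torch_dtype=torch.bfloat16, device_map="auto"
)

# Example prompt in Vietnamese, one of the languages listed in the front matter.
messages = [{"role": "user", "content": "Xin chào! Bạn là ai?"}]
inputs = tokenizer.apply_chat_template(
    messages, add_generation_prompt=True, return_tensors="pt"
).to(model.device)

outputs = model.generate(inputs, max_new_tokens=128)
# Decode only the newly generated tokens, skipping the prompt.
print(tokenizer.decode(outputs[0][inputs.shape[-1]:], skip_special_tokens=True))
```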
@@ -149,7 +156,7 @@ By using our released weights, codes, and demos, you agree to and comply with th
 We conduct our evaluation along two dimensions:

 1. **Model Capability**: We assess the model's performance on human exam questions, its ability to follow instructions, its proficiency in mathematics, and its translation accuracy.
-2. **Model Trustworthiness**: We evaluate the model's safety and tendency to hallucinate, particularly in the context of Southeast Asia
+2. **Model Trustworthiness**: We evaluate the model's safety and tendency to hallucinate, particularly in the context of Southeast Asia.

 ### Model Capability

@@ -172,9 +179,8 @@ We conduct our evaluation along two dimensions:
 #### Multilingual Instruction-following Capability - SeaBench
 SeaBench consists of multi-turn human instructions spanning various task types. It evaluates chat-based models on their ability to follow human instructions in both single and multi-turn settings and assesses their performance across different task types. The dataset and corresponding evaluation code will be released soon!

-| model | id
+| model | id<br>turn-1 | id<br>turn-2 | id<br>avg | th<br>turn-1 | th<br>turn-2 | th<br>avg | vi<br>turn-1 | vi<br>turn-2 | vi<br>avg | avg |
 |:----------------|------------:|------------:|---------:|------------:|------------:|---------:|------------:|------------:|---------:|------:|
-| ChatGPT-0125 | 6.99 | 7.21 | 7.10 | 5.36 | 5.08 | 5.22 | 6.62 | 6.73 | 6.68 | 6.33 |
 | Qwen2-7B-Instruct| 5.93 | 5.84 | 5.89 | 5.47 | 5.20 | 5.34 | 6.17 | 5.60 | 5.89 | 5.70 |
 | SeaLLM-7B-v2.5 | 6.27 | 4.96 | 5.62 | 5.79 | 3.82 | 4.81 | 6.02 | 4.02 | 5.02 | 5.15 |
 | Sailor-14B-Chat | 5.26 | 5.53 | 5.40 | 4.62 | 4.36 | 4.49 | 5.31 | 4.74 | 5.03 | 4.97 |
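For readers parsing the new table header: each language block (id, th, vi) reports the turn-1 score, the turn-2 score, and their mean, and the final `avg` column averages the three per-language means before rounding. A small sketch verifying this on the Qwen2-7B-Instruct row; the aggregation rule is inferred from the column layout, since the SeaBench scoring code is not yet released:

```python
# Assumed aggregation, inferred from the table layout:
# per-language avg = mean(turn-1, turn-2); overall avg = mean of the
# three per-language averages (computed before rounding).
scores = {"id": (5.93, 5.84), "th": (5.47, 5.20), "vi": (6.17, 5.60)}  # Qwen2-7B-Instruct row

lang_avg = {lang: (t1 + t2) / 2 for lang, (t1, t2) in scores.items()}
overall = sum(lang_avg.values()) / len(lang_avg)

print(lang_avg, overall)
# lang_avg ≈ {'id': 5.885, 'th': 5.335, 'vi': 5.885}, overall ≈ 5.70,
# matching the table's 5.89 / 5.34 / 5.89 / 5.70 after rounding.
```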