---
license: mit
language:
- en
- zh
- id
---
# MeRALiON-LLaMA-3-8B-Instruct
**MeRALiON-LLaMA-3-8B-Instruct** is a large language model (LLM) designed to excel in multilingual understanding and instruction-following tasks. The model builds on the Llama-3-8B architecture and extends the Llama-3-8B base model through an extensive, meticulously curated continued-pretraining process and careful merging of model weights.
## Model Overview
MeRALiON-LLaMA-3-8B-Instruct is primarily trained on English, Chinese, and Indonesian, with a particular emphasis on elevating its understanding and generation capabilities in Southeast Asian languages—especially Chinese and Indonesian. By integrating corpus mixing strategies developed for regional multilingual datasets, we carefully diversified the training content through domain classification, hyperparameter tuning, and replay strategies. These measures not only help the model retain knowledge without catastrophic forgetting but also significantly enhance its performance in producing high-quality, contextually accurate responses within these Southeast Asian language contexts.
Key advancements include:
- **Extended Pretraining**: Continued pretraining on over 120 billion tokens of primarily English, Chinese, and Indonesian text.
- **SEA Multilingual Corpus Mixing**: Drawing on strategies from Southeast Asian multilingual corpora to enhance language understanding and generation capabilities.
- **Domain-Diversified Pretraining Corpus**: Careful selection and classification of training data from a wide range of topics and genres.
- **Optimized Training Techniques**: Implementing replay strategies and carefully selected hyperparameters to ensure stability, maintain quality, and avoid catastrophic forgetting.
- **Instruction Tuning via Model Merging**: Rather than a standard instruction-tuning pipeline, this model was derived by merging the official Llama-3.1-8B-base and Llama-3.1-8B-instruct models to produce superior instruction-following capabilities without additional supervised instruction data.
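The replay strategy above, mixing a fraction of original-distribution documents back into the continued-pretraining stream, can be sketched as follows. The function name, the 10% replay ratio, and the document streams are illustrative assumptions, not the actual training configuration:

```python
import random

def mixed_stream(new_docs, replay_docs, replay_ratio=0.1, seed=0):
    """Interleave continued-pretraining documents with replayed
    original-distribution documents at roughly `replay_ratio`,
    so the model keeps seeing its original data distribution."""
    rng = random.Random(seed)
    for doc in new_docs:
        if rng.random() < replay_ratio:
            yield rng.choice(replay_docs)  # replayed original-distribution sample
        yield doc  # continued-pretraining sample
```

Keeping even a small replay fraction in every epoch is a standard guard against catastrophic forgetting during domain or language adaptation.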
### Highlights
- **Enhanced Performance**: MeRALiON-LLaMA-3-8B-Instruct demonstrates improved results on benchmarks including cross-MMLU, cross-LogiQA, cross-XQuAD, IndoMMLU, and CNEval, surpassing the capabilities of the official Llama-3 models.
- **Extensive Multilingual Support**: Strong coverage of English, Chinese, and Indonesian text, coupled with strategies inspired by Southeast Asian multilingual approaches, ensures robust understanding of and responsiveness to diverse linguistic inputs.
### Model Specifications
- **Model Type**: Decoder
- **Architecture**: Llama-3.1-8B
- **Context Length**: 8192 tokens
- **Languages**: English, Chinese, Indonesian
- **License**: [Llama3 Community License](https://huggingface.co/meta-llama/Meta-Llama-3-8B/blob/main/LICENSE)
## Benchmark Performance
MeRALiON-LLaMA-3-8B-Instruct achieves notable improvements over official Llama-3 base and instruction-tuned models, highlighting the impact of our continued pretraining strategies. Through techniques such as corpus mixing, replay to prevent forgetting, and careful model merging, this model not only enhances general reasoning capabilities but also excels across multilingual and domain-specific benchmarks. In addition, we employed an LLM-based evaluation pipeline to standardize the judging process across varied output formats, ensuring fair and consistent comparisons. Building on the robust instruction-following proficiency of Llama-3.1-8B, MeRALiON-LLaMA-3-8B-Instruct extends its strengths to Southeast Asian languages, including Chinese and Indonesian.
### Key Highlights from the Evaluations
- **Cross-MMLU, Cross-LogiQA**: Enhanced reasoning and question-answering capabilities illustrate that continued pretraining improves multilingual understanding and accuracy over baseline Llama models.
- **IndoMMLU and CNEval**: Performance boosts in Indonesian and Chinese benchmarks highlight that careful corpus mixing and replay strategies help maintain and improve language-specific strengths.
### Cross-MMLU
<table>
<tr>
<th>Model Series</th>
<th>Model</th>
<th>Link</th>
<th>English</th>
<th>Chinese</th>
<th>Indonesian</th>
<th>Malay</th>
<th>Avg (En/Zh/Id/Ms)</th>
</tr>
<!-- LLaMA Series First -->
<tr>
<td rowspan="4">LLaMA Series</td>
<td><strong>MeRALiON-LLaMA-3-8B-Instruct</strong></td>
<td></td>
<td>0.847</td>
<td>0.693</td>
<td>0.713</td>
<td>0.613</td>
<td>0.717</td>
</tr>
<tr>
<td>Meta-Llama-3.1-8B-Instruct</td>
<td><a href="https://ai.meta.com/blog/meta-llama-3-1/">Link</a></td>
<td>0.82</td>
<td>0.633</td>
<td>0.66</td>
<td>0.647</td>
<td>0.690</td>
</tr>
<tr>
<td>Llama3-8B-CPT-SEA-LION-v2.1-Instruct</td>
<td><a href="https://huggingface.co/aisingapore/llama3-8b-cpt-sea-lionv2.1-instruct">Link</a></td>
<td>0.753</td>
<td>0.667</td>
<td>0.693</td>
<td>0.64</td>
<td>0.688</td>
</tr>
<tr>
<td>Meta-Llama-3-8B-Instruct</td>
<td><a href="https://ai.meta.com/blog/meta-llama-3/">Link</a></td>
<td>0.767</td>
<td>0.653</td>
<td>0.573</td>
<td>0.573</td>
<td>0.642</td>
</tr>
<!-- Non-LLaMA Series -->
<tr>
<td rowspan="5">Non-LLaMA Series</td>
<td><strong>GPT4o-0513</strong></td>
<td><a href="https://openai.com/index/hello-gpt-4o/">Link</a></td>
<td>0.927</td>
<td>0.887</td>
<td>0.88</td>
<td>0.907</td>
<td>0.900</td>
</tr>
<tr>
<td>Gemma-2-9B-IT</td>
<td><a href="https://huggingface.co/google/gemma-2-9b-it">Link</a></td>
<td>0.84</td>
<td>0.793</td>
<td>0.78</td>
<td>0.747</td>
<td>0.790</td>
</tr>
<tr>
<td>Gemma2-9B-CPT-SEA-Lion-v3-Instruct</td>
<td><a href="https://huggingface.co/aisingapore/gemma2-9b-cpt-sea-lionv3-instruct">Link</a></td>
<td>0.847</td>
<td>0.787</td>
<td>0.793</td>
<td>0.733</td>
<td>0.790</td>
</tr>
<tr>
<td>Qwen2.5-7B-Instruct</td>
<td><a href="https://huggingface.co/Qwen/Qwen2.5-7B-Instruct">Link</a></td>
<td>0.847</td>
<td>0.84</td>
<td>0.753</td>
<td>0.713</td>
<td>0.788</td>
</tr>
<tr>
<td>SeaLLMs-v3-7B-Chat</td>
<td><a href="https://arxiv.org/abs/2407.19672">Link</a></td>
<td>0.833</td>
<td>0.727</td>
<td>0.74</td>
<td>0.687</td>
<td>0.747</td>
</tr>
</table>
### Cross-LogiQA
<table>
<tr>
<th>Model Series</th>
<th>Model</th>
<th>Link</th>
<th>English</th>
<th>Chinese</th>
<th>Indonesian</th>
<th>Malay</th>
<th>Avg (En/Zh/Id/Ms)</th>
</tr>
<!-- LLaMA Series -->
<tr>
<td rowspan="3">LLaMA Series</td>
<td>Meta-Llama-3.1-8B-Instruct</td>
<td><a href="https://ai.meta.com/blog/meta-llama-3-1/">Link</a></td>
<td>0.585</td>
<td>0.585</td>
<td>0.455</td>
<td>0.523</td>
<td><strong>0.537</strong></td>
</tr>
<tr>
<td>MeRALiON-LLaMA-3-8B-Instruct</td>
<td></td>
<td>0.591</td>
<td>0.528</td>
<td>0.494</td>
<td>0.489</td>
<td>0.526</td>
</tr>
<tr>
<td>Llama3-8B-CPT-SEA-LION-v2.1-Instruct</td>
<td><a href="https://huggingface.co/aisingapore/llama3-8b-cpt-sea-lionv2.1-instruct">Link</a></td>
<td>0.528</td>
<td>0.517</td>
<td>0.403</td>
<td>0.443</td>
<td>0.473</td>
</tr>
<!-- Non-LLaMA Series -->
<tr>
<td rowspan="4">Non-LLaMA Series</td>
<td>Qwen2.5-7B-Instruct</td>
<td><a href="https://huggingface.co/Qwen/Qwen2.5-7B-Instruct">Link</a></td>
<td>0.693</td>
<td>0.71</td>
<td>0.631</td>
<td>0.534</td>
<td><strong>0.642</strong></td>
</tr>
<tr>
<td>Gemma-2-9B-IT</td>
<td><a href="https://huggingface.co/google/gemma-2-9b-it">Link</a></td>
<td>0.659</td>
<td>0.636</td>
<td>0.585</td>
<td>0.602</td>
<td>0.621</td>
</tr>
<tr>
<td>Gemma2-9B-CPT-SEA-Lion-v3-Instruct</td>
<td><a href="https://huggingface.co/aisingapore/gemma2-9b-cpt-sea-lionv3-instruct">Link</a></td>
<td>0.636</td>
<td>0.642</td>
<td>0.557</td>
<td>0.551</td>
<td>0.597</td>
</tr>
<tr>
<td>SeaLLMs-v3-7B-Chat</td>
<td><a href="https://arxiv.org/abs/2407.19672">Link</a></td>
<td>0.568</td>
<td>0.585</td>
<td>0.494</td>
<td>0.517</td>
<td>0.541</td>
</tr>
</table>
### IndoMMLU
<table>
<tr>
<th>Model Series</th>
<th>Model</th>
<th>Link</th>
<th>Accuracy</th>
</tr>
<!-- LLaMA Series -->
<tr>
<td rowspan="4">LLaMA Series</td>
<td><strong>MeRALiON-LLaMA-3-8B-Instruct</strong></td>
<td></td>
<td><strong>0.576</strong></td>
</tr>
<tr>
<td>Llama3-8B-CPT-SEA-LION-v2.1-Instruct</td>
<td><a href="https://huggingface.co/aisingapore/llama3-8b-cpt-sea-lionv2.1-instruct">Link</a></td>
<td>0.560</td>
</tr>
<tr>
<td>Meta-Llama-3.1-8B-Instruct</td>
<td><a href="https://ai.meta.com/blog/meta-llama-3-1/">Link</a></td>
<td>0.548</td>
</tr>
<tr>
<td>Meta-Llama-3-8B-Instruct</td>
<td><a href="https://ai.meta.com/blog/meta-llama-3/">Link</a></td>
<td>0.521</td>
</tr>
<!-- Non-LLaMA Series -->
<tr>
<td rowspan="5">Non-LLaMA Series</td>
<td><strong>GPT4o-0513</strong></td>
<td><a href="https://openai.com/index/hello-gpt-4o/">Link</a></td>
<td><strong>0.760</strong></td>
</tr>
<tr>
<td>Gemma2-9B-CPT-SEA-Lion-v3-Instruct</td>
<td><a href="https://huggingface.co/aisingapore/gemma2-9b-cpt-sea-lionv3-instruct">Link</a></td>
<td>0.626</td>
</tr>
<tr>
<td>Gemma-2-9B-IT</td>
<td><a href="https://huggingface.co/google/gemma-2-9b-it">Link</a></td>
<td>0.621</td>
</tr>
<tr>
<td>Qwen2.5-7B-Instruct</td>
<td><a href="https://huggingface.co/Qwen/Qwen2.5-7B-Instruct">Link</a></td>
<td>0.582</td>
</tr>
<tr>
<td>SeaLLMs-v3-7B-Chat</td>
<td><a href="https://arxiv.org/abs/2407.19672">Link</a></td>
<td>0.541</td>
</tr>
</table>
### CNEval
<table>
<tr>
<th>Model Series</th>
<th>Model</th>
<th>Link</th>
<th>Accuracy</th>
</tr>
<!-- LLaMA Series -->
<tr>
<td rowspan="5">LLaMA Series</td>
<td><strong>MeRALiON-LLaMA-3-8B-Instruct</strong></td>
<td></td>
<td><strong>0.514</strong></td>
</tr>
<tr>
<td>Llama3-8B-CPT-SEA-LION-v2.1-Instruct</td>
<td><a href="https://huggingface.co/aisingapore/llama3-8b-cpt-sea-lionv2.1-instruct">Link</a></td>
<td>0.505</td>
</tr>
<tr>
<td>Llama3-8B-CPT-SEA-Lion-v2-Instruct</td>
<td><a href="https://huggingface.co/aisingapore/llama3-8b-cpt-sea-lionv2.1-instruct">Link</a></td>
<td>0.495</td>
</tr>
<tr>
<td>Meta-Llama-3-8B-Instruct</td>
<td><a href="https://ai.meta.com/blog/meta-llama-3/">Link</a></td>
<td>0.467</td>
</tr>
<tr>
<td>Meta-Llama-3.1-8B-Instruct</td>
<td><a href="https://ai.meta.com/blog/meta-llama-3-1/">Link</a></td>
<td>0.457</td>
</tr>
<!-- Non-LLaMA Series -->
<tr>
<td rowspan="5">Non-LLaMA Series</td>
<td><strong>Qwen2-7B-Instruct</strong></td>
<td><a href="https://huggingface.co/Qwen/Qwen2-7B-Instruct">Link</a></td>
<td><strong>0.829</strong></td>
</tr>
<tr>
<td>GPT4o-0513</td>
<td><a href="https://openai.com/index/hello-gpt-4o/">Link</a></td>
<td>0.81</td>
</tr>
<tr>
<td>Qwen2.5-7B-Instruct</td>
<td><a href="https://huggingface.co/Qwen/Qwen2.5-7B-Instruct">Link</a></td>
<td>0.8</td>
</tr>
<tr>
<td>Gemma2-9B-CPT-SEA-Lion-v3-Instruct</td>
<td><a href="https://huggingface.co/aisingapore/gemma2-9b-cpt-sea-lionv3-instruct">Link</a></td>
<td>0.59</td>
</tr>
<tr>
<td>Gemma-2-9B-IT</td>
<td><a href="https://huggingface.co/google/gemma-2-9b-it">Link</a></td>
<td>0.581</td>
</tr>
</table>
These results collectively show how the MeRALiON-LLaMA-3-8B-Instruct model builds upon the strengths of official Llama-3.1 variants. The techniques we employed can serve as a blueprint, potentially guiding future refinements and adaptations for other models and language sets.
## Instruction-Following
By merging the official Llama-3.1-8B-base and Llama-3.1-8B-instruct weights, we inherit strong instruction-following behavior without additional instruction-tuning steps. The model can follow various user prompts accurately and coherently, producing well-structured, contextually relevant responses.
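Weight merging of this kind is commonly implemented as an element-wise linear interpolation of matching parameter tensors. The sketch below uses plain Python lists in place of tensors to stay dependency-free; the function name and the 50/50 ratio are illustrative assumptions, not the actual merging recipe:

```python
def merge_weights(base_sd, instruct_sd, alpha=0.5):
    """Element-wise linear interpolation of two compatible state dicts:
    merged = (1 - alpha) * base + alpha * instruct."""
    assert base_sd.keys() == instruct_sd.keys(), "models must share parameter names"
    return {
        name: [(1 - alpha) * b + alpha * i
               for b, i in zip(base_sd[name], instruct_sd[name])]
        for name in base_sd
    }

# Toy two-parameter "models"
base = {"w": [0.0, 2.0]}
instruct = {"w": [4.0, 6.0]}
print(merge_weights(base, instruct, alpha=0.5))  # {'w': [2.0, 4.0]}
```

In practice the same interpolation runs over `torch` tensors from the two checkpoints' `state_dict()`s; because both endpoints share the Llama-3.1-8B architecture, every parameter name and shape lines up one-to-one.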
## Usage
MeRALiON-LLaMA-3-8B-Instruct can be deployed using the 🤗 Transformers library. With careful device mapping and dtype settings, users can achieve efficient and high-quality text generation.
Example:
```python
import transformers
import torch

model_id = "MERaLiON/MeRALiON-LLaMA-3-8B-Instruct"

# Load the generation pipeline in bfloat16 with automatic device placement
pipeline = transformers.pipeline(
    "text-generation",
    model=model_id,
    model_kwargs={"torch_dtype": torch.bfloat16},
    device_map="auto",
)

messages = [
    {"role": "user", "content": "What is the sentiment of the following sentence?\nSentence: This book is incredibly dull.\nAnswer:"},
]

outputs = pipeline(
    messages,
    max_new_tokens=256,
)
print(outputs[0]["generated_text"][-1])
```
**Note**: We use the same chat format as the official Llama-3.1-8B-Instruct model.
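For reference, the Llama-3 chat layout wraps each turn in header tokens. In practice `tokenizer.apply_chat_template` handles this automatically, but a minimal sketch of the layout (the helper name is illustrative) looks like:

```python
def format_llama3_chat(messages, add_generation_prompt=True):
    """Assemble a prompt in the Llama-3 chat layout: <|begin_of_text|>
    followed by header-delimited turns, each ended by <|eot_id|>."""
    parts = ["<|begin_of_text|>"]
    for m in messages:
        parts.append(
            f"<|start_header_id|>{m['role']}<|end_header_id|>\n\n"
            f"{m['content']}<|eot_id|>"
        )
    if add_generation_prompt:
        # Open the assistant turn so the model continues from here
        parts.append("<|start_header_id|>assistant<|end_header_id|>\n\n")
    return "".join(parts)
```

Because the merged model inherits this template unchanged, any tooling built for Llama-3.1-8B-Instruct prompts should work as-is.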
## Caveats and Limitations
Like many LLMs, MeRALiON-LLaMA-3-8B-Instruct may hallucinate or produce irrelevant or incorrect content. While we have taken steps to mitigate these issues, users are advised to critically evaluate outputs, especially in high-stakes applications. The model has not undergone explicit safety alignment and filtering; users should implement their own safeguards, content moderation, and evaluation strategies.
## Safety and Liability
This model is not strongly safety-aligned. Users are responsible for implementing their own safety checks and mitigations. The authors and affiliated institutions are not liable for any damages or losses arising from the use of this model.
## Technical Specifications
MeRALiON-LLaMA-3-8B-Instruct underwent continued pretraining using computational resources provided by Singapore NSCC Aspire2A+ and The TPU Research Cloud. We utilized diverse data sources and adaptive strategies to ensure stable training without catastrophic forgetting.
## Data and Licensing
All data used for continued pretraining and model merging adheres to commercially permissible licenses. We have ensured that sources are free of restricted content to the best of our abilities. Details on the dataset and licensing will be provided in the future.
## Call for Contributions
We invite researchers, developers, and community members to contribute by:
- Identifying and reporting issues or biases.
- Providing additional pretraining or instruction data.
- Suggesting enhancements to documentation or evaluation metrics.
- Extending the model to support additional languages or domains.
Please visit our repository for more information and contribution guidelines.
## The Team
- Huang Xin
- Tarun Kumar Vangani
- Minh Duc Pham
- Wang Bin
- Liu Zhengyuan
## Acknowledgements
Our work is supported by the resources and platforms provided by Singapore NSCC Aspire2A+ and The TPU Research Cloud. We thank all contributors and collaborators who have made this effort possible.
## Contact
For additional information or inquiries, please reach out to us via [our contact form](#) (link to be provided) or check the GitHub repository for the latest updates and information.
## Disclaimer
This repository contains the weights for a model not specifically aligned for safety. Users are advised to perform their own due diligence, safety fine-tuning, and compliance measures. The authors disclaim liability for any direct or indirect damages resulting from model use.