---
license: llama3
language:
- en
- zh
- id
---

# MeRALiON-LLaMA-3-8B-Instruct

**MeRALiON-LLaMA-3-8B-Instruct** is a large language model (LLM) designed to excel in multilingual understanding and instruction-following tasks. The model builds on the Llama-3-8B architecture and was continually pretrained from Llama-3-8B-Base, enhanced through an extensive, meticulously curated continued pretraining process and careful merging of model weights.

## Model Overview

MeRALiON-LLaMA-3-8B-Instruct is primarily trained on English, Chinese, and Indonesian, with a particular emphasis on elevating its understanding and generation capabilities in Southeast Asian languages—especially Chinese and Indonesian. By integrating corpus mixing strategies developed for regional multilingual datasets, we carefully diversified the training content through domain classification, hyperparameter tuning, and replay strategies. These measures not only help the model retain knowledge without catastrophic forgetting but also significantly enhance its performance in producing high-quality, contextually accurate responses within these Southeast Asian language contexts.
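
To make the replay and mixing idea concrete, the sketch below samples a training stream that interleaves the new English/Chinese/Indonesian data with a fraction of replayed original-domain documents. This is a minimal sketch under assumptions: the `replay_ratio` value and the corpus layout are illustrative, not the exact recipe used for this model.

```python
import random

def mixed_stream(new_corpus, replay_corpus, replay_ratio=0.15, seed=0):
    """Yield training documents, replaying a fraction of earlier-domain data.

    With probability `replay_ratio` (a hypothetical value) a document is drawn
    from the replay corpus; otherwise it comes from the new continued-pretraining
    corpus. Replaying old data is one common way to limit catastrophic forgetting.
    """
    rng = random.Random(seed)
    while True:
        source = replay_corpus if rng.random() < replay_ratio else new_corpus
        yield rng.choice(source)

# Toy usage with placeholder documents.
new_docs = ["en_doc_1", "zh_doc_1", "id_doc_1"]
replay_docs = ["original_domain_doc_1", "original_domain_doc_2"]
stream = mixed_stream(new_docs, replay_docs)
print([next(stream) for _ in range(5)])
```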

Key advancements include:

- **Extended Pretraining**: Continued pretraining on over 120 billion tokens of primarily English, Chinese, and Indonesian text.
- **SEA Multilingual Corpus Mixing**: Drawing on strategies from Southeast Asian multilingual corpora to enhance language understanding and generation capabilities.
- **Domain-Diversified Pretraining Corpus**: Careful selection and classification of training data from a wide range of topics and genres.
- **Optimized Training Techniques**: Implementing replay strategies and carefully selected hyperparameters to ensure stability, maintain quality, and avoid catastrophic forgetting.
- **Instruction Tuning via Model Merging**: Rather than running a standard instruction-tuning pipeline, this model was derived by merging the official Llama-3.1-8B-base and Llama-3.1-8B-instruct models to obtain strong instruction-following capabilities without additional supervised instruction data (a simplified merging sketch follows this list).
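
As a rough illustration of the merging step, the snippet below linearly interpolates the weights of the two official checkpoints. This is a minimal sketch under assumptions: the 0.5 mixing coefficient and plain state-dict averaging are illustrative, not the published merge recipe.

```python
import torch
from transformers import AutoModelForCausalLM

# Load the official base and instruct checkpoints (same architecture, same parameter names).
base = AutoModelForCausalLM.from_pretrained(
    "meta-llama/Llama-3.1-8B", torch_dtype=torch.bfloat16
)
instruct = AutoModelForCausalLM.from_pretrained(
    "meta-llama/Llama-3.1-8B-Instruct", torch_dtype=torch.bfloat16
)

alpha = 0.5  # illustrative mixing coefficient, not a published value
instruct_state = instruct.state_dict()
merged_state = {
    name: alpha * param + (1.0 - alpha) * instruct_state[name]
    for name, param in base.state_dict().items()
}

# Write the interpolated weights back and save the merged model.
base.load_state_dict(merged_state)
base.save_pretrained("merged-llama-3.1-8b-instruct")
```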

### Highlights

- **Enhanced Performance**: MeRALiON-LLaMA-3-8B-Instruct demonstrates improved results on benchmarks including Cross-MMLU, Cross-LogiQA, Cross-XQuAD, IndoMMLU, and CNEval, surpassing the official Llama-3 models on most of these evaluations.
- **Extensive Multilingual Support**: Strong coverage of English, Chinese, and Indonesian text, coupled with strategies inspired by Southeast Asian multilingual approaches, ensures robust understanding of and responsiveness to diverse linguistic inputs.

### Model Specifications

- **Model Type**: Decoder
- **Architecture**: Llama-3.1-8B
- **Context Length**: 8192 tokens
- **Languages**: English, Chinese, Indonesian
- **License**: [Llama3 Community License](https://huggingface.co/meta-llama/Meta-Llama-3-8B/blob/main/LICENSE)

## Benchmark Performance

MeRALiON-LLaMA-3-8B-Instruct achieves notable improvements over official Llama-3 base and instruction-tuned models, highlighting the impact of our continued pretraining strategies. Through techniques such as corpus mixing, replay to prevent forgetting, and careful model merging, this model not only enhances general reasoning capabilities but also excels across multilingual and domain-specific benchmarks. In addition, we employed an LLM-based evaluation pipeline to standardize the judging process across varied output formats, ensuring fair and consistent comparisons. Building on the robust instruction-following proficiency of Llama-3.1-8B, MeRALiON-LLaMA-3-8B-Instruct extends its strengths to Southeast Asian languages, including Chinese and Indonesian. 
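
The exact judging prompts are not published here, so the following is only a hypothetical sketch of how an LLM-based judge pipeline might normalize free-form answers into a comparable label before scoring; the prompt wording and the `extract_choice` helper are assumptions for illustration.

```python
import re
from typing import Optional

def build_judge_prompt(question: str, reference: str, model_answer: str) -> str:
    """Assemble a prompt asking a judge LLM to grade one answer (illustrative only)."""
    return (
        "You are grading an answer to a multiple-choice question.\n"
        f"Question: {question}\n"
        f"Reference answer: {reference}\n"
        f"Model answer: {model_answer}\n"
        "Reply with exactly one word: CORRECT or INCORRECT."
    )

def extract_choice(text: str) -> Optional[str]:
    """Pull an A-D option letter out of a free-form model response."""
    match = re.search(r"\b([A-D])\b", text.upper())
    return match.group(1) if match else None

# Example: normalize a verbose response before comparing it to the gold label.
print(extract_choice("The correct option is (B), because ..."))  # -> "B"
```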

### Key highlights from the evaluations

- **Cross-MMLU, Cross-LogiQA**: Enhanced reasoning and question-answering capabilities illustrate that continued pretraining improves multilingual understanding and accuracy over baseline Llama models.
  
- **IndoMMLU and CNEval**: Performance boosts in Indonesian and Chinese benchmarks highlight that careful corpus mixing and replay strategies help maintain and improve language-specific strengths.

### Cross-MMLU
<table>
  <tr>
    <th>Model Series</th>
    <th>Model</th>
    <th>Link</th>
    <th>English</th>
    <th>Chinese</th>
    <th>Indonesian</th>
    <th>Malay</th>
    <th>Avg (En/Zh/Id/Ms)</th>
  </tr>
  <!-- LLaMA Series First -->
  <tr>
    <td rowspan="4">LLaMA Series</td>
    <td><strong>MeRALiON-LLaMA-3-8B-Instruct</strong></td>
    <td></td>
    <td>0.847</td>
    <td>0.693</td>
    <td>0.713</td>
    <td>0.613</td>
    <td>0.717</td>
  </tr>
  <tr>
    <td>Meta-Llama-3.1-8B-Instruct</td>
    <td><a href="https://ai.meta.com/blog/meta-llama-3-1/">Link</a></td>
    <td>0.82</td>
    <td>0.633</td>
    <td>0.66</td>
    <td>0.647</td>
    <td>0.690</td>
  </tr>
  <tr>
    <td>Llama3-8B-CPT-SEA-LION-v2.1-Instruct</td>
    <td><a href="https://huggingface.co/aisingapore/llama3-8b-cpt-sea-lionv2.1-instruct">Link</a></td>
    <td>0.753</td>
    <td>0.667</td>
    <td>0.693</td>
    <td>0.64</td>
    <td>0.688</td>
  </tr>
  <tr>
    <td>Meta-Llama-3-8B-Instruct</td>
    <td><a href="https://ai.meta.com/blog/meta-llama-3/">Link</a></td>
    <td>0.767</td>
    <td>0.653</td>
    <td>0.573</td>
    <td>0.573</td>
    <td>0.642</td>
  </tr>

  <!-- Non-LLaMA Series -->
  <tr>
    <td rowspan="5">Non-LLaMA Series</td>
    <td><strong>GPT4o-0513</strong></td>
    <td><a href="https://openai.com/index/hello-gpt-4o/">Link</a></td>
    <td>0.927</td>
    <td>0.887</td>
    <td>0.88</td>
    <td>0.907</td>
    <td>0.900</td>
  </tr>
  <tr>
    <td>Gemma-2-9B-IT</td>
    <td><a href="https://huggingface.co/google/gemma-2-9b-it">Link</a></td>
    <td>0.84</td>
    <td>0.793</td>
    <td>0.78</td>
    <td>0.747</td>
    <td>0.790</td>
  </tr>
  <tr>
    <td>Gemma2-9B-CPT-SEA-Lion-v3-Instruct</td>
    <td><a href="https://huggingface.co/aisingapore/gemma2-9b-cpt-sea-lionv3-instruct">Link</a></td>
    <td>0.847</td>
    <td>0.787</td>
    <td>0.793</td>
    <td>0.733</td>
    <td>0.790</td>
  </tr>
  <tr>
    <td>Qwen2.5-7B-Instruct</td>
    <td><a href="https://huggingface.co/Qwen/Qwen2.5-7B-Instruct">Link</a></td>
    <td>0.847</td>
    <td>0.84</td>
    <td>0.753</td>
    <td>0.713</td>
    <td>0.788</td>
  </tr>
  <tr>
    <td>SeaLLMs-v3-7B-Chat</td>
    <td><a href="https://arxiv.org/abs/2407.19672">Link</a></td>
    <td>0.833</td>
    <td>0.727</td>
    <td>0.74</td>
    <td>0.687</td>
    <td>0.747</td>
  </tr>
</table>

### Cross-LogiQA
<table>
  <tr>
    <th>Model Series</th>
    <th>Model</th>
    <th>Link</th>
    <th>English</th>
    <th>Chinese</th>
    <th>Indonesian</th>
    <th>Malay</th>
    <th>Avg (En/Zh/Id/Ms)</th>
  </tr>
  <!-- LLaMA Series -->
  <tr>
    <td rowspan="3">LLaMA Series</td>
    <td>Meta-Llama-3.1-8B-Instruct</td>
    <td><a href="https://ai.meta.com/blog/meta-llama-3-1/">Link</a></td>
    <td>0.585</td>
    <td>0.585</td>
    <td>0.455</td>
    <td>0.523</td>
    <td><strong>0.537</strong></td>
  </tr>
  <tr>
    <td>MeRALiON-LLaMA-3-8B-Instruct</td>
    <td></td>
    <td>0.591</td>
    <td>0.528</td>
    <td>0.494</td>
    <td>0.489</td>
    <td>0.526</td>
  </tr>
  <tr>
    <td>Llama3-8B-CPT-SEA-LION-v2.1-Instruct</td>
    <td><a href="https://huggingface.co/aisingapore/llama3-8b-cpt-sea-lionv2.1-instruct">Link</a></td>
    <td>0.528</td>
    <td>0.517</td>
    <td>0.403</td>
    <td>0.443</td>
    <td>0.473</td>
  </tr>

  <!-- Non-LLaMA Series -->
  <tr>
    <td rowspan="4">Non-LLaMA Series</td>
    <td>Qwen2.5-7B-Instruct</td>
    <td><a href="https://huggingface.co/Qwen/Qwen2.5-7B-Instruct">Link</a></td>
    <td>0.693</td>
    <td>0.71</td>
    <td>0.631</td>
    <td>0.534</td>
    <td><strong>0.642</strong></td>
  </tr>
  <tr>
    <td>Gemma-2-9B-IT</td>
    <td><a href="https://huggingface.co/google/gemma-2-9b-it">Link</a></td>
    <td>0.659</td>
    <td>0.636</td>
    <td>0.585</td>
    <td>0.602</td>
    <td>0.621</td>
  </tr>
  <tr>
    <td>Gemma2-9B-CPT-SEA-Lion-v3-Instruct</td>
    <td><a href="https://huggingface.co/aisingapore/gemma2-9b-cpt-sea-lionv3-instruct">Link</a></td>
    <td>0.636</td>
    <td>0.642</td>
    <td>0.557</td>
    <td>0.551</td>
    <td>0.597</td>
  </tr>
  <tr>
    <td>SeaLLMs-v3-7B-Chat</td>
    <td><a href="https://arxiv.org/abs/2407.19672">Link</a></td>
    <td>0.568</td>
    <td>0.585</td>
    <td>0.494</td>
    <td>0.517</td>
    <td>0.541</td>
  </tr>
</table>

### IndoMMLU
<table>
  <tr>
    <th>Model Series</th>
    <th>Model</th>
    <th>Link</th>
    <th>Accuracy</th>
  </tr>
  <!-- LLaMA Series -->
  <tr>
    <td rowspan="4">LLaMA Series</td>
    <td><strong>MeRALiON-LLaMA-3-8B-Instruct</strong></td>
    <td></td>
    <td><strong>0.576</strong></td>
  </tr>
  <tr>
    <td>Llama3-8B-CPT-SEA-LION-v2.1-Instruct</td>
    <td><a href="https://huggingface.co/aisingapore/llama3-8b-cpt-sea-lionv2.1-instruct">Link</a></td>
    <td>0.560</td>
  </tr>
  <tr>
    <td>Meta-Llama-3.1-8B-Instruct</td>
    <td><a href="https://ai.meta.com/blog/meta-llama-3-1/">Link</a></td>
    <td>0.548</td>
  </tr>
  <tr>
    <td>Meta-Llama-3-8B-Instruct</td>
    <td><a href="https://ai.meta.com/blog/meta-llama-3/">Link</a></td>
    <td>0.521</td>
  </tr>
  
  <!-- Non-LLaMA Series -->
  <tr>
    <td rowspan="5">Non-LLaMA Series</td>
    <td><strong>GPT4o-0513</strong></td>
    <td><a href="https://openai.com/index/hello-gpt-4o/">Link</a></td>
    <td><strong>0.760</strong></td>
  </tr>
  <tr>
    <td>Gemma2-9B-CPT-SEA-Lion-v3-Instruct</td>
    <td><a href="https://huggingface.co/aisingapore/gemma2-9b-cpt-sea-lionv3-instruct">Link</a></td>
    <td>0.626</td>
  </tr>
  <tr>
    <td>Gemma-2-9B-IT</td>
    <td><a href="https://huggingface.co/google/gemma-2-9b-it">Link</a></td>
    <td>0.621</td>
  </tr>
  <tr>
    <td>Qwen2.5-7B-Instruct</td>
    <td><a href="https://huggingface.co/Qwen/Qwen2.5-7B-Instruct">Link</a></td>
    <td>0.582</td>
  </tr>
  <tr>
    <td>SeaLLMs-v3-7B-Chat</td>
    <td><a href="https://arxiv.org/abs/2407.19672">Link</a></td>
    <td>0.541</td>
  </tr>
</table>

### CNEval
<table>
  <tr>
    <th>Model Series</th>
    <th>Model</th>
    <th>Link</th>
    <th>Accuracy</th>
  </tr>
  
  <!-- LLaMA Series -->
  <tr>
    <td rowspan="5">LLaMA Series</td>
    <td><strong>MeRALiON-LLaMA-3-8B-Instruct</strong></td>
    <td></td>
    <td><strong>0.514</strong></td>
  </tr>
  <tr>
    <td>Llama3-8B-CPT-SEA-LION-v2.1-Instruct</td>
    <td><a href="https://huggingface.co/aisingapore/llama3-8b-cpt-sea-lionv2.1-instruct">Link</a></td>
    <td>0.505</td>
  </tr>
  <tr>
    <td>Llama3-8B-CPT-SEA-Lion-v2-Instruct</td>
    <td><a href="https://huggingface.co/aisingapore/llama3-8b-cpt-sea-lionv2.1-instruct">Link</a></td>
    <td>0.495</td>
  </tr>
  <tr>
    <td>Meta-Llama-3-8B-Instruct</td>
    <td><a href="https://ai.meta.com/blog/meta-llama-3/">Link</a></td>
    <td>0.467</td>
  </tr>
  <tr>
    <td>Meta-Llama-3.1-8B-Instruct</td>
    <td><a href="https://ai.meta.com/blog/meta-llama-3-1/">Link</a></td>
    <td>0.457</td>
  </tr>
  
  <!-- Non-LLaMA Series -->
  <tr>
    <td rowspan="5">Non-LLaMA Series</td>
    <td><strong>Qwen2-7B-Instruct</strong></td>
    <td><a href="https://huggingface.co/Qwen/Qwen2-7B-Instruct">Link</a></td>
    <td><strong>0.829</strong></td>
  </tr>
  <tr>
    <td>GPT4o-0513</td>
    <td><a href="https://openai.com/index/hello-gpt-4o/">Link</a></td>
    <td>0.81</td>
  </tr>
  <tr>
    <td>Qwen2.5-7B-Instruct</td>
    <td><a href="https://huggingface.co/Qwen/Qwen2.5-7B-Instruct">Link</a></td>
    <td>0.8</td>
  </tr>
  <tr>
    <td>Gemma2-9B-CPT-SEA-Lion-v3-Instruct</td>
    <td><a href="https://huggingface.co/aisingapore/gemma2-9b-cpt-sea-lionv3-instruct">Link</a></td>
    <td>0.59</td>
  </tr>
  <tr>
    <td>Gemma-2-9B-IT</td>
    <td><a href="https://huggingface.co/google/gemma-2-9b-it">Link</a></td>
    <td>0.581</td>
  </tr>
</table>

These results collectively show how the MeRALiON-LLaMA-3-8B-Instruct model builds upon the strengths of official Llama-3.1 variants. The techniques we employed can serve as a blueprint, potentially guiding future refinements and adaptations for other models and language sets.

## Instruction-Following

By merging the official Llama-3.1-8B-base and Llama-3.1-8B-instruct weights, we inherit strong instruction-following behavior without additional instruction-tuning steps. The model can follow various user prompts accurately and coherently, producing well-structured, contextually relevant responses.

## Usage

MeRALiON-LLaMA-3-8B-Instruct can be deployed using the 🤗 Transformers library. With careful device mapping and dtype settings, users can achieve efficient and high-quality text generation.

Example:
```python
import transformers
import torch

model_id = "MERaLiON/MeRALiON-LLaMA-3-8B-Instruct"

# Load the model in bfloat16 and let device_map="auto" place it on the available hardware.
pipeline = transformers.pipeline(
    "text-generation",
    model=model_id,
    model_kwargs={"torch_dtype": torch.bfloat16},
    device_map="auto",
)

# Chat-style input; the pipeline applies the model's chat template automatically.
messages = [
    {"role": "user", "content": "What is the sentiment of the following sentence?\nSentence: This book is incredibly dull.\nAnswer:"},
]

outputs = pipeline(
    messages,
    max_new_tokens=256,
)
# The last entry of the returned conversation is the assistant's reply.
print(outputs[0]["generated_text"][-1])
```

**Note**: We use the same chat format as the official Llama-3.1-8B-Instruct model.
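
If you prefer to build inputs manually rather than through the pipeline, the tokenizer's chat template can be applied directly. This is standard 🤗 Transformers usage and assumes the repository ships the Llama-3.1 chat template.

```python
from transformers import AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("MERaLiON/MeRALiON-LLaMA-3-8B-Instruct")

messages = [
    {"role": "user", "content": "Translate to Indonesian: The weather is lovely today."},
]

# Render the chat format as a string and append the assistant header for generation.
prompt = tokenizer.apply_chat_template(
    messages,
    tokenize=False,
    add_generation_prompt=True,
)
print(prompt)
```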

## Caveats and Limitations

Like many LLMs, MeRALiON-LLaMA-3-8B-Instruct may hallucinate or produce irrelevant or incorrect content. While we have taken steps to mitigate these issues, users are advised to critically evaluate outputs, especially in high-stakes applications. The model has not undergone explicit safety alignment and filtering; users should implement their own safeguards, content moderation, and evaluation strategies.

## Safety and Liability

This model is not strongly safety-aligned. Users are responsible for implementing their own safety checks and mitigations. The authors and affiliated institutions are not liable for any damages or losses arising from the use of this model.

## Technical Specifications

MeRALiON-LLaMA-3-8B-Instruct underwent continued pretraining using computational resources provided by Singapore NSCC Aspire2A+ and The TPU Research Cloud. We utilized diverse data sources and adaptive strategies to ensure stable training without catastrophic forgetting.

## Data and Licensing

All data used for continued pretraining and model merging adheres to commercially permissible licenses. We have ensured that sources are free of restricted content to the best of our abilities. Details on the dataset and licensing will be provided in the future.

## Call for Contributions

We invite researchers, developers, and community members to contribute by:

- Identifying and reporting issues or biases.
- Providing additional pretraining or instruction data.
- Suggesting enhancements to documentation or evaluation metrics.
- Extending the model to support additional languages or domains.

Please visit our repository for more information and contribution guidelines.

## The Team

- Huang Xin  
- Tarun Kumar Vangani  
- Minh Duc Pham  
- Wang Bin  
- Liu Zhengyuan

## Acknowledgements

Our work is supported by the resources and platforms provided by Singapore NSCC Aspire2A+ and The TPU Research Cloud. We thank all contributors and collaborators who have made this effort possible.

## Contact

For additional information or inquiries, please reach out to us via [our contact form](#) (link to be provided) or check the GitHub repository for the latest updates and information.

## Disclaimer

This repository contains the weights for a model not specifically aligned for safety. Users are advised to perform their own due diligence, safety fine-tuning, and compliance measures. The authors disclaim liability for any direct or indirect damages resulting from model use.