lyu-boxuan committed • Commit 21ca12e • 1 Parent(s): 4eaed6f

Update README.md
README.md CHANGED
@@ -13,31 +13,11 @@ tags:
---

# Overview
-The model is based on rinna's [rinna/llama-3-youko-8b](https://huggingface.co/rinna/llama-3-youko-8b), fine-tuned using LoRA on a small number of parallel sentences from English to Japanese. The model has a COMET (Unbabel/wmt22-comet-da) score of 0.9011 on the flores200 devtest set.
+The model is based on rinna's [rinna/llama-3-youko-8b](https://huggingface.co/rinna/llama-3-youko-8b), fine-tuned using LoRA on a small number of parallel sentences from English to Japanese. The model has a COMET (Unbabel/wmt22-comet-da) score of 0.9011 and a BLEU score ("tok": "ja-mecab-0.996-IPA") of 33.1 on the flores200 devtest set.

* **Model architecture**

A 32-layer, 4096-hidden-size transformer-based language model. Refer to the [Llama 3 Model Card](https://github.com/meta-llama/llama3/blob/main/MODEL_CARD.md) for architecture details.
-* **Training: Built with Meta Llama 3**
-
-  The model was initialized with the [meta-llama/Meta-Llama-3-8B](https://huggingface.co/meta-llama/Meta-Llama-3-8B) model and continually trained on around **22B** tokens from a mixture of the following corpora:
-  - [Japanese CC-100](https://huggingface.co/datasets/cc100)
-  - [Japanese C4](https://huggingface.co/datasets/mc4)
-  - [Japanese OSCAR](https://huggingface.co/datasets/oscar-corpus/colossal-oscar-1.0)
-  - [The Pile](https://huggingface.co/datasets/EleutherAI/pile)
-  - [Wikipedia](https://dumps.wikimedia.org/other/cirrussearch)
-  - rinna curated Japanese dataset
-
-* **Contributors**
-  - [Koh Mitsuda](https://huggingface.co/mitsu-koh)
-  - [Kei Sawada](https://huggingface.co/keisawada)
-
----
-
-# Benchmarking
-
-Please refer to [rinna's LM benchmark page](https://rinnakk.github.io/research/benchmarks/lm/index.html).
-
---

# How to use the model
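The body of the usage section is not part of this hunk. As a rough orientation only, the sketch below shows one way to load the model with transformers and prompt it for English-to-Japanese translation; the repository id and the prompt template are placeholders assumed for illustration, not values taken from this README.

```python
# Minimal usage sketch (assumptions: repository id and prompt format are
# placeholders; check the model card for the real values).
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "lyu-boxuan/llama-3-youko-8b-en-ja-mt"  # placeholder id, not confirmed

tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id, torch_dtype=torch.bfloat16, device_map="auto"
)

# Assumed plain instruction-style prompt; the actual template may differ.
prompt = (
    "Translate the following English sentence into Japanese.\n"
    "English: Machine translation has improved rapidly in recent years.\n"
    "Japanese:"
)
inputs = tokenizer(prompt, return_tensors="pt").to(model.device)
with torch.no_grad():
    output_ids = model.generate(**inputs, max_new_tokens=128, do_sample=False)

# Print only the newly generated continuation (the Japanese translation).
print(tokenizer.decode(output_ids[0][inputs["input_ids"].shape[-1]:], skip_special_tokens=True))
```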
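For the scores quoted in the overview, COMET (Unbabel/wmt22-comet-da) and MeCab-tokenized BLEU on the flores200 devtest set, the sketch below shows how such numbers are typically computed with the unbabel-comet and sacrebleu packages. The example sentences are placeholders, and the exact evaluation setup behind the reported 0.9011 / 33.1 is not specified in this commit.

```python
# Scoring sketch (assumptions: toy sentences stand in for the flores200 devtest
# English sources, model outputs, and Japanese references).
# Requires: pip install "sacrebleu[ja]" unbabel-comet
import sacrebleu
from comet import download_model, load_from_checkpoint

srcs = ["Machine translation has improved rapidly in recent years."]  # English sources
hyps = ["機械翻訳は近年急速に進歩しました。"]  # model outputs
refs = ["機械翻訳は近年急速に進歩した。"]      # Japanese references

# BLEU with the Japanese MeCab tokenizer, matching the "tok": "ja-mecab-..." signature.
bleu = sacrebleu.corpus_bleu(hyps, [refs], tokenize="ja-mecab")
print(f"BLEU: {bleu.score:.1f}")

# Reference-based COMET with the wmt22-comet-da checkpoint.
ckpt_path = download_model("Unbabel/wmt22-comet-da")
comet_model = load_from_checkpoint(ckpt_path)
data = [{"src": s, "mt": h, "ref": r} for s, h, r in zip(srcs, hyps, refs)]
print(f"COMET: {comet_model.predict(data, batch_size=8, gpus=0).system_score:.4f}")
```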
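The overview also notes that the model was obtained by LoRA fine-tuning of rinna/llama-3-youko-8b on a small set of parallel sentences. A minimal sketch of such a setup with the peft library follows; the rank, alpha, dropout, and target modules are illustrative assumptions rather than the configuration actually used.

```python
# Illustrative LoRA setup with peft (hyperparameters are assumptions, not the
# actual training configuration of this model).
import torch
from peft import LoraConfig, get_peft_model
from transformers import AutoModelForCausalLM

base = AutoModelForCausalLM.from_pretrained(
    "rinna/llama-3-youko-8b", torch_dtype=torch.bfloat16
)
lora_cfg = LoraConfig(
    r=16,            # assumed rank
    lora_alpha=32,   # assumed scaling
    lora_dropout=0.05,
    target_modules=["q_proj", "k_proj", "v_proj", "o_proj"],  # assumed targets
    task_type="CAUSAL_LM",
)
model = get_peft_model(base, lora_cfg)
model.print_trainable_parameters()
# The adapter would then be trained on English-to-Japanese sentence pairs with a
# standard causal-LM objective (e.g. via the transformers Trainer or trl's SFTTrainer).
```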