lyu-boxuan committed
Commit 21ca12e
1 Parent(s): 4eaed6f

Update README.md

Files changed (1): README.md (+1, −21)

README.md CHANGED
@@ -13,31 +13,11 @@ tags:
 ---
 
 # Overview
-The model is based on rinna's [rinna/llama-3-youko-8b], fine-tuned using LoRA on a small number of parallel sentences from English to Japanese. The model has a COMET (Unbabel/wmt22-comet-da) of 0.9011 on flores200 devtest.
+The model is based on rinna's [rinna/llama-3-youko-8b], fine-tuned using LoRA on a small number of parallel sentences from English to Japanese. The model has a COMET (Unbabel/wmt22-comet-da) of 0.9011 and BLEU ("tok": "ja-mecab-0.996-IPA") of 33.1 on flores200 devtest.
 
 * **Model architecture**
 
   A 32-layer, 4096-hidden-size transformer-based language model. Refer to the [Llama 3 Model Card](https://github.com/meta-llama/llama3/blob/main/MODEL_CARD.md) for architecture details.
-* **Training: Built with Meta Llama 3**
-
-  The model was initialized with the [meta-llama/Meta-Llama-3-8B](https://huggingface.co/meta-llama/Meta-Llama-3-8B) model and continually trained on around **22B** tokens from a mixture of the following corpora
-  - [Japanese CC-100](https://huggingface.co/datasets/cc100)
-  - [Japanese C4](https://huggingface.co/datasets/mc4)
-  - [Japanese OSCAR](https://huggingface.co/datasets/oscar-corpus/colossal-oscar-1.0)
-  - [The Pile](https://huggingface.co/datasets/EleutherAI/pile)
-  - [Wikipedia](https://dumps.wikimedia.org/other/cirrussearch)
-  - rinna curated Japanese dataset
-
-* **Contributors**
-  - [Koh Mitsuda](https://huggingface.co/mitsu-koh)
-  - [Kei Sawada](https://huggingface.co/keisawada)
-
----
-
-# Benchmarking
-
-Please refer to [rinna's LM benchmark page](https://rinnakk.github.io/research/benchmarks/lm/index.html).
-
 ---
 
 # How to use the model
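
The BLEU figure added in this commit was produced with sacrebleu's `ja-mecab-0.996-IPA` tokenization. As a self-contained illustration of what the metric measures (clipped n-gram precision with a brevity penalty), here is a minimal, unsmoothed corpus-BLEU sketch over pre-tokenized text. This is not the sacrebleu pipeline and will not reproduce the reported 33.1; it only shows the core computation.

```python
import math
from collections import Counter


def ngrams(tokens, n):
    """Count the n-grams of a token list."""
    return Counter(tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1))


def corpus_bleu(references, hypotheses, max_n=4):
    """Corpus-level BLEU (0-100) with uniform weights and brevity penalty.

    Assumes one reference per hypothesis, both already tokenized.
    No smoothing, so any empty clipped count yields 0.0.
    """
    clipped = [0] * max_n   # matched n-grams, clipped by reference counts
    total = [0] * max_n     # total hypothesis n-grams
    ref_len = hyp_len = 0
    for ref, hyp in zip(references, hypotheses):
        ref_len += len(ref)
        hyp_len += len(hyp)
        for n in range(1, max_n + 1):
            hyp_ng, ref_ng = ngrams(hyp, n), ngrams(ref, n)
            total[n - 1] += sum(hyp_ng.values())
            clipped[n - 1] += sum(min(c, ref_ng[g]) for g, c in hyp_ng.items())
    if min(total) == 0 or min(clipped) == 0:
        return 0.0
    log_prec = sum(math.log(c / t) for c, t in zip(clipped, total)) / max_n
    bp = 1.0 if hyp_len > ref_len else math.exp(1 - ref_len / hyp_len)
    return 100 * bp * math.exp(log_prec)
```

An exact match scores 100; a hypothesis with no matching 4-gram scores 0 under this unsmoothed variant, which is why production toolkits such as sacrebleu apply smoothing and a language-specific tokenizer (MeCab for Japanese) before counting n-grams.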