lyu-boxuan committed • Commit 21ca12e • 1 Parent(s): 4eaed6f

Update README.md
README.md CHANGED
@@ -13,31 +13,11 @@ tags:
---

# Overview
-The model is based on rinna's [rinna/llama-3-youko-8b](https://huggingface.co/rinna/llama-3-youko-8b), fine-tuned using LoRA on a small number of parallel sentences from English to Japanese. The model has a COMET (Unbabel/wmt22-comet-da) score of 0.9011 on the flores200 devtest set.
+The model is based on rinna's [rinna/llama-3-youko-8b](https://huggingface.co/rinna/llama-3-youko-8b), fine-tuned using LoRA on a small number of parallel sentences from English to Japanese. The model has a COMET (Unbabel/wmt22-comet-da) score of 0.9011 and a BLEU score ("tok": "ja-mecab-0.996-IPA") of 33.1 on the flores200 devtest set.

* **Model architecture**

A 32-layer, 4096-hidden-size transformer-based language model. Refer to the [Llama 3 Model Card](https://github.com/meta-llama/llama3/blob/main/MODEL_CARD.md) for architecture details.
-* **Training: Built with Meta Llama 3**
-
-  The model was initialized with the [meta-llama/Meta-Llama-3-8B](https://huggingface.co/meta-llama/Meta-Llama-3-8B) model and continually trained on around **22B** tokens from a mixture of the following corpora:
-  - [Japanese CC-100](https://huggingface.co/datasets/cc100)
-  - [Japanese C4](https://huggingface.co/datasets/mc4)
-  - [Japanese OSCAR](https://huggingface.co/datasets/oscar-corpus/colossal-oscar-1.0)
-  - [The Pile](https://huggingface.co/datasets/EleutherAI/pile)
-  - [Wikipedia](https://dumps.wikimedia.org/other/cirrussearch)
-  - rinna curated Japanese dataset
-
-* **Contributors**
-  - [Koh Mitsuda](https://huggingface.co/mitsu-koh)
-  - [Kei Sawada](https://huggingface.co/keisawada)
-
----
-
-# Benchmarking
-
-Please refer to [rinna's LM benchmark page](https://rinnakk.github.io/research/benchmarks/lm/index.html).
-
---

# How to use the model
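The body of the usage section is not part of this hunk. As a rough orientation only, the sketch below shows one way to load the model with transformers and prompt it for English-to-Japanese translation; the repository id and the prompt template are placeholders assumed for illustration, not values taken from this README.

```python
# Minimal usage sketch (assumptions: repository id and prompt format are
# placeholders; check the model card for the real values).
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "lyu-boxuan/llama-3-youko-8b-en-ja-mt"  # placeholder id, not confirmed

tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id, torch_dtype=torch.bfloat16, device_map="auto"
)

# Assumed plain instruction-style prompt; the actual template may differ.
prompt = (
    "Translate the following English sentence into Japanese.\n"
    "English: Machine translation has improved rapidly in recent years.\n"
    "Japanese:"
)
inputs = tokenizer(prompt, return_tensors="pt").to(model.device)
with torch.no_grad():
    output_ids = model.generate(**inputs, max_new_tokens=128, do_sample=False)

# Print only the newly generated continuation (the Japanese translation).
print(tokenizer.decode(output_ids[0][inputs["input_ids"].shape[-1]:], skip_special_tokens=True))
```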
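For the scores quoted in the overview, COMET (Unbabel/wmt22-comet-da) and MeCab-tokenized BLEU on the flores200 devtest set, the sketch below shows how such numbers are typically computed with the unbabel-comet and sacrebleu packages. The example sentences are placeholders, and the exact evaluation setup behind the reported 0.9011 / 33.1 is not specified in this commit.

```python
# Scoring sketch (assumptions: toy sentences stand in for the flores200 devtest
# English sources, model outputs, and Japanese references).
# Requires: pip install "sacrebleu[ja]" unbabel-comet
import sacrebleu
from comet import download_model, load_from_checkpoint

srcs = ["Machine translation has improved rapidly in recent years."]  # English sources
hyps = ["機械翻訳は近年急速に進歩しました。"]  # model outputs
refs = ["機械翻訳は近年急速に進歩した。"]      # Japanese references

# BLEU with the Japanese MeCab tokenizer, matching the "tok": "ja-mecab-..." signature.
bleu = sacrebleu.corpus_bleu(hyps, [refs], tokenize="ja-mecab")
print(f"BLEU: {bleu.score:.1f}")

# Reference-based COMET with the wmt22-comet-da checkpoint.
ckpt_path = download_model("Unbabel/wmt22-comet-da")
comet_model = load_from_checkpoint(ckpt_path)
data = [{"src": s, "mt": h, "ref": r} for s, h, r in zip(srcs, hyps, refs)]
print(f"COMET: {comet_model.predict(data, batch_size=8, gpus=0).system_score:.4f}")
```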
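The overview also notes that the model was obtained by LoRA fine-tuning of rinna/llama-3-youko-8b on a small set of parallel sentences. A minimal sketch of such a setup with the peft library follows; the rank, alpha, dropout, and target modules are illustrative assumptions rather than the configuration actually used.

```python
# Illustrative LoRA setup with peft (hyperparameters are assumptions, not the
# actual training configuration of this model).
import torch
from peft import LoraConfig, get_peft_model
from transformers import AutoModelForCausalLM

base = AutoModelForCausalLM.from_pretrained(
    "rinna/llama-3-youko-8b", torch_dtype=torch.bfloat16
)
lora_cfg = LoraConfig(
    r=16,            # assumed rank
    lora_alpha=32,   # assumed scaling
    lora_dropout=0.05,
    target_modules=["q_proj", "k_proj", "v_proj", "o_proj"],  # assumed targets
    task_type="CAUSAL_LM",
)
model = get_peft_model(base, lora_cfg)
model.print_trainable_parameters()
# The adapter would then be trained on English-to-Japanese sentence pairs with a
# standard causal-LM objective (e.g. via the transformers Trainer or trl's SFTTrainer).
```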