update the technical report link and lm-harness scores

README.md (CHANGED)
@@ -5,10 +5,48 @@ license: apache-2.0

## Model Description

RakutenAI-7B is a systematic initiative that brings the latest technologies to the world of Japanese LLMs. RakutenAI-7B achieves the best scores on Japanese language understanding benchmarks while maintaining competitive performance on English test sets, compared with similar models such as OpenCalm, Elyza, Youri, Nekomata and Swallow. RakutenAI-7B leverages the Mistral model architecture and is based on the [Mistral-7B-v0.1](https://huggingface.co/mistralai/Mistral-7B-v0.1) pre-trained checkpoint, exemplifying a successful retrofitting of pre-trained model weights. Moreover, we extend Mistral's vocabulary from 32k to 48k to offer a better character-per-token rate for Japanese.

*The technical report can be accessed at [arXiv](https://arxiv.org/abs/2403.15484).*

*If you are looking for a foundation model, check [RakutenAI-7B](https://huggingface.co/Rakuten/RakutenAI-7B)*.

*If you are looking for a chat-tuned model, check [RakutenAI-7B-chat](https://huggingface.co/Rakuten/RakutenAI-7B-chat)*.
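
The vocabulary extension is the reason for the better character-per-token rate: with more Japanese-oriented tokens, the same text splits into fewer pieces, which lowers inference cost and stretches the effective context length for Japanese. The following is a minimal sketch of how that ratio can be compared between the base Mistral tokenizer and the extended RakutenAI-7B tokenizer; it assumes both tokenizers load via `AutoTokenizer`, and the sample sentence is purely illustrative.

```python
from transformers import AutoTokenizer

# Any Japanese text works here; this sentence is only an illustration.
text = "楽天グループは日本語に最適化した大規模言語モデルを公開しました。"

for name in ["mistralai/Mistral-7B-v0.1", "Rakuten/RakutenAI-7B"]:
    tok = AutoTokenizer.from_pretrained(name)
    ids = tok(text, add_special_tokens=False)["input_ids"]
    print(f"{name}: vocab size = {len(tok)}, tokens = {len(ids)}, "
          f"chars/token = {len(text) / len(ids):.2f}")
```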

## Model Evaluation Results

| Model Name | 7-Avg. excl. XLSum-ja | Avg. | JCS | JNLI | MARC-ja | JSQuAD | Jaqket v2 | XLSum-ja | xWino | MGSM |
|-------------------------------|:--------:|:-----:|:-------:|:-------:|:-------:|:-------:|:---------:|:--------:|:------:|:-------:|
| | | | accuracy | accuracy | accuracy | exact-match | exact-match | rouge-2 | accuracy | accuracy |
| | | | 3-shots | 3-shots | 3-shots | 2-shots | 1-shot | 1-shot | 0-shot | 5-shots |
| rakuten-ai-7b-instruct | 77.32 | 68.74 | 93.03 | 90.39 | 96.00 | 80.44 | 81.79 | 8.67 | 75.18 | 24.40 |
| youri-7b-instruction | 73.35 | 66.84 | 86.06 | 70.13 | 97.03 | 82.53 | 79.47 | 21.29 | 79.04 | 19.20 |
| japanese-stablelm-instruct-gamma-7b | 65.46 | 59.98 | 83.82 | 16.97 | 95.68 | 76.20 | 81.87 | 21.58 | 82.06 | 21.60 |
| swallow-7b-instruct | 64.29 | 58.25 | 83.38 | 26.50 | 94.46 | 75.62 | 81.01 | 16.01 | 76.23 | 12.80 |
| elyza-japanese-Llama-2-7b-instruct | 60.04 | 53.19 | 65.15 | 57.44 | 91.51 | 67.29 | 58.51 | 5.20 | 70.80 | 9.60 |
| elyza-japanese-Llama-2-7b-fast-instruct | 57.22 | 50.48 | 70.69 | 36.48 | 92.75 | 68.87 | 62.29 | 3.36 | 59.44 | 10.00 |
| nekomata-7b-instruction | 49.04 | 44.14 | 85.08 | 42.48 | 96.99 | 8.51 | 10.91 | 9.81 | 76.12 | 23.20 |

<div style="text-align: center;">Table 1: RakutenAI-7B-instruct model performance on Japanese LM-Harness metrics in comparison with other models.</div>

Our model achieves the highest average score, more than 3 points ahead of the next-best model. The models are sorted by 7-Avg. We use the following commit of the Japanese LM-Harness, with the v0.3 prompt version: https://github.com/Stability-AI/lm-evaluation-harness/tree/0fa86429679f521161d5b81a94c0c385e0a0976d.
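
To make the two average columns concrete: "Avg." is the mean over all eight tasks, and "7-Avg. excl. XLSum-ja" is the mean over the remaining seven tasks after dropping the XLSum-ja ROUGE-2 score. A quick check against the rakuten-ai-7b-instruct row reproduces the 68.74 and 77.32 entries of Table 1 (up to rounding):

```python
# Per-task scores for rakuten-ai-7b-instruct, copied from Table 1.
scores = {
    "JCS": 93.03, "JNLI": 90.39, "MARC-ja": 96.00, "JSQuAD": 80.44,
    "Jaqket v2": 81.79, "XLSum-ja": 8.67, "xWino": 75.18, "MGSM": 24.40,
}

avg_all = sum(scores.values()) / len(scores)        # mean over all eight tasks -> "Avg."
without_xlsum = [v for k, v in scores.items() if k != "XLSum-ja"]
avg_7 = sum(without_xlsum) / len(without_xlsum)     # mean over the remaining seven -> "7-Avg. excl. XLSum-ja"

print(f"Avg.: {avg_all:.2f}, 7-Avg. excl. XLSum-ja: {avg_7:.2f}")
```

The XLSum-ja column sits on a visibly different scale (ROUGE-2 rather than accuracy or exact match), which is presumably why a seven-task average without it is reported alongside the full average.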

| Model Name | Avg. | ARC | HellaSwag | MMLU | TruthfulQA |
|---------------------------------|:----------------:|:------------------------:|:------------------------:|:-----------------------:|:-----------------------:|
| | | accuracy | accuracy | accuracy | accuracy |
| | | 25-shots | 10-shots | 5-shots | 6-shots |
| rakuten-ai-7b-instruct | 61.32 | 58.62 | 82.70 | 60.32 | 43.63 |
| japanese-stablelm-instruct-gamma-7b | 55.91 | 50.43 | 77.10 | 54.61 | 41.50 |
| elyza-japanese-Llama-2-7b-fast-instruct | 54.21 | 53.58 | 77.69 | 46.91 | 38.67 |
| elyza-japanese-Llama-2-7b-instruct | 54.07 | 52.05 | 78.33 | 47.09 | 38.83 |
| nekomata-7b-instruction | 52.84 | 50.34 | 73.67 | 48.53 | 38.81 |
| youri-7b-instruction | 52.11 | 48.98 | 75.66 | 45.41 | 38.38 |
| swallow-7b-instruct | 50.32 | 47.61 | 72.27 | 40.77 | 40.62 |

<div style="text-align: center;">Table 2: RakutenAI-7B-instruct model performance on English LM-Harness metrics in comparison with other models.</div>

Our model achieves the highest average score, more than 5 points ahead of the next-best model. We use the following commit for the English LM-Harness: https://github.com/EleutherAI/lm-evaluation-harness/tree/b281b0921b636bc36ad05c0b0b0763bd6dd43463.
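
For reference, the pinned harness commit can be driven from Python as well as from its `main.py` CLI. The sketch below is a minimal example, not the authors' evaluation script: it assumes the 0.3-era `lm_eval.evaluator.simple_evaluate` API of the pinned EleutherAI commit, takes `Rakuten/RakutenAI-7B-instruct` as the checkpoint under evaluation (an assumption based on the model name in the tables), and runs only one Table 2 task (HellaSwag, 10-shot).

```python
# A minimal sketch, not the authors' evaluation script: assumes the pinned
# EleutherAI lm-evaluation-harness commit (0.3-era API) is installed and that
# Rakuten/RakutenAI-7B-instruct is the checkpoint under evaluation.
from lm_eval import evaluator

results = evaluator.simple_evaluate(
    model="hf-causal",
    model_args="pretrained=Rakuten/RakutenAI-7B-instruct",
    tasks=["hellaswag"],   # one of the Table 2 tasks
    num_fewshot=10,        # 10-shots, matching the table header
    batch_size=4,
    device="cuda:0",
    no_cache=True,         # skip the on-disk request cache
)
print(results["results"]["hellaswag"])
```

The Japanese results in Table 1 are produced analogously with the Stability-AI fork linked above, using its v0.3 prompt versions and the per-task shot counts listed in the table header.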

An independent evaluation by Kamata et al. for the [Nejumi LLMリーダーボード Neo](https://wandb.ai/wandb-japan/llm-leaderboard/reports/Nejumi-LLM-Neo--Vmlldzo2MTkyMTU0#総合評価), using a weighted average of [llm-jp-eval](https://github.com/llm-jp/llm-jp-eval) and [Japanese MT-bench](https://github.com/Stability-AI/FastChat/tree/jp-stable/fastchat/llm_judge), also confirms the highest performance of the instruct/chat versions of RakutenAI-7B.

## Usage

```python

@@ -66,11 +104,11 @@

For citing our work on the suite of RakutenAI-7B models, please use:

```
@misc{rakutengroup2024rakutenai7b,
    title={RakutenAI-7B: Extending Large Language Models for Japanese},
    author={{Rakuten Group, Inc.} and Aaron Levine and Connie Huang and Chenguang Wang and Eduardo Batista and Ewa Szymanska and Hongyi Ding and Hou Wei Chou and Jean-François Pessiot and Johanes Effendi and Justin Chiu and Kai Torben Ohlhus and Karan Chopra and Keiji Shinzato and Koji Murakami and Lee Xiong and Lei Chen and Maki Kubota and Maksim Tkachenko and Miroku Lee and Naoki Takahashi and Prathyusha Jwalapuram and Ryutaro Tatsushima and Saurabh Jain and Sunil Kumar Yadav and Ting Cai and Wei-Te Chen and Yandi Xia and Yuki Nakayama and Yutaka Higashiyama},
    year={2024},
    eprint={2403.15484},
    archivePrefix={arXiv},
    primaryClass={cs.CL}
}
```