---
license: apache-2.0
base_model:
- Rakuten/RakutenAI-7B
---
# RakutenAI-2.0-8x7B
## Model Description
RakutenAI-2.0-8x7B is a Mixture of Experts (MoE) foundation model derived from [RakutenAI-7B](https://huggingface.co/Rakuten/RakutenAI-7B), which was first introduced in March 2024. Developed as part of a broader initiative to advance Japanese LLM technology, it uses two active experts, resulting in **13B active parameters**. Experts are selected dynamically based on the input tokens, which improves computational efficiency while maintaining high performance. RakutenAI-2.0-8x7B achieves state-of-the-art results on Japanese language understanding benchmarks and competitive performance on English evaluation tasks when compared with similar models, including Swallow-MX-8x7B-NVE-0.1, Llama-3-Swallow-70B-v0.1, Sarashina2-70B, and PLaMo 100B.

*If you are looking for an instruction-tuned model, check [RakutenAI-2.0-8x7B-instruct](https://huggingface.co/Rakuten/RakutenAI-2.0-8x7B-instruct)*.
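
To make the idea of dynamic expert selection concrete, here is a minimal sketch of top-2 routing for a single token. It is a generic illustration in plain Python, not code from this model: the names `route_top_k` and `moe_forward`, the toy experts, and the choice of eight experts (suggested only by the "8x7B" in the model name) are assumptions, and the actual router may normalize its weights differently.

```python
import math


def route_top_k(router_logits, num_active=2):
    """Pick the highest-scoring experts for one token and normalize their weights."""
    # Rank expert indices by router logit, highest first (ties keep index order).
    ranked = sorted(range(len(router_logits)), key=lambda i: router_logits[i], reverse=True)
    chosen = ranked[:num_active]
    # Softmax over the chosen logits only, so the active weights sum to 1.
    peak = max(router_logits[i] for i in chosen)
    exp_scores = [math.exp(router_logits[i] - peak) for i in chosen]
    total = sum(exp_scores)
    return [(i, score / total) for i, score in zip(chosen, exp_scores)]


def moe_forward(token, experts, router_logits, num_active=2):
    """Combine the outputs of the active experts; all other experts are skipped."""
    routes = route_top_k(router_logits, num_active)
    return sum(weight * experts[idx](token) for idx, weight in routes)


# Hypothetical toy setup: expert k simply multiplies its input by k.
experts = [lambda x, k=k: x * k for k in range(8)]
logits = [0.1, 2.0, -1.0, 2.0, 0.0, 0.5, -0.3, 1.0]

print(route_top_k(logits))               # experts 1 and 3, weight 0.5 each
print(moe_forward(2.0, experts, logits))  # prints 4.0
```

Only the selected experts run for a given token, which is why the number of active parameters is smaller than the total parameter count.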

## Model Evaluation Results

| Foundation Model Name                          | Japanese Score | English Score | Average |
|-----------------------------------------------|---------------|--------------|---------|
| Rakuten/RakutenAI-7B                         | 62.93         | 34.86        | 48.90   |
| **Rakuten/RakutenAI-2.0-8x7B**                   | **72.29**         | 41.32        | 56.80   |
| Tokyotech/Swallow-MX-8x7B-NVE-0.1             | 66.17         | 44.33        | 55.25   |
| Tokyotech/Llama-3-Swallow-70B-v0.1            | 68.15         | **51.52**        | **59.84**   |
| SBIntuitions/Sarashina2-70B                   | 71.09         | 39.22        | 55.16   |
| PreferredNetworks/PLaMo 100B                  | 71.45         | 36.48        | 53.96   |

<div style="text-align: center;">Table 1: RakutenAI-2.0-8x7B foundation model average performance scores on LM-Harness in comparison with other Japanese open models.</div>

Detailed scores are as follows:

| Model Name       | jcommonsense_qa | jnli  | marc_ja | jsquad | jaqket_v2 | xlsum_ja | xwinograd | mgsm  | arc_challenge | hellaswag | mmlu  | truthfulqa_mc2 | gsm8k | winogrande | musr  | math_hard | gpqa  | bbh   | ifeval | mmlu_pro |
|----------------------|-----------------|-------|---------|--------|-----------|----------|-----------|-------|---------------|-----------|-------|----------------|-------|------------|-------|-----------|-------|-------|--------|----------|
| **Metric**             | accuracy-3shot           | accuracy-3shot | accuracy-3shot   | exact_match-2shot  | exact_match-1shot     | rouge2-1shot    | accuracy-0shot     | accuracy-5shot | accuracy_norm-25shot         | accuracy_norm-10shot     | accuracy-5shot | accuracy-0shot          | exact_match-5shot  | accuracy-5shot      | accuracy_norm-0shot  | exact_match-4shot      | accuracy_norm-0shot  | accuracy_norm-3shot | avg_inst_prompt_strict_acc-0shot  | accuracy-5shot    |
| RakutenAI-7B         | 85.88           | 56.61 | 96.52   | 69.56  | 81.44     | 15.69    | 74.14     | 23.60 | 60.75         | 82.26     | 59.83 | 38.33          | 32.60 | 77.43      | 4.93  | 2.16      | 5.02  | 20.34 | 14.04  | 20.57    |
| RakutenAI-2.0-8x7B   | 93.12           | 87.43 | 97.72   | 74.49  | 86.00     | 15.70    | 78.62     | 45.20 | 66.38         | 85.84     | 65.50 | 48.19          | 51.40 | 80.51      | 13.88 | 3.30      | 5.71  | 27.02 | 22.90  | 25.22    |
| Swallow-MX-8x7B-NVE-0.1              | 89.28           | 43.06 | 97.15   | 76.29  | 87.37     | 17.09    | 82.69     | 40.40 | 65.87         | 85.13     | 69.48 | 50.38          | 58.45 | 82.87      | 8.78  | 7.50      | 13.33 | 29.41 | 28.38  | 32.32    |
| Llama-3-Swallow-70B-v0.1              | 92.58           | 66.15 | 93.46   | 70.94  | 71.74     | 12.58    | 83.32     | 54.40 | 67.58         | 87.53     | 77.47 | 55.29          | 81.50 | 85.16      | 22.05 | 13.92     | 16.60 | 49.53 | 20.91  | 40.70    |
| Sarashina2-70B              | 95.35           | 60.44 | 94.50   | 76.90  | 88.49     | 18.24    | 80.81     | 54.00 | 62.63         | 83.23     | 63.10 | 48.68          | 24.49 | 79.95      | 13.52 | 5.29      | 5.54  | 29.73 | 30.32  | 24.13    |
| PLaMo 100B              | 92.05           | 68.82 | 97.49   | 78.01  | 89.43     | 20.38    | 81.02     | 44.40 | 49.91         | 80.98     | 55.17 | 44.91          | 56.10 | 71.35      | 6.67  | 0.00      | 4.00  | 23.99 | 23.39  | 21.31    |

<div style="text-align: center;">Table 2: RakutenAI-2.0-8x7B foundation model performance on LM-Harness metrics in comparison with other Japanese open models.</div>
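
The summary scores in the first table appear to be plain means of the per-task scores above: the first eight columns (jcommonsense_qa through mgsm) for Japanese, the remaining twelve for English, and the mean of those two for the overall average. That grouping is inferred from the numbers and the linked task lists rather than stated explicitly, so treat the following as a consistency check, not as the authors' scoring script.

```python
from statistics import mean

# Per-task scores for RakutenAI-2.0-8x7B, copied from the detailed table.
japanese = [93.12, 87.43, 97.72, 74.49, 86.00, 15.70, 78.62, 45.20]
english = [66.38, 85.84, 65.50, 48.19, 51.40, 80.51,
           13.88, 3.30, 5.71, 27.02, 22.90, 25.22]

ja_score = mean(japanese)            # reported Japanese score: 72.29
en_score = mean(english)             # reported English score: 41.32
average = (ja_score + en_score) / 2  # reported average: 56.80

print(f"Japanese {ja_score:.3f}  English {en_score:.3f}  Average {average:.3f}")
```

The recomputed values agree with the reported ones to within 0.01; the small differences presumably come from rounding of the per-task scores.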

## Usage
```python
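from transformers import AutoModelForCausalLM, AutoTokenizer

model_path = "Rakuten/RakutenAI-2.0-8x7B"
tokenizer = AutoTokenizer.from_pretrained(model_path)
# torch_dtype="auto" keeps the checkpoint's dtype; device_map="auto" spreads the weights over the available devices.
model = AutoModelForCausalLM.from_pretrained(model_path, torch_dtype="auto", device_map="auto")
model.eval()

# Prompts to be continued as plain text (this is a foundation model, not a chat model).
requests = [
    "南硫黄島原生自然環境保全地域は、自然",  # Japanese prompt: the opening words of a sentence
    "The capybara is a giant cavy rodent",
]

for req in requests:
    inputs = tokenizer(req, return_tensors="pt").to(device=model.device)
    tokens = model.generate(
        **inputs,
        max_new_tokens=512,
        do_sample=True,  # sampling is enabled, so outputs vary between runs
        pad_token_id=tokenizer.eos_token_id,
    )
    # The decoded text contains the prompt followed by the generated continuation.
    out = tokenizer.decode(tokens[0], skip_special_tokens=True)
    print("INPUT:\n" + req)
    print("OUTPUT:\n" + out)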
```
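
Because `generate` returns the prompt tokens followed by the newly generated ones, the decoded output above repeats the input. If only the continuation is wanted, the prompt can be sliced off before decoding. The helper below is a small sketch of that step; the name `split_continuation` and the toy token ids are hypothetical and used only for illustration.

```python
def split_continuation(output_ids, prompt_length):
    """Return only the token ids that follow the first `prompt_length` prompt tokens."""
    return output_ids[prompt_length:]


# With the objects from the usage snippet, this would be applied roughly as:
#   prompt_length = <tokenized prompt>["input_ids"].shape[1]
#   new_ids = split_continuation(tokens[0], prompt_length)
#   print(tokenizer.decode(new_ids, skip_special_tokens=True))

# Toy demonstration with made-up token ids.
prompt_ids = [1, 415, 2058]
output_ids = prompt_ids + [349, 264, 2]
print(split_continuation(output_ids, len(prompt_ids)))  # prints [349, 264, 2]
```
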
**Note on Evaluation Scores:**
- Evaluation tests were run with LM Evaluation Harness between October and December 2024. We use the default task definitions from the following commit: https://github.com/EleutherAI/lm-evaluation-harness/commit/26f607f5432e1d09c55b25488c43523e7ecde657
- The tasks considered for Japanese evaluations are listed here: https://github.com/EleutherAI/lm-evaluation-harness/blob/26f607f5432e1d09c55b25488c43523e7ecde657/lm_eval/tasks/japanese_leaderboard/README.md
- The tasks considered for English evaluations are listed here: https://huggingface.co/docs/leaderboards/en/open_llm_leaderboard/archive and https://github.com/EleutherAI/lm-evaluation-harness/blob/main/lm_eval/tasks/leaderboard/README.md
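
As a rough guide to reproducing part of these numbers, the sketch below builds `lm_eval` command lines for six of the English tasks, using the few-shot counts given in the detailed score table. How the authors actually invoked the harness is not stated, so the helper name `lm_eval_command`, the `--batch_size auto` setting, and the choice to run one task per command are assumptions; the remaining tasks are left out here.

```python
import shlex

MODEL = "Rakuten/RakutenAI-2.0-8x7B"

# Few-shot counts as reported in the detailed score table.
FEWSHOT = {
    "arc_challenge": 25,
    "hellaswag": 10,
    "mmlu": 5,
    "truthfulqa_mc2": 0,
    "gsm8k": 5,
    "winogrande": 5,
}


def lm_eval_command(task, num_fewshot, model=MODEL, batch_size="auto"):
    """Build an lm_eval CLI call for one task (assumed invocation, not the authors' script)."""
    args = [
        "lm_eval",
        "--model", "hf",
        "--model_args", f"pretrained={model}",
        "--tasks", task,
        "--num_fewshot", str(num_fewshot),
        "--batch_size", str(batch_size),
    ]
    return shlex.join(args)


for task, shots in FEWSHOT.items():
    print(lm_eval_command(task, shots))
```

To match the reported setup as closely as possible, install the harness from the commit linked above before running the printed commands.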

## Model Details

* **Developed by**: [Rakuten Group, Inc.](https://ai.rakuten.com/)
* **Language(s)**: Japanese, English
* **License**: This model is licensed under [Apache License, Version 2.0](https://www.apache.org/licenses/LICENSE-2.0).
* **Model Architecture**: Mixture of Experts (2 active experts)

### Limitations and Bias

The suite of RakutenAI-2.0 models is capable of generating human-like text on a wide range of topics. However, like all LLMs, they have limitations and can produce biased, inaccurate, or unsafe outputs. Please exercise caution and judgement while interacting with them.

## Citation
To cite our work on the suite of RakutenAI-2.0 models, please use:

```
@misc{rakutengroup2025rakutenai2.0,
  author = {Rakuten Group, Inc.},
  title = {RakutenAI-2.0},
  year = {2025},
  publisher = {Hugging Face},
  url = {https://huggingface.co/Rakuten},
}
```