This model is based on the [modernBERT-base](https://arxiv.org/abs/2412.13663) architecture.
It was trained using the Japanese subset (3.4TB) of the llm-jp-corpus v4 and supports a max sequence length of 8192.
## Usage

Please install the transformers library:
```bash
pip install "transformers>=4.48.0"
```

If your GPU supports FlashAttention 2, it is recommended to install flash-attn:
```bash
pip install flash-attn --no-build-isolation
```
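When flash-attn is installed, it can be selected at load time through the `attn_implementation` argument of `from_pretrained`. As an illustrative sketch (not part of the original card), one way to fall back gracefully when flash-attn is missing:

```python
import importlib.util

def pick_attn_implementation() -> str:
    """Use FlashAttention 2 when the flash-attn package is importable,
    otherwise fall back to PyTorch's scaled-dot-product attention."""
    if importlib.util.find_spec("flash_attn") is not None:
        return "flash_attention_2"
    return "sdpa"

# Usage (hypothetical): pass the result to from_pretrained, e.g.
#   model = AutoModelForMaskedLM.from_pretrained(
#       model_id, attn_implementation=pick_attn_implementation()
#   )
print(pick_attn_implementation())
```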

Using AutoModelForMaskedLM:
```python
from transformers import AutoTokenizer, AutoModelForMaskedLM

model_id = "speed/llm-jp-modernbert-base-v4-ja-stage2-200k"
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForMaskedLM.from_pretrained(model_id)

text = "日本の首都は<MASK|LLM-jp>です。"  # "The capital of Japan is <MASK|LLM-jp>."
inputs = tokenizer(text, return_tensors="pt")
outputs = model(**inputs)

# To get predictions for the mask:
masked_index = inputs["input_ids"][0].tolist().index(tokenizer.mask_token_id)
predicted_token_id = outputs.logits[0, masked_index].argmax(dim=-1)
predicted_token = tokenizer.decode(predicted_token_id)
print("Predicted token:", predicted_token)
# Predicted token: 東京
```
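The snippet above keeps only the single best token. A common extension is to rank the top-k candidates by softmax probability; the selection logic can be sketched with the standard library alone (toy logits stand in for `outputs.logits[0, masked_index]`):

```python
import math

def topk_predictions(logits, k=3):
    """Return (index, probability) pairs for the k highest logits,
    using a numerically stable softmax."""
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    ranked = sorted(range(len(logits)), key=lambda i: exps[i], reverse=True)
    return [(i, exps[i] / total) for i in ranked[:k]]

# Toy logits; in the real example each index would map back to a
# vocabulary token via tokenizer.decode.
print(topk_predictions([2.0, 0.5, 1.0, -1.0], k=2))
```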
## Training
This model was trained with a max_seq_len of 1024 in stage 1, and then with a max_seq_len of 8192 in stage 2.
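A minimal sketch of what this two-stage length schedule implies for data preparation (illustrative only; the actual training pipeline is not described in this card):

```python
# Per the card: stage 1 uses max_seq_len=1024, stage 2 uses max_seq_len=8192.
STAGE_MAX_SEQ_LEN = {1: 1024, 2: 8192}

def truncate_for_stage(token_ids, stage):
    """Clip one tokenized example to the stage's maximum sequence length."""
    return token_ids[: STAGE_MAX_SEQ_LEN[stage]]

example = list(range(3000))  # a dummy 3000-token example
print(len(truncate_for_stage(example, stage=1)))  # → 1024
print(len(truncate_for_stage(example, stage=2)))  # → 3000 (shorter than 8192)
```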