This model is based on the [modernBERT-base](https://arxiv.org/abs/2412.13663) architecture.
It was trained using the Japanese subset (3.4TB) of the llm-jp-corpus v4 and supports a max sequence length of 8192.
## Usage

Please install the transformers library:
```bash
pip install "transformers>=4.48.0"
```

If your GPU supports FlashAttention 2, it is recommended to install flash-attn:
```bash
pip install flash-attn --no-build-isolation
```
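When flash-attn is installed, it can be selected at load time through the `attn_implementation` argument of `from_pretrained`. As an illustrative sketch (not part of the original card), one way to fall back gracefully when flash-attn is missing:

```python
import importlib.util

def pick_attn_implementation() -> str:
    """Use FlashAttention 2 when the flash-attn package is importable,
    otherwise fall back to PyTorch's scaled-dot-product attention."""
    if importlib.util.find_spec("flash_attn") is not None:
        return "flash_attention_2"
    return "sdpa"

# Usage (hypothetical): pass the result to from_pretrained, e.g.
#   model = AutoModelForMaskedLM.from_pretrained(
#       model_id, attn_implementation=pick_attn_implementation()
#   )
print(pick_attn_implementation())
```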

Using AutoModelForMaskedLM:
```python
from transformers import AutoTokenizer, AutoModelForMaskedLM

model_id = "speed/llm-jp-modernbert-base-v4-ja-stage2-200k"
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForMaskedLM.from_pretrained(model_id)

text = "日本の首都は<MASK|LLM-jp>です。"  # "The capital of Japan is <MASK|LLM-jp>."
inputs = tokenizer(text, return_tensors="pt")
outputs = model(**inputs)

# To get predictions for the mask:
masked_index = inputs["input_ids"][0].tolist().index(tokenizer.mask_token_id)
predicted_token_id = outputs.logits[0, masked_index].argmax(dim=-1)
predicted_token = tokenizer.decode(predicted_token_id)
print("Predicted token:", predicted_token)
# Predicted token: 東京
```
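The snippet above keeps only the single best token. A common extension is to rank the top-k candidates by softmax probability; the selection logic can be sketched with the standard library alone (toy logits stand in for `outputs.logits[0, masked_index]`):

```python
import math

def topk_predictions(logits, k=3):
    """Return (index, probability) pairs for the k highest logits,
    using a numerically stable softmax."""
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    ranked = sorted(range(len(logits)), key=lambda i: exps[i], reverse=True)
    return [(i, exps[i] / total) for i in ranked[:k]]

# Toy logits; in the real example each index would map back to a
# vocabulary token via tokenizer.decode.
print(topk_predictions([2.0, 0.5, 1.0, -1.0], k=2))
```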
## Training
This model was trained with a max_seq_len of 1024 in stage 1, and then with a max_seq_len of 8192 in stage 2.
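A minimal sketch of what this two-stage length schedule implies for data preparation (illustrative only; the actual training pipeline is not described in this card):

```python
# Per the card: stage 1 uses max_seq_len=1024, stage 2 uses max_seq_len=8192.
STAGE_MAX_SEQ_LEN = {1: 1024, 2: 8192}

def truncate_for_stage(token_ids, stage):
    """Clip one tokenized example to the stage's maximum sequence length."""
    return token_ids[: STAGE_MAX_SEQ_LEN[stage]]

example = list(range(3000))  # a dummy 3000-token example
print(len(truncate_for_stage(example, stage=1)))  # → 1024
print(len(truncate_for_stage(example, stage=2)))  # → 3000 (shorter than 8192)
```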