# Model description
We pretrained a RoBERTa-based Japanese masked language model on paper abstracts from the academic database CiNii Articles.
[A Japanese Masked Language Model for Academic Domain](https://aclanthology.org/2022.sdp-1.16/) | |
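The snippet below is a minimal sketch of querying the model with the Hugging Face `transformers` fill-mask pipeline; the repository id `your-org/academic-roberta-ja` and the example sentence are placeholders, not the actual identifiers for this model.

```python
# Sketch of masked-token prediction with the transformers fill-mask pipeline.
# "your-org/academic-roberta-ja" is a placeholder repository id.
from transformers import pipeline

fill_mask = pipeline("fill-mask", model="your-org/academic-roberta-ja")

# Build an example academic-style Japanese sentence using the tokenizer's
# own mask token, then print the top candidate fillers and their scores.
masked_sentence = f"本研究では{fill_mask.tokenizer.mask_token}を提案する。"
for candidate in fill_mask(masked_sentence):
    print(candidate["token_str"], candidate["score"])
```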
# Vocabulary
The vocabulary consists of 32,000 tokens, including subwords induced by the unigram language model of SentencePiece.
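For illustration, the following sketch shows how a 32,000-token unigram vocabulary can be induced with the `sentencepiece` library; the input file name and training options are assumptions, not the exact settings used for this model.

```python
# Sketch of training a SentencePiece unigram tokenizer with a 32,000-token vocabulary.
import sentencepiece as spm

spm.SentencePieceTrainer.train(
    input="cinii_abstracts.txt",   # placeholder: one abstract per line
    model_prefix="academic_ja",    # writes academic_ja.model / academic_ja.vocab
    vocab_size=32000,              # matches the vocabulary size reported above
    model_type="unigram",          # unigram language model segmentation
    character_coverage=0.9995,     # a common choice for Japanese text; assumed here
)

# Load the trained model and segment a Japanese sentence into subwords.
sp = spm.SentencePieceProcessor(model_file="academic_ja.model")
print(sp.encode("本研究では機械学習を用いた手法を提案する。", out_type=str))
```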
---
license: apache-2.0
language: ja
---