Model description
We pretrained a RoBERTa-based Japanese masked language model for the academic domain on paper abstracts from the academic database CiNii Articles.
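As a quick illustration, a checkpoint like this can be used for masked-token prediction with the Hugging Face transformers library. The sketch below is a minimal, hedged example: the model identifier is a placeholder (replace it with the actual repository path of this checkpoint), and it assumes the checkpoint is published in a transformers-compatible format.

```python
# Minimal sketch of masked-token prediction with transformers.
# The model identifier below is a placeholder, not the actual repository name.
from transformers import AutoTokenizer, AutoModelForMaskedLM, pipeline

model_name = "path/to/this-model"  # placeholder; substitute the real model id
tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForMaskedLM.from_pretrained(model_name)

# Predict candidate tokens for the masked position in an abstract-style Japanese sentence.
fill_mask = pipeline("fill-mask", model=model, tokenizer=tokenizer)
for candidate in fill_mask(f"本研究では{tokenizer.mask_token}を提案する。"):
    print(candidate["token_str"], candidate["score"])
```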
Vocabulary
The vocabulary consists of 32,000 tokens, including subwords induced by the unigram language model of SentencePiece.
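For reference, a vocabulary of this kind can be induced with SentencePiece's unigram model. The sketch below is illustrative only: the corpus path, model prefix, and character-coverage setting are assumptions, not the exact configuration used for this model.

```python
# Minimal sketch of inducing a 32,000-token unigram vocabulary with SentencePiece.
# The input corpus path and options are illustrative assumptions.
import sentencepiece as spm

spm.SentencePieceTrainer.train(
    input="abstracts.txt",        # one sentence per line; placeholder path
    model_prefix="academic_ja",   # writes academic_ja.model / academic_ja.vocab
    vocab_size=32000,             # matches the vocabulary size described above
    model_type="unigram",         # unigram language model segmentation
    character_coverage=0.9995,    # a common setting for Japanese text
)

# Load the trained model and segment a sentence into subwords.
sp = spm.SentencePieceProcessor(model_file="academic_ja.model")
print(sp.encode("本研究では新しい手法を提案する。", out_type=str))
```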