---
language: ja
license: cc-by-sa-4.0
datasets:
- YACIS corpus
---

# yacis-electra-small
This is an ELECTRA Small model for Japanese, pretrained on 354 million sentences (5.6 billion words) of the YACIS blog corpus. The corpus was tokenized for pretraining with MeCab; subword tokenization was performed with WordPiece.
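A minimal sketch of loading the model with the Hugging Face `transformers` library. The repository id below is a placeholder (an assumption, not stated in this card): substitute the actual Hub id under which this model is published.

```python
from transformers import AutoTokenizer, AutoModel

# Placeholder repo id (assumption) -- replace with this model's actual Hub id.
model_id = "yacis-electra-small"

tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModel.from_pretrained(model_id)

# Encode a short Japanese sentence and run a forward pass.
text = "これはテストです。"
inputs = tokenizer(text, return_tensors="pt")
outputs = model(**inputs)

# Contextual embeddings: shape (batch, sequence_length, hidden_size).
print(outputs.last_hidden_state.shape)
```

Because the card states the corpus was tokenized with MeCab, the tokenizer is expected to handle MeCab-style word segmentation before WordPiece subword splitting, so raw Japanese text can be passed directly.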