metadata

language: ja
license: cc-by-sa-4.0
datasets:
  - YACIS corpus

yacis-electra-small

This is an ELECTRA Small model for Japanese, pretrained on 354 million sentences (5.6 billion words) of the YACIS blog corpus.

The corpus was tokenized for pretraining with MeCab. Subword tokenization was performed with WordPiece.
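
The two-stage tokenization described above can be illustrated with a minimal sketch. It assumes the text has already been split into words (as a morphological analyzer such as MeCab would do) and then applies the standard greedy longest-match-first WordPiece split to each word. The vocabulary, the word segmentation, and the function name `wordpiece_tokenize` are hypothetical and chosen only for illustration; they are not taken from this model's actual vocabulary or training pipeline.

```python
def wordpiece_tokenize(word, vocab, unk_token="[UNK]", prefix="##"):
    """Greedy longest-match-first WordPiece split of a single word."""
    pieces = []
    start = 0
    while start < len(word):
        end = len(word)
        match = None
        # Try the longest remaining substring first, then shrink it.
        while start < end:
            candidate = word[start:end]
            if start > 0:
                # Non-initial pieces carry the continuation prefix.
                candidate = prefix + candidate
            if candidate in vocab:
                match = candidate
                break
            end -= 1
        if match is None:
            # No piece matches at this position: the whole word is unknown.
            return [unk_token]
        pieces.append(match)
        start = end
    return pieces


# Hypothetical toy vocabulary (NOT the model's real vocabulary).
toy_vocab = {"ブログ", "を", "書", "##き", "ます"}

# Assumed word-level segmentation of "ブログを書きます",
# standing in for the output of a morphological analyzer.
words = ["ブログ", "を", "書き", "ます"]

tokens = [piece for w in words for piece in wordpiece_tokenize(w, toy_vocab)]
print(tokens)  # ['ブログ', 'を', '書', '##き', 'ます']
```

A word with no matching pieces in the vocabulary, such as `猫` here, falls back to `[UNK]`. In practice the tokenizer files shipped with the model would handle both stages.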