language: ja
license: cc-by-sa-4.0
datasets:
  - YACIS corpus

yacis-electra-small

This is an ELECTRA Small model for Japanese, pretrained on 354 million sentences / 5.6 billion words of the YACIS blog corpus.

The corpus was tokenized for pretraining with MeCab. Subword tokenization was performed with WordPiece.
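
The two-stage tokenization can be illustrated with a minimal sketch. It assumes the text has already been segmented into words by MeCab (for example, the whitespace-separated output of `mecab -Owakati`) and then applies the usual greedy longest-match-first WordPiece split. The vocabulary `TOY_VOCAB` and the helper names `wordpiece` and `tokenize` are hypothetical and exist only for this illustration; the real model uses its own vocabulary file, and its exact tokenizer settings are not stated here.

```python
# Hypothetical toy vocabulary; "##" marks a piece that continues a word.
TOY_VOCAB = {"[UNK]", "今日", "は", "ブ", "##ログ", "を", "書", "##く"}


def wordpiece(word, vocab, unk="[UNK]"):
    """Split one word into subword pieces by greedy longest-match-first."""
    pieces = []
    start = 0
    while start < len(word):
        end = len(word)
        match = None
        # Try the longest remaining substring first, then shrink it.
        while end > start:
            candidate = word[start:end]
            if start > 0:
                candidate = "##" + candidate
            if candidate in vocab:
                match = candidate
                break
            end -= 1
        if match is None:
            # No piece fits at this position: the whole word is unknown.
            return [unk]
        pieces.append(match)
        start = end
    return pieces


def tokenize(segmented, vocab):
    """Apply WordPiece to each whitespace-separated word."""
    tokens = []
    for word in segmented.split():
        tokens.extend(wordpiece(word, vocab))
    return tokens


# Assumed to be MeCab output for the sentence 今日はブログを書く.
segmented = "今日 は ブログ を 書く"
tokens = tokenize(segmented, TOY_VOCAB)
print(tokens)
```

Words that are in the vocabulary stay whole, while the others are broken into a leading piece and `##`-prefixed continuations. Segmenting with MeCab first matters for Japanese because the text has no spaces for WordPiece to split on.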