language: ja
license: cc-by-sa-4.0
datasets:
- YACIS corpus
yacis-electra-small
This is ELECTRA Small model for Japanese pretrained on 354 million sentences / 5.6 billion words of YACIS blog corpus.
The corpus was tokenized for pretraining with MeCab. Subword tokenization was peroformed with WordPiece.
Model architecture
This model uses the original ELECTRA Small model; 12 layers, 128 dimensions of hidden states, and 12 attention heads.
Vocabulary size was 32,000 tokens.
Licenses
The pretrained model with all attached files is distributed under the terms of the CC BY-SA 4.0 license.
This work is licensed under a Creative Commons Attribution-ShareAlike 4.0 International License.
Citations
Please, cite the model using the following citation.
@inproceedings{ptaszynski2012yacis,
title={YACIS: A five-billion-word corpus of Japanese blogs fully annotated with syntactic and affective information},
author={Ptaszynski, Michal and Dybala, Pawel and Rzepka, Rafal and Araki, Kenji and Momouchi, Yoshio},
booktitle={Proceedings of the AISB/IACAP world congress},
pages={40--49},
year={2012}
}
The model was build using sentences from YACIS corpus, which should be cited using at least one of the following refrences.
@inproceedings{ptaszynski2012yacis,
title={YACIS: A five-billion-word corpus of Japanese blogs fully annotated with syntactic and affective information},
author={Ptaszynski, Michal and Dybala, Pawel and Rzepka, Rafal and Araki, Kenji and Momouchi, Yoshio},
booktitle={Proceedings of the AISB/IACAP world congress},
pages={40--49},
year={2012},
howpublished = "\url{https://github.com/ptaszynski/yacis-corpus}"
}
@article{ptaszynski2014automatically,
title={Automatically annotating a five-billion-word corpus of Japanese blogs for sentiment and affect analysis},
author={Ptaszynski, Michal and Rzepka, Rafal and Araki, Kenji and Momouchi, Yoshio},
journal={Computer Speech \& Language},
volume={28},
number={1},
pages={38--55},
year={2014},
publisher={Elsevier},
howpublished = "\url{https://github.com/ptaszynski/yacis-corpus}"
}