---
license: apache-2.0
language: ko
tags:
  - fill-mask
  - korean
  - lassl
mask_token: '[MASK]'
widget:
  - text: 대한민국의 수도는 [MASK] 입니다.
---

# LASSL bert-ko-base

This model was pretrained on 702,437 examples (3,596,465,664 tokens) extracted from the corpora listed below. For details of the training configuration, see `config.json`.

```
corpora/
├── [707M]  kowiki_latest.txt
├── [ 26M]  modu_dialogue_v1.2.txt
├── [1.3G]  modu_news_v1.1.txt
├── [9.7G]  modu_news_v2.0.txt
├── [ 15M]  modu_np_v1.1.txt
├── [1008M] modu_spoken_v1.2.txt
├── [6.5G]  modu_written_v1.0.txt
└── [413M]  petition.txt
```

Evaluation results will be released soon.

## How to use

```python
from transformers import AutoModel, AutoTokenizer

# Load the pretrained BERT encoder and its matching tokenizer from the Hugging Face Hub.
model = AutoModel.from_pretrained("lassl/bert-ko-base")
tokenizer = AutoTokenizer.from_pretrained("lassl/bert-ko-base")
```
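
Since the model is tagged for fill-mask, masked-token prediction can also be run through the `pipeline` API. A minimal sketch (downloading the checkpoint requires network access; the example sentence is the widget text above, "The capital of South Korea is [MASK]."):

```python
from transformers import pipeline

# Build a fill-mask pipeline backed by this checkpoint (weights are downloaded on first use).
fill_mask = pipeline("fill-mask", model="lassl/bert-ko-base")

# Print the top predictions for the masked token, with their scores.
for prediction in fill_mask("대한민국의 수도는 [MASK] 입니다."):
    print(prediction["token_str"], prediction["score"])
```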