---
license: mit
language:
- ko
---
# Kconvo-roberta: Korean conversation RoBERTa (github)
- There are many PLMs (Pretrained Language Models) for Korean, but most of them are trained on written language.
- Here, we introduce a retrained PLM for Korean conversation tasks, trained on spoken-language data.
## Usage
```python
# Kconvo-roberta
from transformers import RobertaTokenizerFast, RobertaModel

# Load the tokenizer and encoder weights from the Hugging Face Hub
tokenizer_roberta = RobertaTokenizerFast.from_pretrained("yeongjoon/Kconvo-roberta")
model_roberta = RobertaModel.from_pretrained("yeongjoon/Kconvo-roberta")
```
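Once loaded, the model can be used like any other RoBERTa encoder. A minimal sketch of getting contextual embeddings follows; the example sentence is ours, not from the training data:

```python
import torch

# Encode a Korean conversational sentence
inputs = tokenizer_roberta("안녕하세요, 오늘 뭐 해요?", return_tensors="pt")

# Forward pass without gradient tracking (inference only)
with torch.no_grad():
    outputs = model_roberta(**inputs)

# Contextual token embeddings: (batch_size, sequence_length, hidden_size)
print(outputs.last_hidden_state.shape)
```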
## Domain Robust Retraining of Pretrained Language Model
- Kconvo-roberta uses klue/roberta-base as the base model and was additionally retrained on conversation datasets (a minimal sketch of such retraining follows the dataset list below).
- The retraining datasets were collected from the National Institute of the Korean Language and AI-Hub, and are listed below.
- National Institute of the Korean Language
  * Online Conversation Corpus 2021 (온라인 대화 말뭉치 2021)
  * Daily Conversation Corpus 2020 (일상 대화 말뭉치 2020)
  * Spoken Corpus (구어 말뭉치)
  * Messenger Corpus (메신저 말뭉치)
- AI-Hub
  * Online Colloquial Corpus Data (온라인 구어체 말뭉치 데이터)
  * Counseling Speech (상담 음성)
  * Korean Speech (한국어 음성)
  * Free Conversation Speech, general male/female (자유대화 음성(일반남여))
  * Daily Life and Colloquial Korean-English Parallel Translation Corpus Data (일상생활 및 구어체 한-영 번역 병렬 말뭉치 데이터)
  * Korean Conversation Speech (한국인 대화음성)
  * Emotional Conversation Corpus (감성 대화 말뭉치)
  * Topic-specific Daily Text Conversation Data (주제별 텍스트 일상 대화 데이터)
  * Purpose-specific Goal-oriented Conversation Data (용도별 목적대화 데이터)
  * Korean SNS (한국어 SNS)
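The exact retraining recipe is not published here. The sketch below shows how such domain-adaptive retraining could in principle be reproduced: continue masked language modeling from klue/roberta-base on plain-text conversation data. The file name, batch size, learning rate, and epoch count are illustrative assumptions, not the authors' settings:

```python
from datasets import load_dataset
from transformers import (
    DataCollatorForLanguageModeling,
    RobertaForMaskedLM,
    RobertaTokenizerFast,
    Trainer,
    TrainingArguments,
)

# Start from the same base checkpoint the authors used
tokenizer = RobertaTokenizerFast.from_pretrained("klue/roberta-base")
model = RobertaForMaskedLM.from_pretrained("klue/roberta-base")

# "conversations.txt" is a hypothetical file with one utterance per line
dataset = load_dataset("text", data_files={"train": "conversations.txt"})

def tokenize(batch):
    return tokenizer(batch["text"], truncation=True, max_length=512)

tokenized = dataset["train"].map(tokenize, batched=True, remove_columns=["text"])

# Standard MLM objective: dynamically mask 15% of tokens each batch
collator = DataCollatorForLanguageModeling(tokenizer=tokenizer, mlm_probability=0.15)

args = TrainingArguments(
    output_dir="kconvo-roberta-retrained",  # illustrative output path
    per_device_train_batch_size=16,         # assumed hyperparameters
    num_train_epochs=1,
    learning_rate=5e-5,
)

Trainer(
    model=model,
    args=args,
    train_dataset=tokenized,
    data_collator=collator,
).train()
```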