KoBigBird-RoBERTa-large

This is a large-sized Korean BigBird model introduced in our paper. The model draws heavily on the parameters of klue/roberta-large to ensure high performance. By adopting the BigBird architecture together with the newly proposed TAPER, the language model accommodates longer input sequences.

How to Use

from transformers import AutoTokenizer, AutoModelForMaskedLM

# Load the tokenizer and the masked-language-model weights from the Hugging Face Hub.
tokenizer = AutoTokenizer.from_pretrained("vaiv/kobigbird-roberta-large")
model = AutoModelForMaskedLM.from_pretrained("vaiv/kobigbird-roberta-large")
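
Once the model is loaded, one way to try it is to predict a masked token. The following is a minimal sketch, not an official example: the function names `top_k_token_ids` and `fill_mask` are hypothetical, PyTorch is assumed to be installed, and the mask token is read from the tokenizer rather than hard-coded.

```python
def top_k_token_ids(scores, k=5):
    """Return the indices of the k highest scores, best first."""
    return sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)[:k]


def fill_mask(text, model_name="vaiv/kobigbird-roberta-large", k=5):
    """Return the k most likely tokens for the first mask token in `text`.

    Hypothetical helper for illustration; it downloads the model on first use.
    """
    # Imported here so that top_k_token_ids works without these packages.
    import torch
    from transformers import AutoTokenizer, AutoModelForMaskedLM

    tokenizer = AutoTokenizer.from_pretrained(model_name)
    model = AutoModelForMaskedLM.from_pretrained(model_name)
    model.eval()

    inputs = tokenizer(text, return_tensors="pt")
    with torch.no_grad():
        logits = model(**inputs).logits  # shape: (1, seq_len, vocab_size)

    # Locate the mask token(s) in the encoded input.
    is_mask = inputs["input_ids"][0] == tokenizer.mask_token_id
    mask_positions = is_mask.nonzero(as_tuple=True)[0]
    if len(mask_positions) == 0:
        raise ValueError("input contains no mask token")

    # Rank the vocabulary by score at the first mask position.
    scores = logits[0, mask_positions[0]].tolist()
    return [tokenizer.decode([i]) for i in top_k_token_ids(scores, k)]
```

To use it, build the input with the tokenizer's own mask token (`tokenizer.mask_token`) in place of the word to predict, then call `fill_mask(text)`. The first call downloads the weights, so it needs network access and enough memory for a 341M-parameter model.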

Hyperparameters

[Image: hyperparameters]

Results

Measured on the validation sets of the KLUE benchmark datasets.

[Image: results on the KLUE validation sets]

Limitations

Although our model achieves strong results without any additional pretraining, further pretraining could refine its positional representations.

Citation Information

@article{yang2023kobigbird,
    title={KoBigBird-large: Transformation of Transformer for Korean Language Understanding},
    author={Yang, Kisu and Jang, Yoonna and Lee, Taewoo and Seong, Jinwoo and Lee, Hyungjin and Jang, Hwanseok and Lim, Heuiseok},
    journal={arXiv preprint arXiv:2309.10339},
    year={2023}
}