mDeBERTa-v3-base-kor-further

๐Ÿ’ก ์•„๋ž˜ ํ”„๋กœ์ ํŠธ๋Š” KPMG Lighthouse Korea์—์„œ ์ง„ํ–‰ํ•˜์˜€์Šต๋‹ˆ๋‹ค.
KPMG Lighthouse Korea์—์„œ๋Š”, Financial area์˜ ๋‹ค์–‘ํ•œ ๋ฌธ์ œ๋“ค์„ ํ•ด๊ฒฐํ•˜๊ธฐ ์œ„ํ•ด Edge Technology์˜ NLP/Vision AI๋ฅผ ๋ชจ๋ธ๋งํ•˜๊ณ  ์žˆ์Šต๋‹ˆ๋‹ค. https://kpmgkr.notion.site/

What is DeBERTa?

  • DeBERTa๋Š” Disentangled Attention + Enhanced Mask Decoder ๋ฅผ ์ ์šฉํ•˜์—ฌ ๋‹จ์–ด์˜ positional information์„ ํšจ๊ณผ์ ์œผ๋กœ ํ•™์Šตํ•ฉ๋‹ˆ๋‹ค. ์ด์™€ ๊ฐ™์€ ์•„์ด๋””์–ด๋ฅผ ํ†ตํ•ด, ๊ธฐ์กด์˜ BERT, RoBERTa์—์„œ ์‚ฌ์šฉํ–ˆ๋˜ absolute position embedding๊ณผ๋Š” ๋‹ฌ๋ฆฌ DeBERTa๋Š” ๋‹จ์–ด์˜ ์ƒ๋Œ€์ ์ธ ์œ„์น˜ ์ •๋ณด๋ฅผ ํ•™์Šต ๊ฐ€๋Šฅํ•œ ๋ฒกํ„ฐ๋กœ ํ‘œํ˜„ํ•˜์—ฌ ๋ชจ๋ธ์„ ํ•™์Šตํ•˜๊ฒŒ ๋ฉ๋‹ˆ๋‹ค. ๊ฒฐ๊ณผ์ ์œผ๋กœ, BERT, RoBERTA ์™€ ๋น„๊ตํ–ˆ์„ ๋•Œ ๋” ์ค€์ˆ˜ํ•œ ์„ฑ๋Šฅ์„ ๋ณด์—ฌ์ฃผ์—ˆ์Šต๋‹ˆ๋‹ค.
  • DeBERTa-v3์—์„œ๋Š”, ์ด์ „ ๋ฒ„์ „์—์„œ ์‚ฌ์šฉํ–ˆ๋˜ MLM (Masked Language Model) ์„ RTD (Replaced Token Detection) Task ๋กœ ๋Œ€์ฒดํ•œ ELECTRA ์Šคํƒ€์ผ์˜ ์‚ฌ์ „ํ•™์Šต ๋ฐฉ๋ฒ•๊ณผ, Gradient-Disentangled Embedding Sharing ์„ ์ ์šฉํ•˜์—ฌ ๋ชจ๋ธ ํ•™์Šต์˜ ํšจ์œจ์„ฑ์„ ๊ฐœ์„ ํ•˜์˜€์Šต๋‹ˆ๋‹ค.
  • DeBERTa์˜ ์•„ํ‚คํ…์ฒ˜๋กœ ํ’๋ถ€ํ•œ ํ•œ๊ตญ์–ด ๋ฐ์ดํ„ฐ๋ฅผ ํ•™์Šตํ•˜๊ธฐ ์œ„ํ•ด์„œ, mDeBERTa-v3-base-kor-further ๋Š” microsoft ๊ฐ€ ๋ฐœํ‘œํ•œ mDeBERTa-v3-base ๋ฅผ ์•ฝ 40GB์˜ ํ•œ๊ตญ์–ด ๋ฐ์ดํ„ฐ์— ๋Œ€ํ•ด์„œ ์ถ”๊ฐ€์ ์ธ ์‚ฌ์ „ํ•™์Šต์„ ์ง„ํ–‰ํ•œ ์–ธ์–ด ๋ชจ๋ธ์ž…๋‹ˆ๋‹ค.

How to Use

  • Requirements
    pip install transformers
    pip install sentencepiece
    
  • Huggingface Hub
    from transformers import AutoModel, AutoTokenizer
    
    model = AutoModel.from_pretrained("lighthouse/mdeberta-v3-base-kor-further")  # DebertaV2ForModel
    tokenizer = AutoTokenizer.from_pretrained("lighthouse/mdeberta-v3-base-kor-further")  # DebertaV2Tokenizer (SentencePiece)
    

Pre-trained Models

  • ๋ชจ๋ธ์˜ ์•„ํ‚คํ…์ฒ˜๋Š” ๊ธฐ์กด microsoft์—์„œ ๋ฐœํ‘œํ•œ mdeberta-v3-base์™€ ๋™์ผํ•œ ๊ตฌ์กฐ์ž…๋‹ˆ๋‹ค.

    Vocabulary(K) Backbone Parameters(M) Hidden Size Layers Note
    mdeberta-v3-base-kor-further (mdeberta-v3-base์™€ ๋™์ผ) 250 86 768 12 250K new SPM vocab

Further Pretraing Details (MLM Task)

  • mDeBERTa-v3-base-kor-further ๋Š” microsoft/mDeBERTa-v3-base ๋ฅผ ์•ฝ 40GB์˜ ํ•œ๊ตญ์–ด ๋ฐ์ดํ„ฐ์— ๋Œ€ํ•ด์„œ MLM Task๋ฅผ ์ ์šฉํ•˜์—ฌ ์ถ”๊ฐ€์ ์ธ ์‚ฌ์ „ ํ•™์Šต์„ ์ง„ํ–‰ํ•˜์˜€์Šต๋‹ˆ๋‹ค.

    Max length Learning Rate Batch Size Train Steps Warm-up Steps
    mdeberta-v3-base-kor-further 512 2e-5 8 5M 50k

Datasets

  • ๋ชจ๋‘์˜ ๋ง๋ญ‰์น˜(์‹ ๋ฌธ, ๊ตฌ์–ด, ๋ฌธ์–ด), ํ•œ๊ตญ์–ด Wiki, ๊ตญ๋ฏผ์ฒญ์› ๋“ฑ ์•ฝ 40 GB ์˜ ํ•œ๊ตญ์–ด ๋ฐ์ดํ„ฐ์…‹์ด ์ถ”๊ฐ€์ ์ธ ์‚ฌ์ „ํ•™์Šต์— ์‚ฌ์šฉ๋˜์—ˆ์Šต๋‹ˆ๋‹ค.
    • Train: 10M lines, 5B tokens
    • Valid: 2M lines, 1B tokens
    • cf) ๊ธฐ์กด mDeBERTa-v3์€ XLM-R ๊ณผ ๊ฐ™์ด cc-100 ๋ฐ์ดํ„ฐ์…‹์œผ๋กœ ํ•™์Šต๋˜์—ˆ์œผ๋ฉฐ, ๊ทธ ์ค‘ ํ•œ๊ตญ์–ด ๋ฐ์ดํ„ฐ์…‹์˜ ํฌ๊ธฐ๋Š” 54GB์ž…๋‹ˆ๋‹ค.

Fine-tuning on NLU Tasks - Base Model

Model Size NSMC(acc) Naver NER(F1) PAWS (acc) KorNLI (acc) KorSTS (spearman) Question Pair (acc) KorQuaD (Dev) (EM/F1) Korean-Hate-Speech (Dev) (F1)
XLM-Roberta-Base 1.03G 89.03 86.65 82.80 80.23 78.45 93.80 64.70 / 88.94 64.06
mdeberta-base 534M 90.01 87.43 85.55 80.41 82.65 94.06 65.48 / 89.74 62.91
mdeberta-base-kor-further (Ours) 534M 90.52 87.87 85.85 80.65 81.90 94.98 66.07 / 90.35 68.16

KPMG Lighthouse KR

https://kpmgkr.notion.site/

Citation

@misc{he2021debertav3,
      title={DeBERTaV3: Improving DeBERTa using ELECTRA-Style Pre-Training with Gradient-Disentangled Embedding Sharing}, 
      author={Pengcheng He and Jianfeng Gao and Weizhu Chen},
      year={2021},
      eprint={2111.09543},
      archivePrefix={arXiv},
      primaryClass={cs.CL}
}
@inproceedings{
he2021deberta,
title={DEBERTA: DECODING-ENHANCED BERT WITH DISENTANGLED ATTENTION},
author={Pengcheng He and Xiaodong Liu and Jianfeng Gao and Weizhu Chen},
booktitle={International Conference on Learning Representations},
year={2021},
url={https://openreview.net/forum?id=XPZIaotutsD}
}

Reference

Downloads last month
15,332
Inference API
Unable to determine this modelโ€™s pipeline type. Check the docs .