KoMiniLM / README.md
BM-K's picture
Create README.md
6b02058
|
raw
history blame
4.03 kB

KoMiniLM

💪 Korean mini language model
https://github.com/BM-K/KoMiniLM

Overview

Current language models usually consist of hundreds of millions of parameters which brings challenges for fine-tuning and online serving in real-life applications due to latency and capacity constraints. In this project, we release a light weight korean language model to address the aforementioned shortcomings of existing language models.

Quick tour

from transformers import AutoTokenizer, AutoModel

tokenizer = AutoTokenizer.from_pretrained("BM-K/KoMiniLM")
model = AutoModel.from_pretrained("BM-K/KoMiniLM")

inputs = tokenizer("안녕 세상아!", return_tensors="pt")
outputs = model(**inputs)

Pre-training

Teacher Model: KLUE-BERT(base)

Object

Self-Attention Distribution and Self-Attention Value-Relation [Wang et al., 2020] were distilled from each discrete layer of the teacher model to the student model. Wang et al. distilled in the last layer of the transformer, but that was not the case in this project.

Data set

Data News comments News article
size 10G 10G
  • Performance can be further improved by adding wiki data to training.

Config

  • KoMiniLM-23M
{
  "architectures": [
    "BertForPreTraining"
  ],
  "attention_probs_dropout_prob": 0.1,
  "classifier_dropout": null,
  "hidden_act": "gelu",
  "hidden_dropout_prob": 0.1,
  "hidden_size": 384,
  "initializer_range": 0.02,
  "intermediate_size": 1536,
  "layer_norm_eps": 1e-12,
  "max_position_embeddings": 512,
  "model_type": "bert",
  "num_attention_heads": 12,
  "num_hidden_layers": 6,
  "output_attentions": true,
  "pad_token_id": 0,
  "position_embedding_type": "absolute",
  "return_dict": false,
  "torch_dtype": "float32",
  "transformers_version": "4.13.0",
  "type_vocab_size": 2,
  "use_cache": true,
  "vocab_size": 32000
}

Performance on subtasks

  • The results of our fine-tuning experiments are an average of 3 runs for each task.
cd KoMiniLM-Finetune
bash scripts/run_all_kominilm.sh
#Param NSMC
(Acc)
Naver NER
(F1)
PAWS
(Acc)
KorNLI
(Acc)
KorSTS
(Spearman)
Question Pair
(Acc)
KorQuaD
(Dev)
(EM/F1)
KoBERT(KLUE) 110M 90.20±0.07 87.11±0.05 81.36±0.21 81.06±0.33 82.47±0.14 95.03±0.44 84.43±0.18 /
93.05±0.04
KcBERT 108M 89.60±0.10 84.34±0.13 67.02±0.42 74.17±0.52 76.57±0.51 93.97±0.27 60.87±0.27 /
85.01±0.14
KoBERT(SKT) 92M 89.28±0.42 87.54±0.04 80.93±0.91 78.18±0.45 75.98±2.81 94.37±0.31 51.94±0.60 /
79.69±0.66
DistilKoBERT 28M 88.39±0.08 84.22±0.01 61.74±0.45 70.22±0.14 72.11±0.27 92.65±0.16 52.52±0.48 /
76.00±0.71
KoMiniLM 23M 89.67±0.03 84.79±0.09 78.67±0.45 78.10±0.07 78.90±0.11 94.81±0.12 82.11±0.42 /
91.21±0.29
  • NSMC (Naver Sentiment Movie Corpus)
  • Naver NER (NER task on Naver NLP Challenge 2018)
  • PAWS (Korean Paraphrase Adversaries from Word Scrambling)
  • KorNLI/KorSTS (Korean Natural Language Understanding)
  • Question Pair (Paired Question)
  • KorQuAD (The Korean Question Answering Dataset)

User Contributed Examples

-

Reference

ToDo

  • An average of 3 runs for each task
  • Huggingface model porting
  • Add kowiki data