Update README.md
# KoMiniLM

🐣 Korean mini language model

## Overview
Current language models usually consist of hundreds of millions of parameters, which brings challenges for fine-tuning and online serving in real-life applications due to latency and capacity constraints. In this project, we release a lightweight Korean language model to address the aforementioned shortcomings of existing language models.

```python
from transformers import AutoTokenizer, AutoModel

tokenizer = AutoTokenizer.from_pretrained("BM-K/KoMiniLM")  # 23M model
model = AutoModel.from_pretrained("BM-K/KoMiniLM")

inputs = tokenizer("안녕 세상아!", return_tensors="pt")
outputs = model(**inputs)
```
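`AutoModel` returns the encoder's hidden states, so a quick way to confirm the checkpoint loaded correctly is to inspect their shape. This is a small follow-up to the snippet above, not part of the original quick start:

```python
# One hidden vector per input token: (batch_size, sequence_length, hidden_size)
print(outputs.last_hidden_state.shape)
```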
## Update history
**Updates on 2022.06.20**
- Release KoMiniLM-bert-68M

**Updates on 2022.05.24**
- Release KoMiniLM-bert-23M

## Pre-training
`Teacher Model`: [KLUE-BERT(base)](https://github.com/KLUE-benchmark/KLUE)

### Object
Self-Attention Distribution and Self-Attention Value-Relation [[Wang et al., 2020]](https://arxiv.org/abs/2002.10957) were distilled from each discrete layer of the teacher model into the student model. Wang et al. distilled only from the last transformer layer, whereas this project distills from every layer.
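As a rough illustration of this objective, the sketch below computes the two relation-matrix KL losses from Wang et al. for one matched teacher/student layer. It is a schematic re-implementation under stated assumptions (the tensor names are invented, and teacher and student are assumed to use the same number of attention heads); it is not this repository's training code.

```python
import torch.nn.functional as F

def minilm_layer_loss(q_t, k_t, v_t, q_s, k_s, v_s):
    """Self-attention distribution + value-relation distillation for one layer pair.

    q_*/k_*/v_*: (batch, num_heads, seq_len, head_dim) query/key/value tensors
    of the teacher (_t) and student (_s); head counts are assumed to match.
    """
    def relation_logits(a, b):
        # Scaled dot-product relation matrix: (batch, heads, seq_len, seq_len)
        return a @ b.transpose(-1, -2) / a.size(-1) ** 0.5

    def kl(teacher_logits, student_logits):
        # KL(teacher || student) over the last dimension, normalized by batch size
        return F.kl_div(
            F.log_softmax(student_logits, dim=-1),
            F.softmax(teacher_logits, dim=-1),
            reduction="batchmean",
        )

    attn_loss = kl(relation_logits(q_t, k_t), relation_logits(q_s, k_s))   # attention distributions
    value_loss = kl(relation_logits(v_t, v_t), relation_logits(v_s, v_s))  # value-value relations
    return attn_loss + value_loss
```

Per the description above, this per-layer loss is applied to each matched layer pair and summed, rather than only to the last transformer layer as in the original MiniLM setup.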
### Data sets
|Data|News comments|News article|
|:----:|:----:|:----:|
|Size|10 GB|10 GB|

> **Note**<br>
> - Performance can be further improved by adding wiki data to training.
> - The crawling and preprocessing code for the *News article* data is [here](https://github.com/2unju/DaumNewsCrawler).

### Config
- **KoMiniLM-23M**
```
cd KoMiniLM-Finetune
bash scripts/run_all_kominilm.sh
```
|| #Param | Average | NSMC<br>(Acc) | Naver NER<br>(F1) | PAWS<br>(Acc) | KorNLI<br>(Acc) | KorSTS<br>(Spearman) | Question Pair<br>(Acc) | KorQuAD<br>(Dev)<br>(EM/F1) |
|:----:|:----:|:----:|:----:|:----:|:----:|:----:|:----:|:----:|:----:|
|KoBERT(KLUE)| 110M | 86.84 | 90.20±0.07 | 87.11±0.05 | 81.36±0.21 | 81.06±0.33 | 82.47±0.14 | 95.03±0.44 | 84.43±0.18 / <br>93.05±0.04 |
|KcBERT| 108M | 78.94 | 89.60±0.10 | 84.34±0.13 | 67.02±0.42 | 74.17±0.52 | 76.57±0.51 | 93.97±0.27 | 60.87±0.27 / <br>85.01±0.14 |
|KoBERT(SKT)| 92M | 79.73 | 89.28±0.42 | 87.54±0.04 | 80.93±0.91 | 78.18±0.45 | 75.98±2.81 | 94.37±0.31 | 51.94±0.60 / <br>79.69±0.66 |
|DistilKoBERT| 28M | 74.73 | 88.39±0.08 | 84.22±0.01 | 61.74±0.45 | 70.22±0.14 | 72.11±0.27 | 92.65±0.16 | 52.52±0.48 / <br>76.00±0.71 |
| | | | | | | | | | |
|**KoMiniLM<sup>†</sup>**| **68M** | 85.90 | 89.84±0.02 | 85.98±0.09 | 80.78±0.30 | 79.28±0.17 | 81.00±0.07 | 94.89±0.37 | 83.27±0.08 / <br>92.08±0.06 |
|**KoMiniLM<sup>†</sup>**| **23M** | 84.79 | 89.67±0.03 | 84.79±0.09 | 78.67±0.45 | 78.10±0.07 | 78.90±0.11 | 94.81±0.12 | 82.11±0.42 / <br>91.21±0.29 |
- [NSMC](https://github.com/e9t/nsmc) (Naver Sentiment Movie Corpus)
- [Naver NER](https://github.com/naver/nlp-challenge) (NER task on Naver NLP Challenge 2018)
- [Question Pair](https://github.com/songys/Question_pair) (Paired Question)
- [KorQuAD](https://korquad.github.io/) (The Korean Question Answering Dataset)
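For a single task rather than the full benchmark script, the checkpoint can also be fine-tuned directly with the `transformers` Trainer. The sketch below targets NSMC and is only an illustration, not the `KoMiniLM-Finetune` code: the `e9t/nsmc` Hub dataset id, its `document`/`label` column names, and the hyperparameters are assumptions.

```python
from datasets import load_dataset
from transformers import (AutoModelForSequenceClassification, AutoTokenizer,
                          Trainer, TrainingArguments)

dataset = load_dataset("e9t/nsmc")  # assumed Hub id for the NSMC corpus

tokenizer = AutoTokenizer.from_pretrained("BM-K/KoMiniLM")
model = AutoModelForSequenceClassification.from_pretrained("BM-K/KoMiniLM", num_labels=2)

def tokenize(batch):
    return tokenizer(batch["document"], truncation=True, max_length=128)

dataset = dataset.map(tokenize, batched=True)

trainer = Trainer(
    model=model,
    args=TrainingArguments(output_dir="nsmc-kominilm",
                           num_train_epochs=3,
                           per_device_train_batch_size=32),
    train_dataset=dataset["train"],
    eval_dataset=dataset["test"],
    tokenizer=tokenizer,  # enables dynamic padding during batching
)
trainer.train()
print(trainer.evaluate())  # reports loss; pass compute_metrics to Trainer for accuracy
```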
<img src = "https://user-images.githubusercontent.com/55969260/174229747-279122dc-9d27-4da9-a6e7-f9f1fe1651f7.png"> <br>

### User Contributed Examples
## Reference
- [KcBERT](https://github.com/Beomi/KcBERT)
- [SKT KoBERT](https://github.com/SKTBrain/KoBERT)
- [DistilKoBERT](https://github.com/monologg/DistilKoBERT)
- [lassl](https://github.com/lassl/lassl)