BM-K committed
Commit 2c991f2 · 1 Parent(s): b3f0380

Update README.md

Files changed (1)
  1. README.md +24 -24
README.md CHANGED
@@ -1,11 +1,5 @@
- ---
- language: ko
- tags:
- - korean
- ---
-
  # KoMiniLM
- 💪 Korean mini language model <br> https://github.com/BM-K/KoMiniLM

  ## Overview
  Current language models usually consist of hundreds of millions of parameters, which brings challenges for fine-tuning and online serving in real-life applications due to latency and capacity constraints. In this project, we release a lightweight Korean language model to address the aforementioned shortcomings of existing language models.
@@ -14,24 +8,33 @@ Current language models usually consist of hundreds of millions of parameters wh
  ```python
  from transformers import AutoTokenizer, AutoModel

- tokenizer = AutoTokenizer.from_pretrained("BM-K/KoMiniLM")
  model = AutoModel.from_pretrained("BM-K/KoMiniLM")

  inputs = tokenizer("안녕 세상아!", return_tensors="pt")
  outputs = model(**inputs)
  ```

  ## Pre-training
  `Teacher Model`: [KLUE-BERT(base)](https://github.com/KLUE-benchmark/KLUE)

  ### Object
  Self-Attention Distribution and Self-Attention Value-Relation [[Wang et al., 2020]](https://arxiv.org/abs/2002.10957) were distilled from each discrete layer of the teacher model into the student model. Wang et al. distilled only from the last layer of the transformer, but that was not the case in this project.

- ### Data set
  |Data|News comments|News article|
  |:----:|:----:|:----:|
  |size|10G|10G|
- - Performance can be further improved by adding wiki data to training.

  ### Config
  - **KoMiniLM-23M**
@@ -71,14 +74,15 @@ cd KoMiniLM-Finetune
  bash scripts/run_all_kominilm.sh
  ```

- || #Param | NSMC<br>(Acc) | Naver NER<br>(F1) | PAWS<br>(Acc) | KorNLI<br>(Acc) | KorSTS<br>(Spearman) | Question Pair<br>(Acc) | KorQuAD<br>(Dev)<br>(EM/F1) |
- |:----:|:----:|:----:|:----:|:----:|:----:|:----:|:----:|:----:|
- |KoBERT(KLUE)| 110M | 90.20±0.07 | 87.11±0.05 | 81.36±0.21 | 81.06±0.33 | 82.47±0.14 | 95.03±0.44 | 84.43±0.18 / <br>93.05±0.04 |
- |KcBERT| 108M | 89.60±0.10 | 84.34±0.13 | 67.02±0.42 | 74.17±0.52 | 76.57±0.51 | 93.97±0.27 | 60.87±0.27 / <br>85.01±0.14 |
- |KoBERT(SKT)| 92M | 89.28±0.42 | 87.54±0.04 | 80.93±0.91 | 78.18±0.45 | 75.98±2.81 | 94.37±0.31 | 51.94±0.60 / <br>79.69±0.66 |
- |DistilKoBERT| 28M | 88.39±0.08 | 84.22±0.01 | 61.74±0.45 | 70.22±0.14 | 72.11±0.27 | 92.65±0.16 | 52.52±0.48 / <br>76.00±0.71 |
  | | | | | | | | | |
- |**KoMiniLM<sup>†</sup>**| **23M** | 89.67±0.03 | 84.79±0.09 | 78.67±0.45 | 78.10±0.07 | 78.90±0.11 | 94.81±0.12 | 82.11±0.42 / <br>91.21±0.29 |

  - [NSMC](https://github.com/e9t/nsmc) (Naver Sentiment Movie Corpus)
  - [Naver NER](https://github.com/naver/nlp-challenge) (NER task on Naver NLP Challenge 2018)
@@ -87,6 +91,8 @@ bash scripts/run_all_kominilm.sh
  - [Question Pair](https://github.com/songys/Question_pair) (Paired Question)
  - [KorQuAD](https://korquad.github.io/) (The Korean Question Answering Dataset)

  ### User Contributed Examples
  -

@@ -95,10 +101,4 @@ bash scripts/run_all_kominilm.sh
  - [KcBERT](https://github.com/Beomi/KcBERT)
  - [SKT KoBERT](https://github.com/SKTBrain/KoBERT)
  - [DistilKoBERT](https://github.com/monologg/DistilKoBERT)
- - [lassl](https://github.com/lassl/lassl)
-
-
- ## ToDo
- - [X] An average of 3 runs for each task
- - [X] Huggingface model porting
- - [ ] Add kowiki data

  # KoMiniLM
+ 🐣 Korean mini language model

  ## Overview
  Current language models usually consist of hundreds of millions of parameters, which brings challenges for fine-tuning and online serving in real-life applications due to latency and capacity constraints. In this project, we release a lightweight Korean language model to address the aforementioned shortcomings of existing language models.
 
  ```python
  from transformers import AutoTokenizer, AutoModel

+ tokenizer = AutoTokenizer.from_pretrained("BM-K/KoMiniLM") # 23M model
  model = AutoModel.from_pretrained("BM-K/KoMiniLM")

  inputs = tokenizer("안녕 세상아!", return_tensors="pt")
  outputs = model(**inputs)
  ```
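
The snippet above only returns raw hidden states. As an illustration, here is a minimal sketch of turning them into one vector per sentence with masked mean pooling; the pooling strategy and the second example sentence are assumptions for this sketch, not something the README prescribes:

```python
import torch
from transformers import AutoTokenizer, AutoModel

tokenizer = AutoTokenizer.from_pretrained("BM-K/KoMiniLM")
model = AutoModel.from_pretrained("BM-K/KoMiniLM")

inputs = tokenizer(["안녕 세상아!", "한국어 경량 언어 모델"], padding=True, return_tensors="pt")
with torch.no_grad():
    outputs = model(**inputs)

# Masked mean pooling over the token dimension -> one vector per sentence
mask = inputs["attention_mask"].unsqueeze(-1).float()
embeddings = (outputs.last_hidden_state * mask).sum(dim=1) / mask.sum(dim=1)
print(embeddings.shape)  # (2, hidden_size)
```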
 
+ ## Update history
+ **Updates on 2022.06.20**
+ - Release KoMiniLM-bert-68M
+
+ **Updates on 2022.05.24**
+ - Release KoMiniLM-bert-23M
+
  ## Pre-training
  `Teacher Model`: [KLUE-BERT(base)](https://github.com/KLUE-benchmark/KLUE)

  ### Object
  Self-Attention Distribution and Self-Attention Value-Relation [[Wang et al., 2020]](https://arxiv.org/abs/2002.10957) were distilled from each discrete layer of the teacher model into the student model. Wang et al. distilled only from the last layer of the transformer, but that was not the case in this project.
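
A rough sketch of that objective, assuming identical head counts for teacher and student and ignoring padding masks (the actual training code lives in the GitHub repository): for each mapped layer pair, the student's self-attention distributions and value relations are pushed toward the teacher's with KL divergence.

```python
import math
import torch
import torch.nn.functional as F

def attention_probs(q, k):
    # q, k: [batch, heads, seq_len, head_dim] -> softmax-normalised attention map
    scores = q @ k.transpose(-1, -2) / math.sqrt(q.size(-1))
    return F.softmax(scores, dim=-1)

def value_relation(v):
    # v: [batch, heads, seq_len, head_dim] -> scaled dot-product among value vectors
    rel = v @ v.transpose(-1, -2) / math.sqrt(v.size(-1))
    return F.softmax(rel, dim=-1)

def minilm_layer_loss(teacher_qkv, student_qkv):
    """KL(teacher || student) on attention distributions plus value relations for one layer pair."""
    tq, tk, tv = teacher_qkv
    sq, sk, sv = student_qkv
    att_loss = F.kl_div(attention_probs(sq, sk).log(), attention_probs(tq, tk),
                        reduction="batchmean")
    val_loss = F.kl_div(value_relation(sv).log(), value_relation(tv),
                        reduction="batchmean")
    return att_loss + val_loss

# Illustrative shapes only: 2 sentences, 12 heads, 16 tokens, 64-dim heads
teacher = [torch.randn(2, 12, 16, 64) for _ in range(3)]
student = [torch.randn(2, 12, 16, 64) for _ in range(3)]
loss = minilm_layer_loss(teacher, student)
```

Per the description above, this per-layer loss would be accumulated over every mapped teacher-student layer pair rather than computed only at the final layer as in Wang et al.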
 
+ ### Data sets
  |Data|News comments|News article|
  |:----:|:----:|:----:|
  |size|10G|10G|
+ > **Note**<br>
+ > - Performance can be further improved by adding wiki data to training.
+ > - The crawling and preprocessing code for the *News article* is [here](https://github.com/2unju/DaumNewsCrawler).

  ### Config
  - **KoMiniLM-23M**
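
The full `config.json` is not reproduced here. A quick way to inspect the released 23M architecture is to load the configuration from the Hub; the printed values are whatever the published checkpoint contains, not restated in this sketch:

```python
from transformers import AutoConfig

# Architecture hyperparameters shipped with the released checkpoint
config = AutoConfig.from_pretrained("BM-K/KoMiniLM")
print(config.num_hidden_layers, config.hidden_size, config.num_attention_heads)
```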
 
  bash scripts/run_all_kominilm.sh
  ```

+ || #Param | Average | NSMC<br>(Acc) | Naver NER<br>(F1) | PAWS<br>(Acc) | KorNLI<br>(Acc) | KorSTS<br>(Spearman) | Question Pair<br>(Acc) | KorQuAD<br>(Dev)<br>(EM/F1) |
+ |:----:|:----:|:----:|:----:|:----:|:----:|:----:|:----:|:----:|:----:|
+ |KoBERT(KLUE)| 110M | 86.84 | 90.20±0.07 | 87.11±0.05 | 81.36±0.21 | 81.06±0.33 | 82.47±0.14 | 95.03±0.44 | 84.43±0.18 / <br>93.05±0.04 |
+ |KcBERT| 108M | 78.94 | 89.60±0.10 | 84.34±0.13 | 67.02±0.42 | 74.17±0.52 | 76.57±0.51 | 93.97±0.27 | 60.87±0.27 / <br>85.01±0.14 |
+ |KoBERT(SKT)| 92M | 79.73 | 89.28±0.42 | 87.54±0.04 | 80.93±0.91 | 78.18±0.45 | 75.98±2.81 | 94.37±0.31 | 51.94±0.60 / <br>79.69±0.66 |
+ |DistilKoBERT| 28M | 74.73 | 88.39±0.08 | 84.22±0.01 | 61.74±0.45 | 70.22±0.14 | 72.11±0.27 | 92.65±0.16 | 52.52±0.48 / <br>76.00±0.71 |
  | | | | | | | | | |
+ |**KoMiniLM<sup>†</sup>**| **68M** | 85.90 | 89.84±0.02 | 85.98±0.09 | 80.78±0.30 | 79.28±0.17 | 81.00±0.07 | 94.89±0.37 | 83.27±0.08 / <br>92.08±0.06 |
+ |**KoMiniLM<sup>†</sup>**| **23M** | 84.79 | 89.67±0.03 | 84.79±0.09 | 78.67±0.45 | 78.10±0.07 | 78.90±0.11 | 94.81±0.12 | 82.11±0.42 / <br>91.21±0.29 |
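
The new `Average` column matches the plain mean of the eight reported scores in each row, with the KorQuAD EM and F1 values counted separately; this is inferred from the numbers rather than stated in the README. A quick check for the KoMiniLM-23M row:

```python
# KoMiniLM-23M: NSMC, Naver NER, PAWS, KorNLI, KorSTS, Question Pair, KorQuAD EM, KorQuAD F1
scores = [89.67, 84.79, 78.67, 78.10, 78.90, 94.81, 82.11, 91.21]
print(round(sum(scores) / len(scores), 2))  # 84.78, matching the reported 84.79 up to rounding
```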
 
  - [NSMC](https://github.com/e9t/nsmc) (Naver Sentiment Movie Corpus)
  - [Naver NER](https://github.com/naver/nlp-challenge) (NER task on Naver NLP Challenge 2018)

  - [Question Pair](https://github.com/songys/Question_pair) (Paired Question)
  - [KorQuAD](https://korquad.github.io/) (The Korean Question Answering Dataset)

+ <img src="https://user-images.githubusercontent.com/55969260/174229747-279122dc-9d27-4da9-a6e7-f9f1fe1651f7.png"> <br>
+
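The per-task scores above come from the `KoMiniLM-Finetune` scripts. As a self-contained illustration of what one such run looks like, here is a sketch of fine-tuning the checkpoint for NSMC-style binary sentiment classification with the `transformers` Trainer; the toy data, hyperparameters, and output directory are placeholders, not the repository's actual settings:

```python
import torch
from transformers import (AutoModelForSequenceClassification, AutoTokenizer,
                          Trainer, TrainingArguments)

tokenizer = AutoTokenizer.from_pretrained("BM-K/KoMiniLM")
model = AutoModelForSequenceClassification.from_pretrained("BM-K/KoMiniLM", num_labels=2)

# Toy stand-in for NSMC: each movie review is paired with a 0/1 sentiment label
train_texts = ["정말 재미있는 영화였다", "시간이 아깝다"]
train_labels = [1, 0]
encodings = tokenizer(train_texts, truncation=True, padding=True)

class ToyDataset(torch.utils.data.Dataset):
    def __init__(self, encodings, labels):
        self.encodings, self.labels = encodings, labels
    def __len__(self):
        return len(self.labels)
    def __getitem__(self, idx):
        item = {k: torch.tensor(v[idx]) for k, v in self.encodings.items()}
        item["labels"] = torch.tensor(self.labels[idx])
        return item

args = TrainingArguments(output_dir="kominilm-nsmc", num_train_epochs=1,
                         per_device_train_batch_size=2, logging_steps=1)
Trainer(model=model, args=args, train_dataset=ToyDataset(encodings, train_labels)).train()
```

The ± values in the table suggest each published score is averaged over repeated runs, so reproducing them would mean running each task several times with different seeds.
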
  ### User Contributed Examples
  -

  - [KcBERT](https://github.com/Beomi/KcBERT)
  - [SKT KoBERT](https://github.com/SKTBrain/KoBERT)
  - [DistilKoBERT](https://github.com/monologg/DistilKoBERT)
+ - [lassl](https://github.com/lassl/lassl)