BM-K committed on
Commit 6b02058
1 Parent(s): b5c1baa

Create README.md

Files changed (1)
  1. README.md +98 -0
README.md ADDED
@@ -0,0 +1,98 @@
# KoMiniLM
💪 Korean mini language model <br> https://github.com/BM-K/KoMiniLM

## Overview
Current language models usually consist of hundreds of millions of parameters, which poses challenges for fine-tuning and online serving in real-world applications due to latency and capacity constraints. In this project, we release a lightweight Korean language model to address these shortcomings of existing language models.

## Quick tour
```python
from transformers import AutoTokenizer, AutoModel

# Load the KoMiniLM tokenizer and encoder from the Hugging Face Hub
tokenizer = AutoTokenizer.from_pretrained("BM-K/KoMiniLM")
model = AutoModel.from_pretrained("BM-K/KoMiniLM")

# Encode a Korean sentence ("Hello, world!") and run a forward pass
inputs = tokenizer("안녕 세상아!", return_tensors="pt")
outputs = model(**inputs)
```
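
KoMiniLM is a standard BERT-style encoder, so its outputs can be used like those of any `transformers` BERT model. Continuing the snippet above, one common way to obtain a fixed-size sentence vector (an illustration, not an officially prescribed recipe) is to take the hidden state at the [CLS] position:

```python
# outputs[0] is the last hidden state with shape (batch_size, seq_len, hidden_size=384)
last_hidden_state = outputs[0]

# Illustrative sentence representation: the [CLS] token embedding, shape (1, 384)
sentence_embedding = last_hidden_state[:, 0, :]
```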
## Pre-training
`Teacher Model`: [KLUE-BERT(base)](https://github.com/KLUE-benchmark/KLUE)

### Objective
Self-attention distributions and self-attention value-relations [[Wang et al., 2020]](https://arxiv.org/abs/2002.10957) were distilled from each discrete layer of the teacher model into the student model. Wang et al. distilled only the last Transformer layer; that is not the case in this project.
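
As a rough illustration of this objective, the sketch below computes a MiniLM-style loss for one teacher-student layer pair from their attention tensors. The tensor shapes, function names, and row-wise KL formulation are assumptions made for this sketch (the actual training code lives in the KoMiniLM repository linked above); note that the matrices being compared are head-by-position-by-position, so teacher and student only need matching head counts (12 here), not matching hidden sizes.

```python
import torch
import torch.nn.functional as F

def value_relation(values: torch.Tensor) -> torch.Tensor:
    """Self-attention value-relation: scaled dot-product between value vectors,
    normalized with softmax. values: (batch, heads, seq_len, head_dim)."""
    d_k = values.size(-1)
    scores = values @ values.transpose(-1, -2) / (d_k ** 0.5)
    return F.softmax(scores, dim=-1)

def mean_row_kl(teacher_dist: torch.Tensor, student_dist: torch.Tensor) -> torch.Tensor:
    """KL(teacher || student) averaged over every attention distribution (row).
    Both inputs: (batch, heads, seq_len, seq_len), each row sums to 1."""
    student_log = student_dist.clamp_min(1e-12).log()
    total = F.kl_div(student_log, teacher_dist, reduction="sum")
    return total / teacher_dist.shape[:-1].numel()

def minilm_layer_loss(teacher_attn, student_attn, teacher_values, student_values):
    """Distillation loss for one teacher-student layer pair:
    attention-distribution transfer + value-relation transfer."""
    attn_loss = mean_row_kl(teacher_attn, student_attn)
    vr_loss = mean_row_kl(value_relation(teacher_values), value_relation(student_values))
    return attn_loss + vr_loss
```

Because distillation here is applied to more than the last Transformer layer, the overall objective would sum (or average) this per-layer loss over the chosen teacher-student layer pairs.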
### Data set
|Data|News comments|News article|
|:----:|:----:|:----:|
|Size|10G|10G|
- Performance can be further improved by adding wiki data to training.
### Config
- **KoMiniLM-23M**
```json
{
  "architectures": [
    "BertForPreTraining"
  ],
  "attention_probs_dropout_prob": 0.1,
  "classifier_dropout": null,
  "hidden_act": "gelu",
  "hidden_dropout_prob": 0.1,
  "hidden_size": 384,
  "initializer_range": 0.02,
  "intermediate_size": 1536,
  "layer_norm_eps": 1e-12,
  "max_position_embeddings": 512,
  "model_type": "bert",
  "num_attention_heads": 12,
  "num_hidden_layers": 6,
  "output_attentions": true,
  "pad_token_id": 0,
  "position_embedding_type": "absolute",
  "return_dict": false,
  "torch_dtype": "float32",
  "transformers_version": "4.13.0",
  "type_vocab_size": 2,
  "use_cache": true,
  "vocab_size": 32000
}
```
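
The configuration can be inspected directly from the Hub. A small sketch (assuming the `BM-K/KoMiniLM` checkpoint and a recent `transformers` version; the exact count varies slightly with which task heads are instantiated) to confirm the model size:

```python
from transformers import AutoConfig, AutoModel

config = AutoConfig.from_pretrained("BM-K/KoMiniLM")   # the JSON shown above
model = AutoModel.from_pretrained("BM-K/KoMiniLM")     # encoder without task heads

print(config.hidden_size, config.num_hidden_layers)    # 384 6
num_params = sum(p.numel() for p in model.parameters())
print(f"~{num_params / 1e6:.0f}M parameters")          # roughly 23M
```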
### Performance on subtasks
- Each reported fine-tuning result is the average of 3 runs per task (a minimal standalone fine-tuning sketch appears after the task list below).
```
cd KoMiniLM-Finetune
bash scripts/run_all_kominilm.sh
```
|Model| #Param | NSMC<br>(Acc) | Naver NER<br>(F1) | PAWS<br>(Acc) | KorNLI<br>(Acc) | KorSTS<br>(Spearman) | Question Pair<br>(Acc) | KorQuAD<br>(Dev)<br>(EM/F1) |
|:----:|:----:|:----:|:----:|:----:|:----:|:----:|:----:|:----:|
|KoBERT(KLUE)| 110M | 90.20±0.07 | 87.11±0.05 | 81.36±0.21 | 81.06±0.33 | 82.47±0.14 | 95.03±0.44 | 84.43±0.18 / <br>93.05±0.04 |
|KcBERT| 108M | 89.60±0.10 | 84.34±0.13 | 67.02±0.42 | 74.17±0.52 | 76.57±0.51 | 93.97±0.27 | 60.87±0.27 / <br>85.01±0.14 |
|KoBERT(SKT)| 92M | 89.28±0.42 | 87.54±0.04 | 80.93±0.91 | 78.18±0.45 | 75.98±2.81 | 94.37±0.31 | 51.94±0.60 / <br>79.69±0.66 |
|DistilKoBERT| 28M | 88.39±0.08 | 84.22±0.01 | 61.74±0.45 | 70.22±0.14 | 72.11±0.27 | 92.65±0.16 | 52.52±0.48 / <br>76.00±0.71 |
| | | | | | | | | |
|**KoMiniLM<sup>†</sup>**| **23M** | 89.67±0.03 | 84.79±0.09 | 78.67±0.45 | 78.10±0.07 | 78.90±0.11 | 94.81±0.12 | 82.11±0.42 / <br>91.21±0.29 |

- [NSMC](https://github.com/e9t/nsmc) (Naver Sentiment Movie Corpus)
- [Naver NER](https://github.com/naver/nlp-challenge) (NER task on Naver NLP Challenge 2018)
- [PAWS](https://github.com/google-research-datasets/paws) (Korean Paraphrase Adversaries from Word Scrambling)
- [KorNLI/KorSTS](https://github.com/kakaobrain/KorNLUDatasets) (Korean Natural Language Understanding)
- [Question Pair](https://github.com/songys/Question_pair) (Paired Question)
- [KorQuAD](https://korquad.github.io/) (The Korean Question Answering Dataset)
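
For fine-tuning outside of `KoMiniLM-Finetune`, a minimal sequence-classification setup with the Hugging Face `Trainer` might look like the sketch below. It uses NSMC purely as an example task; the dataset identifier, hyperparameters, output directory, and column names are illustrative assumptions and are not the settings used to produce the table above.

```python
from transformers import (AutoTokenizer, AutoModelForSequenceClassification,
                          TrainingArguments, Trainer)
from datasets import load_dataset

# NSMC as an example binary sentiment task; the dataset id may differ by `datasets` version
dataset = load_dataset("nsmc")
tokenizer = AutoTokenizer.from_pretrained("BM-K/KoMiniLM")
model = AutoModelForSequenceClassification.from_pretrained("BM-K/KoMiniLM", num_labels=2)

def tokenize(batch):
    # NSMC stores the review text in the "document" column
    return tokenizer(batch["document"], truncation=True, max_length=128)

dataset = dataset.map(tokenize, batched=True)

args = TrainingArguments(
    output_dir="kominilm-nsmc",        # hypothetical output directory
    learning_rate=5e-5,                # illustrative hyperparameters
    per_device_train_batch_size=32,
    num_train_epochs=3,
)

trainer = Trainer(
    model=model,
    args=args,
    train_dataset=dataset["train"],
    eval_dataset=dataset["test"],
    tokenizer=tokenizer,               # enables dynamic padding via the default collator
)
trainer.train()
```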
### User Contributed Examples
-

## Reference
- [KLUE BERT](https://github.com/KLUE-benchmark/KLUE)
- [KcBERT](https://github.com/Beomi/KcBERT)
- [SKT KoBERT](https://github.com/SKTBrain/KoBERT)
- [DistilKoBERT](https://github.com/monologg/DistilKoBERT)
- [lassl](https://github.com/lassl/lassl)

## ToDo
- [X] An average of 3 runs for each task
- [X] Hugging Face model porting
- [ ] Add kowiki data