---
pipeline_tag: sentence-similarity
tags:
- sentence-transformers
- feature-extraction
- sentence-similarity
- transformers
datasets:
- kornlu
language:
- ko
license: cc-by-4.0
---

# bi-matrix/gmatrix-embedding

This model was trained from the [KF-DeBERTa](https://huggingface.co/kakaobank/kf-deberta-base) model on the KorSTS and KorNLI datasets, following the [continue-learning](https://github.com/UKPLab/sentence-transformers/blob/master/examples/training/sts/training_stsbenchmark_continue_training.py) approach introduced in the official sentence-transformers documentation:
1. Multi-task training for 10 epochs, applying MultipleNegativesRankingLoss after negative sampling on the NLI dataset and CosineSimilarityLoss on the STS dataset.
2. An additional 4 epochs of multi-task training with the learning rate reduced to 1e-06.

---
This is a [sentence-transformers](https://www.SBERT.net) model: it maps sentences & paragraphs to a 768-dimensional dense vector space and can be used for tasks like clustering or semantic search.

## Usage (Sentence-Transformers)

Using this model becomes easy when you have [sentence-transformers](https://www.SBERT.net) installed:

```
pip install -U sentence-transformers
```

Then you can use the model like this:

```python
from sentence_transformers import SentenceTransformer

sentences = ["This is an example sentence", "Each sentence is converted"]

model = SentenceTransformer("bi-matrix/gmatrix-embedding")
embeddings = model.encode(sentences)
print(embeddings)
```
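
For semantic search, the embeddings can be compared directly with the `util.cos_sim` helper from sentence-transformers. A minimal sketch (the Korean example sentences are illustrative):

```python
from sentence_transformers import SentenceTransformer, util

# Illustrative sentences: the first two are paraphrases, the third is unrelated.
sentences = [
    "오늘 날씨가 정말 좋다",        # "The weather is really nice today"
    "오늘은 날씨가 맑고 화창하다",  # "Today is clear and sunny"
    "주식 시장이 급락했다",         # "The stock market plunged"
]

model = SentenceTransformer("bi-matrix/gmatrix-embedding")
embeddings = model.encode(sentences)

# Pairwise cosine similarities; semantically close pairs score higher.
print(util.cos_sim(embeddings, embeddings))
```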

## Usage (HuggingFace Transformers)
Without [sentence-transformers](https://www.SBERT.net), you can use the model like this: first, you pass your input through the transformer model, then you apply the right pooling operation on top of the contextualized word embeddings.

```python
from transformers import AutoTokenizer, AutoModel
import torch


# Mean pooling - take the attention mask into account for correct averaging
def mean_pooling(model_output, attention_mask):
    token_embeddings = model_output[0]  # First element of model_output contains all token embeddings
    input_mask_expanded = attention_mask.unsqueeze(-1).expand(token_embeddings.size()).float()
    return torch.sum(token_embeddings * input_mask_expanded, 1) / torch.clamp(input_mask_expanded.sum(1), min=1e-9)


# Sentences we want sentence embeddings for
sentences = ['This is an example sentence', 'Each sentence is converted']

# Load model from HuggingFace Hub
tokenizer = AutoTokenizer.from_pretrained("bi-matrix/gmatrix-embedding")
model = AutoModel.from_pretrained("bi-matrix/gmatrix-embedding")

# Tokenize sentences
encoded_input = tokenizer(sentences, padding=True, truncation=True, return_tensors='pt')

# Compute token embeddings
with torch.no_grad():
    model_output = model(**encoded_input)

# Perform pooling. In this case, mean pooling.
sentence_embeddings = mean_pooling(model_output, encoded_input['attention_mask'])

print("Sentence embeddings:")
print(sentence_embeddings)
```

## Evaluation Results

Results on the KorSTS evaluation dataset:

- Cosine Pearson: 85.77
- Cosine Spearman: 86.30
- Manhattan Pearson: 84.84
- Manhattan Spearman: 85.33
- Euclidean Pearson: 84.82
- Euclidean Spearman: 85.29
- Dot Pearson: 83.19
- Dot Spearman: 83.19

<br>

|model|cosine_pearson|cosine_spearman|euclidean_pearson|euclidean_spearman|manhattan_pearson|manhattan_spearman|dot_pearson|dot_spearman|
|:-------------------------|-----------------:|------------------:|--------------------:|---------------------:|--------------------:|---------------------:|--------------:|---------------:|
|[**gmatrix-embedding**](https://huggingface.co/bi-matrix/gmatrix-embedding)|**85.77**|**86.30**|**84.82**|**85.29**|**84.84**|**85.33**|**83.19**|**83.19**|
|[kf-deberta-multitask](https://huggingface.co/upskyy/kf-deberta-multitask)|85.75|86.25|84.79|85.25|84.80|85.27|82.93|82.86|
|[ko-sroberta-multitask](https://huggingface.co/jhgan/ko-sroberta-multitask)|84.77|85.60|83.71|84.40|83.70|84.38|82.42|82.33|
|[ko-sbert-multitask](https://huggingface.co/jhgan/ko-sbert-multitask)|84.13|84.71|82.42|82.66|82.41|82.69|80.05|79.69|
|[ko-sroberta-base-nli](https://huggingface.co/jhgan/ko-sroberta-nli)|82.83|83.85|82.87|83.29|82.88|83.28|80.34|79.69|
|[ko-sbert-nli](https://huggingface.co/jhgan/ko-sbert-nli)|82.24|83.16|82.19|82.31|82.18|82.30|79.30|78.78|
|[ko-sroberta-sts](https://huggingface.co/jhgan/ko-sroberta-sts)|81.84|81.82|81.15|81.25|81.14|81.25|79.09|78.54|
|[ko-sbert-sts](https://huggingface.co/jhgan/ko-sbert-sts)|81.55|81.23|79.94|79.79|79.90|79.75|76.02|75.31|
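
Such correlations can be reproduced with sentence-transformers' `EmbeddingSimilarityEvaluator`. A minimal sketch, assuming the KorSTS test pairs and their gold 0-5 scores have been loaded (the pairs below are illustrative stand-ins):

```python
from sentence_transformers import SentenceTransformer
from sentence_transformers.evaluation import EmbeddingSimilarityEvaluator

# Hypothetical stand-in for the KorSTS test split: sentence pairs with
# gold 0-5 similarity labels rescaled to [0, 1].
sentences1 = ["한 남자가 기타를 치고 있다", "아이가 공원에서 놀고 있다", "두 사람이 걷고 있다"]
sentences2 = ["남자가 악기를 연주하고 있다", "주식 시장이 급락했다", "두 사람이 산책하고 있다"]
gold_scores = [4.2 / 5.0, 0.0 / 5.0, 4.6 / 5.0]

model = SentenceTransformer("bi-matrix/gmatrix-embedding")
evaluator = EmbeddingSimilarityEvaluator(sentences1, sentences2, gold_scores)

# Reports Pearson/Spearman correlations for cosine, Euclidean, Manhattan,
# and dot-product similarity between the pair embeddings.
print(evaluator(model))
```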

<br>

Results on the G-MATRIX Embedding dataset:
Three human annotators rated the similarity of each sentence pair on a scale of 0 to 5 and the ratings were averaged; cosine similarity, Euclidean distance, Manhattan distance, and dot-product were then computed from each model's embeddings, and the Pearson and Spearman correlations between these and the averaged ratings are reported below.

- Cosine Pearson: 75.86
- Cosine Spearman: 65.75
- Manhattan Pearson: 72.65
- Manhattan Spearman: 65.20
- Euclidean Pearson: 72.48
- Euclidean Spearman: 65.32
- Dot Pearson: 64.71
- Dot Spearman: 53.90

<br>

|model|cosine_pearson|cosine_spearman|euclidean_pearson|euclidean_spearman|manhattan_pearson|manhattan_spearman|dot_pearson|dot_spearman|
|:-------------------------|-----------------:|------------------:|--------------------:|---------------------:|--------------------:|---------------------:|--------------:|---------------:|
|[**gmatrix-embedding**](https://huggingface.co/bi-matrix/gmatrix-embedding)|**75.86**|**65.75**|**72.65**|**65.20**|**72.48**|**65.32**|**64.71**|**53.90**|
|[ko-sroberta-multitask](https://huggingface.co/jhgan/ko-sroberta-multitask)|71.78|63.16|70.80|63.47|70.89|63.72|53.57|44.23|
|[bge-m3](https://huggingface.co/BAAI/bge-m3)|64.15|60.65|61.88|60.68|61.88|60.19|64.16|60.71|
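
For intuition, a sketch of how one row of this table can be computed by hand, using SciPy for the correlations (`pairs` and `human_scores` below are illustrative stand-ins for the annotated data):

```python
import numpy as np
from scipy.stats import pearsonr, spearmanr
from sentence_transformers import SentenceTransformer

# Hypothetical annotated pairs; human_scores are the averaged 0-5 ratings.
pairs = [
    ("남자가 기타를 친다", "남자가 악기를 연주한다"),
    ("아이가 공원에서 논다", "주식 시장이 급락했다"),
    ("두 사람이 걷고 있다", "두 사람이 산책한다"),
]
human_scores = [4.3, 0.2, 4.6]

model = SentenceTransformer("bi-matrix/gmatrix-embedding")
emb1 = model.encode([a for a, _ in pairs])
emb2 = model.encode([b for _, b in pairs])

# Cosine similarity per pair; the distance-based columns would typically
# swap in (negative) Euclidean or Manhattan distance, or the dot-product.
cos = np.sum(emb1 * emb2, axis=1) / (
    np.linalg.norm(emb1, axis=1) * np.linalg.norm(emb2, axis=1)
)

print("cosine_pearson: ", pearsonr(human_scores, cos)[0])
print("cosine_spearman:", spearmanr(human_scores, cos)[0])
```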

<br>

![image/png](https://cdn-uploads.huggingface.co/production/uploads/6350f6750b94548566da3279/CcK0QL3oQAz7sJOCtH6PB.png)

<br>

## G-MATRIX Embedding Labeling Criteria (referencing the KLUE-RoBERTa STS data construction)
1. Judge how similar the two sentences are on a scale of 0 to 5.
2. Differences in spelling, spacing, periods, or commas are not taken into account.
3. Compare the intent of the sentences and the meaning their wording conveys.
4. Compare whether the meanings of the sentences are similar, not whether they share common words.
5. 0 means no semantic similarity; 5 means the sentences are semantically equivalent.

## Training
The model was trained with the following parameters:

**DataLoader**:

`torch.utils.data.dataloader.DataLoader` of length 329 with parameters:
```
{'batch_size': 32, 'sampler': 'torch.utils.data.sampler.RandomSampler', 'batch_sampler': 'torch.utils.data.sampler.BatchSampler'}
```

**Loss**:

`sentence_transformers.losses.CosineSimilarityLoss.CosineSimilarityLoss`
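
For reference, a minimal sketch of this fine-tuning stage using the sentence-transformers `fit` API. The training pairs below are illustrative stand-ins, and only the STS objective is shown (the NLI multi-task objective from the training description is omitted):

```python
from torch.utils.data import DataLoader
from sentence_transformers import SentenceTransformer, InputExample, losses

# Hypothetical STS-style pairs with 0-5 gold labels rescaled to [0, 1].
train_examples = [
    InputExample(texts=["남자가 기타를 친다", "남자가 악기를 연주한다"], label=0.9),
    InputExample(texts=["남자가 기타를 친다", "주식 시장이 급락했다"], label=0.1),
]

model = SentenceTransformer("bi-matrix/gmatrix-embedding")
train_dataloader = DataLoader(train_examples, shuffle=True, batch_size=32)
train_loss = losses.CosineSimilarityLoss(model)

# Mirrors the card's setup: CosineSimilarityLoss over a batch-size-32 loader,
# with the reduced learning rate from the continue-learning stage.
model.fit(
    train_objectives=[(train_dataloader, train_loss)],
    epochs=4,
    optimizer_params={"lr": 1e-6},
)
```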

## Full Model Architecture
```
SentenceTransformer(
  (0): Transformer({'max_seq_length': 128, 'do_lower_case': True}) with Transformer model: DeBERTaV2Model
  (1): Pooling({'word_embedding_dimension': 768, 'pooling_mode_cls_token': False, 'pooling_mode_mean_tokens': True, 'pooling_mode_max_tokens': False, 'pooling_mode_mean_sqrt_len_tokens': False})
)
```

## Citing & Authors

[MINSANG SONG] at [BI-Matrix](https://www.bimatrix.co.kr/)