---
pipeline_tag: sentence-similarity
tags:
- sentence-transformers
- feature-extraction
- sentence-similarity
- transformers
datasets:
- kornlu
language:
- ko
license: cc-by-4.0
---

# bi-matrix/gmatrix-embedding

This model was trained from the [KF-DeBERTa](https://huggingface.co/kakaobank/kf-deberta-base) model on the KorSTS and KorNLI datasets, following the [continue-learning](https://github.com/UKPLab/sentence-transformers/blob/master/examples/training/sts/training_stsbenchmark_continue_training.py) approach introduced in the official sentence-transformers documentation:
1. Multi-task training for 10 epochs, applying MultipleNegativesRankingLoss after negative sampling on the NLI dataset and CosineSimilarityLoss on the STS dataset.
2. An additional 4 epochs of multi-task training with the learning rate reduced to 1e-06.

---
This is a [sentence-transformers](https://www.SBERT.net) model: it maps sentences & paragraphs to a 768-dimensional dense vector space and can be used for tasks like clustering or semantic search.

## Usage (Sentence-Transformers)

Using this model becomes easy when you have [sentence-transformers](https://www.SBERT.net) installed:

```
pip install -U sentence-transformers
```

Then you can use the model like this:

```python
from sentence_transformers import SentenceTransformer

sentences = ["This is an example sentence", "Each sentence is converted"]

model = SentenceTransformer("bi-matrix/gmatrix-embedding")
embeddings = model.encode(sentences)
print(embeddings)
```
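
For semantic search, the embeddings can be compared directly with the `util.cos_sim` helper from sentence-transformers. A minimal sketch (the Korean example sentences are illustrative):

```python
from sentence_transformers import SentenceTransformer, util

# Illustrative sentences: the first two are paraphrases, the third is unrelated.
sentences = [
    "오늘 날씨가 정말 좋다",        # "The weather is really nice today"
    "오늘은 날씨가 맑고 화창하다",  # "Today is clear and sunny"
    "주식 시장이 급락했다",         # "The stock market plunged"
]

model = SentenceTransformer("bi-matrix/gmatrix-embedding")
embeddings = model.encode(sentences)

# Pairwise cosine similarities; semantically close pairs score higher.
print(util.cos_sim(embeddings, embeddings))
```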

## Usage (HuggingFace Transformers)
Without [sentence-transformers](https://www.SBERT.net), you can use the model like this: first, you pass your input through the transformer model, then you apply the right pooling operation on top of the contextualized word embeddings.

```python
from transformers import AutoTokenizer, AutoModel
import torch


# Mean pooling - take the attention mask into account for correct averaging
def mean_pooling(model_output, attention_mask):
    token_embeddings = model_output[0]  # First element of model_output contains all token embeddings
    input_mask_expanded = attention_mask.unsqueeze(-1).expand(token_embeddings.size()).float()
    return torch.sum(token_embeddings * input_mask_expanded, 1) / torch.clamp(input_mask_expanded.sum(1), min=1e-9)


# Sentences we want sentence embeddings for
sentences = ['This is an example sentence', 'Each sentence is converted']

# Load model from HuggingFace Hub
tokenizer = AutoTokenizer.from_pretrained("bi-matrix/gmatrix-embedding")
model = AutoModel.from_pretrained("bi-matrix/gmatrix-embedding")

# Tokenize sentences
encoded_input = tokenizer(sentences, padding=True, truncation=True, return_tensors='pt')

# Compute token embeddings
with torch.no_grad():
    model_output = model(**encoded_input)

# Perform pooling. In this case, mean pooling.
sentence_embeddings = mean_pooling(model_output, encoded_input['attention_mask'])

print("Sentence embeddings:")
print(sentence_embeddings)
```

## Evaluation Results

Results on the KorSTS evaluation dataset:

- Cosine Pearson: 85.77
- Cosine Spearman: 86.30
- Manhattan Pearson: 84.84
- Manhattan Spearman: 85.33
- Euclidean Pearson: 84.82
- Euclidean Spearman: 85.29
- Dot Pearson: 83.19
- Dot Spearman: 83.19

<br>

|model|cosine_pearson|cosine_spearman|euclidean_pearson|euclidean_spearman|manhattan_pearson|manhattan_spearman|dot_pearson|dot_spearman|
|:-------------------------|-----------------:|------------------:|--------------------:|---------------------:|--------------------:|---------------------:|--------------:|---------------:|
|[**gmatrix-embedding**](https://huggingface.co/bi-matrix/gmatrix-embedding)|**85.77**|**86.30**|**84.82**|**85.29**|**84.84**|**85.33**|**83.19**|**83.19**|
|[kf-deberta-multitask](https://huggingface.co/upskyy/kf-deberta-multitask)|85.75|86.25|84.79|85.25|84.80|85.27|82.93|82.86|
|[ko-sroberta-multitask](https://huggingface.co/jhgan/ko-sroberta-multitask)|84.77|85.60|83.71|84.40|83.70|84.38|82.42|82.33|
|[ko-sbert-multitask](https://huggingface.co/jhgan/ko-sbert-multitask)|84.13|84.71|82.42|82.66|82.41|82.69|80.05|79.69|
|[ko-sroberta-base-nli](https://huggingface.co/jhgan/ko-sroberta-nli)|82.83|83.85|82.87|83.29|82.88|83.28|80.34|79.69|
|[ko-sbert-nli](https://huggingface.co/jhgan/ko-sbert-nli)|82.24|83.16|82.19|82.31|82.18|82.30|79.30|78.78|
|[ko-sroberta-sts](https://huggingface.co/jhgan/ko-sroberta-sts)|81.84|81.82|81.15|81.25|81.14|81.25|79.09|78.54|
|[ko-sbert-sts](https://huggingface.co/jhgan/ko-sbert-sts)|81.55|81.23|79.94|79.79|79.90|79.75|76.02|75.31|
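
Such correlations can be reproduced with sentence-transformers' `EmbeddingSimilarityEvaluator`. A minimal sketch, assuming the KorSTS test pairs and their gold 0-5 scores have been loaded (the pairs below are illustrative stand-ins):

```python
from sentence_transformers import SentenceTransformer
from sentence_transformers.evaluation import EmbeddingSimilarityEvaluator

# Hypothetical stand-in for the KorSTS test split: sentence pairs with
# gold 0-5 similarity labels rescaled to [0, 1].
sentences1 = ["한 남자가 기타를 치고 있다", "아이가 공원에서 놀고 있다", "두 사람이 걷고 있다"]
sentences2 = ["남자가 악기를 연주하고 있다", "주식 시장이 급락했다", "두 사람이 산책하고 있다"]
gold_scores = [4.2 / 5.0, 0.0 / 5.0, 4.6 / 5.0]

model = SentenceTransformer("bi-matrix/gmatrix-embedding")
evaluator = EmbeddingSimilarityEvaluator(sentences1, sentences2, gold_scores)

# Reports Pearson/Spearman correlations for cosine, Euclidean, Manhattan,
# and dot-product similarity between the pair embeddings.
print(evaluator(model))
```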

<br>

Results on the G-MATRIX Embedding dataset:
Three human annotators rated the similarity of each sentence pair on a scale of 0 to 5 and the ratings were averaged; cosine similarity, Euclidean distance, Manhattan distance, and dot-product were then computed from each model's embeddings, and the Pearson and Spearman correlations between these and the averaged ratings are reported below.

- Cosine Pearson: 75.86
- Cosine Spearman: 65.75
- Manhattan Pearson: 72.65
- Manhattan Spearman: 65.20
- Euclidean Pearson: 72.48
- Euclidean Spearman: 65.32
- Dot Pearson: 64.71
- Dot Spearman: 53.90

<br>

|model|cosine_pearson|cosine_spearman|euclidean_pearson|euclidean_spearman|manhattan_pearson|manhattan_spearman|dot_pearson|dot_spearman|
|:-------------------------|-----------------:|------------------:|--------------------:|---------------------:|--------------------:|---------------------:|--------------:|---------------:|
|[**gmatrix-embedding**](https://huggingface.co/bi-matrix/gmatrix-embedding)|**75.86**|**65.75**|**72.65**|**65.20**|**72.48**|**65.32**|**64.71**|**53.90**|
|[ko-sroberta-multitask](https://huggingface.co/jhgan/ko-sroberta-multitask)|71.78|63.16|70.80|63.47|70.89|63.72|53.57|44.23|
|[bge-m3](https://huggingface.co/BAAI/bge-m3)|64.15|60.65|61.88|60.68|61.88|60.19|64.16|60.71|
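
For intuition, a sketch of how one row of this table can be computed by hand, using SciPy for the correlations (`pairs` and `human_scores` below are illustrative stand-ins for the annotated data):

```python
import numpy as np
from scipy.stats import pearsonr, spearmanr
from sentence_transformers import SentenceTransformer

# Hypothetical annotated pairs; human_scores are the averaged 0-5 ratings.
pairs = [
    ("남자가 기타를 친다", "남자가 악기를 연주한다"),
    ("아이가 공원에서 논다", "주식 시장이 급락했다"),
    ("두 사람이 걷고 있다", "두 사람이 산책한다"),
]
human_scores = [4.3, 0.2, 4.6]

model = SentenceTransformer("bi-matrix/gmatrix-embedding")
emb1 = model.encode([a for a, _ in pairs])
emb2 = model.encode([b for _, b in pairs])

# Cosine similarity per pair; the distance-based columns would typically
# swap in (negative) Euclidean or Manhattan distance, or the dot-product.
cos = np.sum(emb1 * emb2, axis=1) / (
    np.linalg.norm(emb1, axis=1) * np.linalg.norm(emb2, axis=1)
)

print("cosine_pearson: ", pearsonr(human_scores, cos)[0])
print("cosine_spearman:", spearmanr(human_scores, cos)[0])
```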

<br>

![image/png](https://cdn-uploads.huggingface.co/production/uploads/6350f6750b94548566da3279/CcK0QL3oQAz7sJOCtH6PB.png)

<br>

## G-MATRIX Embedding Labeling Criteria (referencing the KLUE-RoBERTa STS data construction)
1. Judge how similar the two sentences are on a scale of 0 to 5.
2. Differences in spelling, spacing, periods, or commas are not taken into account.
3. Compare the intent of the sentences and the meaning their wording conveys.
4. Compare whether the meanings of the sentences are similar, not whether they share common words.
5. 0 means no semantic similarity; 5 means the sentences are semantically equivalent.

## Training
The model was trained with the following parameters:

**DataLoader**:

`torch.utils.data.dataloader.DataLoader` of length 329 with parameters:
```
{'batch_size': 32, 'sampler': 'torch.utils.data.sampler.RandomSampler', 'batch_sampler': 'torch.utils.data.sampler.BatchSampler'}
```

**Loss**:

`sentence_transformers.losses.CosineSimilarityLoss.CosineSimilarityLoss`
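
For reference, a minimal sketch of this fine-tuning stage using the sentence-transformers `fit` API. The training pairs below are illustrative stand-ins, and only the STS objective is shown (the NLI multi-task objective from the training description is omitted):

```python
from torch.utils.data import DataLoader
from sentence_transformers import SentenceTransformer, InputExample, losses

# Hypothetical STS-style pairs with 0-5 gold labels rescaled to [0, 1].
train_examples = [
    InputExample(texts=["남자가 기타를 친다", "남자가 악기를 연주한다"], label=0.9),
    InputExample(texts=["남자가 기타를 친다", "주식 시장이 급락했다"], label=0.1),
]

model = SentenceTransformer("bi-matrix/gmatrix-embedding")
train_dataloader = DataLoader(train_examples, shuffle=True, batch_size=32)
train_loss = losses.CosineSimilarityLoss(model)

# Mirrors the card's setup: CosineSimilarityLoss over a batch-size-32 loader,
# with the reduced learning rate from the continue-learning stage.
model.fit(
    train_objectives=[(train_dataloader, train_loss)],
    epochs=4,
    optimizer_params={"lr": 1e-6},
)
```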

## Full Model Architecture
```
SentenceTransformer(
  (0): Transformer({'max_seq_length': 128, 'do_lower_case': True}) with Transformer model: DeBERTaV2Model
  (1): Pooling({'word_embedding_dimension': 768, 'pooling_mode_cls_token': False, 'pooling_mode_mean_tokens': True, 'pooling_mode_max_tokens': False, 'pooling_mode_mean_sqrt_len_tokens': False})
)
```

## Citing & Authors

[MINSANG SONG] at [BI-Matrix](https://www.bimatrix.co.kr/)