CocoRoF committed on
Commit cb2afd5 · verified · 1 Parent(s): 0e16ce5

Update README.md

Files changed (1)
  1. README.md +501 -157
README.md CHANGED
@@ -1,199 +1,543 @@
  ---
- library_name: transformers
- tags: []
  ---

- # Model Card for Model ID
-
- <!-- Provide a quick summary of what the model is/does. -->
-

  ## Model Details

  ### Model Description

- <!-- Provide a longer summary of what this model is. -->
-
- This is the model card of a 🤗 transformers model that has been pushed on the Hub. This model card has been automatically generated.
-
- - **Developed by:** [More Information Needed]
- - **Funded by [optional]:** [More Information Needed]
- - **Shared by [optional]:** [More Information Needed]
- - **Model type:** [More Information Needed]
- - **Language(s) (NLP):** [More Information Needed]
- - **License:** [More Information Needed]
- - **Finetuned from model [optional]:** [More Information Needed]
-
- ### Model Sources [optional]
-
- <!-- Provide the basic links for the model. -->
-
- - **Repository:** [More Information Needed]
- - **Paper [optional]:** [More Information Needed]
- - **Demo [optional]:** [More Information Needed]
-
- ## Uses
-
- <!-- Address questions around how the model is intended to be used, including the foreseeable users of the model and those affected by the model. -->
-
- ### Direct Use
-
- <!-- This section is for the model use without fine-tuning or plugging into a larger ecosystem/app. -->
-
- [More Information Needed]
-
- ### Downstream Use [optional]
-
- <!-- This section is for the model use when fine-tuned for a task, or when plugged into a larger ecosystem/app -->
-
- [More Information Needed]
-
- ### Out-of-Scope Use
-
- <!-- This section addresses misuse, malicious use, and uses that the model will not work well for. -->
-
- [More Information Needed]
-
- ## Bias, Risks, and Limitations
-
- <!-- This section is meant to convey both technical and sociotechnical limitations. -->
-
- [More Information Needed]
-
- ### Recommendations
-
- <!-- This section is meant to convey recommendations with respect to the bias, risk, and technical limitations. -->
-
- Users (both direct and downstream) should be made aware of the risks, biases and limitations of the model. More information needed for further recommendations.
-
- ## How to Get Started with the Model
-
- Use the code below to get started with the model.
-
- [More Information Needed]
-
- ## Training Details
-
- ### Training Data
-
- <!-- This should link to a Dataset Card, perhaps with a short stub of information on what the training data is all about as well as documentation related to data pre-processing or additional filtering. -->
-
- [More Information Needed]
-
- ### Training Procedure
-
- <!-- This relates heavily to the Technical Specifications. Content here should link to that section when it is relevant to the training procedure. -->
-
- #### Preprocessing [optional]
-
- [More Information Needed]
-
- #### Training Hyperparameters
-
- - **Training regime:** [More Information Needed] <!--fp32, fp16 mixed precision, bf16 mixed precision, bf16 non-mixed precision, fp16 non-mixed precision, fp8 mixed precision -->
-
- #### Speeds, Sizes, Times [optional]
-
- <!-- This section provides information about throughput, start/end time, checkpoint size if relevant, etc. -->
-
- [More Information Needed]
-
  ## Evaluation

- <!-- This section describes the evaluation protocols and provides the results. -->
-
- ### Testing Data, Factors & Metrics
-
- #### Testing Data
-
- <!-- This should link to a Dataset Card if possible. -->
-
- [More Information Needed]
-
- #### Factors
-
- <!-- These are the things the evaluation is disaggregating by, e.g., subpopulations or domains. -->
-
- [More Information Needed]
-
- #### Metrics
-
- <!-- These are the evaluation metrics being used, ideally with a description of why. -->
-
- [More Information Needed]
-
- ### Results
-
- [More Information Needed]
-
- #### Summary
-
-
-
- ## Model Examination [optional]
-
- <!-- Relevant interpretability work for the model goes here -->
-
- [More Information Needed]
-
- ## Environmental Impact
-
- <!-- Total emissions (in grams of CO2eq) and additional considerations, such as electricity usage, go here. Edit the suggested text below accordingly -->
-
- Carbon emissions can be estimated using the [Machine Learning Impact calculator](https://mlco2.github.io/impact#compute) presented in [Lacoste et al. (2019)](https://arxiv.org/abs/1910.09700).
-
- - **Hardware Type:** [More Information Needed]
- - **Hours used:** [More Information Needed]
- - **Cloud Provider:** [More Information Needed]
- - **Compute Region:** [More Information Needed]
- - **Carbon Emitted:** [More Information Needed]
-
- ## Technical Specifications [optional]
-
- ### Model Architecture and Objective
-
- [More Information Needed]
-
- ### Compute Infrastructure
-
- [More Information Needed]
-
- #### Hardware
-
- [More Information Needed]
-
- #### Software
-
- [More Information Needed]
-
- ## Citation [optional]
-
- <!-- If there is a paper or blog post introducing the model, the APA and Bibtex information for that should go in this section. -->
-
- **BibTeX:**
-
- [More Information Needed]
-
- **APA:**
-
- [More Information Needed]
-
- ## Glossary [optional]
-
- <!-- If relevant, include terms and calculations in this section that can help readers understand the model or model card. -->
-
- [More Information Needed]
-
- ## More Information [optional]
-
- [More Information Needed]
-
- ## Model Card Authors [optional]
-
- [More Information Needed]

  ## Model Card Contact

- [More Information Needed]
  ---
+ tags:
+ - sentence-transformers
+ - sentence-similarity
+ - feature-extraction
+ - generated_from_trainer
+ - dataset_size:392702
+ - loss:CosineSimilarityLoss
+ base_model: x2bee/KoModernBERT-base-mlm-v03-ckp00
+ widget:
+ - source_sentence: 우리는 움직이는 동행 우주 정지 좌표계에 비례하여 이동하고 있습니다 ... 약 371km / s에서 별자리 leo 쪽으로. "
+   sentences:
+   - 두 마리의 독수리가 가지에 앉는다.
+   - 다른 물체와는 관련이 없는 '정지'는 없다.
+   - 소녀는 버스의 열린 문 앞에 서 있다.
+ - source_sentence: 숲에는 개들이 있다.
+   sentences:
+   - 양을 보는 아이들.
+   - 여왕의 배우자를 "왕"이라고 부르지 않는 것은 아주 좋은 이유가 있다. 왜냐하면 그들은 왕이 아니기 때문이다.
+   - 개들은 숲속에 혼자 있다.
+ - source_sentence: '첫째, 두 가지 다른 종류의 대시가 있다는 것을 알아야 합니다 : en 대시와 em 대시.'
+   sentences:
+   - 그들은 그 물건들을 집 주변에 두고 가거나 집의 정리를 해칠 의도가 없다.
+   - 세미콜론은 혼자 있을 수 있는 문장에 참여하는데 사용되지만, 그들의 관계를 강조하기 위해 결합됩니다.
+   - 그의 남동생이 지켜보는 동안 집 앞에서 트럼펫을 연주하는 금발의 아이.
+ - source_sentence: 한 여성이 생선 껍질을 벗기고 있다.
+   sentences:
+   - 한 남자가 수영장으로 뛰어들었다.
+   - 한 여성이 프라이팬에 노란 혼합물을 부어 넣고 있다.
+   - 두 마리의 갈색 개가 눈 속에서 서로 놀고 있다.
+ - source_sentence: 버스가 바쁜 길을 따라 운전한다.
+   sentences:
+   - 우리와 같은 태양계가 은하계 밖에서 존재할 수도 있을 것입니다.
+   - 그 여자는 데이트하러 가는 중이다.
+   - 녹색 버스가 도로를 따라 내려간다.
+ datasets:
+ - x2bee/Korean_NLI_dataset
+ - CocoRoF/sts_dev
+ pipeline_tag: sentence-similarity
+ library_name: sentence-transformers
+ metrics:
+ - pearson_cosine
+ - spearman_cosine
+ - pearson_euclidean
+ - spearman_euclidean
+ - pearson_manhattan
+ - spearman_manhattan
+ - pearson_dot
+ - spearman_dot
+ - pearson_max
+ - spearman_max
+ model-index:
+ - name: SentenceTransformer based on x2bee/KoModernBERT-base-mlm-v03-ckp00
+   results:
+   - task:
+       type: semantic-similarity
+       name: Semantic Similarity
+     dataset:
+       name: sts dev
+       type: sts_dev
+     metrics:
+     - type: pearson_cosine
+       value: 0.6463764324668821
+       name: Pearson Cosine
+     - type: spearman_cosine
+       value: 0.668749120795344
+       name: Spearman Cosine
+     - type: pearson_euclidean
+       value: 0.6434649881382908
+       name: Pearson Euclidean
+     - type: spearman_euclidean
+       value: 0.6535107003038169
+       name: Spearman Euclidean
+     - type: pearson_manhattan
+       value: 0.6516759845194007
+       name: Pearson Manhattan
+     - type: spearman_manhattan
+       value: 0.6679435004022668
+       name: Spearman Manhattan
+     - type: pearson_dot
+       value: 0.6306152465572834
+       name: Pearson Dot
+     - type: spearman_dot
+       value: 0.6496717700503837
+       name: Spearman Dot
+     - type: pearson_max
+       value: 0.6516759845194007
+       name: Pearson Max
+     - type: spearman_max
+       value: 0.668749120795344
+       name: Spearman Max
  ---

+ # SentenceTransformer based on x2bee/KoModernBERT-base-mlm-v03-ckp00

+ This is a [sentence-transformers](https://www.SBERT.net) model finetuned from [x2bee/KoModernBERT-base-mlm-v03-ckp00](https://huggingface.co/x2bee/KoModernBERT-base-mlm-v03-ckp00) on the [korean_nli_dataset](https://huggingface.co/datasets/x2bee/Korean_NLI_dataset) dataset. It maps sentences & paragraphs to a 768-dimensional dense vector space and can be used for semantic textual similarity, semantic search, paraphrase mining, text classification, clustering, and more.

  ## Model Details

  ### Model Description
+ - **Model Type:** Sentence Transformer
+ - **Base model:** [x2bee/KoModernBERT-base-mlm-v03-ckp00](https://huggingface.co/x2bee/KoModernBERT-base-mlm-v03-ckp00) <!-- at revision addb15798678d7f76904915cf8045628d402b3ce -->
+ - **Maximum Sequence Length:** 512 tokens
+ - **Output Dimensionality:** 768 dimensions
+ - **Similarity Function:** Cosine Similarity
+ - **Training Dataset:**
+     - [korean_nli_dataset](https://huggingface.co/datasets/x2bee/Korean_NLI_dataset)
+ <!-- - **Language:** Unknown -->
+ <!-- - **License:** Unknown -->

+ ### Model Sources

+ - **Documentation:** [Sentence Transformers Documentation](https://sbert.net)
+ - **Repository:** [Sentence Transformers on GitHub](https://github.com/UKPLab/sentence-transformers)
+ - **Hugging Face:** [Sentence Transformers on Hugging Face](https://huggingface.co/models?library=sentence-transformers)

+ ### Full Model Architecture

+ ```
+ SentenceTransformer(
+   (0): Transformer({'max_seq_length': 512, 'do_lower_case': False}) with Transformer model: ModernBertModel
+   (1): Pooling({'word_embedding_dimension': 768, 'pooling_mode_cls_token': False, 'pooling_mode_mean_tokens': False, 'pooling_mode_max_tokens': False, 'pooling_mode_mean_sqrt_len_tokens': True, 'pooling_mode_weightedmean_tokens': False, 'pooling_mode_lasttoken': False, 'include_prompt': True})
+   (2): Dense({'in_features': 768, 'out_features': 768, 'bias': True, 'activation_function': 'torch.nn.modules.activation.Tanh'})
+ )
+ ```
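+ Note that the Pooling module uses `pooling_mode_mean_sqrt_len_tokens` rather than plain mean pooling: the non-padding token embeddings are summed and divided by the square root of the token count before the Tanh-activated Dense projection. Below is a minimal sketch of that pooling step, written independently of the library (tensor names are illustrative):

+ ```python
+ import torch
+
+ def mean_sqrt_len_pool(token_embeddings: torch.Tensor, attention_mask: torch.Tensor) -> torch.Tensor:
+     """Sum the non-padding token embeddings, then divide by sqrt(token count)."""
+     mask = attention_mask.unsqueeze(-1).float()    # [batch, seq_len, 1]
+     summed = (token_embeddings * mask).sum(dim=1)  # [batch, 768]
+     counts = mask.sum(dim=1).clamp(min=1e-9)       # [batch, 1]
+     return summed / torch.sqrt(counts)             # [batch, 768]
+ ```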

+ ## Usage

+ ### Direct Usage (Sentence Transformers)

+ First install the Sentence Transformers library:

+ ```bash
+ pip install -U sentence-transformers
+ ```

+ Then you can load this model and run inference:
+ ```python
+ from sentence_transformers import SentenceTransformer

+ # Download from the 🤗 Hub
+ model = SentenceTransformer("x2bee/sts_nli_tune_test")
+ # Run inference
+ sentences = [
+     '버스가 바쁜 길을 따라 운전한다.',
+     '녹색 버스가 도로를 따라 내려간다.',
+     '그 여자는 데이트하러 가는 중이다.',
+ ]
+ embeddings = model.encode(sentences)
+ print(embeddings.shape)
+ # [3, 768]

+ # Get the similarity scores for the embeddings
+ similarities = model.similarity(embeddings, embeddings)
+ print(similarities.shape)
+ # [3, 3]
+ ```
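+ Since the model's similarity function is cosine similarity, the resulting matrix is symmetric with 1.0 on the diagonal, and the two bus sentences should score noticeably higher with each other than with the unrelated third sentence. As an illustrative follow-up, the nearest neighbour of the first sentence can be read off the matrix:

+ ```python
+ # Nearest neighbour of sentences[0], excluding itself (illustrative)
+ best_idx = int(similarities[0, 1:].argmax()) + 1
+ print(sentences[best_idx])  # expected: '녹색 버스가 도로를 따라 내려간다.'
+ ```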

+ <!--
+ ### Direct Usage (Transformers)

+ <details><summary>Click to see the direct usage in Transformers</summary>

+ </details>
+ -->

+ <!--
+ ### Downstream Usage (Sentence Transformers)

+ You can finetune this model on your own dataset.

+ <details><summary>Click to expand</summary>

+ </details>
+ -->

+ <!--
+ ### Out-of-Scope Use

+ *List how the model may foreseeably be misused and address what users ought not to do with the model.*
+ -->

  ## Evaluation

+ ### Metrics

+ #### Semantic Similarity

+ * Dataset: `sts_dev`
+ * Evaluated with [<code>EmbeddingSimilarityEvaluator</code>](https://sbert.net/docs/package_reference/sentence_transformer/evaluation.html#sentence_transformers.evaluation.EmbeddingSimilarityEvaluator)

+ | Metric             | Value      |
+ |:-------------------|:-----------|
+ | pearson_cosine     | 0.6464     |
+ | spearman_cosine    | 0.6687     |
+ | pearson_euclidean  | 0.6435     |
+ | spearman_euclidean | 0.6535     |
+ | pearson_manhattan  | 0.6517     |
+ | spearman_manhattan | 0.6679     |
+ | pearson_dot        | 0.6306     |
+ | spearman_dot       | 0.6497     |
+ | pearson_max        | 0.6517     |
+ | **spearman_max**   | **0.6687** |
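+ The table can be reproduced with the evaluator linked above; here is a sketch assuming the `CocoRoF/sts_dev` data is exposed as a `train` split with the `text`/`pair`/`label` columns described under Evaluation Dataset below:

+ ```python
+ from datasets import load_dataset
+ from sentence_transformers import SentenceTransformer
+ from sentence_transformers.evaluation import EmbeddingSimilarityEvaluator
+
+ model = SentenceTransformer("x2bee/sts_nli_tune_test")
+ eval_ds = load_dataset("CocoRoF/sts_dev", split="train")
+
+ evaluator = EmbeddingSimilarityEvaluator(
+     sentences1=eval_ds["text"],
+     sentences2=eval_ds["pair"],
+     scores=eval_ds["label"],
+     name="sts_dev",
+ )
+ print(evaluator(model))  # pearson/spearman for cosine, euclidean, manhattan, dot
+ ```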

+ <!--
+ ## Bias, Risks and Limitations

+ *What are the known or foreseeable issues stemming from this model? You could also flag here known failure cases or weaknesses of the model.*
+ -->

+ <!--
+ ### Recommendations

+ *What are recommendations with respect to the foreseeable issues? For example, filtering explicit content.*
+ -->

+ ## Training Details

+ ### Training Dataset
+
+ #### korean_nli_dataset
+
+ * Dataset: [korean_nli_dataset](https://huggingface.co/datasets/x2bee/Korean_NLI_dataset) at [ef305ef](https://huggingface.co/datasets/x2bee/Korean_NLI_dataset/tree/ef305ef8e2d83c6991f30f2322f321efb5a3b9d1)
+ * Size: 392,702 training samples
+ * Columns: <code>sentence1</code>, <code>sentence2</code>, and <code>score</code>
+ * Approximate statistics based on the first 1000 samples:
+   |         | sentence1                                                                          | sentence2                                                                          | score                                                           |
+   |:--------|:-----------------------------------------------------------------------------------|:-----------------------------------------------------------------------------------|:-----------------------------------------------------------------|
+   | type    | string                                                                             | string                                                                             | float                                                           |
+   | details | <ul><li>min: 4 tokens</li><li>mean: 35.7 tokens</li><li>max: 194 tokens</li></ul> | <ul><li>min: 4 tokens</li><li>mean: 19.92 tokens</li><li>max: 64 tokens</li></ul> | <ul><li>min: 0.0</li><li>mean: 0.48</li><li>max: 1.0</li></ul> |
+ * Samples:
+   | sentence1 | sentence2 | score |
+   |:----------|:----------|:------|
+   | <code>개념적으로 크림 스키밍은 제품과 지리라는 두 가지 기본 차원을 가지고 있다.</code> | <code>제품과 지리학은 크림 스키밍을 작동시키는 것이다.</code> | <code>0.5</code> |
+   | <code>시즌 중에 알고 있는 거 알아? 네 레벨에서 다음 레벨로 잃어버리는 거야 브레이브스가 모팀을 떠올리기로 결정하면 브레이브스가 트리플 A에서 한 남자를 떠올리기로 결정하면 더블 A가 그를 대신하러 올라가고 A 한 명이 그를 대신하러 올라간다.</code> | <code>사람들이 기억하면 다음 수준으로 물건을 잃는다.</code> | <code>1.0</code> |
+   | <code>우리 번호 중 하나가 당신의 지시를 세밀하게 수행할 것이다.</code> | <code>우리 팀의 일원이 당신의 명령을 엄청나게 정확하게 실행할 것이다.</code> | <code>1.0</code> |
+ * Loss: [<code>CosineSimilarityLoss</code>](https://sbert.net/docs/package_reference/sentence_transformer/losses.html#cosinesimilarityloss) with these parameters:
+   ```json
+   {
+       "loss_fct": "torch.nn.modules.loss.MSELoss"
+   }
+   ```
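+ Concretely, this loss regresses the cosine similarity of the two sentence embeddings onto the 0–1 gold score; an illustrative sketch of the computation (not the library's exact code):
+
+ ```python
+ import torch
+ import torch.nn.functional as F
+
+ def cosine_similarity_loss(emb1: torch.Tensor, emb2: torch.Tensor, score: torch.Tensor) -> torch.Tensor:
+     """MSE between the predicted cosine similarity and the gold similarity score."""
+     pred = F.cosine_similarity(emb1, emb2, dim=-1)  # [batch]
+     return F.mse_loss(pred, score)                  # loss_fct = torch.nn.MSELoss
+ ```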
+
+ ### Evaluation Dataset
+
+ #### sts_dev
+
+ * Dataset: [sts_dev](https://huggingface.co/datasets/CocoRoF/sts_dev) at [1de0cdf](https://huggingface.co/datasets/CocoRoF/sts_dev/tree/1de0cdfb2c238786ee61c5765aa60eed4a782371)
+ * Size: 1,500 evaluation samples
+ * Columns: <code>text</code>, <code>pair</code>, and <code>label</code>
+ * Approximate statistics based on the first 1000 samples:
+   |         | text                                                                               | pair                                                                               | label                                                           |
+   |:--------|:-----------------------------------------------------------------------------------|:-----------------------------------------------------------------------------------|:-----------------------------------------------------------------|
+   | type    | string                                                                             | string                                                                             | float                                                           |
+   | details | <ul><li>min: 7 tokens</li><li>mean: 20.38 tokens</li><li>max: 52 tokens</li></ul> | <ul><li>min: 6 tokens</li><li>mean: 20.52 tokens</li><li>max: 54 tokens</li></ul> | <ul><li>min: 0.0</li><li>mean: 0.42</li><li>max: 1.0</li></ul> |
+ * Samples:
+   | text                                 | pair                                 | label             |
+   |:-------------------------------------|:-------------------------------------|:------------------|
+   | <code>안전모를 가진 한 남자가 춤을 추고 있다.</code> | <code>안전모를 쓴 한 남자가 춤을 추고 있다.</code> | <code>1.0</code>  |
+   | <code>어린아이가 말을 타고 있다.</code>         | <code>아이가 말을 타고 있다.</code>           | <code>0.95</code> |
+   | <code>한 남자가 뱀에게 쥐를 먹이고 있다.</code>    | <code>남자가 뱀에게 쥐를 먹이고 있다.</code>      | <code>1.0</code>  |
+ * Loss: [<code>CosineSimilarityLoss</code>](https://sbert.net/docs/package_reference/sentence_transformer/losses.html#cosinesimilarityloss) with these parameters:
+   ```json
+   {
+       "loss_fct": "torch.nn.modules.loss.MSELoss"
+   }
+   ```
+
+ ### Training Hyperparameters
+ #### Non-Default Hyperparameters
+
+ - `overwrite_output_dir`: True
+ - `eval_strategy`: steps
+ - `per_device_train_batch_size`: 16
+ - `per_device_eval_batch_size`: 16
+ - `gradient_accumulation_steps`: 8
+ - `warmup_ratio`: 0.1
+ - `push_to_hub`: True
+ - `hub_model_id`: x2bee/sts_nli_tune_test
+ - `hub_strategy`: checkpoint
+ - `batch_sampler`: no_duplicates
+
+ #### All Hyperparameters
+ <details><summary>Click to expand</summary>
+
+ - `overwrite_output_dir`: True
+ - `do_predict`: False
+ - `eval_strategy`: steps
+ - `prediction_loss_only`: True
+ - `per_device_train_batch_size`: 16
+ - `per_device_eval_batch_size`: 16
+ - `per_gpu_train_batch_size`: None
+ - `per_gpu_eval_batch_size`: None
+ - `gradient_accumulation_steps`: 8
+ - `eval_accumulation_steps`: None
+ - `torch_empty_cache_steps`: None
+ - `learning_rate`: 5e-05
+ - `weight_decay`: 0.0
+ - `adam_beta1`: 0.9
+ - `adam_beta2`: 0.999
+ - `adam_epsilon`: 1e-08
+ - `max_grad_norm`: 1.0
+ - `num_train_epochs`: 3.0
+ - `max_steps`: -1
+ - `lr_scheduler_type`: linear
+ - `lr_scheduler_kwargs`: {}
+ - `warmup_ratio`: 0.1
+ - `warmup_steps`: 0
+ - `log_level`: passive
+ - `log_level_replica`: warning
+ - `log_on_each_node`: True
+ - `logging_nan_inf_filter`: True
+ - `save_safetensors`: True
+ - `save_on_each_node`: False
+ - `save_only_model`: False
+ - `restore_callback_states_from_checkpoint`: False
+ - `no_cuda`: False
+ - `use_cpu`: False
+ - `use_mps_device`: False
+ - `seed`: 42
+ - `data_seed`: None
+ - `jit_mode_eval`: False
+ - `use_ipex`: False
+ - `bf16`: False
+ - `fp16`: False
+ - `fp16_opt_level`: O1
+ - `half_precision_backend`: auto
+ - `bf16_full_eval`: False
+ - `fp16_full_eval`: False
+ - `tf32`: None
+ - `local_rank`: 0
+ - `ddp_backend`: None
+ - `tpu_num_cores`: None
+ - `tpu_metrics_debug`: False
+ - `debug`: []
+ - `dataloader_drop_last`: True
+ - `dataloader_num_workers`: 0
+ - `dataloader_prefetch_factor`: None
+ - `past_index`: -1
+ - `disable_tqdm`: False
+ - `remove_unused_columns`: True
+ - `label_names`: None
+ - `load_best_model_at_end`: False
+ - `ignore_data_skip`: False
+ - `fsdp`: []
+ - `fsdp_min_num_params`: 0
+ - `fsdp_config`: {'min_num_params': 0, 'xla': False, 'xla_fsdp_v2': False, 'xla_fsdp_grad_ckpt': False}
+ - `fsdp_transformer_layer_cls_to_wrap`: None
+ - `accelerator_config`: {'split_batches': False, 'dispatch_batches': None, 'even_batches': True, 'use_seedable_sampler': True, 'non_blocking': False, 'gradient_accumulation_kwargs': None}
+ - `deepspeed`: None
+ - `label_smoothing_factor`: 0.0
+ - `optim`: adamw_torch
+ - `optim_args`: None
+ - `adafactor`: False
+ - `group_by_length`: False
+ - `length_column_name`: length
+ - `ddp_find_unused_parameters`: None
+ - `ddp_bucket_cap_mb`: None
+ - `ddp_broadcast_buffers`: False
+ - `dataloader_pin_memory`: True
+ - `dataloader_persistent_workers`: False
+ - `skip_memory_metrics`: True
+ - `use_legacy_prediction_loop`: False
+ - `push_to_hub`: True
+ - `resume_from_checkpoint`: None
+ - `hub_model_id`: x2bee/sts_nli_tune_test
+ - `hub_strategy`: checkpoint
+ - `hub_private_repo`: None
+ - `hub_always_push`: False
+ - `gradient_checkpointing`: False
+ - `gradient_checkpointing_kwargs`: None
+ - `include_inputs_for_metrics`: False
+ - `include_for_metrics`: []
+ - `eval_do_concat_batches`: True
+ - `fp16_backend`: auto
+ - `push_to_hub_model_id`: None
+ - `push_to_hub_organization`: None
+ - `mp_parameters`:
+ - `auto_find_batch_size`: False
+ - `full_determinism`: False
+ - `torchdynamo`: None
+ - `ray_scope`: last
+ - `ddp_timeout`: 1800
+ - `torch_compile`: False
+ - `torch_compile_backend`: None
+ - `torch_compile_mode`: None
+ - `dispatch_batches`: None
+ - `split_batches`: None
+ - `include_tokens_per_second`: False
+ - `include_num_input_tokens_seen`: False
+ - `neftune_noise_alpha`: None
+ - `optim_target_modules`: None
+ - `batch_eval_metrics`: False
+ - `eval_on_start`: False
+ - `use_liger_kernel`: False
+ - `eval_use_gather_object`: False
+ - `average_tokens_across_devices`: False
+ - `prompts`: None
+ - `batch_sampler`: no_duplicates
+ - `multi_dataset_batch_sampler`: proportional
+
+ </details>
+
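+ For reference, the non-default values above map onto a Sentence Transformers 3.x training setup roughly like the following. This is a hedged sketch, not the original training script: the module stack is rebuilt from the Full Model Architecture section, and the dataset columns are taken from the Training Dataset section.
+
+ ```python
+ import torch
+ from datasets import load_dataset
+ from sentence_transformers import (
+     SentenceTransformer,
+     SentenceTransformerTrainer,
+     SentenceTransformerTrainingArguments,
+     models,
+ )
+ from sentence_transformers.losses import CosineSimilarityLoss
+ from sentence_transformers.training_args import BatchSamplers
+
+ # Rebuild the Transformer -> mean-sqrt-len Pooling -> Dense stack shown above
+ word = models.Transformer("x2bee/KoModernBERT-base-mlm-v03-ckp00", max_seq_length=512)
+ pool = models.Pooling(
+     word.get_word_embedding_dimension(),
+     pooling_mode_mean_tokens=False,
+     pooling_mode_mean_sqrt_len_tokens=True,
+ )
+ dense = models.Dense(in_features=768, out_features=768, activation_function=torch.nn.Tanh())
+ model = SentenceTransformer(modules=[word, pool, dense])
+
+ train_ds = load_dataset("x2bee/Korean_NLI_dataset", split="train")  # sentence1, sentence2, score
+
+ args = SentenceTransformerTrainingArguments(
+     output_dir="sts_nli_tune_test",
+     overwrite_output_dir=True,
+     num_train_epochs=3,
+     eval_strategy="steps",
+     per_device_train_batch_size=16,
+     per_device_eval_batch_size=16,
+     gradient_accumulation_steps=8,
+     warmup_ratio=0.1,
+     batch_sampler=BatchSamplers.NO_DUPLICATES,
+ )
+
+ trainer = SentenceTransformerTrainer(
+     model=model,
+     args=args,
+     train_dataset=train_ds,
+     loss=CosineSimilarityLoss(model),
+ )
+ trainer.train()
+ ```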
+ ### Training Logs
+ | Epoch  | Step | Training Loss | Validation Loss | sts_dev_spearman_max |
+ |:------:|:----:|:-------------:|:---------------:|:--------------------:|
+ | 0.0326 | 25   | 0.3733        | -               | -                    |
+ | 0.0652 | 50   | 0.362         | -               | -                    |
+ | 0.0978 | 75   | 0.3543        | -               | -                    |
+ | 0.1304 | 100  | 0.3431        | -               | -                    |
+ | 0.1630 | 125  | 0.3273        | -               | -                    |
+ | 0.1956 | 150  | 0.2745        | -               | -                    |
+ | 0.2282 | 175  | 0.2061        | -               | -                    |
+ | 0.2608 | 200  | 0.1814        | -               | -                    |
+ | 0.2934 | 225  | 0.1658        | -               | -                    |
+ | 0.3260 | 250  | 0.1637        | -               | -                    |
+ | 0.3586 | 275  | 0.1542        | -               | -                    |
+ | 0.3912 | 300  | 0.147         | -               | -                    |
+ | 0.4238 | 325  | 0.1392        | -               | -                    |
+ | 0.4564 | 350  | 0.1329        | -               | -                    |
+ | 0.4890 | 375  | 0.131         | -               | -                    |
+ | 0.5216 | 400  | 0.1294        | -               | -                    |
+ | 0.5542 | 425  | 0.1245        | -               | -                    |
+ | 0.5868 | 450  | 0.1243        | -               | -                    |
+ | 0.6194 | 475  | 0.1237        | -               | -                    |
+ | 0.6520 | 500  | 0.1236        | 0.0956          | 0.5284               |
+ | 0.6846 | 525  | 0.1183        | -               | -                    |
+ | 0.7172 | 550  | 0.1166        | -               | -                    |
+ | 0.7498 | 575  | 0.1176        | -               | -                    |
+ | 0.7824 | 600  | 0.1144        | -               | -                    |
+ | 0.8150 | 625  | 0.1141        | -               | -                    |
+ | 0.8476 | 650  | 0.1093        | -               | -                    |
+ | 0.8802 | 675  | 0.1081        | -               | -                    |
+ | 0.9128 | 700  | 0.1082        | -               | -                    |
+ | 0.9454 | 725  | 0.1078        | -               | -                    |
+ | 0.9780 | 750  | 0.1039        | -               | -                    |
+ | 1.0117 | 775  | 0.1106        | -               | -                    |
+ | 1.0443 | 800  | 0.1113        | -               | -                    |
+ | 1.0769 | 825  | 0.1113        | -               | -                    |
+ | 1.1095 | 850  | 0.1103        | -               | -                    |
+ | 1.1421 | 875  | 0.1098        | -               | -                    |
+ | 1.1747 | 900  | 0.1118        | -               | -                    |
+ | 1.2073 | 925  | 0.1085        | -               | -                    |
+ | 1.2399 | 950  | 0.1057        | -               | -                    |
+ | 1.2725 | 975  | 0.1081        | -               | -                    |
+ | 1.3051 | 1000 | 0.1052        | 0.0930          | 0.5830               |
+ | 1.3377 | 1025 | 0.1087        | -               | -                    |
+ | 1.3703 | 1050 | 0.1046        | -               | -                    |
+ | 1.4029 | 1075 | 0.1032        | -               | -                    |
+ | 1.4355 | 1100 | 0.1037        | -               | -                    |
+ | 1.4681 | 1125 | 0.1026        | -               | -                    |
+ | 1.5007 | 1150 | 0.1036        | -               | -                    |
+ | 1.5333 | 1175 | 0.102         | -               | -                    |
+ | 1.5659 | 1200 | 0.101         | -               | -                    |
+ | 1.5985 | 1225 | 0.1014        | -               | -                    |
+ | 1.6311 | 1250 | 0.1024        | -               | -                    |
+ | 1.6637 | 1275 | 0.1005        | -               | -                    |
+ | 1.6963 | 1300 | 0.0993        | -               | -                    |
+ | 1.7289 | 1325 | 0.0982        | -               | -                    |
+ | 1.7615 | 1350 | 0.0988        | -               | -                    |
+ | 1.7941 | 1375 | 0.0965        | -               | -                    |
+ | 1.8267 | 1400 | 0.0984        | -               | -                    |
+ | 1.8593 | 1425 | 0.0936        | -               | -                    |
+ | 1.8919 | 1450 | 0.0924        | -               | -                    |
+ | 1.9245 | 1475 | 0.0956        | -               | -                    |
+ | 1.9571 | 1500 | 0.0927        | 0.0732          | 0.6470               |
+ | 1.9897 | 1525 | 0.0915        | -               | -                    |
+ | 2.0235 | 1550 | 0.0991        | -               | -                    |
+ | 2.0561 | 1575 | 0.097         | -               | -                    |
+ | 2.0887 | 1600 | 0.0957        | -               | -                    |
+ | 2.1213 | 1625 | 0.0968        | -               | -                    |
+ | 2.1539 | 1650 | 0.0968        | -               | -                    |
+ | 2.1865 | 1675 | 0.0973        | -               | -                    |
+ | 2.2191 | 1700 | 0.0936        | -               | -                    |
+ | 2.2517 | 1725 | 0.0955        | -               | -                    |
+ | 2.2843 | 1750 | 0.0942        | -               | -                    |
+ | 2.3169 | 1775 | 0.0939        | -               | -                    |
+ | 2.3495 | 1800 | 0.0947        | -               | -                    |
+ | 2.3821 | 1825 | 0.0934        | -               | -                    |
+ | 2.4147 | 1850 | 0.0919        | -               | -                    |
+ | 2.4473 | 1875 | 0.0919        | -               | -                    |
+ | 2.4799 | 1900 | 0.0928        | -               | -                    |
+ | 2.5125 | 1925 | 0.0927        | -               | -                    |
+ | 2.5451 | 1950 | 0.0899        | -               | -                    |
+ | 2.5777 | 1975 | 0.0911        | -               | -                    |
+ | 2.6103 | 2000 | 0.0915        | 0.0671          | 0.6687               |
+ | 2.6429 | 2025 | 0.0905        | -               | -                    |
+ | 2.6755 | 2050 | 0.0894        | -               | -                    |
+ | 2.7081 | 2075 | 0.0887        | -               | -                    |
+ | 2.7407 | 2100 | 0.0903        | -               | -                    |
+ | 2.7733 | 2125 | 0.0887        | -               | -                    |
+ | 2.8059 | 2150 | 0.0869        | -               | -                    |
+ | 2.8385 | 2175 | 0.0871        | -               | -                    |
+ | 2.8711 | 2200 | 0.0843        | -               | -                    |
+ | 2.9037 | 2225 | 0.0838        | -               | -                    |
+ | 2.9363 | 2250 | 0.0864        | -               | -                    |
+ | 2.9689 | 2275 | 0.0831        | -               | -                    |
+
+ ### Framework Versions
+ - Python: 3.11.10
+ - Sentence Transformers: 3.3.1
+ - Transformers: 4.48.0
+ - PyTorch: 2.5.1+cu124
+ - Accelerate: 1.2.1
+ - Datasets: 3.2.0
+ - Tokenizers: 0.21.0
+
+ ## Citation
+
+ ### BibTeX
+
+ #### Sentence Transformers
+ ```bibtex
+ @inproceedings{reimers-2019-sentence-bert,
+     title = "Sentence-BERT: Sentence Embeddings using Siamese BERT-Networks",
+     author = "Reimers, Nils and Gurevych, Iryna",
+     booktitle = "Proceedings of the 2019 Conference on Empirical Methods in Natural Language Processing",
+     month = "11",
+     year = "2019",
+     publisher = "Association for Computational Linguistics",
+     url = "https://arxiv.org/abs/1908.10084",
+ }
+ ```
+
+ <!--
+ ## Glossary
+
+ *Clearly define terms in order to be accessible across audiences.*
+ -->
+
+ <!--
+ ## Model Card Authors
+
+ *Lists the people who create the model card, providing recognition and accountability for the detailed work that goes into its construction.*
+ -->
+
+ <!--
  ## Model Card Contact

+ *Provides a way for people who have updates to the Model Card, suggestions, or questions, to contact the Model Card authors.*
+ -->