Commit cb32f3b (verified) by bobox · 1 parent: 8a66a18

Training in progress, step 208, checkpoint

checkpoint-208/1_Pooling/config.json ADDED
@@ -0,0 +1,10 @@
+ {
+   "word_embedding_dimension": 768,
+   "pooling_mode_cls_token": false,
+   "pooling_mode_mean_tokens": true,
+   "pooling_mode_max_tokens": false,
+   "pooling_mode_mean_sqrt_len_tokens": false,
+   "pooling_mode_weightedmean_tokens": false,
+   "pooling_mode_lasttoken": false,
+   "include_prompt": true
+ }
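The pooling configuration above enables only `pooling_mode_mean_tokens`. Each sentence embedding is therefore the average of the transformer's token vectors over the non-padding positions.

The sketch below shows that operation in NumPy. It assumes a `(batch, seq_len, dim)` token-embedding array and a 0/1 attention mask. The function name `mean_pool` and the toy 4-dimensional vectors are illustrative (the model itself uses 768 dimensions). It is not the sentence-transformers source:

```python
import numpy as np

def mean_pool(token_embeddings: np.ndarray, attention_mask: np.ndarray) -> np.ndarray:
    """Average the token vectors of each sequence, ignoring padding positions.

    token_embeddings: (batch, seq_len, dim) transformer outputs
    attention_mask:   (batch, seq_len), 1 for real tokens and 0 for padding
    """
    mask = attention_mask[..., None].astype(token_embeddings.dtype)  # (batch, seq_len, 1)
    summed = (token_embeddings * mask).sum(axis=1)                    # (batch, dim)
    counts = np.clip(mask.sum(axis=1), 1e-9, None)                    # guard against empty rows
    return summed / counts

# Toy batch: 2 sequences, 3 positions, 4 dimensions (the real model emits 768).
tokens = np.array([
    [[1.0, 2.0, 3.0, 4.0], [3.0, 4.0, 5.0, 6.0], [9.0, 9.0, 9.0, 9.0]],
    [[2.0, 0.0, 2.0, 0.0], [7.0, 7.0, 7.0, 7.0], [7.0, 7.0, 7.0, 7.0]],
])
mask = np.array([[1, 1, 0], [1, 0, 0]])  # trailing positions are padding

pooled = mean_pool(tokens, mask)
print(pooled)  # rows: [2, 3, 4, 5] and [2, 0, 2, 0]; the padded 9s and 7s are ignored
```

Because `include_prompt` is true, any prompt tokens would also be averaged in. With no prompt, the mask alone decides which positions count.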
checkpoint-208/README.md ADDED
@@ -0,0 +1,739 @@
+ ---
+ base_model: bobox/DeBERTa-small-ST-v1-test-step3
+ datasets: []
+ language: []
+ library_name: sentence-transformers
+ metrics:
+ - pearson_cosine
+ - spearman_cosine
+ - pearson_manhattan
+ - spearman_manhattan
+ - pearson_euclidean
+ - spearman_euclidean
+ - pearson_dot
+ - spearman_dot
+ - pearson_max
+ - spearman_max
+ pipeline_tag: sentence-similarity
+ tags:
+ - sentence-transformers
+ - sentence-similarity
+ - feature-extraction
+ - generated_from_trainer
+ - dataset_size:120849
+ - loss:CachedGISTEmbedLoss
+ widget:
+ - source_sentence: '"Today I lost those who for 24 years I called...my family," said
+     Enes Kanter of the Oklahoma City Thunder.
+
+     Turkish President Recep Tayyip Erdogan blames Mr Gulen for inciting a failed coup
+     last month and is seeking the cleric''s extradition to Turkey.
+
+     Mr Gulen, who has a large following, denies being involved in the coup.
+
+     Kanter''s father, Mehmet, disowned his son in a letter published on Monday by
+     Sabah, a pro-government newspaper.
+
+     Mehmet Kanter wrote his son had been "hypnotised" by the Gulen movement.
+
+     "With a feeling of shame I apologise to our president and the Turkish people for
+     having such a son," the letter said.
+
+     Q&A on the Gulen movement
+
+     Mr Gulen is regarded by followers as a spiritual leader and sometimes described
+     as Turkey''s second most powerful man.
+
+     Enes Kanter has been a vocal supporter of Mr Gulen on Twitter.
+
+     The movement - known in Turkey as Hizmet, or service - runs schools all over Turkey
+     and around the world, including in Turkic former Soviet republics, Muslim countries
+     such as Pakistan and Western nations including Romania and the US, where it runs
+     more than 100 schools.
+
+     In May 2016, the Turkish government formally declared the Gulen movement a terrorist
+     organisation.
+
+     After the failed coup, suspected Gulen supporters in Turkey were purged in a wave
+     of arrests.
+
+     Western nations have been critical of the government''s response to the coup.
+     US officials have said they will extradite Mr Gulen only if Turkey provides evidence.'
+   sentences:
+   - 'The Thinker | Rodin Museum H. 189 cm ; W. 98 cm ; D. 140 cm S.2838 When conceived
+     in 1880 in its original size (approx. 70 cm) as the crowning element of The Gates
+     of Hell , seated on the tympanum , The Thinker was entitled The Poet. He represented
+     Dante, author of the Divine Comedy which had inspired The Gates, leaning forward
+     to observe the circles of Hell, while meditating on his work. The Thinker was
+     therefore initially both a being with a tortured body, almost a damned soul, and
+     a free-thinking man, determined to transcend his suffering through poetry. The
+     pose of this figure owes much to Carpeaux’s Ugolino (1861) and to the seated portrait
+     of Lorenzo de’ Medici carved by Michelangelo (1526-31). While remaining in place
+     on the monumental Gates of Hell, The Thinker was exhibited individually in 1888
+     and thus became an independent work. Enlarged in 1904, its colossal version proved
+     even more popular: this image of a man lost in thought, but whose powerful body
+     suggests a great capacity for action, has became one of the most celebrated sculptures
+     ever known. Numerous casts exist worldwide, including the one now in the gardens
+     of the Musée Rodin, a gift to the City of Paris installed outside the Panthéon
+     in 1906, and another in the gardens of Rodin’s house in Meudon, on the tomb of
+     the sculptor and his wife. George Bernard Shaw in the Pose of "The Thinker" Rodin,
+     the Monument to Victor Hugo and The Thinker Rodin''s "Thinker" in Dr Linde''s
+     Garden in Lübeck'
+   - An American basketball player has cut ties with his Turkish family over his support
+     for Pennsylvania-based preacher Fethullah Gulen.
+   - Police are investigating a death at a bus stop in Fife.
+ - source_sentence: Two adorable birds perched on a piece of bamboo.
+   sentences:
+   - Two birds are sitting perched on a tree limb
+   - A young boy with a spoon looking at a birthday cupcake.
+   - As part of his attempt to turn the Austrian right , Dessaix ordered a battalion
+     to move along the Aire stream near Tairier and Crache .
+ - source_sentence: how do venom snake keepers make money?
+   sentences:
+   - "The USDA regulates who can buy and sell snake venom. It is very important to\
+     \ learn about these regulations so that you can operate properly. On average,\
+     \ snake milkers make around $2,500 per month, but snake venom is an expensive\
+     \ market. One gram of certain types of snake venom can sell for $2,000.If you\
+     \ are crazy enough to capture, milk, and breed snakes, please take the precaution\
+     \ to wear protective clothing and always have antivenom close at hand.nake milkers\
+     \ have an insane job. They “milk” snakes for their venom. This\
+     \ means that every single day, a snake milker handles deadly, venomous snakes.\
+     \ It’s a hands on job where you put your fingers millimeters away from\
+     \ the sharp, fangs of asps, vipers, cobras, corals, mambas, kraits, and rattlesnakes."
+   - a greenhouse is used to protect plants by keeping them warm
+   - Nashville Mayor Megan Barry has said her 22-year-old son died of what appeared
+     to be a drug overdose, according to a family statement.
+ - source_sentence: Adult bees include workers, a queen and what other type?
+   sentences:
+   - "matter vibrating can cause sound. Thus, sound is a wave in air . \n matter vibrating\
+     \ can cause a wave in air"
+   - His references in electronic music are Todd Terry , Armand Van Helden , Roger
+     Sanchez , Tiesto and the Epic Sax Guy.
+   - 'Look at the honeybees in Figure below . Honeybees live in colonies that may consist
+     of thousands of individual bees. Generally, there are three types of adult bees
+     in a colony: workers, a queen, and drones.'
+ - source_sentence: can an object have constant non zero velocity and changing acceleration?
+   sentences:
+   - when an animal sheds its fur , its fur becomes less dense
+   - Acceleration is defined as the time derivative of the velocity; if the velocity
+     is unchanging the acceleration is zero. Velocity is a vector, speed is a scalar
+     magnitude of the vector. If the velocity vector changes direction you can have
+     constant speed (not velocity) with a non-zero acceleration.
+   - Acne treatment is individual and customized to the type of acne you have. On average,
+     mild acne responds in 1-2 months, moderate acne responds in 2-4 months and severe
+     acne can take 4-6 months to clear, granted that the most effective measures can
+     be used.
+ model-index:
+ - name: SentenceTransformer based on bobox/DeBERTa-small-ST-v1-test-step3
+   results:
+   - task:
+       type: semantic-similarity
+       name: Semantic Similarity
+     dataset:
+       name: sts test
+       type: sts-test
+     metrics:
+     - type: pearson_cosine
+       value: 0.8742827064396023
+       name: Pearson Cosine
+     - type: spearman_cosine
+       value: 0.9022449488282648
+       name: Spearman Cosine
+     - type: pearson_manhattan
+       value: 0.9078074095863298
+       name: Pearson Manhattan
+     - type: spearman_manhattan
+       value: 0.9048224424793636
+       name: Spearman Manhattan
+     - type: pearson_euclidean
+       value: 0.9072144921384091
+       name: Pearson Euclidean
+     - type: spearman_euclidean
+       value: 0.9046033403035915
+       name: Spearman Euclidean
+     - type: pearson_dot
+       value: 0.8524658058205945
+       name: Pearson Dot
+     - type: spearman_dot
+       value: 0.8547093534556153
+       name: Spearman Dot
+     - type: pearson_max
+       value: 0.9078074095863298
+       name: Pearson Max
+     - type: spearman_max
+       value: 0.9048224424793636
+       name: Spearman Max
+ ---
+
+ # SentenceTransformer based on bobox/DeBERTa-small-ST-v1-test-step3
+
+ This is a [sentence-transformers](https://www.SBERT.net) model finetuned from [bobox/DeBERTa-small-ST-v1-test-step3](https://huggingface.co/bobox/DeBERTa-small-ST-v1-test-step3) on the bobox/enhanced_nli-50_k dataset. It maps sentences & paragraphs to a 768-dimensional dense vector space and can be used for semantic textual similarity, semantic search, paraphrase mining, text classification, clustering, and more.
+
+ ## Model Details
+
+ ### Model Description
+ - **Model Type:** Sentence Transformer
+ - **Base model:** [bobox/DeBERTa-small-ST-v1-test-step3](https://huggingface.co/bobox/DeBERTa-small-ST-v1-test-step3) <!-- at revision df9aaa75fe0c2791e5ed35ff33de1689d9a5f5ff -->
+ - **Maximum Sequence Length:** 512 tokens
+ - **Output Dimensionality:** 768 dimensions
+ - **Similarity Function:** Cosine Similarity
+ - **Training Dataset:**
+     - bobox/enhanced_nli-50_k
+ <!-- - **Language:** Unknown -->
+ <!-- - **License:** Unknown -->
+
+ ### Model Sources
+
+ - **Documentation:** [Sentence Transformers Documentation](https://sbert.net)
+ - **Repository:** [Sentence Transformers on GitHub](https://github.com/UKPLab/sentence-transformers)
+ - **Hugging Face:** [Sentence Transformers on Hugging Face](https://huggingface.co/models?library=sentence-transformers)
+
+ ### Full Model Architecture
+
+ ```
+ SentenceTransformer(
+   (0): Transformer({'max_seq_length': 512, 'do_lower_case': False}) with Transformer model: DebertaV2Model
+   (1): Pooling({'word_embedding_dimension': 768, 'pooling_mode_cls_token': False, 'pooling_mode_mean_tokens': True, 'pooling_mode_max_tokens': False, 'pooling_mode_mean_sqrt_len_tokens': False, 'pooling_mode_weightedmean_tokens': False, 'pooling_mode_lasttoken': False, 'include_prompt': True})
+ )
+ ```
+
+ ## Usage
+
+ ### Direct Usage (Sentence Transformers)
+
+ First install the Sentence Transformers library:
+
+ ```bash
+ pip install -U sentence-transformers
+ ```
+
+ Then you can load this model and run inference.
+ ```python
+ from sentence_transformers import SentenceTransformer
+
+ # Download from the 🤗 Hub
+ model = SentenceTransformer("bobox/DeBERTa-small-ST-v1-test-UnifiedDatasets-checkpoints-tmp")
+ # Run inference
+ sentences = [
+     'can an object have constant non zero velocity and changing acceleration?',
+     'Acceleration is defined as the time derivative of the velocity; if the velocity is unchanging the acceleration is zero. Velocity is a vector, speed is a scalar magnitude of the vector. If the velocity vector changes direction you can have constant speed (not velocity) with a non-zero acceleration.',
+     'Acne treatment is individual and customized to the type of acne you have. On average, mild acne responds in 1-2 months, moderate acne responds in 2-4 months and severe acne can take 4-6 months to clear, granted that the most effective measures can be used.',
+ ]
+ embeddings = model.encode(sentences)
+ print(embeddings.shape)
+ # (3, 768)
+
+ # Get the similarity scores for the embeddings
+ similarities = model.similarity(embeddings, embeddings)
+ print(similarities.shape)
+ # torch.Size([3, 3])
+ ```
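The card lists cosine similarity as the similarity function. So `model.similarity(embeddings, embeddings)` should return a symmetric 3×3 matrix of pairwise cosine scores, with ones on the diagonal.

The sketch below reproduces that computation in NumPy for intuition. It assumes row-vector embeddings. `cosine_similarity_matrix` is a hypothetical helper, and the 2-D toy vectors stand in for the real `(3, 768)` output of `model.encode`:

```python
import numpy as np

def cosine_similarity_matrix(a: np.ndarray, b: np.ndarray) -> np.ndarray:
    """Pairwise cosine similarity between the rows of a (n, d) and b (m, d)."""
    a_unit = a / np.linalg.norm(a, axis=1, keepdims=True)
    b_unit = b / np.linalg.norm(b, axis=1, keepdims=True)
    return a_unit @ b_unit.T  # (n, m)

# Toy 2-D vectors standing in for the (3, 768) array returned by model.encode
embeddings = np.array([[1.0, 0.0], [1.0, 1.0], [0.0, 2.0]])
similarities = cosine_similarity_matrix(embeddings, embeddings)
print(similarities.round(3))

# Rank the other rows by similarity to row 0, most similar first (semantic-search style)
print(np.argsort(-similarities[0])[1:])
```

Normalizing first means vector length has no effect on the score; only direction matters. This is why the dot-product metrics in the evaluation can differ from the cosine ones.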
+
+ <!--
+ ### Direct Usage (Transformers)
+
+ <details><summary>Click to see the direct usage in Transformers</summary>
+
+ </details>
+ -->
+
+ <!--
+ ### Downstream Usage (Sentence Transformers)
+
+ You can finetune this model on your own dataset.
+
+ <details><summary>Click to expand</summary>
+
+ </details>
+ -->
+
+ <!--
+ ### Out-of-Scope Use
+
+ *List how the model may foreseeably be misused and address what users ought not to do with the model.*
+ -->
+
+ ## Evaluation
+
+ ### Metrics
+
+ #### Semantic Similarity
+ * Dataset: `sts-test`
+ * Evaluated with [<code>EmbeddingSimilarityEvaluator</code>](https://sbert.net/docs/package_reference/sentence_transformer/evaluation.html#sentence_transformers.evaluation.EmbeddingSimilarityEvaluator)
+
+ | Metric | Value |
+ |:--------------------|:-----------|
+ | pearson_cosine | 0.8743 |
+ | **spearman_cosine** | **0.9022** |
+ | pearson_manhattan | 0.9078 |
+ | spearman_manhattan | 0.9048 |
+ | pearson_euclidean | 0.9072 |
+ | spearman_euclidean | 0.9046 |
+ | pearson_dot | 0.8525 |
+ | spearman_dot | 0.8547 |
+ | pearson_max | 0.9078 |
+ | spearman_max | 0.9048 |
+
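Each row of the table correlates one pairwise score with the gold `sts-test` labels. The scores are cosine similarity, Manhattan distance, Euclidean distance and dot product. The `*_max` rows keep the best value per correlation type; here Manhattan supplies both maxima (0.9078 and 0.9048).

The sketch below shows that bookkeeping in plain NumPy. It assumes paired embedding arrays and a gold-score vector, and it assumes distances enter as negated values so that a higher score always means "more similar". Spearman is computed as the Pearson correlation of tie-averaged ranks. `sts_correlations`, `average_ranks` and the synthetic data are illustrative, not `EmbeddingSimilarityEvaluator` itself:

```python
import numpy as np

def average_ranks(x: np.ndarray) -> np.ndarray:
    """Rank values 0..n-1, giving tied values the mean of their ranks."""
    order = np.argsort(x, kind="mergesort")
    ranks = np.empty(len(x), dtype=float)
    ranks[order] = np.arange(len(x))
    _, inverse, counts = np.unique(x, return_inverse=True, return_counts=True)
    inverse = inverse.ravel()
    return (np.bincount(inverse, weights=ranks) / counts)[inverse]

def pearson(a, b) -> float:
    return float(np.corrcoef(a, b)[0, 1])

def spearman(a, b) -> float:
    # Spearman correlation = Pearson correlation of the (tie-averaged) ranks
    return pearson(average_ranks(np.asarray(a)), average_ranks(np.asarray(b)))

def sts_correlations(emb1, emb2, gold):
    """Correlate four pairwise similarity scores with gold similarity labels."""
    n1 = emb1 / np.linalg.norm(emb1, axis=1, keepdims=True)
    n2 = emb2 / np.linalg.norm(emb2, axis=1, keepdims=True)
    scores = {
        "cosine": np.sum(n1 * n2, axis=1),
        # distances are negated so that a higher score always means "more similar"
        "manhattan": -np.abs(emb1 - emb2).sum(axis=1),
        "euclidean": -np.linalg.norm(emb1 - emb2, axis=1),
        "dot": np.sum(emb1 * emb2, axis=1),
    }
    results = {}
    for name, s in scores.items():
        results[f"pearson_{name}"] = pearson(gold, s)
        results[f"spearman_{name}"] = spearman(gold, s)
    # the *_max entries keep the best value per correlation type
    for kind in ("pearson", "spearman"):
        results[f"{kind}_max"] = max(v for k, v in results.items() if k.startswith(kind + "_"))
    return results

# Synthetic pairs that get noisier (less similar) down the list, with hypothetical 0-5 gold labels
rng = np.random.default_rng(0)
noise = np.linspace(0.1, 2.0, 50)
emb1 = rng.normal(size=(50, 16))
emb2 = emb1 + rng.normal(size=(50, 16)) * noise[:, None]
gold = np.round(5.0 - 2.5 * noise, 1)
res = sts_correlations(emb1, emb2, gold)
for key in ("pearson_cosine", "spearman_cosine", "pearson_max", "spearman_max"):
    print(key, round(res[key], 4))
```

Because Spearman only looks at ranks, it rewards getting the ordering of pairs right rather than the exact score scale. This is presumably why the card bolds `spearman_cosine` as the headline metric.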
277
+ <!--
278
+ ## Bias, Risks and Limitations
279
+
280
+ *What are the known or foreseeable issues stemming from this model? You could also flag here known failure cases or weaknesses of the model.*
281
+ -->
282
+
283
+ <!--
284
+ ### Recommendations
285
+
286
+ *What are recommendations with respect to the foreseeable issues? For example, filtering explicit content.*
287
+ -->
288
+
289
+ ## Training Details
290
+
291
+ ### Training Dataset
292
+
293
+ #### bobox/enhanced_nli-50_k
294
+
295
+ * Dataset: bobox/enhanced_nli-50_k
296
+ * Size: 120,849 training samples
297
+ * Columns: <code>sentence1</code> and <code>sentence2</code>
298
+ * Approximate statistics based on the first 1000 samples:
299
+ | | sentence1 | sentence2 |
300
+ |:--------|:-----------------------------------------------------------------------------------|:-----------------------------------------------------------------------------------|
301
+ | type | string | string |
302
+ | details | <ul><li>min: 4 tokens</li><li>mean: 32.01 tokens</li><li>max: 336 tokens</li></ul> | <ul><li>min: 2 tokens</li><li>mean: 60.45 tokens</li><li>max: 512 tokens</li></ul> |
303
+ * Samples:
304
+ | sentence1 | sentence2 |
305
+ |:---------------------------------------------------------------------------------|:-----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------|
306
+ | <code>A lady working in a kitchen with several different types of dishes.</code> | <code>A woman is cooking and cleaning in her kitchen.</code> |
307
+ | <code>can you renew your licence online sa?</code> | <code>You can renew your licence online for as long as your photo is valid. Renew your driver's licence online with a mySA GOV account. With a mySA GOV account, you can access a legally compliant digital licence through the mySA GOV app.</code> |
308
+ | <code>how can coconut oil lower cholesterol</code> | <code>It has been shown that lauric acid increases the good HDL cholesterol in the blood to help improve cholesterol ratio levels. Coconut oil lowers cholesterol by promoting its conversion to pregnenolone, a molecule that is a precursor to many of the hormones our bodies need. Coconut can help restore normal thyroid function. When the thyroid does not function optimally, it can contribute to higher levels of bad cholesterol.</code> |
309
+ * Loss: [<code>CachedGISTEmbedLoss</code>](https://sbert.net/docs/package_reference/sentence_transformer/losses.html#cachedgistembedloss) with these parameters:
310
+ ```json
311
+ {'guide': SentenceTransformer(
312
+ (0): Transformer({'max_seq_length': 512, 'do_lower_case': False}) with Transformer model: BertModel
313
+ (1): Pooling({'word_embedding_dimension': 768, 'pooling_mode_cls_token': True, 'pooling_mode_mean_tokens': False, 'pooling_mode_max_tokens': False, 'pooling_mode_mean_sqrt_len_tokens': False, 'pooling_mode_weightedmean_tokens': False, 'pooling_mode_lasttoken': False, 'include_prompt': True})
314
+ (2): Normalize()
315
+ ), 'temperature': 0.025}
316
+ ```
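`CachedGISTEmbedLoss` combines GISTEmbed-style guided negative filtering with gradient caching (GradCache). The caching is intended to let large in-batch-negative batches such as this one fit in memory.

Per the parameters above, the guide is a BERT encoder with CLS pooling and normalization. It scores every in-batch candidate. Candidates that the guide rates as closer to the anchor than the true positive are treated as probable false negatives and excluded before the temperature-scaled (0.025) cross-entropy.

The sketch below covers only the filtering and the loss, in NumPy. It omits the caching, and it omits the extra anchor-anchor and positive-positive candidates that the library version is understood to score as well. `gist_loss` and the toy guide vectors are hypothetical:

```python
import numpy as np

def l2_normalize(x: np.ndarray) -> np.ndarray:
    return x / np.linalg.norm(x, axis=1, keepdims=True)

def gist_loss(student_a, student_p, guide_a, guide_p, temperature=0.025):
    """Simplified GISTEmbed-style loss with guide-filtered in-batch negatives.

    Row i of the anchor and positive arrays is a true pair; every other
    positive in the batch serves as a candidate negative for anchor i.
    """
    s_sim = l2_normalize(student_a) @ l2_normalize(student_p).T  # (B, B) student cosine scores
    g_sim = l2_normalize(guide_a) @ l2_normalize(guide_p).T      # (B, B) guide cosine scores
    g_pos = np.diag(g_sim)[:, None]                               # guide score of each true pair
    # Candidates the guide ranks above the true positive are likely false negatives: mask them out.
    logits = np.where(g_sim > g_pos, -np.inf, s_sim) / temperature
    # Softmax cross-entropy with the diagonal (the true positive) as the target class.
    logits = logits - logits.max(axis=1, keepdims=True)
    log_probs = logits - np.log(np.exp(logits).sum(axis=1, keepdims=True))
    return float(-np.mean(np.diag(log_probs)))

rng = np.random.default_rng(0)
anchors = rng.normal(size=(4, 8))
anchors[1] = anchors[0] + 0.05 * rng.normal(size=8)   # anchor 1 nearly duplicates anchor 0
positives = anchors + 0.1 * rng.normal(size=(4, 8))

# Orthogonal guide vectors: no candidate outranks a true pair, so nothing is masked.
eye = np.eye(4)
plain = gist_loss(anchors, positives, eye, eye)

# A toy guide that recognizes the duplicate: positive 1 outranks positive 0 for anchor 0 (and vice versa).
guide_a = np.array([[1.0, 0.0], [0.0, 1.0], [-1.0, 0.0], [0.0, -1.0]])
guide_p = np.array([[1.0, 1.0], [1.0, 0.0], [-1.0, 0.0], [0.0, -1.0]])
filtered = gist_loss(anchors, positives, guide_a, guide_p)
print(plain, filtered)  # masking the near-duplicate removes a large term, so filtered < plain
```

The very low temperature (0.025 turns a cosine gap of 0.1 into a logit gap of 4) makes the softmax sharp. This is why an unfiltered near-duplicate would otherwise be punished heavily.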
317
+
318
+ ### Evaluation Dataset
319
+
320
+ #### bobox/enhanced_nli-50_k
321
+
322
+ * Dataset: bobox/enhanced_nli-50_k
323
+ * Size: 3,052 evaluation samples
324
+ * Columns: <code>sentence1</code> and <code>sentence2</code>
325
+ * Approximate statistics based on the first 1000 samples:
326
+ | | sentence1 | sentence2 |
327
+ |:--------|:-----------------------------------------------------------------------------------|:----------------------------------------------------------------------------------|
328
+ | type | string | string |
329
+ | details | <ul><li>min: 4 tokens</li><li>mean: 32.91 tokens</li><li>max: 342 tokens</li></ul> | <ul><li>min: 2 tokens</li><li>mean: 60.3 tokens</li><li>max: 408 tokens</li></ul> |
330
+ * Samples:
331
+ | sentence1 | sentence2 |
332
+ |:----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------|:---------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------|
333
+ | <code>The body was found in the River Avon in Bath, Avon and Somerset Police said.<br>Officers said although formal identification had not yet taken place, Henry Burke's family had been told.<br>Earlier officers said they were looking for Mr Burke, who was last seen leaving a nightclub in George Street late on Thursday.<br>A force spokesman said the death was being treated as unexplained and inquiries were continuing.<br>Mr Burke's girlfriend, Em Comley, earlier said he had been texting her "throughout the night" but then the messages suddenly stopped just after midnight.</code> | <code>A man's body has been found in a river after search and rescue teams were called in to try and find a missing 19-year-old student.</code> |
334
+ | <code>what happens when the president of united states is impeached?</code> | <code>Parliament votes on the proposal by secret ballot, and if two thirds of all representatives agree, the president is impeached. Once impeached, the president's powers are suspended, and the Constitutional Court decides whether or not the President should be removed from office.</code> |
335
+ | <code>What can feed at more than one trophic level?</code> | <code>Many consumers feed at more than one trophic level.. Nuts are also consumed by deer, turkey, foxes, wood ducks and squirrels. <br> wood ducks can feed at more than one trophic level</code> |
336
+ * Loss: [<code>CachedGISTEmbedLoss</code>](https://sbert.net/docs/package_reference/sentence_transformer/losses.html#cachedgistembedloss) with these parameters:
337
+ ```json
338
+ {'guide': SentenceTransformer(
339
+ (0): Transformer({'max_seq_length': 512, 'do_lower_case': False}) with Transformer model: BertModel
340
+ (1): Pooling({'word_embedding_dimension': 768, 'pooling_mode_cls_token': True, 'pooling_mode_mean_tokens': False, 'pooling_mode_max_tokens': False, 'pooling_mode_mean_sqrt_len_tokens': False, 'pooling_mode_weightedmean_tokens': False, 'pooling_mode_lasttoken': False, 'include_prompt': True})
341
+ (2): Normalize()
342
+ ), 'temperature': 0.025}
343
+ ```
344
+
345
+ ### Training Hyperparameters
346
+ #### Non-Default Hyperparameters
347
+
348
+ - `eval_strategy`: steps
349
+ - `per_device_train_batch_size`: 960
350
+ - `per_device_eval_batch_size`: 128
351
+ - `learning_rate`: 3.5e-05
352
+ - `weight_decay`: 0.0001
353
+ - `num_train_epochs`: 2
354
+ - `lr_scheduler_type`: cosine_with_min_lr
355
+ - `lr_scheduler_kwargs`: {'num_cycles': 0.5, 'min_lr': 5.833333333333333e-06}
356
+ - `warmup_ratio`: 0.25
357
+ - `save_safetensors`: False
358
+ - `fp16`: True
359
+ - `push_to_hub`: True
360
+ - `hub_model_id`: bobox/DeBERTa-small-ST-v1-test-UnifiedDatasets-checkpoints-tmp
361
+ - `hub_strategy`: all_checkpoints
362
+ - `batch_sampler`: no_duplicates
363
+
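These settings determine the schedule arithmetic:

- With 120,849 training samples, a per-device batch of 960 and `dataloader_drop_last` False, one epoch is ceil(120,849 / 960) = 126 steps. This agrees with the training log, where step 126 is epoch 1.0.
- Two epochs give 252 steps, and `warmup_ratio` 0.25 gives 63 warmup steps.
- `min_lr` is exactly one sixth of `learning_rate`.

The sketch below assumes a single device and no gradient accumulation, which is consistent with the 126 steps per epoch in the log. It re-derives a linear-warmup plus cosine-to-floor curve in the shape assumed for the `cosine_with_min_lr` scheduler. The helper names are hypothetical, and this is not the transformers implementation:

```python
import math

def schedule_lengths(num_samples: int, batch_size: int, epochs: int, warmup_ratio: float):
    """Optimizer steps per epoch, total steps and warmup steps (single device, no accumulation)."""
    steps_per_epoch = math.ceil(num_samples / batch_size)  # dataloader_drop_last=False keeps the partial last batch
    total_steps = steps_per_epoch * epochs
    warmup_steps = math.ceil(total_steps * warmup_ratio)
    return steps_per_epoch, total_steps, warmup_steps

def lr_at(step: int, base_lr: float, min_lr: float, warmup_steps: int, total_steps: int,
          num_cycles: float = 0.5) -> float:
    """Linear warmup to base_lr, then a half-cosine decay that bottoms out at min_lr."""
    if step < warmup_steps:
        return base_lr * step / max(1, warmup_steps)
    progress = (step - warmup_steps) / max(1, total_steps - warmup_steps)
    cosine = 0.5 * (1.0 + math.cos(math.pi * num_cycles * 2.0 * progress))
    min_rate = min_lr / base_lr  # 5.8333e-06 / 3.5e-05 = 1/6
    return base_lr * (cosine * (1.0 - min_rate) + min_rate)

BASE_LR, MIN_LR = 3.5e-05, 5.833333333333333e-06
spe, total, warmup = schedule_lengths(120_849, 960, 2, 0.25)
print(spe, total, warmup)  # 126 252 63
for step in (0, 31, warmup, 126, 208, total):  # 208 is this checkpoint's step
    print(step, f"{lr_at(step, BASE_LR, MIN_LR, warmup, total):.3e}")
```

With a quarter of all steps spent in warmup, the peak learning rate is reached only at step 63, halfway through the first epoch. By step 208 the rate is already well down the cosine toward the floor.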
+ #### All Hyperparameters
+ <details><summary>Click to expand</summary>
+
+ - `overwrite_output_dir`: False
+ - `do_predict`: False
+ - `eval_strategy`: steps
+ - `prediction_loss_only`: True
+ - `per_device_train_batch_size`: 960
+ - `per_device_eval_batch_size`: 128
+ - `per_gpu_train_batch_size`: None
+ - `per_gpu_eval_batch_size`: None
+ - `gradient_accumulation_steps`: 1
+ - `eval_accumulation_steps`: None
+ - `torch_empty_cache_steps`: None
+ - `learning_rate`: 3.5e-05
+ - `weight_decay`: 0.0001
+ - `adam_beta1`: 0.9
+ - `adam_beta2`: 0.999
+ - `adam_epsilon`: 1e-08
+ - `max_grad_norm`: 1.0
+ - `num_train_epochs`: 2
+ - `max_steps`: -1
+ - `lr_scheduler_type`: cosine_with_min_lr
+ - `lr_scheduler_kwargs`: {'num_cycles': 0.5, 'min_lr': 5.833333333333333e-06}
+ - `warmup_ratio`: 0.25
+ - `warmup_steps`: 0
+ - `log_level`: passive
+ - `log_level_replica`: warning
+ - `log_on_each_node`: True
+ - `logging_nan_inf_filter`: True
+ - `save_safetensors`: False
+ - `save_on_each_node`: False
+ - `save_only_model`: False
+ - `restore_callback_states_from_checkpoint`: False
+ - `no_cuda`: False
+ - `use_cpu`: False
+ - `use_mps_device`: False
+ - `seed`: 42
+ - `data_seed`: None
+ - `jit_mode_eval`: False
+ - `use_ipex`: False
+ - `bf16`: False
+ - `fp16`: True
+ - `fp16_opt_level`: O1
+ - `half_precision_backend`: auto
+ - `bf16_full_eval`: False
+ - `fp16_full_eval`: False
+ - `tf32`: None
+ - `local_rank`: 0
+ - `ddp_backend`: None
+ - `tpu_num_cores`: None
+ - `tpu_metrics_debug`: False
+ - `debug`: []
+ - `dataloader_drop_last`: False
+ - `dataloader_num_workers`: 0
+ - `dataloader_prefetch_factor`: None
+ - `past_index`: -1
+ - `disable_tqdm`: False
+ - `remove_unused_columns`: True
+ - `label_names`: None
+ - `load_best_model_at_end`: False
+ - `ignore_data_skip`: False
+ - `fsdp`: []
+ - `fsdp_min_num_params`: 0
+ - `fsdp_config`: {'min_num_params': 0, 'xla': False, 'xla_fsdp_v2': False, 'xla_fsdp_grad_ckpt': False}
+ - `fsdp_transformer_layer_cls_to_wrap`: None
+ - `accelerator_config`: {'split_batches': False, 'dispatch_batches': None, 'even_batches': True, 'use_seedable_sampler': True, 'non_blocking': False, 'gradient_accumulation_kwargs': None}
+ - `deepspeed`: None
+ - `label_smoothing_factor`: 0.0
+ - `optim`: adamw_torch
+ - `optim_args`: None
+ - `adafactor`: False
+ - `group_by_length`: False
+ - `length_column_name`: length
+ - `ddp_find_unused_parameters`: None
+ - `ddp_bucket_cap_mb`: None
+ - `ddp_broadcast_buffers`: False
+ - `dataloader_pin_memory`: True
+ - `dataloader_persistent_workers`: False
+ - `skip_memory_metrics`: True
+ - `use_legacy_prediction_loop`: False
+ - `push_to_hub`: True
+ - `resume_from_checkpoint`: None
+ - `hub_model_id`: bobox/DeBERTa-small-ST-v1-test-UnifiedDatasets-checkpoints-tmp
+ - `hub_strategy`: all_checkpoints
+ - `hub_private_repo`: False
+ - `hub_always_push`: False
+ - `gradient_checkpointing`: False
+ - `gradient_checkpointing_kwargs`: None
+ - `include_inputs_for_metrics`: False
+ - `eval_do_concat_batches`: True
+ - `fp16_backend`: auto
+ - `push_to_hub_model_id`: None
+ - `push_to_hub_organization`: None
+ - `mp_parameters`: 
+ - `auto_find_batch_size`: False
+ - `full_determinism`: False
+ - `torchdynamo`: None
+ - `ray_scope`: last
+ - `ddp_timeout`: 1800
+ - `torch_compile`: False
+ - `torch_compile_backend`: None
+ - `torch_compile_mode`: None
+ - `dispatch_batches`: None
+ - `split_batches`: None
+ - `include_tokens_per_second`: False
+ - `include_num_input_tokens_seen`: False
+ - `neftune_noise_alpha`: None
+ - `optim_target_modules`: None
+ - `batch_eval_metrics`: False
+ - `eval_on_start`: False
+ - `eval_use_gather_object`: False
+ - `batch_sampler`: no_duplicates
+ - `multi_dataset_batch_sampler`: proportional
+
+ </details>
+
+ ### Training Logs
+ <details><summary>Click to expand</summary>
+
+ | Epoch | Step | Training Loss | Validation Loss | sts-test_spearman_cosine |
+ |:------:|:----:|:-------------:|:------:|:------------------------:|
+ | 0.0079 | 1 | 0.404 | - | - |
+ | 0.0159 | 2 | 0.3185 | - | - |
+ | 0.0238 | 3 | 0.2821 | - | - |
+ | 0.0317 | 4 | 0.4036 | - | - |
+ | 0.0397 | 5 | 0.3442 | 0.1253 | 0.9078 |
+ | 0.0476 | 6 | 0.4145 | - | - |
+ | 0.0556 | 7 | 0.4224 | - | - |
+ | 0.0635 | 8 | 0.4048 | - | - |
+ | 0.0714 | 9 | 0.3899 | - | - |
+ | 0.0794 | 10 | 0.4127 | 0.1237 | 0.9079 |
+ | 0.0873 | 11 | 0.3496 | - | - |
+ | 0.0952 | 12 | 0.3731 | - | - |
+ | 0.1032 | 13 | 0.3929 | - | - |
+ | 0.1111 | 14 | 0.2957 | - | - |
+ | 0.1190 | 15 | 0.3324 | 0.1206 | 0.9083 |
+ | 0.1270 | 16 | 0.3341 | - | - |
+ | 0.1349 | 17 | 0.3466 | - | - |
+ | 0.1429 | 18 | 0.3558 | - | - |
+ | 0.1508 | 19 | 0.2634 | - | - |
+ | 0.1587 | 20 | 0.3095 | 0.1156 | 0.9088 |
+ | 0.1667 | 21 | 0.2973 | - | - |
+ | 0.1746 | 22 | 0.2884 | - | - |
+ | 0.1825 | 23 | 0.3697 | - | - |
+ | 0.1905 | 24 | 0.2683 | - | - |
+ | 0.1984 | 25 | 0.3026 | 0.1096 | 0.9088 |
+ | 0.2063 | 26 | 0.2441 | - | - |
+ | 0.2143 | 27 | 0.3145 | - | - |
+ | 0.2222 | 28 | 0.3119 | - | - |
+ | 0.2302 | 29 | 0.2766 | - | - |
+ | 0.2381 | 30 | 0.3343 | 0.1054 | 0.9084 |
+ | 0.2460 | 31 | 0.344 | - | - |
+ | 0.2540 | 32 | 0.3005 | - | - |
+ | 0.2619 | 33 | 0.2526 | - | - |
+ | 0.2698 | 34 | 0.2422 | - | - |
+ | 0.2778 | 35 | 0.3447 | 0.1022 | 0.9072 |
+ | 0.2857 | 36 | 0.2809 | - | - |
+ | 0.2937 | 37 | 0.2836 | - | - |
+ | 0.3016 | 38 | 0.2878 | - | - |
+ | 0.3095 | 39 | 0.2738 | - | - |
+ | 0.3175 | 40 | 0.2806 | 0.1003 | 0.9065 |
+ | 0.3254 | 41 | 0.2797 | - | - |
+ | 0.3333 | 42 | 0.3217 | - | - |
+ | 0.3413 | 43 | 0.2544 | - | - |
+ | 0.3492 | 44 | 0.3203 | - | - |
+ | 0.3571 | 45 | 0.2987 | 0.0990 | 0.9064 |
+ | 0.3651 | 46 | 0.2765 | - | - |
+ | 0.3730 | 47 | 0.2716 | - | - |
+ | 0.3810 | 48 | 0.3726 | - | - |
+ | 0.3889 | 49 | 0.2963 | - | - |
+ | 0.3968 | 50 | 0.2784 | 0.0952 | 0.9072 |
+ | 0.4048 | 51 | 0.2437 | - | - |
+ | 0.4127 | 52 | 0.2258 | - | - |
+ | 0.4206 | 53 | 0.2821 | - | - |
+ | 0.4286 | 54 | 0.249 | - | - |
+ | 0.4365 | 55 | 0.2813 | 0.0928 | 0.9080 |
+ | 0.4444 | 56 | 0.3003 | - | - |
+ | 0.4524 | 57 | 0.2812 | - | - |
+ | 0.4603 | 58 | 0.2619 | - | - |
+ | 0.4683 | 59 | 0.299 | - | - |
+ | 0.4762 | 60 | 0.2706 | 0.0927 | 0.9088 |
+ | 0.4841 | 61 | 0.297 | - | - |
+ | 0.4921 | 62 | 0.2906 | - | - |
+ | 0.5 | 63 | 0.2914 | - | - |
+ | 0.5079 | 64 | 0.2669 | - | - |
+ | 0.5159 | 65 | 0.2723 | 0.0946 | 0.9093 |
+ | 0.5238 | 66 | 0.3194 | - | - |
+ | 0.5317 | 67 | 0.3585 | - | - |
+ | 0.5397 | 68 | 0.2843 | - | - |
+ | 0.5476 | 69 | 0.1916 | - | - |
+ | 0.5556 | 70 | 0.351 | 0.0971 | 0.9104 |
+ | 0.5635 | 71 | 0.3105 | - | - |
+ | 0.5714 | 72 | 0.2847 | - | - |
+ | 0.5794 | 73 | 0.2641 | - | - |
+ | 0.5873 | 74 | 0.3305 | - | - |
+ | 0.5952 | 75 | 0.2461 | 0.0965 | 0.9096 |
+ | 0.6032 | 76 | 0.259 | - | - |
+ | 0.6111 | 77 | 0.2506 | - | - |
+ | 0.6190 | 78 | 0.2832 | - | - |
+ | 0.6270 | 79 | 0.3322 | - | - |
+ | 0.6349 | 80 | 0.2533 | 0.1001 | 0.9089 |
+ | 0.6429 | 81 | 0.2349 | - | - |
+ | 0.6508 | 82 | 0.2748 | - | - |
+ | 0.6587 | 83 | 0.223 | - | - |
+ | 0.6667 | 84 | 0.2416 | - | - |
+ | 0.6746 | 85 | 0.2637 | 0.1034 | 0.9082 |
+ | 0.6825 | 86 | 0.2856 | - | - |
+ | 0.6905 | 87 | 0.2476 | - | - |
+ | 0.6984 | 88 | 0.2427 | - | - |
+ | 0.7063 | 89 | 0.2614 | - | - |
+ | 0.7143 | 90 | 0.26 | 0.1032 | 0.9088 |
+ | 0.7222 | 91 | 0.1862 | - | - |
+ | 0.7302 | 92 | 0.267 | - | - |
+ | 0.7381 | 93 | 0.2175 | - | - |
+ | 0.7460 | 94 | 0.2079 | - | - |
+ | 0.7540 | 95 | 0.2562 | 0.0999 | 0.9086 |
+ | 0.7619 | 96 | 0.2516 | - | - |
+ | 0.7698 | 97 | 0.2956 | - | - |
+ | 0.7778 | 98 | 0.2733 | - | - |
+ | 0.7857 | 99 | 0.2919 | - | - |
+ | 0.7937 | 100 | 0.2997 | 0.1032 | 0.9069 |
+ | 0.8016 | 101 | 0.2276 | - | - |
+ | 0.8095 | 102 | 0.2582 | - | - |
+ | 0.8175 | 103 | 0.2559 | - | - |
+ | 0.8254 | 104 | 0.2864 | - | - |
+ | 0.8333 | 105 | 0.2839 | 0.1074 | 0.9076 |
+ | 0.8413 | 106 | 0.2549 | - | - |
+ | 0.8492 | 107 | 0.2826 | - | - |
+ | 0.8571 | 108 | 0.2334 | - | - |
+ | 0.8651 | 109 | 0.2632 | - | - |
+ | 0.8730 | 110 | 0.2255 | 0.1090 | 0.9056 |
+ | 0.8810 | 111 | 0.2589 | - | - |
+ | 0.8889 | 112 | 0.2569 | - | - |
+ | 0.8968 | 113 | 0.2797 | - | - |
+ | 0.9048 | 114 | 0.2742 | - | - |
+ | 0.9127 | 115 | 0.2295 | 0.1070 | 0.9014 |
+ | 0.9206 | 116 | 0.2047 | - | - |
+ | 0.9286 | 117 | 0.2577 | - | - |
+ | 0.9365 | 118 | 0.2614 | - | - |
+ | 0.9444 | 119 | 0.2722 | - | - |
+ | 0.9524 | 120 | 0.1927 | 0.1024 | 0.9008 |
+ | 0.9603 | 121 | 0.2649 | - | - |
+ | 0.9683 | 122 | 0.2386 | - | - |
+ | 0.9762 | 123 | 0.2801 | - | - |
+ | 0.9841 | 124 | 0.2583 | - | - |
+ | 0.9921 | 125 | 0.3076 | 0.0949 | 0.9016 |
+ | 1.0 | 126 | 0.5477 | - | - |
+ | 1.0079 | 127 | 0.0031 | - | - |
+ | 1.0159 | 128 | 0.0 | - | - |
+ | 1.0238 | 129 | 0.0 | - | - |
+ | 1.0317 | 130 | 0.0 | 0.0955 | 0.9021 |
+ | 1.0397 | 131 | 0.0 | - | - |
+ | 1.0476 | 132 | 0.0 | - | - |
+ | 1.0556 | 133 | 0.0 | - | - |
+ | 1.0635 | 134 | 0.0 | - | - |
+ | 1.0714 | 135 | 0.0 | 0.0968 | 0.9023 |
+ | 1.0794 | 136 | 0.0 | - | - |
+ | 1.0873 | 137 | 0.0 | - | - |
+ | 1.0952 | 138 | 0.0 | - | - |
+ | 1.1032 | 139 | 0.0 | - | - |
+ | 1.1111 | 140 | 0.0 | 0.0978 | 0.9024 |
+ | 1.1190 | 141 | 0.0 | - | - |
+ | 1.1270 | 142 | 0.0 | - | - |
+ | 1.1349 | 143 | 0.0 | - | - |
+ | 1.1429 | 144 | 0.0 | - | - |
+ | 1.1508 | 145 | 0.0 | 0.0986 | 0.9024 |
+ | 1.1587 | 146 | 0.0 | - | - |
+ | 1.1667 | 147 | 0.0 | - | - |
+ | 1.1746 | 148 | 0.0 | - | - |
+ | 1.1825 | 149 | 0.0 | - | - |
+ | 1.1905 | 150 | 0.0 | 0.0991 | 0.9023 |
+ | 1.1984 | 151 | 0.0 | - | - |
+ | 1.2063 | 152 | 0.0 | - | - |
+ | 1.2143 | 153 | 0.0 | - | - |
+ | 1.2222 | 154 | 0.0 | - | - |
+ | 1.2302 | 155 | 0.0 | 0.0994 | 0.9023 |
+ | 1.2381 | 156 | 0.0 | - | - |
+ | 1.2460 | 157 | 0.0 | - | - |
+ | 1.2540 | 158 | 0.0 | - | - |
+ | 1.2619 | 159 | 0.0 | - | - |
+ | 1.2698 | 160 | 0.0 | 0.0995 | 0.9023 |
+ | 1.2778 | 161 | 0.0 | - | - |
+ | 1.2857 | 162 | 0.0 | - | - |
+ | 1.2937 | 163 | 0.0 | - | - |
+ | 1.3016 | 164 | 0.0 | - | - |
+ | 1.3095 | 165 | 0.0 | 0.0996 | 0.9023 |
+ | 1.3175 | 166 | 0.0 | - | - |
+ | 1.3254 | 167 | 0.0 | - | - |
+ | 1.3333 | 168 | 0.0 | - | - |
654
+ | 1.3413 | 169 | 0.0 | - | - |
655
+ | 1.3492 | 170 | 0.0 | 0.0997 | 0.9023 |
656
+ | 1.3571 | 171 | 0.0 | - | - |
657
+ | 1.3651 | 172 | 0.0 | - | - |
658
+ | 1.3730 | 173 | 0.0 | - | - |
659
+ | 1.3810 | 174 | 0.0 | - | - |
660
+ | 1.3889 | 175 | 0.0 | 0.0997 | 0.9023 |
661
+ | 1.3968 | 176 | 0.0 | - | - |
662
+ | 1.4048 | 177 | 0.0 | - | - |
663
+ | 1.4127 | 178 | 0.0 | - | - |
664
+ | 1.4206 | 179 | 0.0 | - | - |
665
+ | 1.4286 | 180 | 0.0 | 0.0997 | 0.9023 |
666
+ | 1.4365 | 181 | 0.0 | - | - |
667
+ | 1.4444 | 182 | 0.0 | - | - |
668
+ | 1.4524 | 183 | 0.0 | - | - |
669
+ | 1.4603 | 184 | 0.0 | - | - |
670
+ | 1.4683 | 185 | 0.0 | 0.0998 | 0.9023 |
671
+ | 1.4762 | 186 | 0.0 | - | - |
672
+ | 1.4841 | 187 | 0.0 | - | - |
673
+ | 1.4921 | 188 | 0.0 | - | - |
674
+ | 1.5 | 189 | 0.0 | - | - |
675
+ | 1.5079 | 190 | 0.0 | 0.0998 | 0.9023 |
676
+ | 1.5159 | 191 | 0.0 | - | - |
677
+ | 1.5238 | 192 | 0.0 | - | - |
678
+ | 1.5317 | 193 | 0.0 | - | - |
679
+ | 1.5397 | 194 | 0.0 | - | - |
680
+ | 1.5476 | 195 | 0.0 | 0.0998 | 0.9022 |
681
+ | 1.5556 | 196 | 0.0 | - | - |
682
+ | 1.5635 | 197 | 0.0 | - | - |
683
+ | 1.5714 | 198 | 0.0 | - | - |
684
+ | 1.5794 | 199 | 0.0 | - | - |
685
+ | 1.5873 | 200 | 0.0 | 0.0998 | 0.9022 |
686
+ | 1.5952 | 201 | 0.0 | - | - |
687
+ | 1.6032 | 202 | 0.0 | - | - |
688
+ | 1.6111 | 203 | 0.0 | - | - |
689
+ | 1.6190 | 204 | 0.0 | - | - |
690
+ | 1.6270 | 205 | 0.0 | 0.0998 | 0.9022 |
691
+ | 1.6349 | 206 | 0.0 | - | - |
692
+ | 1.6429 | 207 | 0.0 | - | - |
693
+ | 1.6508 | 208 | 0.0 | - | - |
694
+
+ </details>
+
+ ### Framework Versions
+ - Python: 3.10.14
+ - Sentence Transformers: 3.0.1
+ - Transformers: 4.44.0
+ - PyTorch: 2.4.0
+ - Accelerate: 0.33.0
+ - Datasets: 2.21.0
+ - Tokenizers: 0.19.1
+
+ ## Citation
+
+ ### BibTeX
+
+ #### Sentence Transformers
+ ```bibtex
+ @inproceedings{reimers-2019-sentence-bert,
+ title = "Sentence-BERT: Sentence Embeddings using Siamese BERT-Networks",
+ author = "Reimers, Nils and Gurevych, Iryna",
+ booktitle = "Proceedings of the 2019 Conference on Empirical Methods in Natural Language Processing",
+ month = "11",
+ year = "2019",
+ publisher = "Association for Computational Linguistics",
+ url = "https://arxiv.org/abs/1908.10084",
+ }
+ ```
+
+ <!--
+ ## Glossary
+
+ *Clearly define terms in order to be accessible across audiences.*
+ -->
+
+ <!--
+ ## Model Card Authors
+
+ *Lists the people who create the model card, providing recognition and accountability for the detailed work that goes into its construction.*
+ -->
+
+ <!--
+ ## Model Card Contact
+
+ *Provides a way for people who have updates to the Model Card, suggestions, or questions, to contact the Model Card authors.*
+ -->
checkpoint-208/added_tokens.json ADDED
@@ -0,0 +1,3 @@
+ {
+ "[MASK]": 128000
+ }
checkpoint-208/config.json ADDED
@@ -0,0 +1,35 @@
+ {
+ "_name_or_path": "bobox/DeBERTa-small-ST-v1-test-step3",
+ "architectures": [
+ "DebertaV2Model"
+ ],
+ "attention_probs_dropout_prob": 0.1,
+ "hidden_act": "gelu",
+ "hidden_dropout_prob": 0.1,
+ "hidden_size": 768,
+ "initializer_range": 0.02,
+ "intermediate_size": 3072,
+ "layer_norm_eps": 1e-07,
+ "max_position_embeddings": 512,
+ "max_relative_positions": -1,
+ "model_type": "deberta-v2",
+ "norm_rel_ebd": "layer_norm",
+ "num_attention_heads": 12,
+ "num_hidden_layers": 6,
+ "pad_token_id": 0,
+ "pooler_dropout": 0,
+ "pooler_hidden_act": "gelu",
+ "pooler_hidden_size": 768,
+ "pos_att_type": [
+ "p2c",
+ "c2p"
+ ],
+ "position_biased_input": false,
+ "position_buckets": 256,
+ "relative_attention": true,
+ "share_att_key": true,
+ "torch_dtype": "float32",
+ "transformers_version": "4.44.0",
+ "type_vocab_size": 0,
+ "vocab_size": 128100
+ }
checkpoint-208/config_sentence_transformers.json ADDED
@@ -0,0 +1,10 @@
+ {
+ "__version__": {
+ "sentence_transformers": "3.0.1",
+ "transformers": "4.44.0",
+ "pytorch": "2.4.0"
+ },
+ "prompts": {},
+ "default_prompt_name": null,
+ "similarity_fn_name": null
+ }
checkpoint-208/modules.json ADDED
@@ -0,0 +1,14 @@
+ [
+ {
+ "idx": 0,
+ "name": "0",
+ "path": "",
+ "type": "sentence_transformers.models.Transformer"
+ },
+ {
+ "idx": 1,
+ "name": "1",
+ "path": "1_Pooling",
+ "type": "sentence_transformers.models.Pooling"
+ }
+ ]
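`modules.json` wires this checkpoint as a two-stage pipeline. Module 0 is a `sentence_transformers.models.Transformer` stored at the folder root (`"path": ""`), and module 1 is a `sentence_transformers.models.Pooling` stored in `1_Pooling`, whose config in this commit enables only `pooling_mode_mean_tokens`. The following is a minimal plain-Python sketch of what attention-mask-aware mean pooling computes, using toy numbers. `mean_pool` and its inputs are hypothetical, and this is not the library's implementation, which operates on batched PyTorch tensors.

```python
from typing import List, Sequence


def mean_pool(
    token_embeddings: Sequence[Sequence[float]],
    attention_mask: Sequence[int],
) -> List[float]:
    """Average the token vectors whose attention-mask entry is 1 (padding is ignored)."""
    if len(token_embeddings) != len(attention_mask):
        raise ValueError("one mask entry is required per token")
    kept = [vec for vec, m in zip(token_embeddings, attention_mask) if m]
    if not kept:
        raise ValueError("attention mask selects no tokens")
    dim = len(kept[0])
    # Sum each dimension over the kept tokens, then divide by the number of kept tokens.
    return [sum(vec[d] for vec in kept) / len(kept) for d in range(dim)]


# Toy example: 3 real tokens plus 1 padding token, embedding size 2
# (the real checkpoint uses word_embedding_dimension = 768).
emb = [[1.0, 2.0], [3.0, 4.0], [5.0, 6.0], [100.0, 100.0]]
mask = [1, 1, 1, 0]
print(mean_pool(emb, mask))  # [3.0, 4.0]
```

Assuming you have a local copy of the `checkpoint-208` folder, `SentenceTransformer("checkpoint-208")` from the `sentence_transformers` package should run both modules in this order and return one 768-dimensional vector per input text.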
checkpoint-208/optimizer.pt ADDED
@@ -0,0 +1,3 @@
+ version https://git-lfs.github.com/spec/v1
+ oid sha256:6f3b86ff4c3efcebfc399dfd271fd1639a9b55c164e61648b7fc7122690538a5
+ size 1130520122
checkpoint-208/pytorch_model.bin ADDED
@@ -0,0 +1,3 @@
+ version https://git-lfs.github.com/spec/v1
+ oid sha256:fa8260eefc98e996b63b68e71176685983c92c02b4f58e4ee350cdaf99e9dd71
+ size 565251810
checkpoint-208/rng_state.pth ADDED
@@ -0,0 +1,3 @@
+ version https://git-lfs.github.com/spec/v1
+ oid sha256:2512fc08615ffa6c7bb0033e5d7a5447bc04a8c002bfa327e13e989292c25b28
+ size 14244
checkpoint-208/scheduler.pt ADDED
@@ -0,0 +1,3 @@
+ version https://git-lfs.github.com/spec/v1
+ oid sha256:38510ab06b46572fec36821fc3b8a60274b14209e591c36397ec5ec60b192e9e
+ size 1064
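The four binary files above (`optimizer.pt`, `pytorch_model.bin`, `rng_state.pth`, `scheduler.pt`) appear in this diff only as Git LFS pointers. Each pointer contains a spec-version line, a `sha256` object ID, and the real file size in bytes. For example, `pytorch_model.bin` is 565251810 bytes, or about 565 MB. The following is a minimal sketch of reading such a pointer once the diff's leading `+` has been stripped. It assumes the simple three-line `key value` layout shown here and does not implement the full LFS pointer spec. `parse_lfs_pointer` is a hypothetical helper.

```python
from typing import Dict


def parse_lfs_pointer(text: str) -> Dict[str, object]:
    """Parse a Git LFS pointer (version / oid / size lines) into a small dict."""
    fields: Dict[str, str] = {}
    for line in text.strip().splitlines():
        # Each line is "<key> <value>"; split on the first space only.
        key, _, value = line.strip().partition(" ")
        fields[key] = value
    # The oid has the form "<algorithm>:<hex digest>".
    algo, _, digest = fields["oid"].partition(":")
    return {
        "version": fields["version"],
        "algorithm": algo,
        "digest": digest,
        "size_bytes": int(fields["size"]),
    }


# The scheduler.pt pointer from this commit.
pointer = """version https://git-lfs.github.com/spec/v1
oid sha256:38510ab06b46572fec36821fc3b8a60274b14209e591c36397ec5ec60b192e9e
size 1064
"""
info = parse_lfs_pointer(pointer)
print(info["algorithm"], info["size_bytes"])  # sha256 1064
```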
checkpoint-208/sentence_bert_config.json ADDED
@@ -0,0 +1,4 @@
+ {
+ "max_seq_length": 512,
+ "do_lower_case": false
+ }
checkpoint-208/special_tokens_map.json ADDED
@@ -0,0 +1,51 @@
+ {
+ "bos_token": {
+ "content": "[CLS]",
+ "lstrip": false,
+ "normalized": false,
+ "rstrip": false,
+ "single_word": false
+ },
+ "cls_token": {
+ "content": "[CLS]",
+ "lstrip": false,
+ "normalized": false,
+ "rstrip": false,
+ "single_word": false
+ },
+ "eos_token": {
+ "content": "[SEP]",
+ "lstrip": false,
+ "normalized": false,
+ "rstrip": false,
+ "single_word": false
+ },
+ "mask_token": {
+ "content": "[MASK]",
+ "lstrip": false,
+ "normalized": false,
+ "rstrip": false,
+ "single_word": false
+ },
+ "pad_token": {
+ "content": "[PAD]",
+ "lstrip": false,
+ "normalized": false,
+ "rstrip": false,
+ "single_word": false
+ },
+ "sep_token": {
+ "content": "[SEP]",
+ "lstrip": false,
+ "normalized": false,
+ "rstrip": false,
+ "single_word": false
+ },
+ "unk_token": {
+ "content": "[UNK]",
+ "lstrip": false,
+ "normalized": true,
+ "rstrip": false,
+ "single_word": false
+ }
+ }
checkpoint-208/spm.model ADDED
@@ -0,0 +1,3 @@
+ version https://git-lfs.github.com/spec/v1
+ oid sha256:c679fbf93643d19aab7ee10c0b99e460bdbc02fedf34b92b05af343b4af586fd
+ size 2464616
checkpoint-208/tokenizer.json ADDED
The diff for this file is too large to render. See raw diff
 
checkpoint-208/tokenizer_config.json ADDED
@@ -0,0 +1,65 @@
+ {
+ "added_tokens_decoder": {
+ "0": {
+ "content": "[PAD]",
+ "lstrip": false,
+ "normalized": false,
+ "rstrip": false,
+ "single_word": false,
+ "special": true
+ },
+ "1": {
+ "content": "[CLS]",
+ "lstrip": false,
+ "normalized": false,
+ "rstrip": false,
+ "single_word": false,
+ "special": true
+ },
+ "2": {
+ "content": "[SEP]",
+ "lstrip": false,
+ "normalized": false,
+ "rstrip": false,
+ "single_word": false,
+ "special": true
+ },
+ "3": {
+ "content": "[UNK]",
+ "lstrip": false,
+ "normalized": true,
+ "rstrip": false,
+ "single_word": false,
+ "special": true
+ },
+ "128000": {
+ "content": "[MASK]",
+ "lstrip": false,
+ "normalized": false,
+ "rstrip": false,
+ "single_word": false,
+ "special": true
+ }
+ },
+ "bos_token": "[CLS]",
+ "clean_up_tokenization_spaces": true,
+ "cls_token": "[CLS]",
+ "do_lower_case": false,
+ "eos_token": "[SEP]",
+ "mask_token": "[MASK]",
+ "max_length": 512,
+ "model_max_length": 512,
+ "pad_to_multiple_of": null,
+ "pad_token": "[PAD]",
+ "pad_token_type_id": 0,
+ "padding_side": "right",
+ "sep_token": "[SEP]",
+ "sp_model_kwargs": {},
+ "split_by_punct": false,
+ "stride": 0,
+ "tokenizer_class": "DebertaV2Tokenizer",
+ "truncation_side": "right",
+ "truncation_strategy": "longest_first",
+ "unk_token": "[UNK]",
+ "vocab_type": "spm"
+ }
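In the `trainer_state.json` diff that follows, `log_history` mixes two kinds of records. Per-step training records carry `loss`, `grad_norm` and `learning_rate`. Evaluation records carry `eval_`-prefixed metrics and are written every `eval_steps` = 5 steps. The `epoch` field is the step count divided by 126 steps per epoch; for example, 208 / 126 ≈ 1.6508, which is the checkpoint's `epoch`. The following is a minimal sketch for separating the two record kinds, assuming the JSON has already been loaded into a dict. `split_log_history` is a hypothetical helper, and the sample values are copied from the step-5 entries of this log with most metrics trimmed.

```python
import json
from typing import Any, Dict, List, Tuple


def split_log_history(state: Dict[str, Any]) -> Tuple[List[dict], List[dict]]:
    """Separate training-step records from evaluation records in a trainer_state dict."""
    train: List[dict] = []
    evals: List[dict] = []
    for record in state["log_history"]:
        if "loss" in record:  # training records log the raw "loss" key
            train.append(record)
        elif any(key.startswith("eval_") for key in record):
            evals.append(record)
    return train, evals


# Two records copied from this checkpoint's log_history (eval metrics trimmed).
sample = json.loads("""
{"global_step": 208, "log_history": [
  {"epoch": 0.03968253968253968, "loss": 0.3442, "step": 5},
  {"epoch": 0.03968253968253968, "eval_loss": 0.12529698014259338,
   "eval_sts-test_spearman_cosine": 0.9077787555581409, "step": 5}
]}
""")
train, evals = split_log_history(sample)
print([r["loss"] for r in train])                           # [0.3442]
print([r["eval_sts-test_spearman_cosine"] for r in evals])  # [0.9077787555581409]
```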
checkpoint-208/trainer_state.json ADDED
@@ -0,0 +1,2227 @@
+ {
2
+ "best_metric": null,
3
+ "best_model_checkpoint": null,
4
+ "epoch": 1.6507936507936507,
5
+ "eval_steps": 5,
6
+ "global_step": 208,
7
+ "is_hyper_param_search": false,
8
+ "is_local_process_zero": true,
9
+ "is_world_process_zero": true,
10
+ "log_history": [
11
+ {
12
+ "epoch": 0.007936507936507936,
13
+ "grad_norm": 3.5297670364379883,
14
+ "learning_rate": 5.555555555555555e-07,
15
+ "loss": 0.404,
16
+ "step": 1
17
+ },
18
+ {
19
+ "epoch": 0.015873015873015872,
20
+ "grad_norm": 3.6838796138763428,
21
+ "learning_rate": 1.111111111111111e-06,
22
+ "loss": 0.3185,
23
+ "step": 2
24
+ },
25
+ {
26
+ "epoch": 0.023809523809523808,
27
+ "grad_norm": 3.5556721687316895,
28
+ "learning_rate": 1.6666666666666665e-06,
29
+ "loss": 0.2821,
30
+ "step": 3
31
+ },
32
+ {
33
+ "epoch": 0.031746031746031744,
34
+ "grad_norm": 3.922109842300415,
35
+ "learning_rate": 2.222222222222222e-06,
36
+ "loss": 0.4036,
37
+ "step": 4
38
+ },
39
+ {
40
+ "epoch": 0.03968253968253968,
41
+ "grad_norm": 3.9366657733917236,
42
+ "learning_rate": 2.7777777777777775e-06,
43
+ "loss": 0.3442,
44
+ "step": 5
45
+ },
46
+ {
47
+ "epoch": 0.03968253968253968,
48
+ "eval_loss": 0.12529698014259338,
49
+ "eval_runtime": 113.8002,
50
+ "eval_samples_per_second": 26.819,
51
+ "eval_steps_per_second": 0.211,
52
+ "eval_sts-test_pearson_cosine": 0.886081184413048,
53
+ "eval_sts-test_pearson_dot": 0.8767533438290611,
54
+ "eval_sts-test_pearson_euclidean": 0.9080817963557108,
55
+ "eval_sts-test_pearson_manhattan": 0.9087794191320873,
56
+ "eval_sts-test_pearson_max": 0.9087794191320873,
57
+ "eval_sts-test_spearman_cosine": 0.9077787555581409,
58
+ "eval_sts-test_spearman_dot": 0.8792746633711961,
59
+ "eval_sts-test_spearman_euclidean": 0.9039925750881216,
60
+ "eval_sts-test_spearman_manhattan": 0.904489537845873,
61
+ "eval_sts-test_spearman_max": 0.9077787555581409,
62
+ "step": 5
63
+ },
64
+ {
65
+ "epoch": 0.047619047619047616,
66
+ "grad_norm": 3.8135547637939453,
67
+ "learning_rate": 3.333333333333333e-06,
68
+ "loss": 0.4145,
69
+ "step": 6
70
+ },
71
+ {
72
+ "epoch": 0.05555555555555555,
73
+ "grad_norm": 4.132374286651611,
74
+ "learning_rate": 3.888888888888889e-06,
75
+ "loss": 0.4224,
76
+ "step": 7
77
+ },
78
+ {
79
+ "epoch": 0.06349206349206349,
80
+ "grad_norm": 3.9953386783599854,
81
+ "learning_rate": 4.444444444444444e-06,
82
+ "loss": 0.4048,
83
+ "step": 8
84
+ },
85
+ {
86
+ "epoch": 0.07142857142857142,
87
+ "grad_norm": 4.023675918579102,
88
+ "learning_rate": 4.9999999999999996e-06,
89
+ "loss": 0.3899,
90
+ "step": 9
91
+ },
92
+ {
93
+ "epoch": 0.07936507936507936,
94
+ "grad_norm": 3.854191780090332,
95
+ "learning_rate": 5.555555555555555e-06,
96
+ "loss": 0.4127,
97
+ "step": 10
98
+ },
99
+ {
100
+ "epoch": 0.07936507936507936,
101
+ "eval_loss": 0.12369368970394135,
102
+ "eval_runtime": 113.6707,
103
+ "eval_samples_per_second": 26.849,
104
+ "eval_steps_per_second": 0.211,
105
+ "eval_sts-test_pearson_cosine": 0.8860118050647048,
106
+ "eval_sts-test_pearson_dot": 0.8760605933678182,
107
+ "eval_sts-test_pearson_euclidean": 0.9086480781293332,
108
+ "eval_sts-test_pearson_manhattan": 0.9092897840847158,
109
+ "eval_sts-test_pearson_max": 0.9092897840847158,
110
+ "eval_sts-test_spearman_cosine": 0.9078577415344969,
111
+ "eval_sts-test_spearman_dot": 0.8791339654053815,
112
+ "eval_sts-test_spearman_euclidean": 0.9047648028546915,
113
+ "eval_sts-test_spearman_manhattan": 0.9052383607027356,
114
+ "eval_sts-test_spearman_max": 0.9078577415344969,
115
+ "step": 10
116
+ },
117
+ {
118
+ "epoch": 0.0873015873015873,
119
+ "grad_norm": 3.8079540729522705,
120
+ "learning_rate": 6.11111111111111e-06,
121
+ "loss": 0.3496,
122
+ "step": 11
123
+ },
124
+ {
125
+ "epoch": 0.09523809523809523,
126
+ "grad_norm": 3.929018259048462,
127
+ "learning_rate": 6.666666666666666e-06,
128
+ "loss": 0.3731,
129
+ "step": 12
130
+ },
131
+ {
132
+ "epoch": 0.10317460317460317,
133
+ "grad_norm": 4.284013271331787,
134
+ "learning_rate": 7.222222222222221e-06,
135
+ "loss": 0.3929,
136
+ "step": 13
137
+ },
138
+ {
139
+ "epoch": 0.1111111111111111,
140
+ "grad_norm": 3.3490402698516846,
141
+ "learning_rate": 7.777777777777777e-06,
142
+ "loss": 0.2957,
143
+ "step": 14
144
+ },
145
+ {
146
+ "epoch": 0.11904761904761904,
147
+ "grad_norm": 3.553280830383301,
148
+ "learning_rate": 8.333333333333332e-06,
149
+ "loss": 0.3324,
150
+ "step": 15
151
+ },
152
+ {
153
+ "epoch": 0.11904761904761904,
154
+ "eval_loss": 0.12056715041399002,
155
+ "eval_runtime": 113.718,
156
+ "eval_samples_per_second": 26.838,
157
+ "eval_steps_per_second": 0.211,
158
+ "eval_sts-test_pearson_cosine": 0.8856265458568289,
159
+ "eval_sts-test_pearson_dot": 0.8743050518330721,
160
+ "eval_sts-test_pearson_euclidean": 0.9095228583162331,
161
+ "eval_sts-test_pearson_manhattan": 0.9101600217218586,
162
+ "eval_sts-test_pearson_max": 0.9101600217218586,
163
+ "eval_sts-test_spearman_cosine": 0.908261263658463,
164
+ "eval_sts-test_spearman_dot": 0.87867141636764,
165
+ "eval_sts-test_spearman_euclidean": 0.9060734192402989,
166
+ "eval_sts-test_spearman_manhattan": 0.9066336155303966,
167
+ "eval_sts-test_spearman_max": 0.908261263658463,
168
+ "step": 15
169
+ },
170
+ {
171
+ "epoch": 0.12698412698412698,
172
+ "grad_norm": 3.6310322284698486,
173
+ "learning_rate": 8.888888888888888e-06,
174
+ "loss": 0.3341,
175
+ "step": 16
176
+ },
177
+ {
178
+ "epoch": 0.1349206349206349,
179
+ "grad_norm": 3.6535122394561768,
180
+ "learning_rate": 9.444444444444443e-06,
181
+ "loss": 0.3466,
182
+ "step": 17
183
+ },
184
+ {
185
+ "epoch": 0.14285714285714285,
186
+ "grad_norm": 3.6199331283569336,
187
+ "learning_rate": 9.999999999999999e-06,
188
+ "loss": 0.3558,
189
+ "step": 18
190
+ },
191
+ {
192
+ "epoch": 0.15079365079365079,
193
+ "grad_norm": 3.089895248413086,
194
+ "learning_rate": 1.0555555555555554e-05,
195
+ "loss": 0.2634,
196
+ "step": 19
197
+ },
198
+ {
199
+ "epoch": 0.15873015873015872,
200
+ "grad_norm": 3.320916175842285,
201
+ "learning_rate": 1.111111111111111e-05,
202
+ "loss": 0.3095,
203
+ "step": 20
204
+ },
205
+ {
206
+ "epoch": 0.15873015873015872,
207
+ "eval_loss": 0.11563990265130997,
208
+ "eval_runtime": 113.5377,
209
+ "eval_samples_per_second": 26.881,
210
+ "eval_steps_per_second": 0.211,
211
+ "eval_sts-test_pearson_cosine": 0.8848740042612456,
212
+ "eval_sts-test_pearson_dot": 0.8724689429546052,
213
+ "eval_sts-test_pearson_euclidean": 0.9104294765782397,
214
+ "eval_sts-test_pearson_manhattan": 0.9111381492292419,
215
+ "eval_sts-test_pearson_max": 0.9111381492292419,
216
+ "eval_sts-test_spearman_cosine": 0.9087803335393421,
217
+ "eval_sts-test_spearman_dot": 0.8777188410176626,
218
+ "eval_sts-test_spearman_euclidean": 0.9069791847708608,
219
+ "eval_sts-test_spearman_manhattan": 0.9078148698260838,
220
+ "eval_sts-test_spearman_max": 0.9087803335393421,
221
+ "step": 20
222
+ },
223
+ {
224
+ "epoch": 0.16666666666666666,
225
+ "grad_norm": 3.0193159580230713,
226
+ "learning_rate": 1.1666666666666665e-05,
227
+ "loss": 0.2973,
228
+ "step": 21
229
+ },
230
+ {
231
+ "epoch": 0.1746031746031746,
232
+ "grad_norm": 3.3553476333618164,
233
+ "learning_rate": 1.222222222222222e-05,
234
+ "loss": 0.2884,
235
+ "step": 22
236
+ },
237
+ {
238
+ "epoch": 0.18253968253968253,
239
+ "grad_norm": 3.5176496505737305,
240
+ "learning_rate": 1.2777777777777775e-05,
241
+ "loss": 0.3697,
242
+ "step": 23
243
+ },
244
+ {
245
+ "epoch": 0.19047619047619047,
246
+ "grad_norm": 3.2073943614959717,
247
+ "learning_rate": 1.3333333333333332e-05,
248
+ "loss": 0.2683,
249
+ "step": 24
250
+ },
251
+ {
252
+ "epoch": 0.1984126984126984,
253
+ "grad_norm": 3.2101964950561523,
254
+ "learning_rate": 1.3888888888888886e-05,
255
+ "loss": 0.3026,
256
+ "step": 25
257
+ },
258
+ {
259
+ "epoch": 0.1984126984126984,
260
+ "eval_loss": 0.10958973318338394,
261
+ "eval_runtime": 113.6214,
262
+ "eval_samples_per_second": 26.861,
263
+ "eval_steps_per_second": 0.211,
264
+ "eval_sts-test_pearson_cosine": 0.8832622086480311,
265
+ "eval_sts-test_pearson_dot": 0.8697582354953435,
266
+ "eval_sts-test_pearson_euclidean": 0.9107566690862425,
267
+ "eval_sts-test_pearson_manhattan": 0.9115546986654615,
268
+ "eval_sts-test_pearson_max": 0.9115546986654615,
269
+ "eval_sts-test_spearman_cosine": 0.9087605087305455,
270
+ "eval_sts-test_spearman_dot": 0.8760767382321666,
271
+ "eval_sts-test_spearman_euclidean": 0.9073999361304628,
272
+ "eval_sts-test_spearman_manhattan": 0.9084107328715103,
273
+ "eval_sts-test_spearman_max": 0.9087605087305455,
274
+ "step": 25
275
+ },
276
+ {
277
+ "epoch": 0.20634920634920634,
278
+ "grad_norm": 2.84037709236145,
279
+ "learning_rate": 1.4444444444444442e-05,
280
+ "loss": 0.2441,
281
+ "step": 26
282
+ },
283
+ {
284
+ "epoch": 0.21428571428571427,
285
+ "grad_norm": 3.3099992275238037,
286
+ "learning_rate": 1.4999999999999999e-05,
287
+ "loss": 0.3145,
288
+ "step": 27
289
+ },
290
+ {
291
+ "epoch": 0.2222222222222222,
292
+ "grad_norm": 3.061953067779541,
293
+ "learning_rate": 1.5555555555555555e-05,
294
+ "loss": 0.3119,
295
+ "step": 28
296
+ },
297
+ {
298
+ "epoch": 0.23015873015873015,
299
+ "grad_norm": 3.0163729190826416,
300
+ "learning_rate": 1.6111111111111108e-05,
301
+ "loss": 0.2766,
302
+ "step": 29
303
+ },
304
+ {
305
+ "epoch": 0.23809523809523808,
306
+ "grad_norm": 3.140418291091919,
307
+ "learning_rate": 1.6666666666666664e-05,
308
+ "loss": 0.3343,
309
+ "step": 30
310
+ },
311
+ {
312
+ "epoch": 0.23809523809523808,
313
+ "eval_loss": 0.10535401105880737,
314
+ "eval_runtime": 113.5942,
315
+ "eval_samples_per_second": 26.868,
316
+ "eval_steps_per_second": 0.211,
317
+ "eval_sts-test_pearson_cosine": 0.8819465403802665,
318
+ "eval_sts-test_pearson_dot": 0.866997957398371,
319
+ "eval_sts-test_pearson_euclidean": 0.9110501477101954,
320
+ "eval_sts-test_pearson_manhattan": 0.9119047974126511,
321
+ "eval_sts-test_pearson_max": 0.9119047974126511,
322
+ "eval_sts-test_spearman_cosine": 0.9084358383291508,
323
+ "eval_sts-test_spearman_dot": 0.8727757956894143,
324
+ "eval_sts-test_spearman_euclidean": 0.9077817538926543,
325
+ "eval_sts-test_spearman_manhattan": 0.9089103807049453,
326
+ "eval_sts-test_spearman_max": 0.9089103807049453,
327
+ "step": 30
328
+ },
329
+ {
330
+ "epoch": 0.24603174603174602,
331
+ "grad_norm": 3.1329221725463867,
332
+ "learning_rate": 1.722222222222222e-05,
333
+ "loss": 0.344,
334
+ "step": 31
335
+ },
336
+ {
337
+ "epoch": 0.25396825396825395,
338
+ "grad_norm": 2.9861748218536377,
339
+ "learning_rate": 1.7777777777777777e-05,
340
+ "loss": 0.3005,
341
+ "step": 32
342
+ },
343
+ {
344
+ "epoch": 0.2619047619047619,
345
+ "grad_norm": 2.8316733837127686,
346
+ "learning_rate": 1.8333333333333333e-05,
347
+ "loss": 0.2526,
348
+ "step": 33
349
+ },
350
+ {
351
+ "epoch": 0.2698412698412698,
352
+ "grad_norm": 2.8335487842559814,
353
+ "learning_rate": 1.8888888888888886e-05,
354
+ "loss": 0.2422,
355
+ "step": 34
356
+ },
357
+ {
358
+ "epoch": 0.2777777777777778,
359
+ "grad_norm": 3.0785422325134277,
360
+ "learning_rate": 1.9444444444444442e-05,
361
+ "loss": 0.3447,
362
+ "step": 35
363
+ },
364
+ {
365
+ "epoch": 0.2777777777777778,
366
+ "eval_loss": 0.10223711282014847,
367
+ "eval_runtime": 113.9847,
368
+ "eval_samples_per_second": 26.776,
369
+ "eval_steps_per_second": 0.211,
370
+ "eval_sts-test_pearson_cosine": 0.8812001280334643,
371
+ "eval_sts-test_pearson_dot": 0.8652746969985129,
372
+ "eval_sts-test_pearson_euclidean": 0.9105701789873448,
373
+ "eval_sts-test_pearson_manhattan": 0.9116177887236803,
374
+ "eval_sts-test_pearson_max": 0.9116177887236803,
375
+ "eval_sts-test_spearman_cosine": 0.9072243769320245,
376
+ "eval_sts-test_spearman_dot": 0.8716048789351082,
377
+ "eval_sts-test_spearman_euclidean": 0.9073166540330135,
378
+ "eval_sts-test_spearman_manhattan": 0.9081332302996223,
379
+ "eval_sts-test_spearman_max": 0.9081332302996223,
380
+ "step": 35
381
+ },
382
+ {
383
+ "epoch": 0.2857142857142857,
384
+ "grad_norm": 2.944396734237671,
385
+ "learning_rate": 1.9999999999999998e-05,
386
+ "loss": 0.2809,
387
+ "step": 36
388
+ },
389
+ {
390
+ "epoch": 0.29365079365079366,
391
+ "grad_norm": 2.8323400020599365,
392
+ "learning_rate": 2.0555555555555555e-05,
393
+ "loss": 0.2836,
394
+ "step": 37
395
+ },
396
+ {
397
+ "epoch": 0.30158730158730157,
398
+ "grad_norm": 2.8760273456573486,
399
+ "learning_rate": 2.1111111111111107e-05,
400
+ "loss": 0.2878,
401
+ "step": 38
402
+ },
403
+ {
404
+ "epoch": 0.30952380952380953,
405
+ "grad_norm": 2.744379758834839,
406
+ "learning_rate": 2.1666666666666667e-05,
407
+ "loss": 0.2738,
408
+ "step": 39
409
+ },
410
+ {
411
+ "epoch": 0.31746031746031744,
412
+ "grad_norm": 2.8519983291625977,
413
+ "learning_rate": 2.222222222222222e-05,
414
+ "loss": 0.2806,
415
+ "step": 40
416
+ },
417
+ {
418
+ "epoch": 0.31746031746031744,
419
+ "eval_loss": 0.10033170133829117,
420
+ "eval_runtime": 113.5147,
421
+ "eval_samples_per_second": 26.886,
422
+ "eval_steps_per_second": 0.211,
423
+ "eval_sts-test_pearson_cosine": 0.8802115569848467,
424
+ "eval_sts-test_pearson_dot": 0.8634798448575132,
425
+ "eval_sts-test_pearson_euclidean": 0.9094188438238102,
426
+ "eval_sts-test_pearson_manhattan": 0.9105849471172345,
427
+ "eval_sts-test_pearson_max": 0.9105849471172345,
428
+ "eval_sts-test_spearman_cosine": 0.9064710789490229,
429
+ "eval_sts-test_spearman_dot": 0.8693704037025742,
430
+ "eval_sts-test_spearman_euclidean": 0.9064271779615981,
431
+ "eval_sts-test_spearman_manhattan": 0.9073247092600637,
432
+ "eval_sts-test_spearman_max": 0.9073247092600637,
433
+ "step": 40
434
+ },
435
+ {
436
+ "epoch": 0.3253968253968254,
437
+ "grad_norm": 2.9139747619628906,
438
+ "learning_rate": 2.2777777777777776e-05,
439
+ "loss": 0.2797,
440
+ "step": 41
441
+ },
442
+ {
443
+ "epoch": 0.3333333333333333,
444
+ "grad_norm": 2.9206557273864746,
445
+ "learning_rate": 2.333333333333333e-05,
446
+ "loss": 0.3217,
447
+ "step": 42
448
+ },
449
+ {
450
+ "epoch": 0.3412698412698413,
451
+ "grad_norm": 2.755398988723755,
452
+ "learning_rate": 2.388888888888889e-05,
453
+ "loss": 0.2544,
454
+ "step": 43
455
+ },
456
+ {
457
+ "epoch": 0.3492063492063492,
458
+ "grad_norm": 3.0441982746124268,
459
+ "learning_rate": 2.444444444444444e-05,
460
+ "loss": 0.3203,
461
+ "step": 44
462
+ },
463
+ {
464
+ "epoch": 0.35714285714285715,
465
+ "grad_norm": 2.978891611099243,
466
+ "learning_rate": 2.4999999999999998e-05,
467
+ "loss": 0.2987,
468
+ "step": 45
469
+ },
470
+ {
471
+ "epoch": 0.35714285714285715,
472
+ "eval_loss": 0.09902294725179672,
473
+ "eval_runtime": 113.6912,
474
+ "eval_samples_per_second": 26.845,
475
+ "eval_steps_per_second": 0.211,
476
+ "eval_sts-test_pearson_cosine": 0.8796209948380269,
477
+ "eval_sts-test_pearson_dot": 0.8617122615494917,
478
+ "eval_sts-test_pearson_euclidean": 0.9092272396432914,
479
+ "eval_sts-test_pearson_manhattan": 0.9100341993020892,
480
+ "eval_sts-test_pearson_max": 0.9100341993020892,
481
+ "eval_sts-test_spearman_cosine": 0.9063911531961779,
482
+ "eval_sts-test_spearman_dot": 0.867835166929281,
483
+ "eval_sts-test_spearman_euclidean": 0.9066020658911155,
484
+ "eval_sts-test_spearman_manhattan": 0.9072894005148261,
485
+ "eval_sts-test_spearman_max": 0.9072894005148261,
486
+ "step": 45
487
+ },
488
+ {
+ "epoch": 0.36507936507936506,
+ "grad_norm": 2.9183595180511475,
+ "learning_rate": 2.555555555555555e-05,
+ "loss": 0.2765,
+ "step": 46
+ },
+ {
+ "epoch": 0.373015873015873,
+ "grad_norm": 2.960238456726074,
+ "learning_rate": 2.611111111111111e-05,
+ "loss": 0.2716,
+ "step": 47
+ },
+ {
+ "epoch": 0.38095238095238093,
+ "grad_norm": 3.23356294631958,
+ "learning_rate": 2.6666666666666663e-05,
+ "loss": 0.3726,
+ "step": 48
+ },
+ {
+ "epoch": 0.3888888888888889,
+ "grad_norm": 2.974705457687378,
+ "learning_rate": 2.722222222222222e-05,
+ "loss": 0.2963,
+ "step": 49
+ },
+ {
+ "epoch": 0.3968253968253968,
+ "grad_norm": 2.8041574954986572,
+ "learning_rate": 2.7777777777777772e-05,
+ "loss": 0.2784,
+ "step": 50
+ },
+ {
+ "epoch": 0.3968253968253968,
+ "eval_loss": 0.09521521627902985,
+ "eval_runtime": 113.6139,
+ "eval_samples_per_second": 26.863,
+ "eval_steps_per_second": 0.211,
+ "eval_sts-test_pearson_cosine": 0.8802451373465323,
+ "eval_sts-test_pearson_dot": 0.8609764645232105,
+ "eval_sts-test_pearson_euclidean": 0.9103012041260427,
+ "eval_sts-test_pearson_manhattan": 0.9108880877390901,
+ "eval_sts-test_pearson_max": 0.9108880877390901,
+ "eval_sts-test_spearman_cosine": 0.9071928272927434,
+ "eval_sts-test_spearman_dot": 0.867374407941995,
+ "eval_sts-test_spearman_euclidean": 0.9083242734345022,
+ "eval_sts-test_spearman_manhattan": 0.9086424996542565,
+ "eval_sts-test_spearman_max": 0.9086424996542565,
+ "step": 50
+ },
+ {
+ "epoch": 0.40476190476190477,
+ "grad_norm": 2.6451456546783447,
+ "learning_rate": 2.8333333333333332e-05,
+ "loss": 0.2437,
+ "step": 51
+ },
+ {
+ "epoch": 0.4126984126984127,
+ "grad_norm": 2.7020044326782227,
+ "learning_rate": 2.8888888888888885e-05,
+ "loss": 0.2258,
+ "step": 52
+ },
+ {
+ "epoch": 0.42063492063492064,
+ "grad_norm": 2.7229156494140625,
+ "learning_rate": 2.944444444444444e-05,
+ "loss": 0.2821,
+ "step": 53
+ },
+ {
+ "epoch": 0.42857142857142855,
+ "grad_norm": 2.770799398422241,
+ "learning_rate": 2.9999999999999997e-05,
+ "loss": 0.249,
+ "step": 54
+ },
+ {
+ "epoch": 0.4365079365079365,
+ "grad_norm": 2.762690305709839,
+ "learning_rate": 3.0555555555555554e-05,
+ "loss": 0.2813,
+ "step": 55
+ },
+ {
+ "epoch": 0.4365079365079365,
+ "eval_loss": 0.09280610829591751,
+ "eval_runtime": 113.4966,
+ "eval_samples_per_second": 26.891,
+ "eval_steps_per_second": 0.211,
+ "eval_sts-test_pearson_cosine": 0.8804507408393794,
+ "eval_sts-test_pearson_dot": 0.8631869703781383,
+ "eval_sts-test_pearson_euclidean": 0.9108211341698824,
+ "eval_sts-test_pearson_manhattan": 0.9114068237803576,
+ "eval_sts-test_pearson_max": 0.9114068237803576,
+ "eval_sts-test_spearman_cosine": 0.9079720810073518,
+ "eval_sts-test_spearman_dot": 0.8709471248951776,
+ "eval_sts-test_spearman_euclidean": 0.9085633794241165,
+ "eval_sts-test_spearman_manhattan": 0.9093315348258998,
+ "eval_sts-test_spearman_max": 0.9093315348258998,
+ "step": 55
+ },
+ {
+ "epoch": 0.4444444444444444,
+ "grad_norm": 2.9767086505889893,
+ "learning_rate": 3.111111111111111e-05,
+ "loss": 0.3003,
+ "step": 56
+ },
+ {
+ "epoch": 0.4523809523809524,
+ "grad_norm": 2.816253185272217,
+ "learning_rate": 3.1666666666666666e-05,
+ "loss": 0.2812,
+ "step": 57
+ },
+ {
+ "epoch": 0.4603174603174603,
+ "grad_norm": 2.5184807777404785,
+ "learning_rate": 3.2222222222222216e-05,
+ "loss": 0.2619,
+ "step": 58
+ },
+ {
+ "epoch": 0.46825396825396826,
+ "grad_norm": 2.7500715255737305,
+ "learning_rate": 3.277777777777777e-05,
+ "loss": 0.299,
+ "step": 59
+ },
+ {
+ "epoch": 0.47619047619047616,
+ "grad_norm": 2.5309386253356934,
+ "learning_rate": 3.333333333333333e-05,
+ "loss": 0.2706,
+ "step": 60
+ },
+ {
+ "epoch": 0.47619047619047616,
+ "eval_loss": 0.09274312108755112,
+ "eval_runtime": 113.479,
+ "eval_samples_per_second": 26.895,
+ "eval_steps_per_second": 0.211,
+ "eval_sts-test_pearson_cosine": 0.8814996628266308,
+ "eval_sts-test_pearson_dot": 0.8647617194348185,
+ "eval_sts-test_pearson_euclidean": 0.9116395612568413,
+ "eval_sts-test_pearson_manhattan": 0.9121591417317261,
+ "eval_sts-test_pearson_max": 0.9121591417317261,
+ "eval_sts-test_spearman_cosine": 0.9087614932582961,
+ "eval_sts-test_spearman_dot": 0.8732032149869635,
+ "eval_sts-test_spearman_euclidean": 0.9101066714244602,
+ "eval_sts-test_spearman_manhattan": 0.9099515188012163,
+ "eval_sts-test_spearman_max": 0.9101066714244602,
+ "step": 60
+ },
+ {
+ "epoch": 0.48412698412698413,
+ "grad_norm": 2.7175261974334717,
+ "learning_rate": 3.3888888888888884e-05,
+ "loss": 0.297,
+ "step": 61
+ },
+ {
+ "epoch": 0.49206349206349204,
+ "grad_norm": 2.7492423057556152,
+ "learning_rate": 3.444444444444444e-05,
+ "loss": 0.2906,
+ "step": 62
+ },
+ {
+ "epoch": 0.5,
+ "grad_norm": 2.815702438354492,
+ "learning_rate": 3.5e-05,
+ "loss": 0.2914,
+ "step": 63
+ },
+ {
+ "epoch": 0.5079365079365079,
+ "grad_norm": 2.9056921005249023,
+ "learning_rate": 3.499798538091195e-05,
+ "loss": 0.2669,
+ "step": 64
+ },
+ {
+ "epoch": 0.5158730158730159,
+ "grad_norm": 2.832461357116699,
+ "learning_rate": 3.4991942080268184e-05,
+ "loss": 0.2723,
+ "step": 65
+ },
+ {
+ "epoch": 0.5158730158730159,
+ "eval_loss": 0.09455278515815735,
+ "eval_runtime": 113.5618,
+ "eval_samples_per_second": 26.875,
+ "eval_steps_per_second": 0.211,
+ "eval_sts-test_pearson_cosine": 0.8827592572843797,
+ "eval_sts-test_pearson_dot": 0.8655702748779494,
+ "eval_sts-test_pearson_euclidean": 0.9124138196335778,
+ "eval_sts-test_pearson_manhattan": 0.9124858955018784,
+ "eval_sts-test_pearson_max": 0.9124858955018784,
+ "eval_sts-test_spearman_cosine": 0.9092536676310787,
+ "eval_sts-test_spearman_dot": 0.87468645079452,
+ "eval_sts-test_spearman_euclidean": 0.910149408879089,
+ "eval_sts-test_spearman_manhattan": 0.9104867886387189,
+ "eval_sts-test_spearman_max": 0.9104867886387189,
+ "step": 65
+ },
+ {
+ "epoch": 0.5238095238095238,
+ "grad_norm": 2.834491729736328,
+ "learning_rate": 3.4981871767775944e-05,
+ "loss": 0.3194,
+ "step": 66
+ },
+ {
+ "epoch": 0.5317460317460317,
+ "grad_norm": 3.168403148651123,
+ "learning_rate": 3.496777722576811e-05,
+ "loss": 0.3585,
+ "step": 67
+ },
+ {
+ "epoch": 0.5396825396825397,
+ "grad_norm": 2.8590433597564697,
+ "learning_rate": 3.494966234843439e-05,
+ "loss": 0.2843,
+ "step": 68
+ },
+ {
+ "epoch": 0.5476190476190477,
+ "grad_norm": 2.4585649967193604,
+ "learning_rate": 3.4927532140745435e-05,
+ "loss": 0.1916,
+ "step": 69
+ },
+ {
+ "epoch": 0.5555555555555556,
+ "grad_norm": 3.0862460136413574,
+ "learning_rate": 3.490139271707e-05,
+ "loss": 0.351,
+ "step": 70
+ },
+ {
+ "epoch": 0.5555555555555556,
+ "eval_loss": 0.09706800431013107,
+ "eval_runtime": 113.496,
+ "eval_samples_per_second": 26.891,
+ "eval_steps_per_second": 0.211,
+ "eval_sts-test_pearson_cosine": 0.8816183440112817,
+ "eval_sts-test_pearson_dot": 0.863407251078466,
+ "eval_sts-test_pearson_euclidean": 0.9125994563651346,
+ "eval_sts-test_pearson_manhattan": 0.9121928260729458,
+ "eval_sts-test_pearson_max": 0.9125994563651346,
+ "eval_sts-test_spearman_cosine": 0.9103631836274073,
+ "eval_sts-test_spearman_dot": 0.8729154643762167,
+ "eval_sts-test_spearman_euclidean": 0.9106339755374351,
+ "eval_sts-test_spearman_manhattan": 0.9104940383430642,
+ "eval_sts-test_spearman_max": 0.9106339755374351,
+ "step": 70
+ },
+ {
+ "epoch": 0.5634920634920635,
+ "grad_norm": 2.948397636413574,
+ "learning_rate": 3.48712512994856e-05,
+ "loss": 0.3105,
+ "step": 71
+ },
+ {
+ "epoch": 0.5714285714285714,
+ "grad_norm": 2.904085159301758,
+ "learning_rate": 3.4837116215783116e-05,
+ "loss": 0.2847,
+ "step": 72
+ },
+ {
+ "epoch": 0.5793650793650794,
+ "grad_norm": 2.6948978900909424,
+ "learning_rate": 3.4798996897165926e-05,
+ "loss": 0.2641,
+ "step": 73
+ },
+ {
+ "epoch": 0.5873015873015873,
+ "grad_norm": 3.068554162979126,
+ "learning_rate": 3.475690387564411e-05,
+ "loss": 0.3305,
+ "step": 74
+ },
+ {
+ "epoch": 0.5952380952380952,
+ "grad_norm": 2.6903178691864014,
+ "learning_rate": 3.471084878112459e-05,
+ "loss": 0.2461,
+ "step": 75
+ },
+ {
+ "epoch": 0.5952380952380952,
+ "eval_loss": 0.09646341949701309,
+ "eval_runtime": 113.5342,
+ "eval_samples_per_second": 26.882,
+ "eval_steps_per_second": 0.211,
+ "eval_sts-test_pearson_cosine": 0.879746728283104,
+ "eval_sts-test_pearson_dot": 0.85998475002447,
+ "eval_sts-test_pearson_euclidean": 0.9117602609729114,
+ "eval_sts-test_pearson_manhattan": 0.9111396965114745,
+ "eval_sts-test_pearson_max": 0.9117602609729114,
+ "eval_sts-test_spearman_cosine": 0.9096228207862964,
+ "eval_sts-test_spearman_dot": 0.8689540379665887,
+ "eval_sts-test_spearman_euclidean": 0.9099527718365351,
+ "eval_sts-test_spearman_manhattan": 0.9098263942743658,
+ "eval_sts-test_spearman_max": 0.9099527718365351,
+ "step": 75
+ },
+ {
+ "epoch": 0.6031746031746031,
+ "grad_norm": 2.81105637550354,
+ "learning_rate": 3.4660844338197886e-05,
+ "loss": 0.259,
+ "step": 76
+ },
+ {
+ "epoch": 0.6111111111111112,
+ "grad_norm": 2.629365921020508,
+ "learning_rate": 3.460690436262242e-05,
+ "loss": 0.2506,
+ "step": 77
+ },
+ {
+ "epoch": 0.6190476190476191,
+ "grad_norm": 2.6665291786193848,
+ "learning_rate": 3.454904375750738e-05,
+ "loss": 0.2832,
+ "step": 78
+ },
+ {
+ "epoch": 0.626984126984127,
+ "grad_norm": 2.916246175765991,
+ "learning_rate": 3.448727850919509e-05,
+ "loss": 0.3322,
+ "step": 79
+ },
+ {
+ "epoch": 0.6349206349206349,
+ "grad_norm": 2.4879415035247803,
+ "learning_rate": 3.442162568284416e-05,
+ "loss": 0.2533,
+ "step": 80
+ },
+ {
+ "epoch": 0.6349206349206349,
+ "eval_loss": 0.10007175803184509,
+ "eval_runtime": 113.3295,
+ "eval_samples_per_second": 26.93,
+ "eval_steps_per_second": 0.212,
+ "eval_sts-test_pearson_cosine": 0.8791063595826033,
+ "eval_sts-test_pearson_dot": 0.8594763353424633,
+ "eval_sts-test_pearson_euclidean": 0.9109289279488433,
+ "eval_sts-test_pearson_manhattan": 0.9101783025650423,
+ "eval_sts-test_pearson_max": 0.9109289279488433,
+ "eval_sts-test_spearman_cosine": 0.9088725211378084,
+ "eval_sts-test_spearman_dot": 0.8680133664521414,
+ "eval_sts-test_spearman_euclidean": 0.9091277823327847,
+ "eval_sts-test_spearman_manhattan": 0.9091334209917199,
+ "eval_sts-test_spearman_max": 0.9091334209917199,
+ "step": 80
+ },
+ {
+ "epoch": 0.6428571428571429,
+ "grad_norm": 2.6558098793029785,
+ "learning_rate": 3.435210341771455e-05,
+ "loss": 0.2349,
+ "step": 81
+ },
+ {
+ "epoch": 0.6507936507936508,
+ "grad_norm": 2.690624475479126,
+ "learning_rate": 3.427873092215584e-05,
+ "loss": 0.2748,
+ "step": 82
+ },
+ {
+ "epoch": 0.6587301587301587,
+ "grad_norm": 2.451726198196411,
+ "learning_rate": 3.420152846830015e-05,
+ "loss": 0.223,
+ "step": 83
+ },
+ {
+ "epoch": 0.6666666666666666,
+ "grad_norm": 2.6376216411590576,
+ "learning_rate": 3.412051738646116e-05,
+ "loss": 0.2416,
+ "step": 84
+ },
+ {
+ "epoch": 0.6746031746031746,
+ "grad_norm": 2.8111939430236816,
+ "learning_rate": 3.403572005924071e-05,
+ "loss": 0.2637,
+ "step": 85
+ },
+ {
+ "epoch": 0.6746031746031746,
+ "eval_loss": 0.10335631668567657,
+ "eval_runtime": 113.4166,
+ "eval_samples_per_second": 26.91,
+ "eval_steps_per_second": 0.212,
+ "eval_sts-test_pearson_cosine": 0.8779388329417936,
+ "eval_sts-test_pearson_dot": 0.8608493769098732,
+ "eval_sts-test_pearson_euclidean": 0.9095252832629803,
+ "eval_sts-test_pearson_manhattan": 0.9090695197203245,
+ "eval_sts-test_pearson_max": 0.9095252832629803,
+ "eval_sts-test_spearman_cosine": 0.9082387985252446,
+ "eval_sts-test_spearman_dot": 0.8707913010030126,
+ "eval_sts-test_spearman_euclidean": 0.9083403391373417,
+ "eval_sts-test_spearman_manhattan": 0.9084906586243554,
+ "eval_sts-test_spearman_max": 0.9084906586243554,
+ "step": 85
+ },
+ {
+ "epoch": 0.6825396825396826,
+ "grad_norm": 2.859077215194702,
+ "learning_rate": 3.394715991534474e-05,
+ "loss": 0.2856,
+ "step": 86
+ },
+ {
+ "epoch": 0.6904761904761905,
+ "grad_norm": 2.433560371398926,
+ "learning_rate": 3.385486142311011e-05,
+ "loss": 0.2476,
+ "step": 87
+ },
+ {
+ "epoch": 0.6984126984126984,
+ "grad_norm": 2.6791834831237793,
+ "learning_rate": 3.375885008374425e-05,
+ "loss": 0.2427,
+ "step": 88
+ },
+ {
+ "epoch": 0.7063492063492064,
+ "grad_norm": 2.6574490070343018,
+ "learning_rate": 3.365915242427944e-05,
+ "loss": 0.2614,
+ "step": 89
+ },
+ {
+ "epoch": 0.7142857142857143,
+ "grad_norm": 2.5747766494750977,
+ "learning_rate": 3.355579599024361e-05,
+ "loss": 0.26,
+ "step": 90
+ },
+ {
+ "epoch": 0.7142857142857143,
+ "eval_loss": 0.10315236449241638,
+ "eval_runtime": 113.4044,
+ "eval_samples_per_second": 26.913,
+ "eval_steps_per_second": 0.212,
+ "eval_sts-test_pearson_cosine": 0.8793659111713259,
+ "eval_sts-test_pearson_dot": 0.8641308245754843,
+ "eval_sts-test_pearson_euclidean": 0.9095961426218309,
+ "eval_sts-test_pearson_manhattan": 0.9093977382821561,
+ "eval_sts-test_pearson_max": 0.9095961426218309,
+ "eval_sts-test_spearman_cosine": 0.9087700407492219,
+ "eval_sts-test_spearman_dot": 0.8756799287974091,
+ "eval_sts-test_spearman_euclidean": 0.9084703415516837,
+ "eval_sts-test_spearman_manhattan": 0.908642678659302,
+ "eval_sts-test_spearman_max": 0.9087700407492219,
+ "step": 90
+ },
+ {
+ "epoch": 0.7222222222222222,
+ "grad_norm": 2.3570423126220703,
+ "learning_rate": 3.3448809338049753e-05,
+ "loss": 0.1862,
+ "step": 91
+ },
+ {
+ "epoch": 0.7301587301587301,
+ "grad_norm": 2.486401319503784,
+ "learning_rate": 3.333822202710612e-05,
+ "loss": 0.267,
+ "step": 92
+ },
+ {
+ "epoch": 0.7380952380952381,
+ "grad_norm": 2.436018705368042,
+ "learning_rate": 3.322406461164916e-05,
+ "loss": 0.2175,
+ "step": 93
+ },
+ {
+ "epoch": 0.746031746031746,
+ "grad_norm": 2.2685282230377197,
+ "learning_rate": 3.310636863230172e-05,
+ "loss": 0.2079,
+ "step": 94
+ },
+ {
+ "epoch": 0.753968253968254,
+ "grad_norm": 2.564317464828491,
+ "learning_rate": 3.2985166607358637e-05,
+ "loss": 0.2562,
+ "step": 95
+ },
+ {
+ "epoch": 0.753968253968254,
+ "eval_loss": 0.09990089386701584,
+ "eval_runtime": 113.4641,
+ "eval_samples_per_second": 26.898,
+ "eval_steps_per_second": 0.212,
+ "eval_sts-test_pearson_cosine": 0.8795677625682299,
+ "eval_sts-test_pearson_dot": 0.8659349142313639,
+ "eval_sts-test_pearson_euclidean": 0.9099982334792637,
+ "eval_sts-test_pearson_manhattan": 0.9099098081017423,
+ "eval_sts-test_pearson_max": 0.9099982334792637,
+ "eval_sts-test_spearman_cosine": 0.9085842782631862,
+ "eval_sts-test_spearman_dot": 0.8767303751560495,
+ "eval_sts-test_spearman_euclidean": 0.9083879992307234,
+ "eval_sts-test_spearman_manhattan": 0.9084153422514337,
+ "eval_sts-test_spearman_max": 0.9085842782631862,
+ "step": 95
+ },
+ {
+ "epoch": 0.7619047619047619,
+ "grad_norm": 2.594452381134033,
+ "learning_rate": 3.286049202380226e-05,
+ "loss": 0.2516,
+ "step": 96
+ },
+ {
+ "epoch": 0.7698412698412699,
+ "grad_norm": 2.6356875896453857,
+ "learning_rate": 3.273237932805032e-05,
+ "loss": 0.2956,
+ "step": 97
+ },
+ {
+ "epoch": 0.7777777777777778,
+ "grad_norm": 2.6263818740844727,
+ "learning_rate": 3.260086391643865e-05,
+ "loss": 0.2733,
+ "step": 98
+ },
+ {
+ "epoch": 0.7857142857142857,
+ "grad_norm": 2.636934757232666,
+ "learning_rate": 3.246598212544159e-05,
+ "loss": 0.2919,
+ "step": 99
+ },
+ {
+ "epoch": 0.7936507936507936,
+ "grad_norm": 2.710754632949829,
+ "learning_rate": 3.2327771221632486e-05,
+ "loss": 0.2997,
+ "step": 100
+ },
+ {
+ "epoch": 0.7936507936507936,
+ "eval_loss": 0.10318750143051147,
+ "eval_runtime": 113.3236,
+ "eval_samples_per_second": 26.932,
+ "eval_steps_per_second": 0.212,
+ "eval_sts-test_pearson_cosine": 0.8759700287215791,
+ "eval_sts-test_pearson_dot": 0.8604257798750153,
+ "eval_sts-test_pearson_euclidean": 0.9085309767938886,
+ "eval_sts-test_pearson_manhattan": 0.9086607224369581,
+ "eval_sts-test_pearson_max": 0.9086607224369581,
+ "eval_sts-test_spearman_cosine": 0.9069411909499396,
+ "eval_sts-test_spearman_dot": 0.8719651713405704,
+ "eval_sts-test_spearman_euclidean": 0.9080629260679763,
+ "eval_sts-test_spearman_manhattan": 0.907642129957113,
+ "eval_sts-test_spearman_max": 0.9080629260679763,
+ "step": 100
+ },
+ {
+ "epoch": 0.8015873015873016,
+ "grad_norm": 2.516517162322998,
+ "learning_rate": 3.218626939138736e-05,
+ "loss": 0.2276,
+ "step": 101
+ },
+ {
+ "epoch": 0.8095238095238095,
+ "grad_norm": 2.6900599002838135,
+ "learning_rate": 3.204151573033428e-05,
+ "loss": 0.2582,
+ "step": 102
+ },
+ {
+ "epoch": 0.8174603174603174,
+ "grad_norm": 2.5696845054626465,
+ "learning_rate": 3.189355023255171e-05,
+ "loss": 0.2559,
+ "step": 103
+ },
+ {
+ "epoch": 0.8253968253968254,
+ "grad_norm": 2.7647061347961426,
+ "learning_rate": 3.174241377951843e-05,
+ "loss": 0.2864,
+ "step": 104
+ },
+ {
+ "epoch": 0.8333333333333334,
+ "grad_norm": 2.7451863288879395,
+ "learning_rate": 3.1588148128818425e-05,
+ "loss": 0.2839,
+ "step": 105
+ },
+ {
+ "epoch": 0.8333333333333334,
+ "eval_loss": 0.10743121802806854,
+ "eval_runtime": 113.407,
+ "eval_samples_per_second": 26.912,
+ "eval_steps_per_second": 0.212,
+ "eval_sts-test_pearson_cosine": 0.8767875416189156,
+ "eval_sts-test_pearson_dot": 0.8604239654952288,
+ "eval_sts-test_pearson_euclidean": 0.9092313999029703,
+ "eval_sts-test_pearson_manhattan": 0.9094243708597547,
+ "eval_sts-test_pearson_max": 0.9094243708597547,
+ "eval_sts-test_spearman_cosine": 0.9075548649973998,
+ "eval_sts-test_spearman_dot": 0.8721893304088798,
+ "eval_sts-test_spearman_euclidean": 0.9081462081654257,
+ "eval_sts-test_spearman_manhattan": 0.9082743310267892,
+ "eval_sts-test_spearman_max": 0.9082743310267892,
+ "step": 105
+ },
+ {
+ "epoch": 0.8412698412698413,
+ "grad_norm": 2.5484859943389893,
+ "learning_rate": 3.1430795902603625e-05,
+ "loss": 0.2549,
+ "step": 106
+ },
+ {
+ "epoch": 0.8492063492063492,
+ "grad_norm": 2.6235854625701904,
+ "learning_rate": 3.127040057581783e-05,
+ "loss": 0.2826,
+ "step": 107
+ },
+ {
+ "epoch": 0.8571428571428571,
+ "grad_norm": 2.5238397121429443,
+ "learning_rate": 3.110700646418496e-05,
+ "loss": 0.2334,
+ "step": 108
+ },
+ {
+ "epoch": 0.8650793650793651,
+ "grad_norm": 2.740260362625122,
+ "learning_rate": 3.0940658711965065e-05,
+ "loss": 0.2632,
+ "step": 109
+ },
+ {
+ "epoch": 0.873015873015873,
+ "grad_norm": 2.510667562484741,
+ "learning_rate": 3.077140327948137e-05,
+ "loss": 0.2255,
+ "step": 110
+ },
+ {
+ "epoch": 0.873015873015873,
+ "eval_loss": 0.10900076478719711,
+ "eval_runtime": 113.5509,
+ "eval_samples_per_second": 26.878,
+ "eval_steps_per_second": 0.211,
+ "eval_sts-test_pearson_cosine": 0.8751607943383615,
+ "eval_sts-test_pearson_dot": 0.8589309347875178,
+ "eval_sts-test_pearson_euclidean": 0.9068514756772725,
+ "eval_sts-test_pearson_manhattan": 0.9076530218955405,
+ "eval_sts-test_pearson_max": 0.9076530218955405,
+ "eval_sts-test_spearman_cosine": 0.9056104674412049,
+ "eval_sts-test_spearman_dot": 0.8704153456560634,
+ "eval_sts-test_spearman_euclidean": 0.9057139771088031,
+ "eval_sts-test_spearman_manhattan": 0.9064273122153821,
+ "eval_sts-test_spearman_max": 0.9064273122153821,
+ "step": 110
+ },
+ {
+ "epoch": 0.8809523809523809,
+ "grad_norm": 2.552116632461548,
+ "learning_rate": 3.059928693042189e-05,
+ "loss": 0.2589,
+ "step": 111
+ },
+ {
+ "epoch": 0.8888888888888888,
+ "grad_norm": 2.5401554107666016,
+ "learning_rate": 3.0424357218919025e-05,
+ "loss": 0.2569,
+ "step": 112
+ },
+ {
+ "epoch": 0.8968253968253969,
+ "grad_norm": 2.695404291152954,
+ "learning_rate": 3.0246662476410844e-05,
+ "loss": 0.2797,
+ "step": 113
+ },
+ {
+ "epoch": 0.9047619047619048,
+ "grad_norm": 2.8901283740997314,
+ "learning_rate": 3.0066251798287526e-05,
+ "loss": 0.2742,
+ "step": 114
+ },
+ {
+ "epoch": 0.9126984126984127,
+ "grad_norm": 2.4771928787231445,
+ "learning_rate": 2.9883175030326795e-05,
+ "loss": 0.2295,
+ "step": 115
+ },
+ {
+ "epoch": 0.9126984126984127,
+ "eval_loss": 0.10703907907009125,
+ "eval_runtime": 113.456,
+ "eval_samples_per_second": 26.9,
+ "eval_steps_per_second": 0.212,
+ "eval_sts-test_pearson_cosine": 0.8721877249457892,
+ "eval_sts-test_pearson_dot": 0.8560616623137738,
+ "eval_sts-test_pearson_euclidean": 0.9030366016666834,
+ "eval_sts-test_pearson_manhattan": 0.9045537484069119,
+ "eval_sts-test_pearson_max": 0.9045537484069119,
+ "eval_sts-test_spearman_cosine": 0.9013517136508925,
+ "eval_sts-test_spearman_dot": 0.8659265703821992,
+ "eval_sts-test_spearman_euclidean": 0.9015738141611778,
+ "eval_sts-test_spearman_manhattan": 0.9030298859530699,
+ "eval_sts-test_spearman_max": 0.9030298859530699,
+ "step": 115
+ },
+ {
+ "epoch": 0.9206349206349206,
+ "grad_norm": 2.4024698734283447,
+ "learning_rate": 2.969748275492197e-05,
+ "loss": 0.2047,
+ "step": 116
+ },
+ {
+ "epoch": 0.9285714285714286,
+ "grad_norm": 2.7750799655914307,
+ "learning_rate": 2.9509226277106527e-05,
+ "loss": 0.2577,
+ "step": 117
+ },
+ {
+ "epoch": 0.9365079365079365,
+ "grad_norm": 2.545588731765747,
+ "learning_rate": 2.9318457610379043e-05,
+ "loss": 0.2614,
+ "step": 118
+ },
+ {
+ "epoch": 0.9444444444444444,
+ "grad_norm": 2.685835123062134,
+ "learning_rate": 2.9125229462332293e-05,
+ "loss": 0.2722,
+ "step": 119
+ },
+ {
+ "epoch": 0.9523809523809523,
+ "grad_norm": 2.3174188137054443,
+ "learning_rate": 2.892959522009068e-05,
+ "loss": 0.1927,
+ "step": 120
+ },
+ {
+ "epoch": 0.9523809523809523,
+ "eval_loss": 0.10235559195280075,
+ "eval_runtime": 113.4984,
+ "eval_samples_per_second": 26.89,
+ "eval_steps_per_second": 0.211,
+ "eval_sts-test_pearson_cosine": 0.872488088039419,
+ "eval_sts-test_pearson_dot": 0.8563761302721377,
+ "eval_sts-test_pearson_euclidean": 0.9034767476820997,
+ "eval_sts-test_pearson_manhattan": 0.9044620383979292,
+ "eval_sts-test_pearson_max": 0.9044620383979292,
+ "eval_sts-test_spearman_cosine": 0.9008235592639511,
+ "eval_sts-test_spearman_dot": 0.864130657511301,
+ "eval_sts-test_spearman_euclidean": 0.9016059455668568,
+ "eval_sts-test_spearman_manhattan": 0.9027126890123276,
+ "eval_sts-test_spearman_max": 0.9027126890123276,
+ "step": 120
+ },
+ {
+ "epoch": 0.9603174603174603,
+ "grad_norm": 2.8979287147521973,
+ "learning_rate": 2.8731608935559857e-05,
+ "loss": 0.2649,
+ "step": 121
+ },
+ {
+ "epoch": 0.9682539682539683,
+ "grad_norm": 2.485367774963379,
+ "learning_rate": 2.8531325310492677e-05,
+ "loss": 0.2386,
+ "step": 122
+ },
+ {
+ "epoch": 0.9761904761904762,
+ "grad_norm": 2.662865400314331,
+ "learning_rate": 2.8328799681375657e-05,
+ "loss": 0.2801,
+ "step": 123
+ },
+ {
+ "epoch": 0.9841269841269841,
+ "grad_norm": 2.693284511566162,
+ "learning_rate": 2.812408800413997e-05,
+ "loss": 0.2583,
+ "step": 124
+ },
+ {
+ "epoch": 0.9920634920634921,
+ "grad_norm": 3.0903170108795166,
+ "learning_rate": 2.79172468387014e-05,
+ "loss": 0.3076,
+ "step": 125
+ },
+ {
+ "epoch": 0.9920634920634921,
+ "eval_loss": 0.09488630294799805,
+ "eval_runtime": 113.5341,
+ "eval_samples_per_second": 26.882,
+ "eval_steps_per_second": 0.211,
+ "eval_sts-test_pearson_cosine": 0.8742114661417826,
+ "eval_sts-test_pearson_dot": 0.8564803125306033,
+ "eval_sts-test_pearson_euclidean": 0.905509251442753,
+ "eval_sts-test_pearson_manhattan": 0.9061976864024648,
+ "eval_sts-test_pearson_max": 0.9061976864024648,
+ "eval_sts-test_spearman_cosine": 0.901563655624842,
+ "eval_sts-test_spearman_dot": 0.8618259675496438,
+ "eval_sts-test_spearman_euclidean": 0.90313473815851,
+ "eval_sts-test_spearman_manhattan": 0.9041778900615435,
+ "eval_sts-test_spearman_max": 0.9041778900615435,
+ "step": 125
+ },
+ {
+ "epoch": 1.0,
+ "grad_norm": NaN,
+ "learning_rate": 2.79172468387014e-05,
+ "loss": 0.5477,
+ "step": 126
+ },
+ {
+ "epoch": 1.007936507936508,
+ "grad_norm": 0.982935905456543,
+ "learning_rate": 2.7708333333333334e-05,
+ "loss": 0.0031,
+ "step": 127
+ },
+ {
+ "epoch": 1.0158730158730158,
+ "grad_norm": 7.32542133619063e-08,
+ "learning_rate": 2.7497405208877213e-05,
+ "loss": 0.0,
+ "step": 128
+ },
+ {
+ "epoch": 1.0238095238095237,
+ "grad_norm": 0.0,
+ "learning_rate": 2.7284520742794878e-05,
+ "loss": 0.0,
+ "step": 129
+ },
+ {
+ "epoch": 1.0317460317460319,
+ "grad_norm": 0.0,
+ "learning_rate": 2.706973875306696e-05,
+ "loss": 0.0,
+ "step": 130
+ },
+ {
+ "epoch": 1.0317460317460319,
+ "eval_loss": 0.09545031189918518,
+ "eval_runtime": 114.0298,
+ "eval_samples_per_second": 26.765,
+ "eval_steps_per_second": 0.21,
+ "eval_sts-test_pearson_cosine": 0.874702030760496,
+ "eval_sts-test_pearson_dot": 0.8555439931828537,
+ "eval_sts-test_pearson_euclidean": 0.9065531106539271,
+ "eval_sts-test_pearson_manhattan": 0.9071871020121037,
+ "eval_sts-test_pearson_max": 0.9071871020121037,
+ "eval_sts-test_spearman_cosine": 0.9021036690960521,
+ "eval_sts-test_spearman_dot": 0.8598106392441436,
+ "eval_sts-test_spearman_euclidean": 0.9043656663543417,
+ "eval_sts-test_spearman_manhattan": 0.9048875555646884,
+ "eval_sts-test_spearman_max": 0.9048875555646884,
+ "step": 130
+ },
+ {
+ "epoch": 1.0396825396825398,
+ "grad_norm": 0.0,
+ "learning_rate": 2.6853118581942095e-05,
+ "loss": 0.0,
+ "step": 131
+ },
+ {
+ "epoch": 1.0476190476190477,
+ "grad_norm": 0.0,
+ "learning_rate": 2.66347200795412e-05,
+ "loss": 0.0,
+ "step": 132
+ },
+ {
+ "epoch": 1.0555555555555556,
+ "grad_norm": 0.0,
+ "learning_rate": 2.6414603587321415e-05,
+ "loss": 0.0,
+ "step": 133
+ },
+ {
+ "epoch": 1.0634920634920635,
+ "grad_norm": 0.0,
+ "learning_rate": 2.6192829921404365e-05,
+ "loss": 0.0,
+ "step": 134
+ },
+ {
+ "epoch": 1.0714285714285714,
+ "grad_norm": 0.0,
+ "learning_rate": 2.596946035577322e-05,
+ "loss": 0.0,
+ "step": 135
+ },
+ {
+ "epoch": 1.0714285714285714,
+ "eval_loss": 0.09676523506641388,
+ "eval_runtime": 113.8781,
+ "eval_samples_per_second": 26.801,
+ "eval_steps_per_second": 0.211,
+ "eval_sts-test_pearson_cosine": 0.8747276197594331,
+ "eval_sts-test_pearson_dot": 0.8544620877290974,
+ "eval_sts-test_pearson_euclidean": 0.9070156259790649,
+ "eval_sts-test_pearson_manhattan": 0.9076369642785601,
+ "eval_sts-test_pearson_max": 0.9076369642785601,
+ "eval_sts-test_spearman_cosine": 0.9022577924402841,
+ "eval_sts-test_spearman_dot": 0.8575864120504681,
+ "eval_sts-test_spearman_euclidean": 0.9046890144393489,
+ "eval_sts-test_spearman_manhattan": 0.9049111394794415,
+ "eval_sts-test_spearman_max": 0.9049111394794415,
+ "step": 135
+ },
+ {
+ "epoch": 1.0793650793650793,
+ "grad_norm": 0.0,
+ "learning_rate": 2.5744556605343263e-05,
+ "loss": 0.0,
+ "step": 136
+ },
+ {
+ "epoch": 1.0873015873015872,
+ "grad_norm": 0.0,
+ "learning_rate": 2.5518180808910628e-05,
+ "loss": 0.0,
+ "step": 137
+ },
+ {
+ "epoch": 1.0952380952380953,
+ "grad_norm": 0.0,
+ "learning_rate": 2.5290395511983987e-05,
+ "loss": 0.0,
+ "step": 138
+ },
+ {
+ "epoch": 1.1031746031746033,
+ "grad_norm": 0.0,
+ "learning_rate": 2.5061263649503735e-05,
+ "loss": 0.0,
+ "step": 139
+ },
+ {
+ "epoch": 1.1111111111111112,
+ "grad_norm": 0.0,
+ "learning_rate": 2.4830848528453706e-05,
+ "loss": 0.0,
+ "step": 140
+ },
+ {
+ "epoch": 1.1111111111111112,
+ "eval_loss": 0.09783273935317993,
+ "eval_runtime": 113.4913,
+ "eval_samples_per_second": 26.892,
+ "eval_steps_per_second": 0.211,
+ "eval_sts-test_pearson_cosine": 0.8746125563575086,
+ "eval_sts-test_pearson_dot": 0.8536822328210044,
+ "eval_sts-test_pearson_euclidean": 0.9071769017286274,
+ "eval_sts-test_pearson_manhattan": 0.9077849350046876,
+ "eval_sts-test_pearson_max": 0.9077849350046876,
+ "eval_sts-test_spearman_cosine": 0.9023702076088993,
+ "eval_sts-test_spearman_dot": 0.8566226041338817,
+ "eval_sts-test_spearman_euclidean": 0.9049979121752795,
+ "eval_sts-test_spearman_manhattan": 0.9049124372660218,
+ "eval_sts-test_spearman_max": 0.9049979121752795,
+ "step": 140
+ },
+ {
1496
+ "epoch": 1.119047619047619,
1497
+ "grad_norm": 0.0,
1498
+ "learning_rate": 2.4599213810370067e-05,
1499
+ "loss": 0.0,
1500
+ "step": 141
1501
+ },
1502
+ {
1503
+ "epoch": 1.126984126984127,
1504
+ "grad_norm": 0.0,
1505
+ "learning_rate": 2.4366423493752155e-05,
1506
+ "loss": 0.0,
1507
+ "step": 142
1508
+ },
1509
+ {
1510
+ "epoch": 1.1349206349206349,
1511
+ "grad_norm": 0.0,
1512
+ "learning_rate": 2.4132541896380374e-05,
1513
+ "loss": 0.0,
1514
+ "step": 143
1515
+ },
1516
+ {
1517
+ "epoch": 1.1428571428571428,
1518
+ "grad_norm": 0.0,
1519
+ "learning_rate": 2.3897633637545755e-05,
1520
+ "loss": 0.0,
1521
+ "step": 144
1522
+ },
1523
+ {
1524
+ "epoch": 1.1507936507936507,
1525
+ "grad_norm": 0.0,
1526
+ "learning_rate": 2.366176362019625e-05,
1527
+ "loss": 0.0,
1528
+ "step": 145
1529
+ },
1530
+ {
1531
+ "epoch": 1.1507936507936507,
1532
+ "eval_loss": 0.09863892197608948,
1533
+ "eval_runtime": 113.5886,
1534
+ "eval_samples_per_second": 26.869,
1535
+ "eval_steps_per_second": 0.211,
1536
+ "eval_sts-test_pearson_cosine": 0.8744759141501874,
1537
+ "eval_sts-test_pearson_dot": 0.8531679877756508,
1538
+ "eval_sts-test_pearson_euclidean": 0.9072042306444432,
1539
+ "eval_sts-test_pearson_manhattan": 0.907805846396001,
1540
+ "eval_sts-test_pearson_max": 0.907805846396001,
1541
+ "eval_sts-test_spearman_cosine": 0.9023667170105107,
1542
+ "eval_sts-test_spearman_dot": 0.8560352885793809,
1543
+ "eval_sts-test_spearman_euclidean": 0.9048520230631434,
1544
+ "eval_sts-test_spearman_manhattan": 0.9049029947498682,
1545
+ "eval_sts-test_spearman_max": 0.9049029947498682,
1546
+ "step": 145
1547
+ },
+ {
+ "epoch": 1.1587301587301588,
+ "grad_norm": 0.0,
+ "learning_rate": 2.342499701300467e-05,
+ "loss": 0.0,
+ "step": 146
+ },
+ {
+ "epoch": 1.1666666666666667,
+ "grad_norm": 0.0,
+ "learning_rate": 2.318739923236319e-05,
+ "loss": 0.0,
+ "step": 147
+ },
+ {
+ "epoch": 1.1746031746031746,
+ "grad_norm": 0.0,
+ "learning_rate": 2.29490359243094e-05,
+ "loss": 0.0,
+ "step": 148
+ },
+ {
+ "epoch": 1.1825396825396826,
+ "grad_norm": 0.0,
+ "learning_rate": 2.270997294638895e-05,
+ "loss": 0.0,
+ "step": 149
+ },
+ {
+ "epoch": 1.1904761904761905,
+ "grad_norm": 0.0,
+ "learning_rate": 2.24702763494597e-05,
+ "loss": 0.0,
+ "step": 150
+ },
+ {
+ "epoch": 1.1904761904761905,
+ "eval_loss": 0.09906759858131409,
+ "eval_runtime": 113.5604,
+ "eval_samples_per_second": 26.876,
+ "eval_steps_per_second": 0.211,
+ "eval_sts-test_pearson_cosine": 0.8744060293405248,
+ "eval_sts-test_pearson_dot": 0.8528829505611442,
+ "eval_sts-test_pearson_euclidean": 0.9072185184616055,
+ "eval_sts-test_pearson_manhattan": 0.9078100463432079,
+ "eval_sts-test_pearson_max": 0.9078100463432079,
+ "eval_sts-test_spearman_cosine": 0.9022761852087159,
+ "eval_sts-test_spearman_dot": 0.8555268694987131,
+ "eval_sts-test_spearman_euclidean": 0.9047365648087538,
+ "eval_sts-test_spearman_manhattan": 0.9048373894006686,
+ "eval_sts-test_spearman_max": 0.9048373894006686,
+ "step": 150
+ },
+ {
+ "epoch": 1.1984126984126984,
+ "grad_norm": 0.0,
+ "learning_rate": 2.2230012359442495e-05,
+ "loss": 0.0,
+ "step": 151
+ },
+ {
+ "epoch": 1.2063492063492063,
+ "grad_norm": 0.0,
+ "learning_rate": 2.1989247359023566e-05,
+ "loss": 0.0,
+ "step": 152
+ },
+ {
+ "epoch": 1.2142857142857142,
+ "grad_norm": 0.0,
+ "learning_rate": 2.174804786931362e-05,
+ "loss": 0.0,
+ "step": 153
+ },
+ {
+ "epoch": 1.2222222222222223,
+ "grad_norm": 0.0,
+ "learning_rate": 2.150648053146869e-05,
+ "loss": 0.0,
+ "step": 154
+ },
+ {
+ "epoch": 1.2301587301587302,
+ "grad_norm": 0.0,
+ "learning_rate": 2.126461208827777e-05,
+ "loss": 0.0,
+ "step": 155
+ },
+ {
+ "epoch": 1.2301587301587302,
+ "eval_loss": 0.09936919808387756,
+ "eval_runtime": 113.5013,
+ "eval_samples_per_second": 26.89,
+ "eval_steps_per_second": 0.211,
+ "eval_sts-test_pearson_cosine": 0.8743508789394699,
+ "eval_sts-test_pearson_dot": 0.8527007540947626,
+ "eval_sts-test_pearson_euclidean": 0.9072202919761476,
+ "eval_sts-test_pearson_manhattan": 0.9078091191041777,
+ "eval_sts-test_pearson_max": 0.9078091191041777,
+ "eval_sts-test_spearman_cosine": 0.9023025884529369,
+ "eval_sts-test_spearman_dot": 0.8550739867334323,
+ "eval_sts-test_spearman_euclidean": 0.9046202541246842,
+ "eval_sts-test_spearman_manhattan": 0.9049393775253795,
+ "eval_sts-test_spearman_max": 0.9049393775253795,
+ "step": 155
+ },
+ {
+ "epoch": 1.2380952380952381,
+ "grad_norm": 0.0,
+ "learning_rate": 2.102250936572247e-05,
+ "loss": 0.0,
+ "step": 156
+ },
+ {
+ "epoch": 1.246031746031746,
+ "grad_norm": 0.0,
+ "learning_rate": 2.0780239254513565e-05,
+ "loss": 0.0,
+ "step": 157
+ },
+ {
+ "epoch": 1.253968253968254,
+ "grad_norm": 0.0,
+ "learning_rate": 2.0537868691609745e-05,
+ "loss": 0.0,
+ "step": 158
+ },
+ {
+ "epoch": 1.2619047619047619,
+ "grad_norm": 0.0,
+ "learning_rate": 2.0295464641723583e-05,
+ "loss": 0.0,
+ "step": 159
+ },
+ {
+ "epoch": 1.2698412698412698,
+ "grad_norm": 0.0,
+ "learning_rate": 2.005309407881977e-05,
+ "loss": 0.0,
+ "step": 160
+ },
+ {
+ "epoch": 1.2698412698412698,
+ "eval_loss": 0.09954561293125153,
+ "eval_runtime": 113.8237,
+ "eval_samples_per_second": 26.813,
+ "eval_steps_per_second": 0.211,
+ "eval_sts-test_pearson_cosine": 0.8743232038160612,
+ "eval_sts-test_pearson_dot": 0.8526065970333739,
+ "eval_sts-test_pearson_euclidean": 0.9072152805749177,
+ "eval_sts-test_pearson_manhattan": 0.9078067984919014,
+ "eval_sts-test_pearson_max": 0.9078067984919014,
+ "eval_sts-test_spearman_cosine": 0.9022778857566487,
+ "eval_sts-test_spearman_dot": 0.8549482356889225,
+ "eval_sts-test_spearman_euclidean": 0.9046245971527523,
+ "eval_sts-test_spearman_manhattan": 0.9048685362785969,
+ "eval_sts-test_spearman_max": 0.9048685362785969,
+ "step": 160
+ },
+ {
+ "epoch": 1.2777777777777777,
+ "grad_norm": 0.0,
+ "learning_rate": 1.981082396761086e-05,
+ "loss": 0.0,
+ "step": 161
+ },
+ {
+ "epoch": 1.2857142857142858,
+ "grad_norm": 0.0,
+ "learning_rate": 1.956872124505556e-05,
+ "loss": 0.0,
+ "step": 162
+ },
+ {
+ "epoch": 1.2936507936507937,
+ "grad_norm": 0.0,
+ "learning_rate": 1.9326852801864646e-05,
+ "loss": 0.0,
+ "step": 163
+ },
+ {
+ "epoch": 1.3015873015873016,
+ "grad_norm": 0.0,
+ "learning_rate": 1.908528546401971e-05,
+ "loss": 0.0,
+ "step": 164
+ },
+ {
+ "epoch": 1.3095238095238095,
+ "grad_norm": 0.0,
+ "learning_rate": 1.8844085974309768e-05,
+ "loss": 0.0,
+ "step": 165
+ },
+ {
+ "epoch": 1.3095238095238095,
+ "eval_loss": 0.09964071214199066,
+ "eval_runtime": 113.5481,
+ "eval_samples_per_second": 26.878,
+ "eval_steps_per_second": 0.211,
+ "eval_sts-test_pearson_cosine": 0.874299456945445,
+ "eval_sts-test_pearson_dot": 0.8525374757432119,
+ "eval_sts-test_pearson_euclidean": 0.9072080565726848,
+ "eval_sts-test_pearson_manhattan": 0.907801820994878,
+ "eval_sts-test_pearson_max": 0.907801820994878,
+ "eval_sts-test_spearman_cosine": 0.9022729631178957,
+ "eval_sts-test_spearman_dot": 0.8546675557774756,
+ "eval_sts-test_spearman_euclidean": 0.9046749423218178,
+ "eval_sts-test_spearman_manhattan": 0.9047542415570032,
+ "eval_sts-test_spearman_max": 0.9047542415570032,
+ "step": 165
+ },
+ {
+ "epoch": 1.3174603174603174,
+ "grad_norm": 0.0,
+ "learning_rate": 1.8603320973890842e-05,
+ "loss": 0.0,
+ "step": 166
+ },
+ {
+ "epoch": 1.3253968253968254,
+ "grad_norm": 0.0,
+ "learning_rate": 1.836305698387363e-05,
+ "loss": 0.0,
+ "step": 167
+ },
+ {
+ "epoch": 1.3333333333333333,
+ "grad_norm": 0.0,
+ "learning_rate": 1.812336038694438e-05,
+ "loss": 0.0,
+ "step": 168
+ },
+ {
+ "epoch": 1.3412698412698412,
+ "grad_norm": 0.0,
+ "learning_rate": 1.7884297409023932e-05,
+ "loss": 0.0,
+ "step": 169
+ },
+ {
+ "epoch": 1.3492063492063493,
+ "grad_norm": 0.0,
+ "learning_rate": 1.7645934100970145e-05,
+ "loss": 0.0,
+ "step": 170
+ },
+ {
+ "epoch": 1.3492063492063493,
+ "eval_loss": 0.09969252347946167,
+ "eval_runtime": 113.9068,
+ "eval_samples_per_second": 26.794,
+ "eval_steps_per_second": 0.211,
+ "eval_sts-test_pearson_cosine": 0.8742791376017494,
+ "eval_sts-test_pearson_dot": 0.8524863854057724,
+ "eval_sts-test_pearson_euclidean": 0.9072037622211908,
+ "eval_sts-test_pearson_manhattan": 0.9077955012591168,
+ "eval_sts-test_pearson_max": 0.9077955012591168,
+ "eval_sts-test_spearman_cosine": 0.9022626703277757,
+ "eval_sts-test_spearman_dot": 0.8547057286034424,
+ "eval_sts-test_spearman_euclidean": 0.9046214198131936,
+ "eval_sts-test_spearman_manhattan": 0.904806332025263,
+ "eval_sts-test_spearman_max": 0.904806332025263,
+ "step": 170
+ },
+ {
+ "epoch": 1.3571428571428572,
+ "grad_norm": 0.0,
+ "learning_rate": 1.740833632032866e-05,
+ "loss": 0.0,
+ "step": 171
+ },
+ {
+ "epoch": 1.3650793650793651,
+ "grad_norm": 0.0,
+ "learning_rate": 1.717156971313708e-05,
+ "loss": 0.0,
+ "step": 172
+ },
+ {
+ "epoch": 1.373015873015873,
+ "grad_norm": 0.0,
+ "learning_rate": 1.6935699695787573e-05,
+ "loss": 0.0,
+ "step": 173
+ },
+ {
+ "epoch": 1.380952380952381,
+ "grad_norm": 0.0,
+ "learning_rate": 1.6700791436952954e-05,
+ "loss": 0.0,
+ "step": 174
+ },
+ {
+ "epoch": 1.3888888888888888,
+ "grad_norm": 0.0,
+ "learning_rate": 1.6466909839581176e-05,
+ "loss": 0.0,
+ "step": 175
+ },
+ {
+ "epoch": 1.3888888888888888,
+ "eval_loss": 0.09972869604825974,
+ "eval_runtime": 113.3507,
+ "eval_samples_per_second": 26.925,
+ "eval_steps_per_second": 0.212,
+ "eval_sts-test_pearson_cosine": 0.8742756437676581,
+ "eval_sts-test_pearson_dot": 0.852466650373626,
+ "eval_sts-test_pearson_euclidean": 0.9072071674153375,
+ "eval_sts-test_pearson_manhattan": 0.9077967797332845,
+ "eval_sts-test_pearson_max": 0.9077967797332845,
+ "eval_sts-test_spearman_cosine": 0.9022777962541261,
+ "eval_sts-test_spearman_dot": 0.8546733734414563,
+ "eval_sts-test_spearman_euclidean": 0.9045410465477347,
+ "eval_sts-test_spearman_manhattan": 0.9047774674616654,
+ "eval_sts-test_spearman_max": 0.9047774674616654,
+ "step": 175
+ },
+ {
+ "epoch": 1.3968253968253967,
+ "grad_norm": 0.0,
+ "learning_rate": 1.6234119522963267e-05,
+ "loss": 0.0,
+ "step": 176
+ },
+ {
+ "epoch": 1.4047619047619047,
+ "grad_norm": 0.0,
+ "learning_rate": 1.6002484804879622e-05,
+ "loss": 0.0,
+ "step": 177
+ },
+ {
+ "epoch": 1.4126984126984128,
+ "grad_norm": 0.0,
+ "learning_rate": 1.5772069683829603e-05,
+ "loss": 0.0,
+ "step": 178
+ },
+ {
+ "epoch": 1.4206349206349207,
+ "grad_norm": 0.0,
+ "learning_rate": 1.5542937821349347e-05,
+ "loss": 0.0,
+ "step": 179
+ },
+ {
+ "epoch": 1.4285714285714286,
+ "grad_norm": 0.0,
+ "learning_rate": 1.5315152524422703e-05,
+ "loss": 0.0,
+ "step": 180
+ },
+ {
+ "epoch": 1.4285714285714286,
+ "eval_loss": 0.09972812980413437,
+ "eval_runtime": 113.3751,
+ "eval_samples_per_second": 26.919,
+ "eval_steps_per_second": 0.212,
+ "eval_sts-test_pearson_cosine": 0.874267739156544,
+ "eval_sts-test_pearson_dot": 0.8524562621509095,
+ "eval_sts-test_pearson_euclidean": 0.9072067299928095,
+ "eval_sts-test_pearson_manhattan": 0.907793793015005,
+ "eval_sts-test_pearson_max": 0.907793793015005,
+ "eval_sts-test_spearman_cosine": 0.902276095706193,
+ "eval_sts-test_spearman_dot": 0.8546634834126889,
+ "eval_sts-test_spearman_euclidean": 0.9045989546799751,
+ "eval_sts-test_spearman_manhattan": 0.904807495558059,
+ "eval_sts-test_spearman_max": 0.904807495558059,
+ "step": 180
+ },
+ {
+ "epoch": 1.4365079365079365,
+ "grad_norm": 0.0,
+ "learning_rate": 1.508877672799007e-05,
+ "loss": 0.0,
+ "step": 181
+ },
+ {
+ "epoch": 1.4444444444444444,
+ "grad_norm": 0.0,
+ "learning_rate": 1.486387297756011e-05,
+ "loss": 0.0,
+ "step": 182
+ },
+ {
+ "epoch": 1.4523809523809523,
+ "grad_norm": 0.0,
+ "learning_rate": 1.4640503411928961e-05,
+ "loss": 0.0,
+ "step": 183
+ },
+ {
+ "epoch": 1.4603174603174602,
+ "grad_norm": 0.0,
+ "learning_rate": 1.4418729746011916e-05,
+ "loss": 0.0,
+ "step": 184
+ },
+ {
+ "epoch": 1.4682539682539681,
+ "grad_norm": 0.0,
+ "learning_rate": 1.4198613253792132e-05,
+ "loss": 0.0,
+ "step": 185
+ },
+ {
+ "epoch": 1.4682539682539681,
+ "eval_loss": 0.09976483881473541,
+ "eval_runtime": 113.7822,
+ "eval_samples_per_second": 26.823,
+ "eval_steps_per_second": 0.211,
+ "eval_sts-test_pearson_cosine": 0.8742870531942544,
+ "eval_sts-test_pearson_dot": 0.8524755889148932,
+ "eval_sts-test_pearson_euclidean": 0.907218239020885,
+ "eval_sts-test_pearson_manhattan": 0.9078067867399013,
+ "eval_sts-test_pearson_max": 0.9078067867399013,
+ "eval_sts-test_spearman_cosine": 0.9022592692319099,
+ "eval_sts-test_spearman_dot": 0.854695972828459,
+ "eval_sts-test_spearman_euclidean": 0.9046065623944117,
+ "eval_sts-test_spearman_manhattan": 0.9048098673749128,
+ "eval_sts-test_spearman_max": 0.9048098673749128,
+ "step": 185
+ },
+ {
+ "epoch": 1.4761904761904763,
+ "grad_norm": 0.0,
+ "learning_rate": 1.3980214751391232e-05,
+ "loss": 0.0,
+ "step": 186
+ },
+ {
+ "epoch": 1.4841269841269842,
+ "grad_norm": 0.0,
+ "learning_rate": 1.3763594580266373e-05,
+ "loss": 0.0,
+ "step": 187
+ },
+ {
+ "epoch": 1.492063492063492,
+ "grad_norm": 0.0,
+ "learning_rate": 1.354881259053846e-05,
+ "loss": 0.0,
+ "step": 188
+ },
+ {
+ "epoch": 1.5,
+ "grad_norm": 0.0,
+ "learning_rate": 1.3335928124456112e-05,
+ "loss": 0.0,
+ "step": 189
+ },
+ {
+ "epoch": 1.507936507936508,
+ "grad_norm": 0.0,
+ "learning_rate": 1.3125000000000002e-05,
+ "loss": 0.0,
+ "step": 190
+ },
+ {
+ "epoch": 1.507936507936508,
+ "eval_loss": 0.09976605325937271,
+ "eval_runtime": 113.3826,
+ "eval_samples_per_second": 26.918,
+ "eval_steps_per_second": 0.212,
+ "eval_sts-test_pearson_cosine": 0.874263623361507,
+ "eval_sts-test_pearson_dot": 0.8524433275217005,
+ "eval_sts-test_pearson_euclidean": 0.9072030414821628,
+ "eval_sts-test_pearson_manhattan": 0.9077920475707721,
+ "eval_sts-test_pearson_max": 0.9077920475707721,
+ "eval_sts-test_spearman_cosine": 0.9022537200754974,
+ "eval_sts-test_spearman_dot": 0.8545940742062709,
+ "eval_sts-test_spearman_euclidean": 0.9046282667561865,
+ "eval_sts-test_spearman_manhattan": 0.9047952337124378,
+ "eval_sts-test_spearman_max": 0.9047952337124378,
+ "step": 190
+ },
+ {
+ "epoch": 1.5158730158730158,
+ "grad_norm": 0.0,
+ "learning_rate": 1.2916086494631928e-05,
+ "loss": 0.0,
+ "step": 191
+ },
+ {
+ "epoch": 1.5238095238095237,
+ "grad_norm": 0.0,
+ "learning_rate": 1.270924532919336e-05,
+ "loss": 0.0,
+ "step": 192
+ },
+ {
+ "epoch": 1.5317460317460319,
+ "grad_norm": 0.0,
+ "learning_rate": 1.2504533651957674e-05,
+ "loss": 0.0,
+ "step": 193
+ },
+ {
+ "epoch": 1.5396825396825398,
+ "grad_norm": 0.0,
+ "learning_rate": 1.2302008022840655e-05,
+ "loss": 0.0,
+ "step": 194
+ },
+ {
+ "epoch": 1.5476190476190477,
+ "grad_norm": 0.0,
+ "learning_rate": 1.2101724397773472e-05,
+ "loss": 0.0,
+ "step": 195
+ },
+ {
+ "epoch": 1.5476190476190477,
+ "eval_loss": 0.0997559204697609,
+ "eval_runtime": 113.435,
+ "eval_samples_per_second": 26.905,
+ "eval_steps_per_second": 0.212,
+ "eval_sts-test_pearson_cosine": 0.8742609202554884,
+ "eval_sts-test_pearson_dot": 0.8524475712791351,
+ "eval_sts-test_pearson_euclidean": 0.9071965067276254,
+ "eval_sts-test_pearson_manhattan": 0.9077883011236819,
+ "eval_sts-test_pearson_max": 0.9077883011236819,
+ "eval_sts-test_spearman_cosine": 0.9022419952450129,
+ "eval_sts-test_spearman_dot": 0.8546614696059263,
+ "eval_sts-test_spearman_euclidean": 0.9045798011400996,
+ "eval_sts-test_spearman_manhattan": 0.9047780939793248,
+ "eval_sts-test_spearman_max": 0.9047780939793248,
+ "step": 195
+ },
+ {
+ "epoch": 1.5555555555555556,
+ "grad_norm": 0.0,
+ "learning_rate": 1.1903738113242652e-05,
+ "loss": 0.0,
+ "step": 196
+ },
+ {
+ "epoch": 1.5634920634920635,
+ "grad_norm": 0.0,
+ "learning_rate": 1.1708103871001038e-05,
+ "loss": 0.0,
+ "step": 197
+ },
+ {
+ "epoch": 1.5714285714285714,
+ "grad_norm": 0.0,
+ "learning_rate": 1.1514875722954288e-05,
+ "loss": 0.0,
+ "step": 198
+ },
+ {
+ "epoch": 1.5793650793650793,
+ "grad_norm": 0.0,
+ "learning_rate": 1.1324107056226802e-05,
+ "loss": 0.0,
+ "step": 199
+ },
+ {
+ "epoch": 1.5873015873015872,
+ "grad_norm": 0.0,
+ "learning_rate": 1.1135850578411364e-05,
+ "loss": 0.0,
+ "step": 200
+ },
+ {
+ "epoch": 1.5873015873015872,
+ "eval_loss": 0.09975530952215195,
+ "eval_runtime": 113.4016,
+ "eval_samples_per_second": 26.913,
+ "eval_steps_per_second": 0.212,
+ "eval_sts-test_pearson_cosine": 0.8742691362759177,
+ "eval_sts-test_pearson_dot": 0.8524477872044587,
+ "eval_sts-test_pearson_euclidean": 0.9072012894574042,
+ "eval_sts-test_pearson_manhattan": 0.9077892471366175,
+ "eval_sts-test_pearson_max": 0.9077892471366175,
+ "eval_sts-test_spearman_cosine": 0.9022147864780868,
+ "eval_sts-test_spearman_dot": 0.8547034910403729,
+ "eval_sts-test_spearman_euclidean": 0.904585216042728,
+ "eval_sts-test_spearman_manhattan": 0.9047465890913055,
+ "eval_sts-test_spearman_max": 0.9047465890913055,
+ "step": 200
+ },
+ {
+ "epoch": 1.5952380952380953,
+ "grad_norm": 0.0,
+ "learning_rate": 1.0950158303006534e-05,
+ "loss": 0.0,
+ "step": 201
+ },
+ {
+ "epoch": 1.6031746031746033,
+ "grad_norm": 0.0,
+ "learning_rate": 1.0767081535045804e-05,
+ "loss": 0.0,
+ "step": 202
+ },
+ {
+ "epoch": 1.6111111111111112,
+ "grad_norm": 0.0,
+ "learning_rate": 1.0586670856922482e-05,
+ "loss": 0.0,
+ "step": 203
+ },
+ {
+ "epoch": 1.619047619047619,
+ "grad_norm": 0.0,
+ "learning_rate": 1.0408976114414303e-05,
+ "loss": 0.0,
+ "step": 204
+ },
+ {
+ "epoch": 1.626984126984127,
+ "grad_norm": 0.0,
+ "learning_rate": 1.0234046402911438e-05,
+ "loss": 0.0,
+ "step": 205
+ },
+ {
+ "epoch": 1.626984126984127,
+ "eval_loss": 0.09976042062044144,
+ "eval_runtime": 113.351,
+ "eval_samples_per_second": 26.925,
+ "eval_steps_per_second": 0.212,
+ "eval_sts-test_pearson_cosine": 0.8742827064396023,
+ "eval_sts-test_pearson_dot": 0.8524658058205945,
+ "eval_sts-test_pearson_euclidean": 0.9072144921384091,
+ "eval_sts-test_pearson_manhattan": 0.9078074095863298,
+ "eval_sts-test_pearson_max": 0.9078074095863298,
+ "eval_sts-test_spearman_cosine": 0.9022449488282648,
+ "eval_sts-test_spearman_dot": 0.8547093534556153,
+ "eval_sts-test_spearman_euclidean": 0.9046033403035915,
+ "eval_sts-test_spearman_manhattan": 0.9048224424793636,
+ "eval_sts-test_spearman_max": 0.9048224424793636,
+ "step": 205
+ },
+ {
+ "epoch": 1.6349206349206349,
+ "grad_norm": 0.0,
+ "learning_rate": 1.0061930053851954e-05,
+ "loss": 0.0,
+ "step": 206
+ },
+ {
+ "epoch": 1.6428571428571428,
+ "grad_norm": 0.0,
+ "learning_rate": 9.892674621368259e-06,
+ "loss": 0.0,
+ "step": 207
+ },
+ {
+ "epoch": 1.6507936507936507,
+ "grad_norm": 0.0,
+ "learning_rate": 9.72632686914837e-06,
+ "loss": 0.0,
+ "step": 208
+ }
+ ],
+ "logging_steps": 1,
+ "max_steps": 252,
+ "num_input_tokens_seen": 0,
+ "num_train_epochs": 2,
+ "save_steps": 26,
+ "stateful_callbacks": {
+ "TrainerControl": {
+ "args": {
+ "should_epoch_stop": false,
+ "should_evaluate": false,
+ "should_log": false,
+ "should_save": true,
+ "should_training_stop": false
+ },
+ "attributes": {}
+ }
+ },
+ "total_flos": 0.0,
+ "train_batch_size": 960,
+ "trial_name": null,
+ "trial_params": null
+ }
checkpoint-208/training_args.bin ADDED
@@ -0,0 +1,3 @@
+ version https://git-lfs.github.com/spec/v1
+ oid sha256:ee09b1db931d465aec016b4dfd4ea584c8c7e7c02278d7fcac63d526c4d30767
+ size 5752