bobox committed
Commit 1961d6c · verified · 1 parent: 14a96b5

Training in progress, step 104, checkpoint

checkpoint-104/1_Pooling/config.json ADDED
@@ -0,0 +1,10 @@
+ {
+ "word_embedding_dimension": 768,
+ "pooling_mode_cls_token": false,
+ "pooling_mode_mean_tokens": true,
+ "pooling_mode_max_tokens": false,
+ "pooling_mode_mean_sqrt_len_tokens": false,
+ "pooling_mode_weightedmean_tokens": false,
+ "pooling_mode_lasttoken": false,
+ "include_prompt": true
+ }
checkpoint-104/README.md ADDED
@@ -0,0 +1,635 @@
+ ---
+ base_model: bobox/DeBERTa-small-ST-v1-test-step3
+ datasets: []
+ language: []
+ library_name: sentence-transformers
+ metrics:
+ - pearson_cosine
+ - spearman_cosine
+ - pearson_manhattan
+ - spearman_manhattan
+ - pearson_euclidean
+ - spearman_euclidean
+ - pearson_dot
+ - spearman_dot
+ - pearson_max
+ - spearman_max
+ pipeline_tag: sentence-similarity
+ tags:
+ - sentence-transformers
+ - sentence-similarity
+ - feature-extraction
+ - generated_from_trainer
+ - dataset_size:120849
+ - loss:CachedGISTEmbedLoss
+ widget:
+ - source_sentence: '"Today I lost those who for 24 years I called...my family," said
+ Enes Kanter of the Oklahoma City Thunder.
+
+ Turkish President Recep Tayyip Erdogan blames Mr Gulen for inciting a failed coup
+ last month and is seeking the cleric''s extradition to Turkey.
+
+ Mr Gulen, who has a large following, denies being involved in the coup.
+
+ Kanter''s father, Mehmet, disowned his son in a letter published on Monday by
+ Sabah, a pro-government newspaper.
+
+ Mehmet Kanter wrote his son had been "hypnotised" by the Gulen movement.
+
+ "With a feeling of shame I apologise to our president and the Turkish people for
+ having such a son," the letter said.
+
+ Q&A on the Gulen movement
+
+ Mr Gulen is regarded by followers as a spiritual leader and sometimes described
+ as Turkey''s second most powerful man.
+
+ Enes Kanter has been a vocal supporter of Mr Gulen on Twitter.
+
+ The movement - known in Turkey as Hizmet, or service - runs schools all over Turkey
+ and around the world, including in Turkic former Soviet republics, Muslim countries
+ such as Pakistan and Western nations including Romania and the US, where it runs
+ more than 100 schools.
+
+ In May 2016, the Turkish government formally declared the Gulen movement a terrorist
+ organisation.
+
+ After the failed coup, suspected Gulen supporters in Turkey were purged in a wave
+ of arrests.
+
+ Western nations have been critical of the government''s response to the coup.
+ US officials have said they will extradite Mr Gulen only if Turkey provides evidence.'
+ sentences:
+ - 'The Thinker | Rodin Museum H. 189 cm ; W. 98 cm ; D. 140 cm S.2838 When conceived
+ in 1880 in its original size (approx. 70 cm) as the crowning element of The Gates
+ of Hell , seated on the tympanum , The Thinker was entitled The Poet. He represented
+ Dante, author of the Divine Comedy which had inspired The Gates, leaning forward
+ to observe the circles of Hell, while meditating on his work. The Thinker was
+ therefore initially both a being with a tortured body, almost a damned soul, and
+ a free-thinking man, determined to transcend his suffering through poetry. The
+ pose of this figure owes much to Carpeaux’s Ugolino (1861) and to the seated portrait
+ of Lorenzo de’ Medici carved by Michelangelo (1526-31).   While remaining in place
+ on the monumental Gates of Hell, The Thinker was exhibited individually in 1888
+ and thus became an independent work. Enlarged in 1904, its colossal version proved
+ even more popular: this image of a man lost in thought, but whose powerful body
+ suggests a great capacity for action, has became one of the most celebrated sculptures
+ ever known. Numerous casts exist worldwide, including the one now in the gardens
+ of the Musée Rodin, a gift to the City of Paris installed outside the Panthéon
+ in 1906, and another in the gardens of Rodin’s house in Meudon, on the tomb of
+ the sculptor and his wife. George Bernard Shaw in the Pose of "The Thinker" Rodin,
+ the Monument to Victor Hugo and The Thinker Rodin''s "Thinker" in Dr Linde''s
+ Garden in Lübeck'
+ - An American basketball player has cut ties with his Turkish family over his support
+ for Pennsylvania-based preacher Fethullah Gulen.
+ - Police are investigating a death at a bus stop in Fife.
+ - source_sentence: Two adorable birds perched on a piece of bamboo.
+ sentences:
+ - Two birds are sitting perched on a tree limb
+ - A young boy with a spoon looking at a birthday cupcake.
+ - As part of his attempt to turn the Austrian right , Dessaix ordered a battalion
+ to move along the Aire stream near Tairier and Crache .
+ - source_sentence: how do venom snake keepers make money?
+ sentences:
+ - "The USDA regulates who can buy and sell snake venom. It is very important to\
+ \ learn about these regulations so that you can operate properly. On average,\
+ \ snake milkers make around $2,500 per month, but snake venom is an expensive\
+ \ market. One gram of certain types of snake venom can sell for $2,000.If you\
+ \ are crazy enough to capture, milk, and breed snakes, please take the precaution\
+ \ to wear protective clothing and always have antivenom close at hand.nake milkers\
+ \ have an insane job. They â\x80\x9Cmilkâ\x80\x9D snakes for their venom. This\
+ \ means that every single day, a snake milker handles deadly, venomous snakes.\
+ \ Itâ\x80\x99s a hands on job where you put your fingers millimeters away from\
+ \ the sharp, fangs of asps, vipers, cobras, corals, mambas, kraits, and rattlesnakes."
+ - a greenhouse is used to protect plants by keeping them warm
+ - Nashville Mayor Megan Barry has said her 22-year-old son died of what appeared
+ to be a drug overdose, according to a family statement.
+ - source_sentence: Adult bees include workers, a queen and what other type?
+ sentences:
+ - "matter vibrating can cause sound. Thus, sound is a wave in air . \n matter vibrating\
+ \ can cause a wave in air"
+ - His references in electronic music are Todd Terry , Armand Van Helden , Roger
+ Sanchez , Tiesto and the Epic Sax Guy.
+ - 'Look at the honeybees in Figure below . Honeybees live in colonies that may consist
+ of thousands of individual bees. Generally, there are three types of adult bees
+ in a colony: workers, a queen, and drones.'
+ - source_sentence: can an object have constant non zero velocity and changing acceleration?
+ sentences:
+ - when an animal sheds its fur , its fur becomes less dense
+ - Acceleration is defined as the time derivative of the velocity; if the velocity
+ is unchanging the acceleration is zero. Velocity is a vector, speed is a scalar
+ magnitude of the vector. If the velocity vector changes direction you can have
+ constant speed (not velocity) with a non-zero acceleration.
+ - Acne treatment is individual and customized to the type of acne you have. On average,
+ mild acne responds in 1-2 months, moderate acne responds in 2-4 months and severe
+ acne can take 4-6 months to clear, granted that the most effective measures can
+ be used.
+ model-index:
+ - name: SentenceTransformer based on bobox/DeBERTa-small-ST-v1-test-step3
+ results:
+ - task:
+ type: semantic-similarity
+ name: Semantic Similarity
+ dataset:
+ name: sts test
+ type: sts-test
+ metrics:
+ - type: pearson_cosine
+ value: 0.8759700287215791
+ name: Pearson Cosine
+ - type: spearman_cosine
+ value: 0.9069411909499396
+ name: Spearman Cosine
+ - type: pearson_manhattan
+ value: 0.9086607224369581
+ name: Pearson Manhattan
+ - type: spearman_manhattan
+ value: 0.907642129957113
+ name: Spearman Manhattan
+ - type: pearson_euclidean
+ value: 0.9085309767938886
+ name: Pearson Euclidean
+ - type: spearman_euclidean
+ value: 0.9080629260679763
+ name: Spearman Euclidean
+ - type: pearson_dot
+ value: 0.8604257798750153
+ name: Pearson Dot
+ - type: spearman_dot
+ value: 0.8719651713405704
+ name: Spearman Dot
+ - type: pearson_max
+ value: 0.9086607224369581
+ name: Pearson Max
+ - type: spearman_max
+ value: 0.9080629260679763
+ name: Spearman Max
+ ---
167
+
+ # SentenceTransformer based on bobox/DeBERTa-small-ST-v1-test-step3
+
+ This is a [sentence-transformers](https://www.SBERT.net) model finetuned from [bobox/DeBERTa-small-ST-v1-test-step3](https://huggingface.co/bobox/DeBERTa-small-ST-v1-test-step3) on the bobox/enhanced_nli-50_k dataset. It maps sentences & paragraphs to a 768-dimensional dense vector space and can be used for semantic textual similarity, semantic search, paraphrase mining, text classification, clustering, and more.
+
+ ## Model Details
+
+ ### Model Description
+ - **Model Type:** Sentence Transformer
+ - **Base model:** [bobox/DeBERTa-small-ST-v1-test-step3](https://huggingface.co/bobox/DeBERTa-small-ST-v1-test-step3) <!-- at revision df9aaa75fe0c2791e5ed35ff33de1689d9a5f5ff -->
+ - **Maximum Sequence Length:** 512 tokens
+ - **Output Dimensionality:** 768 dimensions
+ - **Similarity Function:** Cosine Similarity
+ - **Training Dataset:**
+ - bobox/enhanced_nli-50_k
+ <!-- - **Language:** Unknown -->
+ <!-- - **License:** Unknown -->
+
+ ### Model Sources
+
+ - **Documentation:** [Sentence Transformers Documentation](https://sbert.net)
+ - **Repository:** [Sentence Transformers on GitHub](https://github.com/UKPLab/sentence-transformers)
+ - **Hugging Face:** [Sentence Transformers on Hugging Face](https://huggingface.co/models?library=sentence-transformers)
+
+ ### Full Model Architecture
+
+ ```
+ SentenceTransformer(
+ (0): Transformer({'max_seq_length': 512, 'do_lower_case': False}) with Transformer model: DebertaV2Model
+ (1): Pooling({'word_embedding_dimension': 768, 'pooling_mode_cls_token': False, 'pooling_mode_mean_tokens': True, 'pooling_mode_max_tokens': False, 'pooling_mode_mean_sqrt_len_tokens': False, 'pooling_mode_weightedmean_tokens': False, 'pooling_mode_lasttoken': False, 'include_prompt': True})
+ )
+ ```
+
+ ## Usage
+
+ ### Direct Usage (Sentence Transformers)
+
+ First install the Sentence Transformers library:
+
+ ```bash
+ pip install -U sentence-transformers
+ ```
+
+ Then you can load this model and run inference.
+ ```python
+ from sentence_transformers import SentenceTransformer
+
+ # Download from the 🤗 Hub
+ model = SentenceTransformer("bobox/DeBERTa-small-ST-v1-test-UnifiedDatasets-checkpoints-tmp")
+ # Run inference
+ sentences = [
+ 'can an object have constant non zero velocity and changing acceleration?',
+ 'Acceleration is defined as the time derivative of the velocity; if the velocity is unchanging the acceleration is zero. Velocity is a vector, speed is a scalar magnitude of the vector. If the velocity vector changes direction you can have constant speed (not velocity) with a non-zero acceleration.',
+ 'Acne treatment is individual and customized to the type of acne you have. On average, mild acne responds in 1-2 months, moderate acne responds in 2-4 months and severe acne can take 4-6 months to clear, granted that the most effective measures can be used.',
+ ]
+ embeddings = model.encode(sentences)
+ print(embeddings.shape)
+ # [3, 768]
+
+ # Get the similarity scores for the embeddings
+ similarities = model.similarity(embeddings, embeddings)
+ print(similarities.shape)
+ # [3, 3]
+ ```
231
+
+ <!--
+ ### Direct Usage (Transformers)
+
+ <details><summary>Click to see the direct usage in Transformers</summary>
+
+ </details>
+ -->
+
+ <!--
+ ### Downstream Usage (Sentence Transformers)
+
+ You can finetune this model on your own dataset.
+
+ <details><summary>Click to expand</summary>
+
+ </details>
+ -->
+
+ <!--
+ ### Out-of-Scope Use
+
+ *List how the model may foreseeably be misused and address what users ought not to do with the model.*
+ -->
+
+ ## Evaluation
+
+ ### Metrics
+
+ #### Semantic Similarity
+ * Dataset: `sts-test`
+ * Evaluated with [<code>EmbeddingSimilarityEvaluator</code>](https://sbert.net/docs/package_reference/sentence_transformer/evaluation.html#sentence_transformers.evaluation.EmbeddingSimilarityEvaluator)
+
+ | Metric | Value |
+ |:--------------------|:-----------|
+ | pearson_cosine | 0.876 |
+ | **spearman_cosine** | **0.9069** |
+ | pearson_manhattan | 0.9087 |
+ | spearman_manhattan | 0.9076 |
+ | pearson_euclidean | 0.9085 |
+ | spearman_euclidean | 0.9081 |
+ | pearson_dot | 0.8604 |
+ | spearman_dot | 0.872 |
+ | pearson_max | 0.9087 |
+ | spearman_max | 0.9081 |
+
+ <!--
+ ## Bias, Risks and Limitations
+
+ *What are the known or foreseeable issues stemming from this model? You could also flag here known failure cases or weaknesses of the model.*
+ -->
+
+ <!--
+ ### Recommendations
+
+ *What are recommendations with respect to the foreseeable issues? For example, filtering explicit content.*
+ -->
+
+ ## Training Details
+
+ ### Training Dataset
+
+ #### bobox/enhanced_nli-50_k
+
+ * Dataset: bobox/enhanced_nli-50_k
+ * Size: 120,849 training samples
+ * Columns: <code>sentence1</code> and <code>sentence2</code>
+ * Approximate statistics based on the first 1000 samples:
+ | | sentence1 | sentence2 |
+ |:--------|:----------|:----------|
+ | type | string | string |
+ | details | <ul><li>min: 4 tokens</li><li>mean: 32.01 tokens</li><li>max: 336 tokens</li></ul> | <ul><li>min: 2 tokens</li><li>mean: 60.45 tokens</li><li>max: 512 tokens</li></ul> |
+ * Samples:
+ | sentence1 | sentence2 |
+ |:----------|:----------|
+ | <code>A lady working in a kitchen with several different types of dishes.</code> | <code>A woman is cooking and cleaning in her kitchen.</code> |
+ | <code>can you renew your licence online sa?</code> | <code>You can renew your licence online for as long as your photo is valid. Renew your driver's licence online with a mySA GOV account. With a mySA GOV account, you can access a legally compliant digital licence through the mySA GOV app.</code> |
+ | <code>how can coconut oil lower cholesterol</code> | <code>It has been shown that lauric acid increases the good HDL cholesterol in the blood to help improve cholesterol ratio levels. Coconut oil lowers cholesterol by promoting its conversion to pregnenolone, a molecule that is a precursor to many of the hormones our bodies need. Coconut can help restore normal thyroid function. When the thyroid does not function optimally, it can contribute to higher levels of bad cholesterol.</code> |
+ * Loss: [<code>CachedGISTEmbedLoss</code>](https://sbert.net/docs/package_reference/sentence_transformer/losses.html#cachedgistembedloss) with these parameters:
+ ```json
+ {'guide': SentenceTransformer(
+ (0): Transformer({'max_seq_length': 512, 'do_lower_case': False}) with Transformer model: BertModel
+ (1): Pooling({'word_embedding_dimension': 768, 'pooling_mode_cls_token': True, 'pooling_mode_mean_tokens': False, 'pooling_mode_max_tokens': False, 'pooling_mode_mean_sqrt_len_tokens': False, 'pooling_mode_weightedmean_tokens': False, 'pooling_mode_lasttoken': False, 'include_prompt': True})
+ (2): Normalize()
+ ), 'temperature': 0.025}
+ ```
+
+ ### Evaluation Dataset
+
+ #### bobox/enhanced_nli-50_k
+
+ * Dataset: bobox/enhanced_nli-50_k
+ * Size: 3,052 evaluation samples
+ * Columns: <code>sentence1</code> and <code>sentence2</code>
+ * Approximate statistics based on the first 1000 samples:
+ | | sentence1 | sentence2 |
+ |:--------|:----------|:----------|
+ | type | string | string |
+ | details | <ul><li>min: 4 tokens</li><li>mean: 32.91 tokens</li><li>max: 342 tokens</li></ul> | <ul><li>min: 2 tokens</li><li>mean: 60.3 tokens</li><li>max: 408 tokens</li></ul> |
+ * Samples:
+ | sentence1 | sentence2 |
+ |:----------|:----------|
+ | <code>The body was found in the River Avon in Bath, Avon and Somerset Police said.<br>Officers said although formal identification had not yet taken place, Henry Burke's family had been told.<br>Earlier officers said they were looking for Mr Burke, who was last seen leaving a nightclub in George Street late on Thursday.<br>A force spokesman said the death was being treated as unexplained and inquiries were continuing.<br>Mr Burke's girlfriend, Em Comley, earlier said he had been texting her "throughout the night" but then the messages suddenly stopped just after midnight.</code> | <code>A man's body has been found in a river after search and rescue teams were called in to try and find a missing 19-year-old student.</code> |
+ | <code>what happens when the president of united states is impeached?</code> | <code>Parliament votes on the proposal by secret ballot, and if two thirds of all representatives agree, the president is impeached. Once impeached, the president's powers are suspended, and the Constitutional Court decides whether or not the President should be removed from office.</code> |
+ | <code>What can feed at more than one trophic level?</code> | <code>Many consumers feed at more than one trophic level.. Nuts are also consumed by deer, turkey, foxes, wood ducks and squirrels. <br> wood ducks can feed at more than one trophic level</code> |
+ * Loss: [<code>CachedGISTEmbedLoss</code>](https://sbert.net/docs/package_reference/sentence_transformer/losses.html#cachedgistembedloss) with these parameters:
+ ```json
+ {'guide': SentenceTransformer(
+ (0): Transformer({'max_seq_length': 512, 'do_lower_case': False}) with Transformer model: BertModel
+ (1): Pooling({'word_embedding_dimension': 768, 'pooling_mode_cls_token': True, 'pooling_mode_mean_tokens': False, 'pooling_mode_max_tokens': False, 'pooling_mode_mean_sqrt_len_tokens': False, 'pooling_mode_weightedmean_tokens': False, 'pooling_mode_lasttoken': False, 'include_prompt': True})
+ (2): Normalize()
+ ), 'temperature': 0.025}
+ ```
344
+
+ ### Training Hyperparameters
+ #### Non-Default Hyperparameters
+
+ - `eval_strategy`: steps
+ - `per_device_train_batch_size`: 960
+ - `per_device_eval_batch_size`: 128
+ - `learning_rate`: 3.5e-05
+ - `weight_decay`: 0.0001
+ - `num_train_epochs`: 2
+ - `lr_scheduler_type`: cosine_with_min_lr
+ - `lr_scheduler_kwargs`: {'num_cycles': 0.5, 'min_lr': 5.833333333333333e-06}
+ - `warmup_ratio`: 0.25
+ - `save_safetensors`: False
+ - `fp16`: True
+ - `push_to_hub`: True
+ - `hub_model_id`: bobox/DeBERTa-small-ST-v1-test-UnifiedDatasets-checkpoints-tmp
+ - `hub_strategy`: all_checkpoints
+ - `batch_sampler`: no_duplicates
+
+ #### All Hyperparameters
+ <details><summary>Click to expand</summary>
+
+ - `overwrite_output_dir`: False
+ - `do_predict`: False
+ - `eval_strategy`: steps
+ - `prediction_loss_only`: True
+ - `per_device_train_batch_size`: 960
+ - `per_device_eval_batch_size`: 128
+ - `per_gpu_train_batch_size`: None
+ - `per_gpu_eval_batch_size`: None
+ - `gradient_accumulation_steps`: 1
+ - `eval_accumulation_steps`: None
+ - `torch_empty_cache_steps`: None
+ - `learning_rate`: 3.5e-05
+ - `weight_decay`: 0.0001
+ - `adam_beta1`: 0.9
+ - `adam_beta2`: 0.999
+ - `adam_epsilon`: 1e-08
+ - `max_grad_norm`: 1.0
+ - `num_train_epochs`: 2
+ - `max_steps`: -1
+ - `lr_scheduler_type`: cosine_with_min_lr
+ - `lr_scheduler_kwargs`: {'num_cycles': 0.5, 'min_lr': 5.833333333333333e-06}
+ - `warmup_ratio`: 0.25
+ - `warmup_steps`: 0
+ - `log_level`: passive
+ - `log_level_replica`: warning
+ - `log_on_each_node`: True
+ - `logging_nan_inf_filter`: True
+ - `save_safetensors`: False
+ - `save_on_each_node`: False
+ - `save_only_model`: False
+ - `restore_callback_states_from_checkpoint`: False
+ - `no_cuda`: False
+ - `use_cpu`: False
+ - `use_mps_device`: False
+ - `seed`: 42
+ - `data_seed`: None
+ - `jit_mode_eval`: False
+ - `use_ipex`: False
+ - `bf16`: False
+ - `fp16`: True
+ - `fp16_opt_level`: O1
+ - `half_precision_backend`: auto
+ - `bf16_full_eval`: False
+ - `fp16_full_eval`: False
+ - `tf32`: None
+ - `local_rank`: 0
+ - `ddp_backend`: None
+ - `tpu_num_cores`: None
+ - `tpu_metrics_debug`: False
+ - `debug`: []
+ - `dataloader_drop_last`: False
+ - `dataloader_num_workers`: 0
+ - `dataloader_prefetch_factor`: None
+ - `past_index`: -1
+ - `disable_tqdm`: False
+ - `remove_unused_columns`: True
+ - `label_names`: None
+ - `load_best_model_at_end`: False
+ - `ignore_data_skip`: False
+ - `fsdp`: []
+ - `fsdp_min_num_params`: 0
+ - `fsdp_config`: {'min_num_params': 0, 'xla': False, 'xla_fsdp_v2': False, 'xla_fsdp_grad_ckpt': False}
+ - `fsdp_transformer_layer_cls_to_wrap`: None
+ - `accelerator_config`: {'split_batches': False, 'dispatch_batches': None, 'even_batches': True, 'use_seedable_sampler': True, 'non_blocking': False, 'gradient_accumulation_kwargs': None}
+ - `deepspeed`: None
+ - `label_smoothing_factor`: 0.0
+ - `optim`: adamw_torch
+ - `optim_args`: None
+ - `adafactor`: False
+ - `group_by_length`: False
+ - `length_column_name`: length
+ - `ddp_find_unused_parameters`: None
+ - `ddp_bucket_cap_mb`: None
+ - `ddp_broadcast_buffers`: False
+ - `dataloader_pin_memory`: True
+ - `dataloader_persistent_workers`: False
+ - `skip_memory_metrics`: True
+ - `use_legacy_prediction_loop`: False
+ - `push_to_hub`: True
+ - `resume_from_checkpoint`: None
+ - `hub_model_id`: bobox/DeBERTa-small-ST-v1-test-UnifiedDatasets-checkpoints-tmp
+ - `hub_strategy`: all_checkpoints
+ - `hub_private_repo`: False
+ - `hub_always_push`: False
+ - `gradient_checkpointing`: False
+ - `gradient_checkpointing_kwargs`: None
+ - `include_inputs_for_metrics`: False
+ - `eval_do_concat_batches`: True
+ - `fp16_backend`: auto
+ - `push_to_hub_model_id`: None
+ - `push_to_hub_organization`: None
+ - `mp_parameters`:
+ - `auto_find_batch_size`: False
+ - `full_determinism`: False
+ - `torchdynamo`: None
+ - `ray_scope`: last
+ - `ddp_timeout`: 1800
+ - `torch_compile`: False
+ - `torch_compile_backend`: None
+ - `torch_compile_mode`: None
+ - `dispatch_batches`: None
+ - `split_batches`: None
+ - `include_tokens_per_second`: False
+ - `include_num_input_tokens_seen`: False
+ - `neftune_noise_alpha`: None
+ - `optim_target_modules`: None
+ - `batch_eval_metrics`: False
+ - `eval_on_start`: False
+ - `eval_use_gather_object`: False
+ - `batch_sampler`: no_duplicates
+ - `multi_dataset_batch_sampler`: proportional
+
+ </details>
480
+
+ ### Training Logs
+ <details><summary>Click to expand</summary>
+
+ | Epoch | Step | Training Loss | loss | sts-test_spearman_cosine |
+ |:------:|:----:|:-------------:|:------:|:------------------------:|
+ | 0.0079 | 1 | 0.404 | - | - |
+ | 0.0159 | 2 | 0.3185 | - | - |
+ | 0.0238 | 3 | 0.2821 | - | - |
+ | 0.0317 | 4 | 0.4036 | - | - |
+ | 0.0397 | 5 | 0.3442 | 0.1253 | 0.9078 |
+ | 0.0476 | 6 | 0.4145 | - | - |
+ | 0.0556 | 7 | 0.4224 | - | - |
+ | 0.0635 | 8 | 0.4048 | - | - |
+ | 0.0714 | 9 | 0.3899 | - | - |
+ | 0.0794 | 10 | 0.4127 | 0.1237 | 0.9079 |
+ | 0.0873 | 11 | 0.3496 | - | - |
+ | 0.0952 | 12 | 0.3731 | - | - |
+ | 0.1032 | 13 | 0.3929 | - | - |
+ | 0.1111 | 14 | 0.2957 | - | - |
+ | 0.1190 | 15 | 0.3324 | 0.1206 | 0.9083 |
+ | 0.1270 | 16 | 0.3341 | - | - |
+ | 0.1349 | 17 | 0.3466 | - | - |
+ | 0.1429 | 18 | 0.3558 | - | - |
+ | 0.1508 | 19 | 0.2634 | - | - |
+ | 0.1587 | 20 | 0.3095 | 0.1156 | 0.9088 |
+ | 0.1667 | 21 | 0.2973 | - | - |
+ | 0.1746 | 22 | 0.2884 | - | - |
+ | 0.1825 | 23 | 0.3697 | - | - |
+ | 0.1905 | 24 | 0.2683 | - | - |
+ | 0.1984 | 25 | 0.3026 | 0.1096 | 0.9088 |
+ | 0.2063 | 26 | 0.2441 | - | - |
+ | 0.2143 | 27 | 0.3145 | - | - |
+ | 0.2222 | 28 | 0.3119 | - | - |
+ | 0.2302 | 29 | 0.2766 | - | - |
+ | 0.2381 | 30 | 0.3343 | 0.1054 | 0.9084 |
+ | 0.2460 | 31 | 0.344 | - | - |
+ | 0.2540 | 32 | 0.3005 | - | - |
+ | 0.2619 | 33 | 0.2526 | - | - |
+ | 0.2698 | 34 | 0.2422 | - | - |
+ | 0.2778 | 35 | 0.3447 | 0.1022 | 0.9072 |
+ | 0.2857 | 36 | 0.2809 | - | - |
+ | 0.2937 | 37 | 0.2836 | - | - |
+ | 0.3016 | 38 | 0.2878 | - | - |
+ | 0.3095 | 39 | 0.2738 | - | - |
+ | 0.3175 | 40 | 0.2806 | 0.1003 | 0.9065 |
+ | 0.3254 | 41 | 0.2797 | - | - |
+ | 0.3333 | 42 | 0.3217 | - | - |
+ | 0.3413 | 43 | 0.2544 | - | - |
+ | 0.3492 | 44 | 0.3203 | - | - |
+ | 0.3571 | 45 | 0.2987 | 0.0990 | 0.9064 |
+ | 0.3651 | 46 | 0.2765 | - | - |
+ | 0.3730 | 47 | 0.2716 | - | - |
+ | 0.3810 | 48 | 0.3726 | - | - |
+ | 0.3889 | 49 | 0.2963 | - | - |
+ | 0.3968 | 50 | 0.2784 | 0.0952 | 0.9072 |
+ | 0.4048 | 51 | 0.2437 | - | - |
+ | 0.4127 | 52 | 0.2258 | - | - |
+ | 0.4206 | 53 | 0.2821 | - | - |
+ | 0.4286 | 54 | 0.249 | - | - |
+ | 0.4365 | 55 | 0.2813 | 0.0928 | 0.9080 |
+ | 0.4444 | 56 | 0.3003 | - | - |
+ | 0.4524 | 57 | 0.2812 | - | - |
+ | 0.4603 | 58 | 0.2619 | - | - |
+ | 0.4683 | 59 | 0.299 | - | - |
+ | 0.4762 | 60 | 0.2706 | 0.0927 | 0.9088 |
+ | 0.4841 | 61 | 0.297 | - | - |
+ | 0.4921 | 62 | 0.2906 | - | - |
+ | 0.5 | 63 | 0.2914 | - | - |
+ | 0.5079 | 64 | 0.2669 | - | - |
+ | 0.5159 | 65 | 0.2723 | 0.0946 | 0.9093 |
+ | 0.5238 | 66 | 0.3194 | - | - |
+ | 0.5317 | 67 | 0.3585 | - | - |
+ | 0.5397 | 68 | 0.2843 | - | - |
+ | 0.5476 | 69 | 0.1916 | - | - |
+ | 0.5556 | 70 | 0.351 | 0.0971 | 0.9104 |
+ | 0.5635 | 71 | 0.3105 | - | - |
+ | 0.5714 | 72 | 0.2847 | - | - |
+ | 0.5794 | 73 | 0.2641 | - | - |
+ | 0.5873 | 74 | 0.3305 | - | - |
+ | 0.5952 | 75 | 0.2461 | 0.0965 | 0.9096 |
+ | 0.6032 | 76 | 0.259 | - | - |
+ | 0.6111 | 77 | 0.2506 | - | - |
+ | 0.6190 | 78 | 0.2832 | - | - |
+ | 0.6270 | 79 | 0.3322 | - | - |
+ | 0.6349 | 80 | 0.2533 | 0.1001 | 0.9089 |
+ | 0.6429 | 81 | 0.2349 | - | - |
+ | 0.6508 | 82 | 0.2748 | - | - |
+ | 0.6587 | 83 | 0.223 | - | - |
+ | 0.6667 | 84 | 0.2416 | - | - |
+ | 0.6746 | 85 | 0.2637 | 0.1034 | 0.9082 |
+ | 0.6825 | 86 | 0.2856 | - | - |
+ | 0.6905 | 87 | 0.2476 | - | - |
+ | 0.6984 | 88 | 0.2427 | - | - |
+ | 0.7063 | 89 | 0.2614 | - | - |
+ | 0.7143 | 90 | 0.26 | 0.1032 | 0.9088 |
+ | 0.7222 | 91 | 0.1862 | - | - |
+ | 0.7302 | 92 | 0.267 | - | - |
+ | 0.7381 | 93 | 0.2175 | - | - |
+ | 0.7460 | 94 | 0.2079 | - | - |
+ | 0.7540 | 95 | 0.2562 | 0.0999 | 0.9086 |
+ | 0.7619 | 96 | 0.2516 | - | - |
+ | 0.7698 | 97 | 0.2956 | - | - |
+ | 0.7778 | 98 | 0.2733 | - | - |
+ | 0.7857 | 99 | 0.2919 | - | - |
+ | 0.7937 | 100 | 0.2997 | 0.1032 | 0.9069 |
+ | 0.8016 | 101 | 0.2276 | - | - |
+ | 0.8095 | 102 | 0.2582 | - | - |
+ | 0.8175 | 103 | 0.2559 | - | - |
+ | 0.8254 | 104 | 0.2864 | - | - |
+
+ </details>
592
+
+ ### Framework Versions
+ - Python: 3.10.14
+ - Sentence Transformers: 3.0.1
+ - Transformers: 4.44.0
+ - PyTorch: 2.4.0
+ - Accelerate: 0.33.0
+ - Datasets: 2.21.0
+ - Tokenizers: 0.19.1
+
+ ## Citation
+
+ ### BibTeX
+
+ #### Sentence Transformers
+ ```bibtex
+ @inproceedings{reimers-2019-sentence-bert,
+ title = "Sentence-BERT: Sentence Embeddings using Siamese BERT-Networks",
+ author = "Reimers, Nils and Gurevych, Iryna",
+ booktitle = "Proceedings of the 2019 Conference on Empirical Methods in Natural Language Processing",
+ month = "11",
+ year = "2019",
+ publisher = "Association for Computational Linguistics",
+ url = "https://arxiv.org/abs/1908.10084",
+ }
+ ```
+
+ <!--
+ ## Glossary
+
+ *Clearly define terms in order to be accessible across audiences.*
+ -->
+
+ <!--
+ ## Model Card Authors
+
+ *Lists the people who create the model card, providing recognition and accountability for the detailed work that goes into its construction.*
+ -->
+
+ <!--
+ ## Model Card Contact
+
+ *Provides a way for people who have updates to the Model Card, suggestions, or questions, to contact the Model Card authors.*
+ -->
checkpoint-104/added_tokens.json ADDED
@@ -0,0 +1,3 @@
+ {
+ "[MASK]": 128000
+ }
checkpoint-104/config.json ADDED
@@ -0,0 +1,35 @@
+ {
+ "_name_or_path": "bobox/DeBERTa-small-ST-v1-test-step3",
+ "architectures": [
+ "DebertaV2Model"
+ ],
+ "attention_probs_dropout_prob": 0.1,
+ "hidden_act": "gelu",
+ "hidden_dropout_prob": 0.1,
+ "hidden_size": 768,
+ "initializer_range": 0.02,
+ "intermediate_size": 3072,
+ "layer_norm_eps": 1e-07,
+ "max_position_embeddings": 512,
+ "max_relative_positions": -1,
+ "model_type": "deberta-v2",
+ "norm_rel_ebd": "layer_norm",
+ "num_attention_heads": 12,
+ "num_hidden_layers": 6,
+ "pad_token_id": 0,
+ "pooler_dropout": 0,
+ "pooler_hidden_act": "gelu",
+ "pooler_hidden_size": 768,
+ "pos_att_type": [
+ "p2c",
+ "c2p"
+ ],
+ "position_biased_input": false,
+ "position_buckets": 256,
+ "relative_attention": true,
+ "share_att_key": true,
+ "torch_dtype": "float32",
+ "transformers_version": "4.44.0",
+ "type_vocab_size": 0,
+ "vocab_size": 128100
+ }
checkpoint-104/config_sentence_transformers.json ADDED
@@ -0,0 +1,10 @@
+ {
+ "__version__": {
+ "sentence_transformers": "3.0.1",
+ "transformers": "4.44.0",
+ "pytorch": "2.4.0"
+ },
+ "prompts": {},
+ "default_prompt_name": null,
+ "similarity_fn_name": null
+ }
checkpoint-104/modules.json ADDED
@@ -0,0 +1,14 @@
+ [
+ {
+ "idx": 0,
+ "name": "0",
+ "path": "",
+ "type": "sentence_transformers.models.Transformer"
+ },
+ {
+ "idx": 1,
+ "name": "1",
+ "path": "1_Pooling",
+ "type": "sentence_transformers.models.Pooling"
+ }
+ ]
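The modules.json above chains two modules: the Transformer encoder's token embeddings feed a Pooling layer, and the checkpoint's 1_Pooling/config.json enables only `pooling_mode_mean_tokens`, i.e. a masked mean over the token embeddings. A minimal pure-Python sketch of that pooling step (shapes and the function name are illustrative assumptions, not the library's internals):

```python
def mean_pool(token_embeddings, attention_mask):
    """Masked mean over the sequence axis, as pooling_mode_mean_tokens computes.

    token_embeddings: per sequence, a list of hidden-state vectors (one per token)
    attention_mask:   matching 0/1 mask; padding positions are 0
    """
    pooled = []
    for seq, mask in zip(token_embeddings, attention_mask):
        kept = [vec for vec, m in zip(seq, mask) if m]  # drop padding tokens
        n = max(len(kept), 1)                           # guard against all-padding input
        pooled.append([sum(dim) / n for dim in zip(*kept)])
    return pooled

# The padding position in the second sequence must not dilute its mean.
emb = [[[1.0, 2.0], [3.0, 4.0]],
       [[2.0, 2.0], [9.0, 9.0]]]
mask = [[1, 1], [1, 0]]
print(mean_pool(emb, mask))  # [[2.0, 3.0], [2.0, 2.0]]
```

This is why the sentence embedding dimension equals the 768 of `word_embedding_dimension`: pooling averages token vectors but does not project them.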
checkpoint-104/optimizer.pt ADDED
@@ -0,0 +1,3 @@
+ version https://git-lfs.github.com/spec/v1
+ oid sha256:19b9ab520ef36fbb227f910d3dcdc7e23a063f3e2d3d9629d527d931c14bdfb8
+ size 1130520122
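The binary artifacts in this commit (optimizer.pt above, and the model weights, RNG state, scheduler state and SentencePiece model below) are stored as Git LFS pointer stubs: three `key value` text lines (version, oid, size) per the git-lfs pointer spec, while the actual bytes live in LFS storage. A small sketch of reading such a stub (`parse_lfs_pointer` is a hypothetical helper, not part of any library):

```python
def parse_lfs_pointer(text):
    """Split a Git LFS pointer file into a dict of its key/value fields."""
    fields = {}
    for line in text.strip().splitlines():
        key, _, value = line.partition(" ")  # first space separates key from value
        fields[key] = value
    return fields

pointer = """version https://git-lfs.github.com/spec/v1
oid sha256:19b9ab520ef36fbb227f910d3dcdc7e23a063f3e2d3d9629d527d931c14bdfb8
size 1130520122"""

info = parse_lfs_pointer(pointer)
algo, digest = info["oid"].split(":", 1)
print(algo, int(info["size"]))  # sha256 1130520122
```

The `oid` lets a client verify the downloaded object's checksum, and `size` is the byte count of the real file (here ~1.1 GB for the optimizer state, roughly double the ~565 MB model weights, as expected for Adam-style moment buffers).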
checkpoint-104/pytorch_model.bin ADDED
@@ -0,0 +1,3 @@
+ version https://git-lfs.github.com/spec/v1
+ oid sha256:77bba9a09534e0e255068f41d0e2f4fd9b701e3c1d889645d40ce276e1d0c26f
+ size 565251810
checkpoint-104/rng_state.pth ADDED
@@ -0,0 +1,3 @@
+ version https://git-lfs.github.com/spec/v1
+ oid sha256:8188f4edc12e637b6260f63a37d3e373c8a0e6aa7c6c6451be2e92a2db3cd046
+ size 14244
checkpoint-104/scheduler.pt ADDED
@@ -0,0 +1,3 @@
+ version https://git-lfs.github.com/spec/v1
+ oid sha256:dbf2562bdf39c1f0a353f4a8763e43a81eca3573eb5fafed8c5a7f1adf3cd437
+ size 1064
checkpoint-104/sentence_bert_config.json ADDED
@@ -0,0 +1,4 @@
+ {
+ "max_seq_length": 512,
+ "do_lower_case": false
+ }
checkpoint-104/special_tokens_map.json ADDED
@@ -0,0 +1,51 @@
+ {
+ "bos_token": {
+ "content": "[CLS]",
+ "lstrip": false,
+ "normalized": false,
+ "rstrip": false,
+ "single_word": false
+ },
+ "cls_token": {
+ "content": "[CLS]",
+ "lstrip": false,
+ "normalized": false,
+ "rstrip": false,
+ "single_word": false
+ },
+ "eos_token": {
+ "content": "[SEP]",
+ "lstrip": false,
+ "normalized": false,
+ "rstrip": false,
+ "single_word": false
+ },
+ "mask_token": {
+ "content": "[MASK]",
+ "lstrip": false,
+ "normalized": false,
+ "rstrip": false,
+ "single_word": false
+ },
+ "pad_token": {
+ "content": "[PAD]",
+ "lstrip": false,
+ "normalized": false,
+ "rstrip": false,
+ "single_word": false
+ },
+ "sep_token": {
+ "content": "[SEP]",
+ "lstrip": false,
+ "normalized": false,
+ "rstrip": false,
+ "single_word": false
+ },
+ "unk_token": {
+ "content": "[UNK]",
+ "lstrip": false,
+ "normalized": true,
+ "rstrip": false,
+ "single_word": false
+ }
+ }
checkpoint-104/spm.model ADDED
@@ -0,0 +1,3 @@
+ version https://git-lfs.github.com/spec/v1
+ oid sha256:c679fbf93643d19aab7ee10c0b99e460bdbc02fedf34b92b05af343b4af586fd
+ size 2464616
checkpoint-104/tokenizer.json ADDED
The diff for this file is too large to render. See raw diff
checkpoint-104/tokenizer_config.json ADDED
@@ -0,0 +1,65 @@
+ {
+ "added_tokens_decoder": {
+ "0": {
+ "content": "[PAD]",
+ "lstrip": false,
+ "normalized": false,
+ "rstrip": false,
+ "single_word": false,
+ "special": true
+ },
+ "1": {
+ "content": "[CLS]",
+ "lstrip": false,
+ "normalized": false,
+ "rstrip": false,
+ "single_word": false,
+ "special": true
+ },
+ "2": {
+ "content": "[SEP]",
+ "lstrip": false,
+ "normalized": false,
+ "rstrip": false,
+ "single_word": false,
+ "special": true
+ },
+ "3": {
+ "content": "[UNK]",
+ "lstrip": false,
+ "normalized": true,
+ "rstrip": false,
+ "single_word": false,
+ "special": true
+ },
+ "128000": {
+ "content": "[MASK]",
+ "lstrip": false,
+ "normalized": false,
+ "rstrip": false,
+ "single_word": false,
+ "special": true
+ }
+ },
+ "bos_token": "[CLS]",
+ "clean_up_tokenization_spaces": true,
+ "cls_token": "[CLS]",
+ "do_lower_case": false,
+ "eos_token": "[SEP]",
+ "mask_token": "[MASK]",
+ "max_length": 512,
+ "model_max_length": 512,
+ "pad_to_multiple_of": null,
+ "pad_token": "[PAD]",
+ "pad_token_type_id": 0,
+ "padding_side": "right",
+ "sep_token": "[SEP]",
+ "sp_model_kwargs": {},
+ "split_by_punct": false,
+ "stride": 0,
+ "tokenizer_class": "DebertaV2Tokenizer",
+ "truncation_side": "right",
+ "truncation_strategy": "longest_first",
+ "unk_token": "[UNK]",
+ "vocab_type": "spm"
+ }
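In the trainer_state.json log below, the learning rate climbs linearly from about 5.56e-07 at step 1 to the 3.5e-05 peak at step 63, then begins to decay (step 64 logs 3.4998e-05), which is consistent with a linear warmup over the first 63 optimizer steps. A sketch of just the warmup arithmetic, with the peak and warmup length read off the log (the post-warmup decay branch is deliberately not reproduced here):

```python
PEAK_LR = 3.5e-05   # learning rate logged at step 63
WARMUP_STEPS = 63   # steps over which the logged rate ramps linearly

def warmup_lr(step):
    """Linear warmup: the learning rate grows proportionally to the step count."""
    if step <= WARMUP_STEPS:
        return PEAK_LR * step / WARMUP_STEPS
    raise NotImplementedError("decay schedule after warmup is not sketched here")

# Approximately 5e-06 and 1e-05, matching the logged step-9 and step-18 rates.
print(warmup_lr(9), warmup_lr(18))
```

Checking a few logged steps against this formula (step 9: 4.9999…e-06, step 18: 9.9999…e-06) confirms the ramp; the tiny deviations are floating-point artifacts in the logged values.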
checkpoint-104/trainer_state.json ADDED
@@ -0,0 +1,1121 @@
+ {
+ "best_metric": null,
+ "best_model_checkpoint": null,
+ "epoch": 0.8253968253968254,
+ "eval_steps": 5,
+ "global_step": 104,
+ "is_hyper_param_search": false,
+ "is_local_process_zero": true,
+ "is_world_process_zero": true,
+ "log_history": [
+ {
+ "epoch": 0.007936507936507936,
+ "grad_norm": 3.5297670364379883,
+ "learning_rate": 5.555555555555555e-07,
+ "loss": 0.404,
+ "step": 1
+ },
+ {
+ "epoch": 0.015873015873015872,
+ "grad_norm": 3.6838796138763428,
+ "learning_rate": 1.111111111111111e-06,
+ "loss": 0.3185,
+ "step": 2
+ },
+ {
+ "epoch": 0.023809523809523808,
+ "grad_norm": 3.5556721687316895,
+ "learning_rate": 1.6666666666666665e-06,
+ "loss": 0.2821,
+ "step": 3
+ },
+ {
+ "epoch": 0.031746031746031744,
+ "grad_norm": 3.922109842300415,
+ "learning_rate": 2.222222222222222e-06,
+ "loss": 0.4036,
+ "step": 4
+ },
+ {
+ "epoch": 0.03968253968253968,
+ "grad_norm": 3.9366657733917236,
+ "learning_rate": 2.7777777777777775e-06,
+ "loss": 0.3442,
+ "step": 5
+ },
+ {
+ "epoch": 0.03968253968253968,
+ "eval_loss": 0.12529698014259338,
+ "eval_runtime": 113.8002,
+ "eval_samples_per_second": 26.819,
+ "eval_steps_per_second": 0.211,
+ "eval_sts-test_pearson_cosine": 0.886081184413048,
+ "eval_sts-test_pearson_dot": 0.8767533438290611,
+ "eval_sts-test_pearson_euclidean": 0.9080817963557108,
+ "eval_sts-test_pearson_manhattan": 0.9087794191320873,
+ "eval_sts-test_pearson_max": 0.9087794191320873,
+ "eval_sts-test_spearman_cosine": 0.9077787555581409,
+ "eval_sts-test_spearman_dot": 0.8792746633711961,
+ "eval_sts-test_spearman_euclidean": 0.9039925750881216,
+ "eval_sts-test_spearman_manhattan": 0.904489537845873,
+ "eval_sts-test_spearman_max": 0.9077787555581409,
+ "step": 5
+ },
+ {
+ "epoch": 0.047619047619047616,
+ "grad_norm": 3.8135547637939453,
+ "learning_rate": 3.333333333333333e-06,
+ "loss": 0.4145,
+ "step": 6
+ },
+ {
+ "epoch": 0.05555555555555555,
+ "grad_norm": 4.132374286651611,
+ "learning_rate": 3.888888888888889e-06,
+ "loss": 0.4224,
+ "step": 7
+ },
+ {
+ "epoch": 0.06349206349206349,
+ "grad_norm": 3.9953386783599854,
+ "learning_rate": 4.444444444444444e-06,
+ "loss": 0.4048,
+ "step": 8
+ },
+ {
+ "epoch": 0.07142857142857142,
+ "grad_norm": 4.023675918579102,
+ "learning_rate": 4.9999999999999996e-06,
+ "loss": 0.3899,
+ "step": 9
+ },
+ {
+ "epoch": 0.07936507936507936,
+ "grad_norm": 3.854191780090332,
+ "learning_rate": 5.555555555555555e-06,
+ "loss": 0.4127,
+ "step": 10
+ },
+ {
+ "epoch": 0.07936507936507936,
+ "eval_loss": 0.12369368970394135,
+ "eval_runtime": 113.6707,
+ "eval_samples_per_second": 26.849,
+ "eval_steps_per_second": 0.211,
+ "eval_sts-test_pearson_cosine": 0.8860118050647048,
+ "eval_sts-test_pearson_dot": 0.8760605933678182,
+ "eval_sts-test_pearson_euclidean": 0.9086480781293332,
+ "eval_sts-test_pearson_manhattan": 0.9092897840847158,
+ "eval_sts-test_pearson_max": 0.9092897840847158,
+ "eval_sts-test_spearman_cosine": 0.9078577415344969,
+ "eval_sts-test_spearman_dot": 0.8791339654053815,
+ "eval_sts-test_spearman_euclidean": 0.9047648028546915,
+ "eval_sts-test_spearman_manhattan": 0.9052383607027356,
+ "eval_sts-test_spearman_max": 0.9078577415344969,
+ "step": 10
+ },
+ {
+ "epoch": 0.0873015873015873,
+ "grad_norm": 3.8079540729522705,
+ "learning_rate": 6.11111111111111e-06,
+ "loss": 0.3496,
+ "step": 11
+ },
+ {
+ "epoch": 0.09523809523809523,
+ "grad_norm": 3.929018259048462,
+ "learning_rate": 6.666666666666666e-06,
+ "loss": 0.3731,
+ "step": 12
+ },
+ {
+ "epoch": 0.10317460317460317,
+ "grad_norm": 4.284013271331787,
+ "learning_rate": 7.222222222222221e-06,
+ "loss": 0.3929,
+ "step": 13
+ },
+ {
+ "epoch": 0.1111111111111111,
+ "grad_norm": 3.3490402698516846,
+ "learning_rate": 7.777777777777777e-06,
+ "loss": 0.2957,
+ "step": 14
+ },
+ {
+ "epoch": 0.11904761904761904,
+ "grad_norm": 3.553280830383301,
+ "learning_rate": 8.333333333333332e-06,
+ "loss": 0.3324,
+ "step": 15
+ },
+ {
+ "epoch": 0.11904761904761904,
+ "eval_loss": 0.12056715041399002,
+ "eval_runtime": 113.718,
+ "eval_samples_per_second": 26.838,
+ "eval_steps_per_second": 0.211,
+ "eval_sts-test_pearson_cosine": 0.8856265458568289,
+ "eval_sts-test_pearson_dot": 0.8743050518330721,
+ "eval_sts-test_pearson_euclidean": 0.9095228583162331,
+ "eval_sts-test_pearson_manhattan": 0.9101600217218586,
+ "eval_sts-test_pearson_max": 0.9101600217218586,
+ "eval_sts-test_spearman_cosine": 0.908261263658463,
+ "eval_sts-test_spearman_dot": 0.87867141636764,
+ "eval_sts-test_spearman_euclidean": 0.9060734192402989,
+ "eval_sts-test_spearman_manhattan": 0.9066336155303966,
+ "eval_sts-test_spearman_max": 0.908261263658463,
+ "step": 15
+ },
+ {
+ "epoch": 0.12698412698412698,
+ "grad_norm": 3.6310322284698486,
+ "learning_rate": 8.888888888888888e-06,
+ "loss": 0.3341,
+ "step": 16
+ },
+ {
+ "epoch": 0.1349206349206349,
+ "grad_norm": 3.6535122394561768,
+ "learning_rate": 9.444444444444443e-06,
+ "loss": 0.3466,
+ "step": 17
+ },
+ {
+ "epoch": 0.14285714285714285,
+ "grad_norm": 3.6199331283569336,
+ "learning_rate": 9.999999999999999e-06,
+ "loss": 0.3558,
+ "step": 18
+ },
+ {
+ "epoch": 0.15079365079365079,
+ "grad_norm": 3.089895248413086,
+ "learning_rate": 1.0555555555555554e-05,
+ "loss": 0.2634,
+ "step": 19
+ },
+ {
+ "epoch": 0.15873015873015872,
+ "grad_norm": 3.320916175842285,
+ "learning_rate": 1.111111111111111e-05,
+ "loss": 0.3095,
+ "step": 20
+ },
+ {
+ "epoch": 0.15873015873015872,
+ "eval_loss": 0.11563990265130997,
+ "eval_runtime": 113.5377,
+ "eval_samples_per_second": 26.881,
+ "eval_steps_per_second": 0.211,
+ "eval_sts-test_pearson_cosine": 0.8848740042612456,
+ "eval_sts-test_pearson_dot": 0.8724689429546052,
+ "eval_sts-test_pearson_euclidean": 0.9104294765782397,
+ "eval_sts-test_pearson_manhattan": 0.9111381492292419,
+ "eval_sts-test_pearson_max": 0.9111381492292419,
+ "eval_sts-test_spearman_cosine": 0.9087803335393421,
+ "eval_sts-test_spearman_dot": 0.8777188410176626,
+ "eval_sts-test_spearman_euclidean": 0.9069791847708608,
+ "eval_sts-test_spearman_manhattan": 0.9078148698260838,
+ "eval_sts-test_spearman_max": 0.9087803335393421,
+ "step": 20
+ },
+ {
+ "epoch": 0.16666666666666666,
+ "grad_norm": 3.0193159580230713,
+ "learning_rate": 1.1666666666666665e-05,
+ "loss": 0.2973,
+ "step": 21
+ },
+ {
+ "epoch": 0.1746031746031746,
+ "grad_norm": 3.3553476333618164,
+ "learning_rate": 1.222222222222222e-05,
+ "loss": 0.2884,
+ "step": 22
+ },
+ {
+ "epoch": 0.18253968253968253,
+ "grad_norm": 3.5176496505737305,
+ "learning_rate": 1.2777777777777775e-05,
+ "loss": 0.3697,
+ "step": 23
+ },
+ {
+ "epoch": 0.19047619047619047,
+ "grad_norm": 3.2073943614959717,
+ "learning_rate": 1.3333333333333332e-05,
+ "loss": 0.2683,
+ "step": 24
+ },
+ {
+ "epoch": 0.1984126984126984,
+ "grad_norm": 3.2101964950561523,
+ "learning_rate": 1.3888888888888886e-05,
+ "loss": 0.3026,
+ "step": 25
+ },
+ {
+ "epoch": 0.1984126984126984,
+ "eval_loss": 0.10958973318338394,
+ "eval_runtime": 113.6214,
+ "eval_samples_per_second": 26.861,
+ "eval_steps_per_second": 0.211,
+ "eval_sts-test_pearson_cosine": 0.8832622086480311,
+ "eval_sts-test_pearson_dot": 0.8697582354953435,
+ "eval_sts-test_pearson_euclidean": 0.9107566690862425,
+ "eval_sts-test_pearson_manhattan": 0.9115546986654615,
+ "eval_sts-test_pearson_max": 0.9115546986654615,
+ "eval_sts-test_spearman_cosine": 0.9087605087305455,
+ "eval_sts-test_spearman_dot": 0.8760767382321666,
+ "eval_sts-test_spearman_euclidean": 0.9073999361304628,
+ "eval_sts-test_spearman_manhattan": 0.9084107328715103,
+ "eval_sts-test_spearman_max": 0.9087605087305455,
+ "step": 25
+ },
+ {
+ "epoch": 0.20634920634920634,
+ "grad_norm": 2.84037709236145,
+ "learning_rate": 1.4444444444444442e-05,
+ "loss": 0.2441,
+ "step": 26
+ },
+ {
+ "epoch": 0.21428571428571427,
+ "grad_norm": 3.3099992275238037,
+ "learning_rate": 1.4999999999999999e-05,
+ "loss": 0.3145,
+ "step": 27
+ },
+ {
+ "epoch": 0.2222222222222222,
+ "grad_norm": 3.061953067779541,
+ "learning_rate": 1.5555555555555555e-05,
+ "loss": 0.3119,
+ "step": 28
+ },
+ {
+ "epoch": 0.23015873015873015,
+ "grad_norm": 3.0163729190826416,
+ "learning_rate": 1.6111111111111108e-05,
+ "loss": 0.2766,
+ "step": 29
+ },
+ {
+ "epoch": 0.23809523809523808,
+ "grad_norm": 3.140418291091919,
+ "learning_rate": 1.6666666666666664e-05,
+ "loss": 0.3343,
+ "step": 30
+ },
+ {
+ "epoch": 0.23809523809523808,
+ "eval_loss": 0.10535401105880737,
+ "eval_runtime": 113.5942,
+ "eval_samples_per_second": 26.868,
+ "eval_steps_per_second": 0.211,
+ "eval_sts-test_pearson_cosine": 0.8819465403802665,
+ "eval_sts-test_pearson_dot": 0.866997957398371,
+ "eval_sts-test_pearson_euclidean": 0.9110501477101954,
+ "eval_sts-test_pearson_manhattan": 0.9119047974126511,
+ "eval_sts-test_pearson_max": 0.9119047974126511,
+ "eval_sts-test_spearman_cosine": 0.9084358383291508,
+ "eval_sts-test_spearman_dot": 0.8727757956894143,
+ "eval_sts-test_spearman_euclidean": 0.9077817538926543,
+ "eval_sts-test_spearman_manhattan": 0.9089103807049453,
+ "eval_sts-test_spearman_max": 0.9089103807049453,
+ "step": 30
+ },
+ {
+ "epoch": 0.24603174603174602,
+ "grad_norm": 3.1329221725463867,
+ "learning_rate": 1.722222222222222e-05,
+ "loss": 0.344,
+ "step": 31
+ },
+ {
+ "epoch": 0.25396825396825395,
+ "grad_norm": 2.9861748218536377,
+ "learning_rate": 1.7777777777777777e-05,
+ "loss": 0.3005,
+ "step": 32
+ },
+ {
+ "epoch": 0.2619047619047619,
+ "grad_norm": 2.8316733837127686,
+ "learning_rate": 1.8333333333333333e-05,
+ "loss": 0.2526,
+ "step": 33
+ },
+ {
+ "epoch": 0.2698412698412698,
+ "grad_norm": 2.8335487842559814,
+ "learning_rate": 1.8888888888888886e-05,
+ "loss": 0.2422,
+ "step": 34
+ },
+ {
+ "epoch": 0.2777777777777778,
+ "grad_norm": 3.0785422325134277,
+ "learning_rate": 1.9444444444444442e-05,
+ "loss": 0.3447,
+ "step": 35
+ },
+ {
+ "epoch": 0.2777777777777778,
+ "eval_loss": 0.10223711282014847,
+ "eval_runtime": 113.9847,
+ "eval_samples_per_second": 26.776,
+ "eval_steps_per_second": 0.211,
+ "eval_sts-test_pearson_cosine": 0.8812001280334643,
+ "eval_sts-test_pearson_dot": 0.8652746969985129,
+ "eval_sts-test_pearson_euclidean": 0.9105701789873448,
+ "eval_sts-test_pearson_manhattan": 0.9116177887236803,
+ "eval_sts-test_pearson_max": 0.9116177887236803,
+ "eval_sts-test_spearman_cosine": 0.9072243769320245,
+ "eval_sts-test_spearman_dot": 0.8716048789351082,
+ "eval_sts-test_spearman_euclidean": 0.9073166540330135,
+ "eval_sts-test_spearman_manhattan": 0.9081332302996223,
+ "eval_sts-test_spearman_max": 0.9081332302996223,
+ "step": 35
+ },
+ {
+ "epoch": 0.2857142857142857,
+ "grad_norm": 2.944396734237671,
+ "learning_rate": 1.9999999999999998e-05,
+ "loss": 0.2809,
+ "step": 36
+ },
+ {
+ "epoch": 0.29365079365079366,
+ "grad_norm": 2.8323400020599365,
+ "learning_rate": 2.0555555555555555e-05,
+ "loss": 0.2836,
+ "step": 37
+ },
+ {
+ "epoch": 0.30158730158730157,
+ "grad_norm": 2.8760273456573486,
+ "learning_rate": 2.1111111111111107e-05,
+ "loss": 0.2878,
+ "step": 38
+ },
+ {
+ "epoch": 0.30952380952380953,
+ "grad_norm": 2.744379758834839,
+ "learning_rate": 2.1666666666666667e-05,
+ "loss": 0.2738,
+ "step": 39
+ },
+ {
+ "epoch": 0.31746031746031744,
+ "grad_norm": 2.8519983291625977,
+ "learning_rate": 2.222222222222222e-05,
+ "loss": 0.2806,
+ "step": 40
+ },
+ {
+ "epoch": 0.31746031746031744,
+ "eval_loss": 0.10033170133829117,
+ "eval_runtime": 113.5147,
+ "eval_samples_per_second": 26.886,
+ "eval_steps_per_second": 0.211,
+ "eval_sts-test_pearson_cosine": 0.8802115569848467,
+ "eval_sts-test_pearson_dot": 0.8634798448575132,
+ "eval_sts-test_pearson_euclidean": 0.9094188438238102,
+ "eval_sts-test_pearson_manhattan": 0.9105849471172345,
+ "eval_sts-test_pearson_max": 0.9105849471172345,
+ "eval_sts-test_spearman_cosine": 0.9064710789490229,
+ "eval_sts-test_spearman_dot": 0.8693704037025742,
+ "eval_sts-test_spearman_euclidean": 0.9064271779615981,
+ "eval_sts-test_spearman_manhattan": 0.9073247092600637,
+ "eval_sts-test_spearman_max": 0.9073247092600637,
+ "step": 40
+ },
+ {
+ "epoch": 0.3253968253968254,
+ "grad_norm": 2.9139747619628906,
+ "learning_rate": 2.2777777777777776e-05,
+ "loss": 0.2797,
+ "step": 41
+ },
+ {
+ "epoch": 0.3333333333333333,
+ "grad_norm": 2.9206557273864746,
+ "learning_rate": 2.333333333333333e-05,
+ "loss": 0.3217,
+ "step": 42
+ },
+ {
+ "epoch": 0.3412698412698413,
+ "grad_norm": 2.755398988723755,
+ "learning_rate": 2.388888888888889e-05,
+ "loss": 0.2544,
+ "step": 43
+ },
+ {
+ "epoch": 0.3492063492063492,
+ "grad_norm": 3.0441982746124268,
+ "learning_rate": 2.444444444444444e-05,
+ "loss": 0.3203,
+ "step": 44
+ },
+ {
+ "epoch": 0.35714285714285715,
+ "grad_norm": 2.978891611099243,
+ "learning_rate": 2.4999999999999998e-05,
+ "loss": 0.2987,
+ "step": 45
+ },
+ {
+ "epoch": 0.35714285714285715,
+ "eval_loss": 0.09902294725179672,
+ "eval_runtime": 113.6912,
+ "eval_samples_per_second": 26.845,
+ "eval_steps_per_second": 0.211,
+ "eval_sts-test_pearson_cosine": 0.8796209948380269,
+ "eval_sts-test_pearson_dot": 0.8617122615494917,
+ "eval_sts-test_pearson_euclidean": 0.9092272396432914,
+ "eval_sts-test_pearson_manhattan": 0.9100341993020892,
+ "eval_sts-test_pearson_max": 0.9100341993020892,
+ "eval_sts-test_spearman_cosine": 0.9063911531961779,
+ "eval_sts-test_spearman_dot": 0.867835166929281,
+ "eval_sts-test_spearman_euclidean": 0.9066020658911155,
+ "eval_sts-test_spearman_manhattan": 0.9072894005148261,
+ "eval_sts-test_spearman_max": 0.9072894005148261,
+ "step": 45
+ },
+ {
+ "epoch": 0.36507936507936506,
+ "grad_norm": 2.9183595180511475,
+ "learning_rate": 2.555555555555555e-05,
+ "loss": 0.2765,
+ "step": 46
+ },
+ {
+ "epoch": 0.373015873015873,
+ "grad_norm": 2.960238456726074,
+ "learning_rate": 2.611111111111111e-05,
+ "loss": 0.2716,
+ "step": 47
+ },
+ {
+ "epoch": 0.38095238095238093,
+ "grad_norm": 3.23356294631958,
+ "learning_rate": 2.6666666666666663e-05,
+ "loss": 0.3726,
+ "step": 48
+ },
+ {
+ "epoch": 0.3888888888888889,
+ "grad_norm": 2.974705457687378,
+ "learning_rate": 2.722222222222222e-05,
+ "loss": 0.2963,
+ "step": 49
+ },
+ {
+ "epoch": 0.3968253968253968,
+ "grad_norm": 2.8041574954986572,
+ "learning_rate": 2.7777777777777772e-05,
+ "loss": 0.2784,
+ "step": 50
+ },
+ {
+ "epoch": 0.3968253968253968,
+ "eval_loss": 0.09521521627902985,
+ "eval_runtime": 113.6139,
+ "eval_samples_per_second": 26.863,
+ "eval_steps_per_second": 0.211,
+ "eval_sts-test_pearson_cosine": 0.8802451373465323,
+ "eval_sts-test_pearson_dot": 0.8609764645232105,
+ "eval_sts-test_pearson_euclidean": 0.9103012041260427,
+ "eval_sts-test_pearson_manhattan": 0.9108880877390901,
+ "eval_sts-test_pearson_max": 0.9108880877390901,
+ "eval_sts-test_spearman_cosine": 0.9071928272927434,
+ "eval_sts-test_spearman_dot": 0.867374407941995,
+ "eval_sts-test_spearman_euclidean": 0.9083242734345022,
+ "eval_sts-test_spearman_manhattan": 0.9086424996542565,
+ "eval_sts-test_spearman_max": 0.9086424996542565,
+ "step": 50
+ },
+ {
+ "epoch": 0.40476190476190477,
+ "grad_norm": 2.6451456546783447,
+ "learning_rate": 2.8333333333333332e-05,
+ "loss": 0.2437,
+ "step": 51
+ },
+ {
+ "epoch": 0.4126984126984127,
+ "grad_norm": 2.7020044326782227,
+ "learning_rate": 2.8888888888888885e-05,
+ "loss": 0.2258,
+ "step": 52
+ },
+ {
+ "epoch": 0.42063492063492064,
+ "grad_norm": 2.7229156494140625,
+ "learning_rate": 2.944444444444444e-05,
+ "loss": 0.2821,
+ "step": 53
+ },
+ {
+ "epoch": 0.42857142857142855,
+ "grad_norm": 2.770799398422241,
+ "learning_rate": 2.9999999999999997e-05,
+ "loss": 0.249,
+ "step": 54
+ },
+ {
+ "epoch": 0.4365079365079365,
+ "grad_norm": 2.762690305709839,
+ "learning_rate": 3.0555555555555554e-05,
+ "loss": 0.2813,
+ "step": 55
+ },
+ {
+ "epoch": 0.4365079365079365,
+ "eval_loss": 0.09280610829591751,
+ "eval_runtime": 113.4966,
+ "eval_samples_per_second": 26.891,
+ "eval_steps_per_second": 0.211,
+ "eval_sts-test_pearson_cosine": 0.8804507408393794,
+ "eval_sts-test_pearson_dot": 0.8631869703781383,
+ "eval_sts-test_pearson_euclidean": 0.9108211341698824,
+ "eval_sts-test_pearson_manhattan": 0.9114068237803576,
+ "eval_sts-test_pearson_max": 0.9114068237803576,
+ "eval_sts-test_spearman_cosine": 0.9079720810073518,
+ "eval_sts-test_spearman_dot": 0.8709471248951776,
+ "eval_sts-test_spearman_euclidean": 0.9085633794241165,
+ "eval_sts-test_spearman_manhattan": 0.9093315348258998,
+ "eval_sts-test_spearman_max": 0.9093315348258998,
+ "step": 55
+ },
+ {
+ "epoch": 0.4444444444444444,
+ "grad_norm": 2.9767086505889893,
+ "learning_rate": 3.111111111111111e-05,
+ "loss": 0.3003,
+ "step": 56
+ },
+ {
+ "epoch": 0.4523809523809524,
+ "grad_norm": 2.816253185272217,
+ "learning_rate": 3.1666666666666666e-05,
+ "loss": 0.2812,
+ "step": 57
+ },
+ {
+ "epoch": 0.4603174603174603,
+ "grad_norm": 2.5184807777404785,
+ "learning_rate": 3.2222222222222216e-05,
+ "loss": 0.2619,
+ "step": 58
+ },
+ {
+ "epoch": 0.46825396825396826,
+ "grad_norm": 2.7500715255737305,
+ "learning_rate": 3.277777777777777e-05,
+ "loss": 0.299,
+ "step": 59
+ },
+ {
+ "epoch": 0.47619047619047616,
+ "grad_norm": 2.5309386253356934,
+ "learning_rate": 3.333333333333333e-05,
+ "loss": 0.2706,
+ "step": 60
+ },
+ {
+ "epoch": 0.47619047619047616,
+ "eval_loss": 0.09274312108755112,
+ "eval_runtime": 113.479,
+ "eval_samples_per_second": 26.895,
+ "eval_steps_per_second": 0.211,
+ "eval_sts-test_pearson_cosine": 0.8814996628266308,
+ "eval_sts-test_pearson_dot": 0.8647617194348185,
+ "eval_sts-test_pearson_euclidean": 0.9116395612568413,
+ "eval_sts-test_pearson_manhattan": 0.9121591417317261,
+ "eval_sts-test_pearson_max": 0.9121591417317261,
+ "eval_sts-test_spearman_cosine": 0.9087614932582961,
+ "eval_sts-test_spearman_dot": 0.8732032149869635,
+ "eval_sts-test_spearman_euclidean": 0.9101066714244602,
+ "eval_sts-test_spearman_manhattan": 0.9099515188012163,
+ "eval_sts-test_spearman_max": 0.9101066714244602,
+ "step": 60
+ },
+ {
+ "epoch": 0.48412698412698413,
+ "grad_norm": 2.7175261974334717,
+ "learning_rate": 3.3888888888888884e-05,
+ "loss": 0.297,
+ "step": 61
+ },
+ {
+ "epoch": 0.49206349206349204,
+ "grad_norm": 2.7492423057556152,
+ "learning_rate": 3.444444444444444e-05,
+ "loss": 0.2906,
+ "step": 62
+ },
+ {
+ "epoch": 0.5,
+ "grad_norm": 2.815702438354492,
+ "learning_rate": 3.5e-05,
+ "loss": 0.2914,
+ "step": 63
+ },
+ {
+ "epoch": 0.5079365079365079,
+ "grad_norm": 2.9056921005249023,
+ "learning_rate": 3.499798538091195e-05,
+ "loss": 0.2669,
+ "step": 64
+ },
+ {
+ "epoch": 0.5158730158730159,
+ "grad_norm": 2.832461357116699,
+ "learning_rate": 3.4991942080268184e-05,
+ "loss": 0.2723,
+ "step": 65
+ },
+ {
+ "epoch": 0.5158730158730159,
+ "eval_loss": 0.09455278515815735,
+ "eval_runtime": 113.5618,
+ "eval_samples_per_second": 26.875,
+ "eval_steps_per_second": 0.211,
+ "eval_sts-test_pearson_cosine": 0.8827592572843797,
+ "eval_sts-test_pearson_dot": 0.8655702748779494,
+ "eval_sts-test_pearson_euclidean": 0.9124138196335778,
+ "eval_sts-test_pearson_manhattan": 0.9124858955018784,
+ "eval_sts-test_pearson_max": 0.9124858955018784,
+ "eval_sts-test_spearman_cosine": 0.9092536676310787,
+ "eval_sts-test_spearman_dot": 0.87468645079452,
+ "eval_sts-test_spearman_euclidean": 0.910149408879089,
+ "eval_sts-test_spearman_manhattan": 0.9104867886387189,
+ "eval_sts-test_spearman_max": 0.9104867886387189,
+ "step": 65
+ },
+ {
+ "epoch": 0.5238095238095238,
+ "grad_norm": 2.834491729736328,
+ "learning_rate": 3.4981871767775944e-05,
+ "loss": 0.3194,
+ "step": 66
+ },
+ {
+ "epoch": 0.5317460317460317,
+ "grad_norm": 3.168403148651123,
+ "learning_rate": 3.496777722576811e-05,
+ "loss": 0.3585,
+ "step": 67
+ },
+ {
+ "epoch": 0.5396825396825397,
+ "grad_norm": 2.8590433597564697,
+ "learning_rate": 3.494966234843439e-05,
+ "loss": 0.2843,
+ "step": 68
+ },
+ {
+ "epoch": 0.5476190476190477,
+ "grad_norm": 2.4585649967193604,
+ "learning_rate": 3.4927532140745435e-05,
+ "loss": 0.1916,
+ "step": 69
+ },
+ {
+ "epoch": 0.5555555555555556,
+ "grad_norm": 3.0862460136413574,
+ "learning_rate": 3.490139271707e-05,
+ "loss": 0.351,
+ "step": 70
+ },
+ {
+ "epoch": 0.5555555555555556,
+ "eval_loss": 0.09706800431013107,
+ "eval_runtime": 113.496,
+ "eval_samples_per_second": 26.891,
+ "eval_steps_per_second": 0.211,
+ "eval_sts-test_pearson_cosine": 0.8816183440112817,
+ "eval_sts-test_pearson_dot": 0.863407251078466,
+ "eval_sts-test_pearson_euclidean": 0.9125994563651346,
+ "eval_sts-test_pearson_manhattan": 0.9121928260729458,
+ "eval_sts-test_pearson_max": 0.9125994563651346,
+ "eval_sts-test_spearman_cosine": 0.9103631836274073,
+ "eval_sts-test_spearman_dot": 0.8729154643762167,
+ "eval_sts-test_spearman_euclidean": 0.9106339755374351,
+ "eval_sts-test_spearman_manhattan": 0.9104940383430642,
+ "eval_sts-test_spearman_max": 0.9106339755374351,
+ "step": 70
+ },
+ {
+ "epoch": 0.5634920634920635,
+ "grad_norm": 2.948397636413574,
+ "learning_rate": 3.48712512994856e-05,
+ "loss": 0.3105,
+ "step": 71
+ },
+ {
+ "epoch": 0.5714285714285714,
+ "grad_norm": 2.904085159301758,
+ "learning_rate": 3.4837116215783116e-05,
+ "loss": 0.2847,
+ "step": 72
+ },
+ {
+ "epoch": 0.5793650793650794,
+ "grad_norm": 2.6948978900909424,
+ "learning_rate": 3.4798996897165926e-05,
+ "loss": 0.2641,
+ "step": 73
+ },
+ {
+ "epoch": 0.5873015873015873,
+ "grad_norm": 3.068554162979126,
+ "learning_rate": 3.475690387564411e-05,
+ "loss": 0.3305,
+ "step": 74
+ },
+ {
+ "epoch": 0.5952380952380952,
+ "grad_norm": 2.6903178691864014,
+ "learning_rate": 3.471084878112459e-05,
+ "loss": 0.2461,
+ "step": 75
+ },
+ {
+ "epoch": 0.5952380952380952,
+ "eval_loss": 0.09646341949701309,
+ "eval_runtime": 113.5342,
+ "eval_samples_per_second": 26.882,
+ "eval_steps_per_second": 0.211,
+ "eval_sts-test_pearson_cosine": 0.879746728283104,
+ "eval_sts-test_pearson_dot": 0.85998475002447,
+ "eval_sts-test_pearson_euclidean": 0.9117602609729114,
+ "eval_sts-test_pearson_manhattan": 0.9111396965114745,
+ "eval_sts-test_pearson_max": 0.9117602609729114,
+ "eval_sts-test_spearman_cosine": 0.9096228207862964,
+ "eval_sts-test_spearman_dot": 0.8689540379665887,
801
+ "eval_sts-test_spearman_euclidean": 0.9099527718365351,
802
+ "eval_sts-test_spearman_manhattan": 0.9098263942743658,
803
+ "eval_sts-test_spearman_max": 0.9099527718365351,
804
+ "step": 75
805
+ },
806
+ {
807
+ "epoch": 0.6031746031746031,
808
+ "grad_norm": 2.81105637550354,
809
+ "learning_rate": 3.4660844338197886e-05,
810
+ "loss": 0.259,
811
+ "step": 76
812
+ },
813
+ {
814
+ "epoch": 0.6111111111111112,
815
+ "grad_norm": 2.629365921020508,
816
+ "learning_rate": 3.460690436262242e-05,
817
+ "loss": 0.2506,
818
+ "step": 77
819
+ },
820
+ {
821
+ "epoch": 0.6190476190476191,
822
+ "grad_norm": 2.6665291786193848,
823
+ "learning_rate": 3.454904375750738e-05,
824
+ "loss": 0.2832,
825
+ "step": 78
826
+ },
827
+ {
828
+ "epoch": 0.626984126984127,
829
+ "grad_norm": 2.916246175765991,
830
+ "learning_rate": 3.448727850919509e-05,
831
+ "loss": 0.3322,
832
+ "step": 79
833
+ },
834
+ {
835
+ "epoch": 0.6349206349206349,
836
+ "grad_norm": 2.4879415035247803,
837
+ "learning_rate": 3.442162568284416e-05,
838
+ "loss": 0.2533,
839
+ "step": 80
840
+ },
841
+ {
842
+ "epoch": 0.6349206349206349,
843
+ "eval_loss": 0.10007175803184509,
844
+ "eval_runtime": 113.3295,
845
+ "eval_samples_per_second": 26.93,
846
+ "eval_steps_per_second": 0.212,
847
+ "eval_sts-test_pearson_cosine": 0.8791063595826033,
848
+ "eval_sts-test_pearson_dot": 0.8594763353424633,
849
+ "eval_sts-test_pearson_euclidean": 0.9109289279488433,
850
+ "eval_sts-test_pearson_manhattan": 0.9101783025650423,
851
+ "eval_sts-test_pearson_max": 0.9109289279488433,
852
+ "eval_sts-test_spearman_cosine": 0.9088725211378084,
853
+ "eval_sts-test_spearman_dot": 0.8680133664521414,
854
+ "eval_sts-test_spearman_euclidean": 0.9091277823327847,
855
+ "eval_sts-test_spearman_manhattan": 0.9091334209917199,
856
+ "eval_sts-test_spearman_max": 0.9091334209917199,
857
+ "step": 80
858
+ },
859
+ {
860
+ "epoch": 0.6428571428571429,
861
+ "grad_norm": 2.6558098793029785,
862
+ "learning_rate": 3.435210341771455e-05,
863
+ "loss": 0.2349,
864
+ "step": 81
865
+ },
866
+ {
867
+ "epoch": 0.6507936507936508,
868
+ "grad_norm": 2.690624475479126,
869
+ "learning_rate": 3.427873092215584e-05,
870
+ "loss": 0.2748,
871
+ "step": 82
872
+ },
873
+ {
874
+ "epoch": 0.6587301587301587,
875
+ "grad_norm": 2.451726198196411,
876
+ "learning_rate": 3.420152846830015e-05,
877
+ "loss": 0.223,
878
+ "step": 83
879
+ },
880
+ {
881
+ "epoch": 0.6666666666666666,
882
+ "grad_norm": 2.6376216411590576,
883
+ "learning_rate": 3.412051738646116e-05,
884
+ "loss": 0.2416,
885
+ "step": 84
886
+ },
887
+ {
888
+ "epoch": 0.6746031746031746,
889
+ "grad_norm": 2.8111939430236816,
890
+ "learning_rate": 3.403572005924071e-05,
891
+ "loss": 0.2637,
892
+ "step": 85
893
+ },
894
+ {
895
+ "epoch": 0.6746031746031746,
896
+ "eval_loss": 0.10335631668567657,
897
+ "eval_runtime": 113.4166,
898
+ "eval_samples_per_second": 26.91,
899
+ "eval_steps_per_second": 0.212,
900
+ "eval_sts-test_pearson_cosine": 0.8779388329417936,
901
+ "eval_sts-test_pearson_dot": 0.8608493769098732,
902
+ "eval_sts-test_pearson_euclidean": 0.9095252832629803,
903
+ "eval_sts-test_pearson_manhattan": 0.9090695197203245,
904
+ "eval_sts-test_pearson_max": 0.9095252832629803,
905
+ "eval_sts-test_spearman_cosine": 0.9082387985252446,
906
+ "eval_sts-test_spearman_dot": 0.8707913010030126,
907
+ "eval_sts-test_spearman_euclidean": 0.9083403391373417,
908
+ "eval_sts-test_spearman_manhattan": 0.9084906586243554,
909
+ "eval_sts-test_spearman_max": 0.9084906586243554,
910
+ "step": 85
911
+ },
912
+ {
913
+ "epoch": 0.6825396825396826,
914
+ "grad_norm": 2.859077215194702,
915
+ "learning_rate": 3.394715991534474e-05,
916
+ "loss": 0.2856,
917
+ "step": 86
918
+ },
919
+ {
920
+ "epoch": 0.6904761904761905,
921
+ "grad_norm": 2.433560371398926,
922
+ "learning_rate": 3.385486142311011e-05,
923
+ "loss": 0.2476,
924
+ "step": 87
925
+ },
926
+ {
927
+ "epoch": 0.6984126984126984,
928
+ "grad_norm": 2.6791834831237793,
929
+ "learning_rate": 3.375885008374425e-05,
930
+ "loss": 0.2427,
931
+ "step": 88
932
+ },
933
+ {
934
+ "epoch": 0.7063492063492064,
935
+ "grad_norm": 2.6574490070343018,
936
+ "learning_rate": 3.365915242427944e-05,
937
+ "loss": 0.2614,
938
+ "step": 89
939
+ },
940
+ {
941
+ "epoch": 0.7142857142857143,
942
+ "grad_norm": 2.5747766494750977,
943
+ "learning_rate": 3.355579599024361e-05,
944
+ "loss": 0.26,
945
+ "step": 90
946
+ },
947
+ {
948
+ "epoch": 0.7142857142857143,
949
+ "eval_loss": 0.10315236449241638,
950
+ "eval_runtime": 113.4044,
951
+ "eval_samples_per_second": 26.913,
952
+ "eval_steps_per_second": 0.212,
953
+ "eval_sts-test_pearson_cosine": 0.8793659111713259,
954
+ "eval_sts-test_pearson_dot": 0.8641308245754843,
955
+ "eval_sts-test_pearson_euclidean": 0.9095961426218309,
956
+ "eval_sts-test_pearson_manhattan": 0.9093977382821561,
957
+ "eval_sts-test_pearson_max": 0.9095961426218309,
958
+ "eval_sts-test_spearman_cosine": 0.9087700407492219,
959
+ "eval_sts-test_spearman_dot": 0.8756799287974091,
960
+ "eval_sts-test_spearman_euclidean": 0.9084703415516837,
961
+ "eval_sts-test_spearman_manhattan": 0.908642678659302,
962
+ "eval_sts-test_spearman_max": 0.9087700407492219,
963
+ "step": 90
964
+ },
965
+ {
966
+ "epoch": 0.7222222222222222,
967
+ "grad_norm": 2.3570423126220703,
968
+ "learning_rate": 3.3448809338049753e-05,
969
+ "loss": 0.1862,
970
+ "step": 91
971
+ },
972
+ {
973
+ "epoch": 0.7301587301587301,
974
+ "grad_norm": 2.486401319503784,
975
+ "learning_rate": 3.333822202710612e-05,
976
+ "loss": 0.267,
977
+ "step": 92
978
+ },
979
+ {
980
+ "epoch": 0.7380952380952381,
981
+ "grad_norm": 2.436018705368042,
982
+ "learning_rate": 3.322406461164916e-05,
983
+ "loss": 0.2175,
984
+ "step": 93
985
+ },
986
+ {
987
+ "epoch": 0.746031746031746,
988
+ "grad_norm": 2.2685282230377197,
989
+ "learning_rate": 3.310636863230172e-05,
990
+ "loss": 0.2079,
991
+ "step": 94
992
+ },
993
+ {
994
+ "epoch": 0.753968253968254,
995
+ "grad_norm": 2.564317464828491,
996
+ "learning_rate": 3.2985166607358637e-05,
997
+ "loss": 0.2562,
998
+ "step": 95
999
+ },
1000
+ {
1001
+ "epoch": 0.753968253968254,
1002
+ "eval_loss": 0.09990089386701584,
1003
+ "eval_runtime": 113.4641,
1004
+ "eval_samples_per_second": 26.898,
1005
+ "eval_steps_per_second": 0.212,
1006
+ "eval_sts-test_pearson_cosine": 0.8795677625682299,
1007
+ "eval_sts-test_pearson_dot": 0.8659349142313639,
1008
+ "eval_sts-test_pearson_euclidean": 0.9099982334792637,
1009
+ "eval_sts-test_pearson_manhattan": 0.9099098081017423,
1010
+ "eval_sts-test_pearson_max": 0.9099982334792637,
1011
+ "eval_sts-test_spearman_cosine": 0.9085842782631862,
1012
+ "eval_sts-test_spearman_dot": 0.8767303751560495,
1013
+ "eval_sts-test_spearman_euclidean": 0.9083879992307234,
1014
+ "eval_sts-test_spearman_manhattan": 0.9084153422514337,
1015
+ "eval_sts-test_spearman_max": 0.9085842782631862,
1016
+ "step": 95
1017
+ },
1018
+ {
1019
+ "epoch": 0.7619047619047619,
1020
+ "grad_norm": 2.594452381134033,
1021
+ "learning_rate": 3.286049202380226e-05,
1022
+ "loss": 0.2516,
1023
+ "step": 96
1024
+ },
1025
+ {
1026
+ "epoch": 0.7698412698412699,
1027
+ "grad_norm": 2.6356875896453857,
1028
+ "learning_rate": 3.273237932805032e-05,
1029
+ "loss": 0.2956,
1030
+ "step": 97
1031
+ },
1032
+ {
1033
+ "epoch": 0.7777777777777778,
1034
+ "grad_norm": 2.6263818740844727,
1035
+ "learning_rate": 3.260086391643865e-05,
1036
+ "loss": 0.2733,
1037
+ "step": 98
1038
+ },
1039
+ {
1040
+ "epoch": 0.7857142857142857,
1041
+ "grad_norm": 2.636934757232666,
1042
+ "learning_rate": 3.246598212544159e-05,
1043
+ "loss": 0.2919,
1044
+ "step": 99
1045
+ },
1046
+ {
1047
+ "epoch": 0.7936507936507936,
1048
+ "grad_norm": 2.710754632949829,
1049
+ "learning_rate": 3.2327771221632486e-05,
1050
+ "loss": 0.2997,
1051
+ "step": 100
1052
+ },
1053
+ {
1054
+ "epoch": 0.7936507936507936,
1055
+ "eval_loss": 0.10318750143051147,
1056
+ "eval_runtime": 113.3236,
1057
+ "eval_samples_per_second": 26.932,
1058
+ "eval_steps_per_second": 0.212,
1059
+ "eval_sts-test_pearson_cosine": 0.8759700287215791,
1060
+ "eval_sts-test_pearson_dot": 0.8604257798750153,
1061
+ "eval_sts-test_pearson_euclidean": 0.9085309767938886,
1062
+ "eval_sts-test_pearson_manhattan": 0.9086607224369581,
1063
+ "eval_sts-test_pearson_max": 0.9086607224369581,
1064
+ "eval_sts-test_spearman_cosine": 0.9069411909499396,
1065
+ "eval_sts-test_spearman_dot": 0.8719651713405704,
1066
+ "eval_sts-test_spearman_euclidean": 0.9080629260679763,
1067
+ "eval_sts-test_spearman_manhattan": 0.907642129957113,
1068
+ "eval_sts-test_spearman_max": 0.9080629260679763,
1069
+ "step": 100
1070
+ },
1071
+ {
1072
+ "epoch": 0.8015873015873016,
1073
+ "grad_norm": 2.516517162322998,
1074
+ "learning_rate": 3.218626939138736e-05,
1075
+ "loss": 0.2276,
1076
+ "step": 101
1077
+ },
1078
+ {
1079
+ "epoch": 0.8095238095238095,
1080
+ "grad_norm": 2.6900599002838135,
1081
+ "learning_rate": 3.204151573033428e-05,
1082
+ "loss": 0.2582,
1083
+ "step": 102
1084
+ },
1085
+ {
1086
+ "epoch": 0.8174603174603174,
1087
+ "grad_norm": 2.5696845054626465,
1088
+ "learning_rate": 3.189355023255171e-05,
1089
+ "loss": 0.2559,
1090
+ "step": 103
1091
+ },
1092
+ {
1093
+ "epoch": 0.8253968253968254,
1094
+ "grad_norm": 2.7647061347961426,
1095
+ "learning_rate": 3.174241377951843e-05,
1096
+ "loss": 0.2864,
1097
+ "step": 104
1098
+ }
1099
+ ],
1100
+ "logging_steps": 1,
1101
+ "max_steps": 252,
1102
+ "num_input_tokens_seen": 0,
1103
+ "num_train_epochs": 2,
1104
+ "save_steps": 26,
1105
+ "stateful_callbacks": {
1106
+ "TrainerControl": {
1107
+ "args": {
1108
+ "should_epoch_stop": false,
1109
+ "should_evaluate": false,
1110
+ "should_log": false,
1111
+ "should_save": true,
1112
+ "should_training_stop": false
1113
+ },
1114
+ "attributes": {}
1115
+ }
1116
+ },
1117
+ "total_flos": 0.0,
1118
+ "train_batch_size": 960,
1119
+ "trial_name": null,
1120
+ "trial_params": null
1121
+ }
checkpoint-104/training_args.bin ADDED
@@ -0,0 +1,3 @@
+ version https://git-lfs.github.com/spec/v1
+ oid sha256:ee09b1db931d465aec016b4dfd4ea584c8c7e7c02278d7fcac63d526c4d30767
+ size 5752