bobox committed
Commit 4a760c7 · verified
1 Parent(s): c101582

Training in progress, step 328, checkpoint

checkpoint-328/1_Pooling/config.json ADDED
@@ -0,0 +1,10 @@
{
  "word_embedding_dimension": 768,
  "pooling_mode_cls_token": false,
  "pooling_mode_mean_tokens": true,
  "pooling_mode_max_tokens": false,
  "pooling_mode_mean_sqrt_len_tokens": false,
  "pooling_mode_weightedmean_tokens": false,
  "pooling_mode_lasttoken": false,
  "include_prompt": true
}
checkpoint-328/README.md ADDED
@@ -0,0 +1,852 @@
---
base_model: bobox/DeBERTa-small-ST-v1-test-step3
datasets: []
language: []
library_name: sentence-transformers
metrics:
- pearson_cosine
- spearman_cosine
- pearson_manhattan
- spearman_manhattan
- pearson_euclidean
- spearman_euclidean
- pearson_dot
- spearman_dot
- pearson_max
- spearman_max
pipeline_tag: sentence-similarity
tags:
- sentence-transformers
- sentence-similarity
- feature-extraction
- generated_from_trainer
- dataset_size:260034
- loss:CachedGISTEmbedLoss
widget:
- source_sentence: who used to present one man and his dog
  sentences:
  - One Man and His Dog One Man and His Dog is a BBC television series in the United
    Kingdom featuring sheepdog trials, originally presented by Phil Drabble, with
    commentary by Eric Halsall and, later, by Ray Ollerenshaw. It was first aired
    on 17 February 1976 and continues today (since 2013) as a special annual edition
    of Countryfile. In 1994, Robin Page replaced Drabble as the main presenter. Gus
    Dermody took over as commentator until 2012.
  - 'animal adjectives [was: ratto, Ratte, raton] - Google Groups animal adjectives
    [was: ratto, Ratte, raton] Showing 1-9 of 9 messages While trying find the pronunciation
    of the word "munger", I encountered the nearby word    murine [MYOO-ryn] = relating
    to mice or rats    [from Latin _murinus_, which derives from _mus_,    mouse,
    whose genetive form is _muris_] So if you need an adjective to refer to lab rodents
    like _ratto_ or _mausu_, "murine" it is. (I would never have discovered this except
    in an alphabetically arranged dictionary.) There are a lot of animal adjectives
    of this type, such as ovine (sheep), equine (horse), bovine (bull, cow, calf),
    aquiline (eagle), murine (rats and mice).   But what is needed is a way to lookup
    an animal and find what the proper adjective is.  For example, is there an adjective
    form for "goat"? for "seal"? for "elephant"? for "whale"? for "walrus"? By the
    way, I never did find out how "munger" is pronounced; the answer is not found
    in'
  - A boat is docked and filled with bicycles next to a grassy area on a body of water.
- source_sentence: There were 29 Muslims fatalities in the Cave of the Patriarchs
    massacre .
  sentences:
  - 'Urban Dictionary: Dog and Bone Dog and Bone Cockney rhyming slang for phone -
    the telephone. ''''Pick up the dog and bone now'''' by Brendan April 05, 2003
    Create a mug The Urban Dictionary Mug One side has the word, one side has the
    definition. Microwave and dishwasher safe. Lotsa space for your liquids. Buy the
    t-shirt The Urban Dictionary T-Shirt Smooth, soft, slim fit American Apparel shirt.
    Custom printed. 100% fine jersey cotton, except for heather grey (90% cotton).
    ^Same as above except can be shortened further to ''Dogs'' or just ''dog'' Get
    on the dogs and give us a bell when your ready. by Phaze October 14, 2004'
  - RAF College Cranwell - Local Area Information RAF College Cranwell Local Area
    Information Local Area Information RAF College Cranwell is situated in the North
    Kesteven District Council area in the heart of rural Lincolnshire, 5 miles from
    Sleaford and 14 miles from the City of Lincoln, surrounded by bustling market
    towns, picturesque villages and landscapes steeped in aviation history. Lincolnshire
    is currently home to several operational RAF airfields and was a key location
    during WWII for bomber stations. Museums, memorials, former airfields, heritage
    and visitor centres bear witness to the bravery of the men and women of this time.
    The ancient City of Lincoln dates back at least to Roman times and boasts a spectacular
    Cathedral and Castle area, whilst Sleaford is the home to the National Centre
    for Craft & Design. Please click on the Logo to access website
  - 29 Muslims were killed and more than 100 others wounded . [ Settlers remember
    gunman Goldstein ; Hebron riots continue ] .
- source_sentence: What requires energy for growth?
  sentences:
  - "an organism requires energy for growth. Fish Fish are the ultimate aquatic organism.\
    \ \n a fish require energy for growth"
  - In August , after the end of the war in June 1902 , Higgins Southampton left the
    `` SSBavarian '' and returned to Cape Town the following month .
  - Rhinestone Cowboy "Rhinestone Cowboy" is a song written by Larry Weiss and most
    famously recorded by American country music singer Glen Campbell. The song enjoyed
    huge popularity with both country and pop audiences when it was released in 1975.
- source_sentence: Burning wood is used to produce what type of energy?
  sentences:
  - Shawnee Trails Council was formed from the merger of the Four Rivers Council and
    the Audubon Council .
  - A Mercedes parked next to a parking meter on a street.
  - "burning wood is used to produce heat. Heat is kinetic energy. \n burning wood\
    \ is used to produce kinetic energy."
- source_sentence: As of March , more than 413,000 cases have been confirmed in more
    than 190 countries with more than 107,000 recoveries .
  sentences:
  - As of 24 March , more than 414,000 cases of COVID-19 have been reported in more
    than 190 countries and territories , resulting in more than 18,500 deaths and
    more than 108,000 recoveries .
  - 'Pope Francis makes first visit as head of state to Italy\''s president - YouTube
    Pope Francis makes first visit as head of state to Italy\''s president Want to
    watch this again later? Sign in to add this video to a playlist. Need to report
    the video? Sign in to report inappropriate content. The interactive transcript
    could not be loaded. Loading... Rating is available when the video has been rented.
    This feature is not available right now. Please try again later. Published on
    Nov 14, 2013 Pope Francis stepped out of the Vatican, several hundred feet into
    the heart of Rome, to meet with Italian President Giorgio Napolitano, and the
    country\''s Council of Ministers. . --------------------- Suscríbete al canal:
    http://smarturl.it/RomeReports Visita nuestra web: http://www.romereports.com/
    ROME REPORTS, www.romereports.com, is an independent international TV News Agency
    based in Rome covering the activity of the Pope, the life of the Vatican and current
    social, cultural and religious debates. Reporting on the Catholic Church requires
    proximity to the source, in-depth knowledge of the Institution, and a high standard
    of creativity and technical excellence. As few broadcasters have a permanent correspondent
    in Rome, ROME REPORTS is geared to inform the public and meet the needs of television
    broadcasting companies around the world through daily news packages, weekly newsprograms
    and documentaries. ---------------------'
  - German shepherds and retrievers are commonly used, but the Belgian Malinois has
    proven to be one of the most outstanding working dogs used in military service.
    Around 85 percent of military working dogs are purchased in Germany or the Netherlands,
    where they have been breeding dogs for military purposes for hundreds of years.
    In addition, the Air Force Security Forces Center, Army Veterinary Corps and the
    341st Training Squadron combine efforts to raise their own dogs; nearly 15 percent
    of all military working dogs are now bred here.
model-index:
- name: SentenceTransformer based on bobox/DeBERTa-small-ST-v1-test-step3
  results:
  - task:
      type: semantic-similarity
      name: Semantic Similarity
    dataset:
      name: sts test
      type: sts-test
    metrics:
    - type: pearson_cosine
      value: 0.8788980244871143
      name: Pearson Cosine
    - type: spearman_cosine
      value: 0.9074493862743003
      name: Spearman Cosine
    - type: pearson_manhattan
      value: 0.909351159725011
      name: Pearson Manhattan
    - type: spearman_manhattan
      value: 0.9076191725600193
      name: Spearman Manhattan
    - type: pearson_euclidean
      value: 0.9090675882181183
      name: Pearson Euclidean
    - type: spearman_euclidean
      value: 0.9075559837789346
      name: Spearman Euclidean
    - type: pearson_dot
      value: 0.8616979928109831
      name: Pearson Dot
    - type: spearman_dot
      value: 0.8701774479505067
      name: Spearman Dot
    - type: pearson_max
      value: 0.909351159725011
      name: Pearson Max
    - type: spearman_max
      value: 0.9076191725600193
      name: Spearman Max
---

# SentenceTransformer based on bobox/DeBERTa-small-ST-v1-test-step3

This is a [sentence-transformers](https://www.SBERT.net) model finetuned from [bobox/DeBERTa-small-ST-v1-test-step3](https://huggingface.co/bobox/DeBERTa-small-ST-v1-test-step3) on the bobox/enhanced_nli-50_k dataset. It maps sentences & paragraphs to a 768-dimensional dense vector space and can be used for semantic textual similarity, semantic search, paraphrase mining, text classification, clustering, and more.

## Model Details

### Model Description
- **Model Type:** Sentence Transformer
- **Base model:** [bobox/DeBERTa-small-ST-v1-test-step3](https://huggingface.co/bobox/DeBERTa-small-ST-v1-test-step3) <!-- at revision df9aaa75fe0c2791e5ed35ff33de1689d9a5f5ff -->
- **Maximum Sequence Length:** 512 tokens
- **Output Dimensionality:** 768 dimensions
- **Similarity Function:** Cosine Similarity
- **Training Dataset:**
    - bobox/enhanced_nli-50_k
<!-- - **Language:** Unknown -->
<!-- - **License:** Unknown -->

### Model Sources

- **Documentation:** [Sentence Transformers Documentation](https://sbert.net)
- **Repository:** [Sentence Transformers on GitHub](https://github.com/UKPLab/sentence-transformers)
- **Hugging Face:** [Sentence Transformers on Hugging Face](https://huggingface.co/models?library=sentence-transformers)

### Full Model Architecture

```
SentenceTransformer(
  (0): Transformer({'max_seq_length': 512, 'do_lower_case': False}) with Transformer model: DebertaV2Model
  (1): Pooling({'word_embedding_dimension': 768, 'pooling_mode_cls_token': False, 'pooling_mode_mean_tokens': True, 'pooling_mode_max_tokens': False, 'pooling_mode_mean_sqrt_len_tokens': False, 'pooling_mode_weightedmean_tokens': False, 'pooling_mode_lasttoken': False, 'include_prompt': True})
)
```
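
The Pooling module above is what turns per-token encoder outputs into a single sentence vector: it averages all token embeddings while ignoring padding, which is what `pooling_mode_mean_tokens: true` in `1_Pooling/config.json` selects. As a minimal sketch of what that step computes (not the library's actual implementation):

```python
import torch

def mean_pool(token_embeddings: torch.Tensor, attention_mask: torch.Tensor) -> torch.Tensor:
    """Masked mean over the sequence axis: (batch, seq, 768) -> (batch, 768)."""
    mask = attention_mask.unsqueeze(-1).float()    # (batch, seq, 1), 1 for real tokens
    summed = (token_embeddings * mask).sum(dim=1)  # sum embeddings of real tokens only
    counts = mask.sum(dim=1).clamp(min=1e-9)       # number of real tokens per sentence
    return summed / counts
```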

## Usage

### Direct Usage (Sentence Transformers)

First install the Sentence Transformers library:

```bash
pip install -U sentence-transformers
```

Then you can load this model and run inference.
```python
from sentence_transformers import SentenceTransformer

# Download from the 🤗 Hub
model = SentenceTransformer("bobox/DeBERTa-small-ST-v1-test-UnifiedDatasets-Ft2-checkpoints-tmp")
# Run inference
sentences = [
    'As of March , more than 413,000 cases have been confirmed in more than 190 countries with more than 107,000 recoveries .',
    'As of 24 March , more than 414,000 cases of COVID-19 have been reported in more than 190 countries and territories , resulting in more than 18,500 deaths and more than 108,000 recoveries .',
    'German shepherds and retrievers are commonly used, but the Belgian Malinois has proven to be one of the most outstanding working dogs used in military service. Around 85 percent of military working dogs are purchased in Germany or the Netherlands, where they have been breeding dogs for military purposes for hundreds of years. In addition, the Air Force Security Forces Center, Army Veterinary Corps and the 341st Training Squadron combine efforts to raise their own dogs; nearly 15 percent of all military working dogs are now bred here.',
]
embeddings = model.encode(sentences)
print(embeddings.shape)
# [3, 768]

# Get the similarity scores for the embeddings
similarities = model.similarity(embeddings, embeddings)
print(similarities.shape)
# [3, 3]
```
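
Since `similarity_fn_name` is left unset in `config_sentence_transformers.json`, `model.similarity` falls back to cosine similarity. A minimal sketch of the equivalent computation, assuming only PyTorch:

```python
import torch
import torch.nn.functional as F

embeddings = torch.randn(3, 768)              # stand-in for model.encode(...) output
normed = F.normalize(embeddings, p=2, dim=1)  # scale each row to unit length
similarities = normed @ normed.T              # (3, 3) pairwise cosine scores
```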

<!--
### Direct Usage (Transformers)

<details><summary>Click to see the direct usage in Transformers</summary>

</details>
-->

<!--
### Downstream Usage (Sentence Transformers)

You can finetune this model on your own dataset.

<details><summary>Click to expand</summary>

</details>
-->

<!--
### Out-of-Scope Use

*List how the model may foreseeably be misused and address what users ought not to do with the model.*
-->

## Evaluation

### Metrics

#### Semantic Similarity
* Dataset: `sts-test`
* Evaluated with [<code>EmbeddingSimilarityEvaluator</code>](https://sbert.net/docs/package_reference/sentence_transformer/evaluation.html#sentence_transformers.evaluation.EmbeddingSimilarityEvaluator)

| Metric              | Value      |
|:--------------------|:-----------|
| pearson_cosine      | 0.8789     |
| **spearman_cosine** | **0.9074** |
| pearson_manhattan   | 0.9094     |
| spearman_manhattan  | 0.9076     |
| pearson_euclidean   | 0.9091     |
| spearman_euclidean  | 0.9076     |
| pearson_dot         | 0.8617     |
| spearman_dot        | 0.8702     |
| pearson_max         | 0.9094     |
| spearman_max        | 0.9076     |

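These numbers come from `EmbeddingSimilarityEvaluator`, which encodes both sides of each labelled sentence pair and correlates the predicted similarities with the gold scores. A minimal sketch with hypothetical placeholder data (the real run uses the STS benchmark test split):

```python
from sentence_transformers import SentenceTransformer
from sentence_transformers.evaluation import EmbeddingSimilarityEvaluator

model = SentenceTransformer("bobox/DeBERTa-small-ST-v1-test-UnifiedDatasets-Ft2-checkpoints-tmp")

# Placeholder pairs and gold scores in [0, 1]; substitute the STS test split.
evaluator = EmbeddingSimilarityEvaluator(
    sentences1=["A man is playing guitar.", "A plane is taking off."],
    sentences2=["Someone plays an instrument.", "A dog runs in a field."],
    scores=[0.8, 0.1],
    name="sts-test",
)
print(evaluator(model))  # pearson/spearman values per distance function
```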
<!--
## Bias, Risks and Limitations

*What are the known or foreseeable issues stemming from this model? You could also flag here known failure cases or weaknesses of the model.*
-->

<!--
### Recommendations

*What are recommendations with respect to the foreseeable issues? For example, filtering explicit content.*
-->

## Training Details

### Training Dataset

#### bobox/enhanced_nli-50_k

* Dataset: bobox/enhanced_nli-50_k
* Size: 260,034 training samples
* Columns: <code>sentence1</code> and <code>sentence2</code>
* Approximate statistics based on the first 1000 samples:
  | | sentence1 | sentence2 |
  |:--------|:----------|:----------|
  | type | string | string |
  | details | <ul><li>min: 4 tokens</li><li>mean: 39.12 tokens</li><li>max: 344 tokens</li></ul> | <ul><li>min: 2 tokens</li><li>mean: 60.17 tokens</li><li>max: 442 tokens</li></ul> |
* Samples:
  | sentence1 | sentence2 |
  |:----------|:----------|
  | <code>Temple Meads Railway Station is in which English city?</code> | <code>Bristol Temple Meads station roof to be replaced - BBC News BBC News Bristol Temple Meads station roof to be replaced 17 October 2013 Image caption Bristol Temple Meads was designed by Isambard Kingdom Brunel Image caption It will cost Network Rail £15m to replace the station's roof Image caption A pact has been signed to redevelop the station over the next 25 years The entire roof on Bristol Temple Meads railway station is to be replaced. Network Rail says it has secured £15m to carry out maintenance of the roof and install new lighting and cables. The announcement was made as a pact was signed to "significantly transform" the station over the next 25 years. Network Rail, Bristol City Council, the West of England Local Enterprise Partnership, Homes and Communities Agency and English Heritage are supporting the plan. Each has signed the 25-year memorandum of understanding to redevelop the station. Patrick Hallgate, of Network Rail Western, said: "Our plans for Bristol will see the railway significantly transformed by the end of the decade, with more seats, better connections and more frequent services." The railway station was designed by Isambard Kingdom Brunel and opened in 1840.</code> |
  | <code>Where do most of the digestion reactions occur?</code> | <code>Most of the digestion reactions occur in the small intestine.</code> |
  | <code>Sacko, 22, joined Sporting from French top-flight side Bordeaux in 2014, but has so far been limited to playing for the Portuguese club's B team.<br>The former France Under-20 player joined Ligue 2 side Sochaux on loan in February and scored twice in 14 games.<br>He is Leeds' third signing of the transfer window, following the arrivals of Marcus Antonsson and Kyle Bartley.<br>Find all the latest football transfers on our dedicated page.</code> | <code>Leeds have signed Sporting Lisbon forward Hadi Sacko on a season-long loan with a view to a permanent deal.</code> |
* Loss: [<code>CachedGISTEmbedLoss</code>](https://sbert.net/docs/package_reference/sentence_transformer/losses.html#cachedgistembedloss) with these parameters:
  ```json
  {'guide': SentenceTransformer(
    (0): Transformer({'max_seq_length': 512, 'do_lower_case': False}) with Transformer model: BertModel
    (1): Pooling({'word_embedding_dimension': 768, 'pooling_mode_cls_token': True, 'pooling_mode_mean_tokens': False, 'pooling_mode_max_tokens': False, 'pooling_mode_mean_sqrt_len_tokens': False, 'pooling_mode_weightedmean_tokens': False, 'pooling_mode_lasttoken': False, 'include_prompt': True})
    (2): Normalize()
  ), 'temperature': 0.025}
  ```

### Evaluation Dataset

#### bobox/enhanced_nli-50_k

* Dataset: bobox/enhanced_nli-50_k
* Size: 1,506 evaluation samples
* Columns: <code>sentence1</code> and <code>sentence2</code>
* Approximate statistics based on the first 1000 samples:
  | | sentence1 | sentence2 |
  |:--------|:----------|:----------|
  | type | string | string |
  | details | <ul><li>min: 3 tokens</li><li>mean: 31.16 tokens</li><li>max: 340 tokens</li></ul> | <ul><li>min: 2 tokens</li><li>mean: 62.3 tokens</li><li>max: 455 tokens</li></ul> |
* Samples:
  | sentence1 | sentence2 |
  |:----------|:----------|
  | <code>Interestingly, snakes use their forked tongues to smell.</code> | <code>Snakes use their tongue to smell things.</code> |
  | <code>A voltaic cell generates an electric current through a reaction known as a(n) spontaneous redox.</code> | <code>A voltaic cell uses what type of reaction to generate an electric current</code> |
  | <code>As of March 22 , there were more than 321,000 cases with over 13,600 deaths and more than 96,000 recoveries reported worldwide .</code> | <code>As of 22 March , more than 321,000 cases of COVID-19 have been reported in over 180 countries and territories , resulting in more than 13,600 deaths and 96,000 recoveries .</code> |
* Loss: [<code>CachedGISTEmbedLoss</code>](https://sbert.net/docs/package_reference/sentence_transformer/losses.html#cachedgistembedloss) with these parameters:
  ```json
  {'guide': SentenceTransformer(
    (0): Transformer({'max_seq_length': 512, 'do_lower_case': False}) with Transformer model: BertModel
    (1): Pooling({'word_embedding_dimension': 768, 'pooling_mode_cls_token': True, 'pooling_mode_mean_tokens': False, 'pooling_mode_max_tokens': False, 'pooling_mode_mean_sqrt_len_tokens': False, 'pooling_mode_weightedmean_tokens': False, 'pooling_mode_lasttoken': False, 'include_prompt': True})
    (2): Normalize()
  ), 'temperature': 0.025}
  ```
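
Both splits are trained with `CachedGISTEmbedLoss`, which uses a frozen guide model to filter out false in-batch negatives while caching gradients so that the large 320-sample batches fit in memory. The guide checkpoint itself is not named in this card, only its printed architecture (a BERT encoder with CLS pooling and normalization), so the sketch below uses a placeholder name:

```python
from sentence_transformers import SentenceTransformer, losses

model = SentenceTransformer("bobox/DeBERTa-small-ST-v1-test-step3")
guide = SentenceTransformer("guide-model-name")  # placeholder: not specified in the card
loss = losses.CachedGISTEmbedLoss(model, guide=guide, temperature=0.025)
```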

### Training Hyperparameters
#### Non-Default Hyperparameters

- `eval_strategy`: steps
- `per_device_train_batch_size`: 320
- `per_device_eval_batch_size`: 128
- `learning_rate`: 2e-05
- `weight_decay`: 0.0001
- `num_train_epochs`: 1
- `lr_scheduler_type`: cosine_with_restarts
- `lr_scheduler_kwargs`: {'num_cycles': 3}
- `warmup_ratio`: 0.25
- `save_safetensors`: False
- `fp16`: True
- `push_to_hub`: True
- `hub_model_id`: bobox/DeBERTa-small-ST-v1-test-UnifiedDatasets-Ft2-checkpoints-tmp
- `hub_strategy`: all_checkpoints
- `batch_sampler`: no_duplicates

#### All Hyperparameters
<details><summary>Click to expand</summary>

- `overwrite_output_dir`: False
- `do_predict`: False
- `eval_strategy`: steps
- `prediction_loss_only`: True
- `per_device_train_batch_size`: 320
- `per_device_eval_batch_size`: 128
- `per_gpu_train_batch_size`: None
- `per_gpu_eval_batch_size`: None
- `gradient_accumulation_steps`: 1
- `eval_accumulation_steps`: None
- `torch_empty_cache_steps`: None
- `learning_rate`: 2e-05
- `weight_decay`: 0.0001
- `adam_beta1`: 0.9
- `adam_beta2`: 0.999
- `adam_epsilon`: 1e-08
- `max_grad_norm`: 1.0
- `num_train_epochs`: 1
- `max_steps`: -1
- `lr_scheduler_type`: cosine_with_restarts
- `lr_scheduler_kwargs`: {'num_cycles': 3}
- `warmup_ratio`: 0.25
- `warmup_steps`: 0
- `log_level`: passive
- `log_level_replica`: warning
- `log_on_each_node`: True
- `logging_nan_inf_filter`: True
- `save_safetensors`: False
- `save_on_each_node`: False
- `save_only_model`: False
- `restore_callback_states_from_checkpoint`: False
- `no_cuda`: False
- `use_cpu`: False
- `use_mps_device`: False
- `seed`: 42
- `data_seed`: None
- `jit_mode_eval`: False
- `use_ipex`: False
- `bf16`: False
- `fp16`: True
- `fp16_opt_level`: O1
- `half_precision_backend`: auto
- `bf16_full_eval`: False
- `fp16_full_eval`: False
- `tf32`: None
- `local_rank`: 0
- `ddp_backend`: None
- `tpu_num_cores`: None
- `tpu_metrics_debug`: False
- `debug`: []
- `dataloader_drop_last`: False
- `dataloader_num_workers`: 0
- `dataloader_prefetch_factor`: None
- `past_index`: -1
- `disable_tqdm`: False
- `remove_unused_columns`: True
- `label_names`: None
- `load_best_model_at_end`: False
- `ignore_data_skip`: False
- `fsdp`: []
- `fsdp_min_num_params`: 0
- `fsdp_config`: {'min_num_params': 0, 'xla': False, 'xla_fsdp_v2': False, 'xla_fsdp_grad_ckpt': False}
- `fsdp_transformer_layer_cls_to_wrap`: None
- `accelerator_config`: {'split_batches': False, 'dispatch_batches': None, 'even_batches': True, 'use_seedable_sampler': True, 'non_blocking': False, 'gradient_accumulation_kwargs': None}
- `deepspeed`: None
- `label_smoothing_factor`: 0.0
- `optim`: adamw_torch
- `optim_args`: None
- `adafactor`: False
- `group_by_length`: False
- `length_column_name`: length
- `ddp_find_unused_parameters`: None
- `ddp_bucket_cap_mb`: None
- `ddp_broadcast_buffers`: False
- `dataloader_pin_memory`: True
- `dataloader_persistent_workers`: False
- `skip_memory_metrics`: True
- `use_legacy_prediction_loop`: False
- `push_to_hub`: True
- `resume_from_checkpoint`: None
- `hub_model_id`: bobox/DeBERTa-small-ST-v1-test-UnifiedDatasets-Ft2-checkpoints-tmp
- `hub_strategy`: all_checkpoints
- `hub_private_repo`: False
- `hub_always_push`: False
- `gradient_checkpointing`: False
- `gradient_checkpointing_kwargs`: None
- `include_inputs_for_metrics`: False
- `eval_do_concat_batches`: True
- `fp16_backend`: auto
- `push_to_hub_model_id`: None
- `push_to_hub_organization`: None
- `mp_parameters`: 
- `auto_find_batch_size`: False
- `full_determinism`: False
- `torchdynamo`: None
- `ray_scope`: last
- `ddp_timeout`: 1800
- `torch_compile`: False
- `torch_compile_backend`: None
- `torch_compile_mode`: None
- `dispatch_batches`: None
- `split_batches`: None
- `include_tokens_per_second`: False
- `include_num_input_tokens_seen`: False
- `neftune_noise_alpha`: None
- `optim_target_modules`: None
- `batch_eval_metrics`: False
- `eval_on_start`: False
- `eval_use_gather_object`: False
- `batch_sampler`: no_duplicates
- `multi_dataset_batch_sampler`: proportional

</details>
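
A minimal sketch reconstructing the non-default settings above with the Sentence Transformers 3.x training API; `output_dir` is an assumption, everything else mirrors the listed values:

```python
from sentence_transformers.training_args import (
    BatchSamplers,
    SentenceTransformerTrainingArguments,
)

args = SentenceTransformerTrainingArguments(
    output_dir="output",  # assumption: not stated in the card
    eval_strategy="steps",
    per_device_train_batch_size=320,
    per_device_eval_batch_size=128,
    learning_rate=2e-5,
    weight_decay=0.0001,
    num_train_epochs=1,
    lr_scheduler_type="cosine_with_restarts",
    lr_scheduler_kwargs={"num_cycles": 3},
    warmup_ratio=0.25,
    save_safetensors=False,
    fp16=True,
    push_to_hub=True,
    hub_model_id="bobox/DeBERTa-small-ST-v1-test-UnifiedDatasets-Ft2-checkpoints-tmp",
    hub_strategy="all_checkpoints",
    batch_sampler=BatchSamplers.NO_DUPLICATES,
)
```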

### Training Logs
<details><summary>Click to expand</summary>

| Epoch | Step | Training Loss | loss | sts-test_spearman_cosine |
|:------:|:----:|:-------------:|:------:|:------------------------:|
| 0.0012 | 1 | 0.3208 | - | - |
| 0.0025 | 2 | 0.1703 | - | - |
| 0.0037 | 3 | 0.3362 | - | - |
| 0.0049 | 4 | 0.3346 | - | - |
| 0.0062 | 5 | 0.2484 | - | - |
| 0.0074 | 6 | 0.2249 | - | - |
| 0.0086 | 7 | 0.2724 | - | - |
| 0.0098 | 8 | 0.251 | - | - |
| 0.0111 | 9 | 0.2413 | - | - |
| 0.0123 | 10 | 0.382 | - | - |
| 0.0135 | 11 | 0.2695 | - | - |
| 0.0148 | 12 | 0.2392 | - | - |
| 0.0160 | 13 | 0.3603 | - | - |
| 0.0172 | 14 | 0.3282 | - | - |
| 0.0185 | 15 | 0.2878 | - | - |
| 0.0197 | 16 | 0.3046 | - | - |
| 0.0209 | 17 | 0.3946 | - | - |
| 0.0221 | 18 | 0.2038 | - | - |
| 0.0234 | 19 | 0.3542 | - | - |
| 0.0246 | 20 | 0.2369 | - | - |
| 0.0258 | 21 | 0.1967 | 0.1451 | 0.9081 |
| 0.0271 | 22 | 0.2368 | - | - |
| 0.0283 | 23 | 0.263 | - | - |
| 0.0295 | 24 | 0.3595 | - | - |
| 0.0308 | 25 | 0.3073 | - | - |
| 0.0320 | 26 | 0.2232 | - | - |
| 0.0332 | 27 | 0.1822 | - | - |
| 0.0344 | 28 | 0.251 | - | - |
| 0.0357 | 29 | 0.2677 | - | - |
| 0.0369 | 30 | 0.3252 | - | - |
| 0.0381 | 31 | 0.2058 | - | - |
| 0.0394 | 32 | 0.3083 | - | - |
| 0.0406 | 33 | 0.2109 | - | - |
| 0.0418 | 34 | 0.2751 | - | - |
| 0.0431 | 35 | 0.2269 | - | - |
| 0.0443 | 36 | 0.2333 | - | - |
| 0.0455 | 37 | 0.2747 | - | - |
| 0.0467 | 38 | 0.1285 | - | - |
| 0.0480 | 39 | 0.3659 | - | - |
| 0.0492 | 40 | 0.3991 | - | - |
| 0.0504 | 41 | 0.2647 | - | - |
| 0.0517 | 42 | 0.3627 | 0.1373 | 0.9084 |
| 0.0529 | 43 | 0.2026 | - | - |
| 0.0541 | 44 | 0.1923 | - | - |
| 0.0554 | 45 | 0.2369 | - | - |
| 0.0566 | 46 | 0.2268 | - | - |
| 0.0578 | 47 | 0.2975 | - | - |
| 0.0590 | 48 | 0.1922 | - | - |
| 0.0603 | 49 | 0.1906 | - | - |
| 0.0615 | 50 | 0.2379 | - | - |
| 0.0627 | 51 | 0.3796 | - | - |
| 0.0640 | 52 | 0.1821 | - | - |
| 0.0652 | 53 | 0.1257 | - | - |
| 0.0664 | 54 | 0.2368 | - | - |
| 0.0677 | 55 | 0.294 | - | - |
| 0.0689 | 56 | 0.2594 | - | - |
| 0.0701 | 57 | 0.2972 | - | - |
| 0.0713 | 58 | 0.2297 | - | - |
| 0.0726 | 59 | 0.1487 | - | - |
| 0.0738 | 60 | 0.182 | - | - |
| 0.0750 | 61 | 0.2516 | - | - |
| 0.0763 | 62 | 0.2809 | - | - |
| 0.0775 | 63 | 0.1371 | 0.1308 | 0.9068 |
| 0.0787 | 64 | 0.2149 | - | - |
| 0.0800 | 65 | 0.1806 | - | - |
| 0.0812 | 66 | 0.1458 | - | - |
| 0.0824 | 67 | 0.249 | - | - |
| 0.0836 | 68 | 0.2787 | - | - |
| 0.0849 | 69 | 0.288 | - | - |
| 0.0861 | 70 | 0.1461 | - | - |
| 0.0873 | 71 | 0.2304 | - | - |
| 0.0886 | 72 | 0.3505 | - | - |
| 0.0898 | 73 | 0.2227 | - | - |
| 0.0910 | 74 | 0.1746 | - | - |
| 0.0923 | 75 | 0.1484 | - | - |
| 0.0935 | 76 | 0.1346 | - | - |
| 0.0947 | 77 | 0.2112 | - | - |
| 0.0959 | 78 | 0.3138 | - | - |
| 0.0972 | 79 | 0.2675 | - | - |
| 0.0984 | 80 | 0.2849 | - | - |
| 0.0996 | 81 | 0.1719 | - | - |
| 0.1009 | 82 | 0.2749 | - | - |
| 0.1021 | 83 | 0.3097 | - | - |
| 0.1033 | 84 | 0.2068 | 0.1260 | 0.9045 |
| 0.1046 | 85 | 0.22 | - | - |
| 0.1058 | 86 | 0.2977 | - | - |
| 0.1070 | 87 | 0.209 | - | - |
| 0.1082 | 88 | 0.2215 | - | - |
| 0.1095 | 89 | 0.1948 | - | - |
| 0.1107 | 90 | 0.2084 | - | - |
| 0.1119 | 91 | 0.1823 | - | - |
| 0.1132 | 92 | 0.255 | - | - |
| 0.1144 | 93 | 0.2675 | - | - |
| 0.1156 | 94 | 0.18 | - | - |
| 0.1169 | 95 | 0.2891 | - | - |
| 0.1181 | 96 | 0.253 | - | - |
| 0.1193 | 97 | 0.3481 | - | - |
| 0.1205 | 98 | 0.1688 | - | - |
| 0.1218 | 99 | 0.1808 | - | - |
| 0.1230 | 100 | 0.2821 | - | - |
| 0.1242 | 101 | 0.1856 | - | - |
| 0.1255 | 102 | 0.1441 | - | - |
| 0.1267 | 103 | 0.226 | - | - |
| 0.1279 | 104 | 0.1662 | - | - |
| 0.1292 | 105 | 0.2043 | 0.1187 | 0.9051 |
| 0.1304 | 106 | 0.3907 | - | - |
| 0.1316 | 107 | 0.1332 | - | - |
| 0.1328 | 108 | 0.2243 | - | - |
| 0.1341 | 109 | 0.162 | - | - |
| 0.1353 | 110 | 0.1481 | - | - |
| 0.1365 | 111 | 0.2163 | - | - |
| 0.1378 | 112 | 0.24 | - | - |
| 0.1390 | 113 | 0.1406 | - | - |
| 0.1402 | 114 | 0.1522 | - | - |
| 0.1415 | 115 | 0.2593 | - | - |
| 0.1427 | 116 | 0.2426 | - | - |
| 0.1439 | 117 | 0.1781 | - | - |
| 0.1451 | 118 | 0.264 | - | - |
| 0.1464 | 119 | 0.1944 | - | - |
| 0.1476 | 120 | 0.1341 | - | - |
| 0.1488 | 121 | 0.155 | - | - |
| 0.1501 | 122 | 0.2052 | - | - |
| 0.1513 | 123 | 0.2023 | - | - |
| 0.1525 | 124 | 0.1519 | - | - |
| 0.1538 | 125 | 0.2118 | - | - |
| 0.1550 | 126 | 0.2489 | 0.1147 | 0.9058 |
| 0.1562 | 127 | 0.1988 | - | - |
| 0.1574 | 128 | 0.1541 | - | - |
| 0.1587 | 129 | 0.1819 | - | - |
| 0.1599 | 130 | 0.1582 | - | - |
| 0.1611 | 131 | 0.2866 | - | - |
| 0.1624 | 132 | 0.2766 | - | - |
| 0.1636 | 133 | 0.1299 | - | - |
| 0.1648 | 134 | 0.2558 | - | - |
| 0.1661 | 135 | 0.1687 | - | - |
| 0.1673 | 136 | 0.173 | - | - |
| 0.1685 | 137 | 0.2276 | - | - |
| 0.1697 | 138 | 0.2174 | - | - |
| 0.1710 | 139 | 0.2666 | - | - |
| 0.1722 | 140 | 0.1524 | - | - |
| 0.1734 | 141 | 0.1179 | - | - |
| 0.1747 | 142 | 0.2475 | - | - |
| 0.1759 | 143 | 0.2662 | - | - |
| 0.1771 | 144 | 0.1596 | - | - |
| 0.1784 | 145 | 0.2331 | - | - |
| 0.1796 | 146 | 0.2905 | - | - |
| 0.1808 | 147 | 0.1342 | 0.1088 | 0.9051 |
| 0.1820 | 148 | 0.0839 | - | - |
| 0.1833 | 149 | 0.2055 | - | - |
| 0.1845 | 150 | 0.2196 | - | - |
| 0.1857 | 151 | 0.2283 | - | - |
| 0.1870 | 152 | 0.2105 | - | - |
| 0.1882 | 153 | 0.1534 | - | - |
| 0.1894 | 154 | 0.1954 | - | - |
| 0.1907 | 155 | 0.1332 | - | - |
| 0.1919 | 156 | 0.19 | - | - |
| 0.1931 | 157 | 0.1878 | - | - |
| 0.1943 | 158 | 0.1518 | - | - |
| 0.1956 | 159 | 0.1906 | - | - |
| 0.1968 | 160 | 0.155 | - | - |
| 0.1980 | 161 | 0.1519 | - | - |
| 0.1993 | 162 | 0.1726 | - | - |
| 0.2005 | 163 | 0.1618 | - | - |
| 0.2017 | 164 | 0.2767 | - | - |
| 0.2030 | 165 | 0.1996 | - | - |
| 0.2042 | 166 | 0.1907 | - | - |
| 0.2054 | 167 | 0.1928 | - | - |
| 0.2066 | 168 | 0.1507 | 0.1082 | 0.9045 |
| 0.2079 | 169 | 0.1637 | - | - |
| 0.2091 | 170 | 0.1687 | - | - |
| 0.2103 | 171 | 0.2181 | - | - |
| 0.2116 | 172 | 0.1496 | - | - |
| 0.2128 | 173 | 0.1749 | - | - |
| 0.2140 | 174 | 0.2374 | - | - |
| 0.2153 | 175 | 0.2122 | - | - |
| 0.2165 | 176 | 0.1617 | - | - |
| 0.2177 | 177 | 0.168 | - | - |
| 0.2189 | 178 | 0.263 | - | - |
| 0.2202 | 179 | 0.1328 | - | - |
| 0.2214 | 180 | 0.3157 | - | - |
| 0.2226 | 181 | 0.2164 | - | - |
| 0.2239 | 182 | 0.1255 | - | - |
| 0.2251 | 183 | 0.2863 | - | - |
| 0.2263 | 184 | 0.155 | - | - |
| 0.2276 | 185 | 0.1271 | - | - |
| 0.2288 | 186 | 0.216 | - | - |
| 0.2300 | 187 | 0.205 | - | - |
| 0.2312 | 188 | 0.1575 | - | - |
| 0.2325 | 189 | 0.1939 | 0.1057 | 0.9046 |
| 0.2337 | 190 | 0.2209 | - | - |
| 0.2349 | 191 | 0.153 | - | - |
| 0.2362 | 192 | 0.2187 | - | - |
| 0.2374 | 193 | 0.1593 | - | - |
| 0.2386 | 194 | 0.173 | - | - |
| 0.2399 | 195 | 0.2377 | - | - |
| 0.2411 | 196 | 0.2281 | - | - |
| 0.2423 | 197 | 0.2651 | - | - |
| 0.2435 | 198 | 0.118 | - | - |
| 0.2448 | 199 | 0.1728 | - | - |
| 0.2460 | 200 | 0.2299 | - | - |
| 0.2472 | 201 | 0.2342 | - | - |
| 0.2485 | 202 | 0.2413 | - | - |
| 0.2497 | 203 | 0.168 | - | - |
| 0.2509 | 204 | 0.1474 | - | - |
| 0.2522 | 205 | 0.1102 | - | - |
| 0.2534 | 206 | 0.2326 | - | - |
| 0.2546 | 207 | 0.1787 | - | - |
| 0.2558 | 208 | 0.1423 | - | - |
| 0.2571 | 209 | 0.2069 | - | - |
| 0.2583 | 210 | 0.136 | 0.1040 | 0.9056 |
| 0.2595 | 211 | 0.2407 | - | - |
| 0.2608 | 212 | 0.212 | - | - |
| 0.2620 | 213 | 0.1361 | - | - |
| 0.2632 | 214 | 0.2356 | - | - |
| 0.2645 | 215 | 0.1059 | - | - |
| 0.2657 | 216 | 0.2501 | - | - |
| 0.2669 | 217 | 0.1817 | - | - |
| 0.2681 | 218 | 0.2022 | - | - |
| 0.2694 | 219 | 0.2235 | - | - |
| 0.2706 | 220 | 0.2437 | - | - |
| 0.2718 | 221 | 0.1859 | - | - |
| 0.2731 | 222 | 0.2167 | - | - |
| 0.2743 | 223 | 0.1495 | - | - |
| 0.2755 | 224 | 0.2876 | - | - |
| 0.2768 | 225 | 0.1842 | - | - |
| 0.2780 | 226 | 0.144 | - | - |
| 0.2792 | 227 | 0.1571 | - | - |
| 0.2804 | 228 | 0.209 | - | - |
| 0.2817 | 229 | 0.2075 | - | - |
| 0.2829 | 230 | 0.1722 | - | - |
| 0.2841 | 231 | 0.1464 | 0.1039 | 0.9087 |
| 0.2854 | 232 | 0.2675 | - | - |
| 0.2866 | 233 | 0.2585 | - | - |
| 0.2878 | 234 | 0.134 | - | - |
| 0.2891 | 235 | 0.1765 | - | - |
| 0.2903 | 236 | 0.1826 | - | - |
| 0.2915 | 237 | 0.222 | - | - |
| 0.2927 | 238 | 0.134 | - | - |
| 0.2940 | 239 | 0.1902 | - | - |
| 0.2952 | 240 | 0.2461 | - | - |
| 0.2964 | 241 | 0.3094 | - | - |
| 0.2977 | 242 | 0.2252 | - | - |
| 0.2989 | 243 | 0.2466 | - | - |
| 0.3001 | 244 | 0.139 | - | - |
| 0.3014 | 245 | 0.154 | - | - |
| 0.3026 | 246 | 0.1979 | - | - |
| 0.3038 | 247 | 0.1121 | - | - |
| 0.3050 | 248 | 0.1361 | - | - |
| 0.3063 | 249 | 0.2492 | - | - |
| 0.3075 | 250 | 0.1903 | - | - |
| 0.3087 | 251 | 0.2333 | - | - |
| 0.3100 | 252 | 0.1805 | 0.1030 | 0.9099 |
| 0.3112 | 253 | 0.1929 | - | - |
| 0.3124 | 254 | 0.1424 | - | - |
| 0.3137 | 255 | 0.2318 | - | - |
| 0.3149 | 256 | 0.1524 | - | - |
| 0.3161 | 257 | 0.2195 | - | - |
| 0.3173 | 258 | 0.1338 | - | - |
| 0.3186 | 259 | 0.2543 | - | - |
| 0.3198 | 260 | 0.202 | - | - |
| 0.3210 | 261 | 0.1489 | - | - |
| 0.3223 | 262 | 0.1937 | - | - |
| 0.3235 | 263 | 0.2334 | - | - |
| 0.3247 | 264 | 0.1942 | - | - |
| 0.3260 | 265 | 0.2013 | - | - |
| 0.3272 | 266 | 0.2954 | - | - |
| 0.3284 | 267 | 0.188 | - | - |
| 0.3296 | 268 | 0.1688 | - | - |
| 0.3309 | 269 | 0.1415 | - | - |
| 0.3321 | 270 | 0.2249 | - | - |
| 0.3333 | 271 | 0.2606 | - | - |
| 0.3346 | 272 | 0.2559 | - | - |
| 0.3358 | 273 | 0.2673 | 0.1039 | 0.9078 |
| 0.3370 | 274 | 0.1618 | - | - |
| 0.3383 | 275 | 0.2602 | - | - |
| 0.3395 | 276 | 0.2339 | - | - |
| 0.3407 | 277 | 0.1843 | - | - |
| 0.3419 | 278 | 0.133 | - | - |
| 0.3432 | 279 | 0.2345 | - | - |
| 0.3444 | 280 | 0.2808 | - | - |
| 0.3456 | 281 | 0.1044 | - | - |
| 0.3469 | 282 | 0.1622 | - | - |
| 0.3481 | 283 | 0.1303 | - | - |
| 0.3493 | 284 | 0.1453 | - | - |
| 0.3506 | 285 | 0.237 | - | - |
| 0.3518 | 286 | 0.1726 | - | - |
| 0.3530 | 287 | 0.2195 | - | - |
| 0.3542 | 288 | 0.3016 | - | - |
| 0.3555 | 289 | 0.1626 | - | - |
| 0.3567 | 290 | 0.1902 | - | - |
| 0.3579 | 291 | 0.1387 | - | - |
| 0.3592 | 292 | 0.1047 | - | - |
| 0.3604 | 293 | 0.1954 | - | - |
| 0.3616 | 294 | 0.2089 | 0.1029 | 0.9083 |
| 0.3629 | 295 | 0.1485 | - | - |
| 0.3641 | 296 | 0.1724 | - | - |
| 0.3653 | 297 | 0.2017 | - | - |
| 0.3665 | 298 | 0.1591 | - | - |
| 0.3678 | 299 | 0.2396 | - | - |
| 0.3690 | 300 | 0.1395 | - | - |
| 0.3702 | 301 | 0.1806 | - | - |
| 0.3715 | 302 | 0.1882 | - | - |
| 0.3727 | 303 | 0.1188 | - | - |
| 0.3739 | 304 | 0.1564 | - | - |
| 0.3752 | 305 | 0.313 | - | - |
| 0.3764 | 306 | 0.1455 | - | - |
| 0.3776 | 307 | 0.1535 | - | - |
| 0.3788 | 308 | 0.099 | - | - |
| 0.3801 | 309 | 0.1733 | - | - |
| 0.3813 | 310 | 0.1891 | - | - |
| 0.3825 | 311 | 0.2128 | - | - |
| 0.3838 | 312 | 0.2042 | - | - |
| 0.3850 | 313 | 0.203 | - | - |
| 0.3862 | 314 | 0.2249 | - | - |
| 0.3875 | 315 | 0.1597 | 0.1014 | 0.9074 |
| 0.3887 | 316 | 0.1358 | - | - |
| 0.3899 | 317 | 0.207 | - | - |
| 0.3911 | 318 | 0.193 | - | - |
| 0.3924 | 319 | 0.1141 | - | - |
| 0.3936 | 320 | 0.2835 | - | - |
| 0.3948 | 321 | 0.2589 | - | - |
| 0.3961 | 322 | 0.088 | - | - |
| 0.3973 | 323 | 0.1675 | - | - |
| 0.3985 | 324 | 0.1525 | - | - |
| 0.3998 | 325 | 0.1401 | - | - |
| 0.4010 | 326 | 0.2109 | - | - |
| 0.4022 | 327 | 0.1382 | - | - |
| 0.4034 | 328 | 0.1724 | - | - |

</details>

### Framework Versions
- Python: 3.10.14
- Sentence Transformers: 3.0.1
- Transformers: 4.44.0
- PyTorch: 2.4.0
- Accelerate: 0.33.0
- Datasets: 2.21.0
- Tokenizers: 0.19.1

## Citation

### BibTeX

#### Sentence Transformers
```bibtex
@inproceedings{reimers-2019-sentence-bert,
    title = "Sentence-BERT: Sentence Embeddings using Siamese BERT-Networks",
    author = "Reimers, Nils and Gurevych, Iryna",
    booktitle = "Proceedings of the 2019 Conference on Empirical Methods in Natural Language Processing",
    month = "11",
    year = "2019",
    publisher = "Association for Computational Linguistics",
    url = "https://arxiv.org/abs/1908.10084",
}
```

<!--
## Glossary

*Clearly define terms in order to be accessible across audiences.*
-->

<!--
## Model Card Authors

*Lists the people who create the model card, providing recognition and accountability for the detailed work that goes into its construction.*
-->

<!--
## Model Card Contact

*Provides a way for people who have updates to the Model Card, suggestions, or questions, to contact the Model Card authors.*
-->
checkpoint-328/added_tokens.json ADDED
@@ -0,0 +1,3 @@
{
  "[MASK]": 128000
}
checkpoint-328/config.json ADDED
@@ -0,0 +1,35 @@
{
  "_name_or_path": "bobox/DeBERTa-small-ST-v1-test-step3",
  "architectures": [
    "DebertaV2Model"
  ],
  "attention_probs_dropout_prob": 0.1,
  "hidden_act": "gelu",
  "hidden_dropout_prob": 0.1,
  "hidden_size": 768,
  "initializer_range": 0.02,
  "intermediate_size": 3072,
  "layer_norm_eps": 1e-07,
  "max_position_embeddings": 512,
  "max_relative_positions": -1,
  "model_type": "deberta-v2",
  "norm_rel_ebd": "layer_norm",
  "num_attention_heads": 12,
  "num_hidden_layers": 6,
  "pad_token_id": 0,
  "pooler_dropout": 0,
  "pooler_hidden_act": "gelu",
  "pooler_hidden_size": 768,
  "pos_att_type": [
    "p2c",
    "c2p"
  ],
  "position_biased_input": false,
  "position_buckets": 256,
  "relative_attention": true,
  "share_att_key": true,
  "torch_dtype": "float32",
  "transformers_version": "4.44.0",
  "type_vocab_size": 0,
  "vocab_size": 128100
}
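
This `config.json` describes the backbone: a 6-layer DeBERTa-v2 encoder with hidden size 768 and a 128k-entry SentencePiece vocabulary. A minimal sketch to inspect it, assuming only `transformers`:

```python
from transformers import AutoConfig, AutoModel

# Loading the config alone confirms the architecture without downloading weights.
config = AutoConfig.from_pretrained("bobox/DeBERTa-small-ST-v1-test-step3")
print(config.model_type, config.num_hidden_layers, config.hidden_size)  # deberta-v2 6 768
backbone = AutoModel.from_pretrained("bobox/DeBERTa-small-ST-v1-test-step3")
```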
checkpoint-328/config_sentence_transformers.json ADDED
@@ -0,0 +1,10 @@
{
  "__version__": {
    "sentence_transformers": "3.0.1",
    "transformers": "4.44.0",
    "pytorch": "2.4.0"
  },
  "prompts": {},
  "default_prompt_name": null,
  "similarity_fn_name": null
}
checkpoint-328/modules.json ADDED
@@ -0,0 +1,14 @@
[
  {
    "idx": 0,
    "name": "0",
    "path": "",
    "type": "sentence_transformers.models.Transformer"
  },
  {
    "idx": 1,
    "name": "1",
    "path": "1_Pooling",
    "type": "sentence_transformers.models.Pooling"
  }
]
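
`modules.json` is how Sentence Transformers reassembles the pipeline: module 0 is the transformer encoder stored at the checkpoint root, module 1 the pooling head stored in `1_Pooling/`. A minimal sketch of the equivalent manual assembly, assuming the checkpoint has been downloaded to a local `checkpoint-328/` directory:

```python
from sentence_transformers import SentenceTransformer, models

word = models.Transformer("checkpoint-328", max_seq_length=512)
pooling = models.Pooling(word.get_word_embedding_dimension(), pooling_mode="mean")
model = SentenceTransformer(modules=[word, pooling])
```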
checkpoint-328/optimizer.pt ADDED
@@ -0,0 +1,3 @@
version https://git-lfs.github.com/spec/v1
oid sha256:dacfb04aa0e35b5e12399790b596d583dc5c235665f764a2df38827194044da7
size 1130520122
checkpoint-328/pytorch_model.bin ADDED
@@ -0,0 +1,3 @@
version https://git-lfs.github.com/spec/v1
oid sha256:d3e8436bc4f7cea71076a0cf9339065d15e469b8a96cd14909e7d6fdbc1987b6
size 565251810
checkpoint-328/rng_state.pth ADDED
@@ -0,0 +1,3 @@
version https://git-lfs.github.com/spec/v1
oid sha256:7d349e6107c95f2e08d394591dbc2ed1d0b0887617ba35392cf6254e9078b3f7
size 14244
checkpoint-328/scheduler.pt ADDED
@@ -0,0 +1,3 @@
version https://git-lfs.github.com/spec/v1
oid sha256:4ae993c3bf4ce4e98379a86f237b95840d1b3daa0545fd51245e7072535c431d
size 1064
checkpoint-328/sentence_bert_config.json ADDED
@@ -0,0 +1,4 @@
{
  "max_seq_length": 512,
  "do_lower_case": false
}
checkpoint-328/special_tokens_map.json ADDED
@@ -0,0 +1,51 @@
{
  "bos_token": {
    "content": "[CLS]",
    "lstrip": false,
    "normalized": false,
    "rstrip": false,
    "single_word": false
  },
  "cls_token": {
    "content": "[CLS]",
    "lstrip": false,
    "normalized": false,
    "rstrip": false,
    "single_word": false
  },
  "eos_token": {
    "content": "[SEP]",
    "lstrip": false,
    "normalized": false,
    "rstrip": false,
    "single_word": false
  },
  "mask_token": {
    "content": "[MASK]",
    "lstrip": false,
    "normalized": false,
    "rstrip": false,
    "single_word": false
  },
  "pad_token": {
    "content": "[PAD]",
    "lstrip": false,
    "normalized": false,
    "rstrip": false,
    "single_word": false
  },
  "sep_token": {
    "content": "[SEP]",
    "lstrip": false,
    "normalized": false,
    "rstrip": false,
    "single_word": false
  },
  "unk_token": {
    "content": "[UNK]",
    "lstrip": false,
    "normalized": true,
    "rstrip": false,
    "single_word": false
  }
}
checkpoint-328/spm.model ADDED
@@ -0,0 +1,3 @@
version https://git-lfs.github.com/spec/v1
oid sha256:c679fbf93643d19aab7ee10c0b99e460bdbc02fedf34b92b05af343b4af586fd
size 2464616
checkpoint-328/tokenizer.json ADDED
The diff for this file is too large to render. See raw diff
 
checkpoint-328/tokenizer_config.json ADDED
@@ -0,0 +1,65 @@
{
  "added_tokens_decoder": {
    "0": {
      "content": "[PAD]",
      "lstrip": false,
      "normalized": false,
      "rstrip": false,
      "single_word": false,
      "special": true
    },
    "1": {
      "content": "[CLS]",
      "lstrip": false,
      "normalized": false,
      "rstrip": false,
      "single_word": false,
      "special": true
    },
    "2": {
      "content": "[SEP]",
      "lstrip": false,
      "normalized": false,
      "rstrip": false,
      "single_word": false,
      "special": true
    },
    "3": {
      "content": "[UNK]",
      "lstrip": false,
      "normalized": true,
      "rstrip": false,
      "single_word": false,
      "special": true
    },
    "128000": {
      "content": "[MASK]",
      "lstrip": false,
      "normalized": false,
      "rstrip": false,
      "single_word": false,
      "special": true
    }
  },
  "bos_token": "[CLS]",
  "clean_up_tokenization_spaces": true,
  "cls_token": "[CLS]",
  "do_lower_case": false,
  "eos_token": "[SEP]",
  "mask_token": "[MASK]",
  "max_length": 512,
  "model_max_length": 512,
  "pad_to_multiple_of": null,
  "pad_token": "[PAD]",
  "pad_token_type_id": 0,
  "padding_side": "right",
  "sep_token": "[SEP]",
  "sp_model_kwargs": {},
  "split_by_punct": false,
  "stride": 0,
  "tokenizer_class": "DebertaV2Tokenizer",
  "truncation_side": "right",
  "truncation_strategy": "longest_first",
  "unk_token": "[UNK]",
  "vocab_type": "spm"
}
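
Together with `spm.model` and `special_tokens_map.json`, this file configures a `DebertaV2Tokenizer` over a SentencePiece vocabulary, with `[MASK]` appended at id 128000. A minimal sketch, assuming the checkpoint files sit in a local `checkpoint-328/` directory:

```python
from transformers import AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("checkpoint-328")
ids = tokenizer("Burning wood is used to produce what type of energy?")["input_ids"]
print(tokenizer.convert_ids_to_tokens(ids))  # ['[CLS]', ..., '[SEP]']
```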
checkpoint-328/trainer_state.json ADDED
@@ -0,0 +1,2599 @@
{
  "best_metric": null,
  "best_model_checkpoint": null,
  "epoch": 0.4034440344403444,
  "eval_steps": 21,
  "global_step": 328,
  "is_hyper_param_search": false,
  "is_local_process_zero": true,
  "is_world_process_zero": true,
  "log_history": [
    {
      "epoch": 0.0012300123001230013,
      "grad_norm": 6.540346145629883,
      "learning_rate": 9.803921568627452e-08,
      "loss": 0.3208,
      "step": 1
    },
    {
      "epoch": 0.0024600246002460025,
      "grad_norm": 5.055933475494385,
      "learning_rate": 1.9607843137254904e-07,
      "loss": 0.1703,
      "step": 2
    },
    {
      "epoch": 0.0036900369003690036,
      "grad_norm": 6.361550331115723,
      "learning_rate": 2.9411764705882356e-07,
      "loss": 0.3362,
      "step": 3
    },
    {
      "epoch": 0.004920049200492005,
      "grad_norm": 6.709433078765869,
      "learning_rate": 3.921568627450981e-07,
      "loss": 0.3346,
      "step": 4
    },
    {
      "epoch": 0.006150061500615006,
      "grad_norm": 5.4415154457092285,
      "learning_rate": 4.901960784313725e-07,
      "loss": 0.2484,
      "step": 5
    },
    {
      "epoch": 0.007380073800738007,
      "grad_norm": 5.709558010101318,
      "learning_rate": 5.882352941176471e-07,
      "loss": 0.2249,
      "step": 6
    },
    {
      "epoch": 0.008610086100861008,
      "grad_norm": 6.553178787231445,
      "learning_rate": 6.862745098039217e-07,
      "loss": 0.2724,
      "step": 7
    },
    {
      "epoch": 0.00984009840098401,
      "grad_norm": 5.640111446380615,
      "learning_rate": 7.843137254901962e-07,
      "loss": 0.251,
      "step": 8
    },
    {
      "epoch": 0.01107011070110701,
      "grad_norm": 5.696380615234375,
      "learning_rate": 8.823529411764707e-07,
      "loss": 0.2413,
      "step": 9
    },
    {
      "epoch": 0.012300123001230012,
      "grad_norm": 6.983877182006836,
      "learning_rate": 9.80392156862745e-07,
      "loss": 0.382,
      "step": 10
    },
    {
      "epoch": 0.013530135301353014,
      "grad_norm": 6.066723346710205,
      "learning_rate": 1.0784313725490197e-06,
      "loss": 0.2695,
      "step": 11
    },
    {
      "epoch": 0.014760147601476014,
      "grad_norm": 5.643115520477295,
      "learning_rate": 1.1764705882352942e-06,
      "loss": 0.2392,
      "step": 12
    },
    {
      "epoch": 0.015990159901599015,
      "grad_norm": 6.062892436981201,
      "learning_rate": 1.2745098039215686e-06,
      "loss": 0.3603,
      "step": 13
    },
    {
      "epoch": 0.017220172201722016,
      "grad_norm": 6.2491655349731445,
      "learning_rate": 1.3725490196078434e-06,
      "loss": 0.3282,
      "step": 14
    },
    {
      "epoch": 0.01845018450184502,
      "grad_norm": 6.1164398193359375,
      "learning_rate": 1.4705882352941177e-06,
      "loss": 0.2878,
      "step": 15
    },
    {
      "epoch": 0.01968019680196802,
      "grad_norm": 5.676611423492432,
      "learning_rate": 1.5686274509803923e-06,
      "loss": 0.3046,
      "step": 16
    },
    {
      "epoch": 0.020910209102091022,
      "grad_norm": 7.181272983551025,
      "learning_rate": 1.6666666666666667e-06,
      "loss": 0.3946,
      "step": 17
    },
    {
      "epoch": 0.02214022140221402,
      "grad_norm": 5.430984020233154,
      "learning_rate": 1.7647058823529414e-06,
      "loss": 0.2038,
      "step": 18
    },
    {
      "epoch": 0.023370233702337023,
      "grad_norm": 7.2283220291137695,
      "learning_rate": 1.8627450980392158e-06,
      "loss": 0.3542,
      "step": 19
    },
    {
      "epoch": 0.024600246002460024,
      "grad_norm": 5.587338924407959,
      "learning_rate": 1.96078431372549e-06,
      "loss": 0.2369,
      "step": 20
    },
    {
      "epoch": 0.025830258302583026,
      "grad_norm": 4.456090927124023,
      "learning_rate": 2.058823529411765e-06,
      "loss": 0.1967,
      "step": 21
    },
    {
      "epoch": 0.025830258302583026,
      "eval_loss": 0.14506277441978455,
      "eval_runtime": 54.872,
      "eval_samples_per_second": 27.446,
      "eval_steps_per_second": 0.219,
      "eval_sts-test_pearson_cosine": 0.8860152816653839,
      "eval_sts-test_pearson_dot": 0.8766503125978379,
      "eval_sts-test_pearson_euclidean": 0.9084101290541164,
      "eval_sts-test_pearson_manhattan": 0.909121525028934,
      "eval_sts-test_pearson_max": 0.909121525028934,
      "eval_sts-test_spearman_cosine": 0.9080919696366193,
      "eval_sts-test_spearman_dot": 0.8799434709726907,
      "eval_sts-test_spearman_euclidean": 0.9044399981995129,
      "eval_sts-test_spearman_manhattan": 0.9048055712538192,
      "eval_sts-test_spearman_max": 0.9080919696366193,
      "step": 21
    },
    {
      "epoch": 0.02706027060270603,
      "grad_norm": 6.088884353637695,
      "learning_rate": 2.1568627450980393e-06,
      "loss": 0.2368,
      "step": 22
    },
    {
      "epoch": 0.028290282902829027,
      "grad_norm": 5.354013919830322,
      "learning_rate": 2.254901960784314e-06,
      "loss": 0.263,
      "step": 23
    },
    {
      "epoch": 0.02952029520295203,
      "grad_norm": 7.822023391723633,
      "learning_rate": 2.3529411764705885e-06,
      "loss": 0.3595,
      "step": 24
    },
    {
      "epoch": 0.03075030750307503,
      "grad_norm": 6.401333332061768,
      "learning_rate": 2.450980392156863e-06,
      "loss": 0.3073,
      "step": 25
    },
    {
      "epoch": 0.03198031980319803,
      "grad_norm": 5.567343235015869,
207
+ "learning_rate": 2.549019607843137e-06,
208
+ "loss": 0.2232,
209
+ "step": 26
210
+ },
211
+ {
212
+ "epoch": 0.033210332103321034,
213
+ "grad_norm": 4.244979381561279,
214
+ "learning_rate": 2.647058823529412e-06,
215
+ "loss": 0.1822,
216
+ "step": 27
217
+ },
218
+ {
219
+ "epoch": 0.03444034440344403,
220
+ "grad_norm": 5.674376964569092,
221
+ "learning_rate": 2.7450980392156867e-06,
222
+ "loss": 0.251,
223
+ "step": 28
224
+ },
225
+ {
226
+ "epoch": 0.03567035670356704,
227
+ "grad_norm": 6.017494201660156,
228
+ "learning_rate": 2.843137254901961e-06,
229
+ "loss": 0.2677,
230
+ "step": 29
231
+ },
232
+ {
233
+ "epoch": 0.03690036900369004,
234
+ "grad_norm": 6.415028095245361,
235
+ "learning_rate": 2.9411764705882355e-06,
236
+ "loss": 0.3252,
237
+ "step": 30
238
+ },
239
+ {
240
+ "epoch": 0.038130381303813035,
241
+ "grad_norm": 5.484204292297363,
242
+ "learning_rate": 3.03921568627451e-06,
243
+ "loss": 0.2058,
244
+ "step": 31
245
+ },
246
+ {
247
+ "epoch": 0.03936039360393604,
248
+ "grad_norm": 5.997295379638672,
249
+ "learning_rate": 3.1372549019607846e-06,
250
+ "loss": 0.3083,
251
+ "step": 32
252
+ },
253
+ {
254
+ "epoch": 0.04059040590405904,
255
+ "grad_norm": 5.527047157287598,
256
+ "learning_rate": 3.2352941176470594e-06,
257
+ "loss": 0.2109,
258
+ "step": 33
259
+ },
260
+ {
261
+ "epoch": 0.041820418204182044,
262
+ "grad_norm": 5.817302227020264,
263
+ "learning_rate": 3.3333333333333333e-06,
264
+ "loss": 0.2751,
265
+ "step": 34
266
+ },
267
+ {
268
+ "epoch": 0.04305043050430504,
269
+ "grad_norm": 5.476433753967285,
270
+ "learning_rate": 3.431372549019608e-06,
271
+ "loss": 0.2269,
272
+ "step": 35
273
+ },
274
+ {
275
+ "epoch": 0.04428044280442804,
276
+ "grad_norm": 5.363610744476318,
277
+ "learning_rate": 3.529411764705883e-06,
278
+ "loss": 0.2333,
279
+ "step": 36
280
+ },
281
+ {
282
+ "epoch": 0.04551045510455105,
283
+ "grad_norm": 6.07395601272583,
284
+ "learning_rate": 3.6274509803921573e-06,
285
+ "loss": 0.2747,
286
+ "step": 37
287
+ },
288
+ {
289
+ "epoch": 0.046740467404674045,
290
+ "grad_norm": 4.726163864135742,
291
+ "learning_rate": 3.7254901960784316e-06,
292
+ "loss": 0.1285,
293
+ "step": 38
294
+ },
295
+ {
296
+ "epoch": 0.04797047970479705,
297
+ "grad_norm": 5.783392906188965,
298
+ "learning_rate": 3.8235294117647055e-06,
299
+ "loss": 0.3659,
300
+ "step": 39
301
+ },
302
+ {
303
+ "epoch": 0.04920049200492005,
304
+ "grad_norm": 6.566931247711182,
305
+ "learning_rate": 3.92156862745098e-06,
306
+ "loss": 0.3991,
307
+ "step": 40
308
+ },
309
+ {
310
+ "epoch": 0.05043050430504305,
311
+ "grad_norm": 5.311452388763428,
312
+ "learning_rate": 4.019607843137255e-06,
313
+ "loss": 0.2647,
314
+ "step": 41
315
+ },
316
+ {
317
+ "epoch": 0.05166051660516605,
318
+ "grad_norm": 6.0737152099609375,
319
+ "learning_rate": 4.11764705882353e-06,
320
+ "loss": 0.3627,
321
+ "step": 42
322
+ },
323
+ {
324
+ "epoch": 0.05166051660516605,
325
+ "eval_loss": 0.1373225301504135,
326
+ "eval_runtime": 54.8187,
327
+ "eval_samples_per_second": 27.472,
328
+ "eval_steps_per_second": 0.219,
329
+ "eval_sts-test_pearson_cosine": 0.8846111050777101,
330
+ "eval_sts-test_pearson_dot": 0.8747554197498655,
331
+ "eval_sts-test_pearson_euclidean": 0.9089352149126115,
332
+ "eval_sts-test_pearson_manhattan": 0.9098483550214526,
333
+ "eval_sts-test_pearson_max": 0.9098483550214526,
334
+ "eval_sts-test_spearman_cosine": 0.9084485029361248,
335
+ "eval_sts-test_spearman_dot": 0.8796038088987298,
336
+ "eval_sts-test_spearman_euclidean": 0.9055790073044468,
337
+ "eval_sts-test_spearman_manhattan": 0.9063848432683216,
338
+ "eval_sts-test_spearman_max": 0.9084485029361248,
339
+ "step": 42
340
+ },
341
+ {
342
+ "epoch": 0.05289052890528905,
343
+ "grad_norm": 4.857839584350586,
344
+ "learning_rate": 4.215686274509805e-06,
345
+ "loss": 0.2026,
346
+ "step": 43
347
+ },
348
+ {
349
+ "epoch": 0.05412054120541206,
350
+ "grad_norm": 5.248873233795166,
351
+ "learning_rate": 4.313725490196079e-06,
352
+ "loss": 0.1923,
353
+ "step": 44
354
+ },
355
+ {
356
+ "epoch": 0.055350553505535055,
357
+ "grad_norm": 5.329862117767334,
358
+ "learning_rate": 4.411764705882353e-06,
359
+ "loss": 0.2369,
360
+ "step": 45
361
+ },
362
+ {
363
+ "epoch": 0.056580565805658053,
364
+ "grad_norm": 5.581146240234375,
365
+ "learning_rate": 4.509803921568628e-06,
366
+ "loss": 0.2268,
367
+ "step": 46
368
+ },
369
+ {
370
+ "epoch": 0.05781057810578106,
371
+ "grad_norm": 5.818411350250244,
372
+ "learning_rate": 4.607843137254902e-06,
373
+ "loss": 0.2975,
374
+ "step": 47
375
+ },
376
+ {
377
+ "epoch": 0.05904059040590406,
378
+ "grad_norm": 5.096602916717529,
379
+ "learning_rate": 4.705882352941177e-06,
380
+ "loss": 0.1922,
381
+ "step": 48
382
+ },
383
+ {
384
+ "epoch": 0.06027060270602706,
385
+ "grad_norm": 5.256355285644531,
386
+ "learning_rate": 4.803921568627452e-06,
387
+ "loss": 0.1906,
388
+ "step": 49
389
+ },
390
+ {
391
+ "epoch": 0.06150061500615006,
392
+ "grad_norm": 5.3927388191223145,
393
+ "learning_rate": 4.901960784313726e-06,
394
+ "loss": 0.2379,
395
+ "step": 50
396
+ },
397
+ {
398
+ "epoch": 0.06273062730627306,
399
+ "grad_norm": 6.2723846435546875,
400
+ "learning_rate": 5e-06,
401
+ "loss": 0.3796,
402
+ "step": 51
403
+ },
404
+ {
405
+ "epoch": 0.06396063960639606,
406
+ "grad_norm": 4.595238208770752,
407
+ "learning_rate": 5.098039215686274e-06,
408
+ "loss": 0.1821,
409
+ "step": 52
410
+ },
411
+ {
412
+ "epoch": 0.06519065190651907,
413
+ "grad_norm": 4.342020511627197,
414
+ "learning_rate": 5.19607843137255e-06,
415
+ "loss": 0.1257,
416
+ "step": 53
417
+ },
418
+ {
419
+ "epoch": 0.06642066420664207,
420
+ "grad_norm": 4.998225212097168,
421
+ "learning_rate": 5.294117647058824e-06,
422
+ "loss": 0.2368,
423
+ "step": 54
424
+ },
425
+ {
426
+ "epoch": 0.06765067650676507,
427
+ "grad_norm": 5.510946273803711,
428
+ "learning_rate": 5.392156862745098e-06,
429
+ "loss": 0.294,
430
+ "step": 55
431
+ },
432
+ {
433
+ "epoch": 0.06888068880688807,
434
+ "grad_norm": 4.788788318634033,
435
+ "learning_rate": 5.4901960784313735e-06,
436
+ "loss": 0.2594,
437
+ "step": 56
438
+ },
439
+ {
440
+ "epoch": 0.07011070110701106,
441
+ "grad_norm": 5.827020645141602,
442
+ "learning_rate": 5.588235294117647e-06,
443
+ "loss": 0.2972,
444
+ "step": 57
445
+ },
446
+ {
447
+ "epoch": 0.07134071340713408,
448
+ "grad_norm": 4.821737289428711,
449
+ "learning_rate": 5.686274509803922e-06,
450
+ "loss": 0.2297,
451
+ "step": 58
452
+ },
453
+ {
454
+ "epoch": 0.07257072570725707,
455
+ "grad_norm": 4.880247592926025,
456
+ "learning_rate": 5.784313725490197e-06,
457
+ "loss": 0.1487,
458
+ "step": 59
459
+ },
460
+ {
461
+ "epoch": 0.07380073800738007,
462
+ "grad_norm": 4.447835445404053,
463
+ "learning_rate": 5.882352941176471e-06,
464
+ "loss": 0.182,
465
+ "step": 60
466
+ },
467
+ {
468
+ "epoch": 0.07503075030750307,
469
+ "grad_norm": 5.5556640625,
470
+ "learning_rate": 5.980392156862746e-06,
471
+ "loss": 0.2516,
472
+ "step": 61
473
+ },
474
+ {
475
+ "epoch": 0.07626076260762607,
476
+ "grad_norm": 5.217922687530518,
477
+ "learning_rate": 6.07843137254902e-06,
478
+ "loss": 0.2809,
479
+ "step": 62
480
+ },
481
+ {
482
+ "epoch": 0.07749077490774908,
483
+ "grad_norm": 4.436608791351318,
484
+ "learning_rate": 6.176470588235295e-06,
485
+ "loss": 0.1371,
486
+ "step": 63
487
+ },
488
+ {
489
+ "epoch": 0.07749077490774908,
490
+ "eval_loss": 0.13080179691314697,
491
+ "eval_runtime": 54.9188,
492
+ "eval_samples_per_second": 27.422,
493
+ "eval_steps_per_second": 0.219,
494
+ "eval_sts-test_pearson_cosine": 0.882074513745531,
495
+ "eval_sts-test_pearson_dot": 0.8709046425878566,
496
+ "eval_sts-test_pearson_euclidean": 0.9081794284297221,
497
+ "eval_sts-test_pearson_manhattan": 0.9093974331692458,
498
+ "eval_sts-test_pearson_max": 0.9093974331692458,
499
+ "eval_sts-test_spearman_cosine": 0.9067824582257844,
500
+ "eval_sts-test_spearman_dot": 0.8757477717096785,
501
+ "eval_sts-test_spearman_euclidean": 0.9051085820447002,
502
+ "eval_sts-test_spearman_manhattan": 0.9064308923162935,
503
+ "eval_sts-test_spearman_max": 0.9067824582257844,
504
+ "step": 63
505
+ },
506
+ {
507
+ "epoch": 0.07872078720787208,
508
+ "grad_norm": 5.6947021484375,
509
+ "learning_rate": 6.274509803921569e-06,
510
+ "loss": 0.2149,
511
+ "step": 64
512
+ },
513
+ {
514
+ "epoch": 0.07995079950799508,
515
+ "grad_norm": 4.272282600402832,
516
+ "learning_rate": 6.372549019607843e-06,
517
+ "loss": 0.1806,
518
+ "step": 65
519
+ },
520
+ {
521
+ "epoch": 0.08118081180811808,
522
+ "grad_norm": 4.575979232788086,
523
+ "learning_rate": 6.470588235294119e-06,
524
+ "loss": 0.1458,
525
+ "step": 66
526
+ },
527
+ {
528
+ "epoch": 0.08241082410824108,
529
+ "grad_norm": 4.315216541290283,
530
+ "learning_rate": 6.568627450980393e-06,
531
+ "loss": 0.249,
532
+ "step": 67
533
+ },
534
+ {
535
+ "epoch": 0.08364083640836409,
536
+ "grad_norm": 5.67277193069458,
537
+ "learning_rate": 6.666666666666667e-06,
538
+ "loss": 0.2787,
539
+ "step": 68
540
+ },
541
+ {
542
+ "epoch": 0.08487084870848709,
543
+ "grad_norm": 5.964886665344238,
544
+ "learning_rate": 6.764705882352942e-06,
545
+ "loss": 0.288,
546
+ "step": 69
547
+ },
548
+ {
549
+ "epoch": 0.08610086100861009,
550
+ "grad_norm": 4.218502521514893,
551
+ "learning_rate": 6.862745098039216e-06,
552
+ "loss": 0.1461,
553
+ "step": 70
554
+ },
555
+ {
556
+ "epoch": 0.08733087330873308,
557
+ "grad_norm": 5.179543972015381,
558
+ "learning_rate": 6.96078431372549e-06,
559
+ "loss": 0.2304,
560
+ "step": 71
561
+ },
562
+ {
563
+ "epoch": 0.08856088560885608,
564
+ "grad_norm": 5.720668792724609,
565
+ "learning_rate": 7.058823529411766e-06,
566
+ "loss": 0.3505,
567
+ "step": 72
568
+ },
569
+ {
570
+ "epoch": 0.0897908979089791,
571
+ "grad_norm": 5.2965497970581055,
572
+ "learning_rate": 7.15686274509804e-06,
573
+ "loss": 0.2227,
574
+ "step": 73
575
+ },
576
+ {
577
+ "epoch": 0.0910209102091021,
578
+ "grad_norm": 4.685606956481934,
579
+ "learning_rate": 7.2549019607843145e-06,
580
+ "loss": 0.1746,
581
+ "step": 74
582
+ },
583
+ {
584
+ "epoch": 0.09225092250922509,
585
+ "grad_norm": 4.2930145263671875,
586
+ "learning_rate": 7.352941176470589e-06,
587
+ "loss": 0.1484,
588
+ "step": 75
589
+ },
590
+ {
591
+ "epoch": 0.09348093480934809,
592
+ "grad_norm": 3.764916181564331,
593
+ "learning_rate": 7.450980392156863e-06,
594
+ "loss": 0.1346,
595
+ "step": 76
596
+ },
597
+ {
598
+ "epoch": 0.09471094710947109,
599
+ "grad_norm": 5.033151626586914,
600
+ "learning_rate": 7.549019607843138e-06,
601
+ "loss": 0.2112,
602
+ "step": 77
603
+ },
604
+ {
605
+ "epoch": 0.0959409594095941,
606
+ "grad_norm": 5.817330837249756,
607
+ "learning_rate": 7.647058823529411e-06,
608
+ "loss": 0.3138,
609
+ "step": 78
610
+ },
611
+ {
612
+ "epoch": 0.0971709717097171,
613
+ "grad_norm": 6.147035121917725,
614
+ "learning_rate": 7.745098039215687e-06,
615
+ "loss": 0.2675,
616
+ "step": 79
617
+ },
618
+ {
619
+ "epoch": 0.0984009840098401,
620
+ "grad_norm": 5.131881237030029,
621
+ "learning_rate": 7.84313725490196e-06,
622
+ "loss": 0.2849,
623
+ "step": 80
624
+ },
625
+ {
626
+ "epoch": 0.0996309963099631,
627
+ "grad_norm": 4.2269368171691895,
628
+ "learning_rate": 7.941176470588236e-06,
629
+ "loss": 0.1719,
630
+ "step": 81
631
+ },
632
+ {
633
+ "epoch": 0.1008610086100861,
634
+ "grad_norm": 5.200590133666992,
635
+ "learning_rate": 8.03921568627451e-06,
636
+ "loss": 0.2749,
637
+ "step": 82
638
+ },
639
+ {
640
+ "epoch": 0.10209102091020911,
641
+ "grad_norm": 5.44044303894043,
642
+ "learning_rate": 8.137254901960784e-06,
643
+ "loss": 0.3097,
644
+ "step": 83
645
+ },
646
+ {
647
+ "epoch": 0.1033210332103321,
648
+ "grad_norm": 4.603049278259277,
649
+ "learning_rate": 8.23529411764706e-06,
650
+ "loss": 0.2068,
651
+ "step": 84
652
+ },
653
+ {
654
+ "epoch": 0.1033210332103321,
655
+ "eval_loss": 0.1260141134262085,
656
+ "eval_runtime": 54.8932,
657
+ "eval_samples_per_second": 27.435,
658
+ "eval_steps_per_second": 0.219,
659
+ "eval_sts-test_pearson_cosine": 0.8775839612260851,
660
+ "eval_sts-test_pearson_dot": 0.8664914414909934,
661
+ "eval_sts-test_pearson_euclidean": 0.9054210798291935,
662
+ "eval_sts-test_pearson_manhattan": 0.9069843565115414,
663
+ "eval_sts-test_pearson_max": 0.9069843565115414,
664
+ "eval_sts-test_spearman_cosine": 0.9044597335057865,
665
+ "eval_sts-test_spearman_dot": 0.872940077569982,
666
+ "eval_sts-test_spearman_euclidean": 0.9027100934391671,
667
+ "eval_sts-test_spearman_manhattan": 0.904476380975024,
668
+ "eval_sts-test_spearman_max": 0.904476380975024,
669
+ "step": 84
670
+ },
671
+ {
672
+ "epoch": 0.1045510455104551,
673
+ "grad_norm": 4.813210964202881,
674
+ "learning_rate": 8.333333333333334e-06,
675
+ "loss": 0.22,
676
+ "step": 85
677
+ },
678
+ {
679
+ "epoch": 0.1057810578105781,
680
+ "grad_norm": 4.659386157989502,
681
+ "learning_rate": 8.43137254901961e-06,
682
+ "loss": 0.2977,
683
+ "step": 86
684
+ },
685
+ {
686
+ "epoch": 0.1070110701107011,
687
+ "grad_norm": 4.895315647125244,
688
+ "learning_rate": 8.529411764705883e-06,
689
+ "loss": 0.209,
690
+ "step": 87
691
+ },
692
+ {
693
+ "epoch": 0.10824108241082411,
694
+ "grad_norm": 5.339110851287842,
695
+ "learning_rate": 8.627450980392157e-06,
696
+ "loss": 0.2215,
697
+ "step": 88
698
+ },
699
+ {
700
+ "epoch": 0.10947109471094711,
701
+ "grad_norm": 4.615406036376953,
702
+ "learning_rate": 8.725490196078433e-06,
703
+ "loss": 0.1948,
704
+ "step": 89
705
+ },
706
+ {
707
+ "epoch": 0.11070110701107011,
708
+ "grad_norm": 5.0383734703063965,
709
+ "learning_rate": 8.823529411764707e-06,
710
+ "loss": 0.2084,
711
+ "step": 90
712
+ },
713
+ {
714
+ "epoch": 0.11193111931119311,
715
+ "grad_norm": 3.9511592388153076,
716
+ "learning_rate": 8.921568627450982e-06,
717
+ "loss": 0.1823,
718
+ "step": 91
719
+ },
720
+ {
721
+ "epoch": 0.11316113161131611,
722
+ "grad_norm": 5.13690710067749,
723
+ "learning_rate": 9.019607843137256e-06,
724
+ "loss": 0.255,
725
+ "step": 92
726
+ },
727
+ {
728
+ "epoch": 0.11439114391143912,
729
+ "grad_norm": 5.1460747718811035,
730
+ "learning_rate": 9.11764705882353e-06,
731
+ "loss": 0.2675,
732
+ "step": 93
733
+ },
734
+ {
735
+ "epoch": 0.11562115621156212,
736
+ "grad_norm": 4.207213878631592,
737
+ "learning_rate": 9.215686274509804e-06,
738
+ "loss": 0.18,
739
+ "step": 94
740
+ },
741
+ {
742
+ "epoch": 0.11685116851168512,
743
+ "grad_norm": 4.802348613739014,
744
+ "learning_rate": 9.31372549019608e-06,
745
+ "loss": 0.2891,
746
+ "step": 95
747
+ },
748
+ {
749
+ "epoch": 0.11808118081180811,
750
+ "grad_norm": 4.9332966804504395,
751
+ "learning_rate": 9.411764705882354e-06,
752
+ "loss": 0.253,
753
+ "step": 96
754
+ },
755
+ {
756
+ "epoch": 0.11931119311193111,
757
+ "grad_norm": 5.841371536254883,
758
+ "learning_rate": 9.509803921568628e-06,
759
+ "loss": 0.3481,
760
+ "step": 97
761
+ },
762
+ {
763
+ "epoch": 0.12054120541205413,
764
+ "grad_norm": 3.70485782623291,
765
+ "learning_rate": 9.607843137254903e-06,
766
+ "loss": 0.1688,
767
+ "step": 98
768
+ },
769
+ {
770
+ "epoch": 0.12177121771217712,
771
+ "grad_norm": 4.415471076965332,
772
+ "learning_rate": 9.705882352941177e-06,
773
+ "loss": 0.1808,
774
+ "step": 99
775
+ },
776
+ {
777
+ "epoch": 0.12300123001230012,
778
+ "grad_norm": 5.058602809906006,
779
+ "learning_rate": 9.803921568627451e-06,
780
+ "loss": 0.2821,
781
+ "step": 100
782
+ },
783
+ {
784
+ "epoch": 0.12423124231242312,
785
+ "grad_norm": 4.303729057312012,
786
+ "learning_rate": 9.901960784313727e-06,
787
+ "loss": 0.1856,
788
+ "step": 101
789
+ },
790
+ {
791
+ "epoch": 0.12546125461254612,
792
+ "grad_norm": 4.048065185546875,
793
+ "learning_rate": 1e-05,
794
+ "loss": 0.1441,
795
+ "step": 102
796
+ },
797
+ {
798
+ "epoch": 0.12669126691266913,
799
+ "grad_norm": 4.463968753814697,
800
+ "learning_rate": 1.0098039215686275e-05,
801
+ "loss": 0.226,
802
+ "step": 103
803
+ },
804
+ {
805
+ "epoch": 0.12792127921279212,
806
+ "grad_norm": 3.401120901107788,
807
+ "learning_rate": 1.0196078431372549e-05,
808
+ "loss": 0.1662,
809
+ "step": 104
810
+ },
811
+ {
812
+ "epoch": 0.12915129151291513,
813
+ "grad_norm": 4.119345188140869,
814
+ "learning_rate": 1.0294117647058823e-05,
815
+ "loss": 0.2043,
816
+ "step": 105
817
+ },
818
+ {
819
+ "epoch": 0.12915129151291513,
820
+ "eval_loss": 0.11874283850193024,
821
+ "eval_runtime": 54.7282,
822
+ "eval_samples_per_second": 27.518,
823
+ "eval_steps_per_second": 0.219,
824
+ "eval_sts-test_pearson_cosine": 0.8767520821963045,
825
+ "eval_sts-test_pearson_dot": 0.8648481444888331,
826
+ "eval_sts-test_pearson_euclidean": 0.9053937497921556,
827
+ "eval_sts-test_pearson_manhattan": 0.9071737646452815,
828
+ "eval_sts-test_pearson_max": 0.9071737646452815,
829
+ "eval_sts-test_spearman_cosine": 0.9050983787571032,
830
+ "eval_sts-test_spearman_dot": 0.8730474805973213,
831
+ "eval_sts-test_spearman_euclidean": 0.9035385735413058,
832
+ "eval_sts-test_spearman_manhattan": 0.9054231834122819,
833
+ "eval_sts-test_spearman_max": 0.9054231834122819,
834
+ "step": 105
835
+ },
836
+ {
837
+ "epoch": 0.13038130381303814,
838
+ "grad_norm": 5.826413154602051,
839
+ "learning_rate": 1.03921568627451e-05,
840
+ "loss": 0.3907,
841
+ "step": 106
842
+ },
843
+ {
844
+ "epoch": 0.13161131611316113,
845
+ "grad_norm": 3.2629737854003906,
846
+ "learning_rate": 1.0490196078431374e-05,
847
+ "loss": 0.1332,
848
+ "step": 107
849
+ },
850
+ {
851
+ "epoch": 0.13284132841328414,
852
+ "grad_norm": 4.044755458831787,
853
+ "learning_rate": 1.0588235294117648e-05,
854
+ "loss": 0.2243,
855
+ "step": 108
856
+ },
857
+ {
858
+ "epoch": 0.13407134071340712,
859
+ "grad_norm": 3.9784040451049805,
860
+ "learning_rate": 1.0686274509803922e-05,
861
+ "loss": 0.162,
862
+ "step": 109
863
+ },
864
+ {
865
+ "epoch": 0.13530135301353013,
866
+ "grad_norm": 3.1851444244384766,
867
+ "learning_rate": 1.0784313725490196e-05,
868
+ "loss": 0.1481,
869
+ "step": 110
870
+ },
871
+ {
872
+ "epoch": 0.13653136531365315,
873
+ "grad_norm": 4.281413555145264,
874
+ "learning_rate": 1.0882352941176471e-05,
875
+ "loss": 0.2163,
876
+ "step": 111
877
+ },
878
+ {
879
+ "epoch": 0.13776137761377613,
880
+ "grad_norm": 4.62849235534668,
881
+ "learning_rate": 1.0980392156862747e-05,
882
+ "loss": 0.24,
883
+ "step": 112
884
+ },
885
+ {
886
+ "epoch": 0.13899138991389914,
887
+ "grad_norm": 3.92616868019104,
888
+ "learning_rate": 1.1078431372549021e-05,
889
+ "loss": 0.1406,
890
+ "step": 113
891
+ },
892
+ {
893
+ "epoch": 0.14022140221402213,
894
+ "grad_norm": 3.8505780696868896,
895
+ "learning_rate": 1.1176470588235295e-05,
896
+ "loss": 0.1522,
897
+ "step": 114
898
+ },
899
+ {
900
+ "epoch": 0.14145141451414514,
901
+ "grad_norm": 5.220509052276611,
902
+ "learning_rate": 1.1274509803921569e-05,
903
+ "loss": 0.2593,
904
+ "step": 115
905
+ },
906
+ {
907
+ "epoch": 0.14268142681426815,
908
+ "grad_norm": 4.459743499755859,
909
+ "learning_rate": 1.1372549019607844e-05,
910
+ "loss": 0.2426,
911
+ "step": 116
912
+ },
913
+ {
914
+ "epoch": 0.14391143911439114,
915
+ "grad_norm": 4.434360504150391,
916
+ "learning_rate": 1.1470588235294118e-05,
917
+ "loss": 0.1781,
918
+ "step": 117
919
+ },
920
+ {
921
+ "epoch": 0.14514145141451415,
922
+ "grad_norm": 4.638584613800049,
923
+ "learning_rate": 1.1568627450980394e-05,
924
+ "loss": 0.264,
925
+ "step": 118
926
+ },
927
+ {
928
+ "epoch": 0.14637146371463713,
929
+ "grad_norm": 4.5364484786987305,
930
+ "learning_rate": 1.1666666666666668e-05,
931
+ "loss": 0.1944,
932
+ "step": 119
933
+ },
934
+ {
935
+ "epoch": 0.14760147601476015,
936
+ "grad_norm": 3.597980499267578,
937
+ "learning_rate": 1.1764705882352942e-05,
938
+ "loss": 0.1341,
939
+ "step": 120
940
+ },
941
+ {
942
+ "epoch": 0.14883148831488316,
943
+ "grad_norm": 3.5174648761749268,
944
+ "learning_rate": 1.1862745098039217e-05,
945
+ "loss": 0.155,
946
+ "step": 121
947
+ },
948
+ {
949
+ "epoch": 0.15006150061500614,
950
+ "grad_norm": 4.771029949188232,
951
+ "learning_rate": 1.1960784313725491e-05,
952
+ "loss": 0.2052,
953
+ "step": 122
954
+ },
955
+ {
956
+ "epoch": 0.15129151291512916,
957
+ "grad_norm": 4.15376615524292,
958
+ "learning_rate": 1.2058823529411765e-05,
959
+ "loss": 0.2023,
960
+ "step": 123
961
+ },
962
+ {
963
+ "epoch": 0.15252152521525214,
964
+ "grad_norm": 3.5796732902526855,
965
+ "learning_rate": 1.215686274509804e-05,
966
+ "loss": 0.1519,
967
+ "step": 124
968
+ },
969
+ {
970
+ "epoch": 0.15375153751537515,
971
+ "grad_norm": 3.759777545928955,
972
+ "learning_rate": 1.2254901960784315e-05,
973
+ "loss": 0.2118,
974
+ "step": 125
975
+ },
976
+ {
977
+ "epoch": 0.15498154981549817,
978
+ "grad_norm": 4.691242218017578,
979
+ "learning_rate": 1.235294117647059e-05,
980
+ "loss": 0.2489,
981
+ "step": 126
982
+ },
983
+ {
984
+ "epoch": 0.15498154981549817,
985
+ "eval_loss": 0.11467884480953217,
986
+ "eval_runtime": 54.6969,
987
+ "eval_samples_per_second": 27.534,
988
+ "eval_steps_per_second": 0.219,
989
+ "eval_sts-test_pearson_cosine": 0.8783763084873629,
990
+ "eval_sts-test_pearson_dot": 0.8674040012483692,
991
+ "eval_sts-test_pearson_euclidean": 0.9069725735634968,
992
+ "eval_sts-test_pearson_manhattan": 0.908783443457056,
993
+ "eval_sts-test_pearson_max": 0.908783443457056,
994
+ "eval_sts-test_spearman_cosine": 0.9058364613112314,
995
+ "eval_sts-test_spearman_dot": 0.8738751104254939,
996
+ "eval_sts-test_spearman_euclidean": 0.904588080123457,
997
+ "eval_sts-test_spearman_manhattan": 0.9067583820471556,
998
+ "eval_sts-test_spearman_max": 0.9067583820471556,
999
+ "step": 126
1000
+ },
1001
+ {
1002
+ "epoch": 0.15621156211562115,
1003
+ "grad_norm": 4.087611198425293,
1004
+ "learning_rate": 1.2450980392156864e-05,
1005
+ "loss": 0.1988,
1006
+ "step": 127
1007
+ },
1008
+ {
1009
+ "epoch": 0.15744157441574416,
1010
+ "grad_norm": 3.754612684249878,
1011
+ "learning_rate": 1.2549019607843138e-05,
1012
+ "loss": 0.1541,
1013
+ "step": 128
1014
+ },
1015
+ {
1016
+ "epoch": 0.15867158671586715,
1017
+ "grad_norm": 3.9258835315704346,
1018
+ "learning_rate": 1.2647058823529412e-05,
1019
+ "loss": 0.1819,
1020
+ "step": 129
1021
+ },
1022
+ {
1023
+ "epoch": 0.15990159901599016,
1024
+ "grad_norm": 3.88478422164917,
1025
+ "learning_rate": 1.2745098039215686e-05,
1026
+ "loss": 0.1582,
1027
+ "step": 130
1028
+ },
1029
+ {
1030
+ "epoch": 0.16113161131611317,
1031
+ "grad_norm": 4.9845428466796875,
1032
+ "learning_rate": 1.2843137254901964e-05,
1033
+ "loss": 0.2866,
1034
+ "step": 131
1035
+ },
1036
+ {
1037
+ "epoch": 0.16236162361623616,
1038
+ "grad_norm": 4.692960262298584,
1039
+ "learning_rate": 1.2941176470588238e-05,
1040
+ "loss": 0.2766,
1041
+ "step": 132
1042
+ },
1043
+ {
1044
+ "epoch": 0.16359163591635917,
1045
+ "grad_norm": 3.9432125091552734,
1046
+ "learning_rate": 1.3039215686274511e-05,
1047
+ "loss": 0.1299,
1048
+ "step": 133
1049
+ },
1050
+ {
1051
+ "epoch": 0.16482164821648215,
1052
+ "grad_norm": 4.439709663391113,
1053
+ "learning_rate": 1.3137254901960785e-05,
1054
+ "loss": 0.2558,
1055
+ "step": 134
1056
+ },
1057
+ {
1058
+ "epoch": 0.16605166051660517,
1059
+ "grad_norm": 3.631169319152832,
1060
+ "learning_rate": 1.323529411764706e-05,
1061
+ "loss": 0.1687,
1062
+ "step": 135
1063
+ },
1064
+ {
1065
+ "epoch": 0.16728167281672818,
1066
+ "grad_norm": 4.130221843719482,
1067
+ "learning_rate": 1.3333333333333333e-05,
1068
+ "loss": 0.173,
1069
+ "step": 136
1070
+ },
1071
+ {
1072
+ "epoch": 0.16851168511685116,
1073
+ "grad_norm": 4.169937610626221,
1074
+ "learning_rate": 1.3431372549019607e-05,
1075
+ "loss": 0.2276,
1076
+ "step": 137
1077
+ },
1078
+ {
1079
+ "epoch": 0.16974169741697417,
1080
+ "grad_norm": 4.4349751472473145,
1081
+ "learning_rate": 1.3529411764705885e-05,
1082
+ "loss": 0.2174,
1083
+ "step": 138
1084
+ },
1085
+ {
1086
+ "epoch": 0.17097170971709716,
1087
+ "grad_norm": 4.688521862030029,
1088
+ "learning_rate": 1.3627450980392158e-05,
1089
+ "loss": 0.2666,
1090
+ "step": 139
1091
+ },
1092
+ {
1093
+ "epoch": 0.17220172201722017,
1094
+ "grad_norm": 3.7199971675872803,
1095
+ "learning_rate": 1.3725490196078432e-05,
1096
+ "loss": 0.1524,
1097
+ "step": 140
1098
+ },
1099
+ {
1100
+ "epoch": 0.17343173431734318,
1101
+ "grad_norm": 2.8609495162963867,
1102
+ "learning_rate": 1.3823529411764706e-05,
1103
+ "loss": 0.1179,
1104
+ "step": 141
1105
+ },
1106
+ {
1107
+ "epoch": 0.17466174661746617,
1108
+ "grad_norm": 4.374091625213623,
1109
+ "learning_rate": 1.392156862745098e-05,
1110
+ "loss": 0.2475,
1111
+ "step": 142
1112
+ },
1113
+ {
1114
+ "epoch": 0.17589175891758918,
1115
+ "grad_norm": 5.200084209442139,
1116
+ "learning_rate": 1.4019607843137256e-05,
1117
+ "loss": 0.2662,
1118
+ "step": 143
1119
+ },
1120
+ {
1121
+ "epoch": 0.17712177121771217,
1122
+ "grad_norm": 3.720994710922241,
1123
+ "learning_rate": 1.4117647058823532e-05,
1124
+ "loss": 0.1596,
1125
+ "step": 144
1126
+ },
1127
+ {
1128
+ "epoch": 0.17835178351783518,
1129
+ "grad_norm": 3.991046905517578,
1130
+ "learning_rate": 1.4215686274509805e-05,
1131
+ "loss": 0.2331,
1132
+ "step": 145
1133
+ },
1134
+ {
1135
+ "epoch": 0.1795817958179582,
1136
+ "grad_norm": 4.76691198348999,
1137
+ "learning_rate": 1.431372549019608e-05,
1138
+ "loss": 0.2905,
1139
+ "step": 146
1140
+ },
1141
+ {
1142
+ "epoch": 0.18081180811808117,
1143
+ "grad_norm": 3.6453163623809814,
1144
+ "learning_rate": 1.4411764705882353e-05,
1145
+ "loss": 0.1342,
1146
+ "step": 147
1147
+ },
1148
+ {
1149
+ "epoch": 0.18081180811808117,
1150
+ "eval_loss": 0.10875426232814789,
1151
+ "eval_runtime": 54.7153,
1152
+ "eval_samples_per_second": 27.524,
1153
+ "eval_steps_per_second": 0.219,
1154
+ "eval_sts-test_pearson_cosine": 0.8768801472189502,
1155
+ "eval_sts-test_pearson_dot": 0.8620776961156391,
1156
+ "eval_sts-test_pearson_euclidean": 0.9073367408863471,
1157
+ "eval_sts-test_pearson_manhattan": 0.9086519830687241,
1158
+ "eval_sts-test_pearson_max": 0.9086519830687241,
1159
+ "eval_sts-test_spearman_cosine": 0.905147068129497,
1160
+ "eval_sts-test_spearman_dot": 0.869011811845045,
1161
+ "eval_sts-test_spearman_euclidean": 0.9050077574527855,
1162
+ "eval_sts-test_spearman_manhattan": 0.9068115017944273,
1163
+ "eval_sts-test_spearman_max": 0.9068115017944273,
1164
+ "step": 147
1165
+ },
1166
+ {
1167
+ "epoch": 0.1820418204182042,
1168
+ "grad_norm": 2.6028358936309814,
1169
+ "learning_rate": 1.4509803921568629e-05,
1170
+ "loss": 0.0839,
1171
+ "step": 148
1172
+ },
1173
+ {
1174
+ "epoch": 0.18327183271832717,
1175
+ "grad_norm": 4.445943832397461,
1176
+ "learning_rate": 1.4607843137254903e-05,
1177
+ "loss": 0.2055,
1178
+ "step": 149
1179
+ },
1180
+ {
1181
+ "epoch": 0.18450184501845018,
1182
+ "grad_norm": 4.500098705291748,
1183
+ "learning_rate": 1.4705882352941179e-05,
1184
+ "loss": 0.2196,
1185
+ "step": 150
1186
+ },
1187
+ {
1188
+ "epoch": 0.1857318573185732,
1189
+ "grad_norm": 4.317416667938232,
1190
+ "learning_rate": 1.4803921568627453e-05,
1191
+ "loss": 0.2283,
1192
+ "step": 151
1193
+ },
1194
+ {
1195
+ "epoch": 0.18696186961869618,
1196
+ "grad_norm": 4.395689010620117,
1197
+ "learning_rate": 1.4901960784313726e-05,
1198
+ "loss": 0.2105,
1199
+ "step": 152
1200
+ },
1201
+ {
1202
+ "epoch": 0.1881918819188192,
1203
+ "grad_norm": 3.5757391452789307,
1204
+ "learning_rate": 1.5000000000000002e-05,
1205
+ "loss": 0.1534,
1206
+ "step": 153
1207
+ },
1208
+ {
1209
+ "epoch": 0.18942189421894218,
1210
+ "grad_norm": 3.860861301422119,
1211
+ "learning_rate": 1.5098039215686276e-05,
1212
+ "loss": 0.1954,
1213
+ "step": 154
1214
+ },
1215
+ {
1216
+ "epoch": 0.1906519065190652,
1217
+ "grad_norm": 3.4191622734069824,
1218
+ "learning_rate": 1.519607843137255e-05,
1219
+ "loss": 0.1332,
1220
+ "step": 155
1221
+ },
1222
+ {
1223
+ "epoch": 0.1918819188191882,
1224
+ "grad_norm": 3.8505654335021973,
1225
+ "learning_rate": 1.5294117647058822e-05,
1226
+ "loss": 0.19,
1227
+ "step": 156
1228
+ },
1229
+ {
1230
+ "epoch": 0.1931119311193112,
1231
+ "grad_norm": 4.127209663391113,
1232
+ "learning_rate": 1.53921568627451e-05,
1233
+ "loss": 0.1878,
1234
+ "step": 157
1235
+ },
1236
+ {
1237
+ "epoch": 0.1943419434194342,
1238
+ "grad_norm": 3.7976646423339844,
1239
+ "learning_rate": 1.5490196078431373e-05,
1240
+ "loss": 0.1518,
1241
+ "step": 158
1242
+ },
1243
+ {
1244
+ "epoch": 0.19557195571955718,
1245
+ "grad_norm": 4.613111972808838,
1246
+ "learning_rate": 1.558823529411765e-05,
1247
+ "loss": 0.1906,
1248
+ "step": 159
1249
+ },
1250
+ {
1251
+ "epoch": 0.1968019680196802,
1252
+ "grad_norm": 3.911393880844116,
1253
+ "learning_rate": 1.568627450980392e-05,
1254
+ "loss": 0.155,
1255
+ "step": 160
1256
+ },
1257
+ {
1258
+ "epoch": 0.1980319803198032,
1259
+ "grad_norm": 3.694939374923706,
1260
+ "learning_rate": 1.5784313725490197e-05,
1261
+ "loss": 0.1519,
1262
+ "step": 161
1263
+ },
1264
+ {
1265
+ "epoch": 0.1992619926199262,
1266
+ "grad_norm": 4.334694862365723,
1267
+ "learning_rate": 1.5882352941176473e-05,
1268
+ "loss": 0.1726,
1269
+ "step": 162
1270
+ },
1271
+ {
1272
+ "epoch": 0.2004920049200492,
1273
+ "grad_norm": 3.6630055904388428,
1274
+ "learning_rate": 1.5980392156862748e-05,
1275
+ "loss": 0.1618,
1276
+ "step": 163
1277
+ },
1278
+ {
1279
+ "epoch": 0.2017220172201722,
1280
+ "grad_norm": 4.7789130210876465,
1281
+ "learning_rate": 1.607843137254902e-05,
1282
+ "loss": 0.2767,
1283
+ "step": 164
1284
+ },
1285
+ {
1286
+ "epoch": 0.2029520295202952,
1287
+ "grad_norm": 4.171343803405762,
1288
+ "learning_rate": 1.6176470588235296e-05,
1289
+ "loss": 0.1996,
1290
+ "step": 165
1291
+ },
1292
+ {
1293
+ "epoch": 0.20418204182041821,
1294
+ "grad_norm": 4.386513710021973,
1295
+ "learning_rate": 1.627450980392157e-05,
1296
+ "loss": 0.1907,
1297
+ "step": 166
1298
+ },
1299
+ {
1300
+ "epoch": 0.2054120541205412,
1301
+ "grad_norm": 4.183532238006592,
1302
+ "learning_rate": 1.6372549019607844e-05,
1303
+ "loss": 0.1928,
1304
+ "step": 167
1305
+ },
1306
+ {
1307
+ "epoch": 0.2066420664206642,
1308
+ "grad_norm": 3.8950257301330566,
1309
+ "learning_rate": 1.647058823529412e-05,
1310
+ "loss": 0.1507,
1311
+ "step": 168
1312
+ },
1313
+ {
1314
+ "epoch": 0.2066420664206642,
1315
+ "eval_loss": 0.10821738839149475,
1316
+ "eval_runtime": 54.7389,
1317
+ "eval_samples_per_second": 27.512,
1318
+ "eval_steps_per_second": 0.219,
1319
+ "eval_sts-test_pearson_cosine": 0.8772991124680096,
1320
+ "eval_sts-test_pearson_dot": 0.861322579093208,
1321
+ "eval_sts-test_pearson_euclidean": 0.9072621517681675,
1322
+ "eval_sts-test_pearson_manhattan": 0.9086600802981594,
1323
+ "eval_sts-test_pearson_max": 0.9086600802981594,
1324
+ "eval_sts-test_spearman_cosine": 0.9044609865411055,
1325
+ "eval_sts-test_spearman_dot": 0.8661539962925903,
1326
+ "eval_sts-test_spearman_euclidean": 0.904084091417667,
1327
+ "eval_sts-test_spearman_manhattan": 0.9054917423447336,
1328
+ "eval_sts-test_spearman_max": 0.9054917423447336,
1329
+ "step": 168
1330
+ },
1331
+ {
1332
+ "epoch": 0.2078720787207872,
1333
+ "grad_norm": 4.002283096313477,
1334
+ "learning_rate": 1.6568627450980395e-05,
1335
+ "loss": 0.1637,
1336
+ "step": 169
1337
+ },
1338
+ {
1339
+ "epoch": 0.2091020910209102,
1340
+ "grad_norm": 4.142872333526611,
1341
+ "learning_rate": 1.6666666666666667e-05,
1342
+ "loss": 0.1687,
1343
+ "step": 170
1344
+ },
1345
+ {
1346
+ "epoch": 0.21033210332103322,
1347
+ "grad_norm": 4.345719337463379,
1348
+ "learning_rate": 1.6764705882352943e-05,
1349
+ "loss": 0.2181,
1350
+ "step": 171
1351
+ },
1352
+ {
1353
+ "epoch": 0.2115621156211562,
1354
+ "grad_norm": 3.7364888191223145,
1355
+ "learning_rate": 1.686274509803922e-05,
1356
+ "loss": 0.1496,
1357
+ "step": 172
1358
+ },
1359
+ {
1360
+ "epoch": 0.21279212792127922,
1361
+ "grad_norm": 4.202157974243164,
1362
+ "learning_rate": 1.696078431372549e-05,
1363
+ "loss": 0.1749,
1364
+ "step": 173
1365
+ },
1366
+ {
1367
+ "epoch": 0.2140221402214022,
1368
+ "grad_norm": 4.639451503753662,
1369
+ "learning_rate": 1.7058823529411767e-05,
1370
+ "loss": 0.2374,
1371
+ "step": 174
1372
+ },
1373
+ {
1374
+ "epoch": 0.21525215252152521,
1375
+ "grad_norm": 4.011781215667725,
1376
+ "learning_rate": 1.715686274509804e-05,
1377
+ "loss": 0.2122,
1378
+ "step": 175
1379
+ },
1380
+ {
1381
+ "epoch": 0.21648216482164823,
1382
+ "grad_norm": 4.113095760345459,
1383
+ "learning_rate": 1.7254901960784314e-05,
1384
+ "loss": 0.1617,
1385
+ "step": 176
1386
+ },
1387
+ {
1388
+ "epoch": 0.2177121771217712,
1389
+ "grad_norm": 4.0442681312561035,
1390
+ "learning_rate": 1.735294117647059e-05,
1391
+ "loss": 0.168,
1392
+ "step": 177
1393
+ },
1394
+ {
1395
+ "epoch": 0.21894218942189422,
1396
+ "grad_norm": 4.375425338745117,
1397
+ "learning_rate": 1.7450980392156866e-05,
1398
+ "loss": 0.263,
1399
+ "step": 178
1400
+ },
1401
+ {
1402
+ "epoch": 0.2201722017220172,
1403
+ "grad_norm": 3.2303390502929688,
1404
+ "learning_rate": 1.7549019607843138e-05,
1405
+ "loss": 0.1328,
1406
+ "step": 179
1407
+ },
1408
+ {
1409
+ "epoch": 0.22140221402214022,
1410
+ "grad_norm": 4.832092761993408,
1411
+ "learning_rate": 1.7647058823529414e-05,
1412
+ "loss": 0.3157,
1413
+ "step": 180
1414
+ },
1415
+ {
1416
+ "epoch": 0.22263222632226323,
1417
+ "grad_norm": 3.57254695892334,
1418
+ "learning_rate": 1.7745098039215686e-05,
1419
+ "loss": 0.2164,
1420
+ "step": 181
1421
+ },
1422
+ {
1423
+ "epoch": 0.22386223862238622,
1424
+ "grad_norm": 3.135535717010498,
1425
+ "learning_rate": 1.7843137254901965e-05,
1426
+ "loss": 0.1255,
1427
+ "step": 182
1428
+ },
1429
+ {
1430
+ "epoch": 0.22509225092250923,
1431
+ "grad_norm": 4.719324588775635,
1432
+ "learning_rate": 1.7941176470588237e-05,
1433
+ "loss": 0.2863,
1434
+ "step": 183
1435
+ },
1436
+ {
1437
+ "epoch": 0.22632226322263221,
1438
+ "grad_norm": 3.8961801528930664,
1439
+ "learning_rate": 1.8039215686274513e-05,
1440
+ "loss": 0.155,
1441
+ "step": 184
1442
+ },
1443
+ {
1444
+ "epoch": 0.22755227552275523,
1445
+ "grad_norm": 2.8389103412628174,
1446
+ "learning_rate": 1.8137254901960785e-05,
1447
+ "loss": 0.1271,
1448
+ "step": 185
1449
+ },
1450
+ {
1451
+ "epoch": 0.22878228782287824,
1452
+ "grad_norm": 4.103536128997803,
1453
+ "learning_rate": 1.823529411764706e-05,
1454
+ "loss": 0.216,
1455
+ "step": 186
1456
+ },
1457
+ {
1458
+ "epoch": 0.23001230012300122,
1459
+ "grad_norm": 4.006705284118652,
1460
+ "learning_rate": 1.8333333333333333e-05,
1461
+ "loss": 0.205,
1462
+ "step": 187
1463
+ },
1464
+ {
1465
+ "epoch": 0.23124231242312424,
1466
+ "grad_norm": 3.424255847930908,
1467
+ "learning_rate": 1.843137254901961e-05,
1468
+ "loss": 0.1575,
1469
+ "step": 188
1470
+ },
1471
+ {
1472
+ "epoch": 0.23247232472324722,
1473
+ "grad_norm": 4.568851947784424,
1474
+ "learning_rate": 1.8529411764705884e-05,
1475
+ "loss": 0.1939,
1476
+ "step": 189
1477
+ },
1478
+ {
1479
+ "epoch": 0.23247232472324722,
1480
+ "eval_loss": 0.1056687980890274,
1481
+ "eval_runtime": 54.7136,
1482
+ "eval_samples_per_second": 27.525,
1483
+ "eval_steps_per_second": 0.219,
1484
+ "eval_sts-test_pearson_cosine": 0.8789160692756717,
1485
+ "eval_sts-test_pearson_dot": 0.8639029174125306,
1486
+ "eval_sts-test_pearson_euclidean": 0.9084173029414142,
1487
+ "eval_sts-test_pearson_manhattan": 0.9093131544369648,
1488
+ "eval_sts-test_pearson_max": 0.9093131544369648,
1489
+ "eval_sts-test_spearman_cosine": 0.904571298400435,
1490
+ "eval_sts-test_spearman_dot": 0.8658778810098052,
1491
+ "eval_sts-test_spearman_euclidean": 0.9046812074984125,
1492
+ "eval_sts-test_spearman_manhattan": 0.9056302027474785,
1493
+ "eval_sts-test_spearman_max": 0.9056302027474785,
1494
+ "step": 189
1495
+ },
1496
+ {
1497
+ "epoch": 0.23370233702337023,
1498
+ "grad_norm": 3.5909903049468994,
1499
+ "learning_rate": 1.862745098039216e-05,
1500
+ "loss": 0.2209,
1501
+ "step": 190
1502
+ },
1503
+ {
1504
+ "epoch": 0.23493234932349324,
1505
+ "grad_norm": 3.443946361541748,
1506
+ "learning_rate": 1.8725490196078432e-05,
1507
+ "loss": 0.153,
1508
+ "step": 191
1509
+ },
1510
+ {
1511
+ "epoch": 0.23616236162361623,
1512
+ "grad_norm": 3.8604445457458496,
1513
+ "learning_rate": 1.8823529411764708e-05,
1514
+ "loss": 0.2187,
1515
+ "step": 192
1516
+ },
1517
+ {
1518
+ "epoch": 0.23739237392373924,
1519
+ "grad_norm": 3.5916690826416016,
1520
+ "learning_rate": 1.892156862745098e-05,
1521
+ "loss": 0.1593,
1522
+ "step": 193
1523
+ },
1524
+ {
1525
+ "epoch": 0.23862238622386223,
1526
+ "grad_norm": 3.8676974773406982,
1527
+ "learning_rate": 1.9019607843137255e-05,
1528
+ "loss": 0.173,
1529
+ "step": 194
1530
+ },
1531
+ {
1532
+ "epoch": 0.23985239852398524,
1533
+ "grad_norm": 4.338643550872803,
1534
+ "learning_rate": 1.911764705882353e-05,
1535
+ "loss": 0.2377,
1536
+ "step": 195
1537
+ },
1538
+ {
1539
+ "epoch": 0.24108241082410825,
1540
+ "grad_norm": 4.509932994842529,
1541
+ "learning_rate": 1.9215686274509807e-05,
1542
+ "loss": 0.2281,
1543
+ "step": 196
1544
+ },
1545
+ {
1546
+ "epoch": 0.24231242312423124,
1547
+ "grad_norm": 4.282917022705078,
1548
+ "learning_rate": 1.931372549019608e-05,
1549
+ "loss": 0.2651,
1550
+ "step": 197
1551
+ },
1552
+ {
1553
+ "epoch": 0.24354243542435425,
1554
+ "grad_norm": 3.1566977500915527,
1555
+ "learning_rate": 1.9411764705882355e-05,
1556
+ "loss": 0.118,
1557
+ "step": 198
1558
+ },
1559
+ {
1560
+ "epoch": 0.24477244772447723,
1561
+ "grad_norm": 4.118341445922852,
1562
+ "learning_rate": 1.950980392156863e-05,
1563
+ "loss": 0.1728,
1564
+ "step": 199
1565
+ },
1566
+ {
1567
+ "epoch": 0.24600246002460024,
1568
+ "grad_norm": 4.250949859619141,
1569
+ "learning_rate": 1.9607843137254903e-05,
1570
+ "loss": 0.2299,
1571
+ "step": 200
1572
+ },
1573
+ {
1574
+ "epoch": 0.24723247232472326,
1575
+ "grad_norm": 4.084754943847656,
1576
+ "learning_rate": 1.9705882352941178e-05,
1577
+ "loss": 0.2342,
1578
+ "step": 201
1579
+ },
1580
+ {
1581
+ "epoch": 0.24846248462484624,
1582
+ "grad_norm": 3.939434051513672,
1583
+ "learning_rate": 1.9803921568627454e-05,
1584
+ "loss": 0.2413,
1585
+ "step": 202
1586
+ },
1587
+ {
1588
+ "epoch": 0.24969249692496925,
1589
+ "grad_norm": 3.9612276554107666,
1590
+ "learning_rate": 1.9901960784313726e-05,
1591
+ "loss": 0.168,
1592
+ "step": 203
1593
+ },
1594
+ {
1595
+ "epoch": 0.25092250922509224,
1596
+ "grad_norm": 3.401622772216797,
1597
+ "learning_rate": 2e-05,
1598
+ "loss": 0.1474,
1599
+ "step": 204
1600
+ },
1601
+ {
1602
+ "epoch": 0.2521525215252153,
1603
+ "grad_norm": 3.2245850563049316,
1604
+ "learning_rate": 1.9998802517966852e-05,
1605
+ "loss": 0.1102,
1606
+ "step": 205
1607
+ },
1608
+ {
1609
+ "epoch": 0.25338253382533826,
1610
+ "grad_norm": 4.254729270935059,
1611
+ "learning_rate": 1.9995210358660037e-05,
1612
+ "loss": 0.2326,
1613
+ "step": 206
1614
+ },
1615
+ {
1616
+ "epoch": 0.25461254612546125,
1617
+ "grad_norm": 3.603159189224243,
1618
+ "learning_rate": 1.9989224382388813e-05,
1619
+ "loss": 0.1787,
1620
+ "step": 207
1621
+ },
1622
+ {
1623
+ "epoch": 0.25584255842558423,
1624
+ "grad_norm": 3.434582471847534,
1625
+ "learning_rate": 1.9980846022772978e-05,
1626
+ "loss": 0.1423,
1627
+ "step": 208
1628
+ },
1629
+ {
1630
+ "epoch": 0.2570725707257073,
1631
+ "grad_norm": 3.8560950756073,
1632
+ "learning_rate": 1.997007728639956e-05,
1633
+ "loss": 0.2069,
1634
+ "step": 209
1635
+ },
1636
+ {
1637
+ "epoch": 0.25830258302583026,
1638
+ "grad_norm": 3.4417314529418945,
1639
+ "learning_rate": 1.9956920752342226e-05,
1640
+ "loss": 0.136,
1641
+ "step": 210
1642
+ },
1643
+ {
1644
+ "epoch": 0.25830258302583026,
1645
+ "eval_loss": 0.10401736944913864,
1646
+ "eval_runtime": 54.8034,
1647
+ "eval_samples_per_second": 27.48,
1648
+ "eval_steps_per_second": 0.219,
1649
+ "eval_sts-test_pearson_cosine": 0.8792920980078169,
1650
+ "eval_sts-test_pearson_dot": 0.8612401830255582,
1651
+ "eval_sts-test_pearson_euclidean": 0.9094380100842928,
1652
+ "eval_sts-test_pearson_manhattan": 0.9095661408662257,
1653
+ "eval_sts-test_pearson_max": 0.9095661408662257,
1654
+ "eval_sts-test_spearman_cosine": 0.905583034917972,
1655
+ "eval_sts-test_spearman_dot": 0.8658094563311378,
1656
+ "eval_sts-test_spearman_euclidean": 0.906560626223067,
1657
+ "eval_sts-test_spearman_manhattan": 0.906644400584392,
1658
+ "eval_sts-test_spearman_max": 0.906644400584392,
1659
+ "step": 210
1660
+ },
1661
+ {
1662
+ "epoch": 0.25953259532595324,
1663
+ "grad_norm": 4.170388221740723,
1664
+ "learning_rate": 1.9941379571543597e-05,
1665
+ "loss": 0.2407,
1666
+ "step": 211
1667
+ },
1668
+ {
1669
+ "epoch": 0.2607626076260763,
1670
+ "grad_norm": 4.218827247619629,
1671
+ "learning_rate": 1.9923457466060637e-05,
1672
+ "loss": 0.212,
1673
+ "step": 212
1674
+ },
1675
+ {
1676
+ "epoch": 0.26199261992619927,
1677
+ "grad_norm": 3.6592209339141846,
1678
+ "learning_rate": 1.9903158728173206e-05,
1679
+ "loss": 0.1361,
1680
+ "step": 213
1681
+ },
1682
+ {
1683
+ "epoch": 0.26322263222632225,
1684
+ "grad_norm": 4.208631992340088,
1685
+ "learning_rate": 1.9880488219356086e-05,
1686
+ "loss": 0.2356,
1687
+ "step": 214
1688
+ },
1689
+ {
1690
+ "epoch": 0.2644526445264453,
1691
+ "grad_norm": 2.9232637882232666,
1692
+ "learning_rate": 1.9855451369114677e-05,
1693
+ "loss": 0.1059,
1694
+ "step": 215
1695
+ },
1696
+ {
1697
+ "epoch": 0.2656826568265683,
1698
+ "grad_norm": 4.299160480499268,
1699
+ "learning_rate": 1.9828054173684646e-05,
1700
+ "loss": 0.2501,
1701
+ "step": 216
1702
+ },
1703
+ {
1704
+ "epoch": 0.26691266912669126,
1705
+ "grad_norm": 4.013469219207764,
1706
+ "learning_rate": 1.9798303194595846e-05,
1707
+ "loss": 0.1817,
1708
+ "step": 217
1709
+ },
1710
+ {
1711
+ "epoch": 0.26814268142681424,
1712
+ "grad_norm": 3.691553831100464,
1713
+ "learning_rate": 1.976620555710087e-05,
1714
+ "loss": 0.2022,
1715
+ "step": 218
1716
+ },
1717
+ {
1718
+ "epoch": 0.2693726937269373,
1719
+ "grad_norm": 4.433103561401367,
1720
+ "learning_rate": 1.973176894846855e-05,
1721
+ "loss": 0.2235,
1722
+ "step": 219
1723
+ },
1724
+ {
1725
+ "epoch": 0.27060270602706027,
1726
+ "grad_norm": 4.862768173217773,
1727
+ "learning_rate": 1.9695001616142916e-05,
1728
+ "loss": 0.2437,
1729
+ "step": 220
1730
+ },
1731
+ {
1732
+ "epoch": 0.27183271832718325,
1733
+ "grad_norm": 3.9157614707946777,
1734
+ "learning_rate": 1.965591236576794e-05,
1735
+ "loss": 0.1859,
1736
+ "step": 221
1737
+ },
1738
+ {
1739
+ "epoch": 0.2730627306273063,
1740
+ "grad_norm": 4.705247402191162,
1741
+ "learning_rate": 1.9614510559078626e-05,
1742
+ "loss": 0.2167,
1743
+ "step": 222
1744
+ },
1745
+ {
1746
+ "epoch": 0.2742927429274293,
1747
+ "grad_norm": 3.890500068664551,
1748
+ "learning_rate": 1.95708061116589e-05,
1749
+ "loss": 0.1495,
1750
+ "step": 223
1751
+ },
1752
+ {
1753
+ "epoch": 0.27552275522755226,
1754
+ "grad_norm": 4.393867492675781,
1755
+ "learning_rate": 1.9524809490566878e-05,
1756
+ "loss": 0.2876,
1757
+ "step": 224
1758
+ },
1759
+ {
1760
+ "epoch": 0.2767527675276753,
1761
+ "grad_norm": 3.782416582107544,
1762
+ "learning_rate": 1.9476531711828027e-05,
1763
+ "loss": 0.1842,
1764
+ "step": 225
1765
+ },
1766
+ {
1767
+ "epoch": 0.2779827798277983,
1768
+ "grad_norm": 3.32236647605896,
1769
+ "learning_rate": 1.942598433779687e-05,
1770
+ "loss": 0.144,
1771
+ "step": 226
1772
+ },
1773
+ {
1774
+ "epoch": 0.27921279212792127,
1775
+ "grad_norm": 3.9284870624542236,
1776
+ "learning_rate": 1.9373179474387858e-05,
1777
+ "loss": 0.1571,
1778
+ "step": 227
1779
+ },
1780
+ {
1781
+ "epoch": 0.28044280442804426,
1782
+ "grad_norm": 3.847404956817627,
1783
+ "learning_rate": 1.9318129768176033e-05,
1784
+ "loss": 0.209,
1785
+ "step": 228
1786
+ },
1787
+ {
1788
+ "epoch": 0.2816728167281673,
1789
+ "grad_norm": 4.21238899230957,
1790
+ "learning_rate": 1.926084840336821e-05,
1791
+ "loss": 0.2075,
1792
+ "step": 229
1793
+ },
1794
+ {
1795
+ "epoch": 0.2829028290282903,
1796
+ "grad_norm": 4.167908191680908,
1797
+ "learning_rate": 1.9201349098645433e-05,
1798
+ "loss": 0.1722,
1799
+ "step": 230
1800
+ },
1801
+ {
1802
+ "epoch": 0.28413284132841327,
1803
+ "grad_norm": 3.7701351642608643,
1804
+ "learning_rate": 1.9139646103877378e-05,
1805
+ "loss": 0.1464,
1806
+ "step": 231
1807
+ },
1808
+ {
1809
+ "epoch": 0.28413284132841327,
1810
+ "eval_loss": 0.10392418503761292,
1811
+ "eval_runtime": 54.7341,
1812
+ "eval_samples_per_second": 27.515,
1813
+ "eval_steps_per_second": 0.219,
1814
+ "eval_sts-test_pearson_cosine": 0.8822954324429473,
1815
+ "eval_sts-test_pearson_dot": 0.8642863367586305,
1816
+ "eval_sts-test_pearson_euclidean": 0.9122889538029727,
1817
+ "eval_sts-test_pearson_manhattan": 0.912099304650421,
1818
+ "eval_sts-test_pearson_max": 0.9122889538029727,
1819
+ "eval_sts-test_spearman_cosine": 0.9087387596175093,
1820
+ "eval_sts-test_spearman_dot": 0.8704923178256567,
1821
+ "eval_sts-test_spearman_euclidean": 0.9097842833373965,
1822
+ "eval_sts-test_spearman_manhattan": 0.9095372563745162,
1823
+ "eval_sts-test_spearman_max": 0.9097842833373965,
1824
+ "step": 231
1825
+ },
1826
+ {
1827
+ "epoch": 0.2853628536285363,
1828
+ "grad_norm": 4.548703193664551,
1829
+ "learning_rate": 1.9075754196709574e-05,
1830
+ "loss": 0.2675,
1831
+ "step": 232
1832
+ },
1833
+ {
1834
+ "epoch": 0.2865928659286593,
1835
+ "grad_norm": 5.041469573974609,
1836
+ "learning_rate": 1.900968867902419e-05,
1837
+ "loss": 0.2585,
1838
+ "step": 233
1839
+ },
1840
+ {
1841
+ "epoch": 0.2878228782287823,
1842
+ "grad_norm": 3.0036237239837646,
1843
+ "learning_rate": 1.894146537327533e-05,
1844
+ "loss": 0.134,
1845
+ "step": 234
1846
+ },
1847
+ {
1848
+ "epoch": 0.2890528905289053,
1849
+ "grad_norm": 3.6082603931427,
1850
+ "learning_rate": 1.8871100618699553e-05,
1851
+ "loss": 0.1765,
1852
+ "step": 235
1853
+ },
1854
+ {
1855
+ "epoch": 0.2902829028290283,
1856
+ "grad_norm": 3.8336241245269775,
1857
+ "learning_rate": 1.8798611267402745e-05,
1858
+ "loss": 0.1826,
1859
+ "step": 236
1860
+ },
1861
+ {
1862
+ "epoch": 0.2915129151291513,
1863
+ "grad_norm": 4.307932376861572,
1864
+ "learning_rate": 1.872401468032406e-05,
1865
+ "loss": 0.222,
1866
+ "step": 237
1867
+ },
1868
+ {
1869
+ "epoch": 0.29274292742927427,
1870
+ "grad_norm": 3.153963088989258,
1871
+ "learning_rate": 1.864732872307804e-05,
1872
+ "loss": 0.134,
1873
+ "step": 238
1874
+ },
1875
+ {
1876
+ "epoch": 0.2939729397293973,
1877
+ "grad_norm": 4.044833660125732,
1878
+ "learning_rate": 1.8568571761675893e-05,
1879
+ "loss": 0.1902,
1880
+ "step": 239
1881
+ },
1882
+ {
1883
+ "epoch": 0.2952029520295203,
1884
+ "grad_norm": 4.640310287475586,
1885
+ "learning_rate": 1.8487762658126872e-05,
1886
+ "loss": 0.2461,
1887
+ "step": 240
1888
+ },
1889
+ {
1890
+ "epoch": 0.2964329643296433,
1891
+ "grad_norm": 4.932340145111084,
1892
+ "learning_rate": 1.8404920765920898e-05,
1893
+ "loss": 0.3094,
1894
+ "step": 241
1895
+ },
1896
+ {
1897
+ "epoch": 0.2976629766297663,
1898
+ "grad_norm": 4.0233917236328125,
1899
+ "learning_rate": 1.8320065925393468e-05,
1900
+ "loss": 0.2252,
1901
+ "step": 242
1902
+ },
1903
+ {
1904
+ "epoch": 0.2988929889298893,
1905
+ "grad_norm": 4.369536399841309,
1906
+ "learning_rate": 1.8233218458973984e-05,
1907
+ "loss": 0.2466,
1908
+ "step": 243
1909
+ },
1910
+ {
1911
+ "epoch": 0.3001230012300123,
1912
+ "grad_norm": 3.6295106410980225,
1913
+ "learning_rate": 1.814439916631857e-05,
1914
+ "loss": 0.139,
1915
+ "step": 244
1916
+ },
1917
+ {
1918
+ "epoch": 0.3013530135301353,
1919
+ "grad_norm": 3.705105781555176,
1920
+ "learning_rate": 1.8053629319328662e-05,
1921
+ "loss": 0.154,
1922
+ "step": 245
1923
+ },
1924
+ {
1925
+ "epoch": 0.3025830258302583,
1926
+ "grad_norm": 3.7480130195617676,
1927
+ "learning_rate": 1.796093065705644e-05,
1928
+ "loss": 0.1979,
1929
+ "step": 246
1930
+ },
1931
+ {
1932
+ "epoch": 0.3038130381303813,
1933
+ "grad_norm": 2.5885541439056396,
1934
+ "learning_rate": 1.786632538049842e-05,
1935
+ "loss": 0.1121,
1936
+ "step": 247
1937
+ },
1938
+ {
1939
+ "epoch": 0.3050430504305043,
1940
+ "grad_norm": 3.3691048622131348,
1941
+ "learning_rate": 1.7769836147278385e-05,
1942
+ "loss": 0.1361,
1943
+ "step": 248
1944
+ },
1945
+ {
1946
+ "epoch": 0.3062730627306273,
1947
+ "grad_norm": 4.20883321762085,
1948
+ "learning_rate": 1.7671486066220965e-05,
1949
+ "loss": 0.2492,
1950
+ "step": 249
1951
+ },
1952
+ {
1953
+ "epoch": 0.3075030750307503,
1954
+ "grad_norm": 3.8119523525238037,
1955
+ "learning_rate": 1.757129869181718e-05,
1956
+ "loss": 0.1903,
1957
+ "step": 250
1958
+ },
1959
+ {
1960
+ "epoch": 0.3087330873308733,
1961
+ "grad_norm": 4.464923858642578,
1962
+ "learning_rate": 1.746929801858317e-05,
1963
+ "loss": 0.2333,
1964
+ "step": 251
1965
+ },
1966
+ {
1967
+ "epoch": 0.30996309963099633,
1968
+ "grad_norm": 4.029540061950684,
1969
+ "learning_rate": 1.736550847531366e-05,
1970
+ "loss": 0.1805,
1971
+ "step": 252
1972
+ },
1973
+ {
1974
+ "epoch": 0.30996309963099633,
1975
+ "eval_loss": 0.10298814624547958,
1976
+ "eval_runtime": 54.69,
1977
+ "eval_samples_per_second": 27.537,
1978
+ "eval_steps_per_second": 0.219,
1979
+ "eval_sts-test_pearson_cosine": 0.881635556166377,
1980
+ "eval_sts-test_pearson_dot": 0.862389303076424,
1981
+ "eval_sts-test_pearson_euclidean": 0.9125260227425505,
1982
+ "eval_sts-test_pearson_manhattan": 0.9128421094636647,
1983
+ "eval_sts-test_pearson_max": 0.9128421094636647,
1984
+ "eval_sts-test_spearman_cosine": 0.9098964747497047,
1985
+ "eval_sts-test_spearman_dot": 0.8698043119330254,
1986
+ "eval_sts-test_spearman_euclidean": 0.9103645729438589,
1987
+ "eval_sts-test_spearman_manhattan": 0.9110424650514156,
1988
+ "eval_sts-test_spearman_max": 0.9110424650514156,
1989
+ "step": 252
1990
+ },
1991
+ {
1992
+ "epoch": 0.3111931119311193,
1993
+ "grad_norm": 3.990410089492798,
1994
+ "learning_rate": 1.725995491923131e-05,
1995
+ "loss": 0.1929,
1996
+ "step": 253
1997
+ },
1998
+ {
1999
+ "epoch": 0.3124231242312423,
2000
+ "grad_norm": 3.7284581661224365,
2001
+ "learning_rate": 1.7152662630033506e-05,
2002
+ "loss": 0.1424,
2003
+ "step": 254
2004
+ },
2005
+ {
2006
+ "epoch": 0.31365313653136534,
2007
+ "grad_norm": 3.8791370391845703,
2008
+ "learning_rate": 1.7043657303837965e-05,
2009
+ "loss": 0.2318,
2010
+ "step": 255
2011
+ },
2012
+ {
2013
+ "epoch": 0.3148831488314883,
2014
+ "grad_norm": 3.4804205894470215,
2015
+ "learning_rate": 1.693296504702862e-05,
2016
+ "loss": 0.1524,
2017
+ "step": 256
2018
+ },
2019
+ {
2020
+ "epoch": 0.3161131611316113,
2021
+ "grad_norm": 3.573451519012451,
2022
+ "learning_rate": 1.682061237000322e-05,
2023
+ "loss": 0.2195,
2024
+ "step": 257
2025
+ },
2026
+ {
2027
+ "epoch": 0.3173431734317343,
2028
+ "grad_norm": 3.5766184329986572,
2029
+ "learning_rate": 1.6706626180824185e-05,
2030
+ "loss": 0.1338,
2031
+ "step": 258
2032
+ },
2033
+ {
2034
+ "epoch": 0.31857318573185733,
2035
+ "grad_norm": 4.488210201263428,
2036
+ "learning_rate": 1.659103377877423e-05,
2037
+ "loss": 0.2543,
2038
+ "step": 259
2039
+ },
2040
+ {
2041
+ "epoch": 0.3198031980319803,
2042
+ "grad_norm": 4.0144147872924805,
2043
+ "learning_rate": 1.647386284781828e-05,
2044
+ "loss": 0.202,
2045
+ "step": 260
2046
+ },
2047
+ {
2048
+ "epoch": 0.3210332103321033,
2049
+ "grad_norm": 3.4031426906585693,
2050
+ "learning_rate": 1.6355141449973254e-05,
2051
+ "loss": 0.1489,
2052
+ "step": 261
2053
+ },
2054
+ {
2055
+ "epoch": 0.32226322263222634,
2056
+ "grad_norm": 3.8359596729278564,
2057
+ "learning_rate": 1.6234898018587336e-05,
2058
+ "loss": 0.1937,
2059
+ "step": 262
2060
+ },
2061
+ {
2062
+ "epoch": 0.3234932349323493,
2063
+ "grad_norm": 4.457846641540527,
2064
+ "learning_rate": 1.6113161351530257e-05,
2065
+ "loss": 0.2334,
2066
+ "step": 263
2067
+ },
2068
+ {
2069
+ "epoch": 0.3247232472324723,
2070
+ "grad_norm": 4.167722702026367,
2071
+ "learning_rate": 1.598996060429634e-05,
2072
+ "loss": 0.1942,
2073
+ "step": 264
2074
+ },
2075
+ {
2076
+ "epoch": 0.32595325953259535,
2077
+ "grad_norm": 4.352579116821289,
2078
+ "learning_rate": 1.586532528302183e-05,
2079
+ "loss": 0.2013,
2080
+ "step": 265
2081
+ },
2082
+ {
2083
+ "epoch": 0.32718327183271834,
2084
+ "grad_norm": 5.293665409088135,
2085
+ "learning_rate": 1.5739285237418323e-05,
2086
+ "loss": 0.2954,
2087
+ "step": 266
2088
+ },
2089
+ {
2090
+ "epoch": 0.3284132841328413,
2091
+ "grad_norm": 3.7269585132598877,
2092
+ "learning_rate": 1.5611870653623826e-05,
2093
+ "loss": 0.188,
2094
+ "step": 267
2095
+ },
2096
+ {
2097
+ "epoch": 0.3296432964329643,
2098
+ "grad_norm": 3.8485231399536133,
2099
+ "learning_rate": 1.548311204697331e-05,
2100
+ "loss": 0.1688,
2101
+ "step": 268
2102
+ },
2103
+ {
2104
+ "epoch": 0.33087330873308735,
2105
+ "grad_norm": 3.183656692504883,
2106
+ "learning_rate": 1.5353040254690396e-05,
2107
+ "loss": 0.1415,
2108
+ "step": 269
2109
+ },
2110
+ {
2111
+ "epoch": 0.33210332103321033,
2112
+ "grad_norm": 4.292448997497559,
2113
+ "learning_rate": 1.5221686428501929e-05,
2114
+ "loss": 0.2249,
2115
+ "step": 270
2116
+ },
2117
+ {
2118
+ "epoch": 0.3333333333333333,
2119
+ "grad_norm": 4.776716232299805,
2120
+ "learning_rate": 1.508908202717729e-05,
2121
+ "loss": 0.2606,
2122
+ "step": 271
2123
+ },
2124
+ {
2125
+ "epoch": 0.33456334563345635,
2126
+ "grad_norm": 4.753313064575195,
2127
+ "learning_rate": 1.4955258808994093e-05,
2128
+ "loss": 0.2559,
2129
+ "step": 272
2130
+ },
2131
+ {
2132
+ "epoch": 0.33579335793357934,
2133
+ "grad_norm": 4.271998882293701,
2134
+ "learning_rate": 1.482024882413222e-05,
2135
+ "loss": 0.2673,
2136
+ "step": 273
2137
+ },
2138
+ {
2139
+ "epoch": 0.33579335793357934,
2140
+ "eval_loss": 0.10389706492424011,
2141
+ "eval_runtime": 54.7165,
2142
+ "eval_samples_per_second": 27.524,
2143
+ "eval_steps_per_second": 0.219,
2144
+ "eval_sts-test_pearson_cosine": 0.8802537579023317,
2145
+ "eval_sts-test_pearson_dot": 0.8611279643446735,
2146
+ "eval_sts-test_pearson_euclidean": 0.9104828974356078,
2147
+ "eval_sts-test_pearson_manhattan": 0.9112003024801107,
2148
+ "eval_sts-test_pearson_max": 0.9112003024801107,
2149
+ "eval_sts-test_spearman_cosine": 0.9077587069930375,
2150
+ "eval_sts-test_spearman_dot": 0.8683557106354935,
2151
+ "eval_sts-test_spearman_euclidean": 0.908703271867226,
2152
+ "eval_sts-test_spearman_manhattan": 0.9088831719380195,
2153
+ "eval_sts-test_spearman_max": 0.9088831719380195,
2154
+ "step": 273
2155
+ },
2156
+ {
2157
+ "epoch": 0.3370233702337023,
2158
+ "grad_norm": 3.5790419578552246,
2159
+ "learning_rate": 1.4684084406997903e-05,
2160
+ "loss": 0.1618,
2161
+ "step": 274
2162
+ },
2163
+ {
2164
+ "epoch": 0.33825338253382536,
2165
+ "grad_norm": 4.819033145904541,
2166
+ "learning_rate": 1.4546798168479756e-05,
2167
+ "loss": 0.2602,
2168
+ "step": 275
2169
+ },
2170
+ {
2171
+ "epoch": 0.33948339483394835,
2172
+ "grad_norm": 4.567826747894287,
2173
+ "learning_rate": 1.4408422988138585e-05,
2174
+ "loss": 0.2339,
2175
+ "step": 276
2176
+ },
2177
+ {
2178
+ "epoch": 0.34071340713407133,
2179
+ "grad_norm": 4.182609558105469,
2180
+ "learning_rate": 1.4268992006332847e-05,
2181
+ "loss": 0.1843,
2182
+ "step": 277
2183
+ },
2184
+ {
2185
+ "epoch": 0.3419434194341943,
2186
+ "grad_norm": 3.4522156715393066,
2187
+ "learning_rate": 1.412853861628166e-05,
2188
+ "loss": 0.133,
2189
+ "step": 278
2190
+ },
2191
+ {
2192
+ "epoch": 0.34317343173431736,
2193
+ "grad_norm": 4.6532301902771,
2194
+ "learning_rate": 1.3987096456067236e-05,
2195
+ "loss": 0.2345,
2196
+ "step": 279
2197
+ },
2198
+ {
2199
+ "epoch": 0.34440344403444034,
2200
+ "grad_norm": 4.240933418273926,
2201
+ "learning_rate": 1.3844699400578696e-05,
2202
+ "loss": 0.2808,
2203
+ "step": 280
2204
+ },
2205
+ {
2206
+ "epoch": 0.3456334563345633,
2207
+ "grad_norm": 3.000117063522339,
2208
+ "learning_rate": 1.3701381553399144e-05,
2209
+ "loss": 0.1044,
2210
+ "step": 281
2211
+ },
2212
+ {
2213
+ "epoch": 0.34686346863468637,
2214
+ "grad_norm": 3.7988216876983643,
2215
+ "learning_rate": 1.3557177238637987e-05,
2216
+ "loss": 0.1622,
2217
+ "step": 282
2218
+ },
2219
+ {
2220
+ "epoch": 0.34809348093480935,
2221
+ "grad_norm": 3.2597107887268066,
2222
+ "learning_rate": 1.3412120992710425e-05,
2223
+ "loss": 0.1303,
2224
+ "step": 283
2225
+ },
2226
+ {
2227
+ "epoch": 0.34932349323493234,
2228
+ "grad_norm": 3.2426445484161377,
2229
+ "learning_rate": 1.3266247556066122e-05,
2230
+ "loss": 0.1453,
2231
+ "step": 284
2232
+ },
2233
+ {
2234
+ "epoch": 0.3505535055350554,
2235
+ "grad_norm": 4.482458114624023,
2236
+ "learning_rate": 1.3119591864868979e-05,
2237
+ "loss": 0.237,
2238
+ "step": 285
2239
+ },
2240
+ {
2241
+ "epoch": 0.35178351783517836,
2242
+ "grad_norm": 4.062747478485107,
2243
+ "learning_rate": 1.2972189042630044e-05,
2244
+ "loss": 0.1726,
2245
+ "step": 286
2246
+ },
2247
+ {
2248
+ "epoch": 0.35301353013530135,
2249
+ "grad_norm": 3.9885880947113037,
2250
+ "learning_rate": 1.2824074391795571e-05,
2251
+ "loss": 0.2195,
2252
+ "step": 287
2253
+ },
2254
+ {
2255
+ "epoch": 0.35424354243542433,
2256
+ "grad_norm": 5.205960750579834,
2257
+ "learning_rate": 1.2675283385292212e-05,
2258
+ "loss": 0.3016,
2259
+ "step": 288
2260
+ },
2261
+ {
2262
+ "epoch": 0.35547355473554737,
2263
+ "grad_norm": 3.2820823192596436,
2264
+ "learning_rate": 1.252585165803135e-05,
2265
+ "loss": 0.1626,
2266
+ "step": 289
2267
+ },
2268
+ {
2269
+ "epoch": 0.35670356703567035,
2270
+ "grad_norm": 4.133265495300293,
2271
+ "learning_rate": 1.2375814998374714e-05,
2272
+ "loss": 0.1902,
2273
+ "step": 290
2274
+ },
2275
+ {
2276
+ "epoch": 0.35793357933579334,
2277
+ "grad_norm": 3.349637746810913,
2278
+ "learning_rate": 1.2225209339563144e-05,
2279
+ "loss": 0.1387,
2280
+ "step": 291
2281
+ },
2282
+ {
2283
+ "epoch": 0.3591635916359164,
2284
+ "grad_norm": 2.7458724975585938,
2285
+ "learning_rate": 1.2074070751110753e-05,
2286
+ "loss": 0.1047,
2287
+ "step": 292
2288
+ },
2289
+ {
2290
+ "epoch": 0.36039360393603936,
2291
+ "grad_norm": 3.7697091102600098,
2292
+ "learning_rate": 1.1922435430166372e-05,
2293
+ "loss": 0.1954,
2294
+ "step": 293
2295
+ },
2296
+ {
2297
+ "epoch": 0.36162361623616235,
2298
+ "grad_norm": 4.1529622077941895,
2299
+ "learning_rate": 1.1770339692844484e-05,
2300
+ "loss": 0.2089,
2301
+ "step": 294
2302
+ },
2303
+ {
2304
+ "epoch": 0.36162361623616235,
2305
+ "eval_loss": 0.10294178128242493,
2306
+ "eval_runtime": 54.5996,
2307
+ "eval_samples_per_second": 27.583,
2308
+ "eval_steps_per_second": 0.22,
2309
+ "eval_sts-test_pearson_cosine": 0.8803510241742277,
2310
+ "eval_sts-test_pearson_dot": 0.8614562911474479,
2311
+ "eval_sts-test_pearson_euclidean": 0.9105274765595701,
2312
+ "eval_sts-test_pearson_manhattan": 0.9112776803683604,
2313
+ "eval_sts-test_pearson_max": 0.9112776803683604,
2314
+ "eval_sts-test_spearman_cosine": 0.9082726304788564,
2315
+ "eval_sts-test_spearman_dot": 0.8687116203836315,
2316
+ "eval_sts-test_spearman_euclidean": 0.9088194013905367,
2317
+ "eval_sts-test_spearman_manhattan": 0.9091452800759889,
2318
+ "eval_sts-test_spearman_max": 0.9091452800759889,
2319
+ "step": 294
2320
+ },
2321
+ {
2322
+ "epoch": 0.3628536285362854,
2323
+ "grad_norm": 3.3324942588806152,
2324
+ "learning_rate": 1.161781996552765e-05,
2325
+ "loss": 0.1485,
2326
+ "step": 295
2327
+ },
2328
+ {
2329
+ "epoch": 0.3640836408364084,
2330
+ "grad_norm": 3.477254867553711,
2331
+ "learning_rate": 1.1464912776142494e-05,
2332
+ "loss": 0.1724,
2333
+ "step": 296
2334
+ },
2335
+ {
2336
+ "epoch": 0.36531365313653136,
2337
+ "grad_norm": 3.933436393737793,
2338
+ "learning_rate": 1.1311654745411424e-05,
2339
+ "loss": 0.2017,
2340
+ "step": 297
2341
+ },
2342
+ {
2343
+ "epoch": 0.36654366543665434,
2344
+ "grad_norm": 3.6212170124053955,
2345
+ "learning_rate": 1.115808257808209e-05,
2346
+ "loss": 0.1591,
2347
+ "step": 298
2348
+ },
2349
+ {
2350
+ "epoch": 0.3677736777367774,
2351
+ "grad_norm": 4.0765700340271,
2352
+ "learning_rate": 1.1004233054136726e-05,
2353
+ "loss": 0.2396,
2354
+ "step": 299
2355
+ },
2356
+ {
2357
+ "epoch": 0.36900369003690037,
2358
+ "grad_norm": 3.589646816253662,
2359
+ "learning_rate": 1.0850143019983475e-05,
2360
+ "loss": 0.1395,
2361
+ "step": 300
2362
+ },
2363
+ {
2364
+ "epoch": 0.37023370233702335,
2365
+ "grad_norm": 3.7769243717193604,
2366
+ "learning_rate": 1.0695849379631816e-05,
2367
+ "loss": 0.1806,
2368
+ "step": 301
2369
+ },
2370
+ {
2371
+ "epoch": 0.3714637146371464,
2372
+ "grad_norm": 3.4720847606658936,
2373
+ "learning_rate": 1.0541389085854177e-05,
2374
+ "loss": 0.1882,
2375
+ "step": 302
2376
+ },
2377
+ {
2378
+ "epoch": 0.3726937269372694,
2379
+ "grad_norm": 2.9006810188293457,
2380
+ "learning_rate": 1.038679913133589e-05,
2381
+ "loss": 0.1188,
2382
+ "step": 303
2383
+ },
2384
+ {
2385
+ "epoch": 0.37392373923739236,
2386
+ "grad_norm": 3.7660434246063232,
2387
+ "learning_rate": 1.023211653981556e-05,
2388
+ "loss": 0.1564,
2389
+ "step": 304
2390
+ },
2391
+ {
2392
+ "epoch": 0.3751537515375154,
2393
+ "grad_norm": 5.082170486450195,
2394
+ "learning_rate": 1.0077378357218023e-05,
2395
+ "loss": 0.313,
2396
+ "step": 305
2397
+ },
2398
+ {
2399
+ "epoch": 0.3763837638376384,
2400
+ "grad_norm": 3.5429434776306152,
2401
+ "learning_rate": 9.922621642781982e-06,
2402
+ "loss": 0.1455,
2403
+ "step": 306
2404
+ },
2405
+ {
2406
+ "epoch": 0.37761377613776137,
2407
+ "grad_norm": 3.1348257064819336,
2408
+ "learning_rate": 9.767883460184447e-06,
2409
+ "loss": 0.1535,
2410
+ "step": 307
2411
+ },
2412
+ {
2413
+ "epoch": 0.37884378843788435,
2414
+ "grad_norm": 2.881880521774292,
2415
+ "learning_rate": 9.613200868664112e-06,
2416
+ "loss": 0.099,
2417
+ "step": 308
2418
+ },
2419
+ {
2420
+ "epoch": 0.3800738007380074,
2421
+ "grad_norm": 3.5104594230651855,
2422
+ "learning_rate": 9.458610914145826e-06,
2423
+ "loss": 0.1733,
2424
+ "step": 309
2425
+ },
2426
+ {
2427
+ "epoch": 0.3813038130381304,
2428
+ "grad_norm": 3.9202194213867188,
2429
+ "learning_rate": 9.304150620368189e-06,
2430
+ "loss": 0.1891,
2431
+ "step": 310
2432
+ },
2433
+ {
2434
+ "epoch": 0.38253382533825336,
2435
+ "grad_norm": 3.655240297317505,
2436
+ "learning_rate": 9.149856980016529e-06,
2437
+ "loss": 0.2128,
2438
+ "step": 311
2439
+ },
2440
+ {
2441
+ "epoch": 0.3837638376383764,
2442
+ "grad_norm": 3.766303062438965,
2443
+ "learning_rate": 8.995766945863278e-06,
2444
+ "loss": 0.2042,
2445
+ "step": 312
2446
+ },
2447
+ {
2448
+ "epoch": 0.3849938499384994,
2449
+ "grad_norm": 3.8122780323028564,
2450
+ "learning_rate": 8.841917421917913e-06,
2451
+ "loss": 0.203,
2452
+ "step": 313
2453
+ },
2454
+ {
2455
+ "epoch": 0.3862238622386224,
2456
+ "grad_norm": 4.085626602172852,
2457
+ "learning_rate": 8.688345254588579e-06,
2458
+ "loss": 0.2249,
2459
+ "step": 314
2460
+ },
2461
+ {
2462
+ "epoch": 0.3874538745387454,
2463
+ "grad_norm": 3.744234323501587,
2464
+ "learning_rate": 8.53508722385751e-06,
2465
+ "loss": 0.1597,
2466
+ "step": 315
2467
+ },
2468
+ {
2469
+ "epoch": 0.3874538745387454,
2470
+ "eval_loss": 0.10140395164489746,
2471
+ "eval_runtime": 54.6293,
2472
+ "eval_samples_per_second": 27.568,
2473
+ "eval_steps_per_second": 0.22,
2474
+ "eval_sts-test_pearson_cosine": 0.8788980244871143,
2475
+ "eval_sts-test_pearson_dot": 0.8616979928109831,
2476
+ "eval_sts-test_pearson_euclidean": 0.9090675882181183,
2477
+ "eval_sts-test_pearson_manhattan": 0.909351159725011,
2478
+ "eval_sts-test_pearson_max": 0.909351159725011,
2479
+ "eval_sts-test_spearman_cosine": 0.9074493862743003,
2480
+ "eval_sts-test_spearman_dot": 0.8701774479505067,
2481
+ "eval_sts-test_spearman_euclidean": 0.9075559837789346,
2482
+ "eval_sts-test_spearman_manhattan": 0.9076191725600193,
2483
+ "eval_sts-test_spearman_max": 0.9076191725600193,
2484
+ "step": 315
2485
+ },
2486
+ {
2487
+ "epoch": 0.3886838868388684,
2488
+ "grad_norm": 3.4346373081207275,
2489
+ "learning_rate": 8.382180034472353e-06,
2490
+ "loss": 0.1358,
2491
+ "step": 316
2492
+ },
2493
+ {
2494
+ "epoch": 0.3899138991389914,
2495
+ "grad_norm": 3.872002124786377,
2496
+ "learning_rate": 8.229660307155518e-06,
2497
+ "loss": 0.207,
2498
+ "step": 317
2499
+ },
2500
+ {
2501
+ "epoch": 0.39114391143911437,
2502
+ "grad_norm": 3.4921915531158447,
2503
+ "learning_rate": 8.077564569833633e-06,
2504
+ "loss": 0.193,
2505
+ "step": 318
2506
+ },
2507
+ {
2508
+ "epoch": 0.3923739237392374,
2509
+ "grad_norm": 3.2774128913879395,
2510
+ "learning_rate": 7.92592924888925e-06,
2511
+ "loss": 0.1141,
2512
+ "step": 319
2513
+ },
2514
+ {
2515
+ "epoch": 0.3936039360393604,
2516
+ "grad_norm": 4.900540351867676,
2517
+ "learning_rate": 7.774790660436857e-06,
2518
+ "loss": 0.2835,
2519
+ "step": 320
2520
+ },
2521
+ {
2522
+ "epoch": 0.3948339483394834,
2523
+ "grad_norm": 4.073228359222412,
2524
+ "learning_rate": 7.6241850016252915e-06,
2525
+ "loss": 0.2589,
2526
+ "step": 321
2527
+ },
2528
+ {
2529
+ "epoch": 0.3960639606396064,
2530
+ "grad_norm": 2.549339532852173,
2531
+ "learning_rate": 7.4741483419686475e-06,
2532
+ "loss": 0.088,
2533
+ "step": 322
2534
+ },
2535
+ {
2536
+ "epoch": 0.3972939729397294,
2537
+ "grad_norm": 4.411525726318359,
2538
+ "learning_rate": 7.324716614707792e-06,
2539
+ "loss": 0.1675,
2540
+ "step": 323
2541
+ },
2542
+ {
2543
+ "epoch": 0.3985239852398524,
2544
+ "grad_norm": 3.336052656173706,
2545
+ "learning_rate": 7.175925608204428e-06,
2546
+ "loss": 0.1525,
2547
+ "step": 324
2548
+ },
2549
+ {
2550
+ "epoch": 0.3997539975399754,
2551
+ "grad_norm": 3.2689311504364014,
2552
+ "learning_rate": 7.0278109573699574e-06,
2553
+ "loss": 0.1401,
2554
+ "step": 325
2555
+ },
2556
+ {
2557
+ "epoch": 0.4009840098400984,
2558
+ "grad_norm": 3.8623855113983154,
2559
+ "learning_rate": 6.880408135131022e-06,
2560
+ "loss": 0.2109,
2561
+ "step": 326
2562
+ },
2563
+ {
2564
+ "epoch": 0.4022140221402214,
2565
+ "grad_norm": 3.1453464031219482,
2566
+ "learning_rate": 6.733752443933879e-06,
2567
+ "loss": 0.1382,
2568
+ "step": 327
2569
+ },
2570
+ {
2571
+ "epoch": 0.4034440344403444,
2572
+ "grad_norm": 3.09243106842041,
2573
+ "learning_rate": 6.587879007289576e-06,
2574
+ "loss": 0.1724,
2575
+ "step": 328
2576
+ }
2577
+ ],
2578
+ "logging_steps": 1,
2579
+ "max_steps": 813,
2580
+ "num_input_tokens_seen": 0,
2581
+ "num_train_epochs": 1,
2582
+ "save_steps": 82,
2583
+ "stateful_callbacks": {
2584
+ "TrainerControl": {
2585
+ "args": {
2586
+ "should_epoch_stop": false,
2587
+ "should_evaluate": false,
2588
+ "should_log": false,
2589
+ "should_save": true,
2590
+ "should_training_stop": false
2591
+ },
2592
+ "attributes": {}
2593
+ }
2594
+ },
2595
+ "total_flos": 0.0,
2596
+ "train_batch_size": 320,
2597
+ "trial_name": null,
2598
+ "trial_params": null
2599
+ }
checkpoint-328/training_args.bin ADDED
@@ -0,0 +1,3 @@
 
 
 
1
+ version https://git-lfs.github.com/spec/v1
2
+ oid sha256:cd608c44b8e7ee1bfc13e8632dd3ba8e3075b88a8d04aa48d7120c1f23659d10
3
+ size 5752