bobox committed
Commit 5c2007e · verified · 1 Parent(s): 7c67c11

Training in progress, epoch 1, checkpoint
last-checkpoint/README.md CHANGED
@@ -7,9 +7,9 @@ tags:
7
  - sentence-similarity
8
  - feature-extraction
9
  - generated_from_trainer
10
- - dataset_size:96781
11
  - loss:MultipleNegativesRankingLoss
12
- - loss:AnglELoss
13
  - loss:GISTEmbedLoss
14
  - loss:OnlineContrastiveLoss
15
  - loss:MultipleNegativesSymmetricRankingLoss
@@ -23,48 +23,43 @@ datasets:
23
  - sentence-transformers/xsum
24
  - sentence-transformers/sentence-compression
25
  widget:
26
- - source_sentence: What dual titles did Frederick William hold?
 
27
  sentences:
28
- - The impact was increased by chronic overfishing, and by eutrophication that gave
29
- the entire ecosystem a short-term boost, causing the Mnemiopsis population to
30
- increase even faster than normal and above all by the absence of efficient predators
31
- on these introduced ctenophores.
32
- - The "European Council" (rather than the Council, made up of different government
33
- Ministers) is composed of the Prime Ministers or executive Presidents of the member
34
- states.
35
- - Nearly 50,000 Huguenots established themselves in Germany, 20,000 of whom were
36
- welcomed in Brandenburg-Prussia, where they were granted special privileges (Edict
37
- of Potsdam) and churches in which to worship (such as the Church of St. Peter
38
- and St. Paul, Angermünde) by Frederick William, Elector of Brandenburg and Duke
39
- of Prussia.
40
- - source_sentence: the Great Internet Mersenne Prime Search, what was the prize for
41
- finding a prime with at least 10 million digits?
42
  sentences:
43
- - Since September 2004, the official home of the Scottish Parliament has been a
44
- new Scottish Parliament Building, in the Holyrood area of Edinburgh.
45
- - The roughly half-mile stretch of Kearney Boulevard between Fresno Street and Thorne
46
- Ave was at one time the preferred neighborhood for Fresno's elite African-American
47
- families.
48
- - In 2009, the Great Internet Mersenne Prime Search project was awarded a US$100,000
49
- prize for first discovering a prime with at least 10 million digits.
50
- - source_sentence: A woman is tugging on a white sheet and laughing
 
51
  sentences:
52
- - Fruit characters decorate this child's bib
53
- - The person is amused.
54
- - Farmers preparing to feed their animals.
55
- - source_sentence: A hispanic fruit market with many different fruits and vegetables
56
- in view on a city street with a man passing the store dressed in dark pants and
57
- a hoodie.
58
  sentences:
59
- - A fruit market and a man
60
- - The guys have guns.
61
- - Two people are playing an instruments on stage.
62
- - source_sentence: Two Asian children, a boy and a girl, the girl looking squarely
63
- at the camera and the boy making a face.
64
  sentences:
65
- - A woman is outside.
66
- - there are children near the camera
67
- - A boy is playing on an inflatable ride.
68
  pipeline_tag: sentence-similarity
69
  ---
70
 
@@ -125,9 +120,9 @@ from sentence_transformers import SentenceTransformer
125
  model = SentenceTransformer("bobox/DeBERTaV3-small-GeneralSentenceTransformer-checkpoints-tmp")
126
  # Run inference
127
  sentences = [
128
- 'Two Asian children, a boy and a girl, the girl looking squarely at the camera and the boy making a face.',
129
- 'there are children near the camera',
130
- 'A boy is playing on an inflatable ride.',
131
  ]
132
  embeddings = model.encode(sentences)
133
  print(embeddings.shape)
@@ -182,7 +177,7 @@ You can finetune this model on your own dataset.
182
  #### nli-pairs
183
 
184
  * Dataset: [nli-pairs](https://huggingface.co/datasets/sentence-transformers/all-nli) at [d482672](https://huggingface.co/datasets/sentence-transformers/all-nli/tree/d482672c8e74ce18da116f430137434ba2e52fab)
185
- * Size: 7,500 training samples
186
  * Columns: <code>sentence1</code> and <code>sentence2</code>
187
  * Approximate statistics based on the first 1000 samples:
188
  | | sentence1 | sentence2 |
@@ -219,30 +214,30 @@ You can finetune this model on your own dataset.
219
  | <code>A plane is taking off.</code> | <code>An air plane is taking off.</code> | <code>1.0</code> |
220
  | <code>A man is playing a large flute.</code> | <code>A man is playing a flute.</code> | <code>0.76</code> |
221
  | <code>A man is spreading shreded cheese on a pizza.</code> | <code>A man is spreading shredded cheese on an uncooked pizza.</code> | <code>0.76</code> |
222
- * Loss: [<code>AnglELoss</code>](https://sbert.net/docs/package_reference/sentence_transformer/losses.html#angleloss) with these parameters:
223
  ```json
224
  {
225
  "scale": 20.0,
226
- "similarity_fct": "pairwise_angle_sim"
227
  }
228
  ```
229
 
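For reference, the loss removed here corresponds to the following instantiation. A minimal sketch, assuming the standard `sentence_transformers.losses` API; `AnglELoss` is `CoSENTLoss` with the `pairwise_angle_sim` similarity shown in the JSON above:

```python
from sentence_transformers import SentenceTransformer, losses

model = SentenceTransformer("bobox/DeBERTaV3-small-GeneralSentenceTransformer-checkpoints-tmp")
# Configuration removed by this commit: AnglELoss with scale=20.0
# (pairwise_angle_sim is its built-in default similarity function).
old_sts_loss = losses.AnglELoss(model, scale=20.0)
```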
230
  #### vitaminc-pairs
231
 
232
  * Dataset: [vitaminc-pairs](https://huggingface.co/datasets/tals/vitaminc) at [be6febb](https://huggingface.co/datasets/tals/vitaminc/tree/be6febb761b0b2807687e61e0b5282e459df2fa0)
233
- * Size: 3,695 training samples
234
  * Columns: <code>label</code>, <code>sentence1</code>, and <code>sentence2</code>
235
  * Approximate statistics based on the first 1000 samples:
236
- | | label | sentence1 | sentence2 |
237
- |:--------|:-----------------------------|:----------------------------------------------------------------------------------|:-----------------------------------------------------------------------------------|
238
- | type | int | string | string |
239
- | details | <ul><li>1: 100.00%</li></ul> | <ul><li>min: 6 tokens</li><li>mean: 16.21 tokens</li><li>max: 67 tokens</li></ul> | <ul><li>min: 8 tokens</li><li>mean: 37.22 tokens</li><li>max: 224 tokens</li></ul> |
240
  * Samples:
241
- | label | sentence1 | sentence2 |
242
- |:---------------|:------------------------------------------------------------------|:---------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------|
243
- | <code>1</code> | <code>Ginn filled out the lineup in 2015 .</code> | <code>In 2015 , Ginn filled out the line up with adding new members Tyler Smith on bass , and Brandon Pertzborn on drums.</code> |
244
- | <code>1</code> | <code>Brent Grimes is a free agent .</code> | <code>Brent Omar Grimes ( born July 19 , 1983 ) is an American football cornerback who is currently a free agent .</code> |
245
- | <code>1</code> | <code>The Symphony by Erich Korngold is in F-sharp Major .</code> | <code>`` Classical music critic Mark Swed of the Los Angeles Times pointed out that many classical composers used material from previous composers saying that `` '' John Williams all but lifted the core idea of his Star Wars soundtrack score from the Scherzo of Erich Korngold 's Symphony in F-sharp Major , written 25 years earlier . '' '' ''</code> |
246
  * Loss: [<code>GISTEmbedLoss</code>](https://sbert.net/docs/package_reference/sentence_transformer/losses.html#gistembedloss) with these parameters:
247
  ```json
248
  {'guide': SentenceTransformer(
@@ -255,19 +250,19 @@ You can finetune this model on your own dataset.
255
  #### qnli-contrastive
256
 
257
  * Dataset: [qnli-contrastive](https://huggingface.co/datasets/nyu-mll/glue) at [bcdcba7](https://huggingface.co/datasets/nyu-mll/glue/tree/bcdcba79d07bc864c1c254ccfcedcce55bcc9a8c)
258
- * Size: 7,500 training samples
259
  * Columns: <code>sentence1</code>, <code>sentence2</code>, and <code>label</code>
260
  * Approximate statistics based on the first 1000 samples:
261
  | | sentence1 | sentence2 | label |
262
  |:--------|:----------------------------------------------------------------------------------|:-----------------------------------------------------------------------------------|:-----------------------------|
263
  | type | string | string | int |
264
- | details | <ul><li>min: 6 tokens</li><li>mean: 13.84 tokens</li><li>max: 30 tokens</li></ul> | <ul><li>min: 7 tokens</li><li>mean: 34.77 tokens</li><li>max: 166 tokens</li></ul> | <ul><li>0: 100.00%</li></ul> |
265
  * Samples:
266
- | sentence1 | sentence2 | label |
267
- |:-------------------------------------------------------------------------------------------|:---------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------|:---------------|
268
- | <code>The term, the New Haven Approach was what exactly?</code> | <code>A theory of international law, which argues for a sociological normative approach in regards to jurisprudence, is named the New Haven Approach, after the city.</code> | <code>0</code> |
269
- | <code>As of April 2014, how many albums have Jay Z and Beyonce sold together?</code> | <code>As of April 2014, the couple have sold a combined 300 million records together.</code> | <code>0</code> |
270
- | <code>Most U.S law, the kind of law we live everyday, consists of what kind of law?</code> | <code>In the dual-sovereign system of American federalism (actually tripartite because of the presence of Indian reservations), states are the plenary sovereigns, each with their own constitution, while the federal sovereign possesses only the limited supreme authority enumerated in the Constitution.</code> | <code>0</code> |
271
  * Loss: [<code>OnlineContrastiveLoss</code>](https://sbert.net/docs/package_reference/sentence_transformer/losses.html#onlinecontrastiveloss)
272
 
273
  #### scitail-pairs-qa
@@ -276,16 +271,16 @@ You can finetune this model on your own dataset.
276
  * Size: 14,987 training samples
277
  * Columns: <code>sentence2</code> and <code>sentence1</code>
278
  * Approximate statistics based on the first 1000 samples:
279
- | | sentence2 | sentence1 |
280
- |:--------|:----------------------------------------------------------------------------------|:---------------------------------------------------------------------------------|
281
- | type | string | string |
282
- | details | <ul><li>min: 7 tokens</li><li>mean: 15.73 tokens</li><li>max: 41 tokens</li></ul> | <ul><li>min: 6 tokens</li><li>mean: 14.9 tokens</li><li>max: 33 tokens</li></ul> |
283
  * Samples:
284
- | sentence2 | sentence1 |
285
- |:-------------------------------------------------------------------------------------------------------------------------------------|:-------------------------------------------------------------------------------------------------------------|
286
- | <code>The nervous system plays a critical role in the regulation of vascular homeostasis.</code> | <code>What system plays a critical role in the regulation of vascular homeostasis?</code> |
287
- | <code>A moose eats a plant is an example of a living thing that depends on another living thing to survive.</code> | <code>Which statement best identifies a living thing that depends on another living thing to survive?</code> |
288
- | <code>Single-celled organisms and multicellular organisms have this in common: both have a way to get rid of waste materials.</code> | <code>Which characteristic do single-celled organisms and multicellular organisms have in common?</code> |
289
  * Loss: [<code>MultipleNegativesRankingLoss</code>](https://sbert.net/docs/package_reference/sentence_transformer/losses.html#multiplenegativesrankingloss) with these parameters:
290
  ```json
291
  {
@@ -303,13 +298,13 @@ You can finetune this model on your own dataset.
303
  | | sentence1 | sentence2 |
304
  |:--------|:----------------------------------------------------------------------------------|:----------------------------------------------------------------------------------|
305
  | type | string | string |
306
- | details | <ul><li>min: 6 tokens</li><li>mean: 23.64 tokens</li><li>max: 69 tokens</li></ul> | <ul><li>min: 7 tokens</li><li>mean: 15.52 tokens</li><li>max: 36 tokens</li></ul> |
307
  * Samples:
308
- | sentence1 | sentence2 |
309
- |:--------------------------------------------------------------------------------------------------------------------------------------|:-----------------------------------------------------------------------------------------------------------|
310
- | <code>population A group of organisms of the same species that occupy a particular geographic area or region.</code> | <code>You call a group of organisms of the same species that live in the same area a(n) population.</code> |
311
- | <code>A galaxy is a vast island of hundreds of billions of stars (solar systems), nebulas and star clusters.</code> | <code>A galaxy is best described as a cluster of billions of stars.</code> |
312
- | <code>Photosynthesis is an anabolic process by which plants synthesize glucose from the raw products carbon dioxide and water.</code> | <code>Photosynthesis converts carbon dioxide and water into glucose.</code> |
313
  * Loss: [<code>MultipleNegativesRankingLoss</code>](https://sbert.net/docs/package_reference/sentence_transformer/losses.html#multiplenegativesrankingloss) with these parameters:
314
  ```json
315
  {
@@ -321,19 +316,19 @@ You can finetune this model on your own dataset.
321
  #### xsum-pairs
322
 
323
  * Dataset: [xsum-pairs](https://huggingface.co/datasets/sentence-transformers/xsum) at [788ddaf](https://huggingface.co/datasets/sentence-transformers/xsum/tree/788ddafe04e539956d56b567bc32a036ee7b9206)
324
- * Size: 3,750 training samples
325
  * Columns: <code>sentence1</code> and <code>sentence2</code>
326
  * Approximate statistics based on the first 1000 samples:
327
  | | sentence1 | sentence2 |
328
  |:--------|:-------------------------------------------------------------------------------------|:----------------------------------------------------------------------------------|
329
  | type | string | string |
330
- | details | <ul><li>min: 35 tokens</li><li>mean: 353.88 tokens</li><li>max: 512 tokens</li></ul> | <ul><li>min: 8 tokens</li><li>mean: 26.93 tokens</li><li>max: 67 tokens</li></ul> |
331
  * Samples:
332
- | sentence1 | sentence2 |
333
- |:--------------------------------------------|:--------------------------------------------|
334
- | <code>About 24 of the rocks, known as Hertfordshire Puddingstone, were removed from the former St Albans Museum grounds.<br>It is thought they were taken from the garden between the 3 and 14 April.<br>Hertfordshire Police said the theft of the rare rock was being treated as a heritage crime.<br>PC Sean Lannon said Hertfordshire Puddingstone was "one of the world's rarest rocks" and part of the county's heritage.<br>"We are doing all we can to ensure that these stones are returned to the museum," he said.<br>The force has appealed for witnesses or anyone who may have been offered the rocks for sale to come forward.<br>Hertfordshire Puddingstone is a naturally occurring conglomerate consisting of rounded flint pebbles bound in silica cement, found mostly within the county.<br>It is thought to have originated from deposits laid down millions of years ago and is called puddingstone because the flints resemble the plums in a Christmas pudding.<br>Most of the rocks taken came from the site of the Seventh Day Adventist Church during the late 1970s.<br>St Albans Museum in Hatfield Road closed earlier this year ahead of its move to a new site in the Town Hall which is due to open next year.</code> | <code>A collection of prehistoric stones thought be about 54 million years old has been stolen from a former museum site, police have said.</code> |
335
- | <code>Instead, in today's ever more fitness and fashion conscious world, a growing number are willing to pay as much for a new gym outfit as they do for a new formal party dress.<br>This has led to a big increase over recent years in the value of the women's sportswear market.<br>In the US alone, combined sales of such products - from yoga leggings, sports bras and vests, to tracksuits - totalled $15.1bn (£10.3bn) in the 12 months to August 2011, according to research firm NPD Group. It said this was 10% higher than the prior year.<br>Meanwhile, sportswear giant Nike said last October that the rate of sales growth in its female clothing ranges was outpacing that of its products for men.<br>Analysts say that the rise in sales of women's sportswear has been helped by an increased emphasis on the style of the clothing - making them look and feel as good as possible - which in turn has led to an increase in the number of women wearing such items as fashionable leisurewear.<br>And with the market being so valuable, it is not surprising that a growing number of small companies - predominantly led by women - are launching their own ranges of upmarket female sportswear.<br>Katy Biddulph didn't need gym membership when she launched her women's sportswear brand Striders Edge in London back in 2011.<br>Initially running the business from a second floor one-bedroom flat, she would get her exercise by carrying all her deliveries up and down the stairs.<br>The 31-year-old says: "It looked like a fairly big business to the outside world when I was just starting out, but I was receiving all my goods from the manufacturer in Portugal from a truck outside my flat.<br>"I had hundreds of garments landing in the street, and I had to get all the boxes up the stairs by myself. I never slept that first year, but I just knew there was a gap in the market that I could fill.<br>"Now I've got an office that overlooks the London Eye."<br>Ms Biddulph set up the business after previously working for fellow British women's sportswear company Sweaty Betty, where she designed and managed a number of product ranges.<br>Her industry experience and knowledge persuaded a number of private investors to back her venture.<br>Striders Edge's clothes are now stocked by UK retailers Harrods, John Lewis and House of Fraser, and the brand launched in the US in February. It also sells globally via its website.<br>Now with nine members of staff, Ms Biddulph says she wants to hit £2m in sales within the next 12 months.<br>She adds: "You want your customer to feel great and part of something. As a female, you know the standard concerns."<br>But just how do you convince women to spend more than £60 on a t-shirt or a pair of leggings?<br>"It's not as hard as you would think," says Brittany Morris-Asquith, spokesperson and designer for Titika Active Couture, a Canadian brand based in Toronto. "Women are always looking for something different.<br>"They're asking more questions about fabrics, and if they understand the construction that goes into it, they're willing to pay for a better product."<br>Since Titika's founder Eileen Zhang, 32, opened her first shop in Toronto in 2009, Titika has expanded to seven stores across the province of Ontario.<br>And in March of this year it expended its online sales to the US, with plans to ship globally later this year.<br>Ms Morris-Asquith adds: "We provide clothing to women that make them feel good, we encourage them to try on things that they would never think about."<br>Titika also offers free in-store exercise classes to promote a healthy lifestyle - from yoga and kickboxing, to zumba dance workouts. And inspirational slogans affixed above fitting room mirrors urge against body shaming.<br>Catherine Elliott, a professor at the Telfer School of Management at the University of Ottawa, says that businesses such as Striders Edge and Titika share an ethos which is typical for female-led companies.<br>"They tend to have a double bottom line - to create wealth, but also to make positive change for girls and women," says Prof Elliott, who is co-author of a recently published book on the subject called Feminine Capital.<br>"When women are defining the objectives of [a clothing] business, they're going to see it as something that empowers women as opposed to just making them look sexy.<br>"The sports clothing industry is about feeling good about yourself, and wearing clothing that fits and makes you feel comfortable."<br>She adds: "A lot of women have talked about how being in sports and fitness has given them the leadership skills and confidence to be successful in corporate settings and entrepreneurship."<br>At New York-based women's sportswear business Live The Process, founder Robyn Berkley says the aim is for the brand to not just be about clothing, and instead "offer authenticity, honesty, and embrace the idealism of wellness".<br>Its website features editorial content from 32 contributors, offering tips ranging from changing careers to taking care of your skin.<br>Established in 2013, the company's clothing range was an immediate hit, with sales topping £1m in its first year.</code> | <code>The days when women would simply throw on an old T-shirt to do some exercise are long gone.</code> |
336
- | <code>Bovis Homes wanted to build the new estate on land between Collingtree Park and Collingtree village to the south of Northampton.<br>The plans were rejected by Northampton Borough Council in January.<br>Now a report to the council's planning committee, meeting next week, says Bovis Home intends to appeal.<br>The plans, for the site to the north of the M1, also included community buildings, a site for a primary school and open space.<br>Council officers said in the report that if the appeal was successful it would be important to have in place an agreement with the developer for it to help fund play areas, footpath improvements and a school and community buildings.<br>The original plans were recommended for approval by council officers but rejected by the council after fears were raised about flooding in the area and some homes being sited too near the M1.</code> | <code>Developers who saw their plans for 1,000 homes near a Northamptonshire village rejected by councillors are set to appeal against the decision.</code> |
337
  * Loss: [<code>MultipleNegativesSymmetricRankingLoss</code>](https://sbert.net/docs/package_reference/sentence_transformer/losses.html#multiplenegativessymmetricrankingloss) with these parameters:
338
  ```json
339
  {
@@ -345,19 +340,19 @@ You can finetune this model on your own dataset.
345
  #### compression-pairs
346
 
347
  * Dataset: [compression-pairs](https://huggingface.co/datasets/sentence-transformers/sentence-compression) at [605bc91](https://huggingface.co/datasets/sentence-transformers/sentence-compression/tree/605bc91d95631895ba25b6eda51a3cb596976c90)
348
- * Size: 45,000 training samples
349
  * Columns: <code>sentence1</code> and <code>sentence2</code>
350
  * Approximate statistics based on the first 1000 samples:
351
  | | sentence1 | sentence2 |
352
  |:--------|:------------------------------------------------------------------------------------|:----------------------------------------------------------------------------------|
353
  | type | string | string |
354
- | details | <ul><li>min: 10 tokens</li><li>mean: 31.78 tokens</li><li>max: 170 tokens</li></ul> | <ul><li>min: 5 tokens</li><li>mean: 10.14 tokens</li><li>max: 29 tokens</li></ul> |
355
  * Samples:
356
- | sentence1 | sentence2 |
357
- |:---------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------|:-----------------------------------------------------------------------------------------------------------|
358
- | <code>The USHL completed an expansion draft on Monday as 10 players who were on the rosters of USHL teams during the 2009-10 season were selected by the League's two newest entries, the Muskegon Lumberjacks and Dubuque Fighting Saints.</code> | <code>USHL completes expansion draft</code> |
359
- | <code>NRT LLC, one of the nation's largest residential real estate brokerage companies, announced several executive appointments within its Coldwell Banker Residential Brokerage operations in Southern California.</code> | <code>NRT announces executive appointments at its Coldwell Banker operations in Southern California</code> |
360
- | <code>A new survey shows 30 percent of Californians use Twitter, and more and more of us are using our smart phones to go online.</code> | <code>Survey: 30 percent of Californians use Twitter</code> |
361
  * Loss: [<code>MultipleNegativesSymmetricRankingLoss</code>](https://sbert.net/docs/package_reference/sentence_transformer/losses.html#multiplenegativessymmetricrankingloss) with these parameters:
362
  ```json
363
  {
@@ -371,7 +366,7 @@ You can finetune this model on your own dataset.
371
  #### nli-pairs
372
 
373
  * Dataset: [nli-pairs](https://huggingface.co/datasets/sentence-transformers/all-nli) at [d482672](https://huggingface.co/datasets/sentence-transformers/all-nli/tree/d482672c8e74ce18da116f430137434ba2e52fab)
374
- * Size: 2,000 evaluation samples
375
  * Columns: <code>sentence1</code> and <code>sentence2</code>
376
  * Approximate statistics based on the first 1000 samples:
377
  | | sentence1 | sentence2 |
@@ -395,7 +390,7 @@ You can finetune this model on your own dataset.
395
  #### qnli-contrastive
396
 
397
  * Dataset: [qnli-contrastive](https://huggingface.co/datasets/nyu-mll/glue) at [bcdcba7](https://huggingface.co/datasets/nyu-mll/glue/tree/bcdcba79d07bc864c1c254ccfcedcce55bcc9a8c)
398
- * Size: 2,000 evaluation samples
399
  * Columns: <code>sentence1</code>, <code>sentence2</code>, and <code>label</code>
400
  * Approximate statistics based on the first 1000 samples:
401
  | | sentence1 | sentence2 | label |
@@ -414,11 +409,11 @@ You can finetune this model on your own dataset.
414
  #### Non-Default Hyperparameters
415
 
416
  - `eval_strategy`: steps
417
- - `per_device_train_batch_size`: 64
418
- - `per_device_eval_batch_size`: 16
419
- - `learning_rate`: 3e-06
420
  - `weight_decay`: 1e-10
421
- - `num_train_epochs`: 5
422
  - `lr_scheduler_type`: cosine
423
  - `warmup_ratio`: 0.33
424
  - `save_safetensors`: False
@@ -427,7 +422,6 @@ You can finetune this model on your own dataset.
427
  - `hub_model_id`: bobox/DeBERTaV3-small-GeneralSentenceTransformer-checkpoints-tmp
428
  - `hub_strategy`: checkpoint
429
  - `batch_sampler`: no_duplicates
430
- - `multi_dataset_batch_sampler`: round_robin
431
 
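These non-default values map one-to-one onto `SentenceTransformerTrainingArguments`. A minimal sketch, assuming the sentence-transformers v3 trainer API that generates cards like this one; values on `-` lines are the previous revision's settings (their replacements are not visible in this view), and `output_dir` is a placeholder:

```python
from sentence_transformers import SentenceTransformerTrainingArguments

# Sketch of the non-default hyperparameters listed above. Values taken from
# the "-" lines are the previous revision's settings; output_dir is invented.
args = SentenceTransformerTrainingArguments(
    output_dir="output",  # placeholder, not from the card
    eval_strategy="steps",
    per_device_train_batch_size=64,
    per_device_eval_batch_size=16,
    learning_rate=3e-6,
    weight_decay=1e-10,
    num_train_epochs=5,
    lr_scheduler_type="cosine",
    warmup_ratio=0.33,
    save_safetensors=False,
    hub_model_id="bobox/DeBERTaV3-small-GeneralSentenceTransformer-checkpoints-tmp",
    hub_strategy="checkpoint",
    batch_sampler="no_duplicates",
    multi_dataset_batch_sampler="round_robin",
)
```

The `no_duplicates` batch sampler matters because the ranking losses used here treat all other in-batch examples as negatives, and `round_robin` cycles through the training datasets one batch at a time.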
432
  #### All Hyperparameters
433
  <details><summary>Click to expand</summary>
@@ -436,19 +430,19 @@ You can finetune this model on your own dataset.
436
  - `do_predict`: False
437
  - `eval_strategy`: steps
438
  - `prediction_loss_only`: True
439
- - `per_device_train_batch_size`: 64
440
- - `per_device_eval_batch_size`: 16
441
  - `per_gpu_train_batch_size`: None
442
  - `per_gpu_eval_batch_size`: None
443
  - `gradient_accumulation_steps`: 1
444
  - `eval_accumulation_steps`: None
445
- - `learning_rate`: 3e-06
446
  - `weight_decay`: 1e-10
447
  - `adam_beta1`: 0.9
448
  - `adam_beta2`: 0.999
449
  - `adam_epsilon`: 1e-08
450
  - `max_grad_norm`: 1.0
451
- - `num_train_epochs`: 5
452
  - `max_steps`: -1
453
  - `lr_scheduler_type`: cosine
454
  - `lr_scheduler_kwargs`: {}
@@ -539,34 +533,23 @@ You can finetune this model on your own dataset.
539
  - `optim_target_modules`: None
540
  - `batch_eval_metrics`: False
541
  - `batch_sampler`: no_duplicates
542
- - `multi_dataset_batch_sampler`: round_robin
543
 
544
  </details>
545
 
546
  ### Training Logs
547
  | Epoch | Step | Training Loss | qnli-contrastive loss | nli-pairs loss |
548
  |:------:|:----:|:-------------:|:---------------------:|:--------------:|
549
- | None | 0 | - | 6.0041 | 4.0946 |
550
- | 0.25 | 116 | 4.9013 | 5.9679 | 4.0430 |
551
- | 0.5 | 232 | 4.6399 | 5.5328 | 3.8479 |
552
- | 0.75 | 348 | 4.4683 | 4.2996 | 3.6937 |
553
- | 1.0 | 464 | 3.8129 | 2.8062 | 3.4773 |
554
- | 1.2457 | 580 | 3.3971 | 1.8330 | 3.1263 |
555
- | 1.4957 | 696 | 2.7459 | 1.2780 | 2.7294 |
556
- | 1.7457 | 812 | 2.8721 | 0.9296 | 2.2870 |
557
- | 1.9957 | 928 | 2.5066 | 0.6388 | 2.0548 |
558
- | 2.2414 | 1044 | 2.3223 | 0.5312 | 1.8876 |
559
- | 2.4914 | 1160 | 2.1771 | 0.4300 | 1.7922 |
560
- | 2.7414 | 1276 | 2.2549 | 0.3610 | 1.6473 |
561
- | 2.9914 | 1392 | 2.2168 | 0.2929 | 1.5590 |
562
- | 3.2371 | 1508 | 2.0581 | 0.2678 | 1.5177 |
563
- | 3.4871 | 1624 | 1.9654 | 0.2392 | 1.5037 |
564
- | 3.7371 | 1740 | 2.1107 | 0.2234 | 1.4557 |
565
- | 3.9871 | 1856 | 2.0709 | 0.2094 | 1.4287 |
566
- | 4.2328 | 1972 | 1.9489 | 0.2072 | 1.4167 |
567
- | 4.4828 | 2088 | 1.8238 | 0.2019 | 1.4155 |
568
- | 4.7328 | 2204 | 2.1587 | 0.2005 | 1.4136 |
569
- | 4.9828 | 2320 | 1.929 | 0.2005 | 1.4132 |
570
 
571
 
572
  ### Framework Versions
@@ -607,15 +590,14 @@ You can finetune this model on your own dataset.
607
  }
608
  ```
609
 
610
- #### AnglELoss
611
  ```bibtex
612
- @misc{li2023angleoptimized,
613
- title={AnglE-optimized Text Embeddings},
614
- author={Xianming Li and Jing Li},
615
- year={2023},
616
- eprint={2309.12871},
617
- archivePrefix={arXiv},
618
- primaryClass={cs.CL}
619
  }
620
  ```
621
 
 
7
  - sentence-similarity
8
  - feature-extraction
9
  - generated_from_trainer
10
+ - dataset_size:689221
11
  - loss:MultipleNegativesRankingLoss
12
+ - loss:CoSENTLoss
13
  - loss:GISTEmbedLoss
14
  - loss:OnlineContrastiveLoss
15
  - loss:MultipleNegativesSymmetricRankingLoss
 
23
  - sentence-transformers/xsum
24
  - sentence-transformers/sentence-compression
25
  widget:
26
+ - source_sentence: What are the exceptions in the constitution that require special
27
+ considerations to amend?
28
  sentences:
29
+ - The river makes a distinctive turn to the north near Chur.
30
+ - The Victorian Constitution can be amended by the Parliament of Victoria, except
31
+ for certain "entrenched" provisions that require either an absolute majority in
32
+ both houses, a three-fifths majority in both houses, or the approval of the Victorian
33
+ people in a referendum, depending on the provision.
34
+ - A new arrangement of the theme, once again by Gold, was introduced in the 2007
35
+ Christmas special episode, "Voyage of the Damned"; Gold returned as composer for
36
+ the 2010 series.
37
+ - source_sentence: What is the name of a Bodhisattva vow?
 
38
  sentences:
39
+ - In Tibetan Buddhism the teachers of Dharma in Tibet are most commonly called a
40
+ Lama.
41
+ - This origin of chloroplasts was first suggested by the Russian biologist Konstantin
42
+ Mereschkowski in 1905 after Andreas Schimper observed in 1883 that chloroplasts
43
+ closely resemble cyanobacteria.
44
+ - The announcement came a day after Setanta Sports confirmed that it would launch
45
+ in March as a subscription service on the digital terrestrial platform, and on
46
+ the same day that NTL's services re-branded as Virgin Media.
47
+ - source_sentence: Two dogs run around inside a fence.
48
  sentences:
49
+ - A young woman tennis player have many tennis balls.
50
+ - Two dogs are inside a fence.
51
+ - A little girl in red plays tennis.
52
+ - source_sentence: A little boy wearing a blue stiped shirt has a party hat on his
53
+ head and is playing in a puddle.
 
54
  sentences:
55
+ - The party boy is playing in a puddle.
56
+ - There is a crowd
57
+ - Four people are skiing
58
+ - source_sentence: Two wrestlers jump in a ring while an official watches.
 
59
  sentences:
60
+ - The man was walking.
61
+ - Two men are dressed in makeup
62
+ - Two wrestlers were just tagged in on a tag team match.
63
  pipeline_tag: sentence-similarity
64
  ---
65
 
 
120
  model = SentenceTransformer("bobox/DeBERTaV3-small-GeneralSentenceTransformer-checkpoints-tmp")
121
  # Run inference
122
  sentences = [
123
+ 'Two wrestlers jump in a ring while an official watches.',
124
+ 'Two wrestlers were just tagged in on a tag team match.',
125
+ 'Two men are dressed in makeup',
126
  ]
127
  embeddings = model.encode(sentences)
128
  print(embeddings.shape)
 
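To turn the embeddings into similarity scores, they can be compared with cosine similarity. A small follow-on sketch using `sentence_transformers.util.cos_sim`; the scoring step is an addition, since the card's snippet stops at encoding:

```python
from sentence_transformers import SentenceTransformer, util

model = SentenceTransformer("bobox/DeBERTaV3-small-GeneralSentenceTransformer-checkpoints-tmp")
embeddings = model.encode([
    "Two wrestlers jump in a ring while an official watches.",
    "Two wrestlers were just tagged in on a tag team match.",
    "Two men are dressed in makeup",
])
# 3x3 tensor of pairwise cosine similarities; the first two sentences
# should score highest against each other.
similarities = util.cos_sim(embeddings, embeddings)
print(similarities)
```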
177
  #### nli-pairs
178
 
179
  * Dataset: [nli-pairs](https://huggingface.co/datasets/sentence-transformers/all-nli) at [d482672](https://huggingface.co/datasets/sentence-transformers/all-nli/tree/d482672c8e74ce18da116f430137434ba2e52fab)
180
+ * Size: 150,000 training samples
181
  * Columns: <code>sentence1</code> and <code>sentence2</code>
182
  * Approximate statistics based on the first 1000 samples:
183
  | | sentence1 | sentence2 |
 
214
  | <code>A plane is taking off.</code> | <code>An air plane is taking off.</code> | <code>1.0</code> |
215
  | <code>A man is playing a large flute.</code> | <code>A man is playing a flute.</code> | <code>0.76</code> |
216
  | <code>A man is spreading shreded cheese on a pizza.</code> | <code>A man is spreading shredded cheese on an uncooked pizza.</code> | <code>0.76</code> |
217
+ * Loss: [<code>CoSENTLoss</code>](https://sbert.net/docs/package_reference/sentence_transformer/losses.html#cosentloss) with these parameters:
218
  ```json
219
  {
220
  "scale": 20.0,
221
+ "similarity_fct": "pairwise_cos_sim"
222
  }
223
  ```
224
 
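This commit swaps `AnglELoss` for `CoSENTLoss` on the sts-label pairs; both use the CoSENT formulation and differ only in the similarity function (`pairwise_cos_sim` here versus `pairwise_angle_sim` before). A minimal sketch of the new configuration, assuming the standard `sentence_transformers.losses` API:

```python
from sentence_transformers import SentenceTransformer, losses, util

model = SentenceTransformer("bobox/DeBERTaV3-small-GeneralSentenceTransformer-checkpoints-tmp")
# CoSENTLoss with the parameters from the JSON above; pairwise_cos_sim is
# also its library default.
sts_loss = losses.CoSENTLoss(model, scale=20.0, similarity_fct=util.pairwise_cos_sim)
```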
225
  #### vitaminc-pairs
226
 
227
  * Dataset: [vitaminc-pairs](https://huggingface.co/datasets/tals/vitaminc) at [be6febb](https://huggingface.co/datasets/tals/vitaminc/tree/be6febb761b0b2807687e61e0b5282e459df2fa0)
228
+ * Size: 75,142 training samples
229
  * Columns: <code>label</code>, <code>sentence1</code>, and <code>sentence2</code>
230
  * Approximate statistics based on the first 1000 samples:
231
+ | | label | sentence1 | sentence2 |
232
+ |:--------|:-----------------------------|:----------------------------------------------------------------------------------|:----------------------------------------------------------------------------------|
233
+ | type | int | string | string |
234
+ | details | <ul><li>1: 100.00%</li></ul> | <ul><li>min: 7 tokens</li><li>mean: 17.44 tokens</li><li>max: 53 tokens</li></ul> | <ul><li>min: 8 tokens</li><li>mean: 38.0 tokens</li><li>max: 151 tokens</li></ul> |
235
  * Samples:
236
+ | label | sentence1 | sentence2 |
237
+ |:---------------|:--------------------------------------------------------------------------------------------------------------------------------------------------------------|:--------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------|
238
+ | <code>1</code> | <code>Penguins has a rating of less than 92 % , defined by more than 20 reviews on Rotten Tomatoes .</code> | <code>On review aggregator Rotten Tomatoes , the film holds an approval rating of 91 % based on 22 reviews , with an average rating of 7.14/10 .</code> |
239
+ | <code>1</code> | <code>Fluoxetine , acts as a positive allosteric modulator of the GABAA receptor at high concentrations , as does norfluoxetine though more potently .</code> | <code>In addition , it acts as a positive allosteric modulator of the GABAA receptor at high concentrations , and norfluoxetine does the same but more potently , actions which may be clinically-relevant .</code> |
240
+ | <code>1</code> | <code>Andrew Robertson is considered by many experts to be one of the best left backs .</code> | <code>He is considered by many pundits to be one of the best left backs in the world due to his pace and crossing ability.</code> |
241
  * Loss: [<code>GISTEmbedLoss</code>](https://sbert.net/docs/package_reference/sentence_transformer/losses.html#gistembedloss) with these parameters:
242
  ```json
243
  {'guide': SentenceTransformer(
 
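`GISTEmbedLoss` uses a second, frozen "guide" model to discard in-batch negatives that the guide scores as likely positives. The JSON above is truncated before the guide checkpoint's name, so the guide below is a placeholder; a minimal sketch:

```python
from sentence_transformers import SentenceTransformer, losses

model = SentenceTransformer("bobox/DeBERTaV3-small-GeneralSentenceTransformer-checkpoints-tmp")
# Placeholder guide: the actual guide model's name is cut off in the JSON above.
guide = SentenceTransformer("all-MiniLM-L6-v2")
gist_loss = losses.GISTEmbedLoss(model, guide)
```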
250
  #### qnli-contrastive
251
 
252
  * Dataset: [qnli-contrastive](https://huggingface.co/datasets/nyu-mll/glue) at [bcdcba7](https://huggingface.co/datasets/nyu-mll/glue/tree/bcdcba79d07bc864c1c254ccfcedcce55bcc9a8c)
253
+ * Size: 104,743 training samples
254
  * Columns: <code>sentence1</code>, <code>sentence2</code>, and <code>label</code>
255
  * Approximate statistics based on the first 1000 samples:
256
  | | sentence1 | sentence2 | label |
257
  |:--------|:----------------------------------------------------------------------------------|:-----------------------------------------------------------------------------------|:-----------------------------|
258
  | type | string | string | int |
259
+ | details | <ul><li>min: 3 tokens</li><li>mean: 13.82 tokens</li><li>max: 39 tokens</li></ul> | <ul><li>min: 5 tokens</li><li>mean: 34.56 tokens</li><li>max: 110 tokens</li></ul> | <ul><li>0: 100.00%</li></ul> |
260
  * Samples:
261
+ | sentence1 | sentence2 | label |
262
+ |:-----------------------------------------------------------------------------------|:--------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------|:---------------|
263
+ | <code>Which Formula One racing team developed the C-X75's used for filming.</code> | <code>The C-X75s used for filming were developed by the engineering division of Formula One racing team Williams, who built the original C-X75 prototype for Jaguar.</code> | <code>0</code> |
264
+ | <code>When did the University of Michigan leave Detroit?</code> | <code>In June 2009, the Michigan State University College of Osteopathic Medicine which is based in East Lansing opened a satellite campus located at the Detroit Medical Center.</code> | <code>0</code> |
265
+ | <code>When did the Vlachs migrate into the region?</code> | <code>The Gorals of southern Poland and northern Slovakia are partially descended from Romance-speaking Vlachs who migrated into the region from the 14th to 17th centuries and were absorbed into the local population.</code> | <code>0</code> |
266
  * Loss: [<code>OnlineContrastiveLoss</code>](https://sbert.net/docs/package_reference/sentence_transformer/losses.html#onlinecontrastiveloss)
267
 
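`OnlineContrastiveLoss` is listed without extra parameters; it selects the hard positive and hard negative pairs within each batch and computes the contrastive loss only over those. A minimal sketch, assuming the library defaults:

```python
from sentence_transformers import SentenceTransformer, losses

model = SentenceTransformer("bobox/DeBERTaV3-small-GeneralSentenceTransformer-checkpoints-tmp")
# No parameters are listed in the card, so the library defaults are assumed
# (cosine distance metric, margin=0.5).
qnli_loss = losses.OnlineContrastiveLoss(model)
```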
268
  #### scitail-pairs-qa
 
271
  * Size: 14,987 training samples
272
  * Columns: <code>sentence2</code> and <code>sentence1</code>
273
  * Approximate statistics based on the first 1000 samples:
274
+ | | sentence2 | sentence1 |
275
+ |:--------|:----------------------------------------------------------------------------------|:----------------------------------------------------------------------------------|
276
+ | type | string | string |
277
+ | details | <ul><li>min: 7 tokens</li><li>mean: 16.04 tokens</li><li>max: 41 tokens</li></ul> | <ul><li>min: 6 tokens</li><li>mean: 15.14 tokens</li><li>max: 34 tokens</li></ul> |
278
  * Samples:
279
+ | sentence2 | sentence1 |
280
+ |:--------------------------------------------------------------------------------------|:----------------------------------------------------------------------------------------|
281
+ | <code>Voltage is not the same as energy, but means the energy per unit charge.</code> | <code>What term is not the same as energy, but means the energy per unit charge?</code> |
282
+ | <code>A jellyfish does not have a circulatory system.</code> | <code>Name the type of system that a jellyfish does not have?</code> |
283
+ | <code>Insight learning is based on past experience and reasoning.</code> | <code>What type of learning is based on past experience and reasoning?</code> |
284
  * Loss: [<code>MultipleNegativesRankingLoss</code>](https://sbert.net/docs/package_reference/sentence_transformer/losses.html#multiplenegativesrankingloss) with these parameters:
285
  ```json
286
  {
 
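The parameter JSON above is cut off after the opening brace. `MultipleNegativesRankingLoss` ranks each anchor's positive against all other in-batch positives, which is why the `no_duplicates` batch sampler in the hyperparameters matters. A sketch assuming the same scale and cosine similarity shown for the other losses:

```python
from sentence_transformers import SentenceTransformer, losses, util

model = SentenceTransformer("bobox/DeBERTaV3-small-GeneralSentenceTransformer-checkpoints-tmp")
# Parameters assumed (the JSON above is truncated); these match the library defaults.
mnrl = losses.MultipleNegativesRankingLoss(model, scale=20.0, similarity_fct=util.cos_sim)
```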
298
  | | sentence1 | sentence2 |
299
  |:--------|:----------------------------------------------------------------------------------|:----------------------------------------------------------------------------------|
300
  | type | string | string |
301
+ | details | <ul><li>min: 6 tokens</li><li>mean: 23.99 tokens</li><li>max: 65 tokens</li></ul> | <ul><li>min: 7 tokens</li><li>mean: 15.54 tokens</li><li>max: 39 tokens</li></ul> |
302
  * Samples:
303
+ | sentence1 | sentence2 |
304
+ |:-----------------------------------------------------------------------------------------------------------------|:------------------------------------------------------------------------------------------------------------------|
305
+ | <code>A) A calorie is a unit of measure used to express the amount of energy a food produces in the body.</code> | <code>Another unit of energy, used widely in the health professions and everyday life, is calorie ( cal )?</code> |
306
+ | <code>solid 1 A state that retains shape independent of the shape of the container it occupies.</code> | <code>Solid takes neither the shape nor the volume of its container.</code> |
307
+ | <code>Sometimes the two sides of a fracture moved due to the pressure and a fault was formed.</code> | <code>A fault is the fracture caused when rocks on both sides move.</code> |
308
  * Loss: [<code>MultipleNegativesRankingLoss</code>](https://sbert.net/docs/package_reference/sentence_transformer/losses.html#multiplenegativesrankingloss) with these parameters:
309
  ```json
310
  {
 
316
  #### xsum-pairs
317
 
318
  * Dataset: [xsum-pairs](https://huggingface.co/datasets/sentence-transformers/xsum) at [788ddaf](https://huggingface.co/datasets/sentence-transformers/xsum/tree/788ddafe04e539956d56b567bc32a036ee7b9206)
319
+ * Size: 150,000 training samples
320
  * Columns: <code>sentence1</code> and <code>sentence2</code>
321
  * Approximate statistics based on the first 1000 samples:
322
  | | sentence1 | sentence2 |
323
  |:--------|:-------------------------------------------------------------------------------------|:----------------------------------------------------------------------------------|
324
  | type | string | string |
325
+ | details | <ul><li>min: 13 tokens</li><li>mean: 346.32 tokens</li><li>max: 512 tokens</li></ul> | <ul><li>min: 7 tokens</li><li>mean: 26.95 tokens</li><li>max: 66 tokens</li></ul> |
326
  * Samples:
327
+ | sentence1 | sentence2 |
328
+ |:--------------------------------------------|:--------------------------------------------|
329
+ | <code>Jacob Murphy fired in his 10th goal of the season from inside the box to give the Canaries the lead at the break.<br>Adam Hammill, Angus MacDonald and Marley Watkins all went close for the visitors after the restart.<br>Norwich then stretched their lead thanks to MacDonald's own goal to leave them five points behind sixth-placed Sheffield Wednesday.<br>Victory means caretaker boss Alan Irvine has now claimed four points from a possible six since the departure of Alex Neil.<br>The hosts dominated the early proceedings, with Jonny Howson and Alex Pritchard both being denied by Barnsley keeper Adam Davies.<br>After Hammill had a goal ruled out for a clear offside at the other end, winger Murphy gave the Canaries a deserved lead moments before the break when, having being picked out by Cameron Jerome, he drilled a shot low and into the corner of the net.<br>Hammill was unlucky to not get a strong enough flick on Andy Yiadom's cross to make it 1-1 after the restart and MacDonald saw a close-range effort well saved by Michael McGovern from the resulting corner.<br>But, after Steven Naismith fired over for the Canaries with just the keeper to beat, they doubled their lead in fortunate circumstances as an effort from Murphy deflected off MacDonald into the net.<br>Jerome and Howson then went close to adding a third as Norwich coasted to three points.<br>Norwich caretaker manager Alan Irvine:<br>"I was asked to take charge for two games and I have done that. I haven't heard anything more about what happens going forward, but I should imagine I will be speaking to someone soon to find out what happens next week.<br>"If that is to be my last game in charge it was a good way to finish - and the win certainly makes it interesting as far as the play-offs are concerned.<br>"Being five points behind sounds a lot better than being eight points behind - and as I said last week there are still plenty of points to play for."<br>Barnsley manager Paul Heckingbottom:<br>"The take-away message from that game is hit the target, score goals.<br>"There were plenty of positives to take away from it, but if you are going to get anything in this league you have got to be clinical in front of goal.<br>"It's frustrating, but there is still plenty to play for. We will keep striving to get that perfect performance and obviously want to win as many games as possible between now and the end of the season."<br>Match ends, Norwich City 2, Barnsley 0.<br>Second Half ends, Norwich City 2, Barnsley 0.<br>Hand ball by Nélson Oliveira (Norwich City).<br>Attempt missed. Ryan Kent (Barnsley) right footed shot from the centre of the box is close, but misses to the left. Assisted by Ryan Hedges with a cross.<br>Attempt saved. Nélson Oliveira (Norwich City) left footed shot from outside the box is saved in the centre of the goal.<br>Alex Pritchard (Norwich City) wins a free kick in the attacking half.<br>Foul by Alex Mowatt (Barnsley).<br>Corner, Barnsley. Conceded by Jonny Howson.<br>Foul by Graham Dorrans (Norwich City).<br>Matthew James (Barnsley) wins a free kick in the defensive half.<br>Attempt missed. Tom Bradshaw (Barnsley) left footed shot from the centre of the box is too high. Assisted by Gethin Jones with a cross.<br>Attempt missed. Steven Naismith (Norwich City) right footed shot from the right side of the box misses to the left. Assisted by Alex Pritchard.<br>Corner, Norwich City. Conceded by Angus MacDonald.<br>Attempt blocked. Jonny Howson (Norwich City) right footed shot from the right side of the box is blocked. Assisted by Graham Dorrans with a through ball.<br>Substitution, Norwich City. Graham Dorrans replaces Jacob Murphy.<br>Substitution, Norwich City. Nélson Oliveira replaces Cameron Jerome.<br>Substitution, Barnsley. Ryan Hedges replaces Adam Hammill.<br>Attempt missed. Ryan Kent (Barnsley) left footed shot from the centre of the box is high and wide to the left. Assisted by Matthew James with a cross.<br>Attempt saved. Cameron Jerome (Norwich City) right footed shot from the centre of the box is saved in the bottom right corner. Assisted by Jacob Murphy with a through ball.<br>Substitution, Barnsley. Alex Mowatt replaces Marley Watkins.<br>Corner, Barnsley. Conceded by Ivo Pinto.<br>Corner, Barnsley. Conceded by Russell Martin.<br>Attempt blocked. Tom Bradshaw (Barnsley) right footed shot from the right side of the box is blocked. Assisted by Ryan Kent.<br>Own Goal by Angus MacDonald, Barnsley. Norwich City 2, Barnsley 0.<br>Attempt saved. Jacob Murphy (Norwich City) right footed shot from the centre of the box is saved in the bottom right corner. Assisted by Alex Pritchard.<br>Attempt saved. Steven Naismith (Norwich City) left footed shot from the centre of the box is saved in the bottom left corner. Assisted by Steven Whittaker with a cross.<br>Ivo Pinto (Norwich City) wins a free kick in the defensive half.<br>Foul by Adam Hammill (Barnsley).<br>Attempt saved. Ryan Kent (Barnsley) right footed shot from outside the box is saved in the centre of the goal. Assisted by Marley Watkins.<br>Attempt missed. Josh Scowen (Barnsley) right footed shot from outside the box is high and wide to the right. Assisted by Adam Hammill.<br>Jacob Murphy (Norwich City) wins a free kick in the attacking half.<br>Foul by Angus MacDonald (Barnsley).<br>Ryan Bennett (Norwich City) wins a free kick in the defensive half.<br>Foul by Marc Roberts (Barnsley).<br>Ivo Pinto (Norwich City) is shown the yellow card for a bad foul.<br>Foul by Ivo Pinto (Norwich City).<br>Ryan Kent (Barnsley) wins a free kick in the attacking half.<br>Foul by Ryan Bennett (Norwich City).<br>Tom Bradshaw (Barnsley) wins a free kick in the attacking half.<br>Attempt missed. Steven Naismith (Norwich City) left footed shot from the left side of the box is too high. Assisted by Alex Pritchard.</code> | <code>Norwich City kept their Championship play-off hopes alive by beating Barnsley at Carrow Road.</code> |
330
+ | <code>Political reporter Samantha Maiden said the offensive text, which also contained strong language, was intended for disgraced ex-minister Jamie Briggs.<br>She said Mr Dutton apologised for the message about her article referring to Mr Briggs' recent resignation.<br>The BBC has approached Mr Dutton's office for comment.<br>He reportedly told News Corp in a statement he is expecting a "tough time" in Ms Maiden's next article.<br>"Sam and I have exchanged some robust language over the years so we had a laugh after this and I apologised to her straightaway, which she took in good faith," Mr Dutton was quoted as saying.<br>Former Cities Minister Jamie Briggs resigned last week following a complaint from a female public servant over his alleged conduct during a night out in Hong Kong.</code> | <code>Australia's Immigration Minister Peter Dutton has reportedly apologised for mistakenly sending an SMS to a journalist, calling her a "mad witch".</code> |
331
+ | <code>Demonstrators have moved around several sites since April to highlight a crisis in temporary housing.<br>The council's lawyer told the court "trespass, highways and planning laws" were the grounds for the case.<br>The cost to the council in terms of additional policing, security and legal costs has exceeded £100,000, he added.<br>Ahead of the hearing, tents were set up and a banner reading "The homeless resistance" was hung outside Manchester Civil Justice Centre.<br>'Grave and serious'<br>Protesters said they hoped to be offered "permanent, suitable accommodation".<br>Some had earlier refused temporary accommodation offered by the council because they said it was "not suitable" and they felt unsafe.<br>The council said it had engaged with the protestors and had offered them support, but it could not accept anti-social behaviour and disruption to residents and businesses.<br>Councillor Nigel Murphy added the exclusion order was "designed to prevent the recurrence of camps and not targeted at individual rough sleepers".<br>He said the council would work with police and court bailiffs to "regain possession" of areas taken over by camps in St Ann's Square and Castlefield as soon as possible.<br>John Clegg, from Unison's community branch, said there was a lack of social housing in Manchester.<br>He added: "There is a large amount of money for building private flats, more hotels are going up all the time, but there are no plans to build any social housing. That's wrong. That's absolutely wrong."<br>"In our view an injunction is a form of gating, and sending out a message that poor people are not wanted and should not be coming in to the city centre."</code> | <code>A Manchester City Council application for an injunction to stop the setting up of homeless camps in the city centre has been granted.</code> |
332
  * Loss: [<code>MultipleNegativesSymmetricRankingLoss</code>](https://sbert.net/docs/package_reference/sentence_transformer/losses.html#multiplenegativessymmetricrankingloss) with these parameters:
333
  ```json
334
  {
 
340
  #### compression-pairs
341
 
342
  * Dataset: [compression-pairs](https://huggingface.co/datasets/sentence-transformers/sentence-compression) at [605bc91](https://huggingface.co/datasets/sentence-transformers/sentence-compression/tree/605bc91d95631895ba25b6eda51a3cb596976c90)
343
+ * Size: 180,000 training samples
344
  * Columns: <code>sentence1</code> and <code>sentence2</code>
345
  * Approximate statistics based on the first 1000 samples:
346
  | | sentence1 | sentence2 |
347
  |:--------|:------------------------------------------------------------------------------------|:----------------------------------------------------------------------------------|
348
  | type | string | string |
349
+ | details | <ul><li>min: 10 tokens</li><li>mean: 31.89 tokens</li><li>max: 125 tokens</li></ul> | <ul><li>min: 5 tokens</li><li>mean: 10.21 tokens</li><li>max: 28 tokens</li></ul> |
350
  * Samples:
351
+ | sentence1 | sentence2 |
352
+ |:---------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------|:-------------------------------------------------------|
353
+ | <code>The USHL completed an expansion draft on Monday as 10 players who were on the rosters of USHL teams during the 2009-10 season were selected by the League's two newest entries, the Muskegon Lumberjacks and Dubuque Fighting Saints.</code> | <code>USHL completes expansion draft</code> |
354
+ | <code>Major League Baseball Commissioner Bud Selig will be speaking at St. Norbert College next month.</code> | <code>Bud Selig to speak at St. Norbert College</code> |
355
+ | <code>It's fresh cherry time in Michigan and the best time to enjoy this delicious and nutritious fruit.</code> | <code>It's cherry time</code> |
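As a hedged sketch (not taken from this repository's actual training script), pairs like these are typically wired up with the loss listed below. The base checkpoint name is a placeholder, and `scale=20.0` is the library default rather than a value confirmed by this card:

```python
from sentence_transformers import SentenceTransformer, losses

# Assumption: placeholder base checkpoint; this card's model is DeBERTaV3-small
# based, but the exact starting point is not shown in this section.
model = SentenceTransformer("microsoft/deberta-v3-small")

# MultipleNegativesSymmetricRankingLoss treats every other in-batch pair as a
# negative, in both directions (sentence1 -> sentence2 and sentence2 -> sentence1).
# scale=20.0 is the library default, assumed rather than confirmed here.
loss = losses.MultipleNegativesSymmetricRankingLoss(model, scale=20.0)
```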
356
  * Loss: [<code>MultipleNegativesSymmetricRankingLoss</code>](https://sbert.net/docs/package_reference/sentence_transformer/losses.html#multiplenegativessymmetricrankingloss) with these parameters:
357
  ```json
358
  {
 
366
  #### nli-pairs
367
 
368
  * Dataset: [nli-pairs](https://huggingface.co/datasets/sentence-transformers/all-nli) at [d482672](https://huggingface.co/datasets/sentence-transformers/all-nli/tree/d482672c8e74ce18da116f430137434ba2e52fab)
369
+ * Size: 6,808 evaluation samples
370
  * Columns: <code>sentence1</code> and <code>sentence2</code>
371
  * Approximate statistics based on the first 1000 samples:
372
  | | sentence1 | sentence2 |
 
390
  #### qnli-contrastive
391
 
392
  * Dataset: [qnli-contrastive](https://huggingface.co/datasets/nyu-mll/glue) at [bcdcba7](https://huggingface.co/datasets/nyu-mll/glue/tree/bcdcba79d07bc864c1c254ccfcedcce55bcc9a8c)
393
+ * Size: 5,463 evaluation samples
394
  * Columns: <code>sentence1</code>, <code>sentence2</code>, and <code>label</code>
395
  * Approximate statistics based on the first 1000 samples:
396
  | | sentence1 | sentence2 | label |
 
409
  #### Non-Default Hyperparameters
410
 
411
  - `eval_strategy`: steps
412
+ - `per_device_train_batch_size`: 94
413
+ - `per_device_eval_batch_size`: 32
414
+ - `learning_rate`: 2e-05
415
  - `weight_decay`: 1e-10
416
+ - `num_train_epochs`: 2
417
  - `lr_scheduler_type`: cosine
418
  - `warmup_ratio`: 0.33
419
  - `save_safetensors`: False
 
422
  - `hub_model_id`: bobox/DeBERTaV3-small-GeneralSentenceTransformer-checkpoints-tmp
423
  - `hub_strategy`: checkpoint
424
  - `batch_sampler`: no_duplicates
 
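For illustration only, the non-default values above map onto `SentenceTransformerTrainingArguments` roughly as follows; `output_dir` is a placeholder, and `multi_dataset_batch_sampler` is taken from the expanded list further down:

```python
from sentence_transformers import SentenceTransformerTrainingArguments

# Sketch of the non-default hyperparameters listed above; output_dir is a placeholder.
args = SentenceTransformerTrainingArguments(
    output_dir="output",  # placeholder path, not the one used for this checkpoint
    eval_strategy="steps",
    per_device_train_batch_size=94,
    per_device_eval_batch_size=32,
    learning_rate=2e-5,
    weight_decay=1e-10,
    num_train_epochs=2,
    lr_scheduler_type="cosine",
    warmup_ratio=0.33,
    save_safetensors=False,
    hub_model_id="bobox/DeBERTaV3-small-GeneralSentenceTransformer-checkpoints-tmp",
    hub_strategy="checkpoint",
    batch_sampler="no_duplicates",
    multi_dataset_batch_sampler="proportional",
)
```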
425
 
426
  #### All Hyperparameters
427
  <details><summary>Click to expand</summary>
 
430
  - `do_predict`: False
431
  - `eval_strategy`: steps
432
  - `prediction_loss_only`: True
433
+ - `per_device_train_batch_size`: 94
434
+ - `per_device_eval_batch_size`: 32
435
  - `per_gpu_train_batch_size`: None
436
  - `per_gpu_eval_batch_size`: None
437
  - `gradient_accumulation_steps`: 1
438
  - `eval_accumulation_steps`: None
439
+ - `learning_rate`: 2e-05
440
  - `weight_decay`: 1e-10
441
  - `adam_beta1`: 0.9
442
  - `adam_beta2`: 0.999
443
  - `adam_epsilon`: 1e-08
444
  - `max_grad_norm`: 1.0
445
+ - `num_train_epochs`: 2
446
  - `max_steps`: -1
447
  - `lr_scheduler_type`: cosine
448
  - `lr_scheduler_kwargs`: {}
 
533
  - `optim_target_modules`: None
534
  - `batch_eval_metrics`: False
535
  - `batch_sampler`: no_duplicates
536
+ - `multi_dataset_batch_sampler`: proportional
537
 
538
  </details>
539
 
540
  ### Training Logs
541
  | Epoch | Step | Training Loss | qnli-contrastive loss | nli-pairs loss |
542
  |:------:|:----:|:-------------:|:---------------------:|:--------------:|
543
+ | None | 0 | - | 20.1737 | 4.0959 |
544
+ | 0.1001 | 734 | 4.796 | - | - |
545
+ | 0.2001 | 1468 | 1.3015 | 0.0358 | 0.9115 |
546
+ | 0.3002 | 2202 | 0.89 | - | - |
547
+ | 0.4002 | 2936 | 0.716 | 0.0168 | 0.5944 |
548
+ | 0.5003 | 3670 | 0.6365 | - | - |
549
+ | 0.6003 | 4404 | 0.5883 | 0.0164 | 0.4975 |
550
+ | 0.7004 | 5138 | 0.5192 | - | - |
551
+ | 0.8004 | 5872 | 0.4961 | 0.0288 | 0.4450 |
552
+ | 0.9005 | 6606 | 0.6035 | - | - |
 
 
553
 
554
 
555
  ### Framework Versions
 
590
  }
591
  ```
592
 
593
+ #### CoSENTLoss
594
  ```bibtex
595
+ @online{kexuefm-8847,
596
+ title={CoSENT: A more efficient sentence vector scheme than Sentence-BERT},
597
+ author={Su Jianlin},
598
+ year={2022},
599
+ month={Jan},
600
+ url={https://kexue.fm/archives/8847},
 
601
  }
602
  ```
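For reference, the CoSENT objective cited above ranks cosine similarities of pairs rather than regressing them directly. With scale $\lambda$ (20 by default in the library), summing over all pair indices whose gold similarity satisfies $s(i,j) > s(k,l)$:

$$\mathcal{L}_{\mathrm{CoSENT}} = \log\Bigl(1 + \sum_{s(i,j) > s(k,l)} \exp\bigl(\lambda\,[\cos(u_k, u_l) - \cos(u_i, u_j)]\bigr)\Bigr)$$

(Formulation paraphrased from the cited post, not an excerpt from it.)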
603
 
last-checkpoint/optimizer.pt CHANGED
@@ -1,3 +1,3 @@
1
  version https://git-lfs.github.com/spec/v1
2
- oid sha256:2415efd86ddb31b8ccd116ba8fb00cf3a2bb32e6d5d2ef1d307b59571c494cb5
3
  size 1130520122
 
1
  version https://git-lfs.github.com/spec/v1
2
+ oid sha256:6a749be1ff609ad2bded40d5b2fb1132d3d648b50ef1b7246d14619faa8c58f8
3
  size 1130520122
last-checkpoint/pytorch_model.bin CHANGED
@@ -1,3 +1,3 @@
1
  version https://git-lfs.github.com/spec/v1
2
- oid sha256:aa6de6db3864cbe5490abac662a29ae3c7a4c0dce0a1063f4172a3ba474b3b0e
3
  size 565251810
 
1
  version https://git-lfs.github.com/spec/v1
2
+ oid sha256:5b7a78caf0b7de88dddf8c331c22dc8a0c8a8173693518132a3c2bc00703c2dc
3
  size 565251810
last-checkpoint/rng_state.pth CHANGED
@@ -1,3 +1,3 @@
1
  version https://git-lfs.github.com/spec/v1
2
- oid sha256:ee7def8d6e19abe0010fc23ee8ceef6f2e3224be5f40cd1c4f4ae996d6eab300
3
  size 14180
 
1
  version https://git-lfs.github.com/spec/v1
2
+ oid sha256:cdc73eeb20f0bc899b26bfb3842397bedfaaf0599682feea9dcc50aa3a5f6766
3
  size 14180
last-checkpoint/scheduler.pt CHANGED
@@ -1,3 +1,3 @@
1
  version https://git-lfs.github.com/spec/v1
2
- oid sha256:0d07db50179fc1817acd2aeda9c1e69355a330f49f4a4908ab69b93d19e89e01
3
  size 1064
 
1
  version https://git-lfs.github.com/spec/v1
2
+ oid sha256:c524663492ab13642dabc57fe5921f5cabb07eb2dedd76a5d83a640195afeb24
3
  size 1064
last-checkpoint/trainer_state.json CHANGED
@@ -1,479 +1,146 @@
1
  {
2
  "best_metric": null,
3
  "best_model_checkpoint": null,
4
- "epoch": 4.982758620689655,
5
- "eval_steps": 116,
6
- "global_step": 2320,
7
  "is_hyper_param_search": false,
8
  "is_local_process_zero": true,
9
  "is_world_process_zero": true,
10
  "log_history": [
11
  {
12
- "epoch": 0.25,
13
- "grad_norm": 34.43177795410156,
14
- "learning_rate": 4.386422976501306e-07,
15
- "loss": 4.9013,
16
- "step": 116
17
  },
18
  {
19
- "epoch": 0.25,
20
- "eval_nli-pairs_loss": 4.042984962463379,
21
- "eval_nli-pairs_runtime": 1.346,
22
- "eval_nli-pairs_samples_per_second": 1485.842,
23
- "eval_nli-pairs_steps_per_second": 92.865,
24
- "step": 116
25
  },
26
  {
27
- "epoch": 0.25,
28
- "eval_qnli-contrastive_loss": 5.967944145202637,
29
- "eval_qnli-contrastive_runtime": 1.4504,
30
- "eval_qnli-contrastive_samples_per_second": 1378.915,
31
- "eval_qnli-contrastive_steps_per_second": 86.182,
32
- "step": 116
33
  },
34
  {
35
- "epoch": 0.5,
36
- "grad_norm": 12.268269538879395,
37
- "learning_rate": 8.929503916449087e-07,
38
- "loss": 4.6399,
39
- "step": 232
 
40
  },
41
  {
42
- "epoch": 0.5,
43
- "eval_nli-pairs_loss": 3.8479082584381104,
44
- "eval_nli-pairs_runtime": 1.2981,
45
- "eval_nli-pairs_samples_per_second": 1540.739,
46
- "eval_nli-pairs_steps_per_second": 96.296,
47
- "step": 232
48
  },
49
  {
50
- "epoch": 0.5,
51
- "eval_qnli-contrastive_loss": 5.532758712768555,
52
- "eval_qnli-contrastive_runtime": 1.4399,
53
- "eval_qnli-contrastive_samples_per_second": 1388.943,
54
- "eval_qnli-contrastive_steps_per_second": 86.809,
55
- "step": 232
56
  },
57
  {
58
- "epoch": 0.75,
59
- "grad_norm": 41.44931411743164,
60
- "learning_rate": 1.343342036553525e-06,
61
- "loss": 4.4683,
62
- "step": 348
 
63
  },
64
  {
65
- "epoch": 0.75,
66
- "eval_nli-pairs_loss": 3.69368314743042,
67
- "eval_nli-pairs_runtime": 1.304,
68
- "eval_nli-pairs_samples_per_second": 1533.712,
69
- "eval_nli-pairs_steps_per_second": 95.857,
70
- "step": 348
71
  },
72
  {
73
- "epoch": 0.75,
74
- "eval_qnli-contrastive_loss": 4.299612998962402,
75
- "eval_qnli-contrastive_runtime": 1.4461,
76
- "eval_qnli-contrastive_samples_per_second": 1383.027,
77
- "eval_qnli-contrastive_steps_per_second": 86.439,
78
- "step": 348
79
  },
80
  {
81
- "epoch": 1.0,
82
- "grad_norm": 16.208133697509766,
83
- "learning_rate": 1.7976501305483032e-06,
84
- "loss": 3.8129,
85
- "step": 464
86
  },
87
  {
88
- "epoch": 1.0,
89
- "eval_nli-pairs_loss": 3.477344036102295,
90
- "eval_nli-pairs_runtime": 1.2934,
91
- "eval_nli-pairs_samples_per_second": 1546.283,
92
- "eval_nli-pairs_steps_per_second": 96.643,
93
- "step": 464
94
  },
95
  {
96
- "epoch": 1.0,
97
- "eval_qnli-contrastive_loss": 2.806222915649414,
98
- "eval_qnli-contrastive_runtime": 1.4307,
99
- "eval_qnli-contrastive_samples_per_second": 1397.948,
100
- "eval_qnli-contrastive_steps_per_second": 87.372,
101
- "step": 464
102
  },
103
  {
104
- "epoch": 1.2456896551724137,
105
- "grad_norm": 98.47541046142578,
106
- "learning_rate": 2.251958224543081e-06,
107
- "loss": 3.3971,
108
- "step": 580
109
  },
110
  {
111
- "epoch": 1.2456896551724137,
112
- "eval_nli-pairs_loss": 3.126293182373047,
113
- "eval_nli-pairs_runtime": 1.3419,
114
- "eval_nli-pairs_samples_per_second": 1490.389,
115
- "eval_nli-pairs_steps_per_second": 93.149,
116
- "step": 580
117
  },
118
  {
119
- "epoch": 1.2456896551724137,
120
- "eval_qnli-contrastive_loss": 1.8329846858978271,
121
- "eval_qnli-contrastive_runtime": 1.5173,
122
- "eval_qnli-contrastive_samples_per_second": 1318.109,
123
- "eval_qnli-contrastive_steps_per_second": 82.382,
124
- "step": 580
125
  },
126
  {
127
- "epoch": 1.4956896551724137,
128
- "grad_norm": 16.574974060058594,
129
- "learning_rate": 2.706266318537859e-06,
130
- "loss": 2.7459,
131
- "step": 696
 
132
  },
133
  {
134
- "epoch": 1.4956896551724137,
135
- "eval_nli-pairs_loss": 2.72936749458313,
136
- "eval_nli-pairs_runtime": 1.3359,
137
- "eval_nli-pairs_samples_per_second": 1497.081,
138
- "eval_nli-pairs_steps_per_second": 93.568,
139
- "step": 696
140
- },
141
- {
142
- "epoch": 1.4956896551724137,
143
- "eval_qnli-contrastive_loss": 1.2779531478881836,
144
- "eval_qnli-contrastive_runtime": 1.4644,
145
- "eval_qnli-contrastive_samples_per_second": 1365.702,
146
- "eval_qnli-contrastive_steps_per_second": 85.356,
147
- "step": 696
148
- },
149
- {
150
- "epoch": 1.7456896551724137,
151
- "grad_norm": 201.21456909179688,
152
- "learning_rate": 2.9950983500630964e-06,
153
- "loss": 2.8721,
154
- "step": 812
155
- },
156
- {
157
- "epoch": 1.7456896551724137,
158
- "eval_nli-pairs_loss": 2.2870194911956787,
159
- "eval_nli-pairs_runtime": 1.3503,
160
- "eval_nli-pairs_samples_per_second": 1481.188,
161
- "eval_nli-pairs_steps_per_second": 92.574,
162
- "step": 812
163
- },
164
- {
165
- "epoch": 1.7456896551724137,
166
- "eval_qnli-contrastive_loss": 0.9296175837516785,
167
- "eval_qnli-contrastive_runtime": 1.4485,
168
- "eval_qnli-contrastive_samples_per_second": 1380.738,
169
- "eval_qnli-contrastive_steps_per_second": 86.296,
170
- "step": 812
171
- },
172
- {
173
- "epoch": 1.9956896551724137,
174
- "grad_norm": 12.68950366973877,
175
- "learning_rate": 2.9260214825373185e-06,
176
- "loss": 2.5066,
177
- "step": 928
178
- },
179
- {
180
- "epoch": 1.9956896551724137,
181
- "eval_nli-pairs_loss": 2.0547828674316406,
182
- "eval_nli-pairs_runtime": 1.2929,
183
- "eval_nli-pairs_samples_per_second": 1546.937,
184
- "eval_nli-pairs_steps_per_second": 96.684,
185
- "step": 928
186
- },
187
- {
188
- "epoch": 1.9956896551724137,
189
- "eval_qnli-contrastive_loss": 0.6387521028518677,
190
- "eval_qnli-contrastive_runtime": 1.4598,
191
- "eval_qnli-contrastive_samples_per_second": 1370.032,
192
- "eval_qnli-contrastive_steps_per_second": 85.627,
193
- "step": 928
194
- },
195
- {
196
- "epoch": 2.2413793103448274,
197
- "grad_norm": 12.477553367614746,
198
- "learning_rate": 2.7788810181030676e-06,
199
- "loss": 2.3223,
200
- "step": 1044
201
- },
202
- {
203
- "epoch": 2.2413793103448274,
204
- "eval_nli-pairs_loss": 1.8876054286956787,
205
- "eval_nli-pairs_runtime": 1.4105,
206
- "eval_nli-pairs_samples_per_second": 1417.897,
207
- "eval_nli-pairs_steps_per_second": 88.619,
208
- "step": 1044
209
- },
210
- {
211
- "epoch": 2.2413793103448274,
212
- "eval_qnli-contrastive_loss": 0.5312397480010986,
213
- "eval_qnli-contrastive_runtime": 1.4798,
214
- "eval_qnli-contrastive_samples_per_second": 1351.505,
215
- "eval_qnli-contrastive_steps_per_second": 84.469,
216
- "step": 1044
217
- },
218
- {
219
- "epoch": 2.4913793103448274,
220
- "grad_norm": 7.06378173828125,
221
- "learning_rate": 2.5617317540023054e-06,
222
- "loss": 2.1771,
223
- "step": 1160
224
- },
225
- {
226
- "epoch": 2.4913793103448274,
227
- "eval_nli-pairs_loss": 1.7922124862670898,
228
- "eval_nli-pairs_runtime": 1.392,
229
- "eval_nli-pairs_samples_per_second": 1436.768,
230
- "eval_nli-pairs_steps_per_second": 89.798,
231
- "step": 1160
232
- },
233
- {
234
- "epoch": 2.4913793103448274,
235
- "eval_qnli-contrastive_loss": 0.4299691915512085,
236
- "eval_qnli-contrastive_runtime": 1.4683,
237
- "eval_qnli-contrastive_samples_per_second": 1362.111,
238
- "eval_qnli-contrastive_steps_per_second": 85.132,
239
- "step": 1160
240
- },
241
- {
242
- "epoch": 2.7413793103448274,
243
- "grad_norm": 11.377643585205078,
244
- "learning_rate": 2.286460925335848e-06,
245
- "loss": 2.2549,
246
- "step": 1276
247
- },
248
- {
249
- "epoch": 2.7413793103448274,
250
- "eval_nli-pairs_loss": 1.647322177886963,
251
- "eval_nli-pairs_runtime": 1.3347,
252
- "eval_nli-pairs_samples_per_second": 1498.487,
253
- "eval_nli-pairs_steps_per_second": 93.655,
254
- "step": 1276
255
- },
256
- {
257
- "epoch": 2.7413793103448274,
258
- "eval_qnli-contrastive_loss": 0.36095327138900757,
259
- "eval_qnli-contrastive_runtime": 1.5309,
260
- "eval_qnli-contrastive_samples_per_second": 1306.387,
261
- "eval_qnli-contrastive_steps_per_second": 81.649,
262
- "step": 1276
263
- },
264
- {
265
- "epoch": 2.9913793103448274,
266
- "grad_norm": 8.12272834777832,
267
- "learning_rate": 1.968137471297685e-06,
268
- "loss": 2.2168,
269
- "step": 1392
270
- },
271
- {
272
- "epoch": 2.9913793103448274,
273
- "eval_nli-pairs_loss": 1.5589631795883179,
274
- "eval_nli-pairs_runtime": 1.2874,
275
- "eval_nli-pairs_samples_per_second": 1553.463,
276
- "eval_nli-pairs_steps_per_second": 97.091,
277
- "step": 1392
278
- },
279
- {
280
- "epoch": 2.9913793103448274,
281
- "eval_qnli-contrastive_loss": 0.2929060459136963,
282
- "eval_qnli-contrastive_runtime": 1.4489,
283
- "eval_qnli-contrastive_samples_per_second": 1380.312,
284
- "eval_qnli-contrastive_steps_per_second": 86.269,
285
- "step": 1392
286
- },
287
- {
288
- "epoch": 3.2370689655172415,
289
- "grad_norm": 14.837372779846191,
290
- "learning_rate": 1.6241871278299807e-06,
291
- "loss": 2.0581,
292
- "step": 1508
293
- },
294
- {
295
- "epoch": 3.2370689655172415,
296
- "eval_nli-pairs_loss": 1.5176913738250732,
297
- "eval_nli-pairs_runtime": 1.3641,
298
- "eval_nli-pairs_samples_per_second": 1466.194,
299
- "eval_nli-pairs_steps_per_second": 91.637,
300
- "step": 1508
301
- },
302
- {
303
- "epoch": 3.2370689655172415,
304
- "eval_qnli-contrastive_loss": 0.2678474187850952,
305
- "eval_qnli-contrastive_runtime": 1.5105,
306
- "eval_qnli-contrastive_samples_per_second": 1324.09,
307
- "eval_qnli-contrastive_steps_per_second": 82.756,
308
- "step": 1508
309
- },
310
- {
311
- "epoch": 3.4870689655172415,
312
- "grad_norm": 145.98458862304688,
313
- "learning_rate": 1.2734385039668851e-06,
314
- "loss": 1.9654,
315
- "step": 1624
316
- },
317
- {
318
- "epoch": 3.4870689655172415,
319
- "eval_nli-pairs_loss": 1.5036982297897339,
320
- "eval_nli-pairs_runtime": 1.3348,
321
- "eval_nli-pairs_samples_per_second": 1498.309,
322
- "eval_nli-pairs_steps_per_second": 93.644,
323
- "step": 1624
324
- },
325
- {
326
- "epoch": 3.4870689655172415,
327
- "eval_qnli-contrastive_loss": 0.23919104039669037,
328
- "eval_qnli-contrastive_runtime": 1.5129,
329
- "eval_qnli-contrastive_samples_per_second": 1321.928,
330
- "eval_qnli-contrastive_steps_per_second": 82.621,
331
- "step": 1624
332
- },
333
- {
334
- "epoch": 3.737068965517241,
335
- "grad_norm": 10.36633586883545,
336
- "learning_rate": 9.350923617759733e-07,
337
- "loss": 2.1107,
338
- "step": 1740
339
- },
340
- {
341
- "epoch": 3.737068965517241,
342
- "eval_nli-pairs_loss": 1.4556528329849243,
343
- "eval_nli-pairs_runtime": 1.4177,
344
- "eval_nli-pairs_samples_per_second": 1410.69,
345
- "eval_nli-pairs_steps_per_second": 88.168,
346
- "step": 1740
347
- },
348
- {
349
- "epoch": 3.737068965517241,
350
- "eval_qnli-contrastive_loss": 0.22335131466388702,
351
- "eval_qnli-contrastive_runtime": 1.5405,
352
- "eval_qnli-contrastive_samples_per_second": 1298.243,
353
- "eval_qnli-contrastive_steps_per_second": 81.14,
354
- "step": 1740
355
- },
356
- {
357
- "epoch": 3.987068965517241,
358
- "grad_norm": 178.8499755859375,
359
- "learning_rate": 6.276705238124942e-07,
360
- "loss": 2.0709,
361
- "step": 1856
362
- },
363
- {
364
- "epoch": 3.987068965517241,
365
- "eval_nli-pairs_loss": 1.4286649227142334,
366
- "eval_nli-pairs_runtime": 1.2929,
367
- "eval_nli-pairs_samples_per_second": 1546.95,
368
- "eval_nli-pairs_steps_per_second": 96.684,
369
- "step": 1856
370
- },
371
- {
372
- "epoch": 3.987068965517241,
373
- "eval_qnli-contrastive_loss": 0.2093583047389984,
374
- "eval_qnli-contrastive_runtime": 1.4454,
375
- "eval_qnli-contrastive_samples_per_second": 1383.695,
376
- "eval_qnli-contrastive_steps_per_second": 86.481,
377
- "step": 1856
378
- },
379
- {
380
- "epoch": 4.232758620689655,
381
- "grad_norm": 4.497424602508545,
382
- "learning_rate": 3.680019472369961e-07,
383
- "loss": 1.9489,
384
- "step": 1972
385
- },
386
- {
387
- "epoch": 4.232758620689655,
388
- "eval_nli-pairs_loss": 1.4166995286941528,
389
- "eval_nli-pairs_runtime": 1.3578,
390
- "eval_nli-pairs_samples_per_second": 1472.956,
391
- "eval_nli-pairs_steps_per_second": 92.06,
392
- "step": 1972
393
- },
394
- {
395
- "epoch": 4.232758620689655,
396
- "eval_qnli-contrastive_loss": 0.2071654498577118,
397
- "eval_qnli-contrastive_runtime": 1.489,
398
- "eval_qnli-contrastive_samples_per_second": 1343.182,
399
- "eval_qnli-contrastive_steps_per_second": 83.949,
400
- "step": 1972
401
- },
402
- {
403
- "epoch": 4.482758620689655,
404
- "grad_norm": 8.940858840942383,
405
- "learning_rate": 1.7030146916085187e-07,
406
- "loss": 1.8238,
407
- "step": 2088
408
- },
409
- {
410
- "epoch": 4.482758620689655,
411
- "eval_nli-pairs_loss": 1.4154555797576904,
412
- "eval_nli-pairs_runtime": 1.4109,
413
- "eval_nli-pairs_samples_per_second": 1417.564,
414
- "eval_nli-pairs_steps_per_second": 88.598,
415
- "step": 2088
416
- },
417
- {
418
- "epoch": 4.482758620689655,
419
- "eval_qnli-contrastive_loss": 0.20185217261314392,
420
- "eval_qnli-contrastive_runtime": 1.4817,
421
- "eval_qnli-contrastive_samples_per_second": 1349.799,
422
- "eval_qnli-contrastive_steps_per_second": 84.362,
423
- "step": 2088
424
- },
425
- {
426
- "epoch": 4.732758620689655,
427
- "grad_norm": 4.952300548553467,
428
- "learning_rate": 4.5391654754460885e-08,
429
- "loss": 2.1587,
430
- "step": 2204
431
- },
432
- {
433
- "epoch": 4.732758620689655,
434
- "eval_nli-pairs_loss": 1.4136021137237549,
435
- "eval_nli-pairs_runtime": 1.3576,
436
- "eval_nli-pairs_samples_per_second": 1473.214,
437
- "eval_nli-pairs_steps_per_second": 92.076,
438
- "step": 2204
439
- },
440
- {
441
- "epoch": 4.732758620689655,
442
- "eval_qnli-contrastive_loss": 0.20051518082618713,
443
- "eval_qnli-contrastive_runtime": 1.585,
444
- "eval_qnli-contrastive_samples_per_second": 1261.81,
445
- "eval_qnli-contrastive_steps_per_second": 78.863,
446
- "step": 2204
447
- },
448
- {
449
- "epoch": 4.982758620689655,
450
- "grad_norm": 10.16062068939209,
451
- "learning_rate": 1.1034588846758897e-10,
452
- "loss": 1.929,
453
- "step": 2320
454
- },
455
- {
456
- "epoch": 4.982758620689655,
457
- "eval_nli-pairs_loss": 1.4131741523742676,
458
- "eval_nli-pairs_runtime": 1.2998,
459
- "eval_nli-pairs_samples_per_second": 1538.653,
460
- "eval_nli-pairs_steps_per_second": 96.166,
461
- "step": 2320
462
- },
463
- {
464
- "epoch": 4.982758620689655,
465
- "eval_qnli-contrastive_loss": 0.2004699856042862,
466
- "eval_qnli-contrastive_runtime": 1.449,
467
- "eval_qnli-contrastive_samples_per_second": 1380.303,
468
- "eval_qnli-contrastive_steps_per_second": 86.269,
469
- "step": 2320
470
  }
471
  ],
472
- "logging_steps": 116,
473
- "max_steps": 2320,
474
  "num_input_tokens_seen": 0,
475
- "num_train_epochs": 5,
476
- "save_steps": 1160,
477
  "stateful_callbacks": {
478
  "TrainerControl": {
479
  "args": {
@@ -481,13 +148,13 @@
481
  "should_evaluate": false,
482
  "should_log": false,
483
  "should_save": true,
484
- "should_training_stop": true
485
  },
486
  "attributes": {}
487
  }
488
  },
489
  "total_flos": 0.0,
490
- "train_batch_size": 64,
491
  "trial_name": null,
492
  "trial_params": null
493
  }
 
1
  {
2
  "best_metric": null,
3
  "best_model_checkpoint": null,
4
+ "epoch": 1.0,
5
+ "eval_steps": 1468,
6
+ "global_step": 7336,
7
  "is_hyper_param_search": false,
8
  "is_local_process_zero": true,
9
  "is_world_process_zero": true,
10
  "log_history": [
11
  {
12
+ "epoch": 0.10005452562704471,
13
+ "grad_norm": 11.328764915466309,
14
+ "learning_rate": 3.0111524163568777e-06,
15
+ "loss": 4.796,
16
+ "step": 734
17
  },
18
  {
19
+ "epoch": 0.20010905125408943,
20
+ "grad_norm": 8.898218154907227,
21
+ "learning_rate": 6.042957455596862e-06,
22
+ "loss": 1.3015,
23
+ "step": 1468
 
24
  },
25
  {
26
+ "epoch": 0.20010905125408943,
27
+ "eval_nli-pairs_loss": 0.9115270376205444,
28
+ "eval_nli-pairs_runtime": 3.7365,
29
+ "eval_nli-pairs_samples_per_second": 1822.03,
30
+ "eval_nli-pairs_steps_per_second": 57.005,
31
+ "step": 1468
32
  },
33
  {
34
+ "epoch": 0.20010905125408943,
35
+ "eval_qnli-contrastive_loss": 0.03581170365214348,
36
+ "eval_qnli-contrastive_runtime": 3.4652,
37
+ "eval_qnli-contrastive_samples_per_second": 1576.52,
38
+ "eval_qnli-contrastive_steps_per_second": 49.347,
39
+ "step": 1468
40
  },
41
  {
42
+ "epoch": 0.30016357688113415,
43
+ "grad_norm": 5.427567005157471,
44
+ "learning_rate": 9.074762494836845e-06,
45
+ "loss": 0.89,
46
+ "step": 2202
 
47
  },
48
  {
49
+ "epoch": 0.40021810250817885,
50
+ "grad_norm": 3.5350825786590576,
51
+ "learning_rate": 1.210656753407683e-05,
52
+ "loss": 0.716,
53
+ "step": 2936
 
54
  },
55
  {
56
+ "epoch": 0.40021810250817885,
57
+ "eval_nli-pairs_loss": 0.5944256782531738,
58
+ "eval_nli-pairs_runtime": 3.5093,
59
+ "eval_nli-pairs_samples_per_second": 1940.006,
60
+ "eval_nli-pairs_steps_per_second": 60.696,
61
+ "step": 2936
62
  },
63
  {
64
+ "epoch": 0.40021810250817885,
65
+ "eval_qnli-contrastive_loss": 0.016810204833745956,
66
+ "eval_qnli-contrastive_runtime": 3.3523,
67
+ "eval_qnli-contrastive_samples_per_second": 1629.638,
68
+ "eval_qnli-contrastive_steps_per_second": 51.01,
69
+ "step": 2936
70
  },
71
  {
72
+ "epoch": 0.5002726281352236,
73
+ "grad_norm": 9.52629280090332,
74
+ "learning_rate": 1.5134242048740192e-05,
75
+ "loss": 0.6365,
76
+ "step": 3670
 
77
  },
78
  {
79
+ "epoch": 0.6003271537622683,
80
+ "grad_norm": 11.004107475280762,
81
+ "learning_rate": 1.8166047087980174e-05,
82
+ "loss": 0.5883,
83
+ "step": 4404
84
  },
85
  {
86
+ "epoch": 0.6003271537622683,
87
+ "eval_nli-pairs_loss": 0.49746155738830566,
88
+ "eval_nli-pairs_runtime": 3.5691,
89
+ "eval_nli-pairs_samples_per_second": 1907.459,
90
+ "eval_nli-pairs_steps_per_second": 59.678,
91
+ "step": 4404
92
  },
93
  {
94
+ "epoch": 0.6003271537622683,
95
+ "eval_qnli-contrastive_loss": 0.016411835327744484,
96
+ "eval_qnli-contrastive_runtime": 3.3328,
97
+ "eval_qnli-contrastive_samples_per_second": 1639.167,
98
+ "eval_qnli-contrastive_steps_per_second": 51.308,
99
+ "step": 4404
100
  },
101
  {
102
+ "epoch": 0.700381679389313,
103
+ "grad_norm": 9.219084739685059,
104
+ "learning_rate": 1.995708117651556e-05,
105
+ "loss": 0.5192,
106
+ "step": 5138
107
  },
108
  {
109
+ "epoch": 0.8004362050163577,
110
+ "grad_norm": 7.066645622253418,
111
+ "learning_rate": 1.946925849011595e-05,
112
+ "loss": 0.4961,
113
+ "step": 5872
 
114
  },
115
  {
116
+ "epoch": 0.8004362050163577,
117
+ "eval_nli-pairs_loss": 0.44500303268432617,
118
+ "eval_nli-pairs_runtime": 3.6078,
119
+ "eval_nli-pairs_samples_per_second": 1887.014,
120
+ "eval_nli-pairs_steps_per_second": 59.038,
121
+ "step": 5872
122
  },
123
  {
124
+ "epoch": 0.8004362050163577,
125
+ "eval_qnli-contrastive_loss": 0.028794871643185616,
126
+ "eval_qnli-contrastive_runtime": 3.335,
127
+ "eval_qnli-contrastive_samples_per_second": 1638.074,
128
+ "eval_qnli-contrastive_steps_per_second": 51.274,
129
+ "step": 5872
130
  },
131
  {
132
+ "epoch": 0.9004907306434023,
133
+ "grad_norm": 0.0,
134
+ "learning_rate": 1.8462745233342613e-05,
135
+ "loss": 0.6035,
136
+ "step": 6606
 
137
  }
138
  ],
139
+ "logging_steps": 734,
140
+ "max_steps": 14672,
141
  "num_input_tokens_seen": 0,
142
+ "num_train_epochs": 2,
143
+ "save_steps": 7336,
144
  "stateful_callbacks": {
145
  "TrainerControl": {
146
  "args": {
 
148
  "should_evaluate": false,
149
  "should_log": false,
150
  "should_save": true,
151
+ "should_training_stop": false
152
  },
153
  "attributes": {}
154
  }
155
  },
156
  "total_flos": 0.0,
157
+ "train_batch_size": 94,
158
  "trial_name": null,
159
  "trial_params": null
160
  }
last-checkpoint/training_args.bin CHANGED
@@ -1,3 +1,3 @@
1
  version https://git-lfs.github.com/spec/v1
2
- oid sha256:e3927bcd6a0300af031763633963eca0939bd43c5b1e09d98c207a1618aa7358
3
  size 5624
 
1
  version https://git-lfs.github.com/spec/v1
2
+ oid sha256:3b7d09e178b7126106b67daae47202be477d3e950354366f995c79aad0ae7f8f
3
  size 5624