---
base_model: microsoft/deberta-v3-small
datasets:
  - tals/vitaminc
language:
  - en
library_name: sentence-transformers
metrics:
  - pearson_cosine
  - spearman_cosine
  - pearson_manhattan
  - spearman_manhattan
  - pearson_euclidean
  - spearman_euclidean
  - pearson_dot
  - spearman_dot
  - pearson_max
  - spearman_max
  - cosine_accuracy
  - cosine_accuracy_threshold
  - cosine_f1
  - cosine_f1_threshold
  - cosine_precision
  - cosine_recall
  - cosine_ap
  - dot_accuracy
  - dot_accuracy_threshold
  - dot_f1
  - dot_f1_threshold
  - dot_precision
  - dot_recall
  - dot_ap
  - manhattan_accuracy
  - manhattan_accuracy_threshold
  - manhattan_f1
  - manhattan_f1_threshold
  - manhattan_precision
  - manhattan_recall
  - manhattan_ap
  - euclidean_accuracy
  - euclidean_accuracy_threshold
  - euclidean_f1
  - euclidean_f1_threshold
  - euclidean_precision
  - euclidean_recall
  - euclidean_ap
  - max_accuracy
  - max_accuracy_threshold
  - max_f1
  - max_f1_threshold
  - max_precision
  - max_recall
  - max_ap
pipeline_tag: sentence-similarity
tags:
  - sentence-transformers
  - sentence-similarity
  - feature-extraction
  - generated_from_trainer
  - dataset_size:225247
  - loss:CachedGISTEmbedLoss
widget:
  - source_sentence: what is exfo toolbox
    sentences:
      - >-
        Eye dilation from eye drops used for examination of the eye usually
        lasts from 4 to 24 hours, depending upon the strength of the drop and
        upon the individual patient.
      - >-
        Garden Grove is a city in northern Orange County in the U.S. state of
        California, 34 miles (55 km) south of Los Angeles. The population was
        170,883 at the 2010 United States Census. State Route 22, also known as
        the Garden Grove Freeway, passes through the city in an east-west
        direction.
      - >-
        EXFO ToolBox Office is a product that offers you a collection of viewers
        and analyzers. It enables you to manage and analyze results acquired
        from fiber optic test modules and instruments.
  - source_sentence: >-
      More than 273 people have died from the 2019-20 coronavirus outside
      mainland China .
    sentences:
      - >-
        More than 3,700 people have died : around 3,100 in mainland China and
        around 550 in all other countries combined .
      - >-
        More than 3,200 people have died : almost 3,000 in mainland China and
        around 275 in other countries .
      - more than 4,900 deaths have been attributed to COVID-19 .
  - source_sentence: >-
      Ultrasound, a diagnostic technology, uses high-frequency vibrations
      transmitted into any tissue in contact with the transducer.
    sentences:
      - >-
        What diagnostic technology uses high-frequency vibrations transmitted
        into any tissue in contact with the transducer?
      - The abnormal cells cannot carry oxygen properly and can get stuck where?
      - What type of organism is a bacteria?
  - source_sentence: >-
      When you add moles of gas to a baloon by blowing it up, the volume
      increases.
    sentences:
      - What shape is the lens of the eye?
      - >-
        What happens to the volume of a balloon when you add moles of gas to it
        by blowing up?
      - >-
        Most turtle bodies are covered by a special bony or cartilaginous shell
        developed from their what?
  - source_sentence: >-
      What was the name of eleven rulers of the 19th and 20th Egyptian
      dynasties?
    sentences:
      - >-
        Airlines Yugoslavia 1968 - 1968 Renamed ^ Comments : Aviogenex was
        formed on 21May1968 as Genex Airlines. Restarted under current name on
        30Apr1969 & liquidated in Feb2015 ^ Genealogy : Genex Airlines
        >Aviogenex 1968 - 1986 Renamed ^ Comments : Adria Airways was formed on
        14Mar1961 & operations started on 30Jun1961 as Adria Airways, renamed to
        Inex in 1968 and back to Adria again in 1986. National airline of
        Slovenia ^ Genealogy : Adria Airways >Inex Adria Airways >Adria Airways
        JAT (Jugoslovenski Aerotransport) 1947 - 2003 Renamed ^ Comments : Air
        Serbia was founded as Aeroput on 17Jun1927, renamed to JAT on 01Apr1947.
        Started ops on 15Apr1947, Renamed again on 08Aug2003 to JAT Airways &
        reformed as Air Serbia on 26Oct2013 ^ Genealogy : Aeroput >JAT
        (Jugoslovenski Aerotransport) >JAT Airways >Air Serbia Jugoslovenski
        Aerotransport
      - >-
        List of Rulers of Ancient Egypt and Nubia | Lists of Rulers | Heilbrunn
        Timeline of Art History | The Metropolitan Museum of Art The
        Metropolitan Museum of Art List of Rulers of Ancient Egypt and Nubia See
        works of art 30.8.234 52.127.4 Our knowledge of the succession of
        Egyptian kings is based on kinglists kept by the ancient Egyptians
        themselves. The most famous are the Palermo Stone, which covers the
        period from the earliest dynasties to the middle of Dynasty 5; the
        Abydos Kinglist, which Seti I had carved on his temple at Abydos; and
        the Turin Canon, a papyrus that covers the period from the earliest
        dynasties to the reign of Ramesses II. All are incomplete or
        fragmentary. We also rely on the History of Egypt written by Manetho in
        the third century B.C. A priest in the temple at Heliopolis, Manetho had
        access to many original sources and it was he who divided the kings into
        the thirty dynasties we use today. It is to this structure of dynasties
        and listed kings that we now attempt to link an absolute chronology of
        dates in terms of our own calendrical system. The process is made
        difficult by the fragmentary condition of the kinglists and by
        differences in the calendrical years used at various times. Some
        astronomical observations from the ancient Egyptians have survived,
        allowing us to calculate absolute dates within a margin of error.
        Synchronisms with the other civilizations of the ancient world are also
        of limited use.
      - >-
        What is the "Jack Sprat" nursery rhyme? | Reference.com What is the
        "Jack Sprat" nursery rhyme? A: Quick Answer "Jack Sprat" is a
        traditional English nursery rhyme whose main verse says, "Jack Sprat
        could eat no fat. His wife could eat no lean. And so between them both,
        you see, they licked the platter clean." Though it was likely sung by
        children long before, "Jack Sprat" was first published around 1765 in
        the compilation "Mother Goose's Melody." Full Answer According to
        Rhymes.org, a U.K. website devoted to nursery rhyme lyrics and origins,
        the "Jack Sprat" nursery rhyme has its origins in British history. In
        one interpretation, Jack Sprat was King Charles I, who ruled England in
        the early part of the 17th century, and his wife was Queen Henrietta
        Maria. Parliament refused to finance the king's war with Spain, which
        made him lean. However, the queen fattened the coffers by levying an
        illegal war tax. In an alternative version, the "Jack Sprat" nursery
        rhyme is linked to King Richard and his brother John of the Robin Hood
        legend. Jack Sprat was King John, the usurper who tried to take over the
        crown when King Richard went off to fight in the Crusades in the 12th
        century. When King Richard was captured, John had to raise a ransom to
        rescue him, leaving the country lean. The wife was Joan, daughter of the
        Earl of Gloucester, the greedy wife of King John. However, after King
        Richard died and John became king, he had his marriage with Joan
        annulled.
model-index:
  - name: SentenceTransformer based on microsoft/deberta-v3-small
    results:
      - task:
          type: semantic-similarity
          name: Semantic Similarity
        dataset:
          name: sts test
          type: sts-test
        metrics:
          - type: pearson_cosine
            value: 0.7673854808079448
            name: Pearson Cosine
          - type: spearman_cosine
            value: 0.7776198286738142
            name: Spearman Cosine
          - type: pearson_manhattan
            value: 0.782368447545155
            name: Pearson Manhattan
          - type: spearman_manhattan
            value: 0.7720687033298573
            name: Spearman Manhattan
          - type: pearson_euclidean
            value: 0.7882638792170585
            name: Pearson Euclidean
          - type: spearman_euclidean
            value: 0.7775073687564514
            name: Spearman Euclidean
          - type: pearson_dot
            value: 0.7669147371310585
            name: Pearson Dot
          - type: spearman_dot
            value: 0.7762894632049069
            name: Spearman Dot
          - type: pearson_max
            value: 0.7882638792170585
            name: Pearson Max
          - type: spearman_max
            value: 0.7776198286738142
            name: Spearman Max
      - task:
          type: binary-classification
          name: Binary Classification
        dataset:
          name: allNLI dev
          type: allNLI-dev
        metrics:
          - type: cosine_accuracy
            value: 0.708984375
            name: Cosine Accuracy
          - type: cosine_accuracy_threshold
            value: 0.8714957237243652
            name: Cosine Accuracy Threshold
          - type: cosine_f1
            value: 0.5913043478260869
            name: Cosine F1
          - type: cosine_f1_threshold
            value: 0.7768557071685791
            name: Cosine F1 Threshold
          - type: cosine_precision
            value: 0.4738675958188153
            name: Cosine Precision
          - type: cosine_recall
            value: 0.7861271676300579
            name: Cosine Recall
          - type: cosine_ap
            value: 0.5644305887001508
            name: Cosine Ap
          - type: dot_accuracy
            value: 0.7109375
            name: Dot Accuracy
          - type: dot_accuracy_threshold
            value: 674.426025390625
            name: Dot Accuracy Threshold
          - type: dot_f1
            value: 0.5913043478260869
            name: Dot F1
          - type: dot_f1_threshold
            value: 603.435302734375
            name: Dot F1 Threshold
          - type: dot_precision
            value: 0.4738675958188153
            name: Dot Precision
          - type: dot_recall
            value: 0.7861271676300579
            name: Dot Recall
          - type: dot_ap
            value: 0.5664868031504724
            name: Dot Ap
          - type: manhattan_accuracy
            value: 0.7109375
            name: Manhattan Accuracy
          - type: manhattan_accuracy_threshold
            value: 294.4728088378906
            name: Manhattan Accuracy Threshold
          - type: manhattan_f1
            value: 0.5935483870967742
            name: Manhattan F1
          - type: manhattan_f1_threshold
            value: 401.1482849121094
            name: Manhattan F1 Threshold
          - type: manhattan_precision
            value: 0.4726027397260274
            name: Manhattan Precision
          - type: manhattan_recall
            value: 0.7976878612716763
            name: Manhattan Recall
          - type: manhattan_ap
            value: 0.5642688421649988
            name: Manhattan Ap
          - type: euclidean_accuracy
            value: 0.7109375
            name: Euclidean Accuracy
          - type: euclidean_accuracy_threshold
            value: 14.565500259399414
            name: Euclidean Accuracy Threshold
          - type: euclidean_f1
            value: 0.5913043478260869
            name: Euclidean F1
          - type: euclidean_f1_threshold
            value: 18.60409164428711
            name: Euclidean F1 Threshold
          - type: euclidean_precision
            value: 0.4738675958188153
            name: Euclidean Precision
          - type: euclidean_recall
            value: 0.7861271676300579
            name: Euclidean Recall
          - type: euclidean_ap
            value: 0.5645557227019772
            name: Euclidean Ap
          - type: max_accuracy
            value: 0.7109375
            name: Max Accuracy
          - type: max_accuracy_threshold
            value: 674.426025390625
            name: Max Accuracy Threshold
          - type: max_f1
            value: 0.5935483870967742
            name: Max F1
          - type: max_f1_threshold
            value: 603.435302734375
            name: Max F1 Threshold
          - type: max_precision
            value: 0.4738675958188153
            name: Max Precision
          - type: max_recall
            value: 0.7976878612716763
            name: Max Recall
          - type: max_ap
            value: 0.5664868031504724
            name: Max Ap
      - task:
          type: binary-classification
          name: Binary Classification
        dataset:
          name: Qnli dev
          type: Qnli-dev
        metrics:
          - type: cosine_accuracy
            value: 0.6796875
            name: Cosine Accuracy
          - type: cosine_accuracy_threshold
            value: 0.7726649045944214
            name: Cosine Accuracy Threshold
          - type: cosine_f1
            value: 0.6925675675675677
            name: Cosine F1
          - type: cosine_f1_threshold
            value: 0.7317887544631958
            name: Cosine F1 Threshold
          - type: cosine_precision
            value: 0.5758426966292135
            name: Cosine Precision
          - type: cosine_recall
            value: 0.8686440677966102
            name: Cosine Recall
          - type: cosine_ap
            value: 0.7302564198016936
            name: Cosine Ap
          - type: dot_accuracy
            value: 0.67578125
            name: Dot Accuracy
          - type: dot_accuracy_threshold
            value: 598.0419921875
            name: Dot Accuracy Threshold
          - type: dot_f1
            value: 0.6912751677852348
            name: Dot F1
          - type: dot_f1_threshold
            value: 565.4718017578125
            name: Dot F1 Threshold
          - type: dot_precision
            value: 0.5722222222222222
            name: Dot Precision
          - type: dot_recall
            value: 0.8728813559322034
            name: Dot Recall
          - type: dot_ap
            value: 0.7300462025003271
            name: Dot Ap
          - type: manhattan_accuracy
            value: 0.6796875
            name: Manhattan Accuracy
          - type: manhattan_accuracy_threshold
            value: 404.8309020996094
            name: Manhattan Accuracy Threshold
          - type: manhattan_f1
            value: 0.6933333333333332
            name: Manhattan F1
          - type: manhattan_f1_threshold
            value: 444.99224853515625
            name: Manhattan F1 Threshold
          - type: manhattan_precision
            value: 0.5714285714285714
            name: Manhattan Precision
          - type: manhattan_recall
            value: 0.8813559322033898
            name: Manhattan Recall
          - type: manhattan_ap
            value: 0.7369214156436785
            name: Manhattan Ap
          - type: euclidean_accuracy
            value: 0.6796875
            name: Euclidean Accuracy
          - type: euclidean_accuracy_threshold
            value: 18.790739059448242
            name: Euclidean Accuracy Threshold
          - type: euclidean_f1
            value: 0.6934306569343065
            name: Euclidean F1
          - type: euclidean_f1_threshold
            value: 19.35132598876953
            name: Euclidean F1 Threshold
          - type: euclidean_precision
            value: 0.6089743589743589
            name: Euclidean Precision
          - type: euclidean_recall
            value: 0.8050847457627118
            name: Euclidean Recall
          - type: euclidean_ap
            value: 0.7307381840067684
            name: Euclidean Ap
          - type: max_accuracy
            value: 0.6796875
            name: Max Accuracy
          - type: max_accuracy_threshold
            value: 598.0419921875
            name: Max Accuracy Threshold
          - type: max_f1
            value: 0.6934306569343065
            name: Max F1
          - type: max_f1_threshold
            value: 565.4718017578125
            name: Max F1 Threshold
          - type: max_precision
            value: 0.6089743589743589
            name: Max Precision
          - type: max_recall
            value: 0.8813559322033898
            name: Max Recall
          - type: max_ap
            value: 0.7369214156436785
            name: Max Ap
---

SentenceTransformer based on microsoft/deberta-v3-small

This is a sentence-transformers model finetuned from microsoft/deberta-v3-small. It maps sentences & paragraphs to a 768-dimensional dense vector space and can be used for semantic textual similarity, semantic search, paraphrase mining, text classification, clustering, and more.

Model Details

Model Description

  • Model Type: Sentence Transformer
  • Base model: microsoft/deberta-v3-small
  • Maximum Sequence Length: 512 tokens
  • Output Dimensionality: 768 dimensions
  • Similarity Function: Cosine Similarity
  • Language: en

Model Sources

  • Documentation: Sentence Transformers Documentation (https://sbert.net)
  • Repository: Sentence Transformers on GitHub (https://github.com/UKPLab/sentence-transformers)
  • Hugging Face: Sentence Transformers on Hugging Face (https://huggingface.co/models?library=sentence-transformers)

Full Model Architecture

SentenceTransformer(
  (0): Transformer({'max_seq_length': 512, 'do_lower_case': False}) with Transformer model: DebertaV2Model 
  (1): AdvancedWeightedPooling(
    (linear_cls): Linear(in_features=768, out_features=768, bias=True)
    (linear_mean): Linear(in_features=768, out_features=768, bias=True)
    (mha): MultiheadAttention(
      (out_proj): NonDynamicallyQuantizableLinear(in_features=768, out_features=768, bias=True)
    )
    (layernorm): LayerNorm((768,), eps=1e-05, elementwise_affine=True)
    (layernorm2): LayerNorm((768,), eps=1e-05, elementwise_affine=True)
    (layernorm_cls): LayerNorm((768,), eps=1e-05, elementwise_affine=True)
    (layernorm_mean): LayerNorm((768,), eps=1e-05, elementwise_affine=True)
  )
)
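
AdvancedWeightedPooling is a custom head rather than a stock sentence-transformers module. Judging from the repr above, it combines the CLS token, the mean of the token embeddings, and a multi-head attention step, each projected and layer-normalized, into a single 768-dimensional sentence embedding. The exact forward pass is not published in this card; the PyTorch sketch below is only an illustration of that kind of pooling, with the class name, head count, and combination rule all assumptions rather than the author's code:

import torch
from torch import nn

class CLSMeanAttentionPooling(nn.Module):
    """Hypothetical pooling head: CLS branch + masked-mean branch +
    CLS-query attention branch, summed and layer-normalized."""

    def __init__(self, dim: int = 768, num_heads: int = 8):
        super().__init__()
        self.linear_cls = nn.Linear(dim, dim)
        self.linear_mean = nn.Linear(dim, dim)
        self.mha = nn.MultiheadAttention(dim, num_heads, batch_first=True)
        self.layernorm = nn.LayerNorm(dim)
        self.layernorm2 = nn.LayerNorm(dim)
        self.layernorm_cls = nn.LayerNorm(dim)
        self.layernorm_mean = nn.LayerNorm(dim)

    def forward(self, token_embeddings: torch.Tensor, attention_mask: torch.Tensor) -> torch.Tensor:
        # CLS token acts as the attention query over the whole sequence
        cls = token_embeddings[:, :1, :]  # [batch, 1, dim]
        attn_out, _ = self.mha(cls, token_embeddings, token_embeddings,
                               key_padding_mask=(attention_mask == 0))
        attn_out = self.layernorm(attn_out.squeeze(1))

        # Mean over non-padding tokens only
        mask = attention_mask.unsqueeze(-1).float()
        mean = (token_embeddings * mask).sum(1) / mask.sum(1).clamp(min=1e-9)

        cls_proj = self.layernorm_cls(self.linear_cls(cls.squeeze(1)))
        mean_proj = self.layernorm_mean(self.linear_mean(mean))

        # Assumed combination rule: sum the branches, then normalize
        return self.layernorm2(attn_out + cls_proj + mean_proj)

The layer names mirror the repr (linear_cls, linear_mean, mha, the four LayerNorms); how the branches are actually weighted in this model may differ.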

Usage

Direct Usage (Sentence Transformers)

First install the Sentence Transformers library:

pip install -U sentence-transformers

Then you can load this model and run inference.

from sentence_transformers import SentenceTransformer

# Download from the 🤗 Hub
model = SentenceTransformer("bobox/DeBERTa3-s-CustomPoolin-v3-step1")
# Run inference
sentences = [
    'What was the name of eleven rulers of the 19th and 20th Egyptian dynasties?',
    'List of Rulers of Ancient Egypt and Nubia | Lists of Rulers | Heilbrunn Timeline of Art History | The Metropolitan Museum of Art The Metropolitan Museum of Art List of Rulers of Ancient Egypt and Nubia See works of art 30.8.234 52.127.4 Our knowledge of the succession of Egyptian kings is based on kinglists kept by the ancient Egyptians themselves. The most famous are the Palermo Stone, which covers the period from the earliest dynasties to the middle of Dynasty 5; the Abydos Kinglist, which Seti I had carved on his temple at Abydos; and the Turin Canon, a papyrus that covers the period from the earliest dynasties to the reign of Ramesses II. All are incomplete or fragmentary. We also rely on the History of Egypt written by Manetho in the third century B.C. A priest in the temple at Heliopolis, Manetho had access to many original sources and it was he who divided the kings into the thirty dynasties we use today. It is to this structure of dynasties and listed kings that we now attempt to link an absolute chronology of dates in terms of our own calendrical system. The process is made difficult by the fragmentary condition of the kinglists and by differences in the calendrical years used at various times. Some astronomical observations from the ancient Egyptians have survived, allowing us to calculate absolute dates within a margin of error. Synchronisms with the other civilizations of the ancient world are also of limited use.',
    'What is the "Jack Sprat" nursery rhyme? | Reference.com What is the "Jack Sprat" nursery rhyme? A: Quick Answer "Jack Sprat" is a traditional English nursery rhyme whose main verse says, "Jack Sprat could eat no fat. His wife could eat no lean. And so between them both, you see, they licked the platter clean." Though it was likely sung by children long before, "Jack Sprat" was first published around 1765 in the compilation "Mother Goose\'s Melody." Full Answer According to Rhymes.org, a U.K. website devoted to nursery rhyme lyrics and origins, the "Jack Sprat" nursery rhyme has its origins in British history. In one interpretation, Jack Sprat was King Charles I, who ruled England in the early part of the 17th century, and his wife was Queen Henrietta Maria. Parliament refused to finance the king\'s war with Spain, which made him lean. However, the queen fattened the coffers by levying an illegal war tax. In an alternative version, the "Jack Sprat" nursery rhyme is linked to King Richard and his brother John of the Robin Hood legend. Jack Sprat was King John, the usurper who tried to take over the crown when King Richard went off to fight in the Crusades in the 12th century. When King Richard was captured, John had to raise a ransom to rescue him, leaving the country lean. The wife was Joan, daughter of the Earl of Gloucester, the greedy wife of King John. However, after King Richard died and John became king, he had his marriage with Joan annulled.',
]
embeddings = model.encode(sentences)
print(embeddings.shape)
# [3, 768]

# Get the similarity scores for the embeddings
similarities = model.similarity(embeddings, embeddings)
print(similarities.shape)
# [3, 3]
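
The same embeddings also work for semantic search. A minimal sketch using the library's util.semantic_search helper; the corpus below is illustrative:

from sentence_transformers import SentenceTransformer, util

model = SentenceTransformer("bobox/DeBERTa3-s-CustomPoolin-v3-step1")

corpus = [
    "EXFO ToolBox Office offers a collection of viewers and analyzers.",
    "Garden Grove is a city in northern Orange County, California.",
]
corpus_embeddings = model.encode(corpus, convert_to_tensor=True)
query_embedding = model.encode("what is exfo toolbox", convert_to_tensor=True)

# Rank corpus entries by cosine similarity to the query
hits = util.semantic_search(query_embedding, corpus_embeddings, top_k=2)
for hit in hits[0]:
    print(corpus[hit["corpus_id"]], round(hit["score"], 4))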

Evaluation

Metrics

Semantic Similarity

Dataset: sts-test

Metric Value
pearson_cosine 0.7674
spearman_cosine 0.7776
pearson_manhattan 0.7824
spearman_manhattan 0.7721
pearson_euclidean 0.7883
spearman_euclidean 0.7775
pearson_dot 0.7669
spearman_dot 0.7763
pearson_max 0.7883
spearman_max 0.7776
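
These correlations are the standard output of the library's EmbeddingSimilarityEvaluator (the sts-test name suggests the STS benchmark test split). A sketch of running the same kind of evaluation, with illustrative pairs and gold scores scaled to [0, 1]:

from sentence_transformers import SentenceTransformer
from sentence_transformers.evaluation import EmbeddingSimilarityEvaluator

model = SentenceTransformer("bobox/DeBERTa3-s-CustomPoolin-v3-step1")

# Illustrative pairs; the reported numbers use the full sts-test split
sentences1 = ["A man is playing a guitar.", "A woman is slicing an onion."]
sentences2 = ["Someone is playing a guitar.", "A person is riding a horse."]
gold_scores = [0.95, 0.05]  # similarity labels scaled to [0, 1]

evaluator = EmbeddingSimilarityEvaluator(sentences1, sentences2, gold_scores, name="sts-test")
print(evaluator(model))  # Pearson/Spearman for cosine, dot, Euclidean, Manhattan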

Binary Classification

Dataset: allNLI-dev

Metric Value
cosine_accuracy 0.709
cosine_accuracy_threshold 0.8715
cosine_f1 0.5913
cosine_f1_threshold 0.7769
cosine_precision 0.4739
cosine_recall 0.7861
cosine_ap 0.5644
dot_accuracy 0.7109
dot_accuracy_threshold 674.426
dot_f1 0.5913
dot_f1_threshold 603.4353
dot_precision 0.4739
dot_recall 0.7861
dot_ap 0.5665
manhattan_accuracy 0.7109
manhattan_accuracy_threshold 294.4728
manhattan_f1 0.5935
manhattan_f1_threshold 401.1483
manhattan_precision 0.4726
manhattan_recall 0.7977
manhattan_ap 0.5643
euclidean_accuracy 0.7109
euclidean_accuracy_threshold 14.5655
euclidean_f1 0.5913
euclidean_f1_threshold 18.6041
euclidean_precision 0.4739
euclidean_recall 0.7861
euclidean_ap 0.5646
max_accuracy 0.7109
max_accuracy_threshold 674.426
max_f1 0.5935
max_f1_threshold 603.4353
max_precision 0.4739
max_recall 0.7977
max_ap 0.5665

Binary Classification

Dataset: Qnli-dev

Metric Value
cosine_accuracy 0.6797
cosine_accuracy_threshold 0.7727
cosine_f1 0.6926
cosine_f1_threshold 0.7318
cosine_precision 0.5758
cosine_recall 0.8686
cosine_ap 0.7303
dot_accuracy 0.6758
dot_accuracy_threshold 598.042
dot_f1 0.6913
dot_f1_threshold 565.4718
dot_precision 0.5722
dot_recall 0.8729
dot_ap 0.73
manhattan_accuracy 0.6797
manhattan_accuracy_threshold 404.8309
manhattan_f1 0.6933
manhattan_f1_threshold 444.9922
manhattan_precision 0.5714
manhattan_recall 0.8814
manhattan_ap 0.7369
euclidean_accuracy 0.6797
euclidean_accuracy_threshold 18.7907
euclidean_f1 0.6934
euclidean_f1_threshold 19.3513
euclidean_precision 0.609
euclidean_recall 0.8051
euclidean_ap 0.7307
max_accuracy 0.6797
max_accuracy_threshold 598.042
max_f1 0.6934
max_f1_threshold 565.4718
max_precision 0.609
max_recall 0.8814
max_ap 0.7369
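
Both tables (allNLI-dev and Qnli-dev) report, per distance function, accuracy and F1 at the best decision threshold plus average precision, as produced by the library's BinaryClassificationEvaluator. A sketch with illustrative pairs:

from sentence_transformers import SentenceTransformer
from sentence_transformers.evaluation import BinaryClassificationEvaluator

model = SentenceTransformer("bobox/DeBERTa3-s-CustomPoolin-v3-step1")

# Illustrative pairs; label 1 = similar/entailed, 0 = dissimilar
sentences1 = ["A man is playing a guitar.", "A woman is slicing an onion."]
sentences2 = ["Someone is playing a guitar.", "A person is riding a horse."]
labels = [1, 0]

evaluator = BinaryClassificationEvaluator(sentences1, sentences2, labels, name="allNLI-dev")
print(evaluator(model))  # accuracy, F1, precision, recall, AP per distance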

Training Details

Evaluation Dataset

vitaminc-pairs

  • Dataset: vitaminc-pairs at be6febb
  • Size: 128 evaluation samples
  • Columns: claim and evidence
  • Approximate statistics based on the first 128 samples:
      claim: string, min 9 / mean 21.42 / max 41 tokens
      evidence: string, min 11 / mean 35.55 / max 79 tokens
  • Samples:
      claim: Dragon Con had over 5000 guests .
      evidence: Among the more than 6000 guests and musical performers at the 2009 convention were such notables as Patrick Stewart , William Shatner , Leonard Nimoy , Terry Gilliam , Bruce Boxleitner , James Marsters , and Mary McDonnell .
      claim: COVID-19 has reached more than 185 countries .
      evidence: As of , more than cases of COVID-19 have been reported in more than 190 countries and 200 territories , resulting in more than deaths .
      claim: In March , Italy had 3.6x times more cases of coronavirus than China .
      evidence: As of 12 March , among nations with at least one million citizens , Italy has the world 's highest per capita rate of positive coronavirus cases at 206.1 cases per million people ( 3.6x times the rate of China ) and is the country with the second-highest number of positive cases as well as of deaths in the world , after China .
  • Loss: CachedGISTEmbedLoss with these parameters:
    {'guide': SentenceTransformer(
      (0): Transformer({'max_seq_length': 512, 'do_lower_case': False}) with Transformer model: BertModel 
      (1): Pooling({'word_embedding_dimension': 768, 'pooling_mode_cls_token': True, 'pooling_mode_mean_tokens': False, 'pooling_mode_max_tokens': False, 'pooling_mode_mean_sqrt_len_tokens': False, 'pooling_mode_weightedmean_tokens': False, 'pooling_mode_lasttoken': False, 'include_prompt': True})
      (2): Normalize()
    ), 'temperature': 0.025}
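
For reference, this loss is constructed through the library's losses API. The exact guide checkpoint is not named in this card (the repr above only shows a BERT model with CLS pooling and normalization), so the guide below is a hypothetical placeholder:

from sentence_transformers import SentenceTransformer
from sentence_transformers.losses import CachedGISTEmbedLoss

model = SentenceTransformer("microsoft/deberta-v3-small")
# Hypothetical placeholder guide; the actual guide is a BERT/CLS-pooling model
guide = SentenceTransformer("avsolatorio/GIST-small-Embedding-v0")

# The guide scores in-batch pairs so likely false negatives are filtered
# out of the contrastive loss; caching allows large effective batch sizes.
loss = CachedGISTEmbedLoss(model, guide=guide, temperature=0.025)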
    

Training Hyperparameters

Non-Default Hyperparameters

  • eval_strategy: steps
  • per_device_train_batch_size: 100
  • per_device_eval_batch_size: 256
  • gradient_accumulation_steps: 2
  • lr_scheduler_type: cosine_with_min_lr
  • lr_scheduler_kwargs: {'num_cycles': 0.5, 'min_lr': 1.6666666666666667e-05}
  • warmup_ratio: 0.33
  • save_safetensors: False
  • fp16: True
  • push_to_hub: True
  • hub_model_id: bobox/DeBERTa3-s-CustomPoolin-v3-step1-checkpoints-tmp
  • hub_strategy: all_checkpoints
  • batch_sampler: no_duplicates
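
These settings map directly onto SentenceTransformerTrainingArguments. A sketch reproducing the non-default values (output_dir is arbitrary and not specified in this card):

from sentence_transformers import SentenceTransformerTrainingArguments
from sentence_transformers.training_args import BatchSamplers

args = SentenceTransformerTrainingArguments(
    output_dir="output",  # arbitrary; not specified in this card
    eval_strategy="steps",
    per_device_train_batch_size=100,
    per_device_eval_batch_size=256,
    gradient_accumulation_steps=2,
    lr_scheduler_type="cosine_with_min_lr",
    lr_scheduler_kwargs={"num_cycles": 0.5, "min_lr": 1.6666666666666667e-05},
    warmup_ratio=0.33,
    save_safetensors=False,
    fp16=True,
    push_to_hub=True,
    hub_model_id="bobox/DeBERTa3-s-CustomPoolin-v3-step1-checkpoints-tmp",
    hub_strategy="all_checkpoints",
    batch_sampler=BatchSamplers.NO_DUPLICATES,
)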

All Hyperparameters

Click to expand
  • overwrite_output_dir: False
  • do_predict: False
  • eval_strategy: steps
  • prediction_loss_only: True
  • per_device_train_batch_size: 100
  • per_device_eval_batch_size: 256
  • per_gpu_train_batch_size: None
  • per_gpu_eval_batch_size: None
  • gradient_accumulation_steps: 2
  • eval_accumulation_steps: None
  • torch_empty_cache_steps: None
  • learning_rate: 5e-05
  • weight_decay: 0.0
  • adam_beta1: 0.9
  • adam_beta2: 0.999
  • adam_epsilon: 1e-08
  • max_grad_norm: 1.0
  • num_train_epochs: 3
  • max_steps: -1
  • lr_scheduler_type: cosine_with_min_lr
  • lr_scheduler_kwargs: {'num_cycles': 0.5, 'min_lr': 1.6666666666666667e-05}
  • warmup_ratio: 0.33
  • warmup_steps: 0
  • log_level: passive
  • log_level_replica: warning
  • log_on_each_node: True
  • logging_nan_inf_filter: True
  • save_safetensors: False
  • save_on_each_node: False
  • save_only_model: False
  • restore_callback_states_from_checkpoint: False
  • no_cuda: False
  • use_cpu: False
  • use_mps_device: False
  • seed: 42
  • data_seed: None
  • jit_mode_eval: False
  • use_ipex: False
  • bf16: False
  • fp16: True
  • fp16_opt_level: O1
  • half_precision_backend: auto
  • bf16_full_eval: False
  • fp16_full_eval: False
  • tf32: None
  • local_rank: 0
  • ddp_backend: None
  • tpu_num_cores: None
  • tpu_metrics_debug: False
  • debug: []
  • dataloader_drop_last: False
  • dataloader_num_workers: 0
  • dataloader_prefetch_factor: None
  • past_index: -1
  • disable_tqdm: False
  • remove_unused_columns: True
  • label_names: None
  • load_best_model_at_end: False
  • ignore_data_skip: False
  • fsdp: []
  • fsdp_min_num_params: 0
  • fsdp_config: {'min_num_params': 0, 'xla': False, 'xla_fsdp_v2': False, 'xla_fsdp_grad_ckpt': False}
  • fsdp_transformer_layer_cls_to_wrap: None
  • accelerator_config: {'split_batches': False, 'dispatch_batches': None, 'even_batches': True, 'use_seedable_sampler': True, 'non_blocking': False, 'gradient_accumulation_kwargs': None}
  • deepspeed: None
  • label_smoothing_factor: 0.0
  • optim: adamw_torch
  • optim_args: None
  • adafactor: False
  • group_by_length: False
  • length_column_name: length
  • ddp_find_unused_parameters: None
  • ddp_bucket_cap_mb: None
  • ddp_broadcast_buffers: False
  • dataloader_pin_memory: True
  • dataloader_persistent_workers: False
  • skip_memory_metrics: True
  • use_legacy_prediction_loop: False
  • push_to_hub: True
  • resume_from_checkpoint: None
  • hub_model_id: bobox/DeBERTa3-s-CustomPoolin-v3-step1-checkpoints-tmp
  • hub_strategy: all_checkpoints
  • hub_private_repo: False
  • hub_always_push: False
  • gradient_checkpointing: False
  • gradient_checkpointing_kwargs: None
  • include_inputs_for_metrics: False
  • eval_do_concat_batches: True
  • fp16_backend: auto
  • push_to_hub_model_id: None
  • push_to_hub_organization: None
  • mp_parameters:
  • auto_find_batch_size: False
  • full_determinism: False
  • torchdynamo: None
  • ray_scope: last
  • ddp_timeout: 1800
  • torch_compile: False
  • torch_compile_backend: None
  • torch_compile_mode: None
  • dispatch_batches: None
  • split_batches: None
  • include_tokens_per_second: False
  • include_num_input_tokens_seen: False
  • neftune_noise_alpha: None
  • optim_target_modules: None
  • batch_eval_metrics: False
  • eval_on_start: False
  • eval_use_gather_object: False
  • batch_sampler: no_duplicates
  • multi_dataset_batch_sampler: proportional

Training Logs

Click to expand
Epoch Step Training Loss vitaminc-pairs loss negation-triplets loss scitail-pairs-pos loss scitail-pairs-qa loss xsum-pairs loss sciq pairs loss qasc pairs loss openbookqa pairs loss msmarco pairs loss nq pairs loss trivia pairs loss gooaq pairs loss paws-pos loss global dataset loss sts-test_spearman_cosine allNLI-dev_max_ap Qnli-dev_max_ap
0.0168 8 10.2928 - - - - - - - - - - - - - - - - -
0.0336 16 9.2166 - - - - - - - - - - - - - - - - -
0.0504 24 9.4858 - - - - - - - - - - - - - - - - -
0.0672 32 10.6143 - - - - - - - - - - - - - - - - -
0.0840 40 8.7553 - - - - - - - - - - - - - - - - -
0.1008 48 10.9939 - - - - - - - - - - - - - - - - -
0.1176 56 7.6039 - - - - - - - - - - - - - - - - -
0.1345 64 5.9498 - - - - - - - - - - - - - - - - -
0.1513 72 7.3051 3.2988 3.9604 1.9818 2.1997 6.0515 0.6095 6.3199 4.8391 6.4886 6.6406 6.4894 6.1527 2.0082 4.9577 0.3066 0.3444 0.5627
0.1681 80 8.3034 - - - - - - - - - - - - - - - - -
0.1849 88 7.6669 - - - - - - - - - - - - - - - - -
0.2017 96 6.6415 - - - - - - - - - - - - - - - - -
0.2185 104 5.7797 - - - - - - - - - - - - - - - - -
0.2353 112 5.8361 - - - - - - - - - - - - - - - - -
0.2521 120 5.3339 - - - - - - - - - - - - - - - - -
0.2689 128 5.5908 - - - - - - - - - - - - - - - - -
0.2857 136 5.3209 - - - - - - - - - - - - - - - - -
0.3025 144 5.5359 3.3310 3.8580 1.4769 1.6994 5.4819 0.5385 5.2021 4.4410 5.3419 5.5506 5.6972 5.3376 1.4170 3.9169 0.2954 0.3795 0.6317
0.3193 152 5.4713 - - - - - - - - - - - - - - - - -
0.3361 160 4.9368 - - - - - - - - - - - - - - - - -
0.3529 168 4.6594 - - - - - - - - - - - - - - - - -
0.3697 176 4.8392 - - - - - - - - - - - - - - - - -
0.3866 184 4.414 - - - - - - - - - - - - - - - - -
0.4034 192 4.891 - - - - - - - - - - - - - - - - -
0.4202 200 4.4553 - - - - - - - - - - - - - - - - -
0.4370 208 3.9729 - - - - - - - - - - - - - - - - -
0.4538 216 3.7705 3.2468 3.6435 0.7890 0.7356 3.9327 0.4082 3.7175 3.5404 3.5351 4.0506 3.9953 3.6074 0.4195 2.4726 0.3791 0.4133 0.6779
0.4706 224 3.8409 - - - - - - - - - - - - - - - - -
0.4874 232 3.7894 - - - - - - - - - - - - - - - - -
0.5042 240 3.3523 - - - - - - - - - - - - - - - - -
0.5210 248 3.2407 - - - - - - - - - - - - - - - - -
0.5378 256 3.3203 - - - - - - - - - - - - - - - - -
0.5546 264 2.8457 - - - - - - - - - - - - - - - - -
0.5714 272 2.4181 - - - - - - - - - - - - - - - - -
0.5882 280 3.4589 - - - - - - - - - - - - - - - - -
0.6050 288 2.8203 3.1119 3.1485 0.4531 0.2652 2.6895 0.2656 2.5542 2.7523 2.6600 3.1773 3.2099 2.7316 0.2006 1.6342 0.5257 0.4717 0.7078
0.6218 296 2.4697 - - - - - - - - - - - - - - - - -
0.6387 304 2.4654 - - - - - - - - - - - - - - - - -
0.6555 312 2.4236 - - - - - - - - - - - - - - - - -
0.6723 320 2.2879 - - - - - - - - - - - - - - - - -
0.6891 328 2.2145 - - - - - - - - - - - - - - - - -
0.7059 336 1.8464 - - - - - - - - - - - - - - - - -
0.7227 344 2.0086 - - - - - - - - - - - - - - - - -
0.7395 352 2.0635 - - - - - - - - - - - - - - - - -
0.7563 360 1.8584 3.3202 2.5793 0.3434 0.1618 1.6759 0.1834 1.6454 2.1257 2.1938 2.5316 2.4558 2.0596 0.0984 1.2206 0.6610 0.5199 0.7119
0.7731 368 2.0286 - - - - - - - - - - - - - - - - -
0.7899 376 1.9389 - - - - - - - - - - - - - - - - -
0.8067 384 1.7453 - - - - - - - - - - - - - - - - -
0.8235 392 1.6629 - - - - - - - - - - - - - - - - -
0.8403 400 1.2724 - - - - - - - - - - - - - - - - -
0.8571 408 1.7824 - - - - - - - - - - - - - - - - -
0.8739 416 1.5826 - - - - - - - - - - - - - - - - -
0.8908 424 1.1971 - - - - - - - - - - - - - - - - -
0.9076 432 1.5228 3.3624 2.1952 0.3006 0.1223 1.1091 0.1582 1.2383 1.8664 1.7434 2.3959 2.0697 1.7563 0.0766 1.0193 0.7292 0.5194 0.7126
0.9244 440 1.3323 - - - - - - - - - - - - - - - - -
0.9412 448 1.5124 - - - - - - - - - - - - - - - - -
0.9580 456 1.5565 - - - - - - - - - - - - - - - - -
0.9748 464 1.3672 - - - - - - - - - - - - - - - - -
0.9916 472 1.0382 - - - - - - - - - - - - - - - - -
1.0084 480 1.0626 - - - - - - - - - - - - - - - - -
1.0252 488 1.3539 - - - - - - - - - - - - - - - - -
1.0420 496 1.1723 - - - - - - - - - - - - - - - - -
1.0588 504 1.4235 3.4031 1.9759 0.2554 0.0814 0.9034 0.1378 1.1603 1.7589 1.5608 2.1230 1.7719 1.6633 0.0720 0.9380 0.7523 0.5297 0.7129
1.0756 512 1.2283 - - - - - - - - - - - - - - - - -
1.0924 520 1.2455 - - - - - - - - - - - - - - - - -
1.1092 528 1.4265 - - - - - - - - - - - - - - - - -
1.1261 536 1.296 - - - - - - - - - - - - - - - - -
1.1429 544 0.8763 - - - - - - - - - - - - - - - - -
1.1597 552 1.5678 - - - - - - - - - - - - - - - - -
1.1765 560 1.2548 - - - - - - - - - - - - - - - - -
1.1933 568 1.3731 - - - - - - - - - - - - - - - - -
1.2101 576 1.3023 3.3815 1.8740 0.2373 0.0769 0.7711 0.1237 0.9432 1.6871 1.5070 1.9947 1.6041 1.5579 0.0721 0.8661 0.7642 0.5412 0.7159
1.2269 584 0.8135 - - - - - - - - - - - - - - - - -
1.2437 592 1.0259 - - - - - - - - - - - - - - - - -
1.2605 600 1.1896 - - - - - - - - - - - - - - - - -
1.2773 608 1.0532 - - - - - - - - - - - - - - - - -
1.2941 616 1.3221 - - - - - - - - - - - - - - - - -
1.3109 624 1.3136 - - - - - - - - - - - - - - - - -
1.3277 632 1.2238 - - - - - - - - - - - - - - - - -
1.3445 640 1.2407 - - - - - - - - - - - - - - - - -
1.3613 648 1.2245 3.4717 1.7962 0.2242 0.0488 0.7472 0.1108 0.9272 1.6692 1.3845 1.9117 1.3410 1.4387 0.0701 0.8505 0.7680 0.5471 0.7227
1.3782 656 1.0428 - - - - - - - - - - - - - - - - -
1.3950 664 1.1391 - - - - - - - - - - - - - - - - -
1.4118 672 1.2632 - - - - - - - - - - - - - - - - -
1.4286 680 0.9403 - - - - - - - - - - - - - - - - -
1.4454 688 0.7571 - - - - - - - - - - - - - - - - -
1.4622 696 0.9436 - - - - - - - - - - - - - - - - -
1.4790 704 1.1239 - - - - - - - - - - - - - - - - -
1.4958 712 0.9499 - - - - - - - - - - - - - - - - -
1.5126 720 1.0945 3.6495 1.6693 0.2157 0.0492 0.6830 0.1049 0.9140 1.5967 1.4397 1.7394 1.3303 1.4334 0.0603 0.8185 0.7815 0.5606 0.7098
1.5294 728 1.1161 - - - - - - - - - - - - - - - - -
1.5462 736 1.0056 - - - - - - - - - - - - - - - - -
1.5630 744 1.1743 - - - - - - - - - - - - - - - - -
1.5798 752 0.9153 - - - - - - - - - - - - - - - - -
1.5966 760 1.1589 - - - - - - - - - - - - - - - - -
1.6134 768 0.9187 - - - - - - - - - - - - - - - - -
1.6303 776 0.6937 - - - - - - - - - - - - - - - - -
1.6471 784 0.9704 - - - - - - - - - - - - - - - - -
1.6639 792 0.7343 3.5442 1.6493 0.2208 0.0249 0.6152 0.0969 0.7111 1.5369 1.4058 1.7066 1.2784 1.3419 0.0585 0.7827 0.7749 0.5627 0.7284
1.6807 800 1.2878 - - - - - - - - - - - - - - - - -
1.6975 808 0.9898 - - - - - - - - - - - - - - - - -
1.7143 816 0.7613 - - - - - - - - - - - - - - - - -
1.7311 824 0.9612 - - - - - - - - - - - - - - - - -
1.7479 832 1.1524 - - - - - - - - - - - - - - - - -
1.7647 840 0.827 - - - - - - - - - - - - - - - - -
1.7815 848 1.1898 - - - - - - - - - - - - - - - - -
1.7983 856 1.0117 - - - - - - - - - - - - - - - - -
1.8151 864 0.7019 3.4544 1.6149 0.2035 0.0181 0.5525 0.0999 0.6641 1.5456 1.3911 1.7188 1.2547 1.3517 0.0562 0.7473 0.7684 0.5697 0.7329
1.8319 872 0.8352 - - - - - - - - - - - - - - - - -
1.8487 880 0.7836 - - - - - - - - - - - - - - - - -
1.8655 888 1.0187 - - - - - - - - - - - - - - - - -
1.8824 896 0.74 - - - - - - - - - - - - - - - - -
1.8992 904 0.7263 - - - - - - - - - - - - - - - - -
1.9160 912 0.8073 - - - - - - - - - - - - - - - - -
1.9328 920 0.8185 - - - - - - - - - - - - - - - - -
1.9496 928 1.0992 - - - - - - - - - - - - - - - - -
1.9664 936 0.9973 3.5110 1.5776 0.2035 0.0250 0.5881 0.0934 0.6719 1.5059 1.2970 1.6186 1.1815 1.2714 0.0564 0.7213 0.7799 0.5544 0.7341
1.9832 944 0.6662 - - - - - - - - - - - - - - - - -
2.0 952 0.533 - - - - - - - - - - - - - - - - -
2.0168 960 0.7712 - - - - - - - - - - - - - - - - -
2.0336 968 0.6879 - - - - - - - - - - - - - - - - -
2.0504 976 0.7975 - - - - - - - - - - - - - - - - -
2.0672 984 0.873 - - - - - - - - - - - - - - - - -
2.0840 992 0.7995 - - - - - - - - - - - - - - - - -
2.1008 1000 1.0119 - - - - - - - - - - - - - - - - -
2.1176 1008 0.6317 3.6778 1.5845 0.2102 0.0228 0.5851 0.0977 0.6411 1.4752 1.2992 1.6314 1.1260 1.2683 0.0556 0.7329 0.7693 0.5614 0.7274
2.1345 1016 0.72 - - - - - - - - - - - - - - - - -
2.1513 1024 0.9418 - - - - - - - - - - - - - - - - -
2.1681 1032 0.7848 - - - - - - - - - - - - - - - - -
2.1849 1040 0.6965 - - - - - - - - - - - - - - - - -
2.2017 1048 1.0447 - - - - - - - - - - - - - - - - -
2.2185 1056 0.6361 - - - - - - - - - - - - - - - - -
2.2353 1064 0.6837 - - - - - - - - - - - - - - - - -
2.2521 1072 0.5713 - - - - - - - - - - - - - - - - -
2.2689 1080 0.8193 3.6399 1.5565 0.2069 0.0213 0.5440 0.0904 0.6057 1.4815 1.2856 1.6441 1.1469 1.2540 0.0543 0.7216 0.7765 0.5599 0.7322
2.2857 1088 0.9754 - - - - - - - - - - - - - - - - -
2.3025 1096 0.8932 - - - - - - - - - - - - - - - - -
2.3193 1104 0.8716 - - - - - - - - - - - - - - - - -
2.3361 1112 0.8787 - - - - - - - - - - - - - - - - -
2.3529 1120 0.9529 - - - - - - - - - - - - - - - - -
2.3697 1128 0.775 - - - - - - - - - - - - - - - - -
2.3866 1136 0.6178 - - - - - - - - - - - - - - - - -
2.4034 1144 0.8384 - - - - - - - - - - - - - - - - -
2.4202 1152 0.9425 3.5672 1.5244 0.2111 0.0162 0.5593 0.0893 0.5759 1.4933 1.2703 1.5815 1.1202 1.2132 0.0531 0.7058 0.7730 0.5635 0.7350
2.4370 1160 0.4551 - - - - - - - - - - - - - - - - -
2.4538 1168 0.6392 - - - - - - - - - - - - - - - - -
2.4706 1176 0.8341 - - - - - - - - - - - - - - - - -
2.4874 1184 0.7392 - - - - - - - - - - - - - - - - -
2.5042 1192 0.7646 - - - - - - - - - - - - - - - - -
2.5210 1200 0.8613 - - - - - - - - - - - - - - - - -
2.5378 1208 0.7585 - - - - - - - - - - - - - - - - -
2.5546 1216 1.0611 - - - - - - - - - - - - - - - - -
2.5714 1224 0.6506 3.6439 1.5040 0.2125 0.0162 0.5282 0.0863 0.5858 1.5073 1.2444 1.5493 1.1014 1.2073 0.0532 0.7022 0.7774 0.5647 0.7328
2.5882 1232 0.8525 - - - - - - - - - - - - - - - - -
2.6050 1240 0.6304 - - - - - - - - - - - - - - - - -
2.6218 1248 0.6354 - - - - - - - - - - - - - - - - -
2.6387 1256 0.6583 - - - - - - - - - - - - - - - - -
2.6555 1264 0.5964 - - - - - - - - - - - - - - - - -
2.6723 1272 0.818 - - - - - - - - - - - - - - - - -
2.6891 1280 0.8635 - - - - - - - - - - - - - - - - -
2.7059 1288 0.6389 - - - - - - - - - - - - - - - - -
2.7227 1296 0.6819 3.6131 1.5104 0.2084 0.0148 0.5229 0.0854 0.5588 1.4963 1.2766 1.5679 1.0982 1.2203 0.0529 0.7059 0.7762 0.5659 0.7355
2.7395 1304 0.7878 - - - - - - - - - - - - - - - - -
2.7563 1312 0.7638 - - - - - - - - - - - - - - - - -
2.7731 1320 0.8885 - - - - - - - - - - - - - - - - -
2.7899 1328 0.8184 - - - - - - - - - - - - - - - - -
2.8067 1336 0.7472 - - - - - - - - - - - - - - - - -
2.8235 1344 0.7012 - - - - - - - - - - - - - - - - -
2.8403 1352 0.4622 - - - - - - - - - - - - - - - - -
2.8571 1360 0.846 - - - - - - - - - - - - - - - - -
2.8739 1368 0.8308 3.6224 1.5088 0.2084 0.0148 0.5118 0.0858 0.5523 1.4941 1.2756 1.5808 1.0925 1.2114 0.0521 0.7022 0.7765 0.5662 0.7366
2.8908 1376 0.5334 - - - - - - - - - - - - - - - - -
2.9076 1384 0.7893 - - - - - - - - - - - - - - - - -
2.9244 1392 0.6897 - - - - - - - - - - - - - - - - -
2.9412 1400 0.7803 - - - - - - - - - - - - - - - - -
2.9580 1408 0.841 - - - - - - - - - - - - - - - - -
2.9748 1416 0.787 - - - - - - - - - - - - - - - - -
2.9916 1424 0.5861 - - - - - - - - - - - - - - - - -
3.0 1428 - 3.6139 1.5071 0.2084 0.0150 0.5124 0.0862 0.5532 1.4924 1.2700 1.5806 1.0905 1.2081 0.0519 0.6997 0.7776 0.5665 0.7369

Framework Versions

  • Python: 3.10.12
  • Sentence Transformers: 3.2.0
  • Transformers: 4.44.2
  • PyTorch: 2.4.1+cu121
  • Accelerate: 0.34.2
  • Datasets: 3.0.1
  • Tokenizers: 0.19.1

Citation

BibTeX

Sentence Transformers

@inproceedings{reimers-2019-sentence-bert,
    title = "Sentence-BERT: Sentence Embeddings using Siamese BERT-Networks",
    author = "Reimers, Nils and Gurevych, Iryna",
    booktitle = "Proceedings of the 2019 Conference on Empirical Methods in Natural Language Processing",
    month = "11",
    year = "2019",
    publisher = "Association for Computational Linguistics",
    url = "https://arxiv.org/abs/1908.10084",
}