Training in progress, step 110, checkpoint
metadata
base_model: microsoft/deberta-v3-small
datasets: []
language: []
library_name: sentence-transformers
metrics:
  - pearson_cosine
  - spearman_cosine
  - pearson_manhattan
  - spearman_manhattan
  - pearson_euclidean
  - spearman_euclidean
  - pearson_dot
  - spearman_dot
  - pearson_max
  - spearman_max
  - cosine_accuracy
  - cosine_accuracy_threshold
  - cosine_f1
  - cosine_f1_threshold
  - cosine_precision
  - cosine_recall
  - cosine_ap
  - dot_accuracy
  - dot_accuracy_threshold
  - dot_f1
  - dot_f1_threshold
  - dot_precision
  - dot_recall
  - dot_ap
  - manhattan_accuracy
  - manhattan_accuracy_threshold
  - manhattan_f1
  - manhattan_f1_threshold
  - manhattan_precision
  - manhattan_recall
  - manhattan_ap
  - euclidean_accuracy
  - euclidean_accuracy_threshold
  - euclidean_f1
  - euclidean_f1_threshold
  - euclidean_precision
  - euclidean_recall
  - euclidean_ap
  - max_accuracy
  - max_accuracy_threshold
  - max_f1
  - max_f1_threshold
  - max_precision
  - max_recall
  - max_ap
pipeline_tag: sentence-similarity
tags:
  - sentence-transformers
  - sentence-similarity
  - feature-extraction
  - generated_from_trainer
  - dataset_size:116445
  - loss:CachedGISTEmbedLoss
widget:
  - source_sentence: what is the main purpose of the brain
    sentences:
      - >-
        Brain Physiologically, the function of the brain is to exert centralized
        control over the other organs of the body. The brain acts on the rest of
        the body both by generating patterns of muscle activity and by driving
        the secretion of chemicals called hormones. This centralized control
        allows rapid and coordinated responses to changes in the environment.
        Some basic types of responsiveness such as reflexes can be mediated by
        the spinal cord or peripheral ganglia, but sophisticated purposeful
        control of behavior based on complex sensory input requires the
        information integrating capabilities of a centralized brain.
      - >-
        How do scientists know that some mountains were once at the bottom of an
        ocean?
      - >-
        The Smiths Wiki | Fandom powered by Wikia Share Ad blocker interference
        detected! Wikia is a free-to-use site that makes money from advertising.
        We have a modified experience for viewers using ad blockers Wikia is not
        accessible if you’ve made further modifications. Remove the custom ad
        blocker rule(s) and the page will load as expected. The Smiths were an
        English rock band formed in Manchester in 1982. Based on the songwriting
        partnership of Morrissey (vocals) and Johnny Marr (guitar), the band
        also included Andy Rourke (bass), Mike Joyce (drums) and for a brief
        time Craig Gannon (rhythm guitar). Critics have called them one of the
        most important alternative rock bands to emerge from the British
        independent music scene of the 1980s,and the group has had major
        influence on subsequent artists. Morrissey's lovelorn tales of
        alienation found an audience amongst youth culture bored by the
        ubiquitous synthesiser-pop bands of the early 1980s, while Marr's
        complex melodies helped return guitar-based music to popularity. The
        group were signed to the independent record label Rough Trade Records ,
        for whom they released four studio albums and several compilations, as
        well as numerous non-LP singles. Although they had limited commercial
        success outside the UK while they were still together, and never
        released a single that charted higher than number 10 in their home
        country, The Smiths won a growing following, and they remain cult and
        commercial favourites. The band broke up in 1987 amid disagreements
        between Morrissey and Marr and has turned down several offers to reform.
        Welcome to The Smiths Wiki
  - source_sentence: There were 29 Muslims fatalities in the Cave of the Patriarchs massacre .
    sentences:
      - >-
        In August , after the end of the war in June 1902 , Higgins Southampton
        left the `` SSBavarian '' and returned to Cape Town the following month
        .
      - >-
        Between 29 and 52 Muslims were killed and more than 100 others wounded .
        [   Settlers remember gunman Goldstein ; Hebron riots continue ] .
      - >-
        29 Muslims were killed and more than 100 others wounded . [   Settlers
        remember gunman Goldstein ; Hebron riots continue ] .
  - source_sentence: are tabby cats all male?
    sentences:
      - >-
        Did you know orange tabby cats are typically male? In fact, up to 80
        percent of orange tabbies are male, making orange female cats a bit of a
        rarity. According to the BBC's Focus Magazine, the ginger gene in cats
        works a little differently compared to humans; it is on the X
        chromosome.
      - >-
        Shawnee Trails Council was formed from the merger of the Four Rivers
        Council and the Audubon Council .
      - |
        A picture of a modern looking kitchen area
  - source_sentence: >-
      Aamir Khan agreed to act immediately after reading Mehra 's screenplay in
      `` Rang De Basanti '' .
    sentences:
      - >-
        Chris Rea —   Free listening, videos, concerts, stats and photos at
        Last.fm singer-songwriter Christopher Anton Rea (pronounced Ree-ah),
        born 4 March 1951, is a singer, songwriter, and guitarist from
        Middlesbrough, England. Rea's recording career began in 1978. Although
        he almost immediately had a US hit single with "Fool (If You Think It's
        Over)", Rea's initial focus was on continental Europe, releasing eight
        albums in the 1980s. It wasn't until 1985's Shamrock Diaries and the
        songs "Stainsby Girls" and "Josephine," that UK audiences began to take
        notice of him. Follow up albums… read more
      - "Healthy Fast Food Meal No. 1. Grilled Chicken Sandwich and Fruit Cup (Chick-fil-A) Several fast food chains offer a grilled chicken sandwich. The trick is ordering it without mayo or creamy sauce, and making sure it’s served with a whole grain bun."
      - >-
        Aamir Khan agreed to act in `` Rang De Basanti '' immediately after
        reading Mehra 's script .
  - source_sentence: 'A man wearing a blue bow tie and a fedora hat in a car. '
    sentences:
      - A man takes a photo of himself wearing a bowtie and hat
      - Scientists explain the world based on what?
      - "County of Angus - definition of County of Angus by The Free Dictionary County of Angus - definition of County of Angus by The Free Dictionary http://www.thefreedictionary.com/County+of+Angus \_(ăng′gəs) n. Any of a breed of hornless beef cattle that originated in Scotland and are usually black but also occur in a red variety. Also called Black Angus. [After Angus, former county of Scotland.] Angus (ˈæŋɡəs) n (Placename) a council area of E Scotland on the North Sea: the historical county of Angus became part of Tayside region in 1975; reinstated as a unitary authority (excluding City of Dundee) in 1996. Administrative centre: Forfar. Pop: 107 520 (2003 est). Area: 2181 sq km (842 sq miles) An•gus"
model-index:
  - name: SentenceTransformer based on microsoft/deberta-v3-small
    results:
      - task:
          type: semantic-similarity
          name: Semantic Similarity
        dataset:
          name: sts test
          type: sts-test
        metrics:
          - type: pearson_cosine
            value: 0.7489263204555723
            name: Pearson Cosine
          - type: spearman_cosine
            value: 0.7626005619606424
            name: Spearman Cosine
          - type: pearson_manhattan
            value: 0.7591990025704353
            name: Pearson Manhattan
          - type: spearman_manhattan
            value: 0.7477882076989188
            name: Spearman Manhattan
          - type: pearson_euclidean
            value: 0.7622787611500085
            name: Pearson Euclidean
          - type: spearman_euclidean
            value: 0.7539243664071233
            name: Spearman Euclidean
          - type: pearson_dot
            value: 0.6493790443582248
            name: Pearson Dot
          - type: spearman_dot
            value: 0.6306412644605037
            name: Spearman Dot
          - type: pearson_max
            value: 0.7622787611500085
            name: Pearson Max
          - type: spearman_max
            value: 0.7626005619606424
            name: Spearman Max
      - task:
          type: binary-classification
          name: Binary Classification
        dataset:
          name: allNLI dev
          type: allNLI-dev
        metrics:
          - type: cosine_accuracy
            value: 0.7109375
            name: Cosine Accuracy
          - type: cosine_accuracy_threshold
            value: 0.916961669921875
            name: Cosine Accuracy Threshold
          - type: cosine_f1
            value: 0.5853658536585366
            name: Cosine F1
          - type: cosine_f1_threshold
            value: 0.8279993534088135
            name: Cosine F1 Threshold
          - type: cosine_precision
            value: 0.4748201438848921
            name: Cosine Precision
          - type: cosine_recall
            value: 0.7630057803468208
            name: Cosine Recall
          - type: cosine_ap
            value: 0.5495769497490841
            name: Cosine Ap
          - type: dot_accuracy
            value: 0.671875
            name: Dot Accuracy
          - type: dot_accuracy_threshold
            value: 481.2850646972656
            name: Dot Accuracy Threshold
          - type: dot_f1
            value: 0.549165120593692
            name: Dot F1
          - type: dot_f1_threshold
            value: 381.15167236328125
            name: Dot F1 Threshold
          - type: dot_precision
            value: 0.40437158469945356
            name: Dot Precision
          - type: dot_recall
            value: 0.8554913294797688
            name: Dot Recall
          - type: dot_ap
            value: 0.45293867777170244
            name: Dot Ap
          - type: manhattan_accuracy
            value: 0.71484375
            name: Manhattan Accuracy
          - type: manhattan_accuracy_threshold
            value: 186.7671356201172
            name: Manhattan Accuracy Threshold
          - type: manhattan_f1
            value: 0.5696465696465696
            name: Manhattan F1
          - type: manhattan_f1_threshold
            value: 268.783935546875
            name: Manhattan F1 Threshold
          - type: manhattan_precision
            value: 0.4448051948051948
            name: Manhattan Precision
          - type: manhattan_recall
            value: 0.791907514450867
            name: Manhattan Recall
          - type: manhattan_ap
            value: 0.5511647333663136
            name: Manhattan Ap
          - type: euclidean_accuracy
            value: 0.71484375
            name: Euclidean Accuracy
          - type: euclidean_accuracy_threshold
            value: 8.915003776550293
            name: Euclidean Accuracy Threshold
          - type: euclidean_f1
            value: 0.574074074074074
            name: Euclidean F1
          - type: euclidean_f1_threshold
            value: 12.812746047973633
            name: Euclidean F1 Threshold
          - type: euclidean_precision
            value: 0.47876447876447875
            name: Euclidean Precision
          - type: euclidean_recall
            value: 0.7167630057803468
            name: Euclidean Recall
          - type: euclidean_ap
            value: 0.5535962824434967
            name: Euclidean Ap
          - type: max_accuracy
            value: 0.71484375
            name: Max Accuracy
          - type: max_accuracy_threshold
            value: 481.2850646972656
            name: Max Accuracy Threshold
          - type: max_f1
            value: 0.5853658536585366
            name: Max F1
          - type: max_f1_threshold
            value: 381.15167236328125
            name: Max F1 Threshold
          - type: max_precision
            value: 0.47876447876447875
            name: Max Precision
          - type: max_recall
            value: 0.8554913294797688
            name: Max Recall
          - type: max_ap
            value: 0.5535962824434967
            name: Max Ap
      - task:
          type: binary-classification
          name: Binary Classification
        dataset:
          name: Qnli dev
          type: Qnli-dev
        metrics:
          - type: cosine_accuracy
            value: 0.681640625
            name: Cosine Accuracy
          - type: cosine_accuracy_threshold
            value: 0.8160840272903442
            name: Cosine Accuracy Threshold
          - type: cosine_f1
            value: 0.6917562724014337
            name: Cosine F1
          - type: cosine_f1_threshold
            value: 0.7854001522064209
            name: Cosine F1 Threshold
          - type: cosine_precision
            value: 0.5993788819875776
            name: Cosine Precision
          - type: cosine_recall
            value: 0.8177966101694916
            name: Cosine Recall
          - type: cosine_ap
            value: 0.7109982147608755
            name: Cosine Ap
          - type: dot_accuracy
            value: 0.6484375
            name: Dot Accuracy
          - type: dot_accuracy_threshold
            value: 392.5464782714844
            name: Dot Accuracy Threshold
          - type: dot_f1
            value: 0.6688311688311689
            name: Dot F1
          - type: dot_f1_threshold
            value: 368.7878723144531
            name: Dot F1 Threshold
          - type: dot_precision
            value: 0.5421052631578948
            name: Dot Precision
          - type: dot_recall
            value: 0.8728813559322034
            name: Dot Recall
          - type: dot_ap
            value: 0.6053421534358263
            name: Dot Ap
          - type: manhattan_accuracy
            value: 0.685546875
            name: Manhattan Accuracy
          - type: manhattan_accuracy_threshold
            value: 244.63809204101562
            name: Manhattan Accuracy Threshold
          - type: manhattan_f1
            value: 0.6938053097345133
            name: Manhattan F1
          - type: manhattan_f1_threshold
            value: 295.4796142578125
            name: Manhattan F1 Threshold
          - type: manhattan_precision
            value: 0.5957446808510638
            name: Manhattan Precision
          - type: manhattan_recall
            value: 0.8305084745762712
            name: Manhattan Recall
          - type: manhattan_ap
            value: 0.7216536349653324
            name: Manhattan Ap
          - type: euclidean_accuracy
            value: 0.6875
            name: Euclidean Accuracy
          - type: euclidean_accuracy_threshold
            value: 13.026724815368652
            name: Euclidean Accuracy Threshold
          - type: euclidean_f1
            value: 0.689407540394973
            name: Euclidean F1
          - type: euclidean_f1_threshold
            value: 14.538017272949219
            name: Euclidean F1 Threshold
          - type: euclidean_precision
            value: 0.5981308411214953
            name: Euclidean Precision
          - type: euclidean_recall
            value: 0.8135593220338984
            name: Euclidean Recall
          - type: euclidean_ap
            value: 0.7181091181717016
            name: Euclidean Ap
          - type: max_accuracy
            value: 0.6875
            name: Max Accuracy
          - type: max_accuracy_threshold
            value: 392.5464782714844
            name: Max Accuracy Threshold
          - type: max_f1
            value: 0.6938053097345133
            name: Max F1
          - type: max_f1_threshold
            value: 368.7878723144531
            name: Max F1 Threshold
          - type: max_precision
            value: 0.5993788819875776
            name: Max Precision
          - type: max_recall
            value: 0.8728813559322034
            name: Max Recall
          - type: max_ap
            value: 0.7216536349653324
            name: Max Ap

SentenceTransformer based on microsoft/deberta-v3-small

This is a sentence-transformers model fine-tuned from microsoft/deberta-v3-small on the bobox/enhanced_nli-50_k dataset. It maps sentences and paragraphs to a 768-dimensional dense vector space and can be used for semantic textual similarity, semantic search, paraphrase mining, text classification, clustering, and more.

Model Details

Model Description

  • Model Type: Sentence Transformer
  • Base model: microsoft/deberta-v3-small
  • Maximum Sequence Length: 512 tokens
  • Output Dimensionality: 768 dimensions
  • Similarity Function: Cosine Similarity
  • Training Dataset:
    • bobox/enhanced_nli-50_k

Model Sources

Full Model Architecture

SentenceTransformer(
  (0): Transformer({'max_seq_length': 512, 'do_lower_case': False}) with Transformer model: DebertaV2Model 
  (1): Pooling({'word_embedding_dimension': 768, 'pooling_mode_cls_token': False, 'pooling_mode_mean_tokens': True, 'pooling_mode_max_tokens': False, 'pooling_mode_mean_sqrt_len_tokens': False, 'pooling_mode_weightedmean_tokens': False, 'pooling_mode_lasttoken': False, 'include_prompt': True})
)
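
The Pooling module above has pooling_mode_mean_tokens set to True, so a sentence embedding is the average of the token embeddings. The following is a minimal sketch of that idea in plain Python, assuming padding positions are excluded through the attention mask. The function name mean_pool and the 3-dimensional toy vectors are illustrative only; the real model works on 768-dimensional tensors.

```python
from typing import List


def mean_pool(token_embeddings: List[List[float]], attention_mask: List[int]) -> List[float]:
    """Average the token vectors whose attention-mask entry is 1."""
    dim = len(token_embeddings[0])
    kept = [vec for vec, m in zip(token_embeddings, attention_mask) if m == 1]
    count = max(len(kept), 1)  # guard against an all-padding input
    return [sum(vec[d] for vec in kept) / count for d in range(dim)]


# Toy example: three real tokens followed by one padding token.
tokens = [[1.0, 2.0, 3.0], [3.0, 2.0, 1.0], [2.0, 2.0, 2.0], [9.0, 9.0, 9.0]]
mask = [1, 1, 1, 0]
sentence_embedding = mean_pool(tokens, mask)
print(sentence_embedding)  # [2.0, 2.0, 2.0]
```

The padding vector does not influence the result because its mask entry is 0.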

Usage

Direct Usage (Sentence Transformers)

First install the Sentence Transformers library:

pip install -U sentence-transformers

Then you can load this model and run inference.

from sentence_transformers import SentenceTransformer

# Download from the 🤗 Hub
model = SentenceTransformer("bobox/DeBERTa-small-ST-UnifiedDatasets-baseline-checkpoints-tmp")
# Run inference
sentences = [
    'A man wearing a blue bow tie and a fedora hat in a car. ',
    'A man takes a photo of himself wearing a bowtie and hat',
    'County of Angus - definition of County of Angus by The Free Dictionary County of Angus - definition of County of Angus by The Free Dictionary http://www.thefreedictionary.com/County+of+Angus \xa0(ăng′gəs) n. Any of a breed of hornless beef cattle that originated in Scotland and are usually black but also occur in a red variety. Also called Black Angus. [After Angus, former county of Scotland.] Angus (ˈæŋɡəs) n (Placename) a council area of E Scotland on the North Sea: the historical county of Angus became part of Tayside region in 1975; reinstated as a unitary authority (excluding City of Dundee) in 1996. Administrative centre: Forfar. Pop: 107 520 (2003 est). Area: 2181 sq km (842 sq miles) An•gus',
]
embeddings = model.encode(sentences)
print(embeddings.shape)
# [3, 768]

# Get the similarity scores for the embeddings
similarities = model.similarity(embeddings, embeddings)
print(similarities.shape)
# [3, 3]
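
The similarity scores above are cosine similarities, the similarity function listed in the model description. As a rough illustration of what such a score means and how it could be turned into a yes/no decision, here is a small sketch in plain Python. The threshold 0.7854 is the cosine_f1_threshold reported for Qnli-dev in the metadata; whether it suits other data is an assumption, and the helper names and toy vectors are hypothetical.

```python
import math
from typing import Sequence


def cosine_similarity(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine of the angle between two vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


# Assumption: reuse the Qnli-dev cosine F1 threshold as a decision boundary.
THRESHOLD = 0.7854


def is_match(a: Sequence[float], b: Sequence[float], threshold: float = THRESHOLD) -> bool:
    """Label a pair as similar when its cosine score reaches the threshold."""
    return cosine_similarity(a, b) >= threshold


# Toy vectors standing in for real 768-dimensional embeddings.
u = [1.0, 0.0, 1.0]
v = [1.0, 0.1, 0.9]
w = [0.0, 1.0, 0.0]
print(is_match(u, v))  # True: the vectors point in almost the same direction
print(is_match(u, w))  # False: the vectors are orthogonal
```

In practice the threshold would be tuned on a labeled development set for the task at hand.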

Evaluation

Metrics

Semantic Similarity

  • Dataset: sts-test

Metric Value
pearson_cosine 0.7489
spearman_cosine 0.7626
pearson_manhattan 0.7592
spearman_manhattan 0.7478
pearson_euclidean 0.7623
spearman_euclidean 0.7539
pearson_dot 0.6494
spearman_dot 0.6306
pearson_max 0.7623
spearman_max 0.7626
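
The pearson_* rows measure linear correlation between the model's similarity scores and the gold scores, while the spearman_* rows measure correlation between their ranks. The sketch below shows the difference on made-up numbers (not taken from sts-test). It is a simplified illustration in plain Python and assumes there are no tied values.

```python
import math
from typing import List


def pearson(x: List[float], y: List[float]) -> float:
    """Linear correlation coefficient of two equally long sequences."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    std_x = math.sqrt(sum((a - mean_x) ** 2 for a in x))
    std_y = math.sqrt(sum((b - mean_y) ** 2 for b in y))
    return cov / (std_x * std_y)


def ranks(values: List[float]) -> List[float]:
    """Rank 1 is the smallest value. Assumes no ties."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    result = [0.0] * len(values)
    for rank, idx in enumerate(order, start=1):
        result[idx] = float(rank)
    return result


def spearman(x: List[float], y: List[float]) -> float:
    """Spearman correlation: Pearson correlation of the ranks."""
    return pearson(ranks(x), ranks(y))


# Hypothetical gold scores and predicted cosine similarities.
gold = [0.0, 1.0, 2.5, 4.0, 5.0]
predicted = [0.10, 0.20, 0.25, 0.60, 0.95]

print(spearman(gold, predicted))  # close to 1: the ordering is identical
print(pearson(gold, predicted))   # below 1: the relationship is not linear
```

Spearman is often preferred for embedding models because only the ordering of pairs matters for retrieval and ranking.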

Binary Classification

  • Dataset: allNLI-dev

Metric Value
cosine_accuracy 0.7109
cosine_accuracy_threshold 0.917
cosine_f1 0.5854
cosine_f1_threshold 0.828
cosine_precision 0.4748
cosine_recall 0.763
cosine_ap 0.5496
dot_accuracy 0.6719
dot_accuracy_threshold 481.2851
dot_f1 0.5492
dot_f1_threshold 381.1517
dot_precision 0.4044
dot_recall 0.8555
dot_ap 0.4529
manhattan_accuracy 0.7148
manhattan_accuracy_threshold 186.7671
manhattan_f1 0.5696
manhattan_f1_threshold 268.7839
manhattan_precision 0.4448
manhattan_recall 0.7919
manhattan_ap 0.5512
euclidean_accuracy 0.7148
euclidean_accuracy_threshold 8.915
euclidean_f1 0.5741
euclidean_f1_threshold 12.8127
euclidean_precision 0.4788
euclidean_recall 0.7168
euclidean_ap 0.5536
max_accuracy 0.7148
max_accuracy_threshold 481.2851
max_f1 0.5854
max_f1_threshold 381.1517
max_precision 0.4788
max_recall 0.8555
max_ap 0.5536

Binary Classification

  • Dataset: Qnli-dev

Metric Value
cosine_accuracy 0.6816
cosine_accuracy_threshold 0.8161
cosine_f1 0.6918
cosine_f1_threshold 0.7854
cosine_precision 0.5994
cosine_recall 0.8178
cosine_ap 0.711
dot_accuracy 0.6484
dot_accuracy_threshold 392.5465
dot_f1 0.6688
dot_f1_threshold 368.7879
dot_precision 0.5421
dot_recall 0.8729
dot_ap 0.6053
manhattan_accuracy 0.6855
manhattan_accuracy_threshold 244.6381
manhattan_f1 0.6938
manhattan_f1_threshold 295.4796
manhattan_precision 0.5957
manhattan_recall 0.8305
manhattan_ap 0.7217
euclidean_accuracy 0.6875
euclidean_accuracy_threshold 13.0267
euclidean_f1 0.6894
euclidean_f1_threshold 14.538
euclidean_precision 0.5981
euclidean_recall 0.8136
euclidean_ap 0.7181
max_accuracy 0.6875
max_accuracy_threshold 392.5465
max_f1 0.6938
max_f1_threshold 368.7879
max_precision 0.5994
max_recall 0.8729
max_ap 0.7217
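
As a consistency check on the binary classification tables, F1 is the harmonic mean of precision and recall. The sketch below recomputes cosine_f1 for both evaluation sets from the full-precision cosine_precision and cosine_recall values in the metadata. That the reported precision and recall are taken at the F1 threshold is an inference from the fact that they reproduce the reported F1.

```python
def f1_score(precision: float, recall: float) -> float:
    """Harmonic mean of precision and recall."""
    return 2 * precision * recall / (precision + recall)


# Full-precision values copied from the model-index metadata.
allnli_f1 = f1_score(0.4748201438848921, 0.7630057803468208)
qnli_f1 = f1_score(0.5993788819875776, 0.8177966101694916)

print(round(allnli_f1, 4))  # 0.5854, matches cosine_f1 for allNLI-dev
print(round(qnli_f1, 4))    # 0.6918, matches cosine_f1 for Qnli-dev
```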

Training Details

Training Dataset

bobox/enhanced_nli-50_k

  • Dataset: bobox/enhanced_nli-50_k
  • Size: 116,445 training samples
  • Columns: sentence1 and sentence2
  • Approximate statistics based on the first 1000 samples:
    • sentence1 (type: string): min 4 tokens, mean 33.67 tokens, max 338 tokens
    • sentence2 (type: string): min 2 tokens, mean 51.48 tokens, max 512 tokens
  • Samples:
    • sentence1: who is darnell from my name is earl
      sentence2: Eddie Steeples Eddie Steeples (born November 25, 1973)[1] is an American actor known for his roles as the "Rubberband Man" in an advertising campaign for OfficeMax, and as Darnell Turner on the NBC sitcom My Name Is Earl.
    • sentence1: Ferrell and the Chili Peppers toured together in 2013 .
      sentence2: Ferrell and the Chili Peppers wrapped up I 'm With You World Tour in April 2013 .
    • sentence1: Cells have four cycles.
      sentence2: How many cycles do cells have?
  • Loss: CachedGISTEmbedLoss with these parameters:
    {'guide': SentenceTransformer(
      (0): Transformer({'max_seq_length': 512, 'do_lower_case': False}) with Transformer model: BertModel 
      (1): Pooling({'word_embedding_dimension': 768, 'pooling_mode_cls_token': True, 'pooling_mode_mean_tokens': False, 'pooling_mode_max_tokens': False, 'pooling_mode_mean_sqrt_len_tokens': False, 'pooling_mode_weightedmean_tokens': False, 'pooling_mode_lasttoken': False, 'include_prompt': True})
      (2): Normalize()
    ), 'temperature': 0.025}
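
To give a feel for what a guide model and a temperature do in a GISTEmbed-style loss, here is a heavily simplified sketch in plain Python. It assumes the core idea is: for each anchor, in-batch candidates that the guide scores higher than the labeled positive are treated as likely false negatives and dropped, and the remaining scores go through a temperature-scaled softmax cross-entropy. The function name is hypothetical, the similarity matrices are toy values, and the library implementation additionally compares anchor-anchor and positive-positive pairs and uses gradient caching for large batches.

```python
import math
from typing import List


def guided_contrastive_loss(
    student_sim: List[List[float]],
    guide_sim: List[List[float]],
    temperature: float = 0.025,
) -> float:
    """Mean cross-entropy over anchors; entry [i][j] compares anchor i with positive j."""
    n = len(student_sim)
    total = 0.0
    for i in range(n):
        guide_positive = guide_sim[i][i]
        logits = []
        for j in range(n):
            # Drop candidates the guide rates above the labeled positive.
            if j != i and guide_sim[i][j] > guide_positive:
                continue
            logits.append(student_sim[i][j] / temperature)
        # Numerically stable log-sum-exp over the kept candidates.
        peak = max(logits)
        log_denominator = peak + math.log(sum(math.exp(v - peak) for v in logits))
        total += log_denominator - student_sim[i][i] / temperature
    return total / n


# Toy batch of two (anchor, positive) pairs.
student = [[0.9, 0.8], [0.3, 0.7]]
guide_masking = [[0.8, 0.85], [0.2, 0.9]]   # guide flags candidate 1 for anchor 0
guide_no_mask = [[0.9, 0.1], [0.1, 0.9]]    # guide flags nothing

masked_loss = guided_contrastive_loss(student, guide_masking)
unmasked_loss = guided_contrastive_loss(student, guide_no_mask)
print(masked_loss < unmasked_loss)  # True: the flagged negative no longer adds loss
```

A low temperature such as 0.025 sharpens the softmax, so small differences in similarity translate into large differences in loss.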
    

Evaluation Dataset

bobox/enhanced_nli-50_k

  • Dataset: bobox/enhanced_nli-50_k
  • Size: 1,506 evaluation samples
  • Columns: sentence1 and sentence2
  • Approximate statistics based on the first 1000 samples:
    • sentence1 (type: string): min 3 tokens, mean 32.36 tokens, max 341 tokens
    • sentence2 (type: string): min 2 tokens, mean 61.99 tokens, max 431 tokens
  • Samples:
    • sentence1: Interestingly, snakes use their forked tongues to smell.
      sentence2: Snakes use their tongue to smell things.
    • sentence1: Soil is a renewable resource that can take thousand of years to form.
      sentence2: What is a renewable resource that can take thousand of years to form?
    • sentence1: As of March 22 , there were more than 321,000 cases with over 13,600 deaths and more than 96,000 recoveries reported worldwide .
      sentence2: As of 22 March , more than 321,000 cases of COVID-19 have been reported in over 180 countries and territories , resulting in more than 13,600 deaths and 96,000 recoveries .
  • Loss: CachedGISTEmbedLoss with these parameters:
    {'guide': SentenceTransformer(
      (0): Transformer({'max_seq_length': 512, 'do_lower_case': False}) with Transformer model: BertModel 
      (1): Pooling({'word_embedding_dimension': 768, 'pooling_mode_cls_token': True, 'pooling_mode_mean_tokens': False, 'pooling_mode_max_tokens': False, 'pooling_mode_mean_sqrt_len_tokens': False, 'pooling_mode_weightedmean_tokens': False, 'pooling_mode_lasttoken': False, 'include_prompt': True})
      (2): Normalize()
    ), 'temperature': 0.025}
    

Training Hyperparameters

Non-Default Hyperparameters

  • eval_strategy: steps
  • per_device_train_batch_size: 640
  • per_device_eval_batch_size: 128
  • learning_rate: 3.75e-05
  • weight_decay: 0.0005
  • lr_scheduler_type: cosine_with_min_lr
  • lr_scheduler_kwargs: {'num_cycles': 0.5, 'min_lr': 7.499999999999999e-06}
  • warmup_ratio: 0.33
  • save_safetensors: False
  • fp16: True
  • push_to_hub: True
  • hub_model_id: bobox/DeBERTa-small-ST-UnifiedDatasets-baseline-checkpoints-tmp
  • hub_strategy: all_checkpoints
  • batch_sampler: no_duplicates
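
The learning-rate settings above combine a linear warmup with a cosine decay that bottoms out at min_lr. The sketch below reproduces that shape in plain Python, following the formula commonly used for cosine schedules with a minimum rate. The total of 546 steps is an assumption: 182 steps per epoch (suggested by the epoch column of the training logs) times num_train_epochs: 3 from the full hyperparameter list. The function name is illustrative.

```python
import math

BASE_LR = 3.75e-05
MIN_LR = 7.499999999999999e-06
WARMUP_RATIO = 0.33
NUM_CYCLES = 0.5

# Assumption: 182 steps per epoch for 3 epochs.
TOTAL_STEPS = 546
WARMUP_STEPS = math.ceil(TOTAL_STEPS * WARMUP_RATIO)  # 181


def learning_rate(step: int) -> float:
    """Linear warmup to BASE_LR, then cosine decay down to MIN_LR."""
    if step < WARMUP_STEPS:
        return BASE_LR * step / max(1, WARMUP_STEPS)
    progress = (step - WARMUP_STEPS) / max(1, TOTAL_STEPS - WARMUP_STEPS)
    factor = 0.5 * (1.0 + math.cos(math.pi * NUM_CYCLES * 2.0 * progress))
    min_rate = MIN_LR / BASE_LR
    return BASE_LR * (factor * (1.0 - min_rate) + min_rate)


for step in (0, 110, WARMUP_STEPS, TOTAL_STEPS):
    print(step, learning_rate(step))
```

Under these assumptions, a checkpoint at step 110 would still be inside the warmup phase, before the learning rate has reached its peak.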

All Hyperparameters

  • overwrite_output_dir: False
  • do_predict: False
  • eval_strategy: steps
  • prediction_loss_only: True
  • per_device_train_batch_size: 640
  • per_device_eval_batch_size: 128
  • per_gpu_train_batch_size: None
  • per_gpu_eval_batch_size: None
  • gradient_accumulation_steps: 1
  • eval_accumulation_steps: None
  • torch_empty_cache_steps: None
  • learning_rate: 3.75e-05
  • weight_decay: 0.0005
  • adam_beta1: 0.9
  • adam_beta2: 0.999
  • adam_epsilon: 1e-08
  • max_grad_norm: 1.0
  • num_train_epochs: 3
  • max_steps: -1
  • lr_scheduler_type: cosine_with_min_lr
  • lr_scheduler_kwargs: {'num_cycles': 0.5, 'min_lr': 7.499999999999999e-06}
  • warmup_ratio: 0.33
  • warmup_steps: 0
  • log_level: passive
  • log_level_replica: warning
  • log_on_each_node: True
  • logging_nan_inf_filter: True
  • save_safetensors: False
  • save_on_each_node: False
  • save_only_model: False
  • restore_callback_states_from_checkpoint: False
  • no_cuda: False
  • use_cpu: False
  • use_mps_device: False
  • seed: 42
  • data_seed: None
  • jit_mode_eval: False
  • use_ipex: False
  • bf16: False
  • fp16: True
  • fp16_opt_level: O1
  • half_precision_backend: auto
  • bf16_full_eval: False
  • fp16_full_eval: False
  • tf32: None
  • local_rank: 0
  • ddp_backend: None
  • tpu_num_cores: None
  • tpu_metrics_debug: False
  • debug: []
  • dataloader_drop_last: False
  • dataloader_num_workers: 0
  • dataloader_prefetch_factor: None
  • past_index: -1
  • disable_tqdm: False
  • remove_unused_columns: True
  • label_names: None
  • load_best_model_at_end: False
  • ignore_data_skip: False
  • fsdp: []
  • fsdp_min_num_params: 0
  • fsdp_config: {'min_num_params': 0, 'xla': False, 'xla_fsdp_v2': False, 'xla_fsdp_grad_ckpt': False}
  • fsdp_transformer_layer_cls_to_wrap: None
  • accelerator_config: {'split_batches': False, 'dispatch_batches': None, 'even_batches': True, 'use_seedable_sampler': True, 'non_blocking': False, 'gradient_accumulation_kwargs': None}
  • deepspeed: None
  • label_smoothing_factor: 0.0
  • optim: adamw_torch
  • optim_args: None
  • adafactor: False
  • group_by_length: False
  • length_column_name: length
  • ddp_find_unused_parameters: None
  • ddp_bucket_cap_mb: None
  • ddp_broadcast_buffers: False
  • dataloader_pin_memory: True
  • dataloader_persistent_workers: False
  • skip_memory_metrics: True
  • use_legacy_prediction_loop: False
  • push_to_hub: True
  • resume_from_checkpoint: None
  • hub_model_id: bobox/DeBERTa-small-ST-UnifiedDatasets-baseline-checkpoints-tmp
  • hub_strategy: all_checkpoints
  • hub_private_repo: False
  • hub_always_push: False
  • gradient_checkpointing: False
  • gradient_checkpointing_kwargs: None
  • include_inputs_for_metrics: False
  • eval_do_concat_batches: True
  • fp16_backend: auto
  • push_to_hub_model_id: None
  • push_to_hub_organization: None
  • mp_parameters:
  • auto_find_batch_size: False
  • full_determinism: False
  • torchdynamo: None
  • ray_scope: last
  • ddp_timeout: 1800
  • torch_compile: False
  • torch_compile_backend: None
  • torch_compile_mode: None
  • dispatch_batches: None
  • split_batches: None
  • include_tokens_per_second: False
  • include_num_input_tokens_seen: False
  • neftune_noise_alpha: None
  • optim_target_modules: None
  • batch_eval_metrics: False
  • eval_on_start: False
  • eval_use_gather_object: False
  • batch_sampler: no_duplicates
  • multi_dataset_batch_sampler: proportional

Training Logs

Epoch Step Training Loss loss Qnli-dev_max_ap allNLI-dev_max_ap sts-test_spearman_cosine
0.0055 1 8.8159 - - - -
0.0110 2 9.1259 - - - -
0.0165 3 8.9017 - - - -
0.0220 4 9.1969 - - - -
0.0275 5 9.3716 1.3746 0.6067 0.3706 0.1943
0.0330 6 9.0425 - - - -
0.0385 7 8.7309 - - - -
0.0440 8 9.0123 - - - -
0.0495 9 8.8095 - - - -
0.0549 10 9.3194 1.3227 0.6089 0.3721 0.1976
0.0604 11 8.9873 - - - -
0.0659 12 8.5575 - - - -
0.0714 13 8.8096 - - - -
0.0769 14 8.0996 - - - -
0.0824 15 8.1942 1.2244 0.6140 0.3743 0.2085
0.0879 16 8.1654 - - - -
0.0934 17 7.7336 - - - -
0.0989 18 7.9535 - - - -
0.1044 19 7.9322 - - - -
0.1099 20 7.6812 1.1301 0.6199 0.3790 0.2233
0.1154 21 7.551 - - - -
0.1209 22 7.3788 - - - -
0.1264 23 7.1746 - - - -
0.1319 24 7.1849 - - - -
0.1374 25 7.1085 1.0723 0.6195 0.3852 0.2357
0.1429 26 7.3926 - - - -
0.1484 27 7.1817 - - - -
0.1538 28 7.239 - - - -
0.1593 29 7.0023 - - - -
0.1648 30 6.9898 1.0282 0.6215 0.3898 0.2477
0.1703 31 6.9776 - - - -
0.1758 32 6.8088 - - - -
0.1813 33 6.8916 - - - -
0.1868 34 6.6931 - - - -
0.1923 35 6.5707 0.9846 0.6253 0.3952 0.2608
0.1978 36 6.6231 - - - -
0.2033 37 6.4951 - - - -
0.2088 38 6.4607 - - - -
0.2143 39 6.4504 - - - -
0.2198 40 6.3649 0.9314 0.6299 0.4041 0.2738
0.2253 41 6.2244 - - - -
0.2308 42 6.007 - - - -
0.2363 43 5.977 - - - -
0.2418 44 6.0748 - - - -
0.2473 45 5.7946 0.8549 0.6404 0.4116 0.2847
0.2527 46 5.8751 - - - -
0.2582 47 5.543 - - - -
0.2637 48 5.5511 - - - -
0.2692 49 5.411 - - - -
0.2747 50 5.378 0.7943 0.6557 0.4159 0.2866
0.2802 51 5.3831 - - - -
0.2857 52 4.9729 - - - -
0.2912 53 5.0425 - - - -
0.2967 54 4.9446 - - - -
0.3022 55 4.9288 0.7178 0.6679 0.4273 0.3132
0.3077 56 4.8434 - - - -
0.3132 57 4.6914 - - - -
0.3187 58 4.5254 - - - -
0.3242 59 4.6734 - - - -
0.3297 60 4.2421 0.6202 0.6684 0.4423 0.3580
0.3352 61 4.2234 - - - -
0.3407 62 4.0225 - - - -
0.3462 63 4.0034 - - - -
0.3516 64 3.994 - - - -
0.3571 65 3.651 0.5489 0.6750 0.4569 0.4014
0.3626 66 3.9308 - - - -
0.3681 67 3.8694 - - - -
0.3736 68 3.7159 - - - -
0.3791 69 3.6499 - - - -
0.3846 70 3.4749 0.4923 0.6734 0.4701 0.4465
0.3901 71 3.3356 - - - -
0.3956 72 3.4768 - - - -
0.4011 73 3.2748 - - - -
0.4066 74 3.2789 - - - -
0.4121 75 2.9815 0.4422 0.6759 0.4747 0.4924
0.4176 76 3.2356 - - - -
0.4231 77 2.946 - - - -
0.4286 78 2.8888 - - - -
0.4341 79 2.8992 - - - -
0.4396 80 2.9901 0.4040 0.6786 0.4781 0.5478
0.4451 81 2.6608 - - - -
0.4505 82 2.831 - - - -
0.4560 83 2.5503 - - - -
0.4615 84 2.8576 - - - -
0.4670 85 2.5726 0.3711 0.6858 0.4898 0.6134
0.4725 86 2.7197 - - - -
0.4780 87 2.5123 - - - -
0.4835 88 2.553 - - - -
0.4890 89 2.4862 - - - -
0.4945 90 2.491 0.3450 0.6997 0.5077 0.6668
0.5 91 2.3648 - - - -
0.5055 92 2.3788 - - - -
0.5110 93 2.3758 - - - -
0.5165 94 2.3319 - - - -
0.5220 95 2.2336 0.3238 0.7048 0.5252 0.7018
0.5275 96 2.3036 - - - -
0.5330 97 2.3034 - - - -
0.5385 98 2.207 - - - -
0.5440 99 2.1732 - - - -
0.5495 100 2.1743 0.3036 0.7091 0.5418 0.7272
0.5549 101 2.086 - - - -
0.5604 102 2.0223 - - - -
0.5659 103 2.0878 - - - -
0.5714 104 1.9475 - - - -
0.5769 105 2.1524 0.2853 0.7159 0.5499 0.7489
0.5824 106 1.9393 - - - -
0.5879 107 2.1308 - - - -
0.5934 108 1.9469 - - - -
0.5989 109 1.8683 - - - -
0.6044 110 1.8167 0.2702 0.7217 0.5536 0.7626
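
The Epoch column in the log above can be related to the Step column with a little arithmetic. The sketch below assumes a single device, gradient_accumulation_steps of 1, and that the last partial batch is kept, so the number of steps per epoch is the training set size divided by the batch size, rounded up. The no_duplicates batch sampler could change the exact count in practice.

```python
import math

TRAIN_SAMPLES = 116_445
BATCH_SIZE = 640

# One optimizer step per batch; the final partial batch counts as a step.
steps_per_epoch = math.ceil(TRAIN_SAMPLES / BATCH_SIZE)


def epoch_fraction(step: int) -> float:
    """Fraction of an epoch completed after the given step, rounded like the log."""
    return round(step / steps_per_epoch, 4)


print(steps_per_epoch)      # 182
print(epoch_fraction(1))    # 0.0055
print(epoch_fraction(110))  # 0.6044
```

These values agree with the first and last rows of the log, which suggests the checkpoint covers roughly 60% of the first epoch.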

Framework Versions

  • Python: 3.10.14
  • Sentence Transformers: 3.0.1
  • Transformers: 4.44.0
  • PyTorch: 2.4.0
  • Accelerate: 0.33.0
  • Datasets: 2.21.0
  • Tokenizers: 0.19.1

Citation

BibTeX

Sentence Transformers

@inproceedings{reimers-2019-sentence-bert,
    title = "Sentence-BERT: Sentence Embeddings using Siamese BERT-Networks",
    author = "Reimers, Nils and Gurevych, Iryna",
    booktitle = "Proceedings of the 2019 Conference on Empirical Methods in Natural Language Processing",
    month = "11",
    year = "2019",
    publisher = "Association for Computational Linguistics",
    url = "https://arxiv.org/abs/1908.10084",
}