metadata
base_model: microsoft/deberta-v3-small
datasets: []
language: []
library_name: sentence-transformers
metrics:
- pearson_cosine
- spearman_cosine
- pearson_manhattan
- spearman_manhattan
- pearson_euclidean
- spearman_euclidean
- pearson_dot
- spearman_dot
- pearson_max
- spearman_max
- cosine_accuracy
- cosine_accuracy_threshold
- cosine_f1
- cosine_f1_threshold
- cosine_precision
- cosine_recall
- cosine_ap
- dot_accuracy
- dot_accuracy_threshold
- dot_f1
- dot_f1_threshold
- dot_precision
- dot_recall
- dot_ap
- manhattan_accuracy
- manhattan_accuracy_threshold
- manhattan_f1
- manhattan_f1_threshold
- manhattan_precision
- manhattan_recall
- manhattan_ap
- euclidean_accuracy
- euclidean_accuracy_threshold
- euclidean_f1
- euclidean_f1_threshold
- euclidean_precision
- euclidean_recall
- euclidean_ap
- max_accuracy
- max_accuracy_threshold
- max_f1
- max_f1_threshold
- max_precision
- max_recall
- max_ap
pipeline_tag: sentence-similarity
tags:
- sentence-transformers
- sentence-similarity
- feature-extraction
- generated_from_trainer
- dataset_size:116445
- loss:CachedGISTEmbedLoss
widget:
- source_sentence: what is the main purpose of the brain
sentences:
- >-
Brain Physiologically, the function of the brain is to exert centralized
control over the other organs of the body. The brain acts on the rest of
the body both by generating patterns of muscle activity and by driving
the secretion of chemicals called hormones. This centralized control
allows rapid and coordinated responses to changes in the environment.
Some basic types of responsiveness such as reflexes can be mediated by
the spinal cord or peripheral ganglia, but sophisticated purposeful
control of behavior based on complex sensory input requires the
information integrating capabilities of a centralized brain.
- >-
How do scientists know that some mountains were once at the bottom of an
ocean?
- >-
The Smiths Wiki | Fandom powered by Wikia Share Ad blocker interference
detected! Wikia is a free-to-use site that makes money from advertising.
We have a modified experience for viewers using ad blockers Wikia is not
accessible if you’ve made further modifications. Remove the custom ad
blocker rule(s) and the page will load as expected. The Smiths were an
English rock band formed in Manchester in 1982. Based on the songwriting
partnership of Morrissey (vocals) and Johnny Marr (guitar), the band
also included Andy Rourke (bass), Mike Joyce (drums) and for a brief
time Craig Gannon (rhythm guitar). Critics have called them one of the
most important alternative rock bands to emerge from the British
independent music scene of the 1980s,and the group has had major
influence on subsequent artists. Morrissey's lovelorn tales of
alienation found an audience amongst youth culture bored by the
ubiquitous synthesiser-pop bands of the early 1980s, while Marr's
complex melodies helped return guitar-based music to popularity. The
group were signed to the independent record label Rough Trade Records ,
for whom they released four studio albums and several compilations, as
well as numerous non-LP singles. Although they had limited commercial
success outside the UK while they were still together, and never
released a single that charted higher than number 10 in their home
country, The Smiths won a growing following, and they remain cult and
commercial favourites. The band broke up in 1987 amid disagreements
between Morrissey and Marr and has turned down several offers to reform.
Welcome to The Smiths Wiki
- source_sentence: There were 29 Muslims fatalities in the Cave of the Patriarchs massacre .
sentences:
- >-
In August , after the end of the war in June 1902 , Higgins Southampton
left the `` SSBavarian '' and returned to Cape Town the following month
.
- >-
Between 29 and 52 Muslims were killed and more than 100 others wounded .
[ Settlers remember gunman Goldstein ; Hebron riots continue ] .
- >-
29 Muslims were killed and more than 100 others wounded . [ Settlers
remember gunman Goldstein ; Hebron riots continue ] .
- source_sentence: are tabby cats all male?
sentences:
- >-
Did you know orange tabby cats are typically male? In fact, up to 80
percent of orange tabbies are male, making orange female cats a bit of a
rarity. According to the BBC's Focus Magazine, the ginger gene in cats
works a little differently compared to humans; it is on the X
chromosome.
- >-
Shawnee Trails Council was formed from the merger of the Four Rivers
Council and the Audubon Council .
- |
A picture of a modern looking kitchen area
- source_sentence: >-
Aamir Khan agreed to act immediately after reading Mehra 's screenplay in
`` Rang De Basanti '' .
sentences:
- >-
Chris Rea — Free listening, videos, concerts, stats and photos at
Last.fm singer-songwriter Christopher Anton Rea (pronounced Ree-ah),
born 4 March 1951, is a singer, songwriter, and guitarist from
Middlesbrough, England. Rea's recording career began in 1978. Although
he almost immediately had a US hit single with "Fool (If You Think It's
Over)", Rea's initial focus was on continental Europe, releasing eight
albums in the 1980s. It wasn't until 1985's Shamrock Diaries and the
songs "Stainsby Girls" and "Josephine," that UK audiences began to take
notice of him. Follow up albums… read more
- "Healthy Fast Food Meal No. 1. Grilled Chicken Sandwich and Fruit Cup (Chick-fil-A) Several fast food chains offer a grilled chicken sandwich. The trick is ordering it without mayo or creamy sauce, and making sure it’s served with a whole grain bun."
- >-
Aamir Khan agreed to act in `` Rang De Basanti '' immediately after
reading Mehra 's script .
- source_sentence: 'A man wearing a blue bow tie and a fedora hat in a car. '
sentences:
- A man takes a photo of himself wearing a bowtie and hat
- Scientists explain the world based on what?
- "County of Angus - definition of County of Angus by The Free Dictionary County of Angus - definition of County of Angus by The Free Dictionary http://www.thefreedictionary.com/County+of+Angus \_(ăng′gəs) n. Any of a breed of hornless beef cattle that originated in Scotland and are usually black but also occur in a red variety. Also called Black Angus. [After Angus, former county of Scotland.] Angus (ˈæŋɡəs) n (Placename) a council area of E Scotland on the North Sea: the historical county of Angus became part of Tayside region in 1975; reinstated as a unitary authority (excluding City of Dundee) in 1996. Administrative centre: Forfar. Pop: 107 520 (2003 est). Area: 2181 sq km (842 sq miles) An•gus"
model-index:
- name: SentenceTransformer based on microsoft/deberta-v3-small
results:
- task:
type: semantic-similarity
name: Semantic Similarity
dataset:
name: sts test
type: sts-test
metrics:
- type: pearson_cosine
value: 0.7489263204555723
name: Pearson Cosine
- type: spearman_cosine
value: 0.7626005619606424
name: Spearman Cosine
- type: pearson_manhattan
value: 0.7591990025704353
name: Pearson Manhattan
- type: spearman_manhattan
value: 0.7477882076989188
name: Spearman Manhattan
- type: pearson_euclidean
value: 0.7622787611500085
name: Pearson Euclidean
- type: spearman_euclidean
value: 0.7539243664071233
name: Spearman Euclidean
- type: pearson_dot
value: 0.6493790443582248
name: Pearson Dot
- type: spearman_dot
value: 0.6306412644605037
name: Spearman Dot
- type: pearson_max
value: 0.7622787611500085
name: Pearson Max
- type: spearman_max
value: 0.7626005619606424
name: Spearman Max
- task:
type: binary-classification
name: Binary Classification
dataset:
name: allNLI dev
type: allNLI-dev
metrics:
- type: cosine_accuracy
value: 0.7109375
name: Cosine Accuracy
- type: cosine_accuracy_threshold
value: 0.916961669921875
name: Cosine Accuracy Threshold
- type: cosine_f1
value: 0.5853658536585366
name: Cosine F1
- type: cosine_f1_threshold
value: 0.8279993534088135
name: Cosine F1 Threshold
- type: cosine_precision
value: 0.4748201438848921
name: Cosine Precision
- type: cosine_recall
value: 0.7630057803468208
name: Cosine Recall
- type: cosine_ap
value: 0.5495769497490841
name: Cosine Ap
- type: dot_accuracy
value: 0.671875
name: Dot Accuracy
- type: dot_accuracy_threshold
value: 481.2850646972656
name: Dot Accuracy Threshold
- type: dot_f1
value: 0.549165120593692
name: Dot F1
- type: dot_f1_threshold
value: 381.15167236328125
name: Dot F1 Threshold
- type: dot_precision
value: 0.40437158469945356
name: Dot Precision
- type: dot_recall
value: 0.8554913294797688
name: Dot Recall
- type: dot_ap
value: 0.45293867777170244
name: Dot Ap
- type: manhattan_accuracy
value: 0.71484375
name: Manhattan Accuracy
- type: manhattan_accuracy_threshold
value: 186.7671356201172
name: Manhattan Accuracy Threshold
- type: manhattan_f1
value: 0.5696465696465696
name: Manhattan F1
- type: manhattan_f1_threshold
value: 268.783935546875
name: Manhattan F1 Threshold
- type: manhattan_precision
value: 0.4448051948051948
name: Manhattan Precision
- type: manhattan_recall
value: 0.791907514450867
name: Manhattan Recall
- type: manhattan_ap
value: 0.5511647333663136
name: Manhattan Ap
- type: euclidean_accuracy
value: 0.71484375
name: Euclidean Accuracy
- type: euclidean_accuracy_threshold
value: 8.915003776550293
name: Euclidean Accuracy Threshold
- type: euclidean_f1
value: 0.574074074074074
name: Euclidean F1
- type: euclidean_f1_threshold
value: 12.812746047973633
name: Euclidean F1 Threshold
- type: euclidean_precision
value: 0.47876447876447875
name: Euclidean Precision
- type: euclidean_recall
value: 0.7167630057803468
name: Euclidean Recall
- type: euclidean_ap
value: 0.5535962824434967
name: Euclidean Ap
- type: max_accuracy
value: 0.71484375
name: Max Accuracy
- type: max_accuracy_threshold
value: 481.2850646972656
name: Max Accuracy Threshold
- type: max_f1
value: 0.5853658536585366
name: Max F1
- type: max_f1_threshold
value: 381.15167236328125
name: Max F1 Threshold
- type: max_precision
value: 0.47876447876447875
name: Max Precision
- type: max_recall
value: 0.8554913294797688
name: Max Recall
- type: max_ap
value: 0.5535962824434967
name: Max Ap
- task:
type: binary-classification
name: Binary Classification
dataset:
name: Qnli dev
type: Qnli-dev
metrics:
- type: cosine_accuracy
value: 0.681640625
name: Cosine Accuracy
- type: cosine_accuracy_threshold
value: 0.8160840272903442
name: Cosine Accuracy Threshold
- type: cosine_f1
value: 0.6917562724014337
name: Cosine F1
- type: cosine_f1_threshold
value: 0.7854001522064209
name: Cosine F1 Threshold
- type: cosine_precision
value: 0.5993788819875776
name: Cosine Precision
- type: cosine_recall
value: 0.8177966101694916
name: Cosine Recall
- type: cosine_ap
value: 0.7109982147608755
name: Cosine Ap
- type: dot_accuracy
value: 0.6484375
name: Dot Accuracy
- type: dot_accuracy_threshold
value: 392.5464782714844
name: Dot Accuracy Threshold
- type: dot_f1
value: 0.6688311688311689
name: Dot F1
- type: dot_f1_threshold
value: 368.7878723144531
name: Dot F1 Threshold
- type: dot_precision
value: 0.5421052631578948
name: Dot Precision
- type: dot_recall
value: 0.8728813559322034
name: Dot Recall
- type: dot_ap
value: 0.6053421534358263
name: Dot Ap
- type: manhattan_accuracy
value: 0.685546875
name: Manhattan Accuracy
- type: manhattan_accuracy_threshold
value: 244.63809204101562
name: Manhattan Accuracy Threshold
- type: manhattan_f1
value: 0.6938053097345133
name: Manhattan F1
- type: manhattan_f1_threshold
value: 295.4796142578125
name: Manhattan F1 Threshold
- type: manhattan_precision
value: 0.5957446808510638
name: Manhattan Precision
- type: manhattan_recall
value: 0.8305084745762712
name: Manhattan Recall
- type: manhattan_ap
value: 0.7216536349653324
name: Manhattan Ap
- type: euclidean_accuracy
value: 0.6875
name: Euclidean Accuracy
- type: euclidean_accuracy_threshold
value: 13.026724815368652
name: Euclidean Accuracy Threshold
- type: euclidean_f1
value: 0.689407540394973
name: Euclidean F1
- type: euclidean_f1_threshold
value: 14.538017272949219
name: Euclidean F1 Threshold
- type: euclidean_precision
value: 0.5981308411214953
name: Euclidean Precision
- type: euclidean_recall
value: 0.8135593220338984
name: Euclidean Recall
- type: euclidean_ap
value: 0.7181091181717016
name: Euclidean Ap
- type: max_accuracy
value: 0.6875
name: Max Accuracy
- type: max_accuracy_threshold
value: 392.5464782714844
name: Max Accuracy Threshold
- type: max_f1
value: 0.6938053097345133
name: Max F1
- type: max_f1_threshold
value: 368.7878723144531
name: Max F1 Threshold
- type: max_precision
value: 0.5993788819875776
name: Max Precision
- type: max_recall
value: 0.8728813559322034
name: Max Recall
- type: max_ap
value: 0.7216536349653324
name: Max Ap
SentenceTransformer based on microsoft/deberta-v3-small
This is a sentence-transformers model finetuned from microsoft/deberta-v3-small on the bobox/enhanced_nli-50_k dataset. It maps sentences & paragraphs to a 768-dimensional dense vector space and can be used for semantic textual similarity, semantic search, paraphrase mining, text classification, clustering, and more.
Model Details
Model Description
- Model Type: Sentence Transformer
- Base model: microsoft/deberta-v3-small
- Maximum Sequence Length: 512 tokens
- Output Dimensionality: 768 dimensions
- Similarity Function: Cosine Similarity
- Training Dataset:
- bobox/enhanced_nli-50_k
Model Sources
- Documentation: Sentence Transformers Documentation
- Repository: Sentence Transformers on GitHub
- Hugging Face: Sentence Transformers on Hugging Face
Full Model Architecture
SentenceTransformer(
(0): Transformer({'max_seq_length': 512, 'do_lower_case': False}) with Transformer model: DebertaV2Model
(1): Pooling({'word_embedding_dimension': 768, 'pooling_mode_cls_token': False, 'pooling_mode_mean_tokens': True, 'pooling_mode_max_tokens': False, 'pooling_mode_mean_sqrt_len_tokens': False, 'pooling_mode_weightedmean_tokens': False, 'pooling_mode_lasttoken': False, 'include_prompt': True})
)
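The Pooling module above has only `pooling_mode_mean_tokens` enabled, so the sentence embedding is the average of the token vectors. The following is a minimal pure-Python sketch of that idea, assuming padding positions are excluded through an attention mask; the function name `mean_pool` and the toy 4-dimensional vectors are illustrative, not part of the library.

```python
def mean_pool(token_embeddings, attention_mask):
    """Average the token vectors whose attention-mask entry is 1."""
    dim = len(token_embeddings[0])
    total = [0.0] * dim
    count = 0
    for vector, keep in zip(token_embeddings, attention_mask):
        if keep:
            count += 1
            for i, value in enumerate(vector):
                total[i] += value
    # Guard against an all-padding input to avoid division by zero.
    count = max(count, 1)
    return [value / count for value in total]


# Three token vectors (toy 4-dim instead of 768); the last one is padding.
tokens = [[1.0, 2.0, 3.0, 4.0], [3.0, 2.0, 1.0, 0.0], [9.0, 9.0, 9.0, 9.0]]
mask = [1, 1, 0]
pooled = mean_pool(tokens, mask)
print(pooled)  # [2.0, 2.0, 2.0, 2.0]
```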
Usage
Direct Usage (Sentence Transformers)
First install the Sentence Transformers library:
pip install -U sentence-transformers
Then you can load this model and run inference.
from sentence_transformers import SentenceTransformer
# Download from the 🤗 Hub
model = SentenceTransformer("bobox/DeBERTa-small-ST-UnifiedDatasets-baseline-checkpoints-tmp")
# Run inference
sentences = [
'A man wearing a blue bow tie and a fedora hat in a car. ',
'A man takes a photo of himself wearing a bowtie and hat',
'County of Angus - definition of County of Angus by The Free Dictionary County of Angus - definition of County of Angus by The Free Dictionary http://www.thefreedictionary.com/County+of+Angus \xa0(ăng′gəs) n. Any of a breed of hornless beef cattle that originated in Scotland and are usually black but also occur in a red variety. Also called Black Angus. [After Angus, former county of Scotland.] Angus (ˈæŋɡəs) n (Placename) a council area of E Scotland on the North Sea: the historical county of Angus became part of Tayside region in 1975; reinstated as a unitary authority (excluding City of Dundee) in 1996. Administrative centre: Forfar. Pop: 107 520 (2003 est). Area: 2181 sq km (842 sq miles) An•gus',
]
embeddings = model.encode(sentences)
print(embeddings.shape)
# [3, 768]
# Get the similarity scores for the embeddings
similarities = model.similarity(embeddings, embeddings)
print(similarities.shape)
# [3, 3]
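Since the similarity function listed for this model is cosine similarity, the 3×3 matrix above should hold pairwise cosine scores. As a rough sketch of what that computation amounts to, here is a dependency-free version on toy 2-dimensional vectors; the helper names are made up for illustration and this is not the library's implementation.

```python
import math


def cosine_similarity(a, b):
    """Dot product of two vectors divided by the product of their norms."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


def similarity_matrix(embeddings):
    """Pairwise cosine similarity between every pair of embeddings."""
    return [[cosine_similarity(a, b) for b in embeddings] for a in embeddings]


# Toy stand-ins for the three 768-dimensional sentence embeddings.
toy = [[1.0, 0.0], [0.0, 2.0], [3.0, 3.0]]
matrix = similarity_matrix(toy)
for row in matrix:
    print([round(value, 4) for value in row])
```

Orthogonal vectors score 0, and the score ignores vector length, which is one reason cosine and dot-product metrics can diverge in the evaluation tables.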
Evaluation
Metrics
Semantic Similarity
- Dataset: sts-test
- Evaluated with EmbeddingSimilarityEvaluator
Metric | Value |
---|---|
pearson_cosine | 0.7489 |
spearman_cosine | 0.7626 |
pearson_manhattan | 0.7592 |
spearman_manhattan | 0.7478 |
pearson_euclidean | 0.7623 |
spearman_euclidean | 0.7539 |
pearson_dot | 0.6494 |
spearman_dot | 0.6306 |
pearson_max | 0.7623 |
spearman_max | 0.7626 |
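The Pearson and Spearman rows compare the model's similarity scores against human-annotated gold scores. A minimal sketch of both correlations follows, assuming average ranks for ties; the `gold` and `predicted` values are invented toy data, not taken from the STS test set.

```python
import math


def pearson(xs, ys):
    """Linear correlation between two equally long sequences."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / math.sqrt(var_x * var_y)


def ranks(values):
    """1-based ranks; tied values share the average of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    result = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        average_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            result[order[k]] = average_rank
        i = j + 1
    return result


def spearman(xs, ys):
    """Spearman correlation is Pearson correlation computed on ranks."""
    return pearson(ranks(xs), ranks(ys))


gold = [0.0, 1.0, 2.5, 4.0, 5.0]             # hypothetical human scores
predicted = [0.10, 0.35, 0.30, 0.80, 0.95]   # hypothetical cosine scores
print(round(pearson(gold, predicted), 4), round(spearman(gold, predicted), 4))
```

Spearman only looks at ordering, so it is unaffected by the scale of the scores, whereas Pearson rewards a linear relationship.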
Binary Classification
- Dataset: allNLI-dev
- Evaluated with BinaryClassificationEvaluator
Metric | Value |
---|---|
cosine_accuracy | 0.7109 |
cosine_accuracy_threshold | 0.917 |
cosine_f1 | 0.5854 |
cosine_f1_threshold | 0.828 |
cosine_precision | 0.4748 |
cosine_recall | 0.763 |
cosine_ap | 0.5496 |
dot_accuracy | 0.6719 |
dot_accuracy_threshold | 481.2851 |
dot_f1 | 0.5492 |
dot_f1_threshold | 381.1517 |
dot_precision | 0.4044 |
dot_recall | 0.8555 |
dot_ap | 0.4529 |
manhattan_accuracy | 0.7148 |
manhattan_accuracy_threshold | 186.7671 |
manhattan_f1 | 0.5696 |
manhattan_f1_threshold | 268.7839 |
manhattan_precision | 0.4448 |
manhattan_recall | 0.7919 |
manhattan_ap | 0.5512 |
euclidean_accuracy | 0.7148 |
euclidean_accuracy_threshold | 8.915 |
euclidean_f1 | 0.5741 |
euclidean_f1_threshold | 12.8127 |
euclidean_precision | 0.4788 |
euclidean_recall | 0.7168 |
euclidean_ap | 0.5536 |
max_accuracy | 0.7148 |
max_accuracy_threshold | 481.2851 |
max_f1 | 0.5854 |
max_f1_threshold | 381.1517 |
max_precision | 0.4788 |
max_recall | 0.8555 |
max_ap | 0.5536 |
Binary Classification
- Dataset: Qnli-dev
- Evaluated with BinaryClassificationEvaluator
Metric | Value |
---|---|
cosine_accuracy | 0.6816 |
cosine_accuracy_threshold | 0.8161 |
cosine_f1 | 0.6918 |
cosine_f1_threshold | 0.7854 |
cosine_precision | 0.5994 |
cosine_recall | 0.8178 |
cosine_ap | 0.711 |
dot_accuracy | 0.6484 |
dot_accuracy_threshold | 392.5465 |
dot_f1 | 0.6688 |
dot_f1_threshold | 368.7879 |
dot_precision | 0.5421 |
dot_recall | 0.8729 |
dot_ap | 0.6053 |
manhattan_accuracy | 0.6855 |
manhattan_accuracy_threshold | 244.6381 |
manhattan_f1 | 0.6938 |
manhattan_f1_threshold | 295.4796 |
manhattan_precision | 0.5957 |
manhattan_recall | 0.8305 |
manhattan_ap | 0.7217 |
euclidean_accuracy | 0.6875 |
euclidean_accuracy_threshold | 13.0267 |
euclidean_f1 | 0.6894 |
euclidean_f1_threshold | 14.538 |
euclidean_precision | 0.5981 |
euclidean_recall | 0.8136 |
euclidean_ap | 0.7181 |
max_accuracy | 0.6875 |
max_accuracy_threshold | 392.5465 |
max_f1 | 0.6938 |
max_f1_threshold | 368.7879 |
max_precision | 0.5994 |
max_recall | 0.8729 |
max_ap | 0.7217 |
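Each `*_accuracy_threshold` row in the binary classification tables pairs with the corresponding `*_accuracy` row: the threshold is the score cut-off at which that accuracy was reached. One simple way to find such a cut-off is to try the midpoint between every pair of neighbouring scores, as sketched below. This is an illustration under that assumption, with invented scores and labels; the evaluator's own search may differ in detail.

```python
def accuracy_at(threshold, scores, labels):
    """Accuracy when every pair scoring at or above the threshold is predicted positive."""
    predictions = [1 if score >= threshold else 0 for score in scores]
    correct = sum(p == y for p, y in zip(predictions, labels))
    return correct / len(labels)


def best_accuracy_threshold(scores, labels):
    """Try the midpoint between each pair of neighbouring scores and keep the best."""
    ordered = sorted(set(scores))
    candidates = [(low + high) / 2 for low, high in zip(ordered, ordered[1:])]
    best = max(candidates, key=lambda t: accuracy_at(t, scores, labels))
    return accuracy_at(best, scores, labels), best


# Hypothetical cosine scores for seven sentence pairs and their gold labels.
scores = [0.95, 0.90, 0.85, 0.70, 0.60, 0.40, 0.30]
labels = [1, 1, 0, 1, 1, 0, 0]
accuracy, threshold = best_accuracy_threshold(scores, labels)
print(accuracy, threshold)
```

For distance metrics such as Manhattan or Euclidean the comparison would flip, since a smaller distance means a more similar pair.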
Training Details
Training Dataset
bobox/enhanced_nli-50_k
- Dataset: bobox/enhanced_nli-50_k
- Size: 116,445 training samples
- Columns: sentence1 and sentence2
- Approximate statistics based on the first 1000 samples:
 | sentence1 | sentence2 |
---|---|---|
type | string | string |
details | min: 4 tokens, mean: 33.67 tokens, max: 338 tokens | min: 2 tokens, mean: 51.48 tokens, max: 512 tokens |
- Samples:
sentence1 | sentence2 |
---|---|
who is darnell from my name is earl | Eddie Steeples Eddie Steeples (born November 25, 1973)[1] is an American actor known for his roles as the "Rubberband Man" in an advertising campaign for OfficeMax, and as Darnell Turner on the NBC sitcom My Name Is Earl. |
Ferrell and the Chili Peppers toured together in 2013 . | Ferrell and the Chili Peppers wrapped up I 'm With You World Tour in April 2013 . |
Cells have four cycles. | How many cycles do cells have? |
- Loss: CachedGISTEmbedLoss with these parameters:
{'guide': SentenceTransformer(
(0): Transformer({'max_seq_length': 512, 'do_lower_case': False}) with Transformer model: BertModel
(1): Pooling({'word_embedding_dimension': 768, 'pooling_mode_cls_token': True, 'pooling_mode_mean_tokens': False, 'pooling_mode_max_tokens': False, 'pooling_mode_mean_sqrt_len_tokens': False, 'pooling_mode_weightedmean_tokens': False, 'pooling_mode_lasttoken': False, 'include_prompt': True})
(2): Normalize()
), 'temperature': 0.025}
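The loss parameters name a guide model and a temperature of 0.025. The core idea of GISTEmbed-style training, as far as it can be summarized here, is an in-batch contrastive loss in which the guide model vetoes likely false negatives: a candidate is dropped if the guide rates it as more similar to the anchor than the true positive. The sketch below is a deliberately simplified illustration (anchor-to-positive similarities only, precomputed similarity matrices, no gradient caching) and should not be read as the library's implementation; `gist_style_loss` and the toy matrices are hypothetical.

```python
import math


def gist_style_loss(student_sim, guide_sim, temperature=0.025):
    """Mean contrastive loss over a batch.

    student_sim[i][j] and guide_sim[i][j] are the similarities between anchor i
    and candidate j under the trained model and the guide model respectively.
    Candidate i is the true positive of anchor i.
    """
    losses = []
    for i, (student_row, guide_row) in enumerate(zip(student_sim, guide_sim)):
        positive_logit = student_row[i] / temperature
        logits = [positive_logit]
        for j, (s, g) in enumerate(zip(student_row, guide_row)):
            # Skip negatives the guide considers closer than the true positive.
            if j != i and g <= guide_row[i]:
                logits.append(s / temperature)
        # Numerically stable log-sum-exp, then cross-entropy against the positive.
        peak = max(logits)
        log_sum = peak + math.log(sum(math.exp(l - peak) for l in logits))
        losses.append(log_sum - positive_logit)
    return sum(losses) / len(losses)


student = [[0.9, 0.8], [0.1, 0.7]]
guide_flags_false_negative = [[0.9, 0.95], [0.2, 0.8]]  # guide prefers candidate 1 for anchor 0
guide_keeps_everything = [[1.0, 0.0], [0.0, 1.0]]

print(gist_style_loss(student, guide_flags_false_negative))
print(gist_style_loss(student, guide_keeps_everything))
```

With the low temperature, small similarity gaps turn into large logit gaps, so removing a single hard false negative changes the loss noticeably.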
Evaluation Dataset
bobox/enhanced_nli-50_k
- Dataset: bobox/enhanced_nli-50_k
- Size: 1,506 evaluation samples
- Columns: sentence1 and sentence2
- Approximate statistics based on the first 1000 samples:
 | sentence1 | sentence2 |
---|---|---|
type | string | string |
details | min: 3 tokens, mean: 32.36 tokens, max: 341 tokens | min: 2 tokens, mean: 61.99 tokens, max: 431 tokens |
- Samples:
sentence1 | sentence2 |
---|---|
Interestingly, snakes use their forked tongues to smell. | Snakes use their tongue to smell things. |
Soil is a renewable resource that can take thousand of years to form. | What is a renewable resource that can take thousand of years to form? |
As of March 22 , there were more than 321,000 cases with over 13,600 deaths and more than 96,000 recoveries reported worldwide . | As of 22 March , more than 321,000 cases of COVID-19 have been reported in over 180 countries and territories , resulting in more than 13,600 deaths and 96,000 recoveries . |
- Loss: CachedGISTEmbedLoss with these parameters:
{'guide': SentenceTransformer(
(0): Transformer({'max_seq_length': 512, 'do_lower_case': False}) with Transformer model: BertModel
(1): Pooling({'word_embedding_dimension': 768, 'pooling_mode_cls_token': True, 'pooling_mode_mean_tokens': False, 'pooling_mode_max_tokens': False, 'pooling_mode_mean_sqrt_len_tokens': False, 'pooling_mode_weightedmean_tokens': False, 'pooling_mode_lasttoken': False, 'include_prompt': True})
(2): Normalize()
), 'temperature': 0.025}
Training Hyperparameters
Non-Default Hyperparameters
- eval_strategy: steps
- per_device_train_batch_size: 640
- per_device_eval_batch_size: 128
- learning_rate: 3.75e-05
- weight_decay: 0.0005
- lr_scheduler_type: cosine_with_min_lr
- lr_scheduler_kwargs: {'num_cycles': 0.5, 'min_lr': 7.499999999999999e-06}
- warmup_ratio: 0.33
- save_safetensors: False
- fp16: True
- push_to_hub: True
- hub_model_id: bobox/DeBERTa-small-ST-UnifiedDatasets-baseline-checkpoints-tmp
- hub_strategy: all_checkpoints
- batch_sampler: no_duplicates
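The learning-rate settings above (peak 3.75e-05, floor 7.5e-06, warmup ratio 0.33, half a cosine cycle) describe a linear warmup followed by a cosine decay that bottoms out at the floor instead of zero. The sketch below plots that shape in plain Python. It assumes 546 total steps (182 steps per epoch, inferred from the epoch column of the training logs, times 3 epochs) and that warmup steps are the ceiling of `total_steps * warmup_ratio`; both are assumptions, and `lr_at` is a hypothetical helper rather than a library function.

```python
import math


def lr_at(step, total_steps, base_lr=3.75e-05, min_lr=7.5e-06,
          warmup_ratio=0.33, num_cycles=0.5):
    """Learning rate at a given optimizer step: linear warmup, then cosine decay to min_lr."""
    warmup_steps = math.ceil(total_steps * warmup_ratio)
    if step < warmup_steps:
        return base_lr * step / max(1, warmup_steps)
    progress = (step - warmup_steps) / max(1, total_steps - warmup_steps)
    cosine = 0.5 * (1.0 + math.cos(math.pi * num_cycles * 2.0 * progress))
    min_rate = min_lr / base_lr
    # Rescale the cosine factor from [0, 1] to [min_rate, 1].
    return base_lr * (cosine * (1.0 - min_rate) + min_rate)


TOTAL_STEPS = 546  # assumed: 182 steps per epoch x 3 epochs
for step in (0, 90, 181, 364, 546):
    print(step, lr_at(step, TOTAL_STEPS))
```

Under these assumptions, the logged run (110 steps) would still be inside the warmup phase, so the learning rate was rising throughout the logged portion.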
All Hyperparameters
- overwrite_output_dir: False
- do_predict: False
- eval_strategy: steps
- prediction_loss_only: True
- per_device_train_batch_size: 640
- per_device_eval_batch_size: 128
- per_gpu_train_batch_size: None
- per_gpu_eval_batch_size: None
- gradient_accumulation_steps: 1
- eval_accumulation_steps: None
- torch_empty_cache_steps: None
- learning_rate: 3.75e-05
- weight_decay: 0.0005
- adam_beta1: 0.9
- adam_beta2: 0.999
- adam_epsilon: 1e-08
- max_grad_norm: 1.0
- num_train_epochs: 3
- max_steps: -1
- lr_scheduler_type: cosine_with_min_lr
- lr_scheduler_kwargs: {'num_cycles': 0.5, 'min_lr': 7.499999999999999e-06}
- warmup_ratio: 0.33
- warmup_steps: 0
- log_level: passive
- log_level_replica: warning
- log_on_each_node: True
- logging_nan_inf_filter: True
- save_safetensors: False
- save_on_each_node: False
- save_only_model: False
- restore_callback_states_from_checkpoint: False
- no_cuda: False
- use_cpu: False
- use_mps_device: False
- seed: 42
- data_seed: None
- jit_mode_eval: False
- use_ipex: False
- bf16: False
- fp16: True
- fp16_opt_level: O1
- half_precision_backend: auto
- bf16_full_eval: False
- fp16_full_eval: False
- tf32: None
- local_rank: 0
- ddp_backend: None
- tpu_num_cores: None
- tpu_metrics_debug: False
- debug: []
- dataloader_drop_last: False
- dataloader_num_workers: 0
- dataloader_prefetch_factor: None
- past_index: -1
- disable_tqdm: False
- remove_unused_columns: True
- label_names: None
- load_best_model_at_end: False
- ignore_data_skip: False
- fsdp: []
- fsdp_min_num_params: 0
- fsdp_config: {'min_num_params': 0, 'xla': False, 'xla_fsdp_v2': False, 'xla_fsdp_grad_ckpt': False}
- fsdp_transformer_layer_cls_to_wrap: None
- accelerator_config: {'split_batches': False, 'dispatch_batches': None, 'even_batches': True, 'use_seedable_sampler': True, 'non_blocking': False, 'gradient_accumulation_kwargs': None}
- deepspeed: None
- label_smoothing_factor: 0.0
- optim: adamw_torch
- optim_args: None
- adafactor: False
- group_by_length: False
- length_column_name: length
- ddp_find_unused_parameters: None
- ddp_bucket_cap_mb: None
- ddp_broadcast_buffers: False
- dataloader_pin_memory: True
- dataloader_persistent_workers: False
- skip_memory_metrics: True
- use_legacy_prediction_loop: False
- push_to_hub: True
- resume_from_checkpoint: None
- hub_model_id: bobox/DeBERTa-small-ST-UnifiedDatasets-baseline-checkpoints-tmp
- hub_strategy: all_checkpoints
- hub_private_repo: False
- hub_always_push: False
- gradient_checkpointing: False
- gradient_checkpointing_kwargs: None
- include_inputs_for_metrics: False
- eval_do_concat_batches: True
- fp16_backend: auto
- push_to_hub_model_id: None
- push_to_hub_organization: None
- mp_parameters:
- auto_find_batch_size: False
- full_determinism: False
- torchdynamo: None
- ray_scope: last
- ddp_timeout: 1800
- torch_compile: False
- torch_compile_backend: None
- torch_compile_mode: None
- dispatch_batches: None
- split_batches: None
- include_tokens_per_second: False
- include_num_input_tokens_seen: False
- neftune_noise_alpha: None
- optim_target_modules: None
- batch_eval_metrics: False
- eval_on_start: False
- eval_use_gather_object: False
- batch_sampler: no_duplicates
- multi_dataset_batch_sampler: proportional
Training Logs
Epoch | Step | Training Loss | loss | Qnli-dev_max_ap | allNLI-dev_max_ap | sts-test_spearman_cosine |
---|---|---|---|---|---|---|
0.0055 | 1 | 8.8159 | - | - | - | - |
0.0110 | 2 | 9.1259 | - | - | - | - |
0.0165 | 3 | 8.9017 | - | - | - | - |
0.0220 | 4 | 9.1969 | - | - | - | - |
0.0275 | 5 | 9.3716 | 1.3746 | 0.6067 | 0.3706 | 0.1943 |
0.0330 | 6 | 9.0425 | - | - | - | - |
0.0385 | 7 | 8.7309 | - | - | - | - |
0.0440 | 8 | 9.0123 | - | - | - | - |
0.0495 | 9 | 8.8095 | - | - | - | - |
0.0549 | 10 | 9.3194 | 1.3227 | 0.6089 | 0.3721 | 0.1976 |
0.0604 | 11 | 8.9873 | - | - | - | - |
0.0659 | 12 | 8.5575 | - | - | - | - |
0.0714 | 13 | 8.8096 | - | - | - | - |
0.0769 | 14 | 8.0996 | - | - | - | - |
0.0824 | 15 | 8.1942 | 1.2244 | 0.6140 | 0.3743 | 0.2085 |
0.0879 | 16 | 8.1654 | - | - | - | - |
0.0934 | 17 | 7.7336 | - | - | - | - |
0.0989 | 18 | 7.9535 | - | - | - | - |
0.1044 | 19 | 7.9322 | - | - | - | - |
0.1099 | 20 | 7.6812 | 1.1301 | 0.6199 | 0.3790 | 0.2233 |
0.1154 | 21 | 7.551 | - | - | - | - |
0.1209 | 22 | 7.3788 | - | - | - | - |
0.1264 | 23 | 7.1746 | - | - | - | - |
0.1319 | 24 | 7.1849 | - | - | - | - |
0.1374 | 25 | 7.1085 | 1.0723 | 0.6195 | 0.3852 | 0.2357 |
0.1429 | 26 | 7.3926 | - | - | - | - |
0.1484 | 27 | 7.1817 | - | - | - | - |
0.1538 | 28 | 7.239 | - | - | - | - |
0.1593 | 29 | 7.0023 | - | - | - | - |
0.1648 | 30 | 6.9898 | 1.0282 | 0.6215 | 0.3898 | 0.2477 |
0.1703 | 31 | 6.9776 | - | - | - | - |
0.1758 | 32 | 6.8088 | - | - | - | - |
0.1813 | 33 | 6.8916 | - | - | - | - |
0.1868 | 34 | 6.6931 | - | - | - | - |
0.1923 | 35 | 6.5707 | 0.9846 | 0.6253 | 0.3952 | 0.2608 |
0.1978 | 36 | 6.6231 | - | - | - | - |
0.2033 | 37 | 6.4951 | - | - | - | - |
0.2088 | 38 | 6.4607 | - | - | - | - |
0.2143 | 39 | 6.4504 | - | - | - | - |
0.2198 | 40 | 6.3649 | 0.9314 | 0.6299 | 0.4041 | 0.2738 |
0.2253 | 41 | 6.2244 | - | - | - | - |
0.2308 | 42 | 6.007 | - | - | - | - |
0.2363 | 43 | 5.977 | - | - | - | - |
0.2418 | 44 | 6.0748 | - | - | - | - |
0.2473 | 45 | 5.7946 | 0.8549 | 0.6404 | 0.4116 | 0.2847 |
0.2527 | 46 | 5.8751 | - | - | - | - |
0.2582 | 47 | 5.543 | - | - | - | - |
0.2637 | 48 | 5.5511 | - | - | - | - |
0.2692 | 49 | 5.411 | - | - | - | - |
0.2747 | 50 | 5.378 | 0.7943 | 0.6557 | 0.4159 | 0.2866 |
0.2802 | 51 | 5.3831 | - | - | - | - |
0.2857 | 52 | 4.9729 | - | - | - | - |
0.2912 | 53 | 5.0425 | - | - | - | - |
0.2967 | 54 | 4.9446 | - | - | - | - |
0.3022 | 55 | 4.9288 | 0.7178 | 0.6679 | 0.4273 | 0.3132 |
0.3077 | 56 | 4.8434 | - | - | - | - |
0.3132 | 57 | 4.6914 | - | - | - | - |
0.3187 | 58 | 4.5254 | - | - | - | - |
0.3242 | 59 | 4.6734 | - | - | - | - |
0.3297 | 60 | 4.2421 | 0.6202 | 0.6684 | 0.4423 | 0.3580 |
0.3352 | 61 | 4.2234 | - | - | - | - |
0.3407 | 62 | 4.0225 | - | - | - | - |
0.3462 | 63 | 4.0034 | - | - | - | - |
0.3516 | 64 | 3.994 | - | - | - | - |
0.3571 | 65 | 3.651 | 0.5489 | 0.6750 | 0.4569 | 0.4014 |
0.3626 | 66 | 3.9308 | - | - | - | - |
0.3681 | 67 | 3.8694 | - | - | - | - |
0.3736 | 68 | 3.7159 | - | - | - | - |
0.3791 | 69 | 3.6499 | - | - | - | - |
0.3846 | 70 | 3.4749 | 0.4923 | 0.6734 | 0.4701 | 0.4465 |
0.3901 | 71 | 3.3356 | - | - | - | - |
0.3956 | 72 | 3.4768 | - | - | - | - |
0.4011 | 73 | 3.2748 | - | - | - | - |
0.4066 | 74 | 3.2789 | - | - | - | - |
0.4121 | 75 | 2.9815 | 0.4422 | 0.6759 | 0.4747 | 0.4924 |
0.4176 | 76 | 3.2356 | - | - | - | - |
0.4231 | 77 | 2.946 | - | - | - | - |
0.4286 | 78 | 2.8888 | - | - | - | - |
0.4341 | 79 | 2.8992 | - | - | - | - |
0.4396 | 80 | 2.9901 | 0.4040 | 0.6786 | 0.4781 | 0.5478 |
0.4451 | 81 | 2.6608 | - | - | - | - |
0.4505 | 82 | 2.831 | - | - | - | - |
0.4560 | 83 | 2.5503 | - | - | - | - |
0.4615 | 84 | 2.8576 | - | - | - | - |
0.4670 | 85 | 2.5726 | 0.3711 | 0.6858 | 0.4898 | 0.6134 |
0.4725 | 86 | 2.7197 | - | - | - | - |
0.4780 | 87 | 2.5123 | - | - | - | - |
0.4835 | 88 | 2.553 | - | - | - | - |
0.4890 | 89 | 2.4862 | - | - | - | - |
0.4945 | 90 | 2.491 | 0.3450 | 0.6997 | 0.5077 | 0.6668 |
0.5 | 91 | 2.3648 | - | - | - | - |
0.5055 | 92 | 2.3788 | - | - | - | - |
0.5110 | 93 | 2.3758 | - | - | - | - |
0.5165 | 94 | 2.3319 | - | - | - | - |
0.5220 | 95 | 2.2336 | 0.3238 | 0.7048 | 0.5252 | 0.7018 |
0.5275 | 96 | 2.3036 | - | - | - | - |
0.5330 | 97 | 2.3034 | - | - | - | - |
0.5385 | 98 | 2.207 | - | - | - | - |
0.5440 | 99 | 2.1732 | - | - | - | - |
0.5495 | 100 | 2.1743 | 0.3036 | 0.7091 | 0.5418 | 0.7272 |
0.5549 | 101 | 2.086 | - | - | - | - |
0.5604 | 102 | 2.0223 | - | - | - | - |
0.5659 | 103 | 2.0878 | - | - | - | - |
0.5714 | 104 | 1.9475 | - | - | - | - |
0.5769 | 105 | 2.1524 | 0.2853 | 0.7159 | 0.5499 | 0.7489 |
0.5824 | 106 | 1.9393 | - | - | - | - |
0.5879 | 107 | 2.1308 | - | - | - | - |
0.5934 | 108 | 1.9469 | - | - | - | - |
0.5989 | 109 | 1.8683 | - | - | - | - |
0.6044 | 110 | 1.8167 | 0.2702 | 0.7217 | 0.5536 | 0.7626 |
Framework Versions
- Python: 3.10.14
- Sentence Transformers: 3.0.1
- Transformers: 4.44.0
- PyTorch: 2.4.0
- Accelerate: 0.33.0
- Datasets: 2.21.0
- Tokenizers: 0.19.1
Citation
BibTeX
Sentence Transformers
@inproceedings{reimers-2019-sentence-bert,
title = "Sentence-BERT: Sentence Embeddings using Siamese BERT-Networks",
author = "Reimers, Nils and Gurevych, Iryna",
booktitle = "Proceedings of the 2019 Conference on Empirical Methods in Natural Language Processing",
month = "11",
year = "2019",
publisher = "Association for Computational Linguistics",
url = "https://arxiv.org/abs/1908.10084",
}