---
base_model: microsoft/deberta-v3-small
datasets: []
language: []
library_name: sentence-transformers
metrics:
- pearson_cosine
- spearman_cosine
- pearson_manhattan
- spearman_manhattan
- pearson_euclidean
- spearman_euclidean
- pearson_dot
- spearman_dot
- pearson_max
- spearman_max
- cosine_accuracy
- cosine_accuracy_threshold
- cosine_f1
- cosine_f1_threshold
- cosine_precision
- cosine_recall
- cosine_ap
- dot_accuracy
- dot_accuracy_threshold
- dot_f1
- dot_f1_threshold
- dot_precision
- dot_recall
- dot_ap
- manhattan_accuracy
- manhattan_accuracy_threshold
- manhattan_f1
- manhattan_f1_threshold
- manhattan_precision
- manhattan_recall
- manhattan_ap
- euclidean_accuracy
- euclidean_accuracy_threshold
- euclidean_f1
- euclidean_f1_threshold
- euclidean_precision
- euclidean_recall
- euclidean_ap
- max_accuracy
- max_accuracy_threshold
- max_f1
- max_f1_threshold
- max_precision
- max_recall
- max_ap
pipeline_tag: sentence-similarity
tags:
- sentence-transformers
- sentence-similarity
- feature-extraction
- generated_from_trainer
- dataset_size:116445
- loss:CachedGISTEmbedLoss
widget:
- source_sentence: what is the main purpose of the brain
  sentences:
  - Brain Physiologically, the function of the brain is to exert centralized control over the other organs of the body. The brain acts on the rest of the body both by generating patterns of muscle activity and by driving the secretion of chemicals called hormones. This centralized control allows rapid and coordinated responses to changes in the environment. Some basic types of responsiveness such as reflexes can be mediated by the spinal cord or peripheral ganglia, but sophisticated purposeful control of behavior based on complex sensory input requires the information integrating capabilities of a centralized brain.
  - How do scientists know that some mountains were once at the bottom of an ocean?
  - The Smiths Wiki | Fandom powered by Wikia Share Ad blocker interference detected! Wikia is a free-to-use site that makes money from advertising. We have a modified experience for viewers using ad blockers Wikia is not accessible if you’ve made further modifications. Remove the custom ad blocker rule(s) and the page will load as expected. The Smiths were an English rock band formed in Manchester in 1982. Based on the songwriting partnership of Morrissey (vocals) and Johnny Marr (guitar), the band also included Andy Rourke (bass), Mike Joyce (drums) and for a brief time Craig Gannon (rhythm guitar). Critics have called them one of the most important alternative rock bands to emerge from the British independent music scene of the 1980s,and the group has had major influence on subsequent artists. Morrissey's lovelorn tales of alienation found an audience amongst youth culture bored by the ubiquitous synthesiser-pop bands of the early 1980s, while Marr's complex melodies helped return guitar-based music to popularity. The group were signed to the independent record label Rough Trade Records , for whom they released four studio albums and several compilations, as well as numerous non-LP singles. Although they had limited commercial success outside the UK while they were still together, and never released a single that charted higher than number 10 in their home country, The Smiths won a growing following, and they remain cult and commercial favourites. The band broke up in 1987 amid disagreements between Morrissey and Marr and has turned down several offers to reform. Welcome to The Smiths Wiki
- source_sentence: There were 29 Muslims fatalities in the Cave of the Patriarchs massacre .
  sentences:
  - In August , after the end of the war in June 1902 , Higgins Southampton left the `` SSBavarian '' and returned to Cape Town the following month .
  - Between 29 and 52 Muslims were killed and more than 100 others wounded . [ Settlers remember gunman Goldstein ; Hebron riots continue ] .
  - 29 Muslims were killed and more than 100 others wounded . [ Settlers remember gunman Goldstein ; Hebron riots continue ] .
- source_sentence: are tabby cats all male?
  sentences:
  - Did you know orange tabby cats are typically male? In fact, up to 80 percent of orange tabbies are male, making orange female cats a bit of a rarity. According to the BBC's Focus Magazine, the ginger gene in cats works a little differently compared to humans; it is on the X chromosome.
  - Shawnee Trails Council was formed from the merger of the Four Rivers Council and the Audubon Council .
  - 'A picture of a modern looking kitchen area '
- source_sentence: Aamir Khan agreed to act immediately after reading Mehra 's screenplay in `` Rang De Basanti '' .
  sentences:
  - Chris Rea — Free listening, videos, concerts, stats and photos at Last.fm singer-songwriter Christopher Anton Rea (pronounced Ree-ah), born 4 March 1951, is a singer, songwriter, and guitarist from Middlesbrough, England. Rea's recording career began in 1978. Although he almost immediately had a US hit single with "Fool (If You Think It's Over)", Rea's initial focus was on continental Europe, releasing eight albums in the 1980s. It wasn't until 1985's Shamrock Diaries and the songs "Stainsby Girls" and "Josephine," that UK audiences began to take notice of him. Follow up albums… read more
  - "Healthy Fast Food Meal No. 1. Grilled Chicken Sandwich and Fruit Cup (Chick-fil-A) Several fast food chains offer a grilled chicken sandwich. The trick is ordering it without mayo or creamy sauce, and making sure it’s served with a whole grain bun."
  - Aamir Khan agreed to act in `` Rang De Basanti '' immediately after reading Mehra 's script .
- source_sentence: 'A man wearing a blue bow tie and a fedora hat in a car. '
  sentences:
  - A man takes a photo of himself wearing a bowtie and hat
  - Scientists explain the world based on what?
  - 'County of Angus - definition of County of Angus by The Free Dictionary County of Angus - definition of County of Angus by The Free Dictionary http://www.thefreedictionary.com/County+of+Angus (ăng′gəs) n. Any of a breed of hornless beef cattle that originated in Scotland and are usually black but also occur in a red variety. Also called Black Angus. [After Angus, former county of Scotland.] Angus (ˈæŋɡəs) n (Placename) a council area of E Scotland on the North Sea: the historical county of Angus became part of Tayside region in 1975; reinstated as a unitary authority (excluding City of Dundee) in 1996. Administrative centre: Forfar. Pop: 107 520 (2003 est). Area: 2181 sq km (842 sq miles) An•gus'
model-index:
- name: SentenceTransformer based on microsoft/deberta-v3-small
  results:
  - task:
      type: semantic-similarity
      name: Semantic Similarity
    dataset:
      name: sts test
      type: sts-test
    metrics:
    - type: pearson_cosine
      value: 0.2589065791031549
      name: Pearson Cosine
    - type: spearman_cosine
      value: 0.31323211323674593
      name: Spearman Cosine
    - type: pearson_manhattan
      value: 0.27236487282828553
      name: Pearson Manhattan
    - type: spearman_manhattan
      value: 0.29656486394161036
      name: Spearman Manhattan
    - type: pearson_euclidean
      value: 0.2585939429800171
      name: Pearson Euclidean
    - type: spearman_euclidean
      value: 0.2833925986586202
      name: Spearman Euclidean
    - type: pearson_dot
      value: 0.28511212645281553
      name: Pearson Dot
    - type: spearman_dot
      value: 0.2967423026930272
      name: Spearman Dot
    - type: pearson_max
      value: 0.28511212645281553
      name: Pearson Max
    - type: spearman_max
      value: 0.31323211323674593
      name: Spearman Max
  - task:
      type: binary-classification
      name: Binary Classification
    dataset:
      name: allNLI dev
      type: allNLI-dev
    metrics:
    - type: cosine_accuracy
      value: 0.66796875
      name: Cosine Accuracy
    - type: cosine_accuracy_threshold
      value: 0.9721465110778809
      name: Cosine Accuracy Threshold
    - type: cosine_f1
      value: 0.5343511450381679
      name: Cosine F1
    - type: cosine_f1_threshold
      value: 0.85741126537323
      name: Cosine F1 Threshold
    - type: cosine_precision
      value: 0.39886039886039887
      name: Cosine Precision
    - type: cosine_recall
      value: 0.8092485549132948
      name: Cosine Recall
    - type: cosine_ap
      value: 0.4140638596370657
      name: Cosine Ap
    - type: dot_accuracy
      value: 0.666015625
      name: Dot Accuracy
    - type: dot_accuracy_threshold
      value: 518.88671875
      name: Dot Accuracy Threshold
    - type: dot_f1
      value: 0.514018691588785
      name: Dot F1
    - type: dot_f1_threshold
      value: 323.9651184082031
      name: Dot F1 Threshold
    - type: dot_precision
      value: 0.35181236673773986
      name: Dot Precision
    - type: dot_recall
      value: 0.953757225433526
      name: Dot Recall
    - type: dot_ap
      value: 0.3781233337023534
      name: Dot Ap
    - type: manhattan_accuracy
      value: 0.671875
      name: Manhattan Accuracy
    - type: manhattan_accuracy_threshold
      value: 114.41839599609375
      name: Manhattan Accuracy Threshold
    - type: manhattan_f1
      value: 0.5384615384615384
      name: Manhattan F1
    - type: manhattan_f1_threshold
      value: 226.82566833496094
      name: Manhattan F1 Threshold
    - type: manhattan_precision
      value: 0.3941018766756032
      name: Manhattan Precision
    - type: manhattan_recall
      value: 0.8497109826589595
      name: Manhattan Recall
    - type: manhattan_ap
      value: 0.4272864144491257
      name: Manhattan Ap
    - type: euclidean_accuracy
      value: 0.671875
      name: Euclidean Accuracy
    - type: euclidean_accuracy_threshold
      value: 5.084325790405273
      name: Euclidean Accuracy Threshold
    - type: euclidean_f1
      value: 0.5404339250493098
      name: Euclidean F1
    - type: euclidean_f1_threshold
      value: 11.333902359008789
      name: Euclidean F1 Threshold
    - type: euclidean_precision
      value: 0.4101796407185629
      name: Euclidean Precision
    - type: euclidean_recall
      value: 0.791907514450867
      name: Euclidean Recall
    - type: euclidean_ap
      value: 0.41769294415599645
      name: Euclidean Ap
    - type: max_accuracy
      value: 0.671875
      name: Max Accuracy
    - type: max_accuracy_threshold
      value: 518.88671875
      name: Max Accuracy Threshold
    - type: max_f1
      value: 0.5404339250493098
      name: Max F1
    - type: max_f1_threshold
      value: 323.9651184082031
      name: Max F1 Threshold
    - type: max_precision
      value: 0.4101796407185629
      name: Max Precision
    - type: max_recall
      value: 0.953757225433526
      name: Max Recall
    - type: max_ap
      value: 0.4272864144491257
      name: Max Ap
  - task:
      type: binary-classification
      name: Binary Classification
    dataset:
      name: Qnli dev
      type: Qnli-dev
    metrics:
    - type: cosine_accuracy
      value: 0.640625
      name: Cosine Accuracy
    - type: cosine_accuracy_threshold
      value: 0.8695281744003296
      name: Cosine Accuracy Threshold
    - type: cosine_f1
      value: 0.6578512396694215
      name: Cosine F1
    - type: cosine_f1_threshold
      value: 0.7936367988586426
      name: Cosine F1 Threshold
    - type: cosine_precision
      value: 0.5392953929539296
      name: Cosine Precision
    - type: cosine_recall
      value: 0.8432203389830508
      name: Cosine Recall
    - type: cosine_ap
      value: 0.6314640856589909
      name: Cosine Ap
    - type: dot_accuracy
      value: 0.609375
      name: Dot Accuracy
    - type: dot_accuracy_threshold
      value: 351.17626953125
      name: Dot Accuracy Threshold
    - type: dot_f1
      value: 0.6501650165016502
      name: Dot F1
    - type: dot_f1_threshold
      value: 316.48046875
      name: Dot F1 Threshold
    - type: dot_precision
      value: 0.5324324324324324
      name: Dot Precision
    - type: dot_recall
      value: 0.8347457627118644
      name: Dot Recall
    - type: dot_ap
      value: 0.5366456296706419
      name: Dot Ap
    - type: manhattan_accuracy
      value: 0.658203125
      name: Manhattan Accuracy
    - type: manhattan_accuracy_threshold
      value: 206.32894897460938
      name: Manhattan Accuracy Threshold
    - type: manhattan_f1
      value: 0.652373660030628
      name: Manhattan F1
    - type: manhattan_f1_threshold
      value: 261.3590393066406
      name: Manhattan F1 Threshold
    - type: manhattan_precision
      value: 0.5107913669064749
      name: Manhattan Precision
    - type: manhattan_recall
      value: 0.902542372881356
      name: Manhattan Recall
    - type: manhattan_ap
      value: 0.6679289689394285
      name: Manhattan Ap
    - type: euclidean_accuracy
      value: 0.65234375
      name: Euclidean Accuracy
    - type: euclidean_accuracy_threshold
      value: 10.764808654785156
      name: Euclidean Accuracy Threshold
    - type: euclidean_f1
      value: 0.6393210749646393
      name: Euclidean F1
    - type: euclidean_f1_threshold
      value: 15.096710205078125
      name: Euclidean F1 Threshold
    - type: euclidean_precision
      value: 0.47983014861995754
      name: Euclidean Precision
    - type: euclidean_recall
      value: 0.9576271186440678
      name: Euclidean Recall
    - type: euclidean_ap
      value: 0.6460602994393339
      name: Euclidean Ap
    - type: max_accuracy
      value: 0.658203125
      name: Max Accuracy
    - type: max_accuracy_threshold
      value: 351.17626953125
      name: Max Accuracy Threshold
    - type: max_f1
      value: 0.6578512396694215
      name: Max F1
    - type: max_f1_threshold
      value: 316.48046875
      name: Max F1 Threshold
    - type: max_precision
      value: 0.5392953929539296
      name: Max Precision
    - type: max_recall
      value: 0.9576271186440678
      name: Max Recall
    - type: max_ap
      value: 0.6679289689394285
      name: Max Ap
---

# SentenceTransformer based on microsoft/deberta-v3-small

This is a [sentence-transformers](https://www.SBERT.net) model finetuned from [microsoft/deberta-v3-small](https://huggingface.co/microsoft/deberta-v3-small) on the bobox/enhanced_nli-50_k dataset. It maps sentences & paragraphs to a 768-dimensional dense vector space and can be used for semantic textual similarity, semantic search, paraphrase mining, text classification, clustering, and more.
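Because the model outputs a plain vector per text, semantic search comes down to scoring a query vector against corpus vectors with the configured similarity function (cosine). The pure-Python sketch below shows that ranking step on its own. The 4-dimensional vectors are hypothetical placeholders for real 768-dimensional embeddings, and `cosine_similarity` and `semantic_search` are illustrative helpers, not part of the Sentence Transformers API. In practice, `model.similarity` from the Usage section computes the same cosine scores for you.

```python
import math
from typing import List, Tuple

Vector = List[float]


def cosine_similarity(a: Vector, b: Vector) -> float:
    """Dot product of a and b divided by the product of their L2 norms."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


def semantic_search(query: Vector, corpus: List[Vector], top_k: int = 2) -> List[Tuple[int, float]]:
    """Return (corpus index, cosine score) pairs for the top_k most similar corpus vectors."""
    scored = [(i, cosine_similarity(query, doc)) for i, doc in enumerate(corpus)]
    scored.sort(key=lambda pair: pair[1], reverse=True)
    return scored[:top_k]


# Hypothetical toy vectors; real embeddings from this model have 768 dimensions.
query_embedding = [0.9, 0.1, 0.0, 0.2]
corpus_embeddings = [
    [0.8, 0.2, 0.1, 0.3],     # points in nearly the same direction as the query
    [0.0, 0.9, 0.8, 0.0],     # mostly unrelated to the query
    [-0.9, -0.1, 0.0, -0.2],  # points in the opposite direction
]

results = semantic_search(query_embedding, corpus_embeddings, top_k=2)
for index, score in results:
    print(index, round(score, 3))
```

Sorting every score is fine for a small corpus. For large corpora, a partial top-k selection such as `heapq.nlargest`, or an approximate nearest-neighbour index, is the usual choice.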
## Model Details

### Model Description
- **Model Type:** Sentence Transformer
- **Base model:** [microsoft/deberta-v3-small](https://huggingface.co/microsoft/deberta-v3-small)
- **Maximum Sequence Length:** 512 tokens
- **Output Dimensionality:** 768 dimensions
- **Similarity Function:** Cosine Similarity
- **Training Dataset:**
    - bobox/enhanced_nli-50_k

### Model Sources

- **Documentation:** [Sentence Transformers Documentation](https://sbert.net)
- **Repository:** [Sentence Transformers on GitHub](https://github.com/UKPLab/sentence-transformers)
- **Hugging Face:** [Sentence Transformers on Hugging Face](https://huggingface.co/models?library=sentence-transformers)

### Full Model Architecture

```
SentenceTransformer(
  (0): Transformer({'max_seq_length': 512, 'do_lower_case': False}) with Transformer model: DebertaV2Model 
  (1): Pooling({'word_embedding_dimension': 768, 'pooling_mode_cls_token': False, 'pooling_mode_mean_tokens': True, 'pooling_mode_max_tokens': False, 'pooling_mode_mean_sqrt_len_tokens': False, 'pooling_mode_weightedmean_tokens': False, 'pooling_mode_lasttoken': False, 'include_prompt': True})
)
```

## Usage

### Direct Usage (Sentence Transformers)

First install the Sentence Transformers library:

```bash
pip install -U sentence-transformers
```

Then you can load this model and run inference.
```python
from sentence_transformers import SentenceTransformer

# Download from the 🤗 Hub
model = SentenceTransformer("bobox/DeBERTa-small-ST-UnifiedDatasets-baseline-checkpoints-tmp")
# Run inference
sentences = [
    'A man wearing a blue bow tie and a fedora hat in a car. ',
    'A man takes a photo of himself wearing a bowtie and hat',
    'County of Angus - definition of County of Angus by The Free Dictionary County of Angus - definition of County of Angus by The Free Dictionary http://www.thefreedictionary.com/County+of+Angus \xa0(ăng′gəs) n. Any of a breed of hornless beef cattle that originated in Scotland and are usually black but also occur in a red variety. Also called Black Angus. [After Angus, former county of Scotland.] Angus (ˈæŋɡəs) n (Placename) a council area of E Scotland on the North Sea: the historical county of Angus became part of Tayside region in 1975; reinstated as a unitary authority (excluding City of Dundee) in 1996. Administrative centre: Forfar. Pop: 107 520 (2003 est). Area: 2181 sq km (842 sq miles) An•gus',
]
embeddings = model.encode(sentences)
print(embeddings.shape)
# (3, 768)

# Get the similarity scores for the embeddings
similarities = model.similarity(embeddings, embeddings)
print(similarities.shape)
# torch.Size([3, 3])
```

## Evaluation

### Metrics

#### Semantic Similarity
* Dataset: `sts-test`
* Evaluated with [EmbeddingSimilarityEvaluator](https://sbert.net/docs/package_reference/sentence_transformer/evaluation.html#sentence_transformers.evaluation.EmbeddingSimilarityEvaluator)

| Metric              | Value      |
|:--------------------|:-----------|
| pearson_cosine      | 0.2589     |
| **spearman_cosine** | **0.3132** |
| pearson_manhattan   | 0.2724     |
| spearman_manhattan  | 0.2966     |
| pearson_euclidean   | 0.2586     |
| spearman_euclidean  | 0.2834     |
| pearson_dot         | 0.2851     |
| spearman_dot        | 0.2967     |
| pearson_max         | 0.2851     |
| spearman_max        | 0.3132     |

#### Binary Classification
* Dataset: `allNLI-dev`
* Evaluated with [BinaryClassificationEvaluator](https://sbert.net/docs/package_reference/sentence_transformer/evaluation.html#sentence_transformers.evaluation.BinaryClassificationEvaluator)

| Metric                       | Value      |
|:-----------------------------|:-----------|
| cosine_accuracy              | 0.668      |
| cosine_accuracy_threshold    | 0.9721     |
| cosine_f1                    | 0.5344     |
| cosine_f1_threshold          | 0.8574     |
| cosine_precision             | 0.3989     |
| cosine_recall                | 0.8092     |
| cosine_ap                    | 0.4141     |
| dot_accuracy                 | 0.666      |
| dot_accuracy_threshold       | 518.8867   |
| dot_f1                       | 0.514      |
| dot_f1_threshold             | 323.9651   |
| dot_precision                | 0.3518     |
| dot_recall                   | 0.9538     |
| dot_ap                       | 0.3781     |
| manhattan_accuracy           | 0.6719     |
| manhattan_accuracy_threshold | 114.4184   |
| manhattan_f1                 | 0.5385     |
| manhattan_f1_threshold       | 226.8257   |
| manhattan_precision          | 0.3941     |
| manhattan_recall             | 0.8497     |
| manhattan_ap                 | 0.4273     |
| euclidean_accuracy           | 0.6719     |
| euclidean_accuracy_threshold | 5.0843     |
| euclidean_f1                 | 0.5404     |
| euclidean_f1_threshold       | 11.3339    |
| euclidean_precision          | 0.4102     |
| euclidean_recall             | 0.7919     |
| euclidean_ap                 | 0.4177     |
| max_accuracy                 | 0.6719     |
| max_accuracy_threshold       | 518.8867   |
| max_f1                       | 0.5404     |
| max_f1_threshold             | 323.9651   |
| max_precision                | 0.4102     |
| max_recall                   | 0.9538     |
| **max_ap**                   | **0.4273** |

#### Binary Classification
* Dataset: `Qnli-dev`
* Evaluated with [BinaryClassificationEvaluator](https://sbert.net/docs/package_reference/sentence_transformer/evaluation.html#sentence_transformers.evaluation.BinaryClassificationEvaluator)

| Metric                       | Value      |
|:-----------------------------|:-----------|
| cosine_accuracy              | 0.6406     |
| cosine_accuracy_threshold    | 0.8695     |
| cosine_f1                    | 0.6579     |
| cosine_f1_threshold          | 0.7936     |
| cosine_precision             | 0.5393     |
| cosine_recall                | 0.8432     |
| cosine_ap                    | 0.6315     |
| dot_accuracy                 | 0.6094     |
| dot_accuracy_threshold       | 351.1763   |
| dot_f1                       | 0.6502     |
| dot_f1_threshold             | 316.4805   |
| dot_precision                | 0.5324     |
| dot_recall                   | 0.8347     |
| dot_ap                       | 0.5366     |
| manhattan_accuracy           | 0.6582     |
| manhattan_accuracy_threshold | 206.3289   |
| manhattan_f1                 | 0.6524     |
| manhattan_f1_threshold       | 261.359    |
| manhattan_precision          | 0.5108     |
| manhattan_recall             | 0.9025     |
| manhattan_ap                 | 0.6679     |
| euclidean_accuracy           | 0.6523     |
| euclidean_accuracy_threshold | 10.7648    |
| euclidean_f1                 | 0.6393     |
| euclidean_f1_threshold       | 15.0967    |
| euclidean_precision          | 0.4798     |
| euclidean_recall             | 0.9576     |
| euclidean_ap                 | 0.6461     |
| max_accuracy                 | 0.6582     |
| max_accuracy_threshold       | 351.1763   |
| max_f1                       | 0.6579     |
| max_f1_threshold             | 316.4805   |
| max_precision                | 0.5393     |
| max_recall                   | 0.9576     |
| **max_ap**                   | **0.6679** |

## Training Details

### Training Dataset

#### bobox/enhanced_nli-50_k

* Dataset: bobox/enhanced_nli-50_k
* Size: 116,445 training samples
* Columns: `sentence1` and `sentence2`
* Approximate statistics based on the first 1000 samples:
  |         | sentence1 | sentence2 |
  |:--------|:----------|:----------|
  | type    | string    | string    |
  | details |           |           |
* Samples:
  | sentence1 | sentence2 |
  |:----------|:----------|
  | who is darnell from my name is earl | Eddie Steeples Eddie Steeples (born November 25, 1973)[1] is an American actor known for his roles as the "Rubberband Man" in an advertising campaign for OfficeMax, and as Darnell Turner on the NBC sitcom My Name Is Earl. |
  | Ferrell and the Chili Peppers toured together in 2013 . | Ferrell and the Chili Peppers wrapped up I 'm With You World Tour in April 2013 . |
  | Cells have four cycles. | How many cycles do cells have? |
* Loss: [CachedGISTEmbedLoss](https://sbert.net/docs/package_reference/sentence_transformer/losses.html#cachedgistembedloss) with these parameters:
  ```json
  {'guide': SentenceTransformer(
    (0): Transformer({'max_seq_length': 512, 'do_lower_case': False}) with Transformer model: BertModel 
    (1): Pooling({'word_embedding_dimension': 768, 'pooling_mode_cls_token': True, 'pooling_mode_mean_tokens': False, 'pooling_mode_max_tokens': False, 'pooling_mode_mean_sqrt_len_tokens': False, 'pooling_mode_weightedmean_tokens': False, 'pooling_mode_lasttoken': False, 'include_prompt': True})
    (2): Normalize()
  ), 'temperature': 0.025}
  ```

### Evaluation Dataset

#### bobox/enhanced_nli-50_k

* Dataset: bobox/enhanced_nli-50_k
* Size: 1,506 evaluation samples
* Columns: `sentence1` and `sentence2`
* Approximate statistics based on the first 1000 samples:
  |         | sentence1 | sentence2 |
  |:--------|:----------|:----------|
  | type    | string    | string    |
  | details |           |           |
* Samples:
  | sentence1 | sentence2 |
  |:----------|:----------|
  | Interestingly, snakes use their forked tongues to smell. | Snakes use their tongue to smell things. |
  | Soil is a renewable resource that can take thousand of years to form. | What is a renewable resource that can take thousand of years to form? |
  | As of March 22 , there were more than 321,000 cases with over 13,600 deaths and more than 96,000 recoveries reported worldwide . | As of 22 March , more than 321,000 cases of COVID-19 have been reported in over 180 countries and territories , resulting in more than 13,600 deaths and 96,000 recoveries . |
* Loss: [CachedGISTEmbedLoss](https://sbert.net/docs/package_reference/sentence_transformer/losses.html#cachedgistembedloss) with these parameters:
  ```json
  {'guide': SentenceTransformer(
    (0): Transformer({'max_seq_length': 512, 'do_lower_case': False}) with Transformer model: BertModel 
    (1): Pooling({'word_embedding_dimension': 768, 'pooling_mode_cls_token': True, 'pooling_mode_mean_tokens': False, 'pooling_mode_max_tokens': False, 'pooling_mode_mean_sqrt_len_tokens': False, 'pooling_mode_weightedmean_tokens': False, 'pooling_mode_lasttoken': False, 'include_prompt': True})
    (2): Normalize()
  ), 'temperature': 0.025}
  ```

### Training Hyperparameters
#### Non-Default Hyperparameters

- `eval_strategy`: steps
- `per_device_train_batch_size`: 640
- `per_device_eval_batch_size`: 128
- `learning_rate`: 3.75e-05
- `weight_decay`: 0.0005
- `lr_scheduler_type`: cosine_with_min_lr
- `lr_scheduler_kwargs`: {'num_cycles': 0.5, 'min_lr': 7.499999999999999e-06}
- `warmup_ratio`: 0.33
- `save_safetensors`: False
- `fp16`: True
- `push_to_hub`: True
- `hub_model_id`: bobox/DeBERTa-small-ST-UnifiedDatasets-baseline-checkpoints-tmp
- `hub_strategy`: all_checkpoints
- `batch_sampler`: no_duplicates

#### All Hyperparameters
- `overwrite_output_dir`: False
- `do_predict`: False
- `eval_strategy`: steps
- `prediction_loss_only`: True
- `per_device_train_batch_size`: 640
- `per_device_eval_batch_size`: 128
- `per_gpu_train_batch_size`: None
- `per_gpu_eval_batch_size`: None
- `gradient_accumulation_steps`: 1
- `eval_accumulation_steps`: None
- `torch_empty_cache_steps`: None
- `learning_rate`: 3.75e-05
- `weight_decay`: 0.0005
- `adam_beta1`: 0.9
- `adam_beta2`: 0.999
- `adam_epsilon`: 1e-08
- `max_grad_norm`: 1.0
- `num_train_epochs`: 3
- `max_steps`: -1
- `lr_scheduler_type`: cosine_with_min_lr
- `lr_scheduler_kwargs`: {'num_cycles': 0.5, 'min_lr': 7.499999999999999e-06}
- `warmup_ratio`: 0.33
- `warmup_steps`: 0
- `log_level`: passive
- `log_level_replica`: warning
- `log_on_each_node`: True
- `logging_nan_inf_filter`: True
- `save_safetensors`: False
- `save_on_each_node`: False
- `save_only_model`: False
- `restore_callback_states_from_checkpoint`: False
- `no_cuda`: False
- `use_cpu`: False
- `use_mps_device`: False
- `seed`: 42
- `data_seed`: None
- `jit_mode_eval`: False
- `use_ipex`: False
- `bf16`: False
- `fp16`: True
- `fp16_opt_level`: O1
- `half_precision_backend`: auto
- `bf16_full_eval`: False
- `fp16_full_eval`: False
- `tf32`: None
- `local_rank`: 0
- `ddp_backend`: None
- `tpu_num_cores`: None
- `tpu_metrics_debug`: False
- `debug`: []
- `dataloader_drop_last`: False
- `dataloader_num_workers`: 0
- `dataloader_prefetch_factor`: None
- `past_index`: -1
- `disable_tqdm`: False
- `remove_unused_columns`: True
- `label_names`: None
- `load_best_model_at_end`: False
- `ignore_data_skip`: False
- `fsdp`: []
- `fsdp_min_num_params`: 0
- `fsdp_config`: {'min_num_params': 0, 'xla': False, 'xla_fsdp_v2': False, 'xla_fsdp_grad_ckpt': False}
- `fsdp_transformer_layer_cls_to_wrap`: None
- `accelerator_config`: {'split_batches': False, 'dispatch_batches': None, 'even_batches': True, 'use_seedable_sampler': True, 'non_blocking': False, 'gradient_accumulation_kwargs': None}
- `deepspeed`: None
- `label_smoothing_factor`: 0.0
- `optim`: adamw_torch
- `optim_args`: None
- `adafactor`: False
- `group_by_length`: False
- `length_column_name`: length
- `ddp_find_unused_parameters`: None
- `ddp_bucket_cap_mb`: None
- `ddp_broadcast_buffers`: False
- `dataloader_pin_memory`: True
- `dataloader_persistent_workers`: False
- `skip_memory_metrics`: True
- `use_legacy_prediction_loop`: False
- `push_to_hub`: True
- `resume_from_checkpoint`: None
- `hub_model_id`: bobox/DeBERTa-small-ST-UnifiedDatasets-baseline-checkpoints-tmp
- `hub_strategy`: all_checkpoints
- `hub_private_repo`: False
- `hub_always_push`: False
- `gradient_checkpointing`: False
- `gradient_checkpointing_kwargs`: None
- `include_inputs_for_metrics`: False
- `eval_do_concat_batches`: True
- `fp16_backend`: auto
- `push_to_hub_model_id`: None
- `push_to_hub_organization`: None
- `mp_parameters`: 
- `auto_find_batch_size`: False
- `full_determinism`: False
- `torchdynamo`: None
- `ray_scope`: last
- `ddp_timeout`: 1800
- `torch_compile`: False
- `torch_compile_backend`: None
- `torch_compile_mode`: None
- `dispatch_batches`: None
- `split_batches`: None
- `include_tokens_per_second`: False
- `include_num_input_tokens_seen`: False
- `neftune_noise_alpha`: None
- `optim_target_modules`: None
- `batch_eval_metrics`: False
- `eval_on_start`: False
- `eval_use_gather_object`: False
- `batch_sampler`: no_duplicates
- `multi_dataset_batch_sampler`: proportional
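The schedule settings listed above are `learning_rate` 3.75e-05, `lr_scheduler_type` cosine_with_min_lr with `num_cycles` 0.5 and `min_lr` 7.5e-06, `warmup_ratio` 0.33, and `num_train_epochs` 3. A small calculation makes them concrete. The sketch below assumes linear warmup followed by a half-cosine decay that bottoms out at `min_lr`, which is our understanding of how the transformers `cosine_with_min_lr` scheduler behaves. It also assumes one optimizer step per batch of 640, giving 182 steps per epoch (116,445 / 640, rounded up), and that the warmup step count is rounded up. The 182 figure is consistent with the Epoch column of the training logs (step 55 at epoch 0.3022). Treat the sketch as an illustration, not a reproduction of the trainer's exact values.

```python
import math

# Values taken from the hyperparameter lists above.
NUM_SAMPLES = 116_445
BATCH_SIZE = 640
NUM_EPOCHS = 3
PEAK_LR = 3.75e-05
MIN_LR = 7.499999999999999e-06
WARMUP_RATIO = 0.33
NUM_CYCLES = 0.5

# Assumption: one optimizer step per batch, partial last batch kept (dataloader_drop_last=False).
steps_per_epoch = math.ceil(NUM_SAMPLES / BATCH_SIZE)
total_steps = steps_per_epoch * NUM_EPOCHS
# Assumption: the warmup step count is rounded up.
warmup_steps = math.ceil(total_steps * WARMUP_RATIO)


def learning_rate_at(step: int) -> float:
    """Linear warmup to PEAK_LR, then cosine decay down to MIN_LR (assumed formula)."""
    if step < warmup_steps:
        # Linear ramp from 0 to PEAK_LR over the warmup phase.
        return PEAK_LR * step / max(1, warmup_steps)
    progress = (step - warmup_steps) / max(1, total_steps - warmup_steps)
    # With NUM_CYCLES = 0.5 this is a single half-cosine going from 1 to 0.
    cosine = 0.5 * (1.0 + math.cos(math.pi * NUM_CYCLES * 2.0 * progress))
    # Rescale so the decay floor is MIN_LR instead of 0.
    min_lr_rate = MIN_LR / PEAK_LR
    return PEAK_LR * (cosine * (1.0 - min_lr_rate) + min_lr_rate)


def epoch_at(step: int) -> float:
    """Fractional epoch, as shown in the Epoch column of the training logs."""
    return step / steps_per_epoch


for step in (1, 55, warmup_steps, total_steps):
    print(f"step {step:>3}  epoch {epoch_at(step):.4f}  lr {learning_rate_at(step):.3e}")
```

Under these assumptions, warmup lasts 181 of 546 steps. The logged steps 1 to 55 therefore all fall inside the warmup phase, where the learning rate is still rising.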
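Both the training and evaluation datasets use CachedGISTEmbedLoss with a guide model (a CLS-pooled, normalized BERT encoder) and `temperature` 0.025. The plain-Python sketch below illustrates the core GISTEmbed idea for a single anchor. Any in-batch candidate that the guide model scores as more similar to the anchor than the anchor's own positive is treated as a probable false negative and masked out. An InfoNCE-style cross-entropy is then computed over the remaining candidates, using the trained model's similarities divided by the temperature. This is a simplified illustration, not the sentence-transformers implementation. It only looks at anchor-to-candidate scores and omits the gradient caching that the "Cached" variant adds for large batches. The function name and all similarity values are hypothetical.

```python
import math
from typing import List


def gist_loss_for_anchor(
    student_sims: List[float],
    guide_sims: List[float],
    positive_index: int,
    temperature: float = 0.025,
) -> float:
    """Cross-entropy over in-batch candidates for one anchor, with GISTEmbed-style masking.

    student_sims:   similarities from the model being trained (anchor vs. each candidate).
    guide_sims:     similarities from the frozen guide model for the same pairs.
    positive_index: position of the anchor's true positive among the candidates.
    """
    guide_positive = guide_sims[positive_index]
    logits = []
    for i, (student, guide) in enumerate(zip(student_sims, guide_sims)):
        if i != positive_index and guide > guide_positive:
            # The guide rates this "negative" above the positive, so treat it as
            # a probable false negative and exclude it from the softmax.
            logits.append(float("-inf"))
        else:
            logits.append(student / temperature)
    # Numerically stable negative log-softmax of the positive's logit.
    max_logit = max(logits)
    log_sum_exp = max_logit + math.log(sum(math.exp(logit - max_logit) for logit in logits))
    return log_sum_exp - logits[positive_index]


# Hypothetical batch of 4 candidates for one anchor; index 0 is the true positive.
student = [0.80, 0.75, 0.20, 0.10]
guide = [0.70, 0.90, 0.30, 0.15]  # the guide rates candidate 1 above the positive

masked = gist_loss_for_anchor(student, guide, positive_index=0)
unmasked = gist_loss_for_anchor(student, [0.0] * 4, positive_index=0)  # equal guide scores: nothing masked
print(f"with guide masking: {masked:.4f}, without: {unmasked:.4f}")
```

In this toy case, candidate 1 is nearly as close to the anchor as the positive under the trained model. Without masking it adds noticeable loss. With the guide's filter it is dropped, so the model is not pushed away from what is probably a valid match. The low temperature (0.025) sharpens the softmax, which is why small similarity gaps translate into large logit gaps.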
### Training Logs
| Epoch  | Step | Training Loss | Validation Loss | Qnli-dev_max_ap | allNLI-dev_max_ap | sts-test_spearman_cosine |
|:------:|:----:|:-------------:|:---------------:|:---------------:|:-----------------:|:------------------------:|
| 0.0055 | 1    | 8.8159        | -               | -               | -                 | -                        |
| 0.0110 | 2    | 9.1259        | -               | -               | -                 | -                        |
| 0.0165 | 3    | 8.9017        | -               | -               | -                 | -                        |
| 0.0220 | 4    | 9.1969        | -               | -               | -                 | -                        |
| 0.0275 | 5    | 9.3716        | 1.3746          | 0.6067          | 0.3706            | 0.1943                   |
| 0.0330 | 6    | 9.0425        | -               | -               | -                 | -                        |
| 0.0385 | 7    | 8.7309        | -               | -               | -                 | -                        |
| 0.0440 | 8    | 9.0123        | -               | -               | -                 | -                        |
| 0.0495 | 9    | 8.8095        | -               | -               | -                 | -                        |
| 0.0549 | 10   | 9.3194        | 1.3227          | 0.6089          | 0.3721            | 0.1976                   |
| 0.0604 | 11   | 8.9873        | -               | -               | -                 | -                        |
| 0.0659 | 12   | 8.5575        | -               | -               | -                 | -                        |
| 0.0714 | 13   | 8.8096        | -               | -               | -                 | -                        |
| 0.0769 | 14   | 8.0996        | -               | -               | -                 | -                        |
| 0.0824 | 15   | 8.1942        | 1.2244          | 0.6140          | 0.3743            | 0.2085                   |
| 0.0879 | 16   | 8.1654        | -               | -               | -                 | -                        |
| 0.0934 | 17   | 7.7336        | -               | -               | -                 | -                        |
| 0.0989 | 18   | 7.9535        | -               | -               | -                 | -                        |
| 0.1044 | 19   | 7.9322        | -               | -               | -                 | -                        |
| 0.1099 | 20   | 7.6812        | 1.1301          | 0.6199          | 0.3790            | 0.2233                   |
| 0.1154 | 21   | 7.551         | -               | -               | -                 | -                        |
| 0.1209 | 22   | 7.3788        | -               | -               | -                 | -                        |
| 0.1264 | 23   | 7.1746        | -               | -               | -                 | -                        |
| 0.1319 | 24   | 7.1849        | -               | -               | -                 | -                        |
| 0.1374 | 25   | 7.1085        | 1.0723          | 0.6195          | 0.3852            | 0.2357                   |
| 0.1429 | 26   | 7.3926        | -               | -               | -                 | -                        |
| 0.1484 | 27   | 7.1817        | -               | -               | -                 | -                        |
| 0.1538 | 28   | 7.239         | -               | -               | -                 | -                        |
| 0.1593 | 29   | 7.0023        | -               | -               | -                 | -                        |
| 0.1648 | 30   | 6.9898        | 1.0282          | 0.6215          | 0.3898            | 0.2477                   |
| 0.1703 | 31   | 6.9776        | -               | -               | -                 | -                        |
| 0.1758 | 32   | 6.8088        | -               | -               | -                 | -                        |
| 0.1813 | 33   | 6.8916        | -               | -               | -                 | -                        |
| 0.1868 | 34   | 6.6931        | -               | -               | -                 | -                        |
| 0.1923 | 35   | 6.5707        | 0.9846          | 0.6253          | 0.3952            | 0.2608                   |
| 0.1978 | 36   | 6.6231        | -               | -               | -                 | -                        |
| 0.2033 | 37   | 6.4951        | -               | -               | -                 | -                        |
| 0.2088 | 38   | 6.4607        | -               | -               | -                 | -                        |
| 0.2143 | 39   | 6.4504        | -               | -               | -                 | -                        |
| 0.2198 | 40   | 6.3649        | 0.9314          | 0.6299          | 0.4041            | 0.2738                   |
| 0.2253 | 41   | 6.2244        | -               | -               | -                 | -                        |
| 0.2308 | 42   | 6.007         | -               | -               | -                 | -                        |
| 0.2363 | 43   | 5.977         | -               | -               | -                 | -                        |
| 0.2418 | 44   | 6.0748        | -               | -               | -                 | -                        |
| 0.2473 | 45   | 5.7946        | 0.8549          | 0.6404          | 0.4116            | 0.2847                   |
| 0.2527 | 46   | 5.8751        | -               | -               | -                 | -                        |
| 0.2582 | 47   | 5.543         | -               | -               | -                 | -                        |
| 0.2637 | 48   | 5.5511        | -               | -               | -                 | -                        |
| 0.2692 | 49   | 5.411         | -               | -               | -                 | -                        |
| 0.2747 | 50   | 5.378         | 0.7943          | 0.6557          | 0.4159            | 0.2866                   |
| 0.2802 | 51   | 5.3831        | -               | -               | -                 | -                        |
| 0.2857 | 52   | 4.9729        | -               | -               | -                 | -                        |
| 0.2912 | 53   | 5.0425        | -               | -               | -                 | -                        |
| 0.2967 | 54   | 4.9446        | -               | -               | -                 | -                        |
| 0.3022 | 55   | 4.9288        | 0.7178          | 0.6679          | 0.4273            | 0.3132                   |

### Framework Versions
- Python: 3.10.14
- Sentence Transformers: 3.0.1
- Transformers: 4.44.0
- PyTorch: 2.4.0
- Accelerate: 0.33.0
- Datasets: 2.21.0
- Tokenizers: 0.19.1

## Citation

### BibTeX

#### Sentence Transformers
```bibtex
@inproceedings{reimers-2019-sentence-bert,
    title = "Sentence-BERT: Sentence Embeddings using Siamese BERT-Networks",
    author = "Reimers, Nils and Gurevych, Iryna",
    booktitle = "Proceedings of the 2019 Conference on Empirical Methods in Natural Language Processing",
    month = "11",
    year = "2019",
    publisher = "Association for Computational Linguistics",
    url = "https://arxiv.org/abs/1908.10084",
}
```