	---
	base_model: microsoft/deberta-v3-small
	datasets: []
	language: []
	library_name: sentence-transformers
	metrics:
	- pearson_cosine
	- spearman_cosine
	- pearson_manhattan
	- spearman_manhattan
	- pearson_euclidean
	- spearman_euclidean
	- pearson_dot
	- spearman_dot
	- pearson_max
	- spearman_max
	- cosine_accuracy
	- cosine_accuracy_threshold
	- cosine_f1
	- cosine_f1_threshold
	- cosine_precision
	- cosine_recall
	- cosine_ap
	- dot_accuracy
	- dot_accuracy_threshold
	- dot_f1
	- dot_f1_threshold
	- dot_precision
	- dot_recall
	- dot_ap
	- manhattan_accuracy
	- manhattan_accuracy_threshold
	- manhattan_f1
	- manhattan_f1_threshold
	- manhattan_precision
	- manhattan_recall
	- manhattan_ap
	- euclidean_accuracy
	- euclidean_accuracy_threshold
	- euclidean_f1
	- euclidean_f1_threshold
	- euclidean_precision
	- euclidean_recall
	- euclidean_ap
	- max_accuracy
	- max_accuracy_threshold
	- max_f1
	- max_f1_threshold
	- max_precision
	- max_recall
	- max_ap
	pipeline_tag: sentence-similarity
	tags:
	- sentence-transformers
	- sentence-similarity
	- feature-extraction
	- generated_from_trainer
	- dataset_size:116445
	- loss:CachedGISTEmbedLoss
	widget:
	- source_sentence: what is the main purpose of the brain
	  sentences:
	  - Brain Physiologically, the function of the brain is to exert centralized control
	    over the other organs of the body. The brain acts on the rest of the body both
	    by generating patterns of muscle activity and by driving the secretion of chemicals
	    called hormones. This centralized control allows rapid and coordinated responses
	    to changes in the environment. Some basic types of responsiveness such as reflexes
	    can be mediated by the spinal cord or peripheral ganglia, but sophisticated purposeful
	    control of behavior based on complex sensory input requires the information integrating
	    capabilities of a centralized brain.
	  - How do scientists know that some mountains were once at the bottom of an ocean?
	  - The Smiths Wiki | Fandom powered by Wikia Share Ad blocker interference detected!
	    Wikia is a free-to-use site that makes money from advertising. We have a modified
	    experience for viewers using ad blockers Wikia is not accessible if you’ve made
	    further modifications. Remove the custom ad blocker rule(s) and the page will
	    load as expected. The Smiths were an English rock band formed in Manchester in
	    1982. Based on the songwriting partnership of Morrissey (vocals) and Johnny Marr
	    (guitar), the band also included Andy Rourke (bass), Mike Joyce (drums) and for
	    a brief time Craig Gannon (rhythm guitar). Critics have called them one of the
	    most important alternative rock bands to emerge from the British independent music
	    scene of the 1980s,and the group has had major influence on subsequent artists.
	    Morrissey's lovelorn tales of alienation found an audience amongst youth culture
	    bored by the ubiquitous synthesiser-pop bands of the early 1980s, while Marr's
	    complex melodies helped return guitar-based music to popularity. The group were
	    signed to the independent record label Rough Trade Records , for whom they released
	    four studio albums and several compilations, as well as numerous non-LP singles.
	    Although they had limited commercial success outside the UK while they were still
	    together, and never released a single that charted higher than number 10 in their
	    home country, The Smiths won a growing following, and they remain cult and commercial
	    favourites. The band broke up in 1987 amid disagreements between Morrissey and
	    Marr and has turned down several offers to reform. Welcome to The Smiths Wiki
	- source_sentence: There were 29 Muslims fatalities in the Cave of the Patriarchs
	    massacre .
	  sentences:
	  - In August , after the end of the war in June 1902 , Higgins Southampton left the
	    `` SSBavarian '' and returned to Cape Town the following month .
	  - Between 29 and 52 Muslims were killed and more than 100 others wounded . [ Settlers
	    remember gunman Goldstein ; Hebron riots continue ] .
	  - 29 Muslims were killed and more than 100 others wounded . [ Settlers remember
	    gunman Goldstein ; Hebron riots continue ] .
	- source_sentence: are tabby cats all male?
	  sentences:
	  - Did you know orange tabby cats are typically male? In fact, up to 80 percent of
	    orange tabbies are male, making orange female cats a bit of a rarity. According
	    to the BBC's Focus Magazine, the ginger gene in cats works a little differently
	    compared to humans; it is on the X chromosome.
	  - Shawnee Trails Council was formed from the merger of the Four Rivers Council and
	    the Audubon Council .
	  - 'A picture of a modern looking kitchen area

	    '
	- source_sentence: Aamir Khan agreed to act immediately after reading Mehra 's screenplay
	    in `` Rang De Basanti '' .
	  sentences:
	  - Chris Rea — Free listening, videos, concerts, stats and photos at Last.fm singer-songwriter
	    Christopher Anton Rea (pronounced Ree-ah), born 4 March 1951, is a singer, songwriter,
	    and guitarist from Middlesbrough, England. Rea's recording career began in 1978.
	    Although he almost immediately had a US hit single with "Fool (If You Think It's
	    Over)", Rea's initial focus was on continental Europe, releasing eight albums
	    in the 1980s. It wasn't until 1985's Shamrock Diaries and the songs "Stainsby
	    Girls" and "Josephine," that UK audiences began to take notice of him. Follow
	    up albums… read more
	  - "Healthy Fast Food Meal No. 1. Grilled Chicken Sandwich and Fruit Cup (Chick-fil-A)\
	    \ Several fast food chains offer a grilled chicken sandwich. The trick is ordering\
	    \ it without mayo or creamy sauce, and making sure it’s served with a\
	    \ whole grain bun."
	  - Aamir Khan agreed to act in `` Rang De Basanti '' immediately after reading Mehra
	    's script .
	- source_sentence: 'A man wearing a blue bow tie and a fedora hat in a car. '
	  sentences:
	  - A man takes a photo of himself wearing a bowtie and hat
	  - Scientists explain the world based on what?
	  - 'County of Angus - definition of County of Angus by The Free Dictionary County
	    of Angus - definition of County of Angus by The Free Dictionary http://www.thefreedictionary.com/County+of+Angus
	    (ăng′gəs) n. Any of a breed of hornless beef cattle that originated in Scotland
	    and are usually black but also occur in a red variety. Also called Black Angus.
	    [After Angus, former county of Scotland.] Angus (ˈæŋɡəs) n (Placename) a council
	    area of E Scotland on the North Sea: the historical county of Angus became part
	    of Tayside region in 1975; reinstated as a unitary authority (excluding City of
	    Dundee) in 1996. Administrative centre: Forfar. Pop: 107 520 (2003 est). Area:
	    2181 sq km (842 sq miles) An•gus'
	model-index:
	- name: SentenceTransformer based on microsoft/deberta-v3-small
	  results:
	  - task:
	      type: semantic-similarity
	      name: Semantic Similarity
	    dataset:
	      name: sts test
	      type: sts-test
	    metrics:
	    - type: pearson_cosine
	      value: 0.7489263204555723
	      name: Pearson Cosine
	    - type: spearman_cosine
	      value: 0.7626005619606424
	      name: Spearman Cosine
	    - type: pearson_manhattan
	      value: 0.7591990025704353
	      name: Pearson Manhattan
	    - type: spearman_manhattan
	      value: 0.7477882076989188
	      name: Spearman Manhattan
	    - type: pearson_euclidean
	      value: 0.7622787611500085
	      name: Pearson Euclidean
	    - type: spearman_euclidean
	      value: 0.7539243664071233
	      name: Spearman Euclidean
	    - type: pearson_dot
	      value: 0.6493790443582248
	      name: Pearson Dot
	    - type: spearman_dot
	      value: 0.6306412644605037
	      name: Spearman Dot
	    - type: pearson_max
	      value: 0.7622787611500085
	      name: Pearson Max
	    - type: spearman_max
	      value: 0.7626005619606424
	      name: Spearman Max
	  - task:
	      type: binary-classification
	      name: Binary Classification
	    dataset:
	      name: allNLI dev
	      type: allNLI-dev
	    metrics:
	    - type: cosine_accuracy
	      value: 0.7109375
	      name: Cosine Accuracy
	    - type: cosine_accuracy_threshold
	      value: 0.916961669921875
	      name: Cosine Accuracy Threshold
	    - type: cosine_f1
	      value: 0.5853658536585366
	      name: Cosine F1
	    - type: cosine_f1_threshold
	      value: 0.8279993534088135
	      name: Cosine F1 Threshold
	    - type: cosine_precision
	      value: 0.4748201438848921
	      name: Cosine Precision
	    - type: cosine_recall
	      value: 0.7630057803468208
	      name: Cosine Recall
	    - type: cosine_ap
	      value: 0.5495769497490841
	      name: Cosine Ap
	    - type: dot_accuracy
	      value: 0.671875
	      name: Dot Accuracy
	    - type: dot_accuracy_threshold
	      value: 481.2850646972656
	      name: Dot Accuracy Threshold
	    - type: dot_f1
	      value: 0.549165120593692
	      name: Dot F1
	    - type: dot_f1_threshold
	      value: 381.15167236328125
	      name: Dot F1 Threshold
	    - type: dot_precision
	      value: 0.40437158469945356
	      name: Dot Precision
	    - type: dot_recall
	      value: 0.8554913294797688
	      name: Dot Recall
	    - type: dot_ap
	      value: 0.45293867777170244
	      name: Dot Ap
	    - type: manhattan_accuracy
	      value: 0.71484375
	      name: Manhattan Accuracy
	    - type: manhattan_accuracy_threshold
	      value: 186.7671356201172
	      name: Manhattan Accuracy Threshold
	    - type: manhattan_f1
	      value: 0.5696465696465696
	      name: Manhattan F1
	    - type: manhattan_f1_threshold
	      value: 268.783935546875
	      name: Manhattan F1 Threshold
	    - type: manhattan_precision
	      value: 0.4448051948051948
	      name: Manhattan Precision
	    - type: manhattan_recall
	      value: 0.791907514450867
	      name: Manhattan Recall
	    - type: manhattan_ap
	      value: 0.5511647333663136
	      name: Manhattan Ap
	    - type: euclidean_accuracy
	      value: 0.71484375
	      name: Euclidean Accuracy
	    - type: euclidean_accuracy_threshold
	      value: 8.915003776550293
	      name: Euclidean Accuracy Threshold
	    - type: euclidean_f1
	      value: 0.574074074074074
	      name: Euclidean F1
	    - type: euclidean_f1_threshold
	      value: 12.812746047973633
	      name: Euclidean F1 Threshold
	    - type: euclidean_precision
	      value: 0.47876447876447875
	      name: Euclidean Precision
	    - type: euclidean_recall
	      value: 0.7167630057803468
	      name: Euclidean Recall
	    - type: euclidean_ap
	      value: 0.5535962824434967
	      name: Euclidean Ap
	    - type: max_accuracy
	      value: 0.71484375
	      name: Max Accuracy
	    - type: max_accuracy_threshold
	      value: 481.2850646972656
	      name: Max Accuracy Threshold
	    - type: max_f1
	      value: 0.5853658536585366
	      name: Max F1
	    - type: max_f1_threshold
	      value: 381.15167236328125
	      name: Max F1 Threshold
	    - type: max_precision
	      value: 0.47876447876447875
	      name: Max Precision
	    - type: max_recall
	      value: 0.8554913294797688
	      name: Max Recall
	    - type: max_ap
	      value: 0.5535962824434967
	      name: Max Ap
	  - task:
	      type: binary-classification
	      name: Binary Classification
	    dataset:
	      name: Qnli dev
	      type: Qnli-dev
	    metrics:
	    - type: cosine_accuracy
	      value: 0.681640625
	      name: Cosine Accuracy
	    - type: cosine_accuracy_threshold
	      value: 0.8160840272903442
	      name: Cosine Accuracy Threshold
	    - type: cosine_f1
	      value: 0.6917562724014337
	      name: Cosine F1
	    - type: cosine_f1_threshold
	      value: 0.7854001522064209
	      name: Cosine F1 Threshold
	    - type: cosine_precision
	      value: 0.5993788819875776
	      name: Cosine Precision
	    - type: cosine_recall
	      value: 0.8177966101694916
	      name: Cosine Recall
	    - type: cosine_ap
	      value: 0.7109982147608755
	      name: Cosine Ap
	    - type: dot_accuracy
	      value: 0.6484375
	      name: Dot Accuracy
	    - type: dot_accuracy_threshold
	      value: 392.5464782714844
	      name: Dot Accuracy Threshold
	    - type: dot_f1
	      value: 0.6688311688311689
	      name: Dot F1
	    - type: dot_f1_threshold
	      value: 368.7878723144531
	      name: Dot F1 Threshold
	    - type: dot_precision
	      value: 0.5421052631578948
	      name: Dot Precision
	    - type: dot_recall
	      value: 0.8728813559322034
	      name: Dot Recall
	    - type: dot_ap
	      value: 0.6053421534358263
	      name: Dot Ap
	    - type: manhattan_accuracy
	      value: 0.685546875
	      name: Manhattan Accuracy
	    - type: manhattan_accuracy_threshold
	      value: 244.63809204101562
	      name: Manhattan Accuracy Threshold
	    - type: manhattan_f1
	      value: 0.6938053097345133
	      name: Manhattan F1
	    - type: manhattan_f1_threshold
	      value: 295.4796142578125
	      name: Manhattan F1 Threshold
	    - type: manhattan_precision
	      value: 0.5957446808510638
	      name: Manhattan Precision
	    - type: manhattan_recall
	      value: 0.8305084745762712
	      name: Manhattan Recall
	    - type: manhattan_ap
	      value: 0.7216536349653324
	      name: Manhattan Ap
	    - type: euclidean_accuracy
	      value: 0.6875
	      name: Euclidean Accuracy
	    - type: euclidean_accuracy_threshold
	      value: 13.026724815368652
	      name: Euclidean Accuracy Threshold
	    - type: euclidean_f1
	      value: 0.689407540394973
	      name: Euclidean F1
	    - type: euclidean_f1_threshold
	      value: 14.538017272949219
	      name: Euclidean F1 Threshold
	    - type: euclidean_precision
	      value: 0.5981308411214953
	      name: Euclidean Precision
	    - type: euclidean_recall
	      value: 0.8135593220338984
	      name: Euclidean Recall
	    - type: euclidean_ap
	      value: 0.7181091181717016
	      name: Euclidean Ap
	    - type: max_accuracy
	      value: 0.6875
	      name: Max Accuracy
	    - type: max_accuracy_threshold
	      value: 392.5464782714844
	      name: Max Accuracy Threshold
	    - type: max_f1
	      value: 0.6938053097345133
	      name: Max F1
	    - type: max_f1_threshold
	      value: 368.7878723144531
	      name: Max F1 Threshold
	    - type: max_precision
	      value: 0.5993788819875776
	      name: Max Precision
	    - type: max_recall
	      value: 0.8728813559322034
	      name: Max Recall
	    - type: max_ap
	      value: 0.7216536349653324
	      name: Max Ap
	---

	# SentenceTransformer based on microsoft/deberta-v3-small

	This is a [sentence-transformers](https://www.SBERT.net) model finetuned from [microsoft/deberta-v3-small](https://huggingface.co/microsoft/deberta-v3-small) on the bobox/enhanced_nli-50_k dataset. It maps sentences & paragraphs to a 768-dimensional dense vector space and can be used for semantic textual similarity, semantic search, paraphrase mining, text classification, clustering, and more.

	## Model Details

	### Model Description
	- Model Type: Sentence Transformer
	- Base model: [microsoft/deberta-v3-small](https://huggingface.co/microsoft/deberta-v3-small) <!-- at revision a36c739020e01763fe789b4b85e2df55d6180012 -->
	- Maximum Sequence Length: 512 tokens
	- Output Dimensionality: 768 dimensions
	- Similarity Function: Cosine Similarity
	- Training Dataset:
	    - bobox/enhanced_nli-50_k
	<!-- - Language: Unknown -->
	<!-- - License: Unknown -->

	### Model Sources

	- Documentation: [Sentence Transformers Documentation](https://sbert.net)
	- Repository: [Sentence Transformers on GitHub](https://github.com/UKPLab/sentence-transformers)
	- Hugging Face: [Sentence Transformers on Hugging Face](https://huggingface.co/models?library=sentence-transformers)

	### Full Model Architecture

	```
	SentenceTransformer(
	  (0): Transformer({'max_seq_length': 512, 'do_lower_case': False}) with Transformer model: DebertaV2Model
	  (1): Pooling({'word_embedding_dimension': 768, 'pooling_mode_cls_token': False, 'pooling_mode_mean_tokens': True, 'pooling_mode_max_tokens': False, 'pooling_mode_mean_sqrt_len_tokens': False, 'pooling_mode_weightedmean_tokens': False, 'pooling_mode_lasttoken': False, 'include_prompt': True})
	)
	```
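
The Pooling module above enables only `pooling_mode_mean_tokens`. Each sentence vector is therefore the average of its token embeddings, with padding positions excluded through the attention mask. The NumPy sketch below reproduces that arithmetic on a toy batch. `mean_pool` is a hypothetical helper for illustration only; the library performs this step in PyTorch inside its `Pooling` module.

```python
import numpy as np


def mean_pool(token_embeddings: np.ndarray, attention_mask: np.ndarray) -> np.ndarray:
    """Average token vectors per sentence, ignoring padding positions.

    token_embeddings: (batch, seq_len, dim) hidden states from the transformer.
    attention_mask:   (batch, seq_len), 1 for real tokens and 0 for padding.
    """
    mask = attention_mask[..., np.newaxis].astype(token_embeddings.dtype)  # (batch, seq_len, 1)
    summed = (token_embeddings * mask).sum(axis=1)  # sum over real tokens only
    counts = np.clip(mask.sum(axis=1), 1e-9, None)  # guard against all-padding rows
    return summed / counts


# Toy batch: 2 sentences, 3 token positions, 4-dim vectors (768 in the real model).
tokens = np.arange(24, dtype=np.float32).reshape(2, 3, 4)
mask = np.array([[1, 1, 1], [1, 1, 0]])  # the second sentence ends with one padding token
pooled = mean_pool(tokens, mask)
print(pooled.shape)  # (2, 4)
```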

	## Usage

	### Direct Usage (Sentence Transformers)

	First install the Sentence Transformers library:

	```bash
	pip install -U sentence-transformers
	```

	Then you can load this model and run inference.
	```python
	from sentence_transformers import SentenceTransformer

	# Download from the 🤗 Hub
	model = SentenceTransformer("bobox/DeBERTa-small-ST-UnifiedDatasets-baseline-checkpoints-tmp")
	# Run inference
	sentences = [
	    'A man wearing a blue bow tie and a fedora hat in a car. ',
	    'A man takes a photo of himself wearing a bowtie and hat',
	    'County of Angus - definition of County of Angus by The Free Dictionary County of Angus - definition of County of Angus by The Free Dictionary http://www.thefreedictionary.com/County+of+Angus \xa0(ăng′gəs) n. Any of a breed of hornless beef cattle that originated in Scotland and are usually black but also occur in a red variety. Also called Black Angus. [After Angus, former county of Scotland.] Angus (ˈæŋɡəs) n (Placename) a council area of E Scotland on the North Sea: the historical county of Angus became part of Tayside region in 1975; reinstated as a unitary authority (excluding City of Dundee) in 1996. Administrative centre: Forfar. Pop: 107 520 (2003 est). Area: 2181 sq km (842 sq miles) An•gus',
	]
	embeddings = model.encode(sentences)
	print(embeddings.shape)
	# (3, 768)

	# Get the similarity scores for the embeddings
	similarities = model.similarity(embeddings, embeddings)
	print(similarities.shape)
	# torch.Size([3, 3])
	```
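
For semantic search, corpus embeddings can be ranked by their cosine similarity to a query embedding. The architecture above has no `Normalize()` module, so raw embeddings are not unit-length; the dot-product thresholds in the hundreds reported under Evaluation reflect this. The sketch below therefore normalizes explicitly. It uses toy 3-dimensional NumPy vectors in place of real `model.encode` outputs, and `top_k_cosine` is a hypothetical helper. `sentence_transformers.util.semantic_search` offers a batched version of the same idea.

```python
import numpy as np


def top_k_cosine(query: np.ndarray, corpus: np.ndarray, k: int = 2) -> list[tuple[int, float]]:
    """Return (corpus_index, cosine_score) for the k corpus rows most similar to query."""
    q = query / np.linalg.norm(query)  # unit-length query
    c = corpus / np.linalg.norm(corpus, axis=1, keepdims=True)  # unit-length corpus rows
    scores = c @ q  # cosine similarity of each corpus row with the query
    order = np.argsort(-scores)[:k]  # indices of the k highest scores, best first
    return [(int(i), float(scores[i])) for i in order]


# Toy 3-dim vectors stand in for the 768-dim output of model.encode(...).
corpus_emb = np.array([[1.0, 0.0, 0.0], [0.7, 0.7, 0.0], [0.0, 0.0, 1.0]])
query_emb = np.array([1.0, 0.1, 0.0])
for idx, score in top_k_cosine(query_emb, corpus_emb, k=2):
    print(idx, round(score, 3))
```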

	<!--
	### Direct Usage (Transformers)

	<details><summary>Click to see the direct usage in Transformers</summary>

	</details>
	-->

	<!--
	### Downstream Usage (Sentence Transformers)

	You can finetune this model on your own dataset.

	<details><summary>Click to expand</summary>

	</details>
	-->

	<!--
	### Out-of-Scope Use

	List how the model may foreseeably be misused and address what users ought not to do with the model.
	-->

	## Evaluation

	### Metrics

	#### Semantic Similarity
	* Dataset: `sts-test`
	* Evaluated with [<code>EmbeddingSimilarityEvaluator</code>](https://sbert.net/docs/package_reference/sentence_transformer/evaluation.html#sentence_transformers.evaluation.EmbeddingSimilarityEvaluator)

	| Metric | Value |
	|:--------------------|:-----------|
	| pearson_cosine | 0.7489 |
	| spearman_cosine | 0.7626 |
	| pearson_manhattan | 0.7592 |
	| spearman_manhattan | 0.7478 |
	| pearson_euclidean | 0.7623 |
	| spearman_euclidean | 0.7539 |
	| pearson_dot | 0.6494 |
	| spearman_dot | 0.6306 |
	| pearson_max | 0.7623 |
	| spearman_max | 0.7626 |
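
`EmbeddingSimilarityEvaluator` scores each sentence pair with four functions and correlates those scores with the gold similarity labels. It reports both Pearson (linear) and Spearman (rank) correlation. The `_max` rows hold the largest of the four values: spearman_max (0.7626) equals spearman_cosine, and pearson_max (0.7623) equals pearson_euclidean. The sketch below illustrates the computation on hypothetical toy embeddings and labels. It assumes distances are negated so that a higher score always means "more similar"; the exact conventions inside sentence-transformers may differ in detail.

```python
import numpy as np


def average_ranks(x: np.ndarray) -> np.ndarray:
    """Rank values 1..n in ascending order; tied values share their mean rank."""
    order = np.argsort(x, kind="mergesort")
    ranks = np.empty(len(x), dtype=float)
    ranks[order] = np.arange(1, len(x) + 1)
    for value in np.unique(x):  # replace the ranks of tied values by their average
        tied = x == value
        ranks[tied] = ranks[tied].mean()
    return ranks


def pearson(a: np.ndarray, b: np.ndarray) -> float:
    """Linear correlation coefficient."""
    return float(np.corrcoef(a, b)[0, 1])


def spearman(a: np.ndarray, b: np.ndarray) -> float:
    """Pearson correlation computed on ranks."""
    return pearson(average_ranks(a), average_ranks(b))


def pair_scores(e1: np.ndarray, e2: np.ndarray) -> dict[str, np.ndarray]:
    """Score row i of e1 against row i of e2; distances are negated so higher = more similar."""
    dot = np.sum(e1 * e2, axis=1)
    norms = np.linalg.norm(e1, axis=1) * np.linalg.norm(e2, axis=1)
    return {
        "cosine": dot / norms,
        "manhattan": -np.abs(e1 - e2).sum(axis=1),
        "euclidean": -np.linalg.norm(e1 - e2, axis=1),
        "dot": dot,
    }


# Hypothetical toy data: 4 sentence pairs with gold similarity labels on a 0-5 scale.
e1 = np.array([[1.0, 0.0], [1.0, 1.0], [0.0, 1.0], [2.0, 1.0]])
e2 = np.array([[1.0, 0.1], [1.0, -1.0], [1.0, 0.0], [2.0, 1.2]])
gold = np.array([4.5, 1.0, 0.5, 4.0])

for name, scores in pair_scores(e1, e2).items():
    print(f"pearson_{name}={pearson(scores, gold):.3f}  spearman_{name}={spearman(scores, gold):.3f}")
```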

	#### Binary Classification
	* Dataset: `allNLI-dev`
	* Evaluated with [<code>BinaryClassificationEvaluator</code>](https://sbert.net/docs/package_reference/sentence_transformer/evaluation.html#sentence_transformers.evaluation.BinaryClassificationEvaluator)

	| Metric | Value |
	|:-----------------------------|:-----------|
	| cosine_accuracy | 0.7109 |
	| cosine_accuracy_threshold | 0.917 |
	| cosine_f1 | 0.5854 |
	| cosine_f1_threshold | 0.828 |
	| cosine_precision | 0.4748 |
	| cosine_recall | 0.763 |
	| cosine_ap | 0.5496 |
	| dot_accuracy | 0.6719 |
	| dot_accuracy_threshold | 481.2851 |
	| dot_f1 | 0.5492 |
	| dot_f1_threshold | 381.1517 |
	| dot_precision | 0.4044 |
	| dot_recall | 0.8555 |
	| dot_ap | 0.4529 |
	| manhattan_accuracy | 0.7148 |
	| manhattan_accuracy_threshold | 186.7671 |
	| manhattan_f1 | 0.5696 |
	| manhattan_f1_threshold | 268.7839 |
	| manhattan_precision | 0.4448 |
	| manhattan_recall | 0.7919 |
	| manhattan_ap | 0.5512 |
	| euclidean_accuracy | 0.7148 |
	| euclidean_accuracy_threshold | 8.915 |
	| euclidean_f1 | 0.5741 |
	| euclidean_f1_threshold | 12.8127 |
	| euclidean_precision | 0.4788 |
	| euclidean_recall | 0.7168 |
	| euclidean_ap | 0.5536 |
	| max_accuracy | 0.7148 |
	| max_accuracy_threshold | 481.2851 |
	| max_f1 | 0.5854 |
	| max_f1_threshold | 381.1517 |
	| max_precision | 0.4788 |
	| max_recall | 0.8555 |
	| max_ap | 0.5536 |
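
Accuracy and F1 in this table are the best values found over all decision thresholds, which is why each one is paired with its own `_threshold`. Conceptually, cosine and dot-product scores predict "positive" when the score is at or above the threshold. Manhattan and Euclidean distances predict "positive" when the distance is at or below it, which is why those thresholds sit on different scales (186.77 and 8.915 here). The sketch below shows the accuracy search. It assumes the candidate thresholds are midpoints between consecutive sorted scores. `best_accuracy_threshold` is a hypothetical helper, not the evaluator's own code, and the scores and labels are toy values.

```python
import numpy as np


def best_accuracy_threshold(scores: np.ndarray, labels: np.ndarray,
                            higher_means_similar: bool = True) -> tuple[float, float]:
    """Return (best_accuracy, threshold) over candidate thresholds.

    Candidates are the midpoints between consecutive distinct scores, plus one value
    beyond each end so that "predict all positive" and "predict all negative" are tried.
    """
    s = np.unique(scores)  # sorted, distinct scores
    candidates = np.concatenate(([s[0] - 1.0], (s[:-1] + s[1:]) / 2.0, [s[-1] + 1.0]))
    truth = labels.astype(bool)
    best_acc, best_thr = -1.0, float(candidates[0])
    for thr in candidates:
        # Similarities: positive at or above thr. Distances: positive at or below thr.
        preds = scores >= thr if higher_means_similar else scores <= thr
        acc = float(np.mean(preds == truth))
        if acc > best_acc:
            best_acc, best_thr = acc, float(thr)
    return best_acc, best_thr


labels = np.array([1, 1, 0, 0])
cosine_scores = np.array([0.95, 0.90, 0.60, 0.30])  # hypothetical cosine similarities
euclid_dists = np.array([0.5, 1.0, 3.0, 4.0])  # hypothetical Euclidean distances
print(best_accuracy_threshold(cosine_scores, labels))  # (1.0, 0.75)
print(best_accuracy_threshold(euclid_dists, labels, higher_means_similar=False))  # (1.0, 2.0)
```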

	#### Binary Classification
	* Dataset: `Qnli-dev`
	* Evaluated with [<code>BinaryClassificationEvaluator</code>](https://sbert.net/docs/package_reference/sentence_transformer/evaluation.html#sentence_transformers.evaluation.BinaryClassificationEvaluator)

	| Metric | Value |
	|:-----------------------------|:-----------|
	| cosine_accuracy | 0.6816 |
	| cosine_accuracy_threshold | 0.8161 |
	| cosine_f1 | 0.6918 |
	| cosine_f1_threshold | 0.7854 |
	| cosine_precision | 0.5994 |
	| cosine_recall | 0.8178 |
	| cosine_ap | 0.711 |
	| dot_accuracy | 0.6484 |
	| dot_accuracy_threshold | 392.5465 |
	| dot_f1 | 0.6688 |
	| dot_f1_threshold | 368.7879 |
	| dot_precision | 0.5421 |
	| dot_recall | 0.8729 |
	| dot_ap | 0.6053 |
	| manhattan_accuracy | 0.6855 |
	| manhattan_accuracy_threshold | 244.6381 |
	| manhattan_f1 | 0.6938 |
	| manhattan_f1_threshold | 295.4796 |
	| manhattan_precision | 0.5957 |
	| manhattan_recall | 0.8305 |
	| manhattan_ap | 0.7217 |
	| euclidean_accuracy | 0.6875 |
	| euclidean_accuracy_threshold | 13.0267 |
	| euclidean_f1 | 0.6894 |
	| euclidean_f1_threshold | 14.538 |
	| euclidean_precision | 0.5981 |
	| euclidean_recall | 0.8136 |
	| euclidean_ap | 0.7181 |
	| max_accuracy | 0.6875 |
	| max_accuracy_threshold | 392.5465 |
	| max_f1 | 0.6938 |
	| max_f1_threshold | 368.7879 |
	| max_precision | 0.5994 |
	| max_recall | 0.8729 |
	| max_ap | 0.7217 |
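
In both binary-classification tables, the `max_` rows appear to be an element-wise maximum across the four similarity functions, taken independently for each metric. The check below recomputes the Qnli-dev `max_` row from the per-function values in the table and records which function supplies each maximum. One consequence is that `max_accuracy_threshold` (392.5465) is the dot-product threshold, while `max_accuracy` (0.6875) comes from Euclidean distance. A `max_` threshold therefore should not be paired with the `max_` score beside it.

```python
# Qnli-dev values per similarity function, copied from the table above.
qnli_dev = {
    "cosine": {"accuracy": 0.6816, "accuracy_threshold": 0.8161, "f1": 0.6918,
               "f1_threshold": 0.7854, "precision": 0.5994, "recall": 0.8178, "ap": 0.711},
    "dot": {"accuracy": 0.6484, "accuracy_threshold": 392.5465, "f1": 0.6688,
            "f1_threshold": 368.7879, "precision": 0.5421, "recall": 0.8729, "ap": 0.6053},
    "manhattan": {"accuracy": 0.6855, "accuracy_threshold": 244.6381, "f1": 0.6938,
                  "f1_threshold": 295.4796, "precision": 0.5957, "recall": 0.8305, "ap": 0.7217},
    "euclidean": {"accuracy": 0.6875, "accuracy_threshold": 13.0267, "f1": 0.6894,
                  "f1_threshold": 14.538, "precision": 0.5981, "recall": 0.8136, "ap": 0.7181},
}

# Element-wise maximum over the four functions, metric by metric.
max_row = {metric: max(scores[metric] for scores in qnli_dev.values())
           for metric in qnli_dev["cosine"]}

# Which function supplies each max_ value (scores and thresholds can come from different ones).
source = {metric: max(qnli_dev, key=lambda fn: qnli_dev[fn][metric]) for metric in max_row}

print(max_row)
print(source)
```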

	<!--
	## Bias, Risks and Limitations

	What are the known or foreseeable issues stemming from this model? You could also flag here known failure cases or weaknesses of the model.
	-->

	<!--
	### Recommendations

	What are recommendations with respect to the foreseeable issues? For example, filtering explicit content.
	-->

	## Training Details

	### Training Dataset

	#### bobox/enhanced_nli-50_k

	* Dataset: bobox/enhanced_nli-50_k
	* Size: 116,445 training samples
	* Columns: <code>sentence1</code> and <code>sentence2</code>
	* Approximate statistics based on the first 1000 samples:
	  | | sentence1 | sentence2 |
	  |:--------|:------------------------------|:------------------------------|
	  | type | string | string |
	  | details | <ul><li>min: 4 tokens</li><li>mean: 33.67 tokens</li><li>max: 338 tokens</li></ul> | <ul><li>min: 2 tokens</li><li>mean: 51.48 tokens</li><li>max: 512 tokens</li></ul> |
	* Samples:
	  | sentence1 | sentence2 |
	  |:------------------------------|:------------------------------|
	  | <code>who is darnell from my name is earl</code> | <code>Eddie Steeples Eddie Steeples (born November 25, 1973)[1] is an American actor known for his roles as the "Rubberband Man" in an advertising campaign for OfficeMax, and as Darnell Turner on the NBC sitcom My Name Is Earl.</code> |
	  | <code>Ferrell and the Chili Peppers toured together in 2013 .</code> | <code>Ferrell and the Chili Peppers wrapped up I 'm With You World Tour in April 2013 .</code> |
	  | <code>Cells have four cycles.</code> | <code>How many cycles do cells have?</code> |
	* Loss: [<code>CachedGISTEmbedLoss</code>](https://sbert.net/docs/package_reference/sentence_transformer/losses.html#cachedgistembedloss) with these parameters:
	```json
	{'guide': SentenceTransformer(
	  (0): Transformer({'max_seq_length': 512, 'do_lower_case': False}) with Transformer model: BertModel
	  (1): Pooling({'word_embedding_dimension': 768, 'pooling_mode_cls_token': True, 'pooling_mode_mean_tokens': False, 'pooling_mode_max_tokens': False, 'pooling_mode_mean_sqrt_len_tokens': False, 'pooling_mode_weightedmean_tokens': False, 'pooling_mode_lasttoken': False, 'include_prompt': True})
	  (2): Normalize()
	), 'temperature': 0.025}
	```
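
`CachedGISTEmbedLoss` combines GISTEmbed's guided selection of in-batch negatives with GradCache-style mini-batching. The caching is what lets a batch of 640 fit in memory, and it is intended to leave the loss value unchanged. The guide model configured above (a BERT encoder with CLS pooling and `Normalize()`) discards any in-batch negative that it rates as more similar to the anchor than the true positive, treating it as a likely false negative. The NumPy sketch below keeps only the anchor-positive similarity block and omits caching; the library version also scores anchor-anchor and positive-positive pairs. `gist_loss`, its arguments, and the toy embeddings are hypothetical.

```python
import numpy as np


def cosine_matrix(a: np.ndarray, b: np.ndarray) -> np.ndarray:
    """Cosine similarity between every row of a and every row of b."""
    a = a / np.linalg.norm(a, axis=1, keepdims=True)
    b = b / np.linalg.norm(b, axis=1, keepdims=True)
    return a @ b.T


def gist_loss(anchor: np.ndarray, positive: np.ndarray, guide_anchor: np.ndarray,
              guide_positive: np.ndarray, temperature: float = 0.025) -> float:
    """Simplified GISTEmbed-style InfoNCE loss over anchor-positive pairs only.

    Row i's true positive is column i; the other columns are in-batch negatives.
    A negative is masked out when the guide rates it more similar to the anchor
    than the true positive, because it is then likely a false negative.
    """
    sim = cosine_matrix(anchor, positive)  # student model, (batch, batch)
    guide_sim = cosine_matrix(guide_anchor, guide_positive)  # guide model, (batch, batch)
    guide_pos = np.diag(guide_sim)[:, None]  # guide score of each true pair
    sim = np.where(guide_sim > guide_pos, -np.inf, sim)  # drop suspected false negatives
    logits = sim / temperature
    logits = logits - logits.max(axis=1, keepdims=True)  # stabilize the softmax
    log_probs = logits - np.log(np.exp(logits).sum(axis=1, keepdims=True))
    return float(-np.mean(np.diag(log_probs)))  # cross-entropy with target i for row i


# Toy 2-D student embeddings for a batch of two (anchor, positive) pairs.
anchor = np.array([[1.0, 0.0], [0.0, 1.0]])
positive = np.array([[0.8, 0.6], [0.6, 0.8]])
neutral_guide = np.eye(2)  # guide sees no false negatives, so nothing is masked
# Guide rates positive 1 closer to anchor 0 than positive 0 is, so pair (0, 1) gets masked.
flagging_guide_pos = np.array([[0.0, -1.0], [0.6, 0.8]])

unmasked = gist_loss(anchor, positive, neutral_guide, neutral_guide)
masked = gist_loss(anchor, positive, anchor, flagging_guide_pos)
print(unmasked, masked)
```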

	### Evaluation Dataset

	#### bobox/enhanced_nli-50_k

	* Dataset: bobox/enhanced_nli-50_k
	* Size: 1,506 evaluation samples
	* Columns: <code>sentence1</code> and <code>sentence2</code>
	* Approximate statistics based on the first 1000 samples:
	  | | sentence1 | sentence2 |
	  |:--------|:------------------------------|:------------------------------|
	  | type | string | string |
	  | details | <ul><li>min: 3 tokens</li><li>mean: 32.36 tokens</li><li>max: 341 tokens</li></ul> | <ul><li>min: 2 tokens</li><li>mean: 61.99 tokens</li><li>max: 431 tokens</li></ul> |
	* Samples:
	  | sentence1 | sentence2 |
	  |:------------------------------|:------------------------------|
	  | <code>Interestingly, snakes use their forked tongues to smell.</code> | <code>Snakes use their tongue to smell things.</code> |
	  | <code>Soil is a renewable resource that can take thousand of years to form.</code> | <code>What is a renewable resource that can take thousand of years to form?</code> |
	  | <code>As of March 22 , there were more than 321,000 cases with over 13,600 deaths and more than 96,000 recoveries reported worldwide .</code> | <code>As of 22 March , more than 321,000 cases of COVID-19 have been reported in over 180 countries and territories , resulting in more than 13,600 deaths and 96,000 recoveries .</code> |
	* Loss: [<code>CachedGISTEmbedLoss</code>](https://sbert.net/docs/package_reference/sentence_transformer/losses.html#cachedgistembedloss) with these parameters:
	```json
	{'guide': SentenceTransformer(
	  (0): Transformer({'max_seq_length': 512, 'do_lower_case': False}) with Transformer model: BertModel
	  (1): Pooling({'word_embedding_dimension': 768, 'pooling_mode_cls_token': True, 'pooling_mode_mean_tokens': False, 'pooling_mode_max_tokens': False, 'pooling_mode_mean_sqrt_len_tokens': False, 'pooling_mode_weightedmean_tokens': False, 'pooling_mode_lasttoken': False, 'include_prompt': True})
	  (2): Normalize()
	), 'temperature': 0.025}
	```
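
The min / mean / max token counts in both dataset tables are computed over the first 1000 samples of each column. The sketch below shows that aggregation with a pluggable tokenizer. The card's figures come from the model's DeBERTa tokenizer, whereas the example substitutes `str.split` so it runs offline. The optional `max_length` cap is an assumption: the training-set maximum of exactly 512 tokens, equal to `max_seq_length`, suggests that counts were truncated at that length. `token_length_stats` is a hypothetical helper.

```python
from itertools import islice
from statistics import mean
from typing import Callable, Iterable, Optional


def token_length_stats(texts: Iterable[str], tokenize: Callable[[str], list],
                       limit: int = 1000, max_length: Optional[int] = None) -> dict:
    """min / mean / max token counts over the first `limit` texts.

    If max_length is given, each count is capped at it (assumed truncation).
    """
    counts = []
    for text in islice(texts, limit):
        n = len(tokenize(text))
        counts.append(min(n, max_length) if max_length is not None else n)
    return {"min": min(counts), "mean": round(mean(counts), 2), "max": max(counts)}


# str.split stands in for the DeBERTa tokenizer so the example runs offline.
samples = [
    "Snakes use their tongue to smell things.",
    "What is a renewable resource that can take thousand of years to form?",
]
print(token_length_stats(samples, str.split))
```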

	### Training Hyperparameters
	#### Non-Default Hyperparameters

	- `eval_strategy`: steps
	- `per_device_train_batch_size`: 640
	- `per_device_eval_batch_size`: 128
	- `learning_rate`: 3.75e-05
	- `weight_decay`: 0.0005
	- `lr_scheduler_type`: cosine_with_min_lr
	- `lr_scheduler_kwargs`: {'num_cycles': 0.5, 'min_lr': 7.499999999999999e-06}
	- `warmup_ratio`: 0.33
	- `save_safetensors`: False
	- `fp16`: True
	- `push_to_hub`: True
	- `hub_model_id`: bobox/DeBERTa-small-ST-UnifiedDatasets-baseline-checkpoints-tmp
	- `hub_strategy`: all_checkpoints
	- `batch_sampler`: no_duplicates
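
With `lr_scheduler_type: cosine_with_min_lr`, the learning rate warms up linearly over `warmup_ratio` of all training steps. It then follows a half cosine (`num_cycles: 0.5`) down to `min_lr`, which is 20% of the 3.75e-05 peak rather than zero. The step counts below are derived, not logged. They assume a single device and that the last partial batch is kept (`dataloader_drop_last: False`); the resulting 182 steps per epoch is consistent with the 0.0055-epoch increments in the training logs. The formula is a sketch of the Transformers schedule as understood here, not its code.

```python
import math

# Hyperparameters copied from the list above.
LEARNING_RATE = 3.75e-05
MIN_LR = 7.499999999999999e-06
WARMUP_RATIO = 0.33
NUM_CYCLES = 0.5
EPOCHS = 3
TRAIN_SAMPLES = 116_445
BATCH_SIZE = 640  # per_device_train_batch_size, assuming a single device

steps_per_epoch = math.ceil(TRAIN_SAMPLES / BATCH_SIZE)  # partial last batch kept
total_steps = steps_per_epoch * EPOCHS
warmup_steps = math.ceil(WARMUP_RATIO * total_steps)


def lr_at(step: int) -> float:
    """Linear warmup, then cosine decay that levels off at MIN_LR instead of 0."""
    if step < warmup_steps:
        return LEARNING_RATE * step / max(1, warmup_steps)
    progress = (step - warmup_steps) / max(1, total_steps - warmup_steps)
    cosine = 0.5 * (1.0 + math.cos(math.pi * NUM_CYCLES * 2.0 * progress))
    min_ratio = MIN_LR / LEARNING_RATE  # about 0.2 for these settings
    return LEARNING_RATE * (min_ratio + (1.0 - min_ratio) * cosine)


print(steps_per_epoch, total_steps, warmup_steps)  # 182 546 181
print(f"{lr_at(110):.3e}")  # 2.279e-05, a step still inside warmup
```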

	#### All Hyperparameters
	<details><summary>Click to expand</summary>

	- `overwrite_output_dir`: False
	- `do_predict`: False
	- `eval_strategy`: steps
	- `prediction_loss_only`: True
	- `per_device_train_batch_size`: 640
	- `per_device_eval_batch_size`: 128
	- `per_gpu_train_batch_size`: None
	- `per_gpu_eval_batch_size`: None
	- `gradient_accumulation_steps`: 1
	- `eval_accumulation_steps`: None
	- `torch_empty_cache_steps`: None
	- `learning_rate`: 3.75e-05
	- `weight_decay`: 0.0005
	- `adam_beta1`: 0.9
	- `adam_beta2`: 0.999
	- `adam_epsilon`: 1e-08
	- `max_grad_norm`: 1.0
	- `num_train_epochs`: 3
	- `max_steps`: -1
	- `lr_scheduler_type`: cosine_with_min_lr
	- `lr_scheduler_kwargs`: {'num_cycles': 0.5, 'min_lr': 7.499999999999999e-06}
	- `warmup_ratio`: 0.33
	- `warmup_steps`: 0
	- `log_level`: passive
	- `log_level_replica`: warning
	- `log_on_each_node`: True
	- `logging_nan_inf_filter`: True
	- `save_safetensors`: False
	- `save_on_each_node`: False
	- `save_only_model`: False
	- `restore_callback_states_from_checkpoint`: False
	- `no_cuda`: False
	- `use_cpu`: False
	- `use_mps_device`: False
	- `seed`: 42
	- `data_seed`: None
	- `jit_mode_eval`: False
	- `use_ipex`: False
	- `bf16`: False
	- `fp16`: True
	- `fp16_opt_level`: O1
	- `half_precision_backend`: auto
	- `bf16_full_eval`: False
	- `fp16_full_eval`: False
	- `tf32`: None
	- `local_rank`: 0
	- `ddp_backend`: None
	- `tpu_num_cores`: None
	- `tpu_metrics_debug`: False
	- `debug`: []
	- `dataloader_drop_last`: False
	- `dataloader_num_workers`: 0
	- `dataloader_prefetch_factor`: None
	- `past_index`: -1
	- `disable_tqdm`: False
	- `remove_unused_columns`: True
	- `label_names`: None
	- `load_best_model_at_end`: False
	- `ignore_data_skip`: False
	- `fsdp`: []
	- `fsdp_min_num_params`: 0
	- `fsdp_config`: {'min_num_params': 0, 'xla': False, 'xla_fsdp_v2': False, 'xla_fsdp_grad_ckpt': False}
	- `fsdp_transformer_layer_cls_to_wrap`: None
	- `accelerator_config`: {'split_batches': False, 'dispatch_batches': None, 'even_batches': True, 'use_seedable_sampler': True, 'non_blocking': False, 'gradient_accumulation_kwargs': None}
	- `deepspeed`: None
	- `label_smoothing_factor`: 0.0
	- `optim`: adamw_torch
	- `optim_args`: None
	- `adafactor`: False
	- `group_by_length`: False
	- `length_column_name`: length
	- `ddp_find_unused_parameters`: None
	- `ddp_bucket_cap_mb`: None
	- `ddp_broadcast_buffers`: False
	- `dataloader_pin_memory`: True
	- `dataloader_persistent_workers`: False
	- `skip_memory_metrics`: True
	- `use_legacy_prediction_loop`: False
	- `push_to_hub`: True
	- `resume_from_checkpoint`: None
	- `hub_model_id`: bobox/DeBERTa-small-ST-UnifiedDatasets-baseline-checkpoints-tmp
	- `hub_strategy`: all_checkpoints
	- `hub_private_repo`: False
	- `hub_always_push`: False
	- `gradient_checkpointing`: False
	- `gradient_checkpointing_kwargs`: None
	- `include_inputs_for_metrics`: False
	- `eval_do_concat_batches`: True
	- `fp16_backend`: auto
	- `push_to_hub_model_id`: None
	- `push_to_hub_organization`: None
	- `mp_parameters`:
	- `auto_find_batch_size`: False
	- `full_determinism`: False
	- `torchdynamo`: None
	- `ray_scope`: last
	- `ddp_timeout`: 1800
	- `torch_compile`: False
	- `torch_compile_backend`: None
	- `torch_compile_mode`: None
	- `dispatch_batches`: None
	- `split_batches`: None
	- `include_tokens_per_second`: False
	- `include_num_input_tokens_seen`: False
	- `neftune_noise_alpha`: None
	- `optim_target_modules`: None
	- `batch_eval_metrics`: False
	- `eval_on_start`: False
	- `eval_use_gather_object`: False
	- `batch_sampler`: no_duplicates
	- `multi_dataset_batch_sampler`: proportional

	</details>

	### Training Logs
	<details><summary>Click to expand</summary>

	| Epoch | Step | Training Loss | Validation Loss | Qnli-dev_max_ap | allNLI-dev_max_ap | sts-test_spearman_cosine |
	|:------:|:----:|:-------------:|:------:|:---------------:|:-----------------:|:------------------------:|
	| 0.0055 | 1 | 8.8159 | - | - | - | - |
	| 0.0110 | 2 | 9.1259 | - | - | - | - |
	| 0.0165 | 3 | 8.9017 | - | - | - | - |
	| 0.0220 | 4 | 9.1969 | - | - | - | - |
	| 0.0275 | 5 | 9.3716 | 1.3746 | 0.6067 | 0.3706 | 0.1943 |
	| 0.0330 | 6 | 9.0425 | - | - | - | - |
	| 0.0385 | 7 | 8.7309 | - | - | - | - |
	| 0.0440 | 8 | 9.0123 | - | - | - | - |
	| 0.0495 | 9 | 8.8095 | - | - | - | - |
	| 0.0549 | 10 | 9.3194 | 1.3227 | 0.6089 | 0.3721 | 0.1976 |
	| 0.0604 | 11 | 8.9873 | - | - | - | - |
	| 0.0659 | 12 | 8.5575 | - | - | - | - |
	| 0.0714 | 13 | 8.8096 | - | - | - | - |
	| 0.0769 | 14 | 8.0996 | - | - | - | - |
	| 0.0824 | 15 | 8.1942 | 1.2244 | 0.6140 | 0.3743 | 0.2085 |
	| 0.0879 | 16 | 8.1654 | - | - | - | - |
	| 0.0934 | 17 | 7.7336 | - | - | - | - |
	| 0.0989 | 18 | 7.9535 | - | - | - | - |
	| 0.1044 | 19 | 7.9322 | - | - | - | - |
	| 0.1099 | 20 | 7.6812 | 1.1301 | 0.6199 | 0.3790 | 0.2233 |
	| 0.1154 | 21 | 7.551 | - | - | - | - |
	| 0.1209 | 22 | 7.3788 | - | - | - | - |
	| 0.1264 | 23 | 7.1746 | - | - | - | - |
	| 0.1319 | 24 | 7.1849 | - | - | - | - |
	| 0.1374 | 25 | 7.1085 | 1.0723 | 0.6195 | 0.3852 | 0.2357 |
	| 0.1429 | 26 | 7.3926 | - | - | - | - |
	| 0.1484 | 27 | 7.1817 | - | - | - | - |
	| 0.1538 | 28 | 7.239 | - | - | - | - |
	| 0.1593 | 29 | 7.0023 | - | - | - | - |
	| 0.1648 | 30 | 6.9898 | 1.0282 | 0.6215 | 0.3898 | 0.2477 |
	| 0.1703 | 31 | 6.9776 | - | - | - | - |
	| 0.1758 | 32 | 6.8088 | - | - | - | - |
	| 0.1813 | 33 | 6.8916 | - | - | - | - |
	| 0.1868 | 34 | 6.6931 | - | - | - | - |
	| 0.1923 | 35 | 6.5707 | 0.9846 | 0.6253 | 0.3952 | 0.2608 |
	| 0.1978 | 36 | 6.6231 | - | - | - | - |
	| 0.2033 | 37 | 6.4951 | - | - | - | - |
	| 0.2088 | 38 | 6.4607 | - | - | - | - |
	| 0.2143 | 39 | 6.4504 | - | - | - | - |
	| 0.2198 | 40 | 6.3649 | 0.9314 | 0.6299 | 0.4041 | 0.2738 |
	| 0.2253 | 41 | 6.2244 | - | - | - | - |
	| 0.2308 | 42 | 6.007 | - | - | - | - |
	| 0.2363 | 43 | 5.977 | - | - | - | - |
	| 0.2418 | 44 | 6.0748 | - | - | - | - |
	| 0.2473 | 45 | 5.7946 | 0.8549 | 0.6404 | 0.4116 | 0.2847 |
	| 0.2527 | 46 | 5.8751 | - | - | - | - |
	| 0.2582 | 47 | 5.543 | - | - | - | - |
	| 0.2637 | 48 | 5.5511 | - | - | - | - |
	| 0.2692 | 49 | 5.411 | - | - | - | - |
	| 0.2747 | 50 | 5.378 | 0.7943 | 0.6557 | 0.4159 | 0.2866 |
	| 0.2802 | 51 | 5.3831 | - | - | - | - |
	| 0.2857 | 52 | 4.9729 | - | - | - | - |
	| 0.2912 | 53 | 5.0425 | - | - | - | - |
	| 0.2967 | 54 | 4.9446 | - | - | - | - |
	| 0.3022 | 55 | 4.9288 | 0.7178 | 0.6679 | 0.4273 | 0.3132 |
	| 0.3077 | 56 | 4.8434 | - | - | - | - |
	| 0.3132 | 57 | 4.6914 | - | - | - | - |
	| 0.3187 | 58 | 4.5254 | - | - | - | - |
	| 0.3242 | 59 | 4.6734 | - | - | - | - |
	| 0.3297 | 60 | 4.2421 | 0.6202 | 0.6684 | 0.4423 | 0.3580 |
	| 0.3352 | 61 | 4.2234 | - | - | - | - |
	| 0.3407 | 62 | 4.0225 | - | - | - | - |
	| 0.3462 | 63 | 4.0034 | - | - | - | - |
	| 0.3516 | 64 | 3.994 | - | - | - | - |
	| 0.3571 | 65 | 3.651 | 0.5489 | 0.6750 | 0.4569 | 0.4014 |
	| 0.3626 | 66 | 3.9308 | - | - | - | - |
	| 0.3681 | 67 | 3.8694 | - | - | - | - |
	| 0.3736 | 68 | 3.7159 | - | - | - | - |
	| 0.3791 | 69 | 3.6499 | - | - | - | - |
	| 0.3846 | 70 | 3.4749 | 0.4923 | 0.6734 | 0.4701 | 0.4465 |
	| 0.3901 | 71 | 3.3356 | - | - | - | - |
	| 0.3956 | 72 | 3.4768 | - | - | - | - |
	| 0.4011 | 73 | 3.2748 | - | - | - | - |
	| 0.4066 | 74 | 3.2789 | - | - | - | - |
	| 0.4121 | 75 | 2.9815 | 0.4422 | 0.6759 | 0.4747 | 0.4924 |
	| 0.4176 | 76 | 3.2356 | - | - | - | - |
	| 0.4231 | 77 | 2.946 | - | - | - | - |
	| 0.4286 | 78 | 2.8888 | - | - | - | - |
	\| 0.4341 \| 79 \| 2.8992 \| - \| - \| - \| - \|
	\| 0.4396 \| 80 \| 2.9901 \| 0.4040 \| 0.6786 \| 0.4781 \| 0.5478 \|
	\| 0.4451 \| 81 \| 2.6608 \| - \| - \| - \| - \|
	\| 0.4505 \| 82 \| 2.831 \| - \| - \| - \| - \|
	\| 0.4560 \| 83 \| 2.5503 \| - \| - \| - \| - \|
	\| 0.4615 \| 84 \| 2.8576 \| - \| - \| - \| - \|
	\| 0.4670 \| 85 \| 2.5726 \| 0.3711 \| 0.6858 \| 0.4898 \| 0.6134 \|
	\| 0.4725 \| 86 \| 2.7197 \| - \| - \| - \| - \|
	\| 0.4780 \| 87 \| 2.5123 \| - \| - \| - \| - \|
	\| 0.4835 \| 88 \| 2.553 \| - \| - \| - \| - \|
	\| 0.4890 \| 89 \| 2.4862 \| - \| - \| - \| - \|
	\| 0.4945 \| 90 \| 2.491 \| 0.3450 \| 0.6997 \| 0.5077 \| 0.6668 \|
	\| 0.5 \| 91 \| 2.3648 \| - \| - \| - \| - \|
	\| 0.5055 \| 92 \| 2.3788 \| - \| - \| - \| - \|
	\| 0.5110 \| 93 \| 2.3758 \| - \| - \| - \| - \|
	\| 0.5165 \| 94 \| 2.3319 \| - \| - \| - \| - \|
	\| 0.5220 \| 95 \| 2.2336 \| 0.3238 \| 0.7048 \| 0.5252 \| 0.7018 \|
	\| 0.5275 \| 96 \| 2.3036 \| - \| - \| - \| - \|
	\| 0.5330 \| 97 \| 2.3034 \| - \| - \| - \| - \|
	\| 0.5385 \| 98 \| 2.207 \| - \| - \| - \| - \|
	\| 0.5440 \| 99 \| 2.1732 \| - \| - \| - \| - \|
	\| 0.5495 \| 100 \| 2.1743 \| 0.3036 \| 0.7091 \| 0.5418 \| 0.7272 \|
	\| 0.5549 \| 101 \| 2.086 \| - \| - \| - \| - \|
	\| 0.5604 \| 102 \| 2.0223 \| - \| - \| - \| - \|
	\| 0.5659 \| 103 \| 2.0878 \| - \| - \| - \| - \|
	\| 0.5714 \| 104 \| 1.9475 \| - \| - \| - \| - \|
	\| 0.5769 \| 105 \| 2.1524 \| 0.2853 \| 0.7159 \| 0.5499 \| 0.7489 \|
	\| 0.5824 \| 106 \| 1.9393 \| - \| - \| - \| - \|
	\| 0.5879 \| 107 \| 2.1308 \| - \| - \| - \| - \|
	\| 0.5934 \| 108 \| 1.9469 \| - \| - \| - \| - \|
	\| 0.5989 \| 109 \| 1.8683 \| - \| - \| - \| - \|
	\| 0.6044 \| 110 \| 1.8167 \| 0.2702 \| 0.7217 \| 0.5536 \| 0.7626 \|

	</details>
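	The Epoch column matches step / 182. For example, step 91 logs epoch 0.5 and step 110 logs 0.6044, which suggests about 182 optimizer steps per epoch. Combined with the `dataset_size:116445` tag, this points to roughly 640 samples per step. That batch size is an inference, not a value stated in this card. It assumes a single device, no gradient accumulation, and that the last partial batch is kept. The following is a minimal sketch that checks the relationship against a few logged rows:

	```python
	import math

	# Assumption: 182 steps per epoch is inferred from the log table, not stated in the card.
	STEPS_PER_EPOCH = 182
	DATASET_SIZE = 116_445  # from the dataset_size tag in the card metadata

	def epoch_at(step: int, steps_per_epoch: int = STEPS_PER_EPOCH) -> float:
	    """Fractional epoch for a global step, rounded to 4 decimals like the log table."""
	    return round(step / steps_per_epoch, 4)

	def implied_batch_size(dataset_size: int = DATASET_SIZE,
	                       steps_per_epoch: int = STEPS_PER_EPOCH) -> int:
	    """Smallest per-step batch size that covers the dataset in `steps_per_epoch` steps
	    (assumes one device, no gradient accumulation, last partial batch kept)."""
	    return math.ceil(dataset_size / steps_per_epoch)

	# A few (step, epoch) pairs copied from the training log table.
	logged = {5: 0.0275, 91: 0.5, 100: 0.5495, 110: 0.6044}
	for step, epoch in logged.items():
	    assert epoch_at(step) == epoch, (step, epoch_at(step), epoch)

	print(implied_batch_size())  # → 640
	```

	Batch sizes from 640 to 643 all yield 182 steps over 116,445 samples. The sketch reports the smallest of these, so treat 640 as an estimate.
	
	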

	### Framework Versions
	- Python: 3.10.14
	- Sentence Transformers: 3.0.1
	- Transformers: 4.44.0
	- PyTorch: 2.4.0
	- Accelerate: 0.33.0
	- Datasets: 2.21.0
	- Tokenizers: 0.19.1
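	To compare a local environment against the versions listed above, here is a minimal sketch that uses the standard-library `importlib.metadata` module. It assumes the PyPI distribution names (for example `torch` for PyTorch) match how the packages were installed. The comparison is an exact string match, so a local build tag such as `2.4.0+cu121` is reported as a mismatch.

	```python
	import platform
	from importlib.metadata import PackageNotFoundError, version
	from typing import Callable, Dict, Optional

	# Versions copied from the "Framework Versions" list (keys are assumed PyPI distribution names).
	EXPECTED: Dict[str, str] = {
	    "sentence-transformers": "3.0.1",
	    "transformers": "4.44.0",
	    "torch": "2.4.0",
	    "accelerate": "0.33.0",
	    "datasets": "2.21.0",
	    "tokenizers": "0.19.1",
	}
	EXPECTED_PYTHON = "3.10.14"

	def _installed(name: str) -> Optional[str]:
	    """Return the installed version of a distribution, or None if it is not installed."""
	    try:
	        return version(name)
	    except PackageNotFoundError:
	        return None

	def version_mismatches(
	    expected: Dict[str, str] = EXPECTED,
	    lookup: Callable[[str], Optional[str]] = _installed,
	) -> Dict[str, Optional[str]]:
	    """Map each package whose installed version differs (exact string match) to what is installed."""
	    mismatches: Dict[str, Optional[str]] = {}
	    for name, want in expected.items():
	        got = lookup(name)
	        if got != want:
	            mismatches[name] = got
	    return mismatches

	if __name__ == "__main__":
	    if platform.python_version() != EXPECTED_PYTHON:
	        print(f"Python {platform.python_version()} (card lists {EXPECTED_PYTHON})")
	    for name, got in version_mismatches().items():
	        print(f"{name}: installed {got}, card lists {EXPECTED[name]}")
	```

	The `lookup` parameter makes the comparison testable without installing anything. Passing a dictionary's `.get` method simulates an environment.
	
	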

	## Citation

	### BibTeX

	#### Sentence Transformers
	```bibtex
	@inproceedings{reimers-2019-sentence-bert,
	title = "Sentence-BERT: Sentence Embeddings using Siamese BERT-Networks",
	author = "Reimers, Nils and Gurevych, Iryna",
	booktitle = "Proceedings of the 2019 Conference on Empirical Methods in Natural Language Processing",
	month = "11",
	year = "2019",
	publisher = "Association for Computational Linguistics",
	url = "https://arxiv.org/abs/1908.10084",
	}
	```
