	---
	language:
	- en
	tags:
	- sentence-transformers
	- cross-encoder
	- text-classification
	- generated_from_trainer
	- dataset_size:82326
	- loss:ListNetLoss
	base_model: microsoft/MiniLM-L12-H384-uncased
	datasets:
	- microsoft/ms_marco
	pipeline_tag: text-classification
	library_name: sentence-transformers
	metrics:
	- map
	- mrr@10
	- ndcg@10
	co2_eq_emissions:
	  emissions: 91.67425151971155
	  energy_consumed: 0.23584713101479168
	  source: codecarbon
	  training_type: fine-tuning
	  on_cloud: false
	  cpu_model: 13th Gen Intel(R) Core(TM) i7-13700K
	  ram_total_size: 31.777088165283203
	  hours_used: 0.862
	  hardware_used: 1 x NVIDIA GeForce RTX 3090
	model-index:
	- name: CrossEncoder based on microsoft/MiniLM-L12-H384-uncased
	  results: []
	---

	# CrossEncoder based on microsoft/MiniLM-L12-H384-uncased

	This is a [Cross Encoder](https://www.sbert.net/docs/cross_encoder/usage/usage.html) model finetuned from [microsoft/MiniLM-L12-H384-uncased](https://huggingface.co/microsoft/MiniLM-L12-H384-uncased) on the [ms_marco](https://huggingface.co/datasets/microsoft/ms_marco) dataset using the [sentence-transformers](https://www.SBERT.net) library. It computes a single relevance score for each pair of texts, such as a query and a candidate passage, which can be used for text reranking and semantic search.

	## Model Details

	### Model Description
	- Model Type: Cross Encoder
	- Base model: [microsoft/MiniLM-L12-H384-uncased](https://huggingface.co/microsoft/MiniLM-L12-H384-uncased) <!-- at revision 44acabbec0ef496f6dbc93adadea57f376b7c0ec -->
	- Maximum Sequence Length: 512 tokens
	- Number of Output Labels: 1 label
	- Training Dataset:
	    - [ms_marco](https://huggingface.co/datasets/microsoft/ms_marco)
	- Language: en
	<!-- - License: Unknown -->

	### Model Sources

	- Documentation: [Sentence Transformers Documentation](https://sbert.net)
	- Documentation: [Cross Encoder Documentation](https://www.sbert.net/docs/cross_encoder/usage/usage.html)
	- Repository: [Sentence Transformers on GitHub](https://github.com/UKPLab/sentence-transformers)
	- Hugging Face: [Cross Encoders on Hugging Face](https://huggingface.co/models?library=sentence-transformers&other=cross-encoder)

	## Usage

	### Direct Usage (Sentence Transformers)

	First install the Sentence Transformers library:

	```bash
	pip install -U sentence-transformers
	```

	Then you can load this model and run inference.
	```python
	from sentence_transformers import CrossEncoder

	# Download from the 🤗 Hub
	model = CrossEncoder("tomaarsen/reranker-msmarco-v1.1-MiniLM-L12-H384-uncased-listnet")
	# Get scores for pairs of texts
	pairs = [
	    ['How many calories in an egg', 'There are on average between 55 and 80 calories in an egg depending on its size.'],
	    ['How many calories in an egg', 'Egg whites are very low in calories, have no fat, no cholesterol, and are loaded with protein.'],
	    ['How many calories in an egg', 'Most of the calories in an egg come from the yellow yolk in the center.'],
	]
	scores = model.predict(pairs)
	print(scores.shape)
	# (3,)

	# Or rank different texts based on similarity to a single text
	ranks = model.rank(
	    'How many calories in an egg',
	    [
	        'There are on average between 55 and 80 calories in an egg depending on its size.',
	        'Egg whites are very low in calories, have no fat, no cholesterol, and are loaded with protein.',
	        'Most of the calories in an egg come from the yellow yolk in the center.',
	    ]
	)
	# [{'corpus_id': ..., 'score': ...}, {'corpus_id': ..., 'score': ...}, ...]
	```
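The `rank` call above returns a list of dicts sorted by descending score. As a minimal sketch of that ordering, the helper below scores every (query, document) pair with a pluggable `score_fn` and sorts the results. `rank_documents` and `overlap_score` are hypothetical names for illustration and are not part of the library; the word-overlap scorer only stands in for the model so the example runs without a download.

```python
def rank_documents(query, documents, score_fn, top_k=None):
    """Score each (query, document) pair and sort by descending score."""
    pairs = [(query, doc) for doc in documents]
    scores = score_fn(pairs)
    order = sorted(range(len(documents)), key=lambda i: scores[i], reverse=True)
    if top_k is not None:
        order = order[:top_k]
    return [{"corpus_id": i, "score": float(scores[i])} for i in order]


def overlap_score(pairs):
    """Stand-in scorer (assumption): number of shared lowercase words."""
    return [
        len(set(q.lower().split()) & set(d.lower().split()))
        for q, d in pairs
    ]


documents = [
    "There are on average between 55 and 80 calories in an egg depending on its size.",
    "Egg whites are very low in calories, have no fat, no cholesterol, and are loaded with protein.",
    "Most of the calories in an egg come from the yellow yolk in the center.",
]
ranked = rank_documents("How many calories in an egg", documents, overlap_score)
for entry in ranked:
    print(entry["corpus_id"], entry["score"])
```

Passing `model.predict` as `score_fn` should produce model-based scores in the same output shape.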

	<!--
	### Direct Usage (Transformers)

	<details><summary>Click to see the direct usage in Transformers</summary>

	</details>
	-->

	<!--
	### Downstream Usage (Sentence Transformers)

	You can finetune this model on your own dataset.

	<details><summary>Click to expand</summary>

	</details>
	-->

	<!--
	### Out-of-Scope Use

	List how the model may foreseeably be misused and address what users ought not to do with the model.
	-->

	## Evaluation

	### Metrics

	#### Cross Encoder Reranking

	* Datasets: `NanoMSMARCO`, `NanoNFCorpus` and `NanoNQ`
	* Evaluated with [<code>CERerankingEvaluator</code>](https://sbert.net/docs/package_reference/cross_encoder/evaluation.html#sentence_transformers.cross_encoder.evaluation.CERerankingEvaluator)

	| Metric  | NanoMSMARCO      | NanoNFCorpus     | NanoNQ           |
	|:--------|:-----------------|:-----------------|:-----------------|
	| map     | 0.5020 (+0.0124) | 0.3389 (+0.0684) | 0.5833 (+0.1626) |
	| mrr@10  | 0.4884 (+0.0109) | 0.5581 (+0.0582) | 0.5848 (+0.1581) |
	| ndcg@10 | 0.5545 (+0.0141) | 0.3595 (+0.0345) | 0.6487 (+0.1481) |
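For readers unfamiliar with the three metrics, here is a minimal per-query sketch using their textbook definitions. The input is the list of relevance labels in the order the model ranked the documents. This is an illustration only: the evaluator's exact conventions (for example, how it treats relevant documents that are missing from the candidate list) may differ, and the function names are hypothetical.

```python
import math


def average_precision(ranked_labels):
    """Mean of precision@rank taken at each relevant position."""
    hits, total = 0, 0.0
    for rank, rel in enumerate(ranked_labels, start=1):
        if rel > 0:
            hits += 1
            total += hits / rank
    return total / hits if hits else 0.0


def reciprocal_rank_at_k(ranked_labels, k=10):
    """1 / rank of the first relevant document within the top k, else 0."""
    for rank, rel in enumerate(ranked_labels[:k], start=1):
        if rel > 0:
            return 1.0 / rank
    return 0.0


def ndcg_at_k(ranked_labels, k=10):
    """DCG of the given order divided by the DCG of the ideal order."""
    def dcg(labels):
        return sum(
            rel / math.log2(rank + 1)
            for rank, rel in enumerate(labels[:k], start=1)
        )

    ideal = dcg(sorted(ranked_labels, reverse=True))
    return dcg(ranked_labels) / ideal if ideal > 0 else 0.0


# Hypothetical ranking: relevant documents at positions 2 and 4.
labels_in_ranked_order = [0, 1, 0, 1]
print(average_precision(labels_in_ranked_order))
print(reciprocal_rank_at_k(labels_in_ranked_order))
print(ndcg_at_k(labels_in_ranked_order))
```

Dataset-level `map`, `mrr@10` and `ndcg@10` are then averages of these per-query values.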

	#### Cross Encoder Nano BEIR

	* Dataset: `NanoBEIR_mean`
	* Evaluated with [<code>CENanoBEIREvaluator</code>](https://sbert.net/docs/package_reference/cross_encoder/evaluation.html#sentence_transformers.cross_encoder.evaluation.CENanoBEIREvaluator)

	| Metric  | Value            |
	|:--------|:-----------------|
	| map     | 0.4747 (+0.0812) |
	| mrr@10  | 0.5437 (+0.0757) |
	| ndcg@10 | 0.5209 (+0.0655) |
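The `NanoBEIR_mean` values appear to be the unweighted mean of the three per-dataset scores. The short check below recomputes that mean from the rounded numbers reported in this card; agreement is only expected up to rounding of the inputs.

```python
# Per-dataset scores copied from the Cross Encoder Reranking table.
per_dataset = {
    "NanoMSMARCO": {"map": 0.5020, "mrr@10": 0.4884, "ndcg@10": 0.5545},
    "NanoNFCorpus": {"map": 0.3389, "mrr@10": 0.5581, "ndcg@10": 0.3595},
    "NanoNQ": {"map": 0.5833, "mrr@10": 0.5848, "ndcg@10": 0.6487},
}


def macro_mean(scores_by_dataset):
    """Average each metric over datasets, giving every dataset equal weight."""
    metrics = next(iter(scores_by_dataset.values())).keys()
    count = len(scores_by_dataset)
    return {
        metric: sum(scores[metric] for scores in scores_by_dataset.values()) / count
        for metric in metrics
    }


mean_scores = macro_mean(per_dataset)
for metric, value in mean_scores.items():
    print(f"{metric}: {value:.4f}")
```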

	<!--
	## Bias, Risks and Limitations

	What are the known or foreseeable issues stemming from this model? You could also flag here known failure cases or weaknesses of the model.
	-->

	<!--
	### Recommendations

	What are recommendations with respect to the foreseeable issues? For example, filtering explicit content.
	-->

	## Training Details

	### Training Dataset

	#### ms_marco

	* Dataset: [ms_marco](https://huggingface.co/datasets/microsoft/ms_marco) at [a47ee7a](https://huggingface.co/datasets/microsoft/ms_marco/tree/a47ee7aae8d7d466ba15f9f0bfac3b3681087b3a)
	* Size: 82,326 training samples
	* Columns: <code>query</code>, <code>docs</code>, and <code>labels</code>
	* Approximate statistics based on the first 1000 samples:
	  |         | query                                                                                             | docs                                | labels                              |
	  |:--------|:--------------------------------------------------------------------------------------------------|:------------------------------------|:------------------------------------|
	  | type    | string                                                                                            | list                                | list                                |
	  | details | <ul><li>min: 11 characters</li><li>mean: 33.24 characters</li><li>max: 101 characters</li></ul> | <ul><li>size: 10 elements</li></ul> | <ul><li>size: 10 elements</li></ul> |
	* Samples:
	  | query | docs | labels |
	  |:------|:-----|:-------|
	  | <code>what are fiber lasers</code> | <code>['From Wikipedia, the free encyclopedia. A fiber laser or fibre laser is a laser in which the active gain medium is an optical fiber doped with rare-earth elements such as erbium, ytterbium, neodymium, dysprosium, praseodymium, and thulium. They are related to doped fiber amplifiers, which provide light amplification without lasing. Many high-power fiber lasers are based on double-clad fiber. The gain medium forms the core of the fiber, which is surrounded by two layers of cladding. The lasing mode propagates in the core, while a multimode pump beam propagates in the inner cladding layer. The outer cladding keeps this pump light confined.', 'The fiber laser is a variation on the standard solid-state laser, with the medium being a clad fiber rather than a rod, a slab, or a disk. Laser light is emitted by a dopant in the central core of the fiber, and the core structure can range from simple to fairly complex. The doped fiber has a cavity mirror on each end; in practice, these are fiber ...</code> | <code>[1, 0, 0, 0, 0, ...]</code> |
	  | <code>fast can boar run</code> | <code>['A wild boar can run at speeds of 30-35mph which is about 48.3-56.3km/h. As for weight, a wild boar weighs around 52-91kg which is about 115-200 pounds. Wild boars are native to Europe, Africa, and some parts of Asia. The body of a wild boar is around 0.8-2 meters long which is about 2.6-6.6 feet long.', 'Wild Turkeys can run at speeds up to 25 mph, and they can fly up to 55 mph. However, if being hunted by someone for the Thanksgiving or Christmas table-Who know how fast the … y will run or fly!', 'A wild hog can reach speeds of up to 35 mph when running at full speed. A hippo can run over 30 mph! report this answer. Updated on Wednesday, February 01 2012 at 03:09PM EST. Source: www.texasboars.com/...', "Les. Brown bears-are extremely fast, capable of running in short bursts as high as of 40 mph (64 km/h). Polar bears-have been clocked at a top speed of 35 mph (56 km/h), along a a road in Churchill, Canada. Grizzly bears-can reach top speeds of up to 30 mph (48km/h), but they can't m...</code> | <code>[1, 0, 0, 0, 0, ...]</code> |
	  | <code>what plant would grow in shade</code> | <code>['Hostas are among the showiest and easy-to-grow perennial plants that grow in shade. They also offer the most variety of any of the multiple shade plants. Choose from miniatures that stay only a couple of inches wide or giants that sprawl 6 feet across or more. Japanese forestgrass (Hakonechloa macra) is a wonderful grass for plants that grow in shade. It offers a lovely waterfall-like habit and variegated varieties have bight gold, yellow, or white in the foliage.', 'Lilyturf (Liriope) is an easy-to-grow favorite shade plant. Loved for its grassy foliage and spikes of blue or white flowers in late summer, as well as its resistance to deer and rabbits, lilyturf is practically a plant-it-and-forget garden resident. It grows best in Zones 5-10 and grows a foot tall. Japanese forestgrass (Hakonechloa macra) is a wonderful grass for plants that grow in shade. It offers a lovely waterfall-like habit and variegated varieties have bight gold, yellow, or white in the foliage.', "Gardening in ...</code> | <code>[1, 1, 0, 0, 0, ...]</code> |
	* Loss: [<code>ListNetLoss</code>](https://sbert.net/docs/package_reference/cross_encoder/losses.html#listnetloss) with these parameters:
	```json
	  {
	      "eps": 1e-10,
	      "pad_value": -1
	  }
	```
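To make the objective concrete, here is a minimal pure-Python sketch of the ListNet top-one loss from Cao et al. (2007): a cross-entropy between the softmax of the labels and the softmax of the predicted scores for one query. It is not the library's implementation. In particular, treating `pad_value` as "skip this document" and adding `eps` inside the logarithm are assumptions about how those two parameters are used.

```python
import math


def softmax(values):
    """Numerically stable softmax over a list of floats."""
    peak = max(values)
    exps = [math.exp(v - peak) for v in values]
    total = sum(exps)
    return [e / total for e in exps]


def listnet_loss(scores, labels, eps=1e-10, pad_value=-1):
    """Cross-entropy between top-one distributions of labels and scores."""
    # Assumption: documents whose label equals pad_value are padding and ignored.
    kept = [(s, y) for s, y in zip(scores, labels) if y != pad_value]
    if not kept:
        return 0.0
    kept_scores, kept_labels = zip(*kept)
    target = softmax([float(y) for y in kept_labels])
    predicted = softmax([float(s) for s in kept_scores])
    return -sum(t * math.log(p + eps) for t, p in zip(target, predicted))


# One query with 10 documents, the first one relevant, and uninformative scores.
labels = [1, 0, 0, 0, 0, 0, 0, 0, 0, 0]
uniform_scores = [0.0] * 10
print(listnet_loss(uniform_scores, labels))
```

Because the targets in this sketch are a softmax over the labels rather than a one-hot vector, its minimum is the entropy of the target distribution and not zero.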

	### Evaluation Dataset

	#### ms_marco

	* Dataset: [ms_marco](https://huggingface.co/datasets/microsoft/ms_marco) at [a47ee7a](https://huggingface.co/datasets/microsoft/ms_marco/tree/a47ee7aae8d7d466ba15f9f0bfac3b3681087b3a)
	* Size: 82,326 evaluation samples
	* Columns: <code>query</code>, <code>docs</code>, and <code>labels</code>
	* Approximate statistics based on the first 1000 samples:
	  |         | query                                                                                           | docs                                | labels                              |
	  |:--------|:------------------------------------------------------------------------------------------------|:------------------------------------|:------------------------------------|
	  | type    | string                                                                                          | list                                | list                                |
	  | details | <ul><li>min: 11 characters</li><li>mean: 33.6 characters</li><li>max: 97 characters</li></ul> | <ul><li>size: 10 elements</li></ul> | <ul><li>size: 10 elements</li></ul> |
	* Samples:
	  | query | docs | labels |
	  |:------|:-----|:-------|
	  | <code>can blue cheese cause mold allergic reaction</code> | <code>['Mold Allergy. The blue spots found in blue cheese are mold. If you’ve been diagnosed with a mold allergy, eating blue cheese can trigger common mold allergic reaction symptoms. Mold allergies commonly arise from airborne spores during the spring, summer and fall months. Inhaled mold spores cause inflammation in the eyes, throat and sinuses. If eating blue cheese causes inflammation to develop anywhere in your body, make an appointment with your doctor because you may have an allergy to one or more of its ingredients. Blue cheese contains two highly allergenic substances: milk and mold. Most symptoms caused by an allergic reaction are the result of inflammation in soft tissue in different parts of the body. Your doctor may recommend allergy testing to determine the cause of the inflammation', 'Blue cheese allergy is a condition that has puzzled food experts quite a bit. The unique gourmet cheese with a mottled appearance can cause your body to swell up making you feel extremely uncomf...</code> | <code>[1, 0, 0, 0, 0, ...]</code> |
	  | <code>what does it cost for a facebook ad</code> | <code>['Contributed by Jason Alleger. The cost of Facebook ads depends on a few factors, but generally ranges from $.05 – $5 per click. Facebook increases the cost of ads based on (a) targeting, (b) bids and (c) engagement. The more targeted your ads are, the more expensive they become. If you were to target ads to all Facebook users (all 1.06 billion), then you would pay just pennies. Sponsored Stories: 400 clicks to Facebook page – $200 ($.50 per click). Promoted Posts: 20,000 views – $100 ($5 per 1,000 views). It takes a lot of work to keep the cost-per-click down, as the advertiser needs to constantly be updating their ads to keep the cost low.', 'Can anyone who has advertised on facebook describe how much it cost you overall? Also, is there anyone who can mention if facebook advertising (and the specific type of facebook ad-social ad/etc, age group) was positive or negative for them in their ventures? Best Answer: Setting up an ad account and advertising on Facebook is easy. You can do ...</code> | <code>[1, 0, 0, 0, 0, ...]</code> |
	  | <code>how can ants get in dishwasher</code> | <code>["Full Answer. Ants usually find their way into a dishwasher through the dryer vents or the drain. Although most people's first reaction is to turn to pesticides to solve the problem, the chemicals contained in pesticides can be harmful for children and pets.", "No ants in the house. I've used traps on both sides of dishwasher and under the sink where the drain and supply holes are. We have put vinegar in the dishwasher drain & have let it sit there for three days and the ants still come back. They are only in side the dishwasher never on the counter ,floor, sink.", '1 Then leave them alone for a number of weeks. 2 Exterior: Sprinkle granular ant bait around ant hills, along ant trails; again, anywhere they appear. 3 Pets will not be injured by these baits. 4 The ants quickly take the bait below ground to the queen, destroying the colony.', "A: Empty the dishwasher completely, and pour 1 gallon of vinegar down the dishwasher's drain. Leave this for a few minutes so any ants appearin...</code> | <code>[1, 0, 0, 0, 0, ...]</code> |
	* Loss: [<code>ListNetLoss</code>](https://sbert.net/docs/package_reference/cross_encoder/losses.html#listnetloss) with these parameters:
	```json
	  {
	      "eps": 1e-10,
	      "pad_value": -1
	  }
	```
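Both splits use the same listwise layout: one `query`, a list of `docs`, and a parallel list of `labels`. The sketch below shows one way such a sample might be assembled from a query with candidate passages and binary relevance flags. The function name, its argument names, and the choice to place relevant passages first are assumptions made for illustration (the sample labels above happen to start with 1); the preprocessing actually used for this model is not described in this card.

```python
def to_listwise_sample(query, passages, is_selected, max_docs=10):
    """Build a {query, docs, labels} sample from parallel passage/flag lists."""
    if len(passages) != len(is_selected):
        raise ValueError("passages and is_selected must have the same length")
    # Assumption: relevant passages first; ties keep their original order.
    paired = sorted(zip(is_selected, passages), key=lambda p: p[0], reverse=True)
    paired = paired[:max_docs]
    return {
        "query": query,
        "docs": [text for _, text in paired],
        "labels": [label for label, _ in paired],
    }


# Hypothetical raw row with three candidate passages.
sample = to_listwise_sample(
    "how can ants get in dishwasher",
    ["passage about traps", "passage about vents and drains", "passage about bait"],
    [0, 1, 0],
)
print(sample["labels"])

# Each sample expands to (query, doc) pairs when the model scores it.
pairs = [(sample["query"], doc) for doc in sample["docs"]]
```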

	### Training Hyperparameters
	#### Non-Default Hyperparameters

	- `eval_strategy`: steps
	- `learning_rate`: 2e-05
	- `num_train_epochs`: 1
	- `warmup_ratio`: 0.1
	- `seed`: 12
	- `bf16`: True
	- `load_best_model_at_end`: True
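With `lr_scheduler_type: linear` (listed under All Hyperparameters), `learning_rate: 2e-05` and `warmup_ratio: 0.1`, the learning rate ramps up linearly over the first 10% of steps and then decays linearly to zero. The sketch below reproduces that shape in plain Python; the function name and the total step count of 1000 are hypothetical and chosen only to keep the numbers easy to read.

```python
import math


def linear_schedule_lr(step, total_steps, base_lr=2e-5, warmup_ratio=0.1):
    """Learning rate at `step` for linear warmup followed by linear decay."""
    warmup_steps = math.ceil(total_steps * warmup_ratio)
    if step < warmup_steps:
        # Warmup: scale from 0 up to base_lr.
        return base_lr * step / max(1, warmup_steps)
    # Decay: scale from base_lr down to 0 at total_steps.
    remaining = max(0, total_steps - step)
    return base_lr * remaining / max(1, total_steps - warmup_steps)


total_steps = 1000  # hypothetical run length
for step in (0, 50, 100, 550, 1000):
    print(step, linear_schedule_lr(step, total_steps))
```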

	#### All Hyperparameters
	<details><summary>Click to expand</summary>

	- `overwrite_output_dir`: False
	- `do_predict`: False
	- `eval_strategy`: steps
	- `prediction_loss_only`: True
	- `per_device_train_batch_size`: 8
	- `per_device_eval_batch_size`: 8
	- `per_gpu_train_batch_size`: None
	- `per_gpu_eval_batch_size`: None
	- `gradient_accumulation_steps`: 1
	- `eval_accumulation_steps`: None
	- `torch_empty_cache_steps`: None
	- `learning_rate`: 2e-05
	- `weight_decay`: 0.0
	- `adam_beta1`: 0.9
	- `adam_beta2`: 0.999
	- `adam_epsilon`: 1e-08
	- `max_grad_norm`: 1.0
	- `num_train_epochs`: 1
	- `max_steps`: -1
	- `lr_scheduler_type`: linear
	- `lr_scheduler_kwargs`: {}
	- `warmup_ratio`: 0.1
	- `warmup_steps`: 0
	- `log_level`: passive
	- `log_level_replica`: warning
	- `log_on_each_node`: True
	- `logging_nan_inf_filter`: True
	- `save_safetensors`: True
	- `save_on_each_node`: False
	- `save_only_model`: False
	- `restore_callback_states_from_checkpoint`: False
	- `no_cuda`: False
	- `use_cpu`: False
	- `use_mps_device`: False
	- `seed`: 12
	- `data_seed`: None
	- `jit_mode_eval`: False
	- `use_ipex`: False
	- `bf16`: True
	- `fp16`: False
	- `fp16_opt_level`: O1
	- `half_precision_backend`: auto
	- `bf16_full_eval`: False
	- `fp16_full_eval`: False
	- `tf32`: None
	- `local_rank`: 0
	- `ddp_backend`: None
	- `tpu_num_cores`: None
	- `tpu_metrics_debug`: False
	- `debug`: []
	- `dataloader_drop_last`: False
	- `dataloader_num_workers`: 0
	- `dataloader_prefetch_factor`: None
	- `past_index`: -1
	- `disable_tqdm`: False
	- `remove_unused_columns`: True
	- `label_names`: None
	- `load_best_model_at_end`: True
	- `ignore_data_skip`: False
	- `fsdp`: []
	- `fsdp_min_num_params`: 0
	- `fsdp_config`: {'min_num_params': 0, 'xla': False, 'xla_fsdp_v2': False, 'xla_fsdp_grad_ckpt': False}
	- `fsdp_transformer_layer_cls_to_wrap`: None
	- `accelerator_config`: {'split_batches': False, 'dispatch_batches': None, 'even_batches': True, 'use_seedable_sampler': True, 'non_blocking': False, 'gradient_accumulation_kwargs': None}
	- `deepspeed`: None
	- `label_smoothing_factor`: 0.0
	- `optim`: adamw_torch
	- `optim_args`: None
	- `adafactor`: False
	- `group_by_length`: False
	- `length_column_name`: length
	- `ddp_find_unused_parameters`: None
	- `ddp_bucket_cap_mb`: None
	- `ddp_broadcast_buffers`: False
	- `dataloader_pin_memory`: True
	- `dataloader_persistent_workers`: False
	- `skip_memory_metrics`: True
	- `use_legacy_prediction_loop`: False
	- `push_to_hub`: False
	- `resume_from_checkpoint`: None
	- `hub_model_id`: None
	- `hub_strategy`: every_save
	- `hub_private_repo`: None
	- `hub_always_push`: False
	- `gradient_checkpointing`: False
	- `gradient_checkpointing_kwargs`: None
	- `include_inputs_for_metrics`: False
	- `include_for_metrics`: []
	- `eval_do_concat_batches`: True
	- `fp16_backend`: auto
	- `push_to_hub_model_id`: None
	- `push_to_hub_organization`: None
	- `mp_parameters`:
	- `auto_find_batch_size`: False
	- `full_determinism`: False
	- `torchdynamo`: None
	- `ray_scope`: last
	- `ddp_timeout`: 1800
	- `torch_compile`: False
	- `torch_compile_backend`: None
	- `torch_compile_mode`: None
	- `dispatch_batches`: None
	- `split_batches`: None
	- `include_tokens_per_second`: False
	- `include_num_input_tokens_seen`: False
	- `neftune_noise_alpha`: None
	- `optim_target_modules`: None
	- `batch_eval_metrics`: False
	- `eval_on_start`: False
	- `use_liger_kernel`: False
	- `eval_use_gather_object`: False
	- `average_tokens_across_devices`: False
	- `prompts`: None
	- `batch_sampler`: batch_sampler
	- `multi_dataset_batch_sampler`: proportional

	</details>

	### Training Logs
	| Epoch      | Step     | Training Loss | Validation Loss | NanoMSMARCO_ndcg@10  | NanoNFCorpus_ndcg@10 | NanoNQ_ndcg@10       | NanoBEIR_mean_ndcg@10 |
	|:----------:|:--------:|:-------------:|:---------------:|:--------------------:|:--------------------:|:--------------------:|:---------------------:|
	| -1         | -1       | -             | -               | 0.0444 (-0.4960)     | 0.2663 (-0.0587)     | 0.0478 (-0.4528)     | 0.1195 (-0.3359)      |
	| 0.0001     | 1        | 2.0806        | -               | -                    | -                    | -                    | -                     |
	| 0.0230     | 200      | 2.0875        | -               | -                    | -                    | -                    | -                     |
	| 0.0459     | 400      | 2.097         | -               | -                    | -                    | -                    | -                     |
	| 0.0689     | 600      | 2.0844        | -               | -                    | -                    | -                    | -                     |
	| 0.0918     | 800      | 2.0771        | -               | -                    | -                    | -                    | -                     |
	| 0.1148     | 1000     | 2.0699        | -               | -                    | -                    | -                    | -                     |
	| 0.1377     | 1200     | 2.0864        | -               | -                    | -                    | -                    | -                     |
	| 0.1607     | 1400     | 2.0676        | -               | -                    | -                    | -                    | -                     |
	| 0.1836     | 1600     | 2.0772        | 2.0761          | 0.5280 (-0.0125)     | 0.3529 (+0.0279)     | 0.5989 (+0.0983)     | 0.4933 (+0.0379)      |
	| 0.2066     | 1800     | 2.0822        | -               | -                    | -                    | -                    | -                     |
	| 0.2295     | 2000     | 2.0777        | -               | -                    | -                    | -                    | -                     |
	| 0.2525     | 2200     | 2.075         | -               | -                    | -                    | -                    | -                     |
	| 0.2755     | 2400     | 2.0717        | -               | -                    | -                    | -                    | -                     |
	| 0.2984     | 2600     | 2.0854        | -               | -                    | -                    | -                    | -                     |
	| 0.3214     | 2800     | 2.0765        | -               | -                    | -                    | -                    | -                     |
	| 0.3443     | 3000     | 2.0678        | -               | -                    | -                    | -                    | -                     |
	| 0.3673     | 3200     | 2.076         | 2.0741          | 0.5368 (-0.0037)     | 0.3781 (+0.0531)     | 0.5847 (+0.0841)     | 0.4999 (+0.0445)      |
	| 0.3902     | 3400     | 2.0749        | -               | -                    | -                    | -                    | -                     |
	| 0.4132     | 3600     | 2.0735        | -               | -                    | -                    | -                    | -                     |
	| 0.4361     | 3800     | 2.0636        | -               | -                    | -                    | -                    | -                     |
	| 0.4591     | 4000     | 2.0749        | -               | -                    | -                    | -                    | -                     |
	| 0.4820     | 4200     | 2.0745        | -               | -                    | -                    | -                    | -                     |
	| 0.5050     | 4400     | 2.0716        | -               | -                    | -                    | -                    | -                     |
	| 0.5279     | 4600     | 2.0741        | -               | -                    | -                    | -                    | -                     |
	| 0.5509     | 4800     | 2.0724        | 2.0735          | 0.5633 (+0.0229)     | 0.3703 (+0.0453)     | 0.6102 (+0.1095)     | 0.5146 (+0.0592)      |
	| 0.5739     | 5000     | 2.0788        | -               | -                    | -                    | -                    | -                     |
	| 0.5968     | 5200     | 2.0711        | -               | -                    | -                    | -                    | -                     |
	| 0.6198     | 5400     | 2.0708        | -               | -                    | -                    | -                    | -                     |
	| 0.6427     | 5600     | 2.0645        | -               | -                    | -                    | -                    | -                     |
	| 0.6657     | 5800     | 2.0684        | -               | -                    | -                    | -                    | -                     |
	| 0.6886     | 6000     | 2.0731        | -               | -                    | -                    | -                    | -                     |
	| 0.7116     | 6200     | 2.0745        | -               | -                    | -                    | -                    | -                     |
	| 0.7345     | 6400     | 2.067         | 2.0722          | 0.5510 (+0.0105)     | 0.3441 (+0.0190)     | 0.5927 (+0.0921)     | 0.4959 (+0.0405)      |
	| 0.7575     | 6600     | 2.0657        | -               | -                    | -                    | -                    | -                     |
	| 0.7804     | 6800     | 2.0798        | -               | -                    | -                    | -                    | -                     |
	| 0.8034     | 7000     | 2.0693        | -               | -                    | -                    | -                    | -                     |
	| 0.8264     | 7200     | 2.074         | -               | -                    | -                    | -                    | -                     |
	| 0.8493     | 7400     | 2.0744        | -               | -                    | -                    | -                    | -                     |
	| 0.8723     | 7600     | 2.0688        | -               | -                    | -                    | -                    | -                     |
	| 0.8952     | 7800     | 2.0515        | -               | -                    | -                    | -                    | -                     |
	| **0.9182** | **8000** | **2.0765**    | **2.0723**      | **0.5545 (+0.0141)** | **0.3595 (+0.0345)** | **0.6487 (+0.1481)** | **0.5209 (+0.0655)**  |
	| 0.9411     | 8200     | 2.0777        | -               | -                    | -                    | -                    | -                     |
	| 0.9641     | 8400     | 2.073         | -               | -                    | -                    | -                    | -                     |
	| 0.9870     | 8600     | 2.0726        | -               | -                    | -                    | -                    | -                     |
	| -1         | -1       | -             | -               | 0.5545 (+0.0141)     | 0.3595 (+0.0345)     | 0.6487 (+0.1481)     | 0.5209 (+0.0655)      |

	* The bold row denotes the saved checkpoint.
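With `load_best_model_at_end: True`, the trainer restores the best evaluated checkpoint once training finishes. This card does not state which metric defines "best", so the sketch below compares two candidates using the evaluation rows from the log: lowest validation loss and highest `NanoBEIR_mean_ndcg@10`. The final row of the log repeats the step 8000 scores, which is consistent with selection by nDCG@10, though that remains an inference from the numbers.

```python
# step -> (validation loss, NanoBEIR_mean_ndcg@10), copied from the training log.
eval_log = {
    1600: (2.0761, 0.4933),
    3200: (2.0741, 0.4999),
    4800: (2.0735, 0.5146),
    6400: (2.0722, 0.4959),
    8000: (2.0723, 0.5209),
}

# Candidate 1: lowest validation loss.
best_by_loss = min(eval_log, key=lambda step: eval_log[step][0])

# Candidate 2 (assumed selection metric): highest mean nDCG@10.
best_by_ndcg = max(eval_log, key=lambda step: eval_log[step][1])

print("best by validation loss:", best_by_loss)
print("best by NanoBEIR_mean_ndcg@10:", best_by_ndcg)
```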

	### Environmental Impact
	Carbon emissions were measured using [CodeCarbon](https://github.com/mlco2/codecarbon).
	- Energy Consumed: 0.236 kWh
	- Carbon Emitted: 0.092 kg of CO2
	- Hours Used: 0.862 hours
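The rounded figures above come from the `co2_eq_emissions` metadata of this card. As a small arithmetic sketch, the snippet below reproduces the rounding and derives two quantities that are not reported directly: the implied carbon intensity and the average power draw. It assumes the metadata `emissions` value is in grams of CO2-equivalent, which matches the 0.092 kg shown above.

```python
# Values copied from the card metadata.
emissions_g = 91.67425151971155          # assumed unit: grams of CO2-equivalent
energy_kwh = 0.23584713101479168         # kWh
hours_used = 0.862                       # hours

# Rounded values as displayed in the list above.
emissions_kg = round(emissions_g / 1000, 3)
energy_rounded = round(energy_kwh, 3)

# Derived quantities (not stated in the card).
carbon_intensity_g_per_kwh = emissions_g / energy_kwh
average_power_w = energy_kwh / hours_used * 1000

print(f"Carbon emitted: {emissions_kg} kg CO2eq")
print(f"Energy consumed: {energy_rounded} kWh")
print(f"Implied carbon intensity: {carbon_intensity_g_per_kwh:.1f} g CO2eq/kWh")
print(f"Average power draw: {average_power_w:.1f} W")
```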

	### Training Hardware
	- On Cloud: No
	- GPU Model: 1 x NVIDIA GeForce RTX 3090
	- CPU Model: 13th Gen Intel(R) Core(TM) i7-13700K
	- RAM Size: 31.78 GB

	### Framework Versions
	- Python: 3.11.6
	- Sentence Transformers: 3.5.0.dev0
	- Transformers: 4.48.3
	- PyTorch: 2.5.0+cu121
	- Accelerate: 1.3.0
	- Datasets: 2.20.0
	- Tokenizers: 0.21.0

	## Citation

	### BibTeX

	#### Sentence Transformers
	```bibtex
	@inproceedings{reimers-2019-sentence-bert,
	title = "Sentence-BERT: Sentence Embeddings using Siamese BERT-Networks",
	author = "Reimers, Nils and Gurevych, Iryna",
	booktitle = "Proceedings of the 2019 Conference on Empirical Methods in Natural Language Processing",
	month = "11",
	year = "2019",
	publisher = "Association for Computational Linguistics",
	url = "https://arxiv.org/abs/1908.10084",
	}
	```

	#### ListNetLoss
	```bibtex
	@inproceedings{cao2007learning,
	title={Learning to rank: from pairwise approach to listwise approach},
	author={Cao, Zhe and Qin, Tao and Liu, Tie-Yan and Tsai, Ming-Feng and Li, Hang},
	booktitle={Proceedings of the 24th international conference on Machine learning},
	pages={129--136},
	year={2007}
	}
	```

	<!--
	## Glossary

	Clearly define terms in order to be accessible across audiences.
	-->

	<!--
	## Model Card Authors

	Lists the people who create the model card, providing recognition and accountability for the detailed work that goes into its construction.
	-->

	<!--
	## Model Card Contact

	Provides a way for people who have updates to the Model Card, suggestions, or questions, to contact the Model Card authors.
	-->