---
tags:
- sentence-transformers
- cross-encoder
- text-classification
- generated_from_trainer
- dataset_size:2000000
- loss:FitMixinLoss
base_model: microsoft/MiniLM-L12-H384-uncased
pipeline_tag: text-classification
library_name: sentence-transformers
metrics:
- map
- mrr@10
- ndcg@10
co2_eq_emissions:
  emissions: 177.53259839783433
  energy_consumed: 0.45673188817612037
  source: codecarbon
  training_type: fine-tuning
  on_cloud: false
  cpu_model: 13th Gen Intel(R) Core(TM) i7-13700K
  ram_total_size: 31.777088165283203
  hours_used: 1.209
  hardware_used: 1 x NVIDIA GeForce RTX 3090
model-index:
- name: CrossEncoder based on microsoft/MiniLM-L12-H384-uncased
  results:
  - task:
      type: cross-encoder-reranking
      name: Cross Encoder Reranking
    dataset:
      name: train eval
      type: train-eval
    metrics:
    - type: map
      value: 0.6582197480100053
      name: Map
    - type: mrr@10
      value: 0.6556428571428572
      name: Mrr@10
    - type: ndcg@10
      value: 0.7120903253175089
      name: Ndcg@10
---

# CrossEncoder based on microsoft/MiniLM-L12-H384-uncased

This is a [Cross Encoder](https://www.sbert.net/docs/cross_encoder/usage/usage.html) model finetuned from [microsoft/MiniLM-L12-H384-uncased](https://huggingface.co/microsoft/MiniLM-L12-H384-uncased) using the [sentence-transformers](https://www.SBERT.net) library. It computes scores for pairs of texts, which can be used for semantic textual similarity, semantic search, paraphrase mining, text classification, clustering, and more.
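
Because a cross encoder scores each (query, text) pair jointly instead of embedding the texts separately, using it for semantic search comes down to scoring every candidate against the query and sorting by score. The sketch below shows only that sorting step. The scores are hypothetical placeholders standing in for `model.predict` output, and `rank_by_scores` is an illustrative helper, not part of the sentence-transformers API; it simply returns the same `corpus_id`/`score` list-of-dicts shape that the usage example in this card shows for `model.rank`.

```python
from typing import Optional


def rank_by_scores(scores: list[float], top_k: Optional[int] = None) -> list[dict]:
    """Order candidate indices by descending relevance score.

    Hypothetical helper for illustration: scores[i] is assumed to be the
    cross-encoder score for the pair (query, candidate i).
    """
    results = [{"corpus_id": i, "score": s} for i, s in enumerate(scores)]
    # Highest score first; Python's sort is stable, so ties keep input order.
    results.sort(key=lambda r: r["score"], reverse=True)
    return results if top_k is None else results[:top_k]


# Hypothetical scores for three candidate passages (not real model output).
example_scores = [0.02, 0.97, 0.10]
ranking = rank_by_scores(example_scores, top_k=2)
print(ranking)
# [{'corpus_id': 1, 'score': 0.97}, {'corpus_id': 2, 'score': 0.1}]
```

Keeping the scoring and sorting separate makes it easy to swap in real scores from `model.predict(pairs)` once the model is loaded.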
## Model Details

### Model Description

- **Model Type:** Cross Encoder
- **Base model:** [microsoft/MiniLM-L12-H384-uncased](https://huggingface.co/microsoft/MiniLM-L12-H384-uncased)
- **Maximum Sequence Length:** 512 tokens
- **Number of Output Labels:** 1 label

### Model Sources

- **Documentation:** [Sentence Transformers Documentation](https://sbert.net)
- **Documentation:** [Cross Encoder Documentation](https://www.sbert.net/docs/cross_encoder/usage/usage.html)
- **Repository:** [Sentence Transformers on GitHub](https://github.com/UKPLab/sentence-transformers)
- **Hugging Face:** [Cross Encoders on Hugging Face](https://huggingface.co/models?library=sentence-transformers&other=cross-encoder)

## Usage

### Direct Usage (Sentence Transformers)

First install the Sentence Transformers library:

```bash
pip install -U sentence-transformers
```

Then you can load this model and run inference.

```python
from sentence_transformers import CrossEncoder

# Download from the 🤗 Hub
# ("sentence_transformers_model_id" is a placeholder for this model's Hub ID)
model = CrossEncoder("sentence_transformers_model_id")

# Get scores for pairs of texts
pairs = [
    ['enrollment statistics at southern arkansas university', 'The University of Southern Malawi also known as the Malawi University of Science and Technology(MUST) [edit]. The Malawi University of Science and Technology was established on 17th December 2012 by the Malawi University of Science and Technology Act No. 31 of 2012 as the fourth Public University in Malawi.'],
    ['burgos is in what province spain', 'The province of Burgos is a province of northern Spain, in the northeastern part of the autonomous community of Castile and Leon. León it is bordered by the provinces Of, Palencia, Cantabria, Álava, Alava álava, La, Rioja, soria Segovia. And valladolid its capital is the City. of burgoshe province of Burgos is divided into 371 municipalities, being the Spanish province with the highest number, although many of them have fewer than 100 inhabitants.'],
    ['most important customer service skills', 'Customer Service Skill #1: Empathy. Empathy gets thrown around a lot in support training, and for good reason: it might be the single most important customer service skill to develop. To help your customers be happy and successful, it’s important to understand what happiness and success mean to them.'],
    ['what happens if we eat too many carbohydrates', 'What Happens If You Eat Too Many Carbs? We all know the feeling you get after eating a large bowl of pasta. Your stomach swells up and you feel like you just gained 10 pounds. Surprisingly carbohydrates are a very important fuel source for your body. Without them it would be hard to have any energy throughout the day. Even though there are risks to consuming no carbs at all, there are also risks to consuming too much! See the article below where we talk about what could happen if you eat too many carbs. You Will Gain Body Fat Sorry to say this but “yes” if you consume too many carbs than you will gain body fat. This isn’t all that bad though when it comes to building muscle that is. You need to be eating lots of calories throughout the day in order to spark muscle growth. Carbohydrates just happen to have a lot of calories in them.'],
    ['what county is wharton nj in', 'Sponsored Topics. Wharton is a Borough in Morris County, New Jersey, United States. As of the 2000 United States Census, the borough population was 6,298.'],
]
scores = model.predict(pairs)
print(scores.shape)
# (5,)

# Or rank different texts based on similarity to a single text
ranks = model.rank(
    'enrollment statistics at southern arkansas university',
    [
        'The University of Southern Malawi also known as the Malawi University of Science and Technology(MUST) [edit]. The Malawi University of Science and Technology was established on 17th December 2012 by the Malawi University of Science and Technology Act No. 31 of 2012 as the fourth Public University in Malawi.',
        'The province of Burgos is a province of northern Spain, in the northeastern part of the autonomous community of Castile and Leon. León it is bordered by the provinces Of, Palencia, Cantabria, Álava, Alava álava, La, Rioja, soria Segovia. And valladolid its capital is the City. of burgoshe province of Burgos is divided into 371 municipalities, being the Spanish province with the highest number, although many of them have fewer than 100 inhabitants.',
        'Customer Service Skill #1: Empathy. Empathy gets thrown around a lot in support training, and for good reason: it might be the single most important customer service skill to develop. To help your customers be happy and successful, it’s important to understand what happiness and success mean to them.',
        'What Happens If You Eat Too Many Carbs? We all know the feeling you get after eating a large bowl of pasta. Your stomach swells up and you feel like you just gained 10 pounds. Surprisingly carbohydrates are a very important fuel source for your body. Without them it would be hard to have any energy throughout the day. Even though there are risks to consuming no carbs at all, there are also risks to consuming too much! See the article below where we talk about what could happen if you eat too many carbs. You Will Gain Body Fat Sorry to say this but “yes” if you consume too many carbs than you will gain body fat. This isn’t all that bad though when it comes to building muscle that is. You need to be eating lots of calories throughout the day in order to spark muscle growth. Carbohydrates just happen to have a lot of calories in them.',
        'Sponsored Topics. Wharton is a Borough in Morris County, New Jersey, United States. As of the 2000 United States Census, the borough population was 6,298.',
    ]
)
# [{'corpus_id': ..., 'score': ...}, {'corpus_id': ..., 'score': ...}, ...]
```

## Evaluation

### Metrics

#### Cross Encoder Reranking

* Datasets: `train-eval`, `NanoMSMARCO`, `NanoNFCorpus` and `NanoNQ`
* Evaluated with [CERerankingEvaluator](https://sbert.net/docs/package_reference/cross_encoder/evaluation.html#sentence_transformers.cross_encoder.evaluation.CERerankingEvaluator)

| Metric      | train-eval | NanoMSMARCO          | NanoNFCorpus         | NanoNQ               |
|:------------|:-----------|:---------------------|:---------------------|:---------------------|
| map         | 0.6582     | 0.6058 (+0.1162)     | 0.3384 (+0.0680)     | 0.6984 (+0.2778)     |
| mrr@10      | 0.6556     | 0.5982 (+0.1207)     | 0.5367 (+0.0368)     | 0.7111 (+0.2844)     |
| **ndcg@10** | **0.7121** | **0.6699 (+0.1294)** | **0.3760 (+0.0510)** | **0.7469 (+0.2462)** |

#### Cross Encoder Nano BEIR

* Dataset: `NanoBEIR_mean`
* Evaluated with [CENanoBEIREvaluator](https://sbert.net/docs/package_reference/cross_encoder/evaluation.html#sentence_transformers.cross_encoder.evaluation.CENanoBEIREvaluator)

| Metric      | Value                |
|:------------|:---------------------|
| map         | 0.5476 (+0.1540)     |
| mrr@10      | 0.6153 (+0.1473)     |
| **ndcg@10** | **0.5976 (+0.1422)** |

## Training Details

### Training Dataset

#### Unnamed Dataset

* Size: 2,000,000 training samples
* Columns: `sentence_0`, `sentence_1`, and `label`
* Approximate statistics based on the first 1000 samples:

|         | sentence_0 | sentence_1 | label |
|:--------|:-----------|:-----------|:------|
| type    | string     | string     | int   |
| details |            |            |       |

* Samples:

| sentence_0 | sentence_1 | label |
|:-----------|:-----------|:------|
| enrollment statistics at southern arkansas university | The University of Southern Malawi also known as the Malawi University of Science and Technology(MUST) [edit]. The Malawi University of Science and Technology was established on 17th December 2012 by the Malawi University of Science and Technology Act No. 31 of 2012 as the fourth Public University in Malawi. | 0 |
| burgos is in what province spain | The province of Burgos is a province of northern Spain, in the northeastern part of the autonomous community of Castile and Leon. León it is bordered by the provinces Of, Palencia, Cantabria, Álava, Alava álava, La, Rioja, soria Segovia. And valladolid its capital is the City. of burgoshe province of Burgos is divided into 371 municipalities, being the Spanish province with the highest number, although many of them have fewer than 100 inhabitants. | 1 |
| most important customer service skills | Customer Service Skill #1: Empathy. Empathy gets thrown around a lot in support training, and for good reason: it might be the single most important customer service skill to develop. To help your customers be happy and successful, it’s important to understand what happiness and success mean to them. | 1 |

* Loss: [FitMixinLoss](https://sbert.net/docs/package_reference/cross_encoder/losses.html#fitmixinloss)

### Training Hyperparameters

#### Non-Default Hyperparameters

- `eval_strategy`: steps
- `per_device_train_batch_size`: 64
- `per_device_eval_batch_size`: 64
- `num_train_epochs`: 1
- `fp16`: True

#### All Hyperparameters
- `overwrite_output_dir`: False
- `do_predict`: False
- `eval_strategy`: steps
- `prediction_loss_only`: True
- `per_device_train_batch_size`: 64
- `per_device_eval_batch_size`: 64
- `per_gpu_train_batch_size`: None
- `per_gpu_eval_batch_size`: None
- `gradient_accumulation_steps`: 1
- `eval_accumulation_steps`: None
- `torch_empty_cache_steps`: None
- `learning_rate`: 5e-05
- `weight_decay`: 0.0
- `adam_beta1`: 0.9
- `adam_beta2`: 0.999
- `adam_epsilon`: 1e-08
- `max_grad_norm`: 1
- `num_train_epochs`: 1
- `max_steps`: -1
- `lr_scheduler_type`: linear
- `lr_scheduler_kwargs`: {}
- `warmup_ratio`: 0.0
- `warmup_steps`: 0
- `log_level`: passive
- `log_level_replica`: warning
- `log_on_each_node`: True
- `logging_nan_inf_filter`: True
- `save_safetensors`: True
- `save_on_each_node`: False
- `save_only_model`: False
- `restore_callback_states_from_checkpoint`: False
- `no_cuda`: False
- `use_cpu`: False
- `use_mps_device`: False
- `seed`: 42
- `data_seed`: None
- `jit_mode_eval`: False
- `use_ipex`: False
- `bf16`: False
- `fp16`: True
- `fp16_opt_level`: O1
- `half_precision_backend`: auto
- `bf16_full_eval`: False
- `fp16_full_eval`: False
- `tf32`: None
- `local_rank`: 0
- `ddp_backend`: None
- `tpu_num_cores`: None
- `tpu_metrics_debug`: False
- `debug`: []
- `dataloader_drop_last`: False
- `dataloader_num_workers`: 0
- `dataloader_prefetch_factor`: None
- `past_index`: -1
- `disable_tqdm`: False
- `remove_unused_columns`: True
- `label_names`: None
- `load_best_model_at_end`: False
- `ignore_data_skip`: False
- `fsdp`: []
- `fsdp_min_num_params`: 0
- `fsdp_config`: {'min_num_params': 0, 'xla': False, 'xla_fsdp_v2': False, 'xla_fsdp_grad_ckpt': False}
- `fsdp_transformer_layer_cls_to_wrap`: None
- `accelerator_config`: {'split_batches': False, 'dispatch_batches': None, 'even_batches': True, 'use_seedable_sampler': True, 'non_blocking': False, 'gradient_accumulation_kwargs': None}
- `deepspeed`: None
- `label_smoothing_factor`: 0.0
- `optim`: adamw_torch
- `optim_args`: None
- `adafactor`: False
- `group_by_length`: False
- `length_column_name`: length
- `ddp_find_unused_parameters`: None
- `ddp_bucket_cap_mb`: None
- `ddp_broadcast_buffers`: False
- `dataloader_pin_memory`: True
- `dataloader_persistent_workers`: False
- `skip_memory_metrics`: True
- `use_legacy_prediction_loop`: False
- `push_to_hub`: False
- `resume_from_checkpoint`: None
- `hub_model_id`: None
- `hub_strategy`: every_save
- `hub_private_repo`: None
- `hub_always_push`: False
- `gradient_checkpointing`: False
- `gradient_checkpointing_kwargs`: None
- `include_inputs_for_metrics`: False
- `include_for_metrics`: []
- `eval_do_concat_batches`: True
- `fp16_backend`: auto
- `push_to_hub_model_id`: None
- `push_to_hub_organization`: None
- `mp_parameters`: 
- `auto_find_batch_size`: False
- `full_determinism`: False
- `torchdynamo`: None
- `ray_scope`: last
- `ddp_timeout`: 1800
- `torch_compile`: False
- `torch_compile_backend`: None
- `torch_compile_mode`: None
- `dispatch_batches`: None
- `split_batches`: None
- `include_tokens_per_second`: False
- `include_num_input_tokens_seen`: False
- `neftune_noise_alpha`: None
- `optim_target_modules`: None
- `batch_eval_metrics`: False
- `eval_on_start`: False
- `use_liger_kernel`: False
- `eval_use_gather_object`: False
- `average_tokens_across_devices`: False
- `prompts`: None
- `batch_sampler`: batch_sampler
- `multi_dataset_batch_sampler`: proportional
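
The settings above also fix the length of the run. Assuming a single training process on the one RTX 3090 listed under Training Hardware, and with `gradient_accumulation_steps` at 1, each batch of 64 pairs is one optimizer step, so one epoch over 2,000,000 samples takes 2,000,000 / 64 = 31,250 steps, which is the final step in the Training Logs table. The `Epoch` column there matches step / 31,250 (for example, 500 / 31,250 = 0.016). A small arithmetic check, written as a sketch rather than a reimplementation of the Trainer's own bookkeeping:

```python
import math

# Values copied from this model card.
NUM_SAMPLES = 2_000_000   # training samples
BATCH_SIZE = 64           # per_device_train_batch_size
NUM_DEVICES = 1           # assumption: one training process on 1 x RTX 3090
NUM_EPOCHS = 1            # num_train_epochs


def steps_per_epoch(num_samples: int, batch_size: int, num_devices: int = 1) -> int:
    """Optimizer steps per epoch, assuming gradient_accumulation_steps == 1.

    dataloader_drop_last is False, so a trailing partial batch would still
    count as a step; hence the ceiling division.
    """
    return math.ceil(num_samples / (batch_size * num_devices))


total_steps = steps_per_epoch(NUM_SAMPLES, BATCH_SIZE, NUM_DEVICES) * NUM_EPOCHS
print(total_steps)  # 31250

# The "Epoch" column is the fraction of the epoch completed at a given step.
for step in (500, 5000, 31250):
    print(step, step / total_steps)
# 500 0.016
# 5000 0.16
# 31250 1.0
```

Because 2,000,000 is an exact multiple of 64, there is no partial final batch here; the ceiling only matters for dataset sizes that do not divide evenly.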
### Training Logs

| Epoch | Step  | Training Loss | train-eval_ndcg@10 | NanoMSMARCO_ndcg@10 | NanoNFCorpus_ndcg@10 | NanoNQ_ndcg@10   | NanoBEIR_mean_ndcg@10 |
|:-----:|:-----:|:-------------:|:------------------:|:-------------------:|:--------------------:|:----------------:|:---------------------:|
| -1    | -1    | -             | 0.0488             | 0.0971 (-0.4433)    | 0.2449 (-0.0802)     | 0.0508 (-0.4498) | 0.1310 (-0.3244)      |
| 0.016 | 500   | 1.1004        | -                  | -                   | -                    | -                | -                     |
| 0.032 | 1000  | 0.7746        | -                  | -                   | -                    | -                | -                     |
| 0.048 | 1500  | 0.543         | -                  | -                   | -                    | -                | -                     |
| 0.064 | 2000  | 0.4508        | -                  | -                   | -                    | -                | -                     |
| 0.08  | 2500  | 0.4112        | -                  | -                   | -                    | -                | -                     |
| 0.096 | 3000  | 0.3949        | -                  | -                   | -                    | -                | -                     |
| 0.112 | 3500  | 0.3793        | -                  | -                   | -                    | -                | -                     |
| 0.128 | 4000  | 0.3584        | -                  | -                   | -                    | -                | -                     |
| 0.144 | 4500  | 0.3725        | -                  | -                   | -                    | -                | -                     |
| 0.16  | 5000  | 0.358         | 0.6634             | 0.6343 (+0.0939)    | 0.3986 (+0.0735)     | 0.7085 (+0.2078) | 0.5805 (+0.1251)      |
| 0.176 | 5500  | 0.3442        | -                  | -                   | -                    | -                | -                     |
| 0.192 | 6000  | 0.3355        | -                  | -                   | -                    | -                | -                     |
| 0.208 | 6500  | 0.3423        | -                  | -                   | -                    | -                | -                     |
| 0.224 | 7000  | 0.3253        | -                  | -                   | -                    | -                | -                     |
| 0.24  | 7500  | 0.3256        | -                  | -                   | -                    | -                | -                     |
| 0.256 | 8000  | 0.3231        | -                  | -                   | -                    | -                | -                     |
| 0.272 | 8500  | 0.3218        | -                  | -                   | -                    | -                | -                     |
| 0.288 | 9000  | 0.3119        | -                  | -                   | -                    | -                | -                     |
| 0.304 | 9500  | 0.3056        | -                  | -                   | -                    | -                | -                     |
| 0.32  | 10000 | 0.3125        | 0.6861             | 0.6423 (+0.1019)    | 0.4197 (+0.0947)     | 0.7333 (+0.2327) | 0.5985 (+0.1431)      |
| 0.336 | 10500 | 0.3           | -                  | -                   | -                    | -                | -                     |
| 0.352 | 11000 | 0.305         | -                  | -                   | -                    | -                | -                     |
| 0.368 | 11500 | 0.3088        | -                  | -                   | -                    | -                | -                     |
| 0.384 | 12000 | 0.2963        | -                  | -                   | -                    | -                | -                     |
| 0.4   | 12500 | 0.3068        | -                  | -                   | -                    | -                | -                     |
| 0.416 | 13000 | 0.299         | -                  | -                   | -                    | -                | -                     |
| 0.432 | 13500 | 0.2962        | -                  | -                   | -                    | -                | -                     |
| 0.448 | 14000 | 0.2942        | -                  | -                   | -                    | -                | -                     |
| 0.464 | 14500 | 0.2969        | -                  | -                   | -                    | -                | -                     |
| 0.48  | 15000 | 0.2956        | 0.6964             | 0.6397 (+0.0993)    | 0.3773 (+0.0523)     | 0.7140 (+0.2134) | 0.5770 (+0.1216)      |
| 0.496 | 15500 | 0.2928        | -                  | -                   | -                    | -                | -                     |
| 0.512 | 16000 | 0.2829        | -                  | -                   | -                    | -                | -                     |
| 0.528 | 16500 | 0.2794        | -                  | -                   | -                    | -                | -                     |
| 0.544 | 17000 | 0.2818        | -                  | -                   | -                    | -                | -                     |
| 0.56  | 17500 | 0.2843        | -                  | -                   | -                    | -                | -                     |
| 0.576 | 18000 | 0.2858        | -                  | -                   | -                    | -                | -                     |
| 0.592 | 18500 | 0.2801        | -                  | -                   | -                    | -                | -                     |
| 0.608 | 19000 | 0.2902        | -                  | -                   | -                    | -                | -                     |
| 0.624 | 19500 | 0.2768        | -                  | -                   | -                    | -                | -                     |
| 0.64  | 20000 | 0.2768        | 0.6963             | 0.6456 (+0.1052)    | 0.3820 (+0.0570)     | 0.7230 (+0.2224) | 0.5835 (+0.1282)      |
| 0.656 | 20500 | 0.2744        | -                  | -                   | -                    | -                | -                     |
| 0.672 | 21000 | 0.2753        | -                  | -                   | -                    | -                | -                     |
| 0.688 | 21500 | 0.2632        | -                  | -                   | -                    | -                | -                     |
| 0.704 | 22000 | 0.2818        | -                  | -                   | -                    | -                | -                     |
| 0.72  | 22500 | 0.2668        | -                  | -                   | -                    | -                | -                     |
| 0.736 | 23000 | 0.2673        | -                  | -                   | -                    | -                | -                     |
| 0.752 | 23500 | 0.2663        | -                  | -                   | -                    | -                | -                     |
| 0.768 | 24000 | 0.2612        | -                  | -                   | -                    | -                | -                     |
| 0.784 | 24500 | 0.2655        | -                  | -                   | -                    | -                | -                     |
| 0.8   | 25000 | 0.2592        | 0.7070             | 0.6614 (+0.1210)    | 0.3803 (+0.0552)     | 0.7482 (+0.2476) | 0.5966 (+0.1412)      |
| 0.816 | 25500 | 0.2661        | -                  | -                   | -                    | -                | -                     |
| 0.832 | 26000 | 0.2568        | -                  | -                   | -                    | -                | -                     |
| 0.848 | 26500 | 0.2651        | -                  | -                   | -                    | -                | -                     |
| 0.864 | 27000 | 0.2577        | -                  | -                   | -                    | -                | -                     |
| 0.88  | 27500 | 0.2579        | -                  | -                   | -                    | -                | -                     |
| 0.896 | 28000 | 0.2552        | -                  | -                   | -                    | -                | -                     |
| 0.912 | 28500 | 0.2531        | -                  | -                   | -                    | -                | -                     |
| 0.928 | 29000 | 0.255         | -                  | -                   | -                    | -                | -                     |
| 0.944 | 29500 | 0.2565        | -                  | -                   | -                    | -                | -                     |
| 0.96  | 30000 | 0.2534        | 0.7150             | 0.6647 (+0.1243)    | 0.3745 (+0.0495)     | 0.7479 (+0.2472) | 0.5957 (+0.1403)      |
| 0.976 | 30500 | 0.2508        | -                  | -                   | -                    | -                | -                     |
| 0.992 | 31000 | 0.2459        | -                  | -                   | -                    | -                | -                     |
| 1.0   | 31250 | -             | 0.7121             | 0.6699 (+0.1294)    | 0.3760 (+0.0510)     | 0.7469 (+0.2462) | 0.5976 (+0.1422)      |

### Environmental Impact

Carbon emissions were measured using [CodeCarbon](https://github.com/mlco2/codecarbon).
- **Energy Consumed**: 0.457 kWh
- **Carbon Emitted**: 0.178 kg of CO2
- **Hours Used**: 1.209 hours

### Training Hardware

- **On Cloud**: No
- **GPU Model**: 1 x NVIDIA GeForce RTX 3090
- **CPU Model**: 13th Gen Intel(R) Core(TM) i7-13700K
- **RAM Size**: 31.78 GB

### Framework Versions

- Python: 3.11.6
- Sentence Transformers: 3.5.0.dev0
- Transformers: 4.48.3
- PyTorch: 2.5.0+cu121
- Accelerate: 1.3.0
- Datasets: 2.20.0
- Tokenizers: 0.21.0

## Citation

### BibTeX

#### Sentence Transformers

```bibtex
@inproceedings{reimers-2019-sentence-bert,
    title = "Sentence-BERT: Sentence Embeddings using Siamese BERT-Networks",
    author = "Reimers, Nils and Gurevych, Iryna",
    booktitle = "Proceedings of the 2019 Conference on Empirical Methods in Natural Language Processing",
    month = "11",
    year = "2019",
    publisher = "Association for Computational Linguistics",
    url = "https://arxiv.org/abs/1908.10084",
}
```