--- base_model: mixedbread-ai/mxbai-embed-large-v1 datasets: [] language: - en library_name: sentence-transformers license: apache-2.0 metrics: - cosine_accuracy@1 - cosine_accuracy@3 - cosine_accuracy@5 - cosine_accuracy@10 - cosine_precision@1 - cosine_precision@3 - cosine_precision@5 - cosine_precision@10 - cosine_recall@1 - cosine_recall@3 - cosine_recall@5 - cosine_recall@10 - cosine_ndcg@10 - cosine_mrr@10 - cosine_map@100 pipeline_tag: sentence-similarity tags: - sentence-transformers - sentence-similarity - feature-extraction - generated_from_trainer - dataset_size:580 - loss:MatryoshkaLoss - loss:MultipleNegativesRankingLoss widget: - source_sentence: In response to hypothetical economic scenarios presented by the Federal Reserve, Wells Fargo formulated a capital action plan. This was done as a part of the CCAR (Comprehensive Capital Analysis and Review) process. The scenarios tested included a hypothetical severe global recession which, at its most stressful point, reduces our Pre-Provision Net Revenue (PPNR) to negative levels for four consecutive quarters. sentences: - What is the proposed dividend per share for the shareholders of Apple Inc. for the financial year ending in 2023? - What steps has Wells Fargo undertaken to sustain in the event of a severe global recession? - What was the total net income for Intel in 2021? - source_sentence: Microsoft Corporation has been paying consistent dividends to its shareholders on a quarterly basis. The company's Board of Directors reviews the dividend policy on a regular basis and plans to continue paying quarterly dividends, subject to capital availability and financial conditions sentences: - What did Amazon.com, Inc. anticipate regarding its free cash flows in the future? - What is Tesla's outlook for 2024 in terms of vehicle production? - What is Microsoft Corporation's dividend policy? 
- source_sentence: In the second quarter of 2023, Tesla's automotive revenue increased by 58% compared to the same period previous year. These results were primarily driven by increased vehicle deliveries and expansion in the China market. sentences: - What action did the Federal Reserve take to address the inflation surge in 2027? - What revenue did Apple Inc. report in the first quarter of 2021? - How did Tesla's automotive revenue perform in the second quarter of 2023? - source_sentence: Intel Corporation is an American multinational corporation and technology company headquartered in Santa Clara, California. It's primarily known for designing and manufacturing semiconductors and various technology solutions, including processors for computer systems and servers, integrated digital technology platforms, and system-on-chip units for gateways. sentences: - What is Intel's main area of business? - What was the revenue growth percentage of Amazon in the second quarter of 2024? - How much capital expenditure did Amazon.com report in 2025? - source_sentence: In 2023, EnergyCorp declared a dividend of $2.5 per share. sentences: - How did Amazon’s shift to one-day prime delivery affect its operational costs in 2023? - What dividend did the EnergyCorp pay to its shareholders in 2023? - What was the profit margin of Airbus in the year 2025? 
model-index: - name: Bmixedbread-ai/mxbai-embed-large-v1 Financial Matryoshka results: - task: type: information-retrieval name: Information Retrieval dataset: name: dim 1024 type: dim_1024 metrics: - type: cosine_accuracy@1 value: 0.8923076923076924 name: Cosine Accuracy@1 - type: cosine_accuracy@3 value: 0.9692307692307692 name: Cosine Accuracy@3 - type: cosine_accuracy@5 value: 0.9692307692307692 name: Cosine Accuracy@5 - type: cosine_accuracy@10 value: 0.9846153846153847 name: Cosine Accuracy@10 - type: cosine_precision@1 value: 0.8923076923076924 name: Cosine Precision@1 - type: cosine_precision@3 value: 0.32307692307692304 name: Cosine Precision@3 - type: cosine_precision@5 value: 0.1938461538461538 name: Cosine Precision@5 - type: cosine_precision@10 value: 0.09846153846153843 name: Cosine Precision@10 - type: cosine_recall@1 value: 0.8923076923076924 name: Cosine Recall@1 - type: cosine_recall@3 value: 0.9692307692307692 name: Cosine Recall@3 - type: cosine_recall@5 value: 0.9692307692307692 name: Cosine Recall@5 - type: cosine_recall@10 value: 0.9846153846153847 name: Cosine Recall@10 - type: cosine_ndcg@10 value: 0.941940347600734 name: Cosine Ndcg@10 - type: cosine_mrr@10 value: 0.927838827838828 name: Cosine Mrr@10 - type: cosine_map@100 value: 0.928083028083028 name: Cosine Map@100 - task: type: information-retrieval name: Information Retrieval dataset: name: dim 768 type: dim_768 metrics: - type: cosine_accuracy@1 value: 0.8923076923076924 name: Cosine Accuracy@1 - type: cosine_accuracy@3 value: 0.9692307692307692 name: Cosine Accuracy@3 - type: cosine_accuracy@5 value: 0.9692307692307692 name: Cosine Accuracy@5 - type: cosine_accuracy@10 value: 0.9846153846153847 name: Cosine Accuracy@10 - type: cosine_precision@1 value: 0.8923076923076924 name: Cosine Precision@1 - type: cosine_precision@3 value: 0.32307692307692304 name: Cosine Precision@3 - type: cosine_precision@5 value: 0.1938461538461538 name: Cosine Precision@5 - type: cosine_precision@10 
value: 0.09846153846153843 name: Cosine Precision@10 - type: cosine_recall@1 value: 0.8923076923076924 name: Cosine Recall@1 - type: cosine_recall@3 value: 0.9692307692307692 name: Cosine Recall@3 - type: cosine_recall@5 value: 0.9692307692307692 name: Cosine Recall@5 - type: cosine_recall@10 value: 0.9846153846153847 name: Cosine Recall@10 - type: cosine_ndcg@10 value: 0.9422922530434215 name: Cosine Ndcg@10 - type: cosine_mrr@10 value: 0.9282051282051282 name: Cosine Mrr@10 - type: cosine_map@100 value: 0.9284418145956608 name: Cosine Map@100 - task: type: information-retrieval name: Information Retrieval dataset: name: dim 512 type: dim_512 metrics: - type: cosine_accuracy@1 value: 0.8923076923076924 name: Cosine Accuracy@1 - type: cosine_accuracy@3 value: 0.9692307692307692 name: Cosine Accuracy@3 - type: cosine_accuracy@5 value: 0.9692307692307692 name: Cosine Accuracy@5 - type: cosine_accuracy@10 value: 0.9846153846153847 name: Cosine Accuracy@10 - type: cosine_precision@1 value: 0.8923076923076924 name: Cosine Precision@1 - type: cosine_precision@3 value: 0.32307692307692304 name: Cosine Precision@3 - type: cosine_precision@5 value: 0.1938461538461538 name: Cosine Precision@5 - type: cosine_precision@10 value: 0.09846153846153843 name: Cosine Precision@10 - type: cosine_recall@1 value: 0.8923076923076924 name: Cosine Recall@1 - type: cosine_recall@3 value: 0.9692307692307692 name: Cosine Recall@3 - type: cosine_recall@5 value: 0.9692307692307692 name: Cosine Recall@5 - type: cosine_recall@10 value: 0.9846153846153847 name: Cosine Recall@10 - type: cosine_ndcg@10 value: 0.941940347600734 name: Cosine Ndcg@10 - type: cosine_mrr@10 value: 0.927838827838828 name: Cosine Mrr@10 - type: cosine_map@100 value: 0.928113553113553 name: Cosine Map@100 - task: type: information-retrieval name: Information Retrieval dataset: name: dim 256 type: dim_256 metrics: - type: cosine_accuracy@1 value: 0.8923076923076924 name: Cosine Accuracy@1 - type: cosine_accuracy@3 value: 
0.9692307692307692 name: Cosine Accuracy@3 - type: cosine_accuracy@5 value: 0.9692307692307692 name: Cosine Accuracy@5 - type: cosine_accuracy@10 value: 0.9846153846153847 name: Cosine Accuracy@10 - type: cosine_precision@1 value: 0.8923076923076924 name: Cosine Precision@1 - type: cosine_precision@3 value: 0.32307692307692304 name: Cosine Precision@3 - type: cosine_precision@5 value: 0.1938461538461538 name: Cosine Precision@5 - type: cosine_precision@10 value: 0.09846153846153843 name: Cosine Precision@10 - type: cosine_recall@1 value: 0.8923076923076924 name: Cosine Recall@1 - type: cosine_recall@3 value: 0.9692307692307692 name: Cosine Recall@3 - type: cosine_recall@5 value: 0.9692307692307692 name: Cosine Recall@5 - type: cosine_recall@10 value: 0.9846153846153847 name: Cosine Recall@10 - type: cosine_ndcg@10 value: 0.9416654482692324 name: Cosine Ndcg@10 - type: cosine_mrr@10 value: 0.9275641025641026 name: Cosine Mrr@10 - type: cosine_map@100 value: 0.9278846153846154 name: Cosine Map@100 - task: type: information-retrieval name: Information Retrieval dataset: name: dim 128 type: dim_128 metrics: - type: cosine_accuracy@1 value: 0.8461538461538461 name: Cosine Accuracy@1 - type: cosine_accuracy@3 value: 0.9538461538461539 name: Cosine Accuracy@3 - type: cosine_accuracy@5 value: 0.9692307692307692 name: Cosine Accuracy@5 - type: cosine_accuracy@10 value: 0.9846153846153847 name: Cosine Accuracy@10 - type: cosine_precision@1 value: 0.8461538461538461 name: Cosine Precision@1 - type: cosine_precision@3 value: 0.31794871794871793 name: Cosine Precision@3 - type: cosine_precision@5 value: 0.1938461538461538 name: Cosine Precision@5 - type: cosine_precision@10 value: 0.09846153846153843 name: Cosine Precision@10 - type: cosine_recall@1 value: 0.8461538461538461 name: Cosine Recall@1 - type: cosine_recall@3 value: 0.9538461538461539 name: Cosine Recall@3 - type: cosine_recall@5 value: 0.9692307692307692 name: Cosine Recall@5 - type: cosine_recall@10 value: 
0.9846153846153847 name: Cosine Recall@10 - type: cosine_ndcg@10 value: 0.9221774232775186 name: Cosine Ndcg@10 - type: cosine_mrr@10 value: 0.9012820512820513 name: Cosine Mrr@10 - type: cosine_map@100 value: 0.9016398330351819 name: Cosine Map@100 - task: type: information-retrieval name: Information Retrieval dataset: name: dim 64 type: dim_64 metrics: - type: cosine_accuracy@1 value: 0.8153846153846154 name: Cosine Accuracy@1 - type: cosine_accuracy@3 value: 0.9692307692307692 name: Cosine Accuracy@3 - type: cosine_accuracy@5 value: 0.9846153846153847 name: Cosine Accuracy@5 - type: cosine_accuracy@10 value: 0.9846153846153847 name: Cosine Accuracy@10 - type: cosine_precision@1 value: 0.8153846153846154 name: Cosine Precision@1 - type: cosine_precision@3 value: 0.32307692307692304 name: Cosine Precision@3 - type: cosine_precision@5 value: 0.19692307692307687 name: Cosine Precision@5 - type: cosine_precision@10 value: 0.09846153846153843 name: Cosine Precision@10 - type: cosine_recall@1 value: 0.8153846153846154 name: Cosine Recall@1 - type: cosine_recall@3 value: 0.9692307692307692 name: Cosine Recall@3 - type: cosine_recall@5 value: 0.9846153846153847 name: Cosine Recall@5 - type: cosine_recall@10 value: 0.9846153846153847 name: Cosine Recall@10 - type: cosine_ndcg@10 value: 0.9123594012651499 name: Cosine Ndcg@10 - type: cosine_mrr@10 value: 0.8876923076923079 name: Cosine Mrr@10 - type: cosine_map@100 value: 0.8879622132253712 name: Cosine Map@100
---

# Bmixedbread-ai/mxbai-embed-large-v1 Financial Matryoshka

This is a [sentence-transformers](https://www.SBERT.net) model finetuned from [mixedbread-ai/mxbai-embed-large-v1](https://huggingface.co/mixedbread-ai/mxbai-embed-large-v1). It maps sentences & paragraphs to a 1024-dimensional dense vector space and can be used for semantic textual similarity, semantic search, paraphrase mining, text classification, clustering, and more.
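Because the model was trained with MatryoshkaLoss at dimensions 1024, 768, 512, 256, 128, and 64 (see Training Details), the leading components of each embedding can also be used on their own as a smaller vector. The following is a minimal sketch of that truncation step in plain Python, assuming embeddings are available as lists of floats. `truncate_and_normalize` and `cosine` are hypothetical helper names, not part of any library, and the toy 8-dimensional vectors stand in for real model output.

```python
import math


def truncate_and_normalize(embedding, dim):
    """Keep the first `dim` components and rescale them to unit length."""
    if not 0 < dim <= len(embedding):
        raise ValueError("dim must be between 1 and len(embedding)")
    head = embedding[:dim]
    norm = math.sqrt(sum(x * x for x in head))
    if norm == 0.0:
        # An all-zero prefix cannot be normalized; return it unchanged.
        return list(head)
    return [x / norm for x in head]


def cosine(a, b):
    """Cosine similarity of two equal-length, non-zero vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


# Toy vectors standing in for 1024-dimensional embeddings (illustrative values).
passage = [0.9, 0.1, 0.3, 0.2, 0.05, 0.01, 0.02, 0.03]
query = [0.8, 0.2, 0.25, 0.1, 0.5, 0.4, 0.3, 0.2]

# Truncate both vectors to the same prefix length before comparing them.
passage_4 = truncate_and_normalize(passage, 4)
query_4 = truncate_and_normalize(query, 4)
score_4 = cosine(passage_4, query_4)
print(len(passage_4), round(score_4, 4))
```

With Sentence Transformers itself, a similar effect can usually be obtained by passing `truncate_dim` when constructing `SentenceTransformer`; it is worth checking that argument against the installed library version. The evaluation tables below suggest what is traded away at each size.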
## Model Details

### Model Description

- **Model Type:** Sentence Transformer
- **Base model:** [mixedbread-ai/mxbai-embed-large-v1](https://huggingface.co/mixedbread-ai/mxbai-embed-large-v1)
- **Maximum Sequence Length:** 512 tokens
- **Output Dimensionality:** 1024 dimensions
- **Similarity Function:** Cosine Similarity
- **Language:** en
- **License:** apache-2.0

### Model Sources

- **Documentation:** [Sentence Transformers Documentation](https://sbert.net)
- **Repository:** [Sentence Transformers on GitHub](https://github.com/UKPLab/sentence-transformers)
- **Hugging Face:** [Sentence Transformers on Hugging Face](https://huggingface.co/models?library=sentence-transformers)

### Full Model Architecture

```
SentenceTransformer(
  (0): Transformer({'max_seq_length': 512, 'do_lower_case': False}) with Transformer model: BertModel
  (1): Pooling({'word_embedding_dimension': 1024, 'pooling_mode_cls_token': True, 'pooling_mode_mean_tokens': False, 'pooling_mode_max_tokens': False, 'pooling_mode_mean_sqrt_len_tokens': False, 'pooling_mode_weightedmean_tokens': False, 'pooling_mode_lasttoken': False, 'include_prompt': True})
)
```

## Usage

### Direct Usage (Sentence Transformers)

First install the Sentence Transformers library:

```bash
pip install -U sentence-transformers
```

Then you can load this model and run inference.
```python
from sentence_transformers import SentenceTransformer

# Download from the 🤗 Hub
model = SentenceTransformer("rbhatia46/mxbai-embed-large-v1-financial-rag-matryoshka")
# Run inference
sentences = [
    'In 2023, EnergyCorp declared a dividend of $2.5 per share.',
    'What dividend did the EnergyCorp pay to its shareholders in 2023?',
    'How did Amazon’s shift to one-day prime delivery affect its operational costs in 2023?',
]
embeddings = model.encode(sentences)
print(embeddings.shape)
# [3, 1024]

# Get the similarity scores for the embeddings
similarities = model.similarity(embeddings, embeddings)
print(similarities.shape)
# [3, 3]
```

## Evaluation

### Metrics

#### Information Retrieval

* Dataset: `dim_1024`
* Evaluated with [InformationRetrievalEvaluator](https://sbert.net/docs/package_reference/sentence_transformer/evaluation.html#sentence_transformers.evaluation.InformationRetrievalEvaluator)

| Metric              | Value      |
|:--------------------|:-----------|
| cosine_accuracy@1   | 0.8923     |
| cosine_accuracy@3   | 0.9692     |
| cosine_accuracy@5   | 0.9692     |
| cosine_accuracy@10  | 0.9846     |
| cosine_precision@1  | 0.8923     |
| cosine_precision@3  | 0.3231     |
| cosine_precision@5  | 0.1938     |
| cosine_precision@10 | 0.0985     |
| cosine_recall@1     | 0.8923     |
| cosine_recall@3     | 0.9692     |
| cosine_recall@5     | 0.9692     |
| cosine_recall@10    | 0.9846     |
| cosine_ndcg@10      | 0.9419     |
| cosine_mrr@10       | 0.9278     |
| **cosine_map@100**  | **0.9281** |

#### Information Retrieval

* Dataset: `dim_768`
* Evaluated with [InformationRetrievalEvaluator](https://sbert.net/docs/package_reference/sentence_transformer/evaluation.html#sentence_transformers.evaluation.InformationRetrievalEvaluator)

| Metric              | Value      |
|:--------------------|:-----------|
| cosine_accuracy@1   | 0.8923     |
| cosine_accuracy@3   | 0.9692     |
| cosine_accuracy@5   | 0.9692     |
| cosine_accuracy@10  | 0.9846     |
| cosine_precision@1  | 0.8923     |
| cosine_precision@3  | 0.3231     |
| cosine_precision@5  | 0.1938     |
| cosine_precision@10 | 0.0985     |
| cosine_recall@1     | 0.8923     |
| cosine_recall@3     | 0.9692     |
| cosine_recall@5     | 0.9692     |
| cosine_recall@10    | 0.9846     |
| cosine_ndcg@10      | 0.9423     |
| cosine_mrr@10       | 0.9282     |
| **cosine_map@100**  | **0.9284** |

#### Information Retrieval

* Dataset: `dim_512`
* Evaluated with [InformationRetrievalEvaluator](https://sbert.net/docs/package_reference/sentence_transformer/evaluation.html#sentence_transformers.evaluation.InformationRetrievalEvaluator)

| Metric              | Value      |
|:--------------------|:-----------|
| cosine_accuracy@1   | 0.8923     |
| cosine_accuracy@3   | 0.9692     |
| cosine_accuracy@5   | 0.9692     |
| cosine_accuracy@10  | 0.9846     |
| cosine_precision@1  | 0.8923     |
| cosine_precision@3  | 0.3231     |
| cosine_precision@5  | 0.1938     |
| cosine_precision@10 | 0.0985     |
| cosine_recall@1     | 0.8923     |
| cosine_recall@3     | 0.9692     |
| cosine_recall@5     | 0.9692     |
| cosine_recall@10    | 0.9846     |
| cosine_ndcg@10      | 0.9419     |
| cosine_mrr@10       | 0.9278     |
| **cosine_map@100**  | **0.9281** |

#### Information Retrieval

* Dataset: `dim_256`
* Evaluated with [InformationRetrievalEvaluator](https://sbert.net/docs/package_reference/sentence_transformer/evaluation.html#sentence_transformers.evaluation.InformationRetrievalEvaluator)

| Metric              | Value      |
|:--------------------|:-----------|
| cosine_accuracy@1   | 0.8923     |
| cosine_accuracy@3   | 0.9692     |
| cosine_accuracy@5   | 0.9692     |
| cosine_accuracy@10  | 0.9846     |
| cosine_precision@1  | 0.8923     |
| cosine_precision@3  | 0.3231     |
| cosine_precision@5  | 0.1938     |
| cosine_precision@10 | 0.0985     |
| cosine_recall@1     | 0.8923     |
| cosine_recall@3     | 0.9692     |
| cosine_recall@5     | 0.9692     |
| cosine_recall@10    | 0.9846     |
| cosine_ndcg@10      | 0.9417     |
| cosine_mrr@10       | 0.9276     |
| **cosine_map@100**  | **0.9279** |

#### Information Retrieval

* Dataset: `dim_128`
* Evaluated with [InformationRetrievalEvaluator](https://sbert.net/docs/package_reference/sentence_transformer/evaluation.html#sentence_transformers.evaluation.InformationRetrievalEvaluator)

| Metric              | Value      |
|:--------------------|:-----------|
| cosine_accuracy@1   | 0.8462     |
| cosine_accuracy@3   | 0.9538     |
| cosine_accuracy@5   | 0.9692     |
| cosine_accuracy@10  | 0.9846     |
| cosine_precision@1  | 0.8462     |
| cosine_precision@3  | 0.3179     |
| cosine_precision@5  | 0.1938     |
| cosine_precision@10 | 0.0985     |
| cosine_recall@1     | 0.8462     |
| cosine_recall@3     | 0.9538     |
| cosine_recall@5     | 0.9692     |
| cosine_recall@10    | 0.9846     |
| cosine_ndcg@10      | 0.9222     |
| cosine_mrr@10       | 0.9013     |
| **cosine_map@100**  | **0.9016** |

#### Information Retrieval

* Dataset: `dim_64`
* Evaluated with [InformationRetrievalEvaluator](https://sbert.net/docs/package_reference/sentence_transformer/evaluation.html#sentence_transformers.evaluation.InformationRetrievalEvaluator)

| Metric              | Value     |
|:--------------------|:----------|
| cosine_accuracy@1   | 0.8154    |
| cosine_accuracy@3   | 0.9692    |
| cosine_accuracy@5   | 0.9846    |
| cosine_accuracy@10  | 0.9846    |
| cosine_precision@1  | 0.8154    |
| cosine_precision@3  | 0.3231    |
| cosine_precision@5  | 0.1969    |
| cosine_precision@10 | 0.0985    |
| cosine_recall@1     | 0.8154    |
| cosine_recall@3     | 0.9692    |
| cosine_recall@5     | 0.9846    |
| cosine_recall@10    | 0.9846    |
| cosine_ndcg@10      | 0.9124    |
| cosine_mrr@10       | 0.8877    |
| **cosine_map@100**  | **0.888** |

## Training Details

### Training Dataset

#### Unnamed Dataset

* Size: 580 training samples
* Columns: `positive` and `anchor`
* Approximate statistics based on the first 1000 samples:

  |         | positive | anchor |
  |:--------|:---------|:-------|
  | type    | string   | string |
  | details |          |        |

* Samples:

  | positive | anchor |
  |:---------|:-------|
  | For the fiscal year 2020, Microsoft Corporation reported a net income of $44.3 billion, showing a 13% increase from the previous year. | What was the net income of Microsoft Corporation for the fiscal year 2020? |
  | As of the latest financial report, Amazon has a current price to earnings ratio (P/E ratio) of 76.6. | What is Amazon's current P/E ratio according to their latest financial report? |
  | Microsoft Corporation posted an EBITDA (Earnings Before Interest, Taxes, Depreciation, and Amortization) margin of approximately 47% in 2021, showcasing strong profitability. | What was Microsoft Corporation's EBITDA margin in 2021? |

* Loss: [MatryoshkaLoss](https://sbert.net/docs/package_reference/sentence_transformer/losses.html#matryoshkaloss) with these parameters:

  ```json
  {
      "loss": "MultipleNegativesRankingLoss",
      "matryoshka_dims": [
          1024,
          768,
          512,
          256,
          128,
          64
      ],
      "matryoshka_weights": [
          1,
          1,
          1,
          1,
          1,
          1
      ],
      "n_dims_per_step": -1
  }
  ```

### Training Hyperparameters

#### Non-Default Hyperparameters

- `eval_strategy`: epoch
- `per_device_train_batch_size`: 32
- `per_device_eval_batch_size`: 16
- `gradient_accumulation_steps`: 16
- `learning_rate`: 2e-05
- `num_train_epochs`: 4
- `lr_scheduler_type`: cosine
- `warmup_ratio`: 0.1
- `bf16`: True
- `tf32`: True
- `load_best_model_at_end`: True
- `optim`: adamw_torch_fused
- `batch_sampler`: no_duplicates

#### All Hyperparameters

- `overwrite_output_dir`: False
- `do_predict`: False
- `eval_strategy`: epoch
- `prediction_loss_only`: True
- `per_device_train_batch_size`: 32
- `per_device_eval_batch_size`: 16
- `per_gpu_train_batch_size`: None
- `per_gpu_eval_batch_size`: None
- `gradient_accumulation_steps`: 16
- `eval_accumulation_steps`: None
- `learning_rate`: 2e-05
- `weight_decay`: 0.0
- `adam_beta1`: 0.9
- `adam_beta2`: 0.999
- `adam_epsilon`: 1e-08
- `max_grad_norm`: 1.0
- `num_train_epochs`: 4
- `max_steps`: -1
- `lr_scheduler_type`: cosine
- `lr_scheduler_kwargs`: {}
- `warmup_ratio`: 0.1
- `warmup_steps`: 0
- `log_level`: passive
- `log_level_replica`: warning
- `log_on_each_node`: True
- `logging_nan_inf_filter`: True
- `save_safetensors`: True
- `save_on_each_node`: False
- `save_only_model`: False
- `restore_callback_states_from_checkpoint`: False
- `no_cuda`: False
- `use_cpu`: False
- `use_mps_device`: False
- `seed`: 42
- `data_seed`: None
- `jit_mode_eval`: False
- `use_ipex`: False
- `bf16`: True
- `fp16`: False
- `fp16_opt_level`: O1
- `half_precision_backend`: auto
- `bf16_full_eval`: False
- `fp16_full_eval`: False
- `tf32`: True
- `local_rank`: 0
- `ddp_backend`: None
- `tpu_num_cores`: None
- `tpu_metrics_debug`: False
- `debug`: []
- `dataloader_drop_last`: False
- `dataloader_num_workers`: 0
- `dataloader_prefetch_factor`: None
- `past_index`: -1
- `disable_tqdm`: False
- `remove_unused_columns`: True
- `label_names`: None
- `load_best_model_at_end`: True
- `ignore_data_skip`: False
- `fsdp`: []
- `fsdp_min_num_params`: 0
- `fsdp_config`: {'min_num_params': 0, 'xla': False, 'xla_fsdp_v2': False, 'xla_fsdp_grad_ckpt': False}
- `fsdp_transformer_layer_cls_to_wrap`: None
- `accelerator_config`: {'split_batches': False, 'dispatch_batches': None, 'even_batches': True, 'use_seedable_sampler': True, 'non_blocking': False, 'gradient_accumulation_kwargs': None}
- `deepspeed`: None
- `label_smoothing_factor`: 0.0
- `optim`: adamw_torch_fused
- `optim_args`: None
- `adafactor`: False
- `group_by_length`: False
- `length_column_name`: length
- `ddp_find_unused_parameters`: None
- `ddp_bucket_cap_mb`: None
- `ddp_broadcast_buffers`: False
- `dataloader_pin_memory`: True
- `dataloader_persistent_workers`: False
- `skip_memory_metrics`: True
- `use_legacy_prediction_loop`: False
- `push_to_hub`: False
- `resume_from_checkpoint`: None
- `hub_model_id`: None
- `hub_strategy`: every_save
- `hub_private_repo`: False
- `hub_always_push`: False
- `gradient_checkpointing`: False
- `gradient_checkpointing_kwargs`: None
- `include_inputs_for_metrics`: False
- `eval_do_concat_batches`: True
- `fp16_backend`: auto
- `push_to_hub_model_id`: None
- `push_to_hub_organization`: None
- `mp_parameters`: 
- `auto_find_batch_size`: False
- `full_determinism`: False
- `torchdynamo`: None
- `ray_scope`: last
- `ddp_timeout`: 1800
- `torch_compile`: False
- `torch_compile_backend`: None
- `torch_compile_mode`: None
- `dispatch_batches`: None
- `split_batches`: None
- `include_tokens_per_second`: False
- `include_num_input_tokens_seen`: False
- `neftune_noise_alpha`: None
- `optim_target_modules`: None
- `batch_eval_metrics`: False
- `batch_sampler`: no_duplicates
- `multi_dataset_batch_sampler`: proportional
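
The batch settings above determine how often the optimizer takes a step, which in turn explains the fractional values in the Epoch column of the training logs. The following is a rough worked example, assuming a single device and that the 580 training samples are split into ceil(580 / 32) = 19 batches per epoch (the last partial batch is kept because `dataloader_drop_last` is False). The variable and function names are illustrative only, and the `no_duplicates` batch sampler could in principle change the batch count.

```python
import math

# Values taken from the training dataset size and hyperparameters listed above.
train_samples = 580
per_device_train_batch_size = 32
gradient_accumulation_steps = 16

# Assumption: single device, last partial batch kept.
batches_per_epoch = math.ceil(train_samples / per_device_train_batch_size)

# Number of samples that contribute to one optimizer update.
effective_batch_size = per_device_train_batch_size * gradient_accumulation_steps


def epoch_at_step(step):
    """Fraction of epochs completed after `step` optimizer updates."""
    return step * gradient_accumulation_steps / batches_per_epoch


print("batches per epoch:", batches_per_epoch)
print("effective batch size:", effective_batch_size)
for step in range(1, 5):
    print(step, round(epoch_at_step(step), 4))
```

Under these assumptions, steps 1 to 4 fall at epochs 0.8421, 1.6842, 2.5263, and 3.3684, and each update sees up to 512 samples.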

### Training Logs

| Epoch      | Step  | dim_1024_cosine_map@100 | dim_128_cosine_map@100 | dim_256_cosine_map@100 | dim_512_cosine_map@100 | dim_64_cosine_map@100 | dim_768_cosine_map@100 |
|:----------:|:-----:|:-----------------------:|:----------------------:|:----------------------:|:----------------------:|:---------------------:|:----------------------:|
| 0.8421     | 1     | 0.9032                  | 0.8846                 | 0.9033                 | 0.9109                 | 0.8695                | 0.9186                 |
| 1.6842     | 2     | 0.9121                  | 0.8948                 | 0.9174                 | 0.9199                 | 0.8777                | 0.9198                 |
| 2.5263     | 3     | 0.9281                  | 0.9013                 | 0.9202                 | 0.9281                 | 0.8879                | 0.9204                 |
| **3.3684** | **4** | **0.9281**              | **0.9016**             | **0.9279**             | **0.9281**             | **0.888**             | **0.9284**             |

* The bold row denotes the saved checkpoint.

### Framework Versions

- Python: 3.10.6
- Sentence Transformers: 3.0.1
- Transformers: 4.41.2
- PyTorch: 2.1.2+cu121
- Accelerate: 0.31.0
- Datasets: 2.19.1
- Tokenizers: 0.19.1

## Citation

### BibTeX

#### Sentence Transformers

```bibtex
@inproceedings{reimers-2019-sentence-bert,
    title = "Sentence-BERT: Sentence Embeddings using Siamese BERT-Networks",
    author = "Reimers, Nils and Gurevych, Iryna",
    booktitle = "Proceedings of the 2019 Conference on Empirical Methods in Natural Language Processing",
    month = "11",
    year = "2019",
    publisher = "Association for Computational Linguistics",
    url = "https://arxiv.org/abs/1908.10084",
}
```

#### MatryoshkaLoss

```bibtex
@misc{kusupati2024matryoshka,
    title={Matryoshka Representation Learning},
    author={Aditya Kusupati and Gantavya Bhatt and Aniket Rege and Matthew Wallingford and Aditya Sinha and Vivek Ramanujan and William Howard-Snyder and Kaifeng Chen and Sham Kakade and Prateek Jain and Ali Farhadi},
    year={2024},
    eprint={2205.13147},
    archivePrefix={arXiv},
    primaryClass={cs.LG}
}
```

#### MultipleNegativesRankingLoss

```bibtex
@misc{henderson2017efficient,
    title={Efficient Natural Language Response Suggestion for Smart Reply},
    author={Matthew Henderson and Rami Al-Rfou and Brian Strope and Yun-hsuan Sung and Laszlo Lukacs and Ruiqi Guo and Sanjiv Kumar and Balint Miklos and Ray Kurzweil},
    year={2017},
    eprint={1705.00652},
    archivePrefix={arXiv},
    primaryClass={cs.CL}
}
```