SentenceTransformer based on mixedbread-ai/mxbai-embed-large-v1
This is a sentence-transformers model finetuned from mixedbread-ai/mxbai-embed-large-v1. It maps sentences & paragraphs to a 1024-dimensional dense vector space and can be used for semantic textual similarity, semantic search, paraphrase mining, text classification, clustering, and more.
Model Details
Model Description
- Model Type: Sentence Transformer
- Base model: mixedbread-ai/mxbai-embed-large-v1
- Maximum Sequence Length: 128 tokens
- Output Dimensionality: 1024 dimensions
- Similarity Function: Cosine Similarity
Model Sources
- Documentation: Sentence Transformers Documentation
- Repository: Sentence Transformers on GitHub
- Hugging Face: Sentence Transformers on Hugging Face
Full Model Architecture
SentenceTransformer(
(0): Transformer({'max_seq_length': 128, 'do_lower_case': False}) with Transformer model: BertModel
(1): Pooling({'word_embedding_dimension': 1024, 'pooling_mode_cls_token': False, 'pooling_mode_mean_tokens': True, 'pooling_mode_max_tokens': False, 'pooling_mode_mean_sqrt_len_tokens': False, 'pooling_mode_weightedmean_tokens': False, 'pooling_mode_lasttoken': False, 'include_prompt': True})
)
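The Pooling module above is configured for mean pooling (pooling_mode_mean_tokens: True): each sentence embedding is the average of its token embeddings, with padding positions excluded via the attention mask. As a rough sketch of that operation, independent of this library, using NumPy:

```python
import numpy as np

def mean_pool(token_embeddings, attention_mask):
    """Average token embeddings, ignoring padding positions.

    token_embeddings: (seq_len, dim) array of per-token vectors
    attention_mask:   (seq_len,) array of 1s (real tokens) and 0s (padding)
    """
    mask = attention_mask[:, None].astype(float)    # (seq_len, 1)
    summed = (token_embeddings * mask).sum(axis=0)  # sum over real tokens only
    count = max(mask.sum(), 1e-9)                   # avoid division by zero
    return summed / count

tokens = np.array([[1.0, 2.0], [3.0, 4.0], [5.0, 6.0]])
mask = np.array([1, 1, 0])  # last position is padding
print(mean_pool(tokens, mask))  # [2. 3.]
```

The names here are illustrative; the actual pooling is performed by the Pooling module shown in the architecture dump.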
Usage
Direct Usage (Sentence Transformers)
First install the Sentence Transformers library:
pip install -U sentence-transformers
Then you can load this model and run inference.
from sentence_transformers import SentenceTransformer
# Download from the 🤗 Hub
model = SentenceTransformer("Daxtra/sbert-summaries-mxbai-24-batch")
# Run inference
sentences = [
'- Content Writer position for crafting engaging long-form content across various topics, requiring a 2:1 Degree in English History or a similar field.\n- Create captivating articles, features, and content that is informative and resonates with the target audience, utilizing research skills to ensure accuracy.\n- Manage multiple projects with strong organizational and time-management skills, ensuring clarity and accuracy in writing.\n- Excellent proofreading and editing skills to produce flawless content, with a diplomatic approach to client relationships and communication.\n- Collaborate with clients to transform visions into engaging content and conduct interviews to uncover compelling stories.\n- Must have a keen eye for detail and a passion for long-form storytelling and content orchestration.',
'- Award-nominated journalist with over 12 years of experience in journalism, content writing, and SEO.\n- Currently a freelance journalist, specializing in various sectors including health, finance, and technology.\n- Freelance for publications like The Independent, CNN, and SELF, with notable success in SEO and content creation.\n- Strong skills in SEO, content writing, and copywriting, with a track record of high engagement and click-through rates.\n- Holds a Masters of Arts in Dramaturgy and Writing, and a BA in Media Journalism and Communications.\n- Proficient in WordPress, Umbraco, and Shopify; experienced in social media, content strategy, and email marketing.',
'- Instructional Designer and Senior Project Manager with expertise in developing eLearning experiences and conducting needs analysis for global clients.\n- Expertise in writing storyboards and scripts, creating engaging instructional graphics and animations, and developing scenario-based eLearning content.\n- Proficient in utilizing trends and best practices in learning technologies, liaising between stakeholders, and managing eLearning projects to completion.\n- Holds a Master of Arts in Applied Linguistics and Bachelor of Arts in History, with a Certificate in Digital Media.\n- Skilled in Adobe Captivate, Figma, Microsoft Excel, PowerPoint, and Word.\n- Experience includes roles as a Teacher, Instructional Designer, and eLearning Developer.',
]
embeddings = model.encode(sentences)
print(embeddings.shape)
# [3, 1024]
# Get the similarity scores for the embeddings
similarities = model.similarity(embeddings, embeddings)
print(similarities.shape)
# [3, 3]
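model.similarity uses cosine similarity for this model (see the Similarity Function above). If you prefer to work with the raw embeddings yourself, an equivalent pairwise cosine-similarity matrix can be computed directly; a minimal sketch with NumPy, assuming embeddings is a 2-D array of row vectors:

```python
import numpy as np

def cosine_similarity_matrix(embeddings):
    # L2-normalize each row, then a matrix product yields all pairwise cosines
    normed = embeddings / np.linalg.norm(embeddings, axis=1, keepdims=True)
    return normed @ normed.T

vecs = np.array([[1.0, 0.0], [0.0, 2.0], [1.0, 1.0]])
sims = cosine_similarity_matrix(vecs)
print(sims.shape)  # (3, 3)
```

Each diagonal entry is 1.0 (every vector is identical to itself), matching the shape and semantics of the model.similarity output above.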
Evaluation
Metrics
Information Retrieval
- Dataset: vac-res-matcher
- Evaluated with InformationRetrievalEvaluator
Metric | Value |
---|---|
cosine_accuracy@10 | 0.3939 |
cosine_precision@10 | 0.0799 |
cosine_recall@10 | 0.129 |
cosine_ndcg@10 | 0.1319 |
cosine_mrr@10 | 0.2092 |
cosine_map@10 | 0.0808 |
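For reference, these ranking metrics follow the standard information-retrieval definitions. A minimal sketch (not the evaluator's actual implementation) of recall@k and MRR@k for a single query; the reported numbers are averages of such per-query values:

```python
def recall_at_k(ranked_ids, relevant_ids, k=10):
    """Fraction of the relevant documents that appear in the top k results."""
    hits = len(set(ranked_ids[:k]) & set(relevant_ids))
    return hits / len(relevant_ids)

def mrr_at_k(ranked_ids, relevant_ids, k=10):
    """Reciprocal rank of the first relevant document in the top k, else 0."""
    for rank, doc_id in enumerate(ranked_ids[:k], start=1):
        if doc_id in relevant_ids:
            return 1.0 / rank
    return 0.0

ranked = ["d3", "d1", "d7", "d2"]   # retrieval order for one query
relevant = {"d1", "d2"}             # ground-truth matches
print(recall_at_k(ranked, relevant, k=3))  # 0.5 (one of two found)
print(mrr_at_k(ranked, relevant, k=3))     # 0.5 (first hit at rank 2)
```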
Training Details
Training Dataset
Unnamed Dataset
- Size: 149,352 training samples
- Columns: sentence_0 and sentence_1
- Approximate statistics based on the first 1000 samples:

Statistic | sentence_0 | sentence_1 |
---|---|---|
type | string | string |
min | 57 tokens | 52 tokens |
mean | 116.11 tokens | 118.95 tokens |
max | 128 tokens | 128 tokens |
- Samples:

Sample 1, sentence_0:
- Staff Accountant position seeking individuals with 2-3 years of experience in AR and AP areas, ideally with a Bachelor's degree in accounting, finance, or related field, or equivalent work experience.
- Responsibilities include processing e-commerce payments, handling payables and receivables, preparing financial statements, supporting tax reports, and managing close processes.
- Requires knowledge of basic accounting principles, experience with general ledger functions, and proficiency in Microsoft Office, particularly Excel.
- Strong communication, problem-solving, and organizational skills are essential.
- Attributes include a high level of integrity, ability to multitask, and strong time management.
- Regular attendance and adherence to health, safety, and environmental policies are required.

Sample 1, sentence_1:
- Experienced Account Payable Specialist with 5 years in vendor management, invoice processing, and reconciliation.
- Processed 120-150 invoices daily, handling 3-way matching, and reconciling over 100,000 accounts.
- Skilled in using Oracle, Excel VLOOKUP, and SAP Concur for reconciliation and disbursements.
- Master’s in Accountancy; proficient in Microsoft Excel, SharePoint, and SAP applications.
- Excellent interpersonal, analytical, and organizational skills.
- Detailed work in expense invoicing, payments, and communication with vendors.

Sample 2, sentence_0:
- Controls Assistant Project Manager position requires 3+ years of experience, stable work history, and a bachelor's degree in mechanical or electrical engineering.
- Candidates must be familiar with AutoCAD and Visio, and have experience with BACnet and DDC controls.
- Knowledge of Siemens, Johnson Controls, and similar control systems is essential.
- EIT or PE license is preferred but not required.

Sample 2, sentence_1:
- Systems Integrator with 3 years of experience in Greater Seattle Area, specialized in project integration and sales.
- Currently a Systems Integrator at GE Cimplicity, responsible for sales and project integration.
- Former Systems Integrator at Siemens Tia Portal; proficient in Siemens Hardware.
- Experience in GE Proficy and Albireo Energy roles, focusing on mechanical engineering.
- Skills: Project Integrator, Bachelor's degree in Mechanical Engineering, bilingual in English and Spanish.
- Strong background in customer engagement and system sales.

Sample 3, sentence_0:
- Senior HVAC Service Technician, requiring 5+ years of experience in commercial HVAC service and repair.
- Key responsibilities include diagnosing, repairing, and maintaining large commercial HVAC systems, interpreting blueprints and data, and providing customer education.
- Must possess an EPA Universal Certification and extensive knowledge of commercial HVAC systems.
- Requires proficiency in interpreting technical data and blueprints, and a local work history.
- Strong communication and customer service skills essential.
- Ideal for professionals in San Diego, CA; must be passionate about HVAC craft and committed to service excellence.

Sample 3, sentence_1:
- Mechanical Technician with extensive experience in startup, commissioning, and mechanical trades.
- Currently employed at Countywide Mechanical Systems, Inc. in San Diego County, California.
- Skilled in mechanical systems commissioning and troubleshooting.
- Strong problem-solving and attention to detail in complex mechanical environments.
- Proficient in system design, installation, and maintenance of mechanical systems.
- Holds certifications in mechanical technology and engineering.
- Experienced in project management and collaborative teamwork.
- Educated in mechanical engineering or a related field.

- Loss: MultipleNegativesRankingLoss with these parameters:
  { "scale": 20.0, "similarity_fct": "cos_sim" }
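MultipleNegativesRankingLoss treats each (sentence_0, sentence_1) pair as a positive and the other in-batch sentence_1 entries as negatives: it is a cross-entropy over cosine similarities multiplied by the scale parameter (20.0 here). A numerically stable NumPy sketch of this loss, not the library's implementation:

```python
import numpy as np

def mnr_loss(anchors, positives, scale=20.0):
    """Cross-entropy over scaled cosine similarities with in-batch negatives."""
    a = anchors / np.linalg.norm(anchors, axis=1, keepdims=True)
    p = positives / np.linalg.norm(positives, axis=1, keepdims=True)
    scores = scale * (a @ p.T)              # (batch, batch) similarity matrix
    m = scores.max(axis=1, keepdims=True)   # stabilize the log-sum-exp
    lse = m[:, 0] + np.log(np.exp(scores - m).sum(axis=1))
    # the correct match for anchor i is positive i, i.e. the diagonal
    return float((lse - np.diag(scores)).mean())

# Perfectly matched, mutually orthogonal pairs give a near-zero loss
e = np.eye(3)
print(round(mnr_loss(e, e), 6))  # 0.0
```

Larger batch sizes give more in-batch negatives per anchor, which is why the batch size of 24 (below) interacts with this loss.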
Training Hyperparameters
Non-Default Hyperparameters
- eval_strategy: steps
- per_device_train_batch_size: 24
- per_device_eval_batch_size: 24
- num_train_epochs: 1
- multi_dataset_batch_sampler: round_robin
All Hyperparameters
- overwrite_output_dir: False
- do_predict: False
- eval_strategy: steps
- prediction_loss_only: True
- per_device_train_batch_size: 24
- per_device_eval_batch_size: 24
- per_gpu_train_batch_size: None
- per_gpu_eval_batch_size: None
- gradient_accumulation_steps: 1
- eval_accumulation_steps: None
- torch_empty_cache_steps: None
- learning_rate: 5e-05
- weight_decay: 0.0
- adam_beta1: 0.9
- adam_beta2: 0.999
- adam_epsilon: 1e-08
- max_grad_norm: 1
- num_train_epochs: 1
- max_steps: -1
- lr_scheduler_type: linear
- lr_scheduler_kwargs: {}
- warmup_ratio: 0.0
- warmup_steps: 0
- log_level: passive
- log_level_replica: warning
- log_on_each_node: True
- logging_nan_inf_filter: True
- save_safetensors: True
- save_on_each_node: False
- save_only_model: False
- restore_callback_states_from_checkpoint: False
- no_cuda: False
- use_cpu: False
- use_mps_device: False
- seed: 42
- data_seed: None
- jit_mode_eval: False
- use_ipex: False
- bf16: False
- fp16: False
- fp16_opt_level: O1
- half_precision_backend: auto
- bf16_full_eval: False
- fp16_full_eval: False
- tf32: None
- local_rank: 0
- ddp_backend: None
- tpu_num_cores: None
- tpu_metrics_debug: False
- debug: []
- dataloader_drop_last: False
- dataloader_num_workers: 0
- dataloader_prefetch_factor: None
- past_index: -1
- disable_tqdm: False
- remove_unused_columns: True
- label_names: None
- load_best_model_at_end: False
- ignore_data_skip: False
- fsdp: []
- fsdp_min_num_params: 0
- fsdp_config: {'min_num_params': 0, 'xla': False, 'xla_fsdp_v2': False, 'xla_fsdp_grad_ckpt': False}
- fsdp_transformer_layer_cls_to_wrap: None
- accelerator_config: {'split_batches': False, 'dispatch_batches': None, 'even_batches': True, 'use_seedable_sampler': True, 'non_blocking': False, 'gradient_accumulation_kwargs': None}
- deepspeed: None
- label_smoothing_factor: 0.0
- optim: adamw_torch
- optim_args: None
- adafactor: False
- group_by_length: False
- length_column_name: length
- ddp_find_unused_parameters: None
- ddp_bucket_cap_mb: None
- ddp_broadcast_buffers: False
- dataloader_pin_memory: True
- dataloader_persistent_workers: False
- skip_memory_metrics: True
- use_legacy_prediction_loop: False
- push_to_hub: False
- resume_from_checkpoint: None
- hub_model_id: None
- hub_strategy: every_save
- hub_private_repo: False
- hub_always_push: False
- gradient_checkpointing: False
- gradient_checkpointing_kwargs: None
- include_inputs_for_metrics: False
- eval_do_concat_batches: True
- fp16_backend: auto
- push_to_hub_model_id: None
- push_to_hub_organization: None
- mp_parameters:
- auto_find_batch_size: False
- full_determinism: False
- torchdynamo: None
- ray_scope: last
- ddp_timeout: 1800
- torch_compile: False
- torch_compile_backend: None
- torch_compile_mode: None
- dispatch_batches: None
- split_batches: None
- include_tokens_per_second: False
- include_num_input_tokens_seen: False
- neftune_noise_alpha: None
- optim_target_modules: None
- batch_eval_metrics: False
- eval_on_start: False
- eval_use_gather_object: False
- batch_sampler: batch_sampler
- multi_dataset_batch_sampler: round_robin
Training Logs
Epoch | Step | Training Loss | vac-res-matcher_cosine_map@10 |
---|---|---|---|
0.0803 | 500 | 1.2875 | - |
0.1000 | 622 | - | 0.0833 |
0.1607 | 1000 | 1.1274 | - |
0.1999 | 1244 | - | 0.0822 |
0.2410 | 1500 | 1.0646 | - |
0.2999 | 1866 | - | 0.0793 |
0.3214 | 2000 | 0.9926 | - |
0.3998 | 2488 | - | 0.0773 |
0.4017 | 2500 | 0.9651 | - |
0.4821 | 3000 | 0.9499 | - |
0.4998 | 3110 | - | 0.0798 |
0.5624 | 3500 | 0.9098 | - |
0.5997 | 3732 | - | 0.0793 |
0.6428 | 4000 | 0.8948 | - |
0.6997 | 4354 | - | 0.0831 |
0.7231 | 4500 | 0.8962 | - |
0.7996 | 4976 | - | 0.0809 |
0.8035 | 5000 | 0.8677 | - |
0.8838 | 5500 | 0.8696 | - |
0.8996 | 5598 | - | 0.0816 |
0.9642 | 6000 | 0.8718 | - |
0.9995 | 6220 | - | 0.0808 |
1.0 | 6223 | - | 0.0808 |
Framework Versions
- Python: 3.10.12
- Sentence Transformers: 3.2.1
- Transformers: 4.44.2
- PyTorch: 2.4.1+cu121
- Accelerate: 0.34.2
- Datasets: 3.0.1
- Tokenizers: 0.19.1
Citation
BibTeX
Sentence Transformers
@inproceedings{reimers-2019-sentence-bert,
title = "Sentence-BERT: Sentence Embeddings using Siamese BERT-Networks",
author = "Reimers, Nils and Gurevych, Iryna",
booktitle = "Proceedings of the 2019 Conference on Empirical Methods in Natural Language Processing",
month = "11",
year = "2019",
publisher = "Association for Computational Linguistics",
url = "https://arxiv.org/abs/1908.10084",
}
MultipleNegativesRankingLoss
@misc{henderson2017efficient,
title={Efficient Natural Language Response Suggestion for Smart Reply},
author={Matthew Henderson and Rami Al-Rfou and Brian Strope and Yun-hsuan Sung and Laszlo Lukacs and Ruiqi Guo and Sanjiv Kumar and Balint Miklos and Ray Kurzweil},
year={2017},
eprint={1705.00652},
archivePrefix={arXiv},
primaryClass={cs.CL}
}