metadata
base_model: mixedbread-ai/mxbai-embed-large-v1
library_name: sentence-transformers
pipeline_tag: sentence-similarity
tags:
  - sentence-transformers
  - sentence-similarity
  - feature-extraction
  - generated_from_trainer
  - dataset_size:112464
  - loss:MultipleNegativesRankingLoss
widget:
  - source_sentence: >-
      Stocked the different varieties of household goods and gardening equipment

      Gave in-depth product knowledge to customers who required detailed
      information on particular products

      Promoted a welcoming environment where customers and suppliers received
      great service and strived to maintain the shop to exceptional standards
    sentences:
      - |-
        Monitor and maximize retail budgets.
        Training and maintaining the skills and well-being of current staff.
        Interviewing and selectively hire the most qualified candidates.
      - >-
        Responsible for checking and unpacking stock deliveries, reporting any
        damages or defects to the couriers and suppliers, correctly pricing
        items and displaying goods encouraging customers to make purchases

        Presented products neatly ensuring the shop floor was tidy at all times,
        keeping the area safe of hazards

        Dealt with customer enquiries and complaints face-to-face, resolving any
        issues quickly and efficiently and rarely had to escalate them to
        management to resolve
      - >-
        Operating their EPOS computer system.

        Working well on my own and as part as a team.

        Handling all incoming and outgoing phone queries.

        Dealing with all incoming and outgoing emails from other branches and
        head office.
  - source_sentence: >-
      Advising on best practices.

      Installing and maintaining Windows and Linux network systems.

      Installing, uninstalling, troubleshooting specific Software for hospital
      based equipment.
    sentences:
      - >-
        Hardware and software installation and desktop support.

        Solving I.T. issues for hospital staff.

        Backing up Data Systems, setting RAID configurations, wiring, setting up
        network and proxy servers. Fixing and troubleshooting desktop computers
        and laptops.
      - >-
        Preparing and maintaining all aspects of paperwork from accounts to
        ordering of stock, selling of animals and paying of suppliers

        All general roles associated with the day to day running of a farm

        Liaising with Agri reps in adhering to Health and Safety Standards
      - |-
        Working to deadlines daily
        Maintaining customer accounts
        Managed 4 Reps accounts after 6months
  - source_sentence: >-
      Maintained good time keeping and delivered within the required journey
      time limit

      Maintained accurate and clear paperwork and delivery records

      Developed self-discipline and good organisational skills, using initiative
      and able to solve problems effectively as they arose

      Developed a good geographical knowledge with the ability to read maps and
      plan routes

      Followed strict Road Traffic laws with regards to speed and weight limits

      Complied with strict Health and Safety and Welfare procedures, policies
      and standards along with statutory

      Coordinated schedule of pick up points and delivery addresses and planned
      the most efficient route, sorting packages into order of dropping off
      points
    sentences:
      - >-
        Purchasing of all supplies, equipment, furniture and services in
        accordance with NBK procedures. To assist with purchasing and design of
        stationary/printing.  Arrange quotations, proofs and printing of
        approved changes as necessary. Produce purchase orders.

        General ad hoc tasks.

        Research new, potential suppliers that are reliable and deliver a high
        standard of service, whilst considering the importance of cost savings.

        Answering switchboard in a polite and efficient manner and transferring
        calls to the relevant colleague.

        Maintain asset register including monthly depreciation, new assets,
        write offs, asset tagging, auditing, related accounting etc (fixed
        assets). Scanning and barcoding new assets for our register software.
        Making sure the necessary information is recorded - cost centre, product
        details, location.
      - >-
        Co-operated with despatch and receiving staff, assisting with the
        storage and removal of packages

        Maintained physical fitness levels and the ability to work in a
        stressful environment

        Collected and delivered home shopping products safely to customers in
        the shortest available time

        Meticulously inspected all lights, brakes, fuel and tyres were in good
        working condition before and during journeys, which required good
        attention to detail

        Sought the quickest route to the delivery addresses and ensured all
        packages were correctly signed for
      - >-
        Responsible for Quality Assurance of Projects according to QMS (Quality
        Management System)

        Coordinating Verification & Validation activities and resources

        Producing HW and Mechanical Verification & Validation Reports

        Supporting System, SW, Electronic, Mechanical Requirements definitions

        Ensuring traceability among HW and Mechanical requirements, Design
        Documents, test procedures and test report, using relevant configuration
        management tools and producing relevant Traceability matrix document

        Supporting Regulatory submissions, interfacing with the notified body
        for IEC 62304 compliancy

        Defining Test Plan and verification strategies

        Overseeing SW Unit tests: Formal Code Inspection Reviews, automated test
        execution in simulated SW in the loop test environment

        Defining and performing System Test Procedures and manual /automated
        test sequences to be executed in real or simulated environments (Black
        Box test)

        Supporting Device Design transfer to Manufacturing, by writing Operative
        Instructions, Procedures, Production requirements, etc.

        Verification and Validations activities:

        Requirements Change Management

        Repeating relevant tests affected by the change, according to the
        regression analysis

        Supporting identification of Functional, Usability, Risk Control
        Measures Requirements

        HW/SW Development processes: Team and Resources Coordination,
        Monitoring/Tracking Activities Status, Scheduling and coordination of
        Technical meetings, Project Reviews meetings and Status Review Meetings.
        Preparing and archiving Project revision, verification and validation
        reports. Supporting or carrying out Risk Analysis, System Engineering,
        Software and Mechanical Design and Development activities, thanks to a
        strong technical background and skills.

        Risk Management

        Suppliers Management: Allocating and transferring technical and quality
        requirements to strategic suppliers. Monitoring strategic and critical
        outsourcing activities. Seeking reliable or strategic Suppliers.
        Supporting the purchasing manager in negotiating prices and delivery
        time, reviewing technical datasheet or specifications, requesting
        quotation, determining quantity and schedule of deliveries

        Ensuring traceability among System and SW requirements, SW Design
        Documents, SW Unit Verification, System test cases and Test Results,
        using relevant configuration management tools and producing SW
        Traceability matrix document

        Starting Processes: Scope Definition and Project Charter

        Supporting System, SW, Electronic, Mechanical Design Reviews

        Supporting Test environment and Testing Tool Requirements definition

        Regulatory and Product Certification: Interaction with Notified Body
        (es. IMQ, TUV)  for CE/CB certification, according to 93/42 directive
        and EN 60601 standard Series. Interaction with Medical Testing labs
        during safety, EMC, acoustic, etc. certification compliance test

        Change Management and Configuration Control.
  - source_sentence: >-
      Maintained a high level of quality in each case that reviewed the content

      Assisted new joiners in shadowing process

      Focused on analysing, labelling and discovering patterns of suspicious
      activity - with minimal supervision

      Balanced priorities of daily workflow tasks in line with client needs
    sentences:
      - >-
        Meeting both new and existing client

        Generating new business both in face to face meetings and over the
        phone.

        Writing up sales reports, and activity reports.

        Writing up concise, value-based sales proposals.

        Replying to all customer enquiries in a timely and accurate manner.
      - >-
        Reinforced concrete construction, structural steel and MEP works

        Eastern and Western Ticket Halls connected by tunnels and platforms

        In depth exposure and management of commercial issues, programme,
        interface and handover for the contract

        Dealt closely with project management team

        Enhanced multi-tasking skills

        Rapid familiarisation with design drawings, specifications and standards

        Completion of quality audits internally and on suppliers

        Close liaison between contractor, subcontractors, design, commercial and
        client at management levels

        £ 250M

        Management of quality issues throughout the construction process
      - >-
        trained the Machine how to process data with accuracy, with excellent
        quality in artificial intelligence learning process

        Escalated violations of client policies using internal tools

        Visually navigated and reviewed images/video along with text-based
        content through internally developed applications

        Actively took part in different internal projects

        Evaluated online social media and advertising content, making sure it is
        in line with the client's policy

        Achieved weekly productivity deliverables as part of daily workflow
  - source_sentence: >-
      Responsible for lease negotiations and rent collections with the aim of
      maximising yields for rented properties.

      Dealing with freedom of information requests.

      Dealing with title issues.

      Manage a wide range of insolvency assignments such as Fixed & Floating
      charge Receiverships, Members Voluntary Liquidations, Creditors Voluntary
      Liquidations and Court Appointed Liquidations.

      Development and roll out of disposal strategies for all properties under
      management.

      Manage the build out of a number of ghost estates on behalf of NAMA. A
      current project involves remediation works to 84 residential units in Co.
      Monaghan with a value of  EUR 10 Million.

      Preparation of tenders for NAMA & other financial institutions.

      Attending meeting with borrowers and financial institutions.

      Coordinate with estate agent to ensure we are receiving maximum yields for
      rented properties. I am responsible for the management of in excess of 200
      rental properties across various Receiverships under my remit.
    sentences:
      - >-
        Preparation of budgets in order to manage cash flow throughout the
        various assignments.

        Preparation and submission of tax and CRO returns.

        Communicate with Solicitors, Estate Agents and Assets Managers on a
        daily basis to ensure properties are brought to market and sold in a
        timely manner.

        Drafting monthly/quarterly reports for case managers.

        Reviewing tenders received and appointment of professional service
        firms.

        Liaising with NAMA case managers and our internal tax department in
        order to determine the most tax efficient manner to dispose of
        properties.
      - >-
        Retail businesses - high street shops

        Trades - electricians, joiners, printers

        Oil industry supply companies including, fabricators, machinists and
        designers for offshore and onshore applications.

        Supporting business owners to grow their businesses, by providing them
        with strategies to develop effectiveness and improve profitability
        through all operational activities including Purchasing, Stock Control,
        Production Planning, Sales, Marketing, Distribution and Customer
        Services.

        Worked with 26 different businesses in 2010/11 and achieved an average
        increase in Gross Profit of 39% for those businesses plus additional
        benefits of business efficiency and team effectiveness.
      - >-
        Realization of bedside rounds and teaching.

        Program implementation and development which include: administrative and
        HR management; conception and implementation of information system;
        Conception, implementation and coordination of PMTCT program.

        Monthly report of activities.

        Planning and Supervision of mortality and morbidity review (MMR).

        Responsible for communication with the pediatric Saint Damien Hospital
        and other existing programs in the same hospital.

        Note: This program run by NPFS/Saint Damien and funded by Francesca Rava
        foundation at

        Supervision of the staff (12 Obstetricians, 7 anesthetists, 16 nurse
        midwives, 6 auxiliary midwives, 1 administrative assistant , 1 data
        clerk etc.)

        Performance of ultrasound

        Clinical work according to day time schedule

        Performance of surgical procedures

SentenceTransformer based on mixedbread-ai/mxbai-embed-large-v1

This is a sentence-transformers model finetuned from mixedbread-ai/mxbai-embed-large-v1. It maps sentences & paragraphs to a 1024-dimensional dense vector space and can be used for semantic textual similarity, semantic search, paraphrase mining, text classification, clustering, and more.
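The sketch below illustrates the semantic-search use case by ranking a small corpus against a query by cosine similarity. It assumes the embeddings have already been computed, for example with `model.encode`. The short hand-written vectors are hypothetical stand-ins for the model's 1024-dimensional outputs, and `top_k_cosine` is an illustrative helper, not part of the library.

```python
import math

def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)

def top_k_cosine(query, corpus, k=2):
    """Return (index, score) pairs for the k corpus vectors most similar to query."""
    scored = [(i, cosine(query, doc)) for i, doc in enumerate(corpus)]
    scored.sort(key=lambda pair: pair[1], reverse=True)
    return scored[:k]

# Hypothetical 3-d vectors standing in for 1024-d sentence embeddings.
query = [1.0, 0.0, 1.0]
corpus = [
    [0.9, 0.1, 1.1],   # close to the query
    [-1.0, 1.0, 0.0],  # unrelated
    [1.0, 0.0, 0.5],   # somewhat close
]
print(top_k_cosine(query, corpus, k=2))
```

With real embeddings, `sentence_transformers.util.semantic_search` performs the same kind of ranking directly on tensors.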

Model Details

Model Description

  • Model Type: Sentence Transformer
  • Base model: mixedbread-ai/mxbai-embed-large-v1
  • Maximum Sequence Length: 128 tokens
  • Output Dimensionality: 1024 dimensions
  • Similarity Function: Cosine Similarity
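The similarity function listed above is standard cosine similarity. For two 1024-dimensional embeddings u and v it can be written as follows. When both vectors are L2-normalized, it reduces to a plain dot product.

```latex
\cos(u, v) = \frac{u \cdot v}{\lVert u \rVert_2 \, \lVert v \rVert_2}
           = \frac{\sum_{i=1}^{1024} u_i v_i}{\sqrt{\sum_{i=1}^{1024} u_i^2}\,\sqrt{\sum_{i=1}^{1024} v_i^2}},
\qquad \cos(u, v) \in [-1, 1]
```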

Model Sources

Full Model Architecture

SentenceTransformer(
  (0): Transformer({'max_seq_length': 128, 'do_lower_case': False}) with Transformer model: BertModel 
  (1): Pooling({'word_embedding_dimension': 1024, 'pooling_mode_cls_token': False, 'pooling_mode_mean_tokens': True, 'pooling_mode_max_tokens': False, 'pooling_mode_mean_sqrt_len_tokens': False, 'pooling_mode_weightedmean_tokens': False, 'pooling_mode_lasttoken': False, 'include_prompt': True})
)
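The Pooling module above enables only `pooling_mode_mean_tokens`. Each sentence embedding is therefore the average of the BertModel token embeddings, with padding positions excluded via the attention mask. Below is a minimal pure-Python sketch of that masked mean. It uses toy 2-dimensional token vectors in place of the real 1024-dimensional ones, and `mean_pool` is an illustrative name, not the library's implementation.

```python
def mean_pool(token_embeddings, attention_mask):
    """Average token vectors, ignoring positions where attention_mask is 0 (padding)."""
    dim = len(token_embeddings[0])
    summed = [0.0] * dim
    count = 0
    for vector, keep in zip(token_embeddings, attention_mask):
        if keep:
            for j in range(dim):
                summed[j] += vector[j]
            count += 1
    # Guard against dividing by zero for an all-padding input.
    count = max(count, 1)
    return [s / count for s in summed]

# Three real tokens followed by one padding token (mask = 0).
tokens = [[1.0, 2.0], [3.0, 4.0], [5.0, 6.0], [100.0, 100.0]]
mask = [1, 1, 1, 0]
print(mean_pool(tokens, mask))  # [3.0, 4.0]
```

The padding vector [100.0, 100.0] has no effect on the result because its mask entry is 0.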

Usage

Direct Usage (Sentence Transformers)

First install the Sentence Transformers library:

pip install -U sentence-transformers

Then you can load this model and run inference.

from sentence_transformers import SentenceTransformer

# Download from the 🤗 Hub
model = SentenceTransformer("Daxtra/sbert-trained-on-whs")
# Run inference
sentences = [
    'Responsible for lease negotiations and rent collections with the aim of maximising yields for rented properties.\nDealing with freedom of information requests.\nDealing with title issues.\nManage a wide range of insolvency assignments such as Fixed & Floating charge Receiverships, Members Voluntary Liquidations, Creditors Voluntary Liquidations and Court Appointed Liquidations.\nDevelopment and roll out of disposal strategies for all properties under management.\nManage the build out of a number of ghost estates on behalf of NAMA. A current project involves remediation works to 84 residential units in Co. Monaghan with a value of  EUR 10 Million.\nPreparation of tenders for NAMA & other financial institutions.\nAttending meeting with borrowers and financial institutions.\nCoordinate with estate agent to ensure we are receiving maximum yields for rented properties. I am responsible for the management of in excess of 200 rental properties across various Receiverships under my remit.',
    'Preparation of budgets in order to manage cash flow throughout the various assignments.\nPreparation and submission of tax and CRO returns.\nCommunicate with Solicitors, Estate Agents and Assets Managers on a daily basis to ensure properties are brought to market and sold in a timely manner.\nDrafting monthly/quarterly reports for case managers.\nReviewing tenders received and appointment of professional service firms.\nLiaising with NAMA case managers and our internal tax department in order to determine the most tax efficient manner to dispose of properties.',
    'Realization of bedside rounds and teaching.\nProgram implementation and development which include: administrative and HR management; conception and implementation of information system; Conception, implementation and coordination of PMTCT program.\nMonthly report of activities.\nPlanning and Supervision of mortality and morbidity review (MMR).\nResponsible for communication with the pediatric Saint Damien Hospital and other existing programs in the same hospital.\nNote: This program run by NPFS/Saint Damien and funded by Francesca Rava foundation at\nSupervision of the staff (12 Obstetricians, 7 anesthetists, 16 nurse midwives, 6 auxiliary midwives, 1 administrative assistant , 1 data clerk etc.)\nPerformance of ultrasound\nClinical work according to day time schedule\nPerformance of surgical procedures',
]
embeddings = model.encode(sentences)
print(embeddings.shape)
# [3, 1024]

# Get the similarity scores for the embeddings
similarities = model.similarity(embeddings, embeddings)
print(similarities.shape)
# [3, 3]
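The 3 x 3 matrix returned by `model.similarity` holds pairwise cosine scores, with 1.0 on the diagonal. For paraphrase mining, one option is to rank the off-diagonal pairs. The sketch below does this on a hypothetical hand-written matrix, since actual scores depend on the model and the inputs. `ranked_pairs` is an illustrative helper. The library also provides `sentence_transformers.util.paraphrase_mining`, which works directly from a model and a list of sentences.

```python
def ranked_pairs(similarity):
    """Return (score, i, j) for every pair i < j, highest score first."""
    n = len(similarity)
    pairs = [
        (similarity[i][j], i, j)
        for i in range(n)
        for j in range(i + 1, n)  # upper triangle only: skips the diagonal and mirrored duplicates
    ]
    return sorted(pairs, reverse=True)

# Hypothetical scores for three inputs; not produced by this model.
sim = [
    [1.00, 0.82, 0.31],
    [0.82, 1.00, 0.27],
    [0.31, 0.27, 1.00],
]
for score, i, j in ranked_pairs(sim):
    print(f"sentences {i} and {j}: {score:.2f}")
```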

Training Details

Training Dataset

Unnamed Dataset

  • Size: 112,464 training samples
  • Columns: sentence_0 and sentence_1
  • Approximate statistics based on the first 1000 samples:
    sentence_0 (string): min 8 tokens, mean 64.94 tokens, max 128 tokens
    sentence_1 (string): min 6 tokens, mean 64.91 tokens, max 128 tokens
  • Samples:
    sentence_0 sentence_1
    Co-authored standalone Moodle site on sustainability that was marketed by the college and sold to various 3rd parties, making in excess of £ 20,000.
    Expert-level knowledge in eLearning and Virtual Learning Environments through daily use, including Moodle and Mahara with the requirement to produced tailored courses on the technology to a diverse range of teaching professionals;
    Delivered in excess of 20 training sessions to over 600 lecturers on a range of new innovative practices in education including innovations within the VLE which supported their continued professional development and enhanced their classroom performance with a participant satisfaction rate regularly in excess of 95%;
    Administered Moodle across college, supporting learner management and championing the use of core modules to support the tracking and assessment of in excess of 15000 learners.
    Coordinated and managed multiple syllabuses and competency tests for advanced software development course, emphasising quality of resources to promote self-guided learning in addition to more traditional approaches, increasing intake by 400% over a 3 year period, achieving pass and completion rate to in excess of 97%;
    Improved quality of eLearning resources through the delivery of training on the use of screen-recording and simulation software in content development, increasing the use of the Virtual Learning Environment from something that would serve as a repository of worksheets to a more interactive and engaging application, appearing as the top visited pages in weekly reports;
    Promoted to the unique position of Head Judge in Web Design by the Government-backed National Apprenticeship Service, project managing and collaborating with a team of Expert Judges nationally setting up the timetabling and resourcing of events attended by in excess of 100 student competitors;
    Advising on best practices.
    Installing and maintaining Windows and Linux network systems.
    Installing, uninstalling, troubleshooting specific Software for hospital based equipment.
    Hardware and software installation and desktop support.
    Solving I.T. issues for hospital staff.
    Backing up Data Systems, setting RAID configurations, wiring, setting up network and proxy servers. Fixing and troubleshooting desktop computers and laptops.
    Analysis of data from the manufacture of finished goods, distribution of materials in the production of finished products;
    Preparation of cost of production calculations for finished products;
    Full maintenance of accounting, tax, and management accounting in accordance with the current legislation of Ukraine.
    Accounting for cash transactions;
    Maintenance of personnel documents (orders, contracts, employment records);
    Work with primary documents (billing, acts, account invoices, tax invoices, work in Client Bank and Privat 24, carrying out banking operations, preparation of acts of reconciliation, payroll, accounting of goods and materials);
    Preparation and submission of financial, statistical, and tax reporting.
  • Loss: MultipleNegativesRankingLoss with these parameters:
    {
        "scale": 20.0,
        "similarity_fct": "cos_sim"
    }
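MultipleNegativesRankingLoss treats each (sentence_0, sentence_1) pair as a positive and every other sentence_1 in the same batch as a negative. It multiplies the cosine-similarity scores by `scale` (20.0 here) and applies cross-entropy with the matching column as the target. Below is a pure-Python sketch of that computation on a toy batch of two pairs. `mnrl_loss` is a hypothetical helper; the library version operates on PyTorch tensors.

```python
import math

def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    return dot / (math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v)))

def mnrl_loss(anchors, positives, scale=20.0):
    """Mean cross-entropy where anchor i should match positive i among all positives in the batch."""
    total = 0.0
    for i, anchor in enumerate(anchors):
        # Scaled similarity of this anchor to every positive in the batch (in-batch negatives).
        logits = [scale * cosine(anchor, p) for p in positives]
        # Numerically stable negative log-softmax of the correct column i.
        m = max(logits)
        log_sum_exp = m + math.log(sum(math.exp(z - m) for z in logits))
        total += log_sum_exp - logits[i]
    return total / len(anchors)

anchors = [[1.0, 0.0], [0.0, 1.0]]
aligned = [[0.9, 0.1], [0.1, 0.9]]  # each positive matches its own anchor
swapped = [[0.1, 0.9], [0.9, 0.1]]  # positives paired with the wrong anchors
print(mnrl_loss(anchors, aligned) < mnrl_loss(anchors, swapped))  # True
```

With a batch size of 24, as used for training, each anchor is scored against 1 positive and 23 in-batch negatives.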
    

Training Hyperparameters

Non-Default Hyperparameters

  • eval_strategy: steps
  • per_device_train_batch_size: 24
  • per_device_eval_batch_size: 24
  • num_train_epochs: 1
  • multi_dataset_batch_sampler: round_robin
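With one epoch over 112,464 pairs at a per-device batch size of 24, the number of optimizer steps can be checked directly. The sketch below assumes a single device and no gradient accumulation (gradient_accumulation_steps is 1 in the full hyperparameter list). Under those assumptions the result matches the final step of 4686 in the training logs. `steps_per_epoch` is an illustrative helper.

```python
import math

def steps_per_epoch(num_samples, batch_size, drop_last=False):
    """Batches (= optimizer steps without gradient accumulation) in one epoch on one device."""
    if drop_last:
        return num_samples // batch_size
    # The final partial batch still counts as a step when it is not dropped.
    return math.ceil(num_samples / batch_size)

print(steps_per_epoch(112_464, 24))  # 4686
```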

All Hyperparameters

  • overwrite_output_dir: False
  • do_predict: False
  • eval_strategy: steps
  • prediction_loss_only: True
  • per_device_train_batch_size: 24
  • per_device_eval_batch_size: 24
  • per_gpu_train_batch_size: None
  • per_gpu_eval_batch_size: None
  • gradient_accumulation_steps: 1
  • eval_accumulation_steps: None
  • torch_empty_cache_steps: None
  • learning_rate: 5e-05
  • weight_decay: 0.0
  • adam_beta1: 0.9
  • adam_beta2: 0.999
  • adam_epsilon: 1e-08
  • max_grad_norm: 1
  • num_train_epochs: 1
  • max_steps: -1
  • lr_scheduler_type: linear
  • lr_scheduler_kwargs: {}
  • warmup_ratio: 0.0
  • warmup_steps: 0
  • log_level: passive
  • log_level_replica: warning
  • log_on_each_node: True
  • logging_nan_inf_filter: True
  • save_safetensors: True
  • save_on_each_node: False
  • save_only_model: False
  • restore_callback_states_from_checkpoint: False
  • no_cuda: False
  • use_cpu: False
  • use_mps_device: False
  • seed: 42
  • data_seed: None
  • jit_mode_eval: False
  • use_ipex: False
  • bf16: False
  • fp16: False
  • fp16_opt_level: O1
  • half_precision_backend: auto
  • bf16_full_eval: False
  • fp16_full_eval: False
  • tf32: None
  • local_rank: 0
  • ddp_backend: None
  • tpu_num_cores: None
  • tpu_metrics_debug: False
  • debug: []
  • dataloader_drop_last: False
  • dataloader_num_workers: 0
  • dataloader_prefetch_factor: None
  • past_index: -1
  • disable_tqdm: False
  • remove_unused_columns: True
  • label_names: None
  • load_best_model_at_end: False
  • ignore_data_skip: False
  • fsdp: []
  • fsdp_min_num_params: 0
  • fsdp_config: {'min_num_params': 0, 'xla': False, 'xla_fsdp_v2': False, 'xla_fsdp_grad_ckpt': False}
  • fsdp_transformer_layer_cls_to_wrap: None
  • accelerator_config: {'split_batches': False, 'dispatch_batches': None, 'even_batches': True, 'use_seedable_sampler': True, 'non_blocking': False, 'gradient_accumulation_kwargs': None}
  • deepspeed: None
  • label_smoothing_factor: 0.0
  • optim: adamw_torch
  • optim_args: None
  • adafactor: False
  • group_by_length: False
  • length_column_name: length
  • ddp_find_unused_parameters: None
  • ddp_bucket_cap_mb: None
  • ddp_broadcast_buffers: False
  • dataloader_pin_memory: True
  • dataloader_persistent_workers: False
  • skip_memory_metrics: True
  • use_legacy_prediction_loop: False
  • push_to_hub: False
  • resume_from_checkpoint: None
  • hub_model_id: None
  • hub_strategy: every_save
  • hub_private_repo: False
  • hub_always_push: False
  • gradient_checkpointing: False
  • gradient_checkpointing_kwargs: None
  • include_inputs_for_metrics: False
  • eval_do_concat_batches: True
  • fp16_backend: auto
  • push_to_hub_model_id: None
  • push_to_hub_organization: None
  • mp_parameters:
  • auto_find_batch_size: False
  • full_determinism: False
  • torchdynamo: None
  • ray_scope: last
  • ddp_timeout: 1800
  • torch_compile: False
  • torch_compile_backend: None
  • torch_compile_mode: None
  • dispatch_batches: None
  • split_batches: None
  • include_tokens_per_second: False
  • include_num_input_tokens_seen: False
  • neftune_noise_alpha: None
  • optim_target_modules: None
  • batch_eval_metrics: False
  • eval_on_start: False
  • eval_use_gather_object: False
  • batch_sampler: batch_sampler
  • multi_dataset_batch_sampler: round_robin
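The settings above specify a learning_rate of 5e-05, lr_scheduler_type linear, and no warmup (warmup_ratio 0.0, warmup_steps 0). The rate therefore decays linearly from 5e-05 to 0 over the run. The sketch below models this on the Transformers linear-with-warmup rule and assumes 4686 total steps, the final step in the training logs. `linear_lr` is an illustrative helper, not the library function.

```python
def linear_lr(step, base_lr=5e-05, total_steps=4686, warmup_steps=0):
    """Learning rate at a given step: linear warmup, then linear decay to zero."""
    if step < warmup_steps:
        return base_lr * step / max(1, warmup_steps)
    remaining = max(0, total_steps - step)
    return base_lr * remaining / max(1, total_steps - warmup_steps)

for step in (0, 1000, 2343, 4686):
    print(step, linear_lr(step))
```

Because warmup_steps is 0, the warmup branch never runs here. It is kept so the helper also covers configurations with warmup.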

Training Logs

Epoch   Step  Training Loss
0.0999   468  -
0.1067   500  0.432
0.1997   936  -
0.2134  1000  0.2153
0.2996  1404  -
0.3201  1500  0.1997
0.3995  1872  -
0.4268  2000  0.1635
0.4994  2340  -
0.5335  2500  0.1573
0.5992  2808  -
0.6402  3000  0.1518
0.6991  3276  -
0.7469  3500  0.1359
0.7990  3744  -
0.8536  4000  0.1351
0.8988  4212  -
0.9603  4500  0.1187
0.9987  4680  -
1.0     4686  -
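The Epoch column in the training logs is the step divided by the 4686 steps of the single epoch. Rows marked "-" have no logged training loss; the loss appears every 500 steps. The sketch below recomputes a few Epoch values. `epoch_at` is an illustrative helper.

```python
STEPS_PER_EPOCH = 4686  # final step in the training log, for one epoch

def epoch_at(step, steps_per_epoch=STEPS_PER_EPOCH):
    """Fractional epoch reached after a given number of optimizer steps."""
    return step / steps_per_epoch

for step in (468, 500, 2340, 4680):
    # Rounded to 4 decimals: 0.0999, 0.1067, 0.4994, 0.9987, matching the Epoch column
    print(step, round(epoch_at(step), 4))
```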

Framework Versions

  • Python: 3.10.12
  • Sentence Transformers: 3.2.0
  • Transformers: 4.44.2
  • PyTorch: 2.4.1+cu121
  • Accelerate: 0.34.2
  • Datasets: 3.0.1
  • Tokenizers: 0.19.1
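To compare a local environment with the versions used for training, read the installed distribution versions with the standard library. The minimal sketch below assumes the usual PyPI distribution names and reports a missing package as None. `compare_versions` is an illustrative helper.

```python
from importlib.metadata import PackageNotFoundError, version

# Versions listed above for training this model.
TRAINED_WITH = {
    "sentence-transformers": "3.2.0",
    "transformers": "4.44.2",
    "torch": "2.4.1+cu121",
    "accelerate": "0.34.2",
    "datasets": "3.0.1",
    "tokenizers": "0.19.1",
}

def compare_versions(expected):
    """Map each package to (expected, installed); installed is None if the package is not found."""
    report = {}
    for name, wanted in expected.items():
        try:
            installed = version(name)
        except PackageNotFoundError:
            installed = None
        report[name] = (wanted, installed)
    return report

for name, (wanted, installed) in compare_versions(TRAINED_WITH).items():
    status = "ok" if installed == wanted else "differs"
    print(f"{name}: trained with {wanted}, installed {installed} ({status})")
```

A "differs" result does not necessarily mean the model will fail to load. It only flags where your environment departs from the training setup.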

Citation

BibTeX

Sentence Transformers

@inproceedings{reimers-2019-sentence-bert,
    title = "Sentence-BERT: Sentence Embeddings using Siamese BERT-Networks",
    author = "Reimers, Nils and Gurevych, Iryna",
    booktitle = "Proceedings of the 2019 Conference on Empirical Methods in Natural Language Processing",
    month = "11",
    year = "2019",
    publisher = "Association for Computational Linguistics",
    url = "https://arxiv.org/abs/1908.10084",
}

MultipleNegativesRankingLoss

@misc{henderson2017efficient,
    title={Efficient Natural Language Response Suggestion for Smart Reply},
    author={Matthew Henderson and Rami Al-Rfou and Brian Strope and Yun-hsuan Sung and Laszlo Lukacs and Ruiqi Guo and Sanjiv Kumar and Balint Miklos and Ray Kurzweil},
    year={2017},
    eprint={1705.00652},
    archivePrefix={arXiv},
    primaryClass={cs.CL}
}