metadata
base_model: BAAI/bge-large-en
datasets: []
language: []
library_name: sentence-transformers
pipeline_tag: sentence-similarity
tags:
  - sentence-transformers
  - sentence-similarity
  - feature-extraction
  - generated_from_trainer
  - dataset_size:616
  - loss:CosineSimilarityLoss
widget:
  - source_sentence: Fulfilment of contractual obligations
    sentences:
      - >-
        Should a tenderer find discrepancies in or omissions from the drawings
        or any of the Tender Forms or should he be in doubt as to their meaning
      - >-
        Period of Maintenance shall mean the specified period of maintenance
        from the date of completion of the works, as certified by the Engineer.
      - >-
        A copy of certificate stating that they are not liable to be
        disqualified and all their statements/documents
  - source_sentence: Is time is of essence in the contract?
    sentences:
      - >-
        In exceptional cases where accommodation is provided to the Contractor
        at the Railway's discretion, recoveries shall be made at such rates
      - and the works must be completed not later than the dates
      - >-
        The successful bidder shall submit the Performance Guarantee (PG) in any
        of the following forms, amounting to 5% of the contract value
  - source_sentence: Is there a way to claim consequential losses?
    sentences:
      - "provision has been made in Clauses 7(j), 8, 18, 22(5), 39, 43(2), 45(i)(a), 55, 55-A(5), 57, 57A,61(1), 61(2) and 62(1) of Standard General Conditions of Contract or in any Clause (stated as excepted matter) of the Special Conditions of the Contract, shall be deemed as ‘excepted matters’ (matters not arbitrable) and decisions of the Railway authority"
      - >-
        All sums payable by way of compensation under any of these conditions
        shall be considered as reasonable compensation
      - Third party liability relationship is present in this contract.
  - source_sentence: Valuables found during works
    sentences:
      - >-
        The contractor will indemnify, defend, save and hold harmless the
        Authority and its officers, servants, agents, Government
        INstrumentalities and Government owned and/or controlled
        entities/enterprises, against any and all suits, proceedings, actions,
        demands and third party claims for any loss, damage, cost and expense of
        whatever kind and nature, whether arising out of any breach by the
        contractor of any its obligations inder this agrreement, including any
        errors or deficiencies in the design documents, or tort or on any other
        ground whatsoever, except to the extent that any such suits,
        proceedings, actions, demands and claims have arisen due to any
        negligent act or omission, or breach or default of this agreement on the
        part of the authority Indemnified persons.
      - >-
        his position as an independent contractor specifying engineering
        organization available with details of partners / staff / engineers
        employed with qualifications and experience
      - >-
        All gold, silver, oil, other minerals of any description, all precious
        stones, coins, treasures relics antiquities and other similar things
        which shall be found in or upon the site shall be the property of the
        Railway
  - source_sentence: Project schedules like Bar chart, CPM, PERT
    sentences:
      - "\_All temporary works necessary for the proper execution of the works shall be provided and maintained by the Contractor"
      - Can the excavated material be directly used in construction.
      - >-
        Nothing stated herein shall preclude the Contractor in achieving earlier
        completion of item or whole of the works than indicated in the
        programme.

SentenceTransformer based on BAAI/bge-large-en

This is a sentence-transformers model finetuned from BAAI/bge-large-en. It maps sentences & paragraphs to a 1024-dimensional dense vector space and can be used for semantic textual similarity, semantic search, paraphrase mining, text classification, clustering, and more.

Model Details

Model Description

  • Model Type: Sentence Transformer
  • Base model: BAAI/bge-large-en
  • Maximum Sequence Length: 512 tokens
  • Output Dimensionality: 1024 dimensions
  • Similarity Function: Cosine Similarity
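The Normalize() module at the end of the architecture below rescales every embedding to unit length. For two such embeddings u and v, cosine similarity therefore reduces to a plain dot product:

```latex
\cos(\mathbf{u}, \mathbf{v})
  = \frac{\mathbf{u} \cdot \mathbf{v}}{\lVert \mathbf{u} \rVert \, \lVert \mathbf{v} \rVert}
  = \mathbf{u} \cdot \mathbf{v}
  \quad \text{when } \lVert \mathbf{u} \rVert = \lVert \mathbf{v} \rVert = 1,
  \qquad \mathbf{u}, \mathbf{v} \in \mathbb{R}^{1024}
```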

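Model Sources

  • Documentation: Sentence Transformers Documentation (https://sbert.net)
  • Repository: Sentence Transformers on GitHub (https://github.com/UKPLab/sentence-transformers)
  • Hugging Face: Sentence Transformers on Hugging Face (https://huggingface.co/models?library=sentence-transformers)
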
Full Model Architecture

SentenceTransformer(
  (0): Transformer({'max_seq_length': 512, 'do_lower_case': True}) with Transformer model: BertModel 
  (1): Pooling({'word_embedding_dimension': 1024, 'pooling_mode_cls_token': True, 'pooling_mode_mean_tokens': False, 'pooling_mode_max_tokens': False, 'pooling_mode_mean_sqrt_len_tokens': False, 'pooling_mode_weightedmean_tokens': False, 'pooling_mode_lasttoken': False, 'include_prompt': True})
  (2): Normalize()
)
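The three modules run in sequence. BertModel produces one hidden-state vector per token. Pooling with pooling_mode_cls_token=True keeps only the vector at position 0, the [CLS] token. Normalize() then rescales that vector to unit length.

The plain-Python sketch below mimics the last two steps on a made-up 2-dimensional input. It illustrates the technique only and is not the library's implementation. The helper names (cls_pool, l2_normalize, embed) are hypothetical.

```python
import math
from typing import List

Vector = List[float]


def cls_pool(token_embeddings: List[Vector]) -> Vector:
    """CLS pooling: keep only the hidden state of the first token ([CLS])."""
    return list(token_embeddings[0])


def l2_normalize(vec: Vector) -> Vector:
    """Rescale a vector to unit Euclidean length, as Normalize() does."""
    norm = math.sqrt(sum(x * x for x in vec))
    if norm == 0.0:
        return list(vec)  # an all-zero vector stays all-zero
    return [x / norm for x in vec]


def embed(token_embeddings: List[Vector]) -> Vector:
    """Pooling (CLS token) followed by L2 normalization."""
    return l2_normalize(cls_pool(token_embeddings))


# Hypothetical "BertModel output" for one 3-token input with 2-d hidden states
# (the real model produces 1024-d hidden states).
tokens = [
    [3.0, 4.0],  # [CLS]
    [1.0, 0.0],  # ordinary token, ignored by CLS pooling
    [0.0, 1.0],  # [SEP], ignored by CLS pooling
]
print(embed(tokens))  # [0.6, 0.8]
```

Because only the [CLS] vector survives pooling, the other token vectors have no effect on the final embedding in this configuration.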

Usage

Direct Usage (Sentence Transformers)

First install the Sentence Transformers library:

pip install -U sentence-transformers

Then you can load this model and run inference.

from sentence_transformers import SentenceTransformer

# Download from the 🤗 Hub
model = SentenceTransformer("Ananthu357/Ananthus-BAAI-for-contracts8.0")
# Run inference
sentences = [
    'Project schedules like Bar chart, CPM, PERT',
    '\xa0All temporary works necessary for the proper execution of the works shall be provided and maintained by the Contractor',
    'Nothing stated herein shall preclude the Contractor in achieving earlier completion of item or whole of the works than indicated in the programme.',
]
embeddings = model.encode(sentences)
print(embeddings.shape)
# (3, 1024)

# Get the similarity scores for the embeddings
similarities = model.similarity(embeddings, embeddings)
print(similarities.shape)
# torch.Size([3, 3])
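For semantic search over contract clauses, you encode a query and a set of candidate passages, then sort the candidates by similarity score. Because this model's embeddings are normalized, that score is a dot product.

The sketch below shows only the ranking step, in plain Python. The 3-dimensional vectors and clause labels are hypothetical stand-ins for real 1024-dimensional rows of model.encode(...) output.

```python
from typing import Dict, List, Tuple

Vector = List[float]


def dot(u: Vector, v: Vector) -> float:
    """Dot product; equals cosine similarity for unit-length vectors."""
    return sum(a * b for a, b in zip(u, v))


def rank(query: Vector, corpus: Dict[str, Vector], top_k: int = 2) -> List[Tuple[str, float]]:
    """Return the top_k (label, score) pairs, highest score first."""
    scored = [(label, dot(query, emb)) for label, emb in corpus.items()]
    scored.sort(key=lambda pair: pair[1], reverse=True)
    return scored[:top_k]


# Hypothetical unit-length embeddings (real ones would come from model.encode).
query = [1.0, 0.0, 0.0]
corpus = {
    "completion programme clause": [0.8, 0.6, 0.0],
    "temporary works clause": [0.0, 1.0, 0.0],
    "performance guarantee clause": [0.6, 0.0, 0.8],
}
for label, score in rank(query, corpus):
    print(f"{score:.2f}  {label}")
# 0.80  completion programme clause
# 0.60  performance guarantee clause
```

With real embeddings, sentence_transformers.util.semantic_search(query_embeddings, corpus_embeddings, top_k=...) performs the same kind of top-k ranking directly on tensors.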

Training Details

Training Hyperparameters

Non-Default Hyperparameters

  • eval_strategy: steps
  • per_device_train_batch_size: 16
  • per_device_eval_batch_size: 16
  • num_train_epochs: 15
  • warmup_ratio: 0.1
  • fp16: True
  • batch_sampler: no_duplicates
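The setting batch_sampler: no_duplicates selects the Sentence Transformers NoDuplicatesBatchSampler, which avoids placing the same text twice in one batch.

The sketch below illustrates the idea with a simple greedy pass over (sentence1, sentence2) pairs. It is a simplified, hypothetical re-implementation and not the library's code: there is no shuffling, no drop_last handling, and no distributed sharding. The function name no_duplicate_batches is illustrative only.

```python
from typing import List, Sequence, Set, Tuple

Pair = Tuple[str, str]


def no_duplicate_batches(pairs: Sequence[Pair], batch_size: int) -> List[List[int]]:
    """Group pair indices into batches in which no text appears twice.

    A pair is deferred to a later batch if either of its texts is
    already present in the batch being built.
    """
    remaining = list(range(len(pairs)))
    batches: List[List[int]] = []
    while remaining:
        batch: List[int] = []
        seen: Set[str] = set()
        for idx in remaining:
            texts = set(pairs[idx])
            if texts & seen:
                continue  # would duplicate a text already in this batch
            batch.append(idx)
            seen |= texts
            if len(batch) == batch_size:
                break
        batches.append(batch)  # the first remaining pair always fits, so this is non-empty
        remaining = [i for i in remaining if i not in batch]
    return batches


# Hypothetical toy pairs: "a" and "b" each occur in two pairs.
pairs = [("a", "b"), ("a", "c"), ("d", "e"), ("f", "b")]
print(no_duplicate_batches(pairs, batch_size=2))  # [[0, 2], [1, 3]]
```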

All Hyperparameters

  • overwrite_output_dir: False
  • do_predict: False
  • eval_strategy: steps
  • prediction_loss_only: True
  • per_device_train_batch_size: 16
  • per_device_eval_batch_size: 16
  • per_gpu_train_batch_size: None
  • per_gpu_eval_batch_size: None
  • gradient_accumulation_steps: 1
  • eval_accumulation_steps: None
  • learning_rate: 5e-05
  • weight_decay: 0.0
  • adam_beta1: 0.9
  • adam_beta2: 0.999
  • adam_epsilon: 1e-08
  • max_grad_norm: 1.0
  • num_train_epochs: 15
  • max_steps: -1
  • lr_scheduler_type: linear
  • lr_scheduler_kwargs: {}
  • warmup_ratio: 0.1
  • warmup_steps: 0
  • log_level: passive
  • log_level_replica: warning
  • log_on_each_node: True
  • logging_nan_inf_filter: True
  • save_safetensors: True
  • save_on_each_node: False
  • save_only_model: False
  • restore_callback_states_from_checkpoint: False
  • no_cuda: False
  • use_cpu: False
  • use_mps_device: False
  • seed: 42
  • data_seed: None
  • jit_mode_eval: False
  • use_ipex: False
  • bf16: False
  • fp16: True
  • fp16_opt_level: O1
  • half_precision_backend: auto
  • bf16_full_eval: False
  • fp16_full_eval: False
  • tf32: None
  • local_rank: 0
  • ddp_backend: None
  • tpu_num_cores: None
  • tpu_metrics_debug: False
  • debug: []
  • dataloader_drop_last: False
  • dataloader_num_workers: 0
  • dataloader_prefetch_factor: None
  • past_index: -1
  • disable_tqdm: False
  • remove_unused_columns: True
  • label_names: None
  • load_best_model_at_end: False
  • ignore_data_skip: False
  • fsdp: []
  • fsdp_min_num_params: 0
  • fsdp_config: {'min_num_params': 0, 'xla': False, 'xla_fsdp_v2': False, 'xla_fsdp_grad_ckpt': False}
  • fsdp_transformer_layer_cls_to_wrap: None
  • accelerator_config: {'split_batches': False, 'dispatch_batches': None, 'even_batches': True, 'use_seedable_sampler': True, 'non_blocking': False, 'gradient_accumulation_kwargs': None}
  • deepspeed: None
  • label_smoothing_factor: 0.0
  • optim: adamw_torch
  • optim_args: None
  • adafactor: False
  • group_by_length: False
  • length_column_name: length
  • ddp_find_unused_parameters: None
  • ddp_bucket_cap_mb: None
  • ddp_broadcast_buffers: False
  • dataloader_pin_memory: True
  • dataloader_persistent_workers: False
  • skip_memory_metrics: True
  • use_legacy_prediction_loop: False
  • push_to_hub: False
  • resume_from_checkpoint: None
  • hub_model_id: None
  • hub_strategy: every_save
  • hub_private_repo: False
  • hub_always_push: False
  • gradient_checkpointing: False
  • gradient_checkpointing_kwargs: None
  • include_inputs_for_metrics: False
  • eval_do_concat_batches: True
  • fp16_backend: auto
  • push_to_hub_model_id: None
  • push_to_hub_organization: None
  • mp_parameters:
  • auto_find_batch_size: False
  • full_determinism: False
  • torchdynamo: None
  • ray_scope: last
  • ddp_timeout: 1800
  • torch_compile: False
  • torch_compile_backend: None
  • torch_compile_mode: None
  • dispatch_batches: None
  • split_batches: None
  • include_tokens_per_second: False
  • include_num_input_tokens_seen: False
  • neftune_noise_alpha: None
  • optim_target_modules: None
  • batch_eval_metrics: False
  • eval_on_start: False
  • batch_sampler: no_duplicates
  • multi_dataset_batch_sampler: proportional
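These settings combine lr_scheduler_type: linear, warmup_ratio: 0.1, warmup_steps: 0 and learning_rate: 5e-05. Under them, the learning rate rises linearly from 0 to 5e-05 over the first 10% of optimizer steps (rounded up), then decays linearly back to 0.

The function below is a plain-Python sketch of that schedule, modelled on the linear-warmup formula in Hugging Face Transformers. The name linear_warmup_lr is hypothetical. The total step count in the usage lines is also hypothetical, since the card does not state it.

```python
import math


def linear_warmup_lr(step: int, total_steps: int, base_lr: float = 5e-05,
                     warmup_ratio: float = 0.1) -> float:
    """Learning rate at optimizer step `step` under linear warmup and linear decay."""
    warmup_steps = math.ceil(total_steps * warmup_ratio)
    if step < warmup_steps:
        # Warmup phase: ramp linearly from 0 up to base_lr.
        return base_lr * step / max(1, warmup_steps)
    # Decay phase: ramp linearly from base_lr down to 0 at total_steps.
    remaining = total_steps - step
    return base_lr * max(0.0, remaining / max(1, total_steps - warmup_steps))


total = 1000  # hypothetical number of optimizer steps
for step in (0, 50, 100, 550, 1000):
    print(step, linear_warmup_lr(step, total))
```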

Training Logs

Epoch    Step  Training Loss  Validation Loss
2.4615   100   0.0629         0.0440
4.9231   200   0.012          0.0504
7.3333   300   0.0052         0.0462
9.7949   400   0.0031         0.0489
12.2051  500   0.0016         0.0479
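The metadata tags this run with loss:CosineSimilarityLoss. In Sentence Transformers, this loss embeds both texts of a pair and computes their cosine similarity. By default it then takes the mean squared error against the pair's gold similarity label, so the training losses logged above are on that squared-error scale.

The sketch below reproduces the computation in plain Python for a toy batch. The embeddings and labels are hypothetical, and the helper names are illustrative only.

```python
import math
from typing import List, Sequence, Tuple

Vector = List[float]


def cosine(u: Sequence[float], v: Sequence[float]) -> float:
    """Cosine similarity between two non-zero vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)


def cosine_similarity_loss(batch: Sequence[Tuple[Vector, Vector, float]]) -> float:
    """Mean squared error between predicted cosine similarity and the gold label."""
    errors = [(cosine(u, v) - label) ** 2 for u, v, label in batch]
    return sum(errors) / len(errors)


# Hypothetical embedding pairs with gold similarity labels.
batch = [
    ([1.0, 0.0], [1.0, 0.0], 1.0),  # same direction, label 1 -> squared error 0
    ([1.0, 0.0], [0.0, 1.0], 0.0),  # orthogonal, label 0 -> squared error 0
    ([1.0, 0.0], [0.0, 1.0], 1.0),  # orthogonal but labelled similar -> squared error 1
]
print(cosine_similarity_loss(batch))  # 0.3333333333333333
```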

Framework Versions

  • Python: 3.10.12
  • Sentence Transformers: 3.0.1
  • Transformers: 4.42.4
  • PyTorch: 2.3.1+cu121
  • Accelerate: 0.32.1
  • Datasets: 2.21.0
  • Tokenizers: 0.19.1

Citation

BibTeX

Sentence Transformers

@inproceedings{reimers-2019-sentence-bert,
    title = "Sentence-BERT: Sentence Embeddings using Siamese BERT-Networks",
    author = "Reimers, Nils and Gurevych, Iryna",
    booktitle = "Proceedings of the 2019 Conference on Empirical Methods in Natural Language Processing",
    month = "11",
    year = "2019",
    publisher = "Association for Computational Linguistics",
    url = "https://arxiv.org/abs/1908.10084",
}