metadata
base_model: BAAI/bge-large-en
datasets: []
language: []
library_name: sentence-transformers
pipeline_tag: sentence-similarity
tags:
- sentence-transformers
- sentence-similarity
- feature-extraction
- generated_from_trainer
- dataset_size:616
- loss:CosineSimilarityLoss
widget:
- source_sentence: Fulfilment of contractual obligations
sentences:
- >-
Should a tenderer find discrepancies in or omissions from the drawings
or any of the Tender Forms or should he be in doubt as to their meaning
- >-
Period of Maintenance shall mean the specified period of maintenance
from the date of completion of the works, as certified by the Engineer.
- >-
A copy of certificate stating that they are not liable to be
disqualified and all their statements/documents
- source_sentence: Is time is of essence in the contract?
sentences:
- >-
In exceptional cases where accommodation is provided to the Contractor
at the Railway's discretion, recoveries shall be made at such rates
- and the works must be completed not later than the dates
- >-
The successful bidder shall submit the Performance Guarantee (PG) in any
of the following forms, amounting to 5% of the contract value
- source_sentence: Is there a way to claim consequential losses?
sentences:
- "provision has been made in Clauses 7(j), 8, 18, 22(5), 39, 43(2), 45(i)(a), 55, 55-A(5), 57, 57A, 61(1), 61(2) and 62(1) of Standard General Conditions of Contract or in any Clause (stated as excepted matter) of the Special Conditions of the Contract, shall be deemed as 'excepted matters' (matters not arbitrable) and decisions of the Railway authority"
- >-
All sums payable by way of compensation under any of these conditions
shall be considered as reasonable compensation
- Third party liability relationship is present in this contract.
- source_sentence: Valuables found during works
sentences:
- >-
The contractor will indemnify, defend, save and hold harmless the
Authority and its officers, servants, agents, Government
INstrumentalities and Government owned and/or controlled
entities/enterprises, against any and all suits, proceedings, actions,
demands and third party claims for any loss, damage, cost and expense of
whatever kind and nature, whether arising out of any breach by the
contractor of any its obligations inder this agrreement, including any
errors or deficiencies in the design documents, or tort or on any other
ground whatsoever, except to the extent that any such suits,
proceedings, actions, demands and claims have arisen due to any
negligent act or omission, or breach or default of this agreement on the
part of the authority Indemnified persons.
- >-
his position as an independent contractor specifying engineering
organization available with details of partners / staff / engineers
employed with qualifications and experience
- >-
All gold, silver, oil, other minerals of any description, all precious
stones, coins, treasures relics antiquities and other similar things
which shall be found in or upon the site shall be the property of the
Railway
- source_sentence: Project schedules like Bar chart, CPM, PERT
sentences:
- "All temporary works necessary for the proper execution of the works shall be provided and maintained by the Contractor"
- Can the excavated material be directly used in construction.
- >-
Nothing stated herein shall preclude the Contractor in achieving earlier
completion of item or whole of the works than indicated in the
programme.
SentenceTransformer based on BAAI/bge-large-en
This is a sentence-transformers model fine-tuned from BAAI/bge-large-en. It maps sentences and paragraphs to a 1024-dimensional dense vector space and can be used for semantic textual similarity, semantic search, paraphrase mining, text classification, clustering, and more.
Model Details
Model Description
- Model Type: Sentence Transformer
- Base model: BAAI/bge-large-en
- Maximum Sequence Length: 512 tokens
- Output Dimensionality: 1024 dimensions
- Similarity Function: Cosine Similarity
Model Sources
- Documentation: Sentence Transformers Documentation
- Repository: Sentence Transformers on GitHub
- Hugging Face: Sentence Transformers on Hugging Face
Full Model Architecture
SentenceTransformer(
  (0): Transformer({'max_seq_length': 512, 'do_lower_case': True}) with Transformer model: BertModel
  (1): Pooling({'word_embedding_dimension': 1024, 'pooling_mode_cls_token': True, 'pooling_mode_mean_tokens': False, 'pooling_mode_max_tokens': False, 'pooling_mode_mean_sqrt_len_tokens': False, 'pooling_mode_weightedmean_tokens': False, 'pooling_mode_lasttoken': False, 'include_prompt': True})
  (2): Normalize()
)
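The architecture above pools with the `[CLS]` token (`pooling_mode_cls_token: True`) and then applies `Normalize()`. The following is a minimal pure-Python sketch of those two steps on toy data, not the library's implementation; the helper names (`cls_pool`, `l2_normalize`, `dot`) and the 4-dimensional toy vectors are assumptions made for illustration. It also shows why, once embeddings have unit length, cosine similarity reduces to a plain dot product.

```python
import math

def cls_pool(token_embeddings):
    """Use the first token's vector (the [CLS] position) as the sentence embedding."""
    return list(token_embeddings[0])

def l2_normalize(vec):
    """Scale a vector to unit length; a zero vector is returned unchanged."""
    norm = math.sqrt(sum(x * x for x in vec))
    return list(vec) if norm == 0.0 else [x / norm for x in vec]

def dot(a, b):
    """Dot product; equals cosine similarity when both inputs have unit length."""
    return sum(x * y for x, y in zip(a, b))

# Toy token embeddings (3 tokens x 4 dims) for two sentences.
# Illustrative values only; the real model produces 1024-dimensional vectors.
sent_a = [[3.0, 0.0, 4.0, 0.0], [9.0, 9.0, 9.0, 9.0], [1.0, 2.0, 3.0, 4.0]]
sent_b = [[0.0, 5.0, 0.0, 0.0], [7.0, 7.0, 7.0, 7.0], [4.0, 3.0, 2.0, 1.0]]

emb_a = l2_normalize(cls_pool(sent_a))
emb_b = l2_normalize(cls_pool(sent_b))

print(emb_a)              # unit-length version of the first token vector
print(dot(emb_a, emb_a))  # self-similarity of a unit vector
print(dot(emb_a, emb_b))  # similarity between the two toy sentences
```

Only the first token's vector influences the result here; the other token vectors are ignored by CLS pooling.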
Usage
Direct Usage (Sentence Transformers)
First install the Sentence Transformers library:
pip install -U sentence-transformers
Then you can load this model and run inference.
from sentence_transformers import SentenceTransformer
# Download from the 🤗 Hub
model = SentenceTransformer("Ananthu357/Ananthus-BAAI-for-contracts8.0")
# Run inference
sentences = [
    'Project schedules like Bar chart, CPM, PERT',
    'All temporary works necessary for the proper execution of the works shall be provided and maintained by the Contractor',
    'Nothing stated herein shall preclude the Contractor in achieving earlier completion of item or whole of the works than indicated in the programme.',
]
embeddings = model.encode(sentences)
print(embeddings.shape)
# [3, 1024]
# Get the similarity scores for the embeddings
similarities = model.similarity(embeddings, embeddings)
print(similarities.shape)
# [3, 3]
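For semantic search over contract clauses, one row of the similarity matrix can be turned into a ranked list. The sketch below is a hedged illustration: `rank_candidates` is a hypothetical helper, and the scores are made-up placeholders rather than real model output. With the model loaded, a row of scores could plausibly be obtained as shown in the comment.

```python
def rank_candidates(scores, candidates, top_k=3):
    """Pair each candidate with its score and return the top_k pairs, best first."""
    if len(scores) != len(candidates):
        raise ValueError("scores and candidates must have the same length")
    ranked = sorted(zip(candidates, scores), key=lambda pair: pair[1], reverse=True)
    return ranked[:top_k]

# Placeholder scores, NOT produced by the model. With the model loaded, a row of
# query-to-clause similarities could be obtained along these lines:
#   scores = model.similarity(model.encode([query]), model.encode(clauses))[0].tolist()
clauses = ["clause A", "clause B", "clause C"]
toy_scores = [0.2, 0.9, 0.5]

for clause, score in rank_candidates(toy_scores, clauses, top_k=2):
    print(f"{score:.2f}  {clause}")
```

Sorting a plain list keeps the example dependency-free; for large clause collections a vectorised top-k would be the more practical choice.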
Training Details
Training Hyperparameters
Non-Default Hyperparameters
- eval_strategy: steps
- per_device_train_batch_size: 16
- per_device_eval_batch_size: 16
- num_train_epochs: 15
- warmup_ratio: 0.1
- fp16: True
- batch_sampler: no_duplicates
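The `no_duplicates` batch sampler listed above is meant to keep the same text from appearing twice in one batch. The following is a simplified sketch of that idea, not the Sentence Transformers implementation; the function name `no_duplicate_batches` and the toy pairs are assumptions for illustration.

```python
def no_duplicate_batches(pairs, batch_size):
    """Greedily group (text_a, text_b) pairs so that no text repeats within a batch."""
    if batch_size < 1:
        raise ValueError("batch_size must be at least 1")
    remaining = list(pairs)
    batches = []
    while remaining:
        batch, seen, deferred = [], set(), []
        for pair in remaining:
            # Accept the pair only if the batch has room and shares no text with it.
            if len(batch) < batch_size and seen.isdisjoint(pair):
                batch.append(pair)
                seen.update(pair)
            else:
                deferred.append(pair)  # retried when building the next batch
        batches.append(batch)
        remaining = deferred
    return batches

# Hypothetical query/clause pairs; "q1" and "c1" each occur twice.
toy_pairs = [("q1", "c1"), ("q1", "c2"), ("q2", "c3"), ("q3", "c1")]
for batch in no_duplicate_batches(toy_pairs, batch_size=2):
    print(batch)
```

Every pass places at least the first remaining pair, so the loop always terminates. This sketch keeps a trailing partial batch, which a real sampler may handle differently.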
All Hyperparameters
- overwrite_output_dir: False
- do_predict: False
- eval_strategy: steps
- prediction_loss_only: True
- per_device_train_batch_size: 16
- per_device_eval_batch_size: 16
- per_gpu_train_batch_size: None
- per_gpu_eval_batch_size: None
- gradient_accumulation_steps: 1
- eval_accumulation_steps: None
- learning_rate: 5e-05
- weight_decay: 0.0
- adam_beta1: 0.9
- adam_beta2: 0.999
- adam_epsilon: 1e-08
- max_grad_norm: 1.0
- num_train_epochs: 15
- max_steps: -1
- lr_scheduler_type: linear
- lr_scheduler_kwargs: {}
- warmup_ratio: 0.1
- warmup_steps: 0
- log_level: passive
- log_level_replica: warning
- log_on_each_node: True
- logging_nan_inf_filter: True
- save_safetensors: True
- save_on_each_node: False
- save_only_model: False
- restore_callback_states_from_checkpoint: False
- no_cuda: False
- use_cpu: False
- use_mps_device: False
- seed: 42
- data_seed: None
- jit_mode_eval: False
- use_ipex: False
- bf16: False
- fp16: True
- fp16_opt_level: O1
- half_precision_backend: auto
- bf16_full_eval: False
- fp16_full_eval: False
- tf32: None
- local_rank: 0
- ddp_backend: None
- tpu_num_cores: None
- tpu_metrics_debug: False
- debug: []
- dataloader_drop_last: False
- dataloader_num_workers: 0
- dataloader_prefetch_factor: None
- past_index: -1
- disable_tqdm: False
- remove_unused_columns: True
- label_names: None
- load_best_model_at_end: False
- ignore_data_skip: False
- fsdp: []
- fsdp_min_num_params: 0
- fsdp_config: {'min_num_params': 0, 'xla': False, 'xla_fsdp_v2': False, 'xla_fsdp_grad_ckpt': False}
- fsdp_transformer_layer_cls_to_wrap: None
- accelerator_config: {'split_batches': False, 'dispatch_batches': None, 'even_batches': True, 'use_seedable_sampler': True, 'non_blocking': False, 'gradient_accumulation_kwargs': None}
- deepspeed: None
- label_smoothing_factor: 0.0
- optim: adamw_torch
- optim_args: None
- adafactor: False
- group_by_length: False
- length_column_name: length
- ddp_find_unused_parameters: None
- ddp_bucket_cap_mb: None
- ddp_broadcast_buffers: False
- dataloader_pin_memory: True
- dataloader_persistent_workers: False
- skip_memory_metrics: True
- use_legacy_prediction_loop: False
- push_to_hub: False
- resume_from_checkpoint: None
- hub_model_id: None
- hub_strategy: every_save
- hub_private_repo: False
- hub_always_push: False
- gradient_checkpointing: False
- gradient_checkpointing_kwargs: None
- include_inputs_for_metrics: False
- eval_do_concat_batches: True
- fp16_backend: auto
- push_to_hub_model_id: None
- push_to_hub_organization: None
- mp_parameters:
- auto_find_batch_size: False
- full_determinism: False
- torchdynamo: None
- ray_scope: last
- ddp_timeout: 1800
- torch_compile: False
- torch_compile_backend: None
- torch_compile_mode: None
- dispatch_batches: None
- split_batches: None
- include_tokens_per_second: False
- include_num_input_tokens_seen: False
- neftune_noise_alpha: None
- optim_target_modules: None
- batch_eval_metrics: False
- eval_on_start: False
- batch_sampler: no_duplicates
- multi_dataset_batch_sampler: proportional
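The combination of `learning_rate: 5e-05`, `lr_scheduler_type: linear`, and `warmup_ratio: 0.1` describes a learning rate that ramps up linearly and then decays linearly to zero. The sketch below illustrates that shape in plain Python. The function name `linear_schedule_lr` is hypothetical, and the total of 1000 steps is an assumption chosen for round numbers; the card does not state the actual number of optimizer steps.

```python
import math

def linear_schedule_lr(step, total_steps, base_lr=5e-05, warmup_ratio=0.1):
    """Learning rate at `step` for linear warmup followed by linear decay to zero."""
    warmup_steps = math.ceil(total_steps * warmup_ratio)
    if step < warmup_steps:
        # Warmup phase: ramp from 0 up to base_lr.
        return base_lr * step / max(1, warmup_steps)
    # Decay phase: fall linearly from base_lr to 0 at total_steps.
    remaining = max(0, total_steps - step)
    return base_lr * remaining / max(1, total_steps - warmup_steps)

TOTAL_STEPS = 1000  # assumed for illustration; not taken from the training run
for step in (0, 50, 100, 550, 1000):
    print(step, linear_schedule_lr(step, TOTAL_STEPS))
```

Under these assumptions the peak rate is reached at the end of warmup (10% of the run) and halves by the midpoint of the decay phase.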
Training Logs
Epoch | Step | Training Loss | Validation Loss |
---|---|---|---|
2.4615 | 100 | 0.0629 | 0.0440 |
4.9231 | 200 | 0.012 | 0.0504 |
7.3333 | 300 | 0.0052 | 0.0462 |
9.7949 | 400 | 0.0031 | 0.0489 |
12.2051 | 500 | 0.0016 | 0.0479 |
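The card's tags name `CosineSimilarityLoss` as the training objective. Assuming its default configuration (mean squared error between the cosine similarity of an embedding pair and a gold similarity score), the loss values logged above can be read as squared-error averages. The following pure-Python sketch shows that computation on toy vectors; it is an illustration, not the library code, and the function names and data are assumptions.

```python
import math

def cosine_similarity(u, v):
    """Cosine of the angle between u and v; defined as 0.0 if either is a zero vector."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0.0 or norm_v == 0.0:
        return 0.0
    return dot / (norm_u * norm_v)

def cosine_similarity_loss(pairs, labels):
    """Mean squared error between predicted cosine similarity and gold labels."""
    if not pairs or len(pairs) != len(labels):
        raise ValueError("pairs and labels must be non-empty and of equal length")
    errors = [(cosine_similarity(u, v) - y) ** 2 for (u, v), y in zip(pairs, labels)]
    return sum(errors) / len(errors)

# Toy embedding pairs with made-up gold similarity scores.
toy_pairs = [([1.0, 0.0], [1.0, 0.0]), ([1.0, 0.0], [0.0, 1.0])]
print(cosine_similarity_loss(toy_pairs, [1.0, 0.0]))  # predictions match the labels
print(cosine_similarity_loss(toy_pairs, [0.0, 0.0]))  # first pair is mislabelled
```

Because the error is squared, a loss of about 0.05 corresponds to a typical deviation of roughly 0.2 between predicted and gold similarity.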
Framework Versions
- Python: 3.10.12
- Sentence Transformers: 3.0.1
- Transformers: 4.42.4
- PyTorch: 2.3.1+cu121
- Accelerate: 0.32.1
- Datasets: 2.21.0
- Tokenizers: 0.19.1
Citation
BibTeX
Sentence Transformers
@inproceedings{reimers-2019-sentence-bert,
title = "Sentence-BERT: Sentence Embeddings using Siamese BERT-Networks",
author = "Reimers, Nils and Gurevych, Iryna",
booktitle = "Proceedings of the 2019 Conference on Empirical Methods in Natural Language Processing",
month = "11",
year = "2019",
publisher = "Association for Computational Linguistics",
url = "https://arxiv.org/abs/1908.10084",
}