metadata
base_model: BAAI/bge-small-en-v1.5
library_name: sentence-transformers
pipeline_tag: sentence-similarity
tags:
- sentence-transformers
- sentence-similarity
- feature-extraction
- generated_from_trainer
- dataset_size:29545
- loss:MultipleNegativesSymmetricRankingLoss
widget:
- source_sentence: >-
In terms of audited accounts submission for an Applicant, could you
clarify the scenarios in which the Regulator might agree that a reviewed
pro forma statement of financial position is not needed, and what factors
would be considered in making that determination?
sentences:
- "DocumentID: 1 | PassageID: 4.2.1.(3) | Passage: Where the regulator in another jurisdiction does not permit the implementation of policies, procedures, systems and controls consistent with these Rules, the Relevant Person must:\n(a)\tinform the Regulator in writing immediately; and\n(b)\tapply appropriate additional measures to manage the money laundering risks posed by the relevant branch or subsidiary."
- "DocumentID: 11 | PassageID: 2.3.15.(4) | Passage: The Applicant must submit to the Regulator the following records, as applicable:\n(a)\tAudited accounts, for the purposes of this Rule and Rule 2.3.2(1), for the last three full financial years, noting that:\n(i)\tif the Applicant applies for admission less than ninety days after the end of its last financial year, unless the Applicant has audited accounts for its latest full financial year, the accounts may be for the three years to the end of the previous financial year, but must also include audited or reviewed accounts for its most recent semi-annual financial reporting period; and\n(ii)\tif the Applicant applies for admission more than six months and seventy-five days after the end of its last financial year, audited or reviewed accounts for its most recent semi-annual financial reporting period (or longer period if available).\n(b)\tUnless the Regulator agrees it is not needed, a reviewed pro forma statement of financial position. The review must be conducted by an accredited professional auditor of the company or an independent accountant."
- >
DocumentID: 36 | PassageID: D.1.3. | Passage: Principle 1 – Oversight
and responsibility of climate-related financial risk exposures.Certain
functions related to the management of climate-related financial risks
may be delegated, but, as with other risks, the board is ultimately
responsible and accountable for monitoring, managing and overseeing
climate-related risks for the financial firm.
- source_sentence: >-
A financial institution is interested in multiple designations, including
the ADGM Green Fund and ADGM Green Bond. For each application, what fee
will the institution incur?
sentences:
- >
DocumentID: 31 | PassageID: 63) | Passage: INITIAL DISCLOSURE OF
MATERIAL ESTIMATES.
Disclosure of material estimates of Contingent Resources
Section 2.3 of the PRMS Guidelines states that Contingent Resources may
be assigned for Petroleum Projects that are dependent on ‘technology
under development’, and further recommended that a number of guidelines
are followed in order to distinguish these estimates from those that
should be classified as Unrecoverable Petroleum. By way of Rule
12.10.1(3), the FSRA fully supports and requires compliance with what is
set out in the PRMS Guidelines.
- >
DocumentID: 19 | PassageID: 40) | Passage: REGULATORY REQUIREMENTS FOR
AUTHORISED PERSONS ENGAGED IN REGULATED ACTIVITIES IN RELATION TO
VIRTUAL ASSETS
Anti-Money Laundering and Countering Financing of Terrorism
On 21 June 2019, FATF released a revised Guidance for a Risk-Based
Approach (RBA) for VAs and VASPs, as well as an Interpretative Note for
Recommendation 15. This built upon previous FATF statements by
clarifying a RBA for Anti-Money Laundering and Countering the Financing
of Terrorism (“AML/CFT”) purposes. The basic principle underlying the
FATF Guidelines is that VASPs are expected to “identify, assess, and
take effective action to mitigate their ML/TF risks” with respect to
VAs.
- "DocumentID: 4 | PassageID: 10.1.1 | Passage: A Person applying to the Regulator for any of the following designations:\n(a)\tADGM Green Fund;\n(b)\tADGM Climate Transition Fund;\n(c)\tADGM Green Portfolio;\n(d)\tADGM Climate Transition Portfolio;\n(e)\tADGM Green Bond; or\n(f)\tADGM Sustainability Linked Bond\nmust pay to the Regulator an application fee of $2,000."
- source_sentence: >-
How does the ADGM expect Authorised Persons to incorporate the eligibility
of collateral types into their overall risk management framework,
particularly concerning Islamic finance principles?
sentences:
- >-
DocumentID: 17 | PassageID: Schedule 1.Part 2.Chapter 5.42.(2) |
Passage: In determining for the purposes of sub-paragraph (1)(b)
whether Deposits are accepted only on particular occasions, regard is to
be had to the frequency of those occasions and to any characteristics
distinguishing them from each other.
- "DocumentID: 9 | PassageID: 6.8.5 | Passage: \n(a)\tA Fund Manager of an Islamic REIT may obtain financing either directly or through its Special Purpose Vehicle up to 65% of the total gross asset value of the Fund provided that such financing is provided in a Shari'a-compliant manner.\n(b)\tUpon becoming aware that the borrowing limit set out in 6.8.5(a) has been exceeded, the Fund Manager shall:\n(c)\timmediately inform Unitholders and the Regulator of the details of the breach and the proposed remedial action;\n(d)\tuse its best endeavours to reduce the excess borrowings;\n(e)\tnot permit the Fund to engage in additional borrowing; and\n(f)\tinform Unitholders and the Regulator on a regular basis as to the progress of the remedial action."
- >-
DocumentID: 9 | PassageID: 5.1.1.Guidance.(ii) | Passage: The prudential
Category for Islamic Financial Institutions and other Authorised Persons
(acting through an Islamic Window) undertaking the Regulated Activity of
Managing PSIAs (which may be either a Restricted PSIA or an Unrestricted
PSIA) is determined in accordance with PRU Rule 1.3. An Authorised
Person which Manages PSIAs (whether as an Islamic Financial Institution
or through an Islamic Window) must comply with the requirements in PRU
in relation to specific prudential requirements relating to Trading Book
and Non-Trading Book activities, including Credit Risk, Market Risk,
Liquidity Risk and Group Risk.
- source_sentence: >-
Can you please detail the specific Anti-Money Laundering (AML) and
Countering Financing of Terrorism (CFT) measures and controls that our
firm must have in place when dealing with Spot Commodities as per the
FSRA's requirements?
sentences:
- >
DocumentID: 34 | PassageID: 65) | Passage: REGULATORY REQUIREMENTS -
SPOT COMMODITY ACTIVITIES
Sanctions
Pursuant to AML Rule 11.2.1(1), an Authorised Person must have
arrangements in place to ensure that only Spot Commodities that are not
subject to sanctions or associated with an entity in the supply chain
that is itself subject to a sanction, are used as part of its Regulated
Activities, or utilised as part of a delivery and/or storage facility
operated by itself (or by any third parties it uses). In demonstrating
compliance with the Rule, an Authorised Person must have powers to
resolve any breach in a timely fashion, such as taking emergency action
itself or by compelling the delivery and/or storage facility to take
appropriate action. The FSRA expects this to include the Authorised
Person having the ability to sanction a Member, market participant or
the delivery and/or storage facility for acts or omissions that
compromise compliance with applicable sanctions.
- "DocumentID: 18 | PassageID: 3.2 | Passage: Financial Services Permissions. VC Managers operating in ADGM require a Financial Services Permission (“FSP”) to undertake any Regulated Activity pertaining to VC Funds and/or co-investments by third parties in VC Funds. The Regulated Activities covered by the FSP will be dependent on the VC Managers’ investment strategy and business model.\n(a)\tManaging a Collective Investment Fund: this includes carrying out fund management activities in respect of a VC Fund.\n(b)\tAdvising on Investments or Credit : for VC Managers these activities will be restricted to activities related to co-investment alongside a VC Fund which the VC Manager manages, such as recommending that a client invest in an investee company alongside the VC Fund and on the strategy and structure required to make the investment.\n(c)\tArranging Deals in Investments: VC Managers may also wish to make arrangements to facilitate co-investments in the investee company.\nAuthorisation fees and supervision fees for a VC Manager are capped at USD 10,000 regardless of whether one or both of the additional Regulated Activities in b) and c) above in relation to co-investments are included in its FSP. The FSP will include restrictions appropriate to the business model of a VC Manager."
- >-
DocumentID: 24 | PassageID: 3.9 | Passage: Principle 2 – High Standards
for Authorisation. This discerning approach is shown by the FSRA’s power
to only permit VAs that it deems ‘acceptable’, as determined by risk
factors such as security and traceability, in order to prevent the
build-up of risk from illiquid or immature assets. Additionally, we do
not permit stablecoins based on the algorithmic model of valuation to
the underlying fiat currency.
- source_sentence: >-
What are the common scenarios or instances where assets and liabilities
are not covered by the bases of accounting in Rule 5.3.2, and how should
an Insurer address these in their reporting?
sentences:
- >-
DocumentID: 1 | PassageID: 14.4.1.Guidance.1. | Passage: Relevant
Persons are reminded that in accordance with Federal AML Legislation,
Relevant Persons or any of their Employees must not tip off any Person,
that is, inform any Person that he is being scrutinised, or investigated
by any other competent authority, for possible involvement in suspicious
Transactions or activity related to money laundering or terrorist
financing.
- "DocumentID: 12 | PassageID: 5.3.1.Guidance | Passage: \nThe exceptions provided in this Chapter relate to the following:\na.\tspecific Rules in respect of certain assets and liabilities, intended to achieve a regulatory objective not achieved by application of either or both of the bases of accounting set out in Rule 5.3.2;\nb.\tassets and liabilities that are not dealt with in either or both of the bases of accounting set out in Rule 5.3.2; and\nc.\tthe overriding power of the Regulator, set out in Rule 5.1.6, to require an Insurer to adopt a particular measurement for a specific asset or liability."
- >+
DocumentID: 1 | PassageID: 6.2.1.Guidance.2. | Passage: The risk
assessment under Rule 6.2.1(c) should identify actions to mitigate
risks associated with undertaking NFTF business generally, and the use
of eKYC specifically. This is because distinct risks are often likely to
arise where business is conducted entirely in an NFTF manner, compared
to when the business relationship includes a mix of face-to-face and
NFTF interactions. The assessment should make reference to risk
mitigation measures recommended by the Regulator, a competent authority
of the U.A.E., FATF, and other relevant bodies.
SentenceTransformer based on BAAI/bge-small-en-v1.5
This is a sentence-transformers model finetuned from BAAI/bge-small-en-v1.5 on the csv dataset. It maps sentences & paragraphs to a 384-dimensional dense vector space and can be used for semantic textual similarity, semantic search, paraphrase mining, text classification, clustering, and more.
Model Details
Model Description
- Model Type: Sentence Transformer
- Base model: BAAI/bge-small-en-v1.5
- Maximum Sequence Length: 512 tokens
- Output Dimensionality: 384 dimensions
- Similarity Function: Cosine Similarity
- Training Dataset:
- csv
Model Sources
- Documentation: Sentence Transformers Documentation
- Repository: Sentence Transformers on GitHub
- Hugging Face: Sentence Transformers on Hugging Face
Full Model Architecture
SentenceTransformer(
(0): Transformer({'max_seq_length': 512, 'do_lower_case': True}) with Transformer model: BertModel
(1): Pooling({'word_embedding_dimension': 384, 'pooling_mode_cls_token': True, 'pooling_mode_mean_tokens': False, 'pooling_mode_max_tokens': False, 'pooling_mode_mean_sqrt_len_tokens': False, 'pooling_mode_weightedmean_tokens': False, 'pooling_mode_lasttoken': False, 'include_prompt': True})
(2): Normalize()
)
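The last two modules in the architecture above take the first ([CLS]) token's vector as the sentence embedding (`pooling_mode_cls_token: True`) and then scale it to unit length. A minimal pure-Python sketch of those two steps follows. The function names `cls_pool` and `l2_normalize` and the 4-dimensional toy vectors are illustrative assumptions, not part of the library; the real model works on 384-dimensional tensors.

```python
import math


def cls_pool(token_embeddings):
    """Return a copy of the first token's vector (the [CLS] position)."""
    return list(token_embeddings[0])


def l2_normalize(vector):
    """Scale a vector to unit Euclidean length (zero vectors are returned as-is)."""
    norm = math.sqrt(sum(x * x for x in vector))
    if norm == 0.0:
        return list(vector)
    return [x / norm for x in vector]


# Toy "transformer output": 3 tokens, 4 dimensions each (hypothetical values).
token_embeddings = [
    [3.0, 0.0, 4.0, 0.0],  # [CLS] token
    [1.0, 1.0, 1.0, 1.0],
    [0.0, 2.0, 0.0, 2.0],
]

sentence_embedding = l2_normalize(cls_pool(token_embeddings))
print(sentence_embedding)
```

Only the [CLS] row influences the result here; the other token vectors are ignored by this pooling mode.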
Usage
Direct Usage (Sentence Transformers)
First install the Sentence Transformers library:
pip install -U sentence-transformers
Then you can load this model and run inference.
from sentence_transformers import SentenceTransformer
# Download from the 🤗 Hub
model = SentenceTransformer("jebish7/bge-small-en-v1.5_MNSR_15")
# Run inference
sentences = [
'What are the common scenarios or instances where assets and liabilities are not covered by the bases of accounting in Rule 5.3.2, and how should an Insurer address these in their reporting?',
'DocumentID: 12 | PassageID: 5.3.1.Guidance | Passage: \nThe exceptions provided in this Chapter relate to the following:\na.\tspecific Rules in respect of certain assets and liabilities, intended to achieve a regulatory objective not achieved by application of either or both of the bases of accounting set out in Rule \u200e5.3.2;\nb.\tassets and liabilities that are not dealt with in either or both of the bases of accounting set out in Rule \u200e5.3.2; and\nc.\tthe overriding power of the Regulator, set out in Rule \u200e5.1.6, to require an Insurer to adopt a particular measurement for a specific asset or liability.',
'DocumentID: 1 | PassageID: 14.4.1.Guidance.1. | Passage: Relevant Persons are reminded that in accordance with Federal AML Legislation, Relevant Persons or any of their Employees must not tip off any Person, that is, inform any Person that he is being scrutinised, or investigated by any other competent authority, for possible involvement in suspicious Transactions or activity related to money laundering or terrorist financing.',
]
embeddings = model.encode(sentences)
print(embeddings.shape)
# [3, 384]
# Get the similarity scores for the embeddings
similarities = model.similarity(embeddings, embeddings)
print(similarities.shape)
# [3, 3]
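Because the model ends with a Normalize() module, its embeddings have unit length, so cosine similarity reduces to a plain dot product. The sketch below ranks passages against a query under that assumption. It uses hand-written 2-dimensional unit vectors in place of real model output, and `rank_passages` is a hypothetical helper, not a Sentence Transformers function.

```python
def dot(a, b):
    """Dot product of two equal-length vectors."""
    return sum(x * y for x, y in zip(a, b))


def rank_passages(query_embedding, passage_embeddings):
    """Return (passage_index, score) pairs sorted from most to least similar.

    Assumes all vectors are already L2-normalized, so the dot product
    equals the cosine similarity.
    """
    scores = [dot(query_embedding, p) for p in passage_embeddings]
    order = sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)
    return [(i, scores[i]) for i in order]


# Toy unit vectors standing in for model.encode(...) output (hypothetical values).
query = [1.0, 0.0]
passages = [
    [0.6, 0.8],
    [1.0, 0.0],
    [0.0, 1.0],
]

ranking = rank_passages(query, passages)
print(ranking)
```

With real embeddings, the first entry of the ranking would be the passage the model considers closest to the query.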
Training Details
Training Dataset
csv
- Dataset: csv
- Size: 29,545 training samples
- Columns: anchor and positive
- Approximate statistics based on the first 1000 samples:

| | anchor | positive |
|---|---|---|
| type | string | string |
| details | min: 16 tokens; mean: 34.95 tokens; max: 68 tokens | min: 35 tokens; mean: 132.0 tokens; max: 512 tokens |
- Samples:

anchor | positive |
---|---|
If a financial institution offers Money Remittance as one of its services, under what circumstances is it deemed to be holding Relevant Money and therefore subject to regulatory compliance (a)? | DocumentID: 13 |
What are the consequences for a Recognised Body or Authorised Person if they fail to comply with ADGM's requirements regarding severance payments? | DocumentID: 7 |
If a Public Fund is structured as an Investment Trust, to whom should the Fund Manager report the review findings regarding delegated Regulated Activities or outsourced functions? | DocumentID: 6 |
- Loss: MultipleNegativesSymmetricRankingLoss with these parameters: { "scale": 20.0, "similarity_fct": "cos_sim" }
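To make the loss parameters concrete, here is a minimal pure-Python sketch of a symmetric in-batch ranking loss with `scale = 20.0` and cosine similarity. It assumes the usual formulation: every other positive in the batch acts as a negative for a given anchor, cross-entropy is computed in both directions (anchor to positive and positive to anchor), and the two values are averaged. This illustrates the idea and is not the library's implementation; all function names and the toy 2-dimensional vectors are hypothetical.

```python
import math


def cos_sim(a, b):
    """Cosine similarity between two non-zero vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


def cross_entropy_diagonal(scores):
    """Mean cross-entropy where the correct column for row i is column i."""
    total = 0.0
    for i, row in enumerate(scores):
        row_max = max(row)  # subtract the max for numerical stability
        log_sum_exp = row_max + math.log(sum(math.exp(s - row_max) for s in row))
        total += log_sum_exp - row[i]
    return total / len(scores)


def symmetric_ranking_loss(anchors, positives, scale=20.0):
    """Average of anchor->positive and positive->anchor in-batch losses."""
    scores = [[scale * cos_sim(a, p) for p in positives] for a in anchors]
    transposed = [list(column) for column in zip(*scores)]
    forward = cross_entropy_diagonal(scores)
    backward = cross_entropy_diagonal(transposed)
    return (forward + backward) / 2


# Toy batch of two pairs (hypothetical embeddings).
anchors = [[1.0, 0.0], [0.0, 1.0]]
aligned_positives = [[1.0, 0.0], [0.0, 1.0]]   # each anchor matches its own positive
swapped_positives = [[0.0, 1.0], [1.0, 0.0]]   # each anchor matches the wrong positive

loss_aligned = symmetric_ranking_loss(anchors, aligned_positives)
loss_swapped = symmetric_ranking_loss(anchors, swapped_positives)
print(loss_aligned, loss_swapped)
```

The aligned batch yields a loss close to zero, while the swapped batch is penalized heavily; the scale factor controls how sharply similarity differences are turned into that penalty.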
Training Hyperparameters
Non-Default Hyperparameters
- per_device_train_batch_size: 32
- learning_rate: 2e-05
- warmup_ratio: 0.1
- batch_sampler: no_duplicates
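As a rough illustration of how `learning_rate: 2e-05` and `warmup_ratio: 0.1` interact with the linear scheduler, the sketch below computes the learning rate at a given step. The total of 1386 steps is an assumption inferred from the training logs (step 100 corresponds to epoch 0.2165, suggesting about 462 steps per epoch over 3 epochs); the card does not state it directly. The function `linear_schedule_lr` is a hypothetical helper, not a library call.

```python
import math


def linear_schedule_lr(step, base_lr, total_steps, warmup_ratio):
    """Linear warmup from 0 to base_lr, then linear decay back to 0."""
    warmup_steps = math.ceil(total_steps * warmup_ratio)
    if step < warmup_steps:
        return base_lr * step / max(1, warmup_steps)
    remaining = max(0, total_steps - step)
    return base_lr * remaining / max(1, total_steps - warmup_steps)


BASE_LR = 2e-05
WARMUP_RATIO = 0.1
TOTAL_STEPS = 1386  # assumed: ~462 steps per epoch x 3 epochs, inferred from the logs

warmup_steps = math.ceil(TOTAL_STEPS * WARMUP_RATIO)
for step in (0, 70, warmup_steps, 700, TOTAL_STEPS):
    print(step, linear_schedule_lr(step, BASE_LR, TOTAL_STEPS, WARMUP_RATIO))
```

Under these assumptions the learning rate peaks at the end of warmup and falls linearly to zero by the final step.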
All Hyperparameters
- overwrite_output_dir: False
- do_predict: False
- eval_strategy: no
- prediction_loss_only: True
- per_device_train_batch_size: 32
- per_device_eval_batch_size: 8
- per_gpu_train_batch_size: None
- per_gpu_eval_batch_size: None
- gradient_accumulation_steps: 1
- eval_accumulation_steps: None
- torch_empty_cache_steps: None
- learning_rate: 2e-05
- weight_decay: 0.0
- adam_beta1: 0.9
- adam_beta2: 0.999
- adam_epsilon: 1e-08
- max_grad_norm: 1.0
- num_train_epochs: 3
- max_steps: -1
- lr_scheduler_type: linear
- lr_scheduler_kwargs: {}
- warmup_ratio: 0.1
- warmup_steps: 0
- log_level: passive
- log_level_replica: warning
- log_on_each_node: True
- logging_nan_inf_filter: True
- save_safetensors: True
- save_on_each_node: False
- save_only_model: False
- restore_callback_states_from_checkpoint: False
- no_cuda: False
- use_cpu: False
- use_mps_device: False
- seed: 42
- data_seed: None
- jit_mode_eval: False
- use_ipex: False
- bf16: False
- fp16: False
- fp16_opt_level: O1
- half_precision_backend: auto
- bf16_full_eval: False
- fp16_full_eval: False
- tf32: None
- local_rank: 0
- ddp_backend: None
- tpu_num_cores: None
- tpu_metrics_debug: False
- debug: []
- dataloader_drop_last: False
- dataloader_num_workers: 0
- dataloader_prefetch_factor: None
- past_index: -1
- disable_tqdm: False
- remove_unused_columns: True
- label_names: None
- load_best_model_at_end: False
- ignore_data_skip: False
- fsdp: []
- fsdp_min_num_params: 0
- fsdp_config: {'min_num_params': 0, 'xla': False, 'xla_fsdp_v2': False, 'xla_fsdp_grad_ckpt': False}
- fsdp_transformer_layer_cls_to_wrap: None
- accelerator_config: {'split_batches': False, 'dispatch_batches': None, 'even_batches': True, 'use_seedable_sampler': True, 'non_blocking': False, 'gradient_accumulation_kwargs': None}
- deepspeed: None
- label_smoothing_factor: 0.0
- optim: adamw_torch
- optim_args: None
- adafactor: False
- group_by_length: False
- length_column_name: length
- ddp_find_unused_parameters: None
- ddp_bucket_cap_mb: None
- ddp_broadcast_buffers: False
- dataloader_pin_memory: True
- dataloader_persistent_workers: False
- skip_memory_metrics: True
- use_legacy_prediction_loop: False
- push_to_hub: False
- resume_from_checkpoint: None
- hub_model_id: None
- hub_strategy: every_save
- hub_private_repo: False
- hub_always_push: False
- gradient_checkpointing: False
- gradient_checkpointing_kwargs: None
- include_inputs_for_metrics: False
- eval_do_concat_batches: True
- fp16_backend: auto
- push_to_hub_model_id: None
- push_to_hub_organization: None
- mp_parameters:
- auto_find_batch_size: False
- full_determinism: False
- torchdynamo: None
- ray_scope: last
- ddp_timeout: 1800
- torch_compile: False
- torch_compile_backend: None
- torch_compile_mode: None
- dispatch_batches: None
- split_batches: None
- include_tokens_per_second: False
- include_num_input_tokens_seen: False
- neftune_noise_alpha: None
- optim_target_modules: None
- batch_eval_metrics: False
- eval_on_start: False
- use_liger_kernel: False
- eval_use_gather_object: False
- batch_sampler: no_duplicates
- multi_dataset_batch_sampler: proportional
Training Logs
Epoch | Step | Training Loss |
---|---|---|
0.2165 | 100 | 1.4357 |
0.4329 | 200 | 0.9589 |
0.6494 | 300 | 0.9193 |
0.8658 | 400 | 0.8542 |
1.0823 | 500 | 0.8643 |
1.2987 | 600 | 0.8135 |
1.5152 | 700 | 0.7658 |
1.7316 | 800 | 0.7454 |
1.9481 | 900 | 0.7477 |
2.1645 | 1000 | 0.7586 |
2.3810 | 1100 | 0.6978 |
2.5974 | 1200 | 0.7152 |
2.8139 | 1300 | 0.6866 |
0.2165 | 100 | 0.7049 |
0.4329 | 200 | 0.6651 |
0.6494 | 300 | 0.6942 |
0.8658 | 400 | 0.6695 |
1.0823 | 500 | 0.7048 |
1.2987 | 600 | 0.636 |
1.5152 | 700 | 0.5984 |
1.7316 | 800 | 0.6001 |
1.9481 | 900 | 0.6096 |
2.1645 | 1000 | 0.6313 |
2.3810 | 1100 | 0.5437 |
2.5974 | 1200 | 0.5716 |
2.8139 | 1300 | 0.5634 |
0.2165 | 100 | 0.5708 |
0.4329 | 200 | 0.5263 |
0.6494 | 300 | 0.5716 |
0.8658 | 400 | 0.5547 |
1.0823 | 500 | 0.5922 |
1.2987 | 600 | 0.5306 |
1.5152 | 700 | 0.4802 |
1.7316 | 800 | 0.4948 |
1.9481 | 900 | 0.512 |
2.1645 | 1000 | 0.532 |
2.3810 | 1100 | 0.4349 |
2.5974 | 1200 | 0.465 |
2.8139 | 1300 | 0.4657 |
Framework Versions
- Python: 3.10.14
- Sentence Transformers: 3.1.1
- Transformers: 4.45.2
- PyTorch: 2.4.0
- Accelerate: 0.34.2
- Datasets: 3.0.1
- Tokenizers: 0.20.0
Citation
BibTeX
Sentence Transformers
@inproceedings{reimers-2019-sentence-bert,
title = "Sentence-BERT: Sentence Embeddings using Siamese BERT-Networks",
author = "Reimers, Nils and Gurevych, Iryna",
booktitle = "Proceedings of the 2019 Conference on Empirical Methods in Natural Language Processing",
month = "11",
year = "2019",
publisher = "Association for Computational Linguistics",
url = "https://arxiv.org/abs/1908.10084",
}