metadata
base_model: Snowflake/snowflake-arctic-embed-m-long
library_name: sentence-transformers
pipeline_tag: sentence-similarity
tags:
- sentence-transformers
- sentence-similarity
- feature-extraction
- generated_from_trainer
- dataset_size:29547
- loss:MultipleNegativesRankingLoss
widget:
- source_sentence: >-
According to the Client Money Auditor's Report, how did the Authorised
Person manage Client Money—was it pooled in a single client Account or
segregated into individual Client Accounts as per COBS Chapter 14?
sentences:
- "The written notice in Rule 6.2.1(a)(i) must make it explicit that, if an Employee is prohibited from undertaking a Personal Account Transaction, he must not, except in the proper course of his employment:\n(a)\tprocure another Person to enter into such a Transaction; or\n(b)\tcommunicate any information or opinion to another Person if he knows, or ought to know, that the Person will as a result, enter into such a Transaction or procure some other Person to do so."
- "Client Money Auditor's Report:An Authorised Person must, in procuring the production of a Client Money Auditor's Report, ensure that an Auditor states, as at the date of which the Authorised Person's audited statement of financial position was prepared:\n(1)\tthe amount of Client Money an Authorised Person was holding and controlling in accordance with COBS Chapter 14; and\n(2)\twhether:\n(a)\tthe Authorised Person has maintained throughout the year systems and controls to enable it to comply with the relevant provisions of COBS Chapter 14;\n(b)\tthe Authorised Person's controls are such as to ensure that Client Money is identifiable and secure at all times;\n(c)\tany of the requirements in COBS Chapter 14 have not been met;\n(d)\tClient Money has been pooled in a single client Account or segregated in Client Accounts maintained for individual Clients in accordance with COBS Chapter 14;\n(e)\tif applicable, the Authorised Person as holding and controlling the appropriate amount of Client Money in accordance with COBS Chapter 14 as at the date on which the Authorised Person's audited statement of financial position was prepared;\n(f)\tthe Auditor has received all necessary information and explanations for the purposes of preparing the report to the Regulator; and\n(g)\tif applicable, there have been any material discrepancies in the reconciliation of Client Money."
- "CRS Options\n/Table Start\nNo.\tOPTIONS\tCOMMENTS\n1.\tAlternative approach to calculating account balances\tNO\n2.\tUse of other reporting period\tNO\n3.\tFiling deadlines\t30th June\n4.\tFiling Nil returns\tYES\n5.\tAllowing third party service providers to fulfil the obligations on behalf of\nthe Financial Institutions\tYES\n6.\tAllowing the due diligence procedures for New Accounts to be used for\nPre-existing Accounts\tYES\n7.\tAllowing the due diligence procedures for High Value Accounts to be used\nfor Lower Value Accounts\tYES\n8.\tResidence address test for Lower Value Accounts\tYES\n9.\tExclusion from Due Diligence for Pre-existing Entity Accounts not exceeding $250,000\tYES\n10.\tAlternative documentation procedure for certain employer-sponsored\ngroup insurance contracts or annuity contracts\tYES\n11.\tAllowing Financial Institutions to make greater use of existing\nstandardised industry coding systems for the due diligence process\tYES\n12.\tCurrency translation\tUSE USD$\n\n13.\tAllow a Financial Institution to treat certain New Accounts held by pre-existing customers as a Pre-existing Account for due diligence purposes\tYES\n14.\tExpanded definition of Related Entity for Investment Entities\tYES\n15.\tGrandfathering rule for bearer shares issued by Exempt Collective\nInvestment Vehicle\tRemoved\n16.\tPhasing in the requirements to report gross proceeds\tNO\n/Table End\n\n"
- source_sentence: >-
What reporting and disclosure requirements are FinTech Participants
expected to comply with when operating within the ADGM RegLab?
sentences:
- >+
INTRODUCTION
For more details on the requirements, and process, for making ensuring
compliance with the Continuous Disclosure framework, please contact the
Listing Authority at [email protected].
- >-
An Authorised Person or Recognised Body must perform an internal Shari'a
review to assess the extent to which the Authorised Person or Recognised
Body complies with fatwa, rulings and guidelines issued by its Shari'a
Supervisory Board.
- >-
Similarly, in using a new or developing technology, such as those
associated with the Regulated Activity of Developing Financial
Technology Services within the RegLab or when undertaking NFTF business,
a Relevant Person should pay specific attention to assessing the
potential for risks associated with Financial Crime that might arise as
a result of implementing that innovative technology. For example, while
the use of eKYC Systems may reduce the risk of impersonation fraud at
customer onboarding, NFTF interaction with the customer may increase the
risk of Financial Crime after a business relationship has been
established, through transaction fraud, money laundering or theft of
digitally stored CDD documentation.
- source_sentence: >-
How does the ADGM expect an Authorised Person to document and demonstrate
adherence to the lines of authority and responsibility established by the
Governing Body for managing Liquidity Risk in compliance with Rule
9.2.2(2)(b)(b)?
sentences:
- >-
An Authorised Person or a Recognised Body must ensure that its internal
audit function undertakes regular reviews and assessments of the
effectiveness of the Authorised Person or Recognised Body's money
laundering policies, procedures, systems and controls, and its
compliance with its obligations in the AML Rulebook.
- "If a Fund intends to change its annual or interim accounting period, the Fund Manager must:\n(a)\tobtain written confirmation from its auditor that the change of its annual accounting period would not result in any significant distortion of the financial position of the Fund; and\n(b)\tobtain the Regulator's prior consent before implementing the change."
- "Guidance on risks to be covered as part of the IRAP. An Authorised Person should consider the following risks, where relevant, in its IRAP:\na.\tCredit Risk, including Large Exposures and concentration risks;\nb.\tMarket Risk;\nc.\tLiquidity Risk;\nd.\tfor Islamic Financial Business involving PSIAs, displaced commercial risk;\ne.\tinterest rate risk in the Non Trading Book;\nf.\tOperational Risk;\ng.\tinternal controls and systems; and\nh.\treputational risk."
- source_sentence: >-
If a Recognised Body receives a notification from the Regulator regarding
an application, which of the following actions would allow the Recognised
Body to avoid the application of section 268 of the Insolvency Regulations
to Market Contracts of a Member or designated non-Member?
sentences:
- "The procedure is that the Regulator must notify the Recognised Body of the application and unless the Recognised Body:\n(a)\ttakes action under its Default Rules;\n(b)\tnotifies the Regulator that it proposes to take action forthwith; or\n(c)\tis directed to take action by the Regulator,\nwithin three Business Days after receipt of that notice section 268 of the Insolvency Regulations will not apply in relation to Market Contracts to which the Member or designated non-Member is a party or to anything done by the Recognised Body for the purpose of, or in connection with, the settlement of Market Contracts."
- >-
The Regulator shall have the power to designate a Regulated Activity or
specified category of Regulated Activity as not being in compliance with
Shari'a in the event that the Regulator believes that such Regulated
Activity or specified category of Regulated Activity involves matters
that are contrary to the aims of Shari'a.
- "An Authorised Person and Recognised Body must:\n(a)\twhen it sends or receives a wire transfer on behalf of a customer, ensure that the wire transfer and any related messages contain accurate originator and beneficiary information;\n(b)\tensure that, while the wire transfer is under its control, the information in (a) remains with the wire transfer and any related message throughout the payment chain;\n(c)\tmonitor wire transfers for the purpose of detecting those wire transfers that do not contain both originator and beneficiary information and take appropriate measures to identify any money laundering risks; and\n(d)\tnot effect wire transfers without the information required under (3) and (4)."
- source_sentence: >-
How should a Relevant Person ensure and demonstrate compliance with both
UNSC Sanctions and U.A.E.-administered Sanctions, specifically Targeted
Financial Sanctions, within the ADGM jurisdiction?
sentences:
- >
REGULATORY REQUIREMENTS - SPOT COMMODITY ACTIVITIES
RIEs operating an MTF or OTF using Accepted Spot Commodities
Authorised Persons that are operating an MTF or OTF wishing to also
operate a RIE will be required to relinquish their FSP upon obtaining a
Recognition Order (to operate the RIE). If licensed by the FSRA to
carry out both Regulated Activities (e.g., operating an MTF and
operating an RIE), the Recognition Order will include a stipulation to
that effect pursuant to MIR Rule 3.4.1.
- "Where a Relevant Person seeks to rely on a Person in (1) it may only do so if and to the extent that:\n(a)\tit immediately obtains the necessary CDD information from the third party in (1);\n(b)\tit takes adequate steps to satisfy itself that certified copies of the documents used to undertake the relevant elements of CDD will be available from the third party on request without delay;\n(c)\tthe Person in (1)(b) to (d) is subject to regulation, including AML/TFS compliance requirements, by a Non-ADGM Financial Services Regulator or other competent authority in a country with AML/TFS regulations which are equivalent to the standards set out in the FATF Recommendations and it is supervised for compliance with such regulations;\n(d)\tthe Person in (1) has not relied on any exception from the requirement to conduct any relevant elements of CDD which the Relevant Person seeks to rely on; and\n(e)\tin relation to (2), the information is up to date."
- "Financial Services Permissions. VC Managers operating in ADGM require a Financial Services Permission (“FSP”) to undertake any Regulated Activity pertaining to VC Funds and/or co-investments by third parties in VC Funds. The Regulated Activities covered by the FSP will be dependent on the VC Managers’ investment strategy and business model.\n(a)\tManaging a Collective Investment Fund: this includes carrying out fund management activities in respect of a VC Fund.\n(b)\tAdvising on Investments or Credit : for VC Managers these activities will be restricted to activities related to co-investment alongside a VC Fund which the VC Manager manages, such as recommending that a client invest in an investee company alongside the VC Fund and on the strategy and structure required to make the investment.\n(c)\tArranging Deals in Investments: VC Managers may also wish to make arrangements to facilitate co-investments in the investee company.\nAuthorisation fees and supervision fees for a VC Manager are capped at USD 10,000 regardless of whether one or both of the additional Regulated Activities in b) and c) above in relation to co-investments are included in its FSP. The FSP will include restrictions appropriate to the business model of a VC Manager."
SentenceTransformer based on Snowflake/snowflake-arctic-embed-m-long
This is a sentence-transformers model finetuned from Snowflake/snowflake-arctic-embed-m-long on the csv dataset. It maps sentences & paragraphs to a 768-dimensional dense vector space and can be used for semantic textual similarity, semantic search, paraphrase mining, text classification, clustering, and more.
Model Details
Model Description
- Model Type: Sentence Transformer
- Base model: Snowflake/snowflake-arctic-embed-m-long
- Maximum Sequence Length: 8192 tokens
- Output Dimensionality: 768 dimensions
- Similarity Function: Cosine Similarity
- Training Dataset:
- csv
Model Sources
- Documentation: Sentence Transformers Documentation
- Repository: Sentence Transformers on GitHub
- Hugging Face: Sentence Transformers on Hugging Face
Full Model Architecture
SentenceTransformer(
(0): Transformer({'max_seq_length': 8192, 'do_lower_case': False}) with Transformer model: NomicBertModel
(1): Pooling({'word_embedding_dimension': 768, 'pooling_mode_cls_token': True, 'pooling_mode_mean_tokens': False, 'pooling_mode_max_tokens': False, 'pooling_mode_mean_sqrt_len_tokens': False, 'pooling_mode_weightedmean_tokens': False, 'pooling_mode_lasttoken': False, 'include_prompt': True})
(2): Normalize()
)
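The architecture above takes the first-token (CLS) vector as the sentence embedding and then L2-normalizes it. A minimal sketch of those two steps in plain Python follows, using made-up 3-dimensional token vectors instead of real model outputs; the helper names `cls_pool`, `l2_normalize`, and `dot` are illustrative and not part of the library. After normalization, cosine similarity reduces to a dot product.

```python
import math

def cls_pool(token_embeddings):
    """Return the first ([CLS]) token vector of a sequence of token vectors."""
    return list(token_embeddings[0])

def l2_normalize(vec):
    """Scale a vector to unit length (zero vectors are returned unchanged)."""
    norm = math.sqrt(sum(x * x for x in vec))
    return [x / norm for x in vec] if norm > 0 else list(vec)

def dot(a, b):
    """Dot product of two equal-length vectors."""
    return sum(x * y for x, y in zip(a, b))

# Toy token embeddings for two short inputs (made-up numbers, 3 dims instead of 768).
sent_a = [[3.0, 4.0, 0.0], [9.0, 9.0, 9.0]]
sent_b = [[4.0, 3.0, 0.0], [-1.0, 2.0, 5.0]]

emb_a = l2_normalize(cls_pool(sent_a))
emb_b = l2_normalize(cls_pool(sent_b))

# For unit-length vectors the dot product is the cosine similarity.
cosine = dot(emb_a, emb_b)
print(cosine)
```

Only the first token vector of each input contributes to the embedding; the remaining token vectors are ignored by CLS pooling.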
Usage
Direct Usage (Sentence Transformers)
First install the Sentence Transformers library:
pip install -U sentence-transformers
Then you can load this model and run inference.
from sentence_transformers import SentenceTransformer
# Download from the 🤗 Hub
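# The base model uses custom NomicBert modeling code, so loading may require trust_remote_code=True
model = SentenceTransformer("jebish7/snowflake-arctic-embed-m-long_MNR_1", trust_remote_code=True)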
# Run inference
sentences = [
'How should a Relevant Person ensure and demonstrate compliance with both UNSC Sanctions and U.A.E.-administered Sanctions, specifically Targeted Financial Sanctions, within the ADGM jurisdiction?',
'Where a Relevant Person seeks to rely on a Person in (1) it may only do so if and to the extent that:\n(a)\tit immediately obtains the necessary CDD information from the third party in (1);\n(b)\tit takes adequate steps to satisfy itself that certified copies of the documents used to undertake the relevant elements of CDD will be available from the third party on request without delay;\n(c)\tthe Person in (1)(b) to (d) is subject to regulation, including AML/TFS compliance requirements, by a Non-ADGM Financial Services Regulator or other competent authority in a country with AML/TFS regulations which are equivalent to the standards set out in the FATF Recommendations and it is supervised for compliance with such regulations;\n(d)\tthe Person in (1) has not relied on any exception from the requirement to conduct any relevant elements of CDD which the Relevant Person seeks to rely on; and\n(e)\tin relation to (2), the information is up to date.',
'REGULATORY REQUIREMENTS - SPOT COMMODITY ACTIVITIES\nRIEs operating an MTF or OTF using Accepted Spot Commodities\nAuthorised Persons that are operating an MTF or OTF wishing to also operate a RIE will be required to relinquish their FSP upon obtaining a Recognition Order (to operate the RIE). If licensed by the FSRA to carry out both Regulated Activities (e.g., operating an MTF and operating an RIE), the Recognition Order will include a stipulation to that effect pursuant to MIR Rule 3.4.1.\n',
]
embeddings = model.encode(sentences)
print(embeddings.shape)
# [3, 768]
# Get the similarity scores for the embeddings
similarities = model.similarity(embeddings, embeddings)
print(similarities.shape)
# [3, 3]
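For retrieval, one row of the similarity matrix (the query against each candidate passage) can be sorted to get a ranking. The sketch below uses made-up scores rather than real model output; `rank_passages` is a hypothetical helper, not a library function. With the real model, the scores would presumably come from something like `similarities[0, 1:].tolist()`.

```python
def rank_passages(query_scores, passages, k=2):
    """Return the k passages with the highest similarity to the query."""
    order = sorted(range(len(passages)), key=lambda i: query_scores[i], reverse=True)
    return [(passages[i], query_scores[i]) for i in order[:k]]

# Made-up cosine similarities between one query and three candidate passages.
scores = [0.21, 0.67, 0.35]
passages = ["passage A", "passage B", "passage C"]

top = rank_passages(scores, passages, k=2)
for text, score in top:
    print(f"{score:.2f}  {text}")
```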
Training Details
Training Dataset
csv
- Dataset: csv
- Size: 29,547 training samples
- Columns: Question and positive
- Approximate statistics based on the first 1000 samples:
| | Question | positive |
|---|---|---|
| type | string | string |
| details | min: 18 tokens, mean: 34.91 tokens, max: 83 tokens | min: 13 tokens, mean: 118.51 tokens, max: 1090 tokens |
- Samples:
Question | positive |
---|---|
Under which circumstances is a Mining Reporting Entity exempt from immediate disclosure of material information about its mining activities according to the FSRA guidelines? | INTERACTION OF CHAPTER 11 WITH OTHER RULE DISCLOSURE OBLIGATIONS. Prior to a Mining Reporting Entity having all the information available to it, the FSRA considers that whatever material information it may have about the mining activity will generally be insufficiently definite to warrant disclosure under the Rules. Therefore, provided the material information is and remains confidential, and the FSRA has not formed the view that the information ceases to remain confidential (e.g., where there are exceptions from disclosing the information), the material information is not immediately required to be disclosed under Rule 7.2.1. For more information, please refer to Chapter 7 of the Rules, and any relevant Guidance that the FSRA may publish from time in relation to the FSRA’s expectations as to how Reporting Entities are to comply with Chapter 7. |
What specific IAASB standards or other standards acceptable to the Regulator are required for the audit of a Public Listed Company's financial statements? | Where an Authorised Person does not hold or control any Client Money as at the date on which the Authorised Person's audited statement of financial position was prepared, the Regulator expects that a nil balance be stated to comply with Rule 6.6.6. |
How does the ADGM monitor compliance with the principles of effective dialogue with shareholders, and what are the consequences for companies that fail to establish such a dialogue? | Audit committee. The Board as a whole has responsibility for ensuring that a satisfactory dialogue with Shareholders takes place. Such dialogue should be based on the mutual understanding of objectives and provision of adequate information relating to the Reporting Entity including financial information, and how the business and affairs of the Reporting Entity are carried out. |
- Loss: MultipleNegativesRankingLoss with these parameters: { "scale": 20.0, "similarity_fct": "cos_sim" }
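To illustrate what these parameters mean: MultipleNegativesRankingLoss treats every other positive in a batch as a negative for a given question, scales the cosine similarities (here by 20.0), and applies cross-entropy with the matching pair as the target. The following is a plain-Python sketch of that idea on toy 2-dimensional vectors, not the library's implementation; `mnr_loss` and the example vectors are illustrative.

```python
import math

def cos_sim(a, b):
    """Cosine similarity between two vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(x * x for x in b))
    return dot / (norm_a * norm_b)

def mnr_loss(anchors, positives, scale=20.0):
    """Mean cross-entropy where positives[i] is the target for anchors[i]
    and all other positives in the batch act as in-batch negatives."""
    losses = []
    for i, anchor in enumerate(anchors):
        logits = [scale * cos_sim(anchor, p) for p in positives]
        # Numerically stable log-sum-exp over the batch.
        m = max(logits)
        log_sum_exp = m + math.log(sum(math.exp(l - m) for l in logits))
        losses.append(log_sum_exp - logits[i])
    return sum(losses) / len(losses)

# Toy batch: two "questions" and two candidate "positives" (made-up vectors).
anchors = [[1.0, 0.0], [0.0, 1.0]]
aligned = [[1.0, 0.0], [0.0, 1.0]]   # each positive matches its question
swapped = [[0.0, 1.0], [1.0, 0.0]]   # each positive matches the other question

loss_aligned = mnr_loss(anchors, aligned)
loss_swapped = mnr_loss(anchors, swapped)
print(loss_aligned, loss_swapped)
```

The loss is near zero when each question is closest to its own positive and grows toward the scale value when another passage in the batch is the closer match.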
Training Hyperparameters
Non-Default Hyperparameters
- per_device_train_batch_size: 4
- learning_rate: 2e-05
- num_train_epochs: 1
- warmup_ratio: 0.1
- batch_sampler: no_duplicates
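The learning rate and warmup ratio above combine with the linear scheduler listed under All Hyperparameters. As a rough sketch of the resulting schedule, assuming the usual linear-warmup-then-linear-decay shape, the function below computes the learning rate at a given step. `linear_warmup_lr` is an illustrative helper and the total of 1000 steps is a hypothetical value, not the actual step count of this run.

```python
import math

def linear_warmup_lr(step, total_steps, base_lr=2e-05, warmup_ratio=0.1):
    """Learning rate at `step`: linear ramp from 0 to base_lr during warmup,
    then linear decay from base_lr back to 0 at total_steps."""
    warmup_steps = math.ceil(total_steps * warmup_ratio)
    if step < warmup_steps:
        return base_lr * step / max(1, warmup_steps)
    remaining = total_steps - step
    return base_lr * max(0.0, remaining / max(1, total_steps - warmup_steps))

total = 1000  # hypothetical number of optimizer steps
lr_start = linear_warmup_lr(0, total)     # beginning of warmup
lr_peak = linear_warmup_lr(100, total)    # end of warmup (10% of steps)
lr_mid = linear_warmup_lr(550, total)     # halfway through the decay
lr_end = linear_warmup_lr(1000, total)    # final step
print(lr_start, lr_peak, lr_mid, lr_end)
```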
All Hyperparameters
- overwrite_output_dir: False
- do_predict: False
- eval_strategy: no
- prediction_loss_only: True
- per_device_train_batch_size: 4
- per_device_eval_batch_size: 8
- per_gpu_train_batch_size: None
- per_gpu_eval_batch_size: None
- gradient_accumulation_steps: 1
- eval_accumulation_steps: None
- torch_empty_cache_steps: None
- learning_rate: 2e-05
- weight_decay: 0.0
- adam_beta1: 0.9
- adam_beta2: 0.999
- adam_epsilon: 1e-08
- max_grad_norm: 1.0
- num_train_epochs: 1
- max_steps: -1
- lr_scheduler_type: linear
- lr_scheduler_kwargs: {}
- warmup_ratio: 0.1
- warmup_steps: 0
- log_level: passive
- log_level_replica: warning
- log_on_each_node: True
- logging_nan_inf_filter: True
- save_safetensors: True
- save_on_each_node: False
- save_only_model: False
- restore_callback_states_from_checkpoint: False
- no_cuda: False
- use_cpu: False
- use_mps_device: False
- seed: 42
- data_seed: None
- jit_mode_eval: False
- use_ipex: False
- bf16: False
- fp16: False
- fp16_opt_level: O1
- half_precision_backend: auto
- bf16_full_eval: False
- fp16_full_eval: False
- tf32: None
- local_rank: 0
- ddp_backend: None
- tpu_num_cores: None
- tpu_metrics_debug: False
- debug: []
- dataloader_drop_last: False
- dataloader_num_workers: 0
- dataloader_prefetch_factor: None
- past_index: -1
- disable_tqdm: False
- remove_unused_columns: True
- label_names: None
- load_best_model_at_end: False
- ignore_data_skip: False
- fsdp: []
- fsdp_min_num_params: 0
- fsdp_config: {'min_num_params': 0, 'xla': False, 'xla_fsdp_v2': False, 'xla_fsdp_grad_ckpt': False}
- fsdp_transformer_layer_cls_to_wrap: None
- accelerator_config: {'split_batches': False, 'dispatch_batches': None, 'even_batches': True, 'use_seedable_sampler': True, 'non_blocking': False, 'gradient_accumulation_kwargs': None}
- deepspeed: None
- label_smoothing_factor: 0.0
- optim: adamw_torch
- optim_args: None
- adafactor: False
- group_by_length: False
- length_column_name: length
- ddp_find_unused_parameters: None
- ddp_bucket_cap_mb: None
- ddp_broadcast_buffers: False
- dataloader_pin_memory: True
- dataloader_persistent_workers: False
- skip_memory_metrics: True
- use_legacy_prediction_loop: False
- push_to_hub: False
- resume_from_checkpoint: None
- hub_model_id: None
- hub_strategy: every_save
- hub_private_repo: False
- hub_always_push: False
- gradient_checkpointing: False
- gradient_checkpointing_kwargs: None
- include_inputs_for_metrics: False
- eval_do_concat_batches: True
- fp16_backend: auto
- push_to_hub_model_id: None
- push_to_hub_organization: None
- mp_parameters:
- auto_find_batch_size: False
- full_determinism: False
- torchdynamo: None
- ray_scope: last
- ddp_timeout: 1800
- torch_compile: False
- torch_compile_backend: None
- torch_compile_mode: None
- dispatch_batches: None
- split_batches: None
- include_tokens_per_second: False
- include_num_input_tokens_seen: False
- neftune_noise_alpha: None
- optim_target_modules: None
- batch_eval_metrics: False
- eval_on_start: False
- use_liger_kernel: False
- eval_use_gather_object: False
- batch_sampler: no_duplicates
- multi_dataset_batch_sampler: proportional
Training Logs
Epoch | Step | Training Loss |
---|---|---|
0.0271 | 100 | 0.6411 |
0.0541 | 200 | 0.3289 |
0.0812 | 300 | 0.2395 |
0.1083 | 400 | 0.2711 |
0.1354 | 500 | 0.2746 |
0.1624 | 600 | 0.2602 |
0.1895 | 700 | 0.285 |
0.2166 | 800 | 0.2965 |
0.2436 | 900 | 0.2772 |
0.2707 | 1000 | 0.3043 |
0.2978 | 1100 | 0.3059 |
0.3249 | 1200 | 0.316 |
0.3519 | 1300 | 0.2765 |
0.3790 | 1400 | 0.249 |
0.4061 | 1500 | 0.2601 |
0.4331 | 1600 | 0.2538 |
0.4602 | 1700 | 0.2443 |
0.4873 | 1800 | 0.2151 |
0.5143 | 1900 | 0.2335 |
0.5414 | 2000 | 0.2611 |
0.5685 | 2100 | 0.2557 |
0.5956 | 2200 | 0.2793 |
0.0694 | 100 | 0.2141 |
0.1389 | 200 | 0.273 |
0.2083 | 300 | 0.295 |
0.2778 | 400 | 0.2079 |
0.3472 | 500 | 0.2556 |
0.4167 | 600 | 0.252 |
0.4861 | 700 | 0.2142 |
0.5556 | 800 | 0.2181 |
0.625 | 900 | 0.2347 |
0.6944 | 1000 | 0.1754 |
0.7639 | 1100 | 0.2313 |
0.8333 | 1200 | 0.2104 |
0.9028 | 1300 | 0.2435 |
0.9722 | 1400 | 0.2399 |
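The Epoch column is the step count divided by the number of steps per epoch. The sketch below reproduces the first block of rows (steps 100 to 2200) from the 29,547 training samples under the assumption of an effective batch size of 8. That value is inferred from the logged ratios, not stated in this card, which only lists a per-device batch size of 4. The later rows, where the step counter restarts at 100, follow a different ratio and are not covered here.

```python
import math

def epoch_at_step(step, num_samples, effective_batch_size):
    """Fraction of an epoch completed after `step` optimizer steps."""
    steps_per_epoch = math.ceil(num_samples / effective_batch_size)
    return step / steps_per_epoch

NUM_SAMPLES = 29547            # training set size from this card
ASSUMED_EFFECTIVE_BATCH = 8    # assumption: inferred from the log, not documented

epochs = [
    round(epoch_at_step(s, NUM_SAMPLES, ASSUMED_EFFECTIVE_BATCH), 4)
    for s in (100, 200, 2200)
]
print(epochs)
```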
Framework Versions
- Python: 3.10.14
- Sentence Transformers: 3.1.1
- Transformers: 4.45.2
- PyTorch: 2.4.0
- Accelerate: 0.34.2
- Datasets: 3.0.1
- Tokenizers: 0.20.0
Citation
BibTeX
Sentence Transformers
@inproceedings{reimers-2019-sentence-bert,
title = "Sentence-BERT: Sentence Embeddings using Siamese BERT-Networks",
author = "Reimers, Nils and Gurevych, Iryna",
booktitle = "Proceedings of the 2019 Conference on Empirical Methods in Natural Language Processing",
month = "11",
year = "2019",
publisher = "Association for Computational Linguistics",
url = "https://arxiv.org/abs/1908.10084",
}
MultipleNegativesRankingLoss
@misc{henderson2017efficient,
title={Efficient Natural Language Response Suggestion for Smart Reply},
author={Matthew Henderson and Rami Al-Rfou and Brian Strope and Yun-hsuan Sung and Laszlo Lukacs and Ruiqi Guo and Sanjiv Kumar and Balint Miklos and Ray Kurzweil},
year={2017},
eprint={1705.00652},
archivePrefix={arXiv},
primaryClass={cs.CL}
}