language:
  - en
license: apache-2.0
tags:
  - sentence-transformers
  - sentence-similarity
  - feature-extraction
  - generated_from_trainer
  - dataset_size:1440
  - loss:MatryoshkaLoss
  - loss:MultipleNegativesRankingLoss
base_model: nomic-ai/modernbert-embed-base
widget:
  - source_sentence: What section of the Code of Federal Regulations is quoted?
    sentences:
      - >-
        and other legal relations of any interested party seeking such
        declaration.”  28 U.S.C. § 2201(a).  

        This statute “is not an independent source of federal jurisdiction”;
        rather, “the availability of 

        such relief presupposes the existence of a judicially remediable
        right.”  Schilling v. Rogers, 363 

        U.S. 666, 677 (1960).  The Court independently has jurisdiction here
        under the mandamus
      - >-
        appropriate only when the nature of the work is sporadic and
        unpredictable so that a tour of duty 

        cannot be regularly scheduled in advance.”  Pl.’s Mem. at 18 (quoting 5
        C.F.R. § 340.403(a)).  

        This regulation explicitly distinguishes “intermittent” status from
        “part-time” status, as it says 

        that “[w]hen an agency is able to schedule work in advance on a regular
        basis, it has an
      - >-
        its discretion, a reviewing court looks to the trial court’s “stated
        justification for refusing to 

        modify” the order. Skolnick, 191 Ill. 2d at 226.  
         
         
        In the case at bar, the one-sentence April 25 order did not provide any
        reasons at all. The 

        losing party drafted the order without any stated reasons, although a
        lack of stated reasons may
  - source_sentence: Which office was determined to be an agency in the Soucie case?
    sentences:
      - >-
        inquiry”); Doe v. Skyline Automobiles, Inc., 375 F. Supp. 3d 401, 405-06
        (S.D.N.Y. 2019) 

        (“other factors must be taken into consideration and analyzed in
        comparison to the public’s 

        interest and the interests of the opposing parties”). 
         
         
        Illinois has taken steps to protect individuals’ private information.
        Examples include the
      - >-
        Aside from whether the Department’s “approach to artificial intelligence
        development and 

        implementation” should be considered “critical infrastructure,” the
        Department’s affidavit is 
         
         
        5

        deficient in showing that its withholdings qualify as “critical
        infrastructure security information” 

        in other ways.  For example, the affidavit fails to explain how the
        disclosure of the withheld infor-
      - >-
        whether an entity wields “substantial independent authority”: 
        investigative power and authority 

        to make final and binding decisions. 

        Consider first Soucie.  The Circuit held that the Office of Science and
        Technology 

        (“OST”) was an agency because, beyond advising the President, it had the
        “independent function
  - source_sentence: What is the appellant's burden on appeal?
    sentences:
      - >-
        Defs.’ Reply at 7–8, 8 n.1.  It cites Judicial Watch, Inc. v. Department
        of Energy, 412 F.3d 125 

        (D.C. Cir. 2005), which dealt with the records of employees that the
        Department of Energy 

        (“DOE”) had detailed to the National Energy Policy Development Group
        (“NEPDG”).  Id. at 

        132.  The Government quotes the court’s statement that “the records
        those employees created or
      - >-
        records available for inspection and copying is a violation of 5 U.S.C.
        app. 2 § 10(b) and 

        constitutes a failure to perform a duty owed to EPIC within the meaning
        of 28 U.S.C. § 1361.”  

        Id. .  Both counts seek “a writ of mandamus” compelling the Commission
        and its officers to 

        comply with FACA.  Id. , 139.  These counts make clear that EPIC seeks
        mandamus relief
      - >-
        counsel now cannot fairly contend that the trial court did not consider
        all the facts, especially 

        when [d]efendant’s counsel offers no court transcript to show
        otherwise.” On appeal, it is 

        generally the appellant’s burden to provide the reviewing court with a
        sufficient record to 

        establish the error that he complains of. Webster v. Hartman, 195 Ill.
        2d 426, 436 (2001). “[A]
  - source_sentence: What does the text refer to as a 'statutory distinction'?
    sentences:
      - >-
        inconsistency in deeming the same entity an advisory committee and an
        agency.”  Defs.’ Reply 

        at 8.  The problem, according to the Government, is that FACA generally
        requires disclosure of 

        records, yet Exemption 5 would shield a portion of these records from
        public view, which would 

        undermine FACA’s “purpose.”  Id. at 8–9.   Gates, Wolfe, and the 1988
        OLC opinion echo this
      - >-
        agencies are operating arms of government characterized by ‘substantial
        independent authority in 

        the exercise of specific functions.’”  Disclosure of Advisory Comm.
        Deliberative Materials, 12 

        Op. O.L.C. 73, 81 (1988).  This “statutory distinction,” it concludes,
        signifies that “advisory 

        committees are not agencies.”  Id.
      - |-
        the Hon. Israel A. Desierto, Judge, presiding. 
         
         
        Judgment 
        Affirmed. 
         
        Counsel on 
        Appeal 
         
        Victor P. Henderson and Colin Quinn Commito, of Henderson Parks, 
        LLC, of Chicago, for appellant. 
         
        Tamara N. Holder, Law Firm of Tamara N. Holder LLC, of Chicago, 
        for appellee. 
         
         
         
        Panel 
         
        PRESIDING JUSTICE ODEN JOHNSON delivered the judgment of 
        the court, with opinion.
  - source_sentence: >-
      What do the newly enacted laws prohibit hospitals from doing regarding
      sexual assault victims?
    sentences:
      - >-
        exclusion for committees “composed wholly of . . . permanent part-time .
        . . employees.”  5 

        U.S.C. app. 2 § 3(2). 

        32 

        A second, independent reason why the Commission does not fall within
        this exclusion is 

        that its members are not “part-time” federal employees.  Instead, they
        are “intermittent” 

        employees.  EPIC points to a regulation stating that “[a]n intermittent
        work schedule is
      - >-
        committee, board, commission, council, conference, panel, task force, or
        other similar group, or 

        any subcommittee or other subgroup thereof.”  Id. § 3(2).  Second, it
        must be “established by 

        statute or reorganization plan,” “established or utilized by the
        President,” or “established or 

        utilized by one or more agencies.”  Id.  Third, it must be “established”
        or “utilized” “in the
      - >-
        confidential advisors (735 ILCS 5/8-804(c) (West 2022)) and prohibit
        hospitals treating sexual 

        assault victims from directly billing the victims for the services,
        communicating with victims 

        about a bill, or referring overdue bills to collection agencies or
        credit reporting agencies. 410 

        ILCS 70/7.5(a)(1)-(4) (West 2022). These recently enacted laws encourage
        victims to report
pipeline_tag: sentence-similarity
library_name: sentence-transformers
metrics:
  - cosine_accuracy@1
  - cosine_accuracy@3
  - cosine_accuracy@5
  - cosine_accuracy@10
  - cosine_precision@1
  - cosine_precision@3
  - cosine_precision@5
  - cosine_precision@10
  - cosine_recall@1
  - cosine_recall@3
  - cosine_recall@5
  - cosine_recall@10
  - cosine_ndcg@10
  - cosine_mrr@10
  - cosine_map@100
model-index:
  - name: Fine-tuned with [QuicKB](https://github.com/ALucek/QuicKB)
    results:
      - task:
          type: information-retrieval
          name: Information Retrieval
        dataset:
          name: dim 768
          type: dim_768
        metrics:
          - type: cosine_accuracy@1
            value: 0.51875
            name: Cosine Accuracy@1
          - type: cosine_accuracy@3
            value: 0.69375
            name: Cosine Accuracy@3
          - type: cosine_accuracy@5
            value: 0.75
            name: Cosine Accuracy@5
          - type: cosine_accuracy@10
            value: 0.83125
            name: Cosine Accuracy@10
          - type: cosine_precision@1
            value: 0.51875
            name: Cosine Precision@1
          - type: cosine_precision@3
            value: 0.23125
            name: Cosine Precision@3
          - type: cosine_precision@5
            value: 0.14999999999999997
            name: Cosine Precision@5
          - type: cosine_precision@10
            value: 0.08312499999999999
            name: Cosine Precision@10
          - type: cosine_recall@1
            value: 0.51875
            name: Cosine Recall@1
          - type: cosine_recall@3
            value: 0.69375
            name: Cosine Recall@3
          - type: cosine_recall@5
            value: 0.75
            name: Cosine Recall@5
          - type: cosine_recall@10
            value: 0.83125
            name: Cosine Recall@10
          - type: cosine_ndcg@10
            value: 0.671534966140965
            name: Cosine Ndcg@10
          - type: cosine_mrr@10
            value: 0.6211160714285715
            name: Cosine Mrr@10
          - type: cosine_map@100
            value: 0.6261949467277568
            name: Cosine Map@100
      - task:
          type: information-retrieval
          name: Information Retrieval
        dataset:
          name: dim 512
          type: dim_512
        metrics:
          - type: cosine_accuracy@1
            value: 0.49375
            name: Cosine Accuracy@1
          - type: cosine_accuracy@3
            value: 0.7
            name: Cosine Accuracy@3
          - type: cosine_accuracy@5
            value: 0.73125
            name: Cosine Accuracy@5
          - type: cosine_accuracy@10
            value: 0.825
            name: Cosine Accuracy@10
          - type: cosine_precision@1
            value: 0.49375
            name: Cosine Precision@1
          - type: cosine_precision@3
            value: 0.2333333333333333
            name: Cosine Precision@3
          - type: cosine_precision@5
            value: 0.14625
            name: Cosine Precision@5
          - type: cosine_precision@10
            value: 0.08249999999999999
            name: Cosine Precision@10
          - type: cosine_recall@1
            value: 0.49375
            name: Cosine Recall@1
          - type: cosine_recall@3
            value: 0.7
            name: Cosine Recall@3
          - type: cosine_recall@5
            value: 0.73125
            name: Cosine Recall@5
          - type: cosine_recall@10
            value: 0.825
            name: Cosine Recall@10
          - type: cosine_ndcg@10
            value: 0.6607544642083831
            name: Cosine Ndcg@10
          - type: cosine_mrr@10
            value: 0.6085367063492064
            name: Cosine Mrr@10
          - type: cosine_map@100
            value: 0.6146313607229802
            name: Cosine Map@100
      - task:
          type: information-retrieval
          name: Information Retrieval
        dataset:
          name: dim 256
          type: dim_256
        metrics:
          - type: cosine_accuracy@1
            value: 0.4375
            name: Cosine Accuracy@1
          - type: cosine_accuracy@3
            value: 0.6875
            name: Cosine Accuracy@3
          - type: cosine_accuracy@5
            value: 0.725
            name: Cosine Accuracy@5
          - type: cosine_accuracy@10
            value: 0.79375
            name: Cosine Accuracy@10
          - type: cosine_precision@1
            value: 0.4375
            name: Cosine Precision@1
          - type: cosine_precision@3
            value: 0.22916666666666666
            name: Cosine Precision@3
          - type: cosine_precision@5
            value: 0.145
            name: Cosine Precision@5
          - type: cosine_precision@10
            value: 0.079375
            name: Cosine Precision@10
          - type: cosine_recall@1
            value: 0.4375
            name: Cosine Recall@1
          - type: cosine_recall@3
            value: 0.6875
            name: Cosine Recall@3
          - type: cosine_recall@5
            value: 0.725
            name: Cosine Recall@5
          - type: cosine_recall@10
            value: 0.79375
            name: Cosine Recall@10
          - type: cosine_ndcg@10
            value: 0.6224957341997419
            name: Cosine Ndcg@10
          - type: cosine_mrr@10
            value: 0.566939484126984
            name: Cosine Mrr@10
          - type: cosine_map@100
            value: 0.5740997074969412
            name: Cosine Map@100
      - task:
          type: information-retrieval
          name: Information Retrieval
        dataset:
          name: dim 128
          type: dim_128
        metrics:
          - type: cosine_accuracy@1
            value: 0.40625
            name: Cosine Accuracy@1
          - type: cosine_accuracy@3
            value: 0.625
            name: Cosine Accuracy@3
          - type: cosine_accuracy@5
            value: 0.69375
            name: Cosine Accuracy@5
          - type: cosine_accuracy@10
            value: 0.775
            name: Cosine Accuracy@10
          - type: cosine_precision@1
            value: 0.40625
            name: Cosine Precision@1
          - type: cosine_precision@3
            value: 0.20833333333333331
            name: Cosine Precision@3
          - type: cosine_precision@5
            value: 0.13874999999999998
            name: Cosine Precision@5
          - type: cosine_precision@10
            value: 0.07749999999999999
            name: Cosine Precision@10
          - type: cosine_recall@1
            value: 0.40625
            name: Cosine Recall@1
          - type: cosine_recall@3
            value: 0.625
            name: Cosine Recall@3
          - type: cosine_recall@5
            value: 0.69375
            name: Cosine Recall@5
          - type: cosine_recall@10
            value: 0.775
            name: Cosine Recall@10
          - type: cosine_ndcg@10
            value: 0.5931742895464828
            name: Cosine Ndcg@10
          - type: cosine_mrr@10
            value: 0.5348859126984128
            name: Cosine Mrr@10
          - type: cosine_map@100
            value: 0.5417826806767716
            name: Cosine Map@100
      - task:
          type: information-retrieval
          name: Information Retrieval
        dataset:
          name: dim 64
          type: dim_64
        metrics:
          - type: cosine_accuracy@1
            value: 0.30625
            name: Cosine Accuracy@1
          - type: cosine_accuracy@3
            value: 0.4875
            name: Cosine Accuracy@3
          - type: cosine_accuracy@5
            value: 0.6
            name: Cosine Accuracy@5
          - type: cosine_accuracy@10
            value: 0.6875
            name: Cosine Accuracy@10
          - type: cosine_precision@1
            value: 0.30625
            name: Cosine Precision@1
          - type: cosine_precision@3
            value: 0.16249999999999998
            name: Cosine Precision@3
          - type: cosine_precision@5
            value: 0.12
            name: Cosine Precision@5
          - type: cosine_precision@10
            value: 0.06875
            name: Cosine Precision@10
          - type: cosine_recall@1
            value: 0.30625
            name: Cosine Recall@1
          - type: cosine_recall@3
            value: 0.4875
            name: Cosine Recall@3
          - type: cosine_recall@5
            value: 0.6
            name: Cosine Recall@5
          - type: cosine_recall@10
            value: 0.6875
            name: Cosine Recall@10
          - type: cosine_ndcg@10
            value: 0.4854299754851493
            name: Cosine Ndcg@10
          - type: cosine_mrr@10
            value: 0.42175347222222237
            name: Cosine Mrr@10
          - type: cosine_map@100
            value: 0.4326739799760461
            name: Cosine Map@100

Fine-tuned with QuicKB

This is a sentence-transformers model fine-tuned from nomic-ai/modernbert-embed-base. It maps sentences and paragraphs to a 768-dimensional dense vector space and can be used for semantic textual similarity, semantic search, paraphrase mining, text classification, clustering, and more.

Model Details

Model Description

  • Model Type: Sentence Transformer
  • Base model: nomic-ai/modernbert-embed-base
  • Maximum Sequence Length: 1024 tokens
  • Output Dimensionality: 768 dimensions
  • Similarity Function: Cosine Similarity
  • Language: en
  • License: apache-2.0

Model Sources

Full Model Architecture

SentenceTransformer(
  (0): Transformer({'max_seq_length': 1024, 'do_lower_case': False}) with Transformer model: ModernBertModel 
  (1): Pooling({'word_embedding_dimension': 768, 'pooling_mode_cls_token': False, 'pooling_mode_mean_tokens': True, 'pooling_mode_max_tokens': False, 'pooling_mode_mean_sqrt_len_tokens': False, 'pooling_mode_weightedmean_tokens': False, 'pooling_mode_lasttoken': False, 'include_prompt': True})
  (2): Normalize()
)
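
The architecture above averages the token embeddings (pooling_mode_mean_tokens is True) and then applies L2 normalization. The following is a minimal pure-Python sketch of those two steps on toy numbers, not on real ModernBERT outputs. The helper names `mean_pool` and `l2_normalize` are hypothetical, and skipping padded positions via an attention mask is an assumption about how the pooling layer behaves.

```python
import math

def mean_pool(token_embeddings, attention_mask):
    """Average the token vectors whose mask entry is 1 (padding is skipped)."""
    dim = len(token_embeddings[0])
    total = [0.0] * dim
    count = 0
    for vector, keep in zip(token_embeddings, attention_mask):
        if keep:
            count += 1
            for i, value in enumerate(vector):
                total[i] += value
    return [value / max(count, 1) for value in total]

def l2_normalize(vector):
    """Scale a vector to unit length so a dot product equals cosine similarity."""
    norm = math.sqrt(sum(value * value for value in vector))
    return [value / norm for value in vector] if norm > 0 else list(vector)

# Toy input: three token vectors of width 2; the last one is padding.
tokens = [[1.0, 2.0], [3.0, 2.0], [9.0, 9.0]]
mask = [1, 1, 0]

pooled = mean_pool(tokens, mask)      # [2.0, 2.0]
embedding = l2_normalize(pooled)      # unit-length sentence embedding
print(pooled, embedding)
```

Because the final layer normalizes, cosine similarity between two embeddings reduces to a plain dot product.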

Usage

Direct Usage (Sentence Transformers)

First install the Sentence Transformers library:

pip install -U sentence-transformers

Then you can load this model and run inference.

from sentence_transformers import SentenceTransformer

# Download from the 🤗 Hub
model = SentenceTransformer("AdamLucek/modernbert-embed-quickb-video")
# Run inference
sentences = [
    'What do the newly enacted laws prohibit hospitals from doing regarding sexual assault victims?',
    'confidential advisors (735 ILCS 5/8-804(c) (West 2022)) and prohibit hospitals treating sexual \nassault victims from directly billing the victims for the services, communicating with victims \nabout a bill, or referring overdue bills to collection agencies or credit reporting agencies. 410 \nILCS 70/7.5(a)(1)-(4) (West 2022). These recently enacted laws encourage victims to report',
    'exclusion for committees “composed wholly of . . . permanent part-time . . . employees.”  5 \nU.S.C. app. 2 § 3(2). \n32 \nA second, independent reason why the Commission does not fall within this exclusion is \nthat its members are not “part-time” federal employees.  Instead, they are “intermittent” \nemployees.  EPIC points to a regulation stating that “[a]n intermittent work schedule is',
]
embeddings = model.encode(sentences)
print(embeddings.shape)
# (3, 768)

# Get the similarity scores for the embeddings
similarities = model.similarity(embeddings, embeddings)
print(similarities.shape)
# torch.Size([3, 3])
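
Since the model was trained with MatryoshkaLoss at 768, 512, 256, 128, and 64 dimensions, the leading dimensions of an embedding can be used on their own. The sketch below shows the usual recipe (keep the first values, then re-normalize) on toy 8-dimensional vectors so that it runs without downloading the model. `truncate_and_normalize` and `cosine` are hypothetical helpers. In Sentence Transformers itself, a smaller size can be requested with the `truncate_dim` argument of the `SentenceTransformer` constructor.

```python
import math

def truncate_and_normalize(embedding, dim):
    """Keep the first `dim` values and rescale the result to unit length."""
    head = list(embedding[:dim])
    norm = math.sqrt(sum(v * v for v in head))
    return [v / norm for v in head] if norm > 0 else head

def cosine(a, b):
    """Cosine similarity of two equal-length, non-zero vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)

# Toy vectors standing in for 768-dimensional embeddings (made-up values).
query = [0.9, 0.1, 0.3, 0.0, 0.2, 0.1, 0.0, 0.1]
passage = [0.8, 0.2, 0.4, 0.1, 0.0, 0.3, 0.1, 0.0]

for dim in (8, 4, 2):
    q = truncate_and_normalize(query, dim)
    p = truncate_and_normalize(passage, dim)
    print(dim, round(cosine(q, p), 4))
```

Smaller sizes trade retrieval quality for storage and speed; the Evaluation section reports how much quality is lost at each size on this model's evaluation split.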

Evaluation

Metrics

Information Retrieval

Metric dim_768 dim_512 dim_256 dim_128 dim_64
cosine_accuracy@1 0.5188 0.4938 0.4375 0.4062 0.3063
cosine_accuracy@3 0.6937 0.7 0.6875 0.625 0.4875
cosine_accuracy@5 0.75 0.7312 0.725 0.6937 0.6
cosine_accuracy@10 0.8313 0.825 0.7937 0.775 0.6875
cosine_precision@1 0.5188 0.4938 0.4375 0.4062 0.3063
cosine_precision@3 0.2313 0.2333 0.2292 0.2083 0.1625
cosine_precision@5 0.15 0.1462 0.145 0.1387 0.12
cosine_precision@10 0.0831 0.0825 0.0794 0.0775 0.0688
cosine_recall@1 0.5188 0.4938 0.4375 0.4062 0.3063
cosine_recall@3 0.6937 0.7 0.6875 0.625 0.4875
cosine_recall@5 0.75 0.7312 0.725 0.6937 0.6
cosine_recall@10 0.8313 0.825 0.7937 0.775 0.6875
cosine_ndcg@10 0.6715 0.6608 0.6225 0.5932 0.4854
cosine_mrr@10 0.6211 0.6085 0.5669 0.5349 0.4218
cosine_map@100 0.6262 0.6146 0.5741 0.5418 0.4327
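
Several rows of the table are tied together: accuracy@k equals recall@k in every column, and precision@k equals accuracy@k divided by k (for example, 0.6937 / 3 ≈ 0.2313 at dim_768). That pattern is what one would expect when each query has exactly one relevant passage, which is an inference from the numbers rather than something the card states. A minimal sketch under that assumption, with a hypothetical `retrieval_metrics` helper and made-up ranks:

```python
def retrieval_metrics(ranks, k):
    """Compute accuracy@k, precision@k, recall@k and MRR@k.

    `ranks` holds, per query, the 1-based rank of its single relevant
    passage, or None if that passage was not retrieved at all.
    """
    n = len(ranks)
    hits = sum(1 for r in ranks if r is not None and r <= k)
    return {
        "accuracy": hits / n,         # share of queries with a hit in the top k
        "precision": hits / (n * k),  # at most one relevant passage among k results
        "recall": hits / n,           # one relevant passage exists per query
        "mrr": sum(1.0 / r for r in ranks if r is not None and r <= k) / n,
    }

# Hypothetical ranks for four queries (not taken from the evaluation set).
example_ranks = [1, 3, None, 2]
metrics = retrieval_metrics(example_ranks, k=3)
print(metrics)
```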

Training Details

Training Dataset

Unnamed Dataset

  • Size: 1,440 training samples
  • Columns: anchor and positive
  • Approximate statistics based on the first 1000 samples:
    • anchor (string): min 7 tokens, mean 15.14 tokens, max 29 tokens
    • positive (string): min 57 tokens, mean 97.82 tokens, max 161 tokens
  • Samples:
    • anchor: What must the advisory committee make available for public inspection?
      positive: advisory committee shall be available for public inspection and copying . . . until the advisory
      committee ceases to exist.” Id. § 10(b). Unlike FOIA, this provision looks forward. It requires
      committees to take affirmative steps to make their records are public, even absent a request.
      FACA’s definition of “advisory committee” has four parts. First, it includes “any
    • anchor: What did the landlords fail to alert the court about?
      positive: court documents containing fake citations, we conclude that
      imposing monetary sanctions or dismissing this appeal would be
      disproportionate to Al-Hamim’s violation of the Appellate Rules.

      23
      Further, in their answer brief, the landlords failed to alert this court
      to the hallucinations in Al-Hamim’s opening brief and did not
      request an award of attorney fees against Al-Hamim. Under the
    • anchor: On what date was the motion served on the plaintiff’s counsel?
      positive: also alleged (1) that plaintiff violated section 2-401(e) and (2) that she lacked good cause to
      file anonymously because she signed an affidavit in her own name in another case with similar
      allegations. The April 13 motion contains a “Certificate of Service” stating that it was served
      on plaintiff’s counsel by e-mail on April 13.
  • Loss: MatryoshkaLoss with these parameters:
    {
        "loss": "MultipleNegativesRankingLoss",
        "matryoshka_dims": [
            768,
            512,
            256,
            128,
            64
        ],
        "matryoshka_weights": [
            1,
            1,
            1,
            1,
            1
        ],
        "n_dims_per_step": -1
    }
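
The parameters above describe a MultipleNegativesRankingLoss that is evaluated on five truncated copies of each embedding and summed with equal weights. The pure-Python sketch below illustrates that idea on a toy batch; it is not the library's implementation. The function names are hypothetical, and the similarity scale of 20 is assumed to match the library default.

```python
import math

def normalize(v):
    """Rescale a non-zero vector to unit length."""
    norm = math.sqrt(sum(x * x for x in v))
    return [x / norm for x in v]

def mnr_loss(anchors, positives, scale=20.0):
    """In-batch negatives: positive i is the target for anchor i, the rest are negatives."""
    a = [normalize(v) for v in anchors]
    p = [normalize(v) for v in positives]
    total = 0.0
    for i, anchor in enumerate(a):
        # Scaled cosine similarity between this anchor and every positive in the batch.
        logits = [scale * sum(x * y for x, y in zip(anchor, pos)) for pos in p]
        peak = max(logits)
        log_sum_exp = peak + math.log(sum(math.exp(l - peak) for l in logits))
        total += log_sum_exp - logits[i]  # cross-entropy with target index i
    return total / len(a)

def matryoshka_loss(anchors, positives, dims, weights):
    """Weighted sum of the base loss over truncated copies of the embeddings."""
    return sum(
        w * mnr_loss([v[:d] for v in anchors], [v[:d] for v in positives])
        for d, w in zip(dims, weights)
    )

# Toy batch of two pairs with width 4 (the card uses dims 768 to 64, all weights 1).
anchors = [[1.0, 0.0, 0.2, 0.1], [0.0, 1.0, 0.1, 0.3]]
positives = [[0.9, 0.1, 0.3, 0.0], [0.1, 0.8, 0.0, 0.4]]
loss = matryoshka_loss(anchors, positives, dims=[4, 2], weights=[1, 1])
print(loss)
```

Training every prefix against the same ranking objective is what lets the shorter embeddings remain usable for retrieval.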
    

Training Hyperparameters

Non-Default Hyperparameters

  • eval_strategy: epoch
  • per_device_train_batch_size: 32
  • gradient_accumulation_steps: 16
  • learning_rate: 2e-05
  • num_train_epochs: 4
  • lr_scheduler_type: cosine
  • warmup_ratio: 0.1
  • bf16: True
  • tf32: True
  • load_best_model_at_end: True
  • optim: adamw_torch_fused
  • batch_sampler: no_duplicates
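
Two consequences of these settings can be worked out with simple arithmetic: the number of examples per optimizer update, and the shape of the learning-rate curve. The sketch below assumes a single device (the card does not state the device count) and 8 total optimizer steps (the last step shown in the training log). `cosine_lr` is a hypothetical helper that follows the common linear-warmup, half-cosine-decay formula, not code taken from the trainer.

```python
import math

per_device_train_batch_size = 32
gradient_accumulation_steps = 16
num_devices = 1  # assumption: the card does not state the device count

# Examples contributing to each optimizer update.
effective_batch_size = (
    per_device_train_batch_size * gradient_accumulation_steps * num_devices
)

def cosine_lr(step, total_steps, base_lr=2e-05, warmup_ratio=0.1):
    """Linear warmup followed by a half-cosine decay to zero."""
    warmup_steps = math.ceil(total_steps * warmup_ratio)
    if step < warmup_steps:
        return base_lr * step / max(1, warmup_steps)
    progress = (step - warmup_steps) / max(1, total_steps - warmup_steps)
    return base_lr * 0.5 * (1.0 + math.cos(math.pi * progress))

# total_steps=8 is an assumption taken from the last step in the training log.
schedule = [cosine_lr(step, total_steps=8) for step in range(9)]
print(effective_batch_size, schedule)
```

Gradient accumulation enlarges the update, but the in-batch negatives used by MultipleNegativesRankingLoss still come from each forward pass of 32 pairs.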

All Hyperparameters

  • overwrite_output_dir: False
  • do_predict: False
  • eval_strategy: epoch
  • prediction_loss_only: True
  • per_device_train_batch_size: 32
  • per_device_eval_batch_size: 8
  • per_gpu_train_batch_size: None
  • per_gpu_eval_batch_size: None
  • gradient_accumulation_steps: 16
  • eval_accumulation_steps: None
  • torch_empty_cache_steps: None
  • learning_rate: 2e-05
  • weight_decay: 0.0
  • adam_beta1: 0.9
  • adam_beta2: 0.999
  • adam_epsilon: 1e-08
  • max_grad_norm: 1.0
  • num_train_epochs: 4
  • max_steps: -1
  • lr_scheduler_type: cosine
  • lr_scheduler_kwargs: {}
  • warmup_ratio: 0.1
  • warmup_steps: 0
  • log_level: passive
  • log_level_replica: warning
  • log_on_each_node: True
  • logging_nan_inf_filter: True
  • save_safetensors: True
  • save_on_each_node: False
  • save_only_model: False
  • restore_callback_states_from_checkpoint: False
  • no_cuda: False
  • use_cpu: False
  • use_mps_device: False
  • seed: 42
  • data_seed: None
  • jit_mode_eval: False
  • use_ipex: False
  • bf16: True
  • fp16: False
  • fp16_opt_level: O1
  • half_precision_backend: auto
  • bf16_full_eval: False
  • fp16_full_eval: False
  • tf32: True
  • local_rank: 0
  • ddp_backend: None
  • tpu_num_cores: None
  • tpu_metrics_debug: False
  • debug: []
  • dataloader_drop_last: False
  • dataloader_num_workers: 0
  • dataloader_prefetch_factor: None
  • past_index: -1
  • disable_tqdm: False
  • remove_unused_columns: True
  • label_names: None
  • load_best_model_at_end: True
  • ignore_data_skip: False
  • fsdp: []
  • fsdp_min_num_params: 0
  • fsdp_config: {'min_num_params': 0, 'xla': False, 'xla_fsdp_v2': False, 'xla_fsdp_grad_ckpt': False}
  • fsdp_transformer_layer_cls_to_wrap: None
  • accelerator_config: {'split_batches': False, 'dispatch_batches': None, 'even_batches': True, 'use_seedable_sampler': True, 'non_blocking': False, 'gradient_accumulation_kwargs': None}
  • deepspeed: None
  • label_smoothing_factor: 0.0
  • optim: adamw_torch_fused
  • optim_args: None
  • adafactor: False
  • group_by_length: False
  • length_column_name: length
  • ddp_find_unused_parameters: None
  • ddp_bucket_cap_mb: None
  • ddp_broadcast_buffers: False
  • dataloader_pin_memory: True
  • dataloader_persistent_workers: False
  • skip_memory_metrics: True
  • use_legacy_prediction_loop: False
  • push_to_hub: False
  • resume_from_checkpoint: None
  • hub_model_id: None
  • hub_strategy: every_save
  • hub_private_repo: None
  • hub_always_push: False
  • gradient_checkpointing: False
  • gradient_checkpointing_kwargs: None
  • include_inputs_for_metrics: False
  • include_for_metrics: []
  • eval_do_concat_batches: True
  • fp16_backend: auto
  • push_to_hub_model_id: None
  • push_to_hub_organization: None
  • mp_parameters:
  • auto_find_batch_size: False
  • full_determinism: False
  • torchdynamo: None
  • ray_scope: last
  • ddp_timeout: 1800
  • torch_compile: False
  • torch_compile_backend: None
  • torch_compile_mode: None
  • dispatch_batches: None
  • split_batches: None
  • include_tokens_per_second: False
  • include_num_input_tokens_seen: False
  • neftune_noise_alpha: None
  • optim_target_modules: None
  • batch_eval_metrics: False
  • eval_on_start: False
  • use_liger_kernel: False
  • eval_use_gather_object: False
  • average_tokens_across_devices: False
  • prompts: None
  • batch_sampler: no_duplicates
  • multi_dataset_batch_sampler: proportional

Training Logs

Epoch Step dim_768_cosine_ndcg@10 dim_512_cosine_ndcg@10 dim_256_cosine_ndcg@10 dim_128_cosine_ndcg@10 dim_64_cosine_ndcg@10
1.0 3 0.6493 0.6372 0.5987 0.5536 0.4520
2.0 6 0.6685 0.6514 0.6208 0.5916 0.4859
2.7111 8 0.6715 0.6608 0.6225 0.5932 0.4854
  • The saved checkpoint is the last row (epoch 2.7111, step 8); its scores match the evaluation metrics reported above.

Framework Versions

  • Python: 3.10.12
  • Sentence Transformers: 3.4.0
  • Transformers: 4.48.1
  • PyTorch: 2.5.1+cu124
  • Accelerate: 1.3.0
  • Datasets: 3.2.0
  • Tokenizers: 0.21.0

Citation

BibTeX

Sentence Transformers

@inproceedings{reimers-2019-sentence-bert,
    title = "Sentence-BERT: Sentence Embeddings using Siamese BERT-Networks",
    author = "Reimers, Nils and Gurevych, Iryna",
    booktitle = "Proceedings of the 2019 Conference on Empirical Methods in Natural Language Processing",
    month = "11",
    year = "2019",
    publisher = "Association for Computational Linguistics",
    url = "https://arxiv.org/abs/1908.10084",
}

MatryoshkaLoss

@misc{kusupati2024matryoshka,
    title={Matryoshka Representation Learning},
    author={Aditya Kusupati and Gantavya Bhatt and Aniket Rege and Matthew Wallingford and Aditya Sinha and Vivek Ramanujan and William Howard-Snyder and Kaifeng Chen and Sham Kakade and Prateek Jain and Ali Farhadi},
    year={2024},
    eprint={2205.13147},
    archivePrefix={arXiv},
    primaryClass={cs.LG}
}

MultipleNegativesRankingLoss

@misc{henderson2017efficient,
    title={Efficient Natural Language Response Suggestion for Smart Reply},
    author={Matthew Henderson and Rami Al-Rfou and Brian Strope and Yun-hsuan Sung and Laszlo Lukacs and Ruiqi Guo and Sanjiv Kumar and Balint Miklos and Ray Kurzweil},
    year={2017},
    eprint={1705.00652},
    archivePrefix={arXiv},
    primaryClass={cs.CL}
}