SentenceTransformer based on sentence-transformers/all-mpnet-base-v2

This is a sentence-transformers model finetuned from sentence-transformers/all-mpnet-base-v2. It maps sentences & paragraphs to a 768-dimensional dense vector space and can be used for semantic textual similarity, semantic search, paraphrase mining, text classification, clustering, and more.

Model Details

Model Description

  • Model Type: Sentence Transformer
  • Base model: sentence-transformers/all-mpnet-base-v2
  • Maximum Sequence Length: 384 tokens
  • Output Dimensionality: 768 dimensions
  • Similarity Function: Cosine Similarity

Full Model Architecture

SentenceTransformer(
  (0): Transformer({'max_seq_length': 384, 'do_lower_case': False}) with Transformer model: MPNetModel 
  (1): Pooling({'word_embedding_dimension': 768, 'pooling_mode_cls_token': False, 'pooling_mode_mean_tokens': True, 'pooling_mode_max_tokens': False, 'pooling_mode_mean_sqrt_len_tokens': False, 'pooling_mode_weightedmean_tokens': False, 'pooling_mode_lasttoken': False, 'include_prompt': True})
  (2): Normalize()
)
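
Modules (1) and (2) above turn per-token vectors into a single unit-length sentence vector: mean pooling averages the token vectors (pooling_mode_mean_tokens is True), and Normalize() rescales the result to length 1. The sketch below illustrates this on toy 2-dimensional vectors in plain Python. The function name and inputs are hypothetical, and the library itself works on batched tensors with 768 dimensions.

```python
import math

def mean_pool_and_normalize(token_embeddings, attention_mask):
    """Average the vectors of non-padding tokens, then scale to unit length."""
    dim = len(token_embeddings[0])
    summed = [0.0] * dim
    count = 0
    for vector, keep in zip(token_embeddings, attention_mask):
        if keep:  # skip padding positions
            count += 1
            for i, value in enumerate(vector):
                summed[i] += value
    mean = [s / max(count, 1) for s in summed]
    norm = math.sqrt(sum(v * v for v in mean)) or 1.0  # guard against a zero vector
    return [v / norm for v in mean]

# Toy input: three token positions with 2-dimensional vectors; the last one is padding.
tokens = [[1.0, 0.0], [0.0, 1.0], [9.0, 9.0]]
mask = [1, 1, 0]
sentence_vector = mean_pool_and_normalize(tokens, mask)
```

Because the output has unit length, the dot product of two such vectors equals their cosine similarity.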

Usage

Direct Usage (Sentence Transformers)

First install the Sentence Transformers library:

pip install -U sentence-transformers

Then you can load this model and run inference.

from sentence_transformers import SentenceTransformer

# Download from the 🤗 Hub
model = SentenceTransformer("brilan/procedure-tool-matching_10_epochs")
# Run inference
sentences = [
    'list running processes',
    'Displays information about services and drivers on a local or remote computer.',
    'Displays the directory structure of a path or of the disk in a drive graphically.',
]
embeddings = model.encode(sentences)
print(embeddings.shape)
# [3, 768]

# Get the similarity scores for the embeddings
similarities = model.similarity(embeddings, embeddings)
print(similarities.shape)
# [3, 3]
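
For procedure-to-tool matching, the first sentence can be treated as a query and the remaining sentences as candidate tool descriptions. The helper below is a minimal sketch of that ranking step, not part of the library: rank_descriptions and its encode parameter are hypothetical names, and encode is assumed to be any callable that maps a list of strings to a list of vectors (for example model.encode from the snippet above).

```python
import math

def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)

def rank_descriptions(encode, query, descriptions):
    """Return (description, score) pairs sorted from most to least similar."""
    rows = encode([query] + list(descriptions))
    vectors = [[float(x) for x in row] for row in rows]
    query_vec, desc_vecs = vectors[0], vectors[1:]
    scored = [(d, cosine(query_vec, v)) for d, v in zip(descriptions, desc_vecs)]
    return sorted(scored, key=lambda pair: pair[1], reverse=True)
```

With the model loaded as above, a call could look like rank_descriptions(model.encode, sentences[0], sentences[1:]).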

Evaluation

Metrics

Triplet

Metric Value
cosine_accuracy 0.8614
dot_accuracy 0.1386
manhattan_accuracy 0.856
euclidean_accuracy 0.8614
max_accuracy 0.8614

Triplet

Metric Value
cosine_accuracy 1.0
dot_accuracy 0.0
manhattan_accuracy 1.0
euclidean_accuracy 1.0
max_accuracy 1.0

Triplet

Metric Value
cosine_accuracy 1.0
dot_accuracy 0.0
manhattan_accuracy 1.0
euclidean_accuracy 1.0
max_accuracy 1.0
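
Each accuracy above is the fraction of (anchor, positive, negative) triplets in which the anchor is closer to the positive than to the negative under the named measure. The following is a minimal sketch of that computation on toy vectors. It is not the library's evaluator, and the function names and data are hypothetical.

```python
import math

def cosine_similarity(u, v):
    dot = sum(a * b for a, b in zip(u, v))
    return dot / (math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v)))

def manhattan(u, v):
    return sum(abs(a - b) for a, b in zip(u, v))

def triplet_accuracies(triplets):
    """triplets: list of (anchor, positive, negative) embedding vectors."""
    hits = {"cosine": 0, "euclidean": 0, "manhattan": 0}
    for anchor, pos, neg in triplets:
        # Similarity: higher is better. Distances: lower is better.
        hits["cosine"] += cosine_similarity(anchor, pos) > cosine_similarity(anchor, neg)
        hits["euclidean"] += math.dist(anchor, pos) < math.dist(anchor, neg)
        hits["manhattan"] += manhattan(anchor, pos) < manhattan(anchor, neg)
    return {name: count / len(triplets) for name, count in hits.items()}

toy_triplets = [
    ([1.0, 0.0], [0.9, 0.1], [0.0, 1.0]),  # positive is closer: counted as correct
    ([1.0, 0.0], [0.0, 1.0], [0.8, 0.2]),  # negative is closer: counted as wrong
]
accuracies = triplet_accuracies(toy_triplets)
```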

Training Details

Training Dataset

Unnamed Dataset

  • Size: 7,385 training samples
  • Columns: anchor, positive, and negative
  • Approximate statistics based on the first 1000 samples:
                anchor       positive      negative
    type        string       string        string
    min         5 tokens     5 tokens      5 tokens
    mean        9.6 tokens   18.53 tokens  17.9 tokens
    max         17 tokens    47 tokens     57 tokens
  • Samples:
    • anchor: added user accounts to the User and Admin groups
      positive: use to create a new local user account on a Windows system.
      negative: Adds a new subkey or entry to the registry.
    • anchor: get cached credentials
      positive: manipulate privilege on process.
      negative: Use to display a list of computers and shared resources on a network.
    • anchor: used compromised domain accounts to gain access to the target environment
      positive: allows users to execute commands remotely on target systems using various methods including WMI, SMB, SSH, RDP, and PowerShell
      negative: Copies files and directories including subdirectories.
  • Loss: MultipleNegativesRankingLoss with these parameters:
    {
        "scale": 20.0,
        "similarity_fct": "cos_sim"
    }
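
With these parameters, the loss scores every anchor against all positives and negatives in the batch using cosine similarity multiplied by scale, then applies cross-entropy with the anchor's own positive as the target. The code below is a plain-Python sketch of that idea under those assumptions. The function names and toy vectors are hypothetical, and the library's implementation operates on batched tensors.

```python
import math

def cos_sim(u, v):
    dot = sum(a * b for a, b in zip(u, v))
    return dot / (math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v)))

def mnr_loss(anchors, positives, negatives, scale=20.0):
    """Mean cross-entropy over anchors; candidates are all positives plus all negatives."""
    candidates = list(positives) + list(negatives)
    total = 0.0
    for i, anchor in enumerate(anchors):
        logits = [scale * cos_sim(anchor, c) for c in candidates]
        peak = max(logits)  # subtract the maximum for numerical stability
        log_sum_exp = peak + math.log(sum(math.exp(x - peak) for x in logits))
        total += log_sum_exp - logits[i]  # target is the i-th positive
    return total / len(anchors)

# Toy batch of two triplets with 2-dimensional embeddings.
anchors = [[1.0, 0.0], [0.0, 1.0]]
positives = [[0.9, 0.1], [0.1, 0.9]]
negatives = [[-1.0, 0.0], [0.0, -1.0]]
loss = mnr_loss(anchors, positives, negatives)
```

The loss is close to zero when each anchor is most similar to its own positive, and grows when another candidate in the batch scores higher.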
    

Evaluation Dataset

Unnamed Dataset

  • Size: 1,847 evaluation samples
  • Columns: anchor, positive, and negative
  • Approximate statistics based on the first 1000 samples:
                anchor       positive      negative
    type        string       string        string
    min         5 tokens     5 tokens      5 tokens
    mean        9.74 tokens  18.21 tokens  18.3 tokens
    max         17 tokens    47 tokens     57 tokens
  • Samples:
    • anchor: obtain information about the domain
      positive: It retrieves a list of current network connections.
      negative: Saves a copy of specified subkeys, entries, and values of the registry in a specified file.
    • anchor: obtain credentials from Vault files
      positive: retrieve stored passwords from various software and operating systems
      negative: allows users to execute commands remotely on target systems using various methods including WMI, SMB, SSH, RDP, and PowerShell
    • anchor: obtain information about the domain
      positive: Get user name and group information along with the respective security identifiers (SID) claims privileges logon identifier (logon ID) for the current user on the local system.
      negative: Creates a new shadow copy of a specified volume.
  • Loss: MultipleNegativesRankingLoss with these parameters:
    {
        "scale": 20.0,
        "similarity_fct": "cos_sim"
    }
    

Training Hyperparameters

Non-Default Hyperparameters

  • eval_strategy: steps
  • per_device_train_batch_size: 16
  • per_device_eval_batch_size: 16
  • num_train_epochs: 10
  • warmup_ratio: 0.1
  • fp16: True
  • batch_sampler: no_duplicates
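
These settings, together with the training set size, determine the length of the run and the shape of the learning-rate curve. The sketch below works through the arithmetic, assuming a single device, gradient_accumulation_steps of 1, and the linear warmup-then-decay schedule named by lr_scheduler_type. The variable and function names are hypothetical. The resulting total of 4620 steps matches the final step reported in the Training Logs.

```python
import math

train_samples = 7385   # size of the training dataset
batch_size = 16        # per_device_train_batch_size
epochs = 10            # num_train_epochs
warmup_ratio = 0.1
peak_lr = 5e-05        # learning_rate

steps_per_epoch = math.ceil(train_samples / batch_size)  # 462
total_steps = steps_per_epoch * epochs                   # 4620
warmup_steps = math.ceil(total_steps * warmup_ratio)     # 462

def linear_schedule_lr(step):
    """Learning rate at a given optimizer step: linear warmup, then linear decay to zero."""
    if step < warmup_steps:
        return peak_lr * step / warmup_steps
    remaining = max(0, total_steps - step)
    return peak_lr * remaining / (total_steps - warmup_steps)
```

Under these assumptions the learning rate peaks at the end of the first epoch and falls linearly to zero at step 4620.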

All Hyperparameters

  • overwrite_output_dir: False
  • do_predict: False
  • eval_strategy: steps
  • prediction_loss_only: True
  • per_device_train_batch_size: 16
  • per_device_eval_batch_size: 16
  • per_gpu_train_batch_size: None
  • per_gpu_eval_batch_size: None
  • gradient_accumulation_steps: 1
  • eval_accumulation_steps: None
  • torch_empty_cache_steps: None
  • learning_rate: 5e-05
  • weight_decay: 0.0
  • adam_beta1: 0.9
  • adam_beta2: 0.999
  • adam_epsilon: 1e-08
  • max_grad_norm: 1.0
  • num_train_epochs: 10
  • max_steps: -1
  • lr_scheduler_type: linear
  • lr_scheduler_kwargs: {}
  • warmup_ratio: 0.1
  • warmup_steps: 0
  • log_level: passive
  • log_level_replica: warning
  • log_on_each_node: True
  • logging_nan_inf_filter: True
  • save_safetensors: True
  • save_on_each_node: False
  • save_only_model: False
  • restore_callback_states_from_checkpoint: False
  • no_cuda: False
  • use_cpu: False
  • use_mps_device: False
  • seed: 42
  • data_seed: None
  • jit_mode_eval: False
  • use_ipex: False
  • bf16: False
  • fp16: True
  • fp16_opt_level: O1
  • half_precision_backend: auto
  • bf16_full_eval: False
  • fp16_full_eval: False
  • tf32: None
  • local_rank: 0
  • ddp_backend: None
  • tpu_num_cores: None
  • tpu_metrics_debug: False
  • debug: []
  • dataloader_drop_last: False
  • dataloader_num_workers: 0
  • dataloader_prefetch_factor: None
  • past_index: -1
  • disable_tqdm: False
  • remove_unused_columns: True
  • label_names: None
  • load_best_model_at_end: False
  • ignore_data_skip: False
  • fsdp: []
  • fsdp_min_num_params: 0
  • fsdp_config: {'min_num_params': 0, 'xla': False, 'xla_fsdp_v2': False, 'xla_fsdp_grad_ckpt': False}
  • fsdp_transformer_layer_cls_to_wrap: None
  • accelerator_config: {'split_batches': False, 'dispatch_batches': None, 'even_batches': True, 'use_seedable_sampler': True, 'non_blocking': False, 'gradient_accumulation_kwargs': None}
  • deepspeed: None
  • label_smoothing_factor: 0.0
  • optim: adamw_torch
  • optim_args: None
  • adafactor: False
  • group_by_length: False
  • length_column_name: length
  • ddp_find_unused_parameters: None
  • ddp_bucket_cap_mb: None
  • ddp_broadcast_buffers: False
  • dataloader_pin_memory: True
  • dataloader_persistent_workers: False
  • skip_memory_metrics: True
  • use_legacy_prediction_loop: False
  • push_to_hub: False
  • resume_from_checkpoint: None
  • hub_model_id: None
  • hub_strategy: every_save
  • hub_private_repo: False
  • hub_always_push: False
  • gradient_checkpointing: False
  • gradient_checkpointing_kwargs: None
  • include_inputs_for_metrics: False
  • eval_do_concat_batches: True
  • fp16_backend: auto
  • push_to_hub_model_id: None
  • push_to_hub_organization: None
  • mp_parameters:
  • auto_find_batch_size: False
  • full_determinism: False
  • torchdynamo: None
  • ray_scope: last
  • ddp_timeout: 1800
  • torch_compile: False
  • torch_compile_backend: None
  • torch_compile_mode: None
  • dispatch_batches: None
  • split_batches: None
  • include_tokens_per_second: False
  • include_num_input_tokens_seen: False
  • neftune_noise_alpha: None
  • optim_target_modules: None
  • batch_eval_metrics: False
  • eval_on_start: False
  • eval_use_gather_object: False
  • batch_sampler: no_duplicates
  • multi_dataset_batch_sampler: proportional

Training Logs

Epoch Step Training Loss loss dev_cosine_accuracy dev_max_accuracy test_cosine_accuracy
0 0 - - 0.8614 0.8614 -
0.2165 100 2.3461 1.3114 0.9995 - -
0.4329 200 1.5881 1.2268 0.9995 - -
0.6494 300 1.5293 1.2106 0.9995 - -
0.8658 400 1.4955 1.1909 1.0 - -
1.0823 500 0.8778 1.2624 0.9978 - -
1.2987 600 0.0 1.2644 0.9989 - -
1.2143 700 1.4556 1.1650 1.0 - -
1.4307 800 1.4347 1.1479 1.0 - -
1.6472 900 1.4549 1.1127 1.0 - -
1.8636 1000 1.4315 1.1446 1.0 - -
2.0801 1100 0.8624 1.1487 1.0 - -
2.2965 1200 0.0004 1.1511 0.9984 - -
2.2121 1300 1.3961 1.1081 1.0 - -
2.4286 1400 1.4033 1.1076 1.0 - -
2.6450 1500 1.4211 1.1022 1.0 - -
2.8615 1600 1.4044 1.1364 1.0 - -
3.0779 1700 0.864 1.1135 1.0 - -
3.2944 1800 0.0005 1.1156 1.0 - -
3.2100 1900 1.353 1.0914 1.0 - -
3.4264 2000 1.3805 1.0958 1.0 - -
3.6429 2100 1.4068 1.0925 1.0 - -
3.8593 2200 1.3874 1.1184 1.0 - -
4.0758 2300 0.8734 1.0992 1.0 - -
4.2922 2400 0.0011 1.1007 1.0 - -
4.2078 2500 1.3287 1.0853 1.0 - -
4.4242 2600 1.3691 1.0944 1.0 - -
4.6407 2700 1.4026 1.0906 1.0 - -
4.8571 2800 1.3816 1.0926 1.0 - -
5.0736 2900 0.8775 1.0915 1.0 - -
5.2900 3000 0.0007 1.0924 1.0 - -
5.2056 3100 1.3095 1.0838 1.0 - -
5.4221 3200 1.3669 1.0875 1.0 - -
5.6385 3300 1.389 1.0869 1.0 - -
5.8550 3400 1.3741 1.0835 1.0 - -
6.0714 3500 0.8852 1.0864 1.0 - -
6.2879 3600 0.0005 1.0866 1.0 - -
6.2035 3700 1.2937 1.0793 1.0 - -
6.4199 3800 1.3618 1.0852 1.0 - -
6.6364 3900 1.3848 1.0847 1.0 - -
6.8528 4000 1.3722 1.0799 1.0 - -
7.0693 4100 0.8995 1.0827 1.0 - -
7.2857 4200 0.0006 1.0826 1.0 - -
7.2013 4300 1.2766 1.0775 1.0 - -
7.4177 4400 1.3559 1.0791 1.0 - -
7.6342 4500 1.3806 1.0793 1.0 - -
7.8506 4600 1.3636 1.0788 1.0 - -
7.8939 4620 - - - - 1.0

Framework Versions

  • Python: 3.11.5
  • Sentence Transformers: 3.1.0
  • Transformers: 4.44.2
  • PyTorch: 2.4.1+cu121
  • Accelerate: 1.0.0
  • Datasets: 3.0.1
  • Tokenizers: 0.19.1

Citation

BibTeX

Sentence Transformers

@inproceedings{reimers-2019-sentence-bert,
    title = "Sentence-BERT: Sentence Embeddings using Siamese BERT-Networks",
    author = "Reimers, Nils and Gurevych, Iryna",
    booktitle = "Proceedings of the 2019 Conference on Empirical Methods in Natural Language Processing",
    month = "11",
    year = "2019",
    publisher = "Association for Computational Linguistics",
    url = "https://arxiv.org/abs/1908.10084",
}

MultipleNegativesRankingLoss

@misc{henderson2017efficient,
    title={Efficient Natural Language Response Suggestion for Smart Reply},
    author={Matthew Henderson and Rami Al-Rfou and Brian Strope and Yun-hsuan Sung and Laszlo Lukacs and Ruiqi Guo and Sanjiv Kumar and Balint Miklos and Ray Kurzweil},
    year={2017},
    eprint={1705.00652},
    archivePrefix={arXiv},
    primaryClass={cs.CL}
}

Model size: 109M params (Safetensors, tensor type F32)