SentenceTransformer based on lufercho/my-finetuned-bert-mlm

This is a sentence-transformers model finetuned from lufercho/my-finetuned-bert-mlm. It maps sentences & paragraphs to a 768-dimensional dense vector space and can be used for semantic textual similarity, semantic search, paraphrase mining, text classification, clustering, and more.

Model Details

Model Description

  • Model Type: Sentence Transformer
  • Base model: lufercho/my-finetuned-bert-mlm
  • Maximum Sequence Length: 512 tokens
  • Output Dimensionality: 768 dimensions
  • Similarity Function: Cosine Similarity

Full Model Architecture

SentenceTransformer(
  (0): Transformer({'max_seq_length': 512, 'do_lower_case': False}) with Transformer model: BertModel 
  (1): Pooling({'word_embedding_dimension': 768, 'pooling_mode_cls_token': False, 'pooling_mode_mean_tokens': True, 'pooling_mode_max_tokens': False, 'pooling_mode_mean_sqrt_len_tokens': False, 'pooling_mode_weightedmean_tokens': False, 'pooling_mode_lasttoken': False, 'include_prompt': True})
)
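
The Pooling module above has only pooling_mode_mean_tokens enabled, so each sentence embedding is the average of its token vectors. A minimal sketch of that idea in plain Python follows; the function name mean_pool, the toy 4-dimensional vectors, and the explicit attention mask are illustrative assumptions, not the library's internal code.

```python
def mean_pool(token_embeddings, attention_mask):
    """Average the token vectors whose mask entry is 1 (padding is skipped)."""
    dim = len(token_embeddings[0])
    summed = [0.0] * dim
    count = 0
    for vector, keep in zip(token_embeddings, attention_mask):
        if keep:
            count += 1
            for i, value in enumerate(vector):
                summed[i] += value
    # Guard against an all-padding input to avoid dividing by zero.
    count = max(count, 1)
    return [total / count for total in summed]


# Toy input (assumption): three token positions with 4-dimensional vectors.
# The real model produces 768-dimensional vectors. The last position is padding.
tokens = [[1.0, 2.0, 3.0, 4.0], [3.0, 4.0, 5.0, 6.0], [9.0, 9.0, 9.0, 9.0]]
mask = [1, 1, 0]
sentence_embedding = mean_pool(tokens, mask)
print(sentence_embedding)
```

Because padded positions are masked out, the padding vector in the toy input does not affect the result.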

Usage

Direct Usage (Sentence Transformers)

First install the Sentence Transformers library:

pip install -U sentence-transformers

Then you can load this model and run inference.

from sentence_transformers import SentenceTransformer

# Download from the 🤗 Hub
model = SentenceTransformer("lufercho/my-finetuned-sentence-bert")
# Run inference
sentences = [
    'Maximin affinity learning of image segmentation',
    '  Images can be segmented by first using a classifier to predict an affinity\ngraph that reflects the degree to which image pixels must be grouped together\nand then partitioning the graph to yield a segmentation. Machine learning has\nbeen applied to the affinity classifier to produce affinity graphs that are\ngood in the sense of minimizing edge misclassification rates. However, this\nerror measure is only indirectly related to the quality of segmentations\nproduced by ultimately partitioning the affinity graph. We present the first\nmachine learning algorithm for training a classifier to produce affinity graphs\nthat are good in the sense of producing segmentations that directly minimize\nthe Rand index, a well known segmentation performance measure. The Rand index\nmeasures segmentation performance by quantifying the classification of the\nconnectivity of image pixel pairs after segmentation. By using the simple graph\npartitioning algorithm of finding the connected components of the thresholded\naffinity graph, we are able to train an affinity classifier to directly\nminimize the Rand index of segmentations resulting from the graph partitioning.\nOur learning algorithm corresponds to the learning of maximin affinities\nbetween image pixel pairs, which are predictive of the pixel-pair connectivity.\n',
    '  Changes in the UK electricity market mean that domestic users will be\nrequired to modify their usage behaviour in order that supplies can be\nmaintained. Clustering allows usage profiles collected at the household level\nto be clustered into groups and assigned a stereotypical profile which can be\nused to target marketing campaigns. Fuzzy C Means clustering extends this by\nallowing each household to be a member of many groups and hence provides the\nopportunity to make personalised offers to the household dependent on their\ndegree of membership of each group. In addition, feedback can be provided on\nhow user\'s changing behaviour is moving them towards more "green" or cost\neffective stereotypical usage.\n',
]
embeddings = model.encode(sentences)
print(embeddings.shape)
# (3, 768)

# Get the similarity scores for the embeddings
similarities = model.similarity(embeddings, embeddings)
print(similarities.shape)
# torch.Size([3, 3])
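
Since the model's similarity function is cosine similarity, the pairwise score matrix can be illustrated without the library. The sketch below is a dependency-free approximation of what such a matrix contains; the helper names cosine_similarity and similarity_matrix and the 2-dimensional toy vectors are assumptions made for illustration.

```python
import math


def cosine_similarity(u, v):
    """Cosine of the angle between two vectors of equal length."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)


def similarity_matrix(embeddings):
    """Pairwise cosine similarities; entry [i][j] compares embedding i with j."""
    return [[cosine_similarity(u, v) for v in embeddings] for u in embeddings]


# Toy embeddings (assumption): 2 dimensions instead of the model's 768.
toy = [[1.0, 0.0], [0.0, 2.0], [3.0, 3.0]]
matrix = similarity_matrix(toy)
for row in matrix:
    print([round(score, 4) for score in row])
```

The diagonal is 1 because every vector is perfectly similar to itself, and the matrix is symmetric.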

Training Details

Training Dataset

Unnamed Dataset

  • Size: 5,000 training samples
  • Columns: sentence_0, sentence_1, and sentence_2
  • Approximate statistics based on the first 1000 samples:
    column      type    min tokens  mean tokens  max tokens
    sentence_0  string  4           13.41        38
    sentence_1  string  37          201.32       512
    sentence_2  string  24          204.09       512
  • Samples:
    Sample 1
    sentence_0: Clustering with Transitive Distance and K-Means Duality
    sentence_1: Recent spectral clustering methods are a propular and powerful technique for
    data clustering. These methods need to solve the eigenproblem whose
    computational complexity is $O(n^3)$, where $n$ is the number of data samples.
    In this paper, a non-eigenproblem based clustering method is proposed to deal
    with the clustering problem. Its performance is comparable to the spectral
    clustering algorithms but it is more efficient with computational complexity
    $O(n^2)$. We show that with a transitive distance and an observed property,
    called K-means duality, our algorithm can be used to handle data sets with
    complex cluster shapes, multi-scale clusters, and noise. Moreover, no
    parameters except the number of clusters need to be set in our algorithm.
    sentence_2: We show that the log-likelihood of several probabilistic graphical models is
    Lipschitz continuous with respect to the lp-norm of the parameters. We discuss
    several implications of Lipschitz parametrization. We present an upper bound of
    the Kullback-Leibler divergence that allows understanding methods that penalize
    the lp-norm of differences of parameters as the minimization of that upper
    bound. The expected log-likelihood is lower bounded by the negative lp-norm,
    which allows understanding the generalization ability of probabilistic models.
    The exponential of the negative lp-norm is involved in the lower bound of the
    Bayes error rate, which shows that it is reasonable to use parameters as
    features in algorithms that rely on metric spaces (e.g. classification,
    dimensionality reduction, clustering). Our results do not rely on specific
    algorithms for learning the structure or parameters. We show preliminary
    results for activity recognition and temporal segmentation.

    Sample 2
    sentence_0: Clustering Dynamic Web Usage Data
    sentence_1: Most classification methods are based on the assumption that data conforms to
    a stationary distribution. The machine learning domain currently suffers from a
    lack of classification techniques that are able to detect the occurrence of a
    change in the underlying data distribution. Ignoring possible changes in the
    underlying concept, also known as concept drift, may degrade the performance of
    the classification model. Often these changes make the model inconsistent and
    regular updatings become necessary. Taking the temporal dimension into account
    during the analysis of Web usage data is a necessity, since the way a site is
    visited may indeed evolve due to modifications in the structure and content of
    the site, or even due to changes in the behavior of certain user groups. One
    solution to this problem, proposed in this article, is to update models using
    summaries obtained by means of an evolutionary approach based on an intelligent
    clustering approach. We carry out various clustering str...
    sentence_2: Exponential family extensions of principal component analysis (EPCA) have
    received a considerable amount of attention in recent years, demonstrating the
    growing need for basic modeling tools that do not assume the squared loss or
    Gaussian distribution. We extend the EPCA model toolbox by presenting the first
    exponential family multi-view learning methods of the partial least squares and
    canonical correlation analysis, based on a unified representation of EPCA as
    matrix factorization of the natural parameters of exponential family. The
    models are based on a new family of priors that are generally usable for all
    such factorizations. We also introduce new inference strategies, and
    demonstrate how the methods outperform earlier ones when the Gaussianity
    assumption does not hold.

    Sample 3
    sentence_0: Trading USDCHF filtered by Gold dynamics via HMM coupling
    sentence_1: We devise a USDCHF trading strategy using the dynamics of gold as a filter.
    Our strategy involves modelling both USDCHF and gold using a coupled hidden
    Markov model (CHMM). The observations will be indicators, RSI and CCI, which
    will be used as triggers for our trading signals. Upon decoding the model in
    each iteration, we can get the next most probable state and the next most
    probable observation. Hopefully by taking advantage of intermarket analysis and
    the Markov property implicit in the model, trading with these most probable
    values will produce profitable results.
    sentence_2: Most existing machine learning classifiers are highly vulnerable to
    adversarial examples. An adversarial example is a sample of input data which
    has been modified very slightly in a way that is intended to cause a machine
    learning classifier to misclassify it. In many cases, these modifications can
    be so subtle that a human observer does not even notice the modification at
    all, yet the classifier still makes a mistake. Adversarial examples pose
    security concerns because they could be used to perform an attack on machine
    learning systems, even if the adversary has no access to the underlying model.
    Up to now, all previous work have assumed a threat model in which the adversary
    can feed data directly into the machine learning classifier. This is not always
    the case for systems operating in the physical world, for example those which
    are using signals from cameras and other sensors as an input. This paper shows
    that even in such physical world scenarios, machine learning systems are
    vul...
  • Loss: TripletLoss with these parameters:
    {
        "distance_metric": "TripletDistanceMetric.EUCLIDEAN",
        "triplet_margin": 5
    }
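
To make the loss parameters above concrete, here is a minimal sketch of a triplet loss with Euclidean distance and a margin of 5, computed for a single triplet. It assumes the usual anchor, positive, negative reading of the three columns (sentence_0, sentence_1, sentence_2 in that order), which the card itself does not state. The function names and toy 2-dimensional vectors are illustrative only, and a real training step would average this value over a batch.

```python
import math


def euclidean(u, v):
    """Euclidean distance between two vectors of equal length."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(u, v)))


def triplet_loss(anchor, positive, negative, margin=5.0):
    """max(d(anchor, positive) - d(anchor, negative) + margin, 0)."""
    gap = euclidean(anchor, positive) - euclidean(anchor, negative)
    return max(gap + margin, 0.0)


# Toy embeddings (assumption): 2 dimensions instead of the model's 768.
anchor = [0.0, 0.0]
positive = [0.0, 1.0]       # distance 1 from the anchor
near_negative = [0.0, 3.0]  # distance 3: closer than positive distance + margin
far_negative = [0.0, 8.0]   # distance 8: already beyond the margin

print(triplet_loss(anchor, positive, near_negative))  # 1 - 3 + 5 = 3.0
print(triplet_loss(anchor, positive, far_negative))   # 1 - 8 + 5 < 0, clipped to 0.0
```

The loss is zero only once the negative is at least the margin farther from the anchor than the positive is, so a large margin such as 5 keeps pushing negatives away for longer.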
    

Training Hyperparameters

Non-Default Hyperparameters

  • per_device_train_batch_size: 16
  • per_device_eval_batch_size: 16
  • num_train_epochs: 2
  • multi_dataset_batch_sampler: round_robin

All Hyperparameters

  • overwrite_output_dir: False
  • do_predict: False
  • eval_strategy: no
  • prediction_loss_only: True
  • per_device_train_batch_size: 16
  • per_device_eval_batch_size: 16
  • per_gpu_train_batch_size: None
  • per_gpu_eval_batch_size: None
  • gradient_accumulation_steps: 1
  • eval_accumulation_steps: None
  • torch_empty_cache_steps: None
  • learning_rate: 5e-05
  • weight_decay: 0.0
  • adam_beta1: 0.9
  • adam_beta2: 0.999
  • adam_epsilon: 1e-08
  • max_grad_norm: 1
  • num_train_epochs: 2
  • max_steps: -1
  • lr_scheduler_type: linear
  • lr_scheduler_kwargs: {}
  • warmup_ratio: 0.0
  • warmup_steps: 0
  • log_level: passive
  • log_level_replica: warning
  • log_on_each_node: True
  • logging_nan_inf_filter: True
  • save_safetensors: True
  • save_on_each_node: False
  • save_only_model: False
  • restore_callback_states_from_checkpoint: False
  • no_cuda: False
  • use_cpu: False
  • use_mps_device: False
  • seed: 42
  • data_seed: None
  • jit_mode_eval: False
  • use_ipex: False
  • bf16: False
  • fp16: False
  • fp16_opt_level: O1
  • half_precision_backend: auto
  • bf16_full_eval: False
  • fp16_full_eval: False
  • tf32: None
  • local_rank: 0
  • ddp_backend: None
  • tpu_num_cores: None
  • tpu_metrics_debug: False
  • debug: []
  • dataloader_drop_last: False
  • dataloader_num_workers: 0
  • dataloader_prefetch_factor: None
  • past_index: -1
  • disable_tqdm: False
  • remove_unused_columns: True
  • label_names: None
  • load_best_model_at_end: False
  • ignore_data_skip: False
  • fsdp: []
  • fsdp_min_num_params: 0
  • fsdp_config: {'min_num_params': 0, 'xla': False, 'xla_fsdp_v2': False, 'xla_fsdp_grad_ckpt': False}
  • fsdp_transformer_layer_cls_to_wrap: None
  • accelerator_config: {'split_batches': False, 'dispatch_batches': None, 'even_batches': True, 'use_seedable_sampler': True, 'non_blocking': False, 'gradient_accumulation_kwargs': None}
  • deepspeed: None
  • label_smoothing_factor: 0.0
  • optim: adamw_torch
  • optim_args: None
  • adafactor: False
  • group_by_length: False
  • length_column_name: length
  • ddp_find_unused_parameters: None
  • ddp_bucket_cap_mb: None
  • ddp_broadcast_buffers: False
  • dataloader_pin_memory: True
  • dataloader_persistent_workers: False
  • skip_memory_metrics: True
  • use_legacy_prediction_loop: False
  • push_to_hub: False
  • resume_from_checkpoint: None
  • hub_model_id: None
  • hub_strategy: every_save
  • hub_private_repo: False
  • hub_always_push: False
  • gradient_checkpointing: False
  • gradient_checkpointing_kwargs: None
  • include_inputs_for_metrics: False
  • include_for_metrics: []
  • eval_do_concat_batches: True
  • fp16_backend: auto
  • push_to_hub_model_id: None
  • push_to_hub_organization: None
  • mp_parameters:
  • auto_find_batch_size: False
  • full_determinism: False
  • torchdynamo: None
  • ray_scope: last
  • ddp_timeout: 1800
  • torch_compile: False
  • torch_compile_backend: None
  • torch_compile_mode: None
  • dispatch_batches: None
  • split_batches: None
  • include_tokens_per_second: False
  • include_num_input_tokens_seen: False
  • neftune_noise_alpha: None
  • optim_target_modules: None
  • batch_eval_metrics: False
  • eval_on_start: False
  • use_liger_kernel: False
  • eval_use_gather_object: False
  • average_tokens_across_devices: False
  • prompts: None
  • batch_sampler: batch_sampler
  • multi_dataset_batch_sampler: round_robin
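
With lr_scheduler_type set to linear, learning_rate at 5e-05, and no warmup, the learning rate decays in a straight line from its initial value to zero over training. The sketch below illustrates that shape. The function name linear_lr is hypothetical, and the total of 626 steps is an assumption derived from 5,000 samples, batch size 16, and 2 epochs on a single device without gradient accumulation.

```python
def linear_lr(step, base_lr=5e-05, total_steps=626, warmup_steps=0):
    """Linear warmup (if any) followed by linear decay to zero."""
    if step < warmup_steps:
        return base_lr * step / max(1, warmup_steps)
    remaining = max(0, total_steps - step)
    return base_lr * remaining / max(1, total_steps - warmup_steps)


# total_steps=626 is an assumption: ceil(5000 / 16) = 313 steps per epoch, times 2 epochs.
for step in (0, 313, 500, 626):
    print(step, linear_lr(step))
```

Under these assumptions the rate is halved at the end of the first epoch and reaches zero at the final step.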

Training Logs

Epoch Step Training Loss
1.5974 500 0.8647
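
The fractional epoch in the log row can be checked against the dataset size and batch size reported above. The short calculation below is a sanity check, assuming a single device, gradient_accumulation_steps of 1, and dataloader_drop_last set to False (so the last partial batch counts as a step).

```python
import math

train_samples = 5000  # training set size from the card
batch_size = 16       # per_device_train_batch_size
epochs = 2            # num_train_epochs

# The final partial batch still counts because drop_last is False.
steps_per_epoch = math.ceil(train_samples / batch_size)
total_steps = steps_per_epoch * epochs

# Epoch value at the logged step 500.
logged_epoch = 500 / steps_per_epoch

print(steps_per_epoch)         # 313
print(total_steps)             # 626
print(round(logged_epoch, 4))  # 1.5974
```

The result matches the logged epoch of 1.5974, so the single log entry was written about 80% of the way through training.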

Framework Versions

  • Python: 3.10.12
  • Sentence Transformers: 3.3.1
  • Transformers: 4.46.2
  • PyTorch: 2.5.1+cu121
  • Accelerate: 1.1.1
  • Datasets: 3.1.0
  • Tokenizers: 0.20.3

Citation

BibTeX

Sentence Transformers

@inproceedings{reimers-2019-sentence-bert,
    title = "Sentence-BERT: Sentence Embeddings using Siamese BERT-Networks",
    author = "Reimers, Nils and Gurevych, Iryna",
    booktitle = "Proceedings of the 2019 Conference on Empirical Methods in Natural Language Processing",
    month = "11",
    year = "2019",
    publisher = "Association for Computational Linguistics",
    url = "https://arxiv.org/abs/1908.10084",
}

TripletLoss

@misc{hermans2017defense,
    title={In Defense of the Triplet Loss for Person Re-Identification},
    author={Alexander Hermans and Lucas Beyer and Bastian Leibe},
    year={2017},
    eprint={1703.07737},
    archivePrefix={arXiv},
    primaryClass={cs.CV}
}

Model size: 109M params (Safetensors, tensor type F32)