ModernBERT Embed base Legal Matryoshka
This is a sentence-transformers model fine-tuned from nomic-ai/nomic-embed-text-v2-moe on the json dataset (357 anchor/positive pairs of ExpressVPN support text). It maps sentences and paragraphs to a 768-dimensional dense vector space and can be used for semantic textual similarity, semantic search, paraphrase mining, text classification, clustering, and more.
Model Details
Model Description
- Model Type: Sentence Transformer
- Base model: nomic-ai/nomic-embed-text-v2-moe
- Maximum Sequence Length: 512 tokens
- Output Dimensionality: 768 dimensions
- Similarity Function: Cosine Similarity
- Training Dataset: json
- Language: en
- License: apache-2.0
Model Sources
- Documentation: Sentence Transformers Documentation
- Repository: Sentence Transformers on GitHub
- Hugging Face: Sentence Transformers on Hugging Face
Full Model Architecture
SentenceTransformer(
(0): Transformer({'max_seq_length': 512, 'do_lower_case': False}) with Transformer model: NomicBertModel
(1): Pooling({'word_embedding_dimension': 768, 'pooling_mode_cls_token': False, 'pooling_mode_mean_tokens': True, 'pooling_mode_max_tokens': False, 'pooling_mode_mean_sqrt_len_tokens': False, 'pooling_mode_weightedmean_tokens': False, 'pooling_mode_lasttoken': False, 'include_prompt': True})
(2): Normalize()
)
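The module stack above runs the transformer, averages the token vectors (pooling_mode_mean_tokens: True), and then L2-normalizes the result. The sketch below shows the last two steps without dependencies. Plain Python lists stand in for tensors, and the helper names are hypothetical, not Sentence Transformers functions.

```python
import math
from typing import List

def mean_pool(token_embeddings: List[List[float]], attention_mask: List[int]) -> List[float]:
    """Average the token vectors whose attention-mask entry is 1; padding tokens are ignored."""
    dim = len(token_embeddings[0])
    summed = [0.0] * dim
    count = 0
    for vec, keep in zip(token_embeddings, attention_mask):
        if keep:
            for i, value in enumerate(vec):
                summed[i] += value
            count += 1
    # Guard against division by zero for an all-padding input.
    count = max(count, 1)
    return [s / count for s in summed]

def l2_normalize(vec: List[float]) -> List[float]:
    """Scale a vector to unit length, which is what the Normalize() module does."""
    norm = math.sqrt(sum(v * v for v in vec))
    return [v / norm for v in vec] if norm > 0 else list(vec)

# Toy example: three tokens of dimension 2; the last token is padding.
tokens = [[1.0, 2.0], [3.0, 4.0], [100.0, 100.0]]
mask = [1, 1, 0]
pooled = mean_pool(tokens, mask)
embedding = l2_normalize(pooled)
print(pooled, embedding)
```

Every output vector has unit length, so for this model cosine similarity and the plain dot product give the same score.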
Usage
Direct Usage (Sentence Transformers)
First install the Sentence Transformers library:
pip install -U sentence-transformers
Then you can load this model and run inference.
from sentence_transformers import SentenceTransformer
# Download from the 🤗 Hub
model = SentenceTransformer("tsss1/expressvpn_embeddingmodel")
# Run inference
sentences = [
'Last updated: October 21, 2024\n\nThis guide is for users who are having issues streaming Max (formerly HBO Max) while connected to the VPN.\n\nTo comply with the Max Terms of Use and ExpressVPN Terms of Service, you should connect to a server location that matches the country where you are currently located.\n\nJump to…\n\n1. Change to a different VPN server location\n2. Sign out of the Max app, then sign in again\n3. Watch Max using your browser\n4. Contact ExpressVPN Support\n\n1. Change to a different VPN server location\n\nIf you are a U.S. user having issues streaming Max, try changing to these VPN server locations in the following order:\n\nUSA – San Francisco\nUSA – Washington DC\nUSA – New York\nUSA – Los Angeles – 1\n\nBelow are instructions for changing your VPN server location on:\n\nWindows\nMac\niOS\nAndroid\nAndroid TV\nApple TV\nLinux\nRouters\nIf you are streaming via the Max app, you should force-close it and reopen it each time you change location. Below are instructions for force-closing an app on:iOS: Swipe up from the bottom of the homescreen, keeping your finger pressed until app previews appear at left. Swipe to find the Max app preview, then swipe up to close the app.\n\nAndroid: On your Android device, open your multitasking interface. The way to do this varies depending on your device:\n\nIf your device has three icons at the bottom of the screen, tap either the three vertical lines icon or the square icon.\nIf your device features a single horizontal line at the bottom of the screen, swipe up from the bottom to the middle of the screen, hold for a second, then release.\n\nNext, swipe to find the Max app preview, then swipe to force-close the app. The direction you need to swipe will vary depending on your device.\n\nAndroid TV: Go to Settings, select Apps, and scroll to find the Max app. Select the app, then select Force Stop.\n\nFire TV/Fire Stick: Go to Settings, select Applications, select Manage Installed Applications. '
'Scroll to find the Max app. Select the app, then select Force Stop.\n\nApple TV: Double-click the TV icon on your remote to see the apps currently running. Swipe to find the Max app preview, then swipe up to close the app.\n\nIf you are a non-U.S. user having issues streaming Max, proceed to the next step.\n\nNeed help?\xa0Contact the ExpressVPN Support Team for immediate assistance.\n\nBack to top\n\n2. Sign out of the Max app, then sign in again\n\nIf you are using the Max app, sign out of it, restart your device, and then sign back in.\n\nNeed help?\xa0Contact the ExpressVPN Support Team for immediate assistance.\n\nBack to top\n\n3. Watch Max on your browser\n\nTry streaming Max via your browser by going to https://www.max.com/login and signing in with your Max account details.\n\nIf you are having issues streaming Max from your browser while connected to the VPN:\n\nGet the ExpressVPN browser extension (available for Windows, Mac, and Linux). To use the browser extension, you must also have the ExpressVPN app installed on your computer.\nU.S. users should try connecting to these server locations in the following order:\nUSA – San Francisco\nUSA – Washington DC\nUSA – New York\nUSA – Los Angeles – 1\n\nNon-U.S. users should proceed to the next step.\n\nTry using a different browser. The ExpressVPN browser extension is available on Windows, Mac, and Linux, and it works with Chrome, Firefox, Vivaldi, Chromium, Brave, and Microsoft Edge. The ExpressVPN app must also be installed.\n\nNeed help?\xa0Contact the ExpressVPN Support Team for immediate assistance.\n\nBack to top\n\n4. Contact Support\n\nIf you are still unable to stream Max while connected to the VPN, contact the ExpressVPN Support Team.\n\nBack to top\n\nExpressVPN is optimized to work with Max so you can enjoy online privacy and security all the time, without the VPN interfering. It should never be used as a means of copyright circumvention, which is strictly against our Terms of Service. '
'As we cannot see or control what you do while connected to our VPN, you are responsible at all times for complying with our terms, the Max Terms of Use, and any applicable laws. Compliance requires you to be located in the U.S. while streaming Max with ExpressVPN.\nWas this article helpful?\nYes No',
'Troubleshooting steps for streaming Max',
'I can help you with various questions and issues related to ExpressVPN. What do you need assistance with: \n\n',
]
embeddings = model.encode(sentences)
print(embeddings.shape)
# (3, 768)
# Get the similarity scores for the embeddings
similarities = model.similarity(embeddings, embeddings)
print(similarities.shape)
# torch.Size([3, 3])
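The embeddings are unit-normalized, so semantic search over a corpus amounts to ranking documents by their dot product with the query. The sketch below uses small hand-made vectors in place of real model.encode(...) output. The vectors and the top_k_search helper are hypothetical. With real embeddings, sentence_transformers.util.semantic_search covers the same task.

```python
import math
from typing import List, Tuple

def normalize(vec: List[float]) -> List[float]:
    """Scale a vector to unit length, mimicking the model's Normalize() output."""
    norm = math.sqrt(sum(v * v for v in vec))
    return [v / norm for v in vec]

def top_k_search(query: List[float], corpus: List[List[float]], k: int) -> List[Tuple[int, float]]:
    """Return (corpus index, score) pairs for the k highest-scoring documents.

    Assumes all vectors are unit length, so the dot product equals cosine similarity.
    """
    scores = [(idx, sum(q * d for q, d in zip(query, doc))) for idx, doc in enumerate(corpus)]
    scores.sort(key=lambda pair: pair[1], reverse=True)
    return scores[:k]

# Hypothetical 3-dimensional stand-ins for document and query embeddings.
corpus = [normalize(v) for v in ([0.9, 0.1, 0.0], [0.1, 0.9, 0.2], [0.0, 0.1, 0.9])]
query = normalize([0.2, 1.0, 0.1])
for idx, score in top_k_search(query, corpus, k=2):
    print(idx, round(score, 3))
```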
Evaluation
Metrics
Information Retrieval
- Datasets: dim_768, dim_512, dim_256, dim_128 and dim_64
- Evaluated with InformationRetrievalEvaluator
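Each dim_N dataset scores the same embeddings after keeping only their first N components. Matryoshka training is meant to make these truncated embeddings usable. The sketch below shows that truncation with a hypothetical helper, using toy 8-dimensional vectors in place of 768-dimensional ones. In Sentence Transformers you get the same effect by passing truncate_dim (for example truncate_dim=256) to the SentenceTransformer constructor.

```python
import math
from typing import List

def truncate_and_renormalize(vec: List[float], dim: int) -> List[float]:
    """Keep the first `dim` components, then rescale to unit length."""
    head = vec[:dim]
    norm = math.sqrt(sum(v * v for v in head))
    return [v / norm for v in head]

def cosine(a: List[float], b: List[float]) -> float:
    """Cosine similarity of two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    return dot / (math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b)))

# Toy 8-dimensional vectors standing in for 768-dimensional embeddings.
a = [0.5, 0.4, 0.3, 0.2, 0.1, 0.0, 0.0, 0.1]
b = [0.4, 0.5, 0.2, 0.3, 0.0, 0.1, 0.1, 0.0]
for dim in (8, 4, 2):
    a_t, b_t = truncate_and_renormalize(a, dim), truncate_and_renormalize(b, dim)
    print(dim, round(cosine(a_t, b_t), 4))
```

Re-normalizing does not change the cosine score itself, because cosine similarity is scale-invariant. It matters when a downstream index scores with a plain dot product.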
Metric | dim_768 | dim_512 | dim_256 | dim_128 | dim_64 |
---|---|---|---|---|---|
cosine_accuracy@1 | 0.6 | 0.6 | 0.625 | 0.625 | 0.625 |
cosine_accuracy@3 | 0.775 | 0.775 | 0.775 | 0.725 | 0.7 |
cosine_accuracy@5 | 0.825 | 0.875 | 0.85 | 0.775 | 0.75 |
cosine_accuracy@10 | 0.875 | 0.9 | 0.875 | 0.875 | 0.8 |
cosine_precision@1 | 0.6 | 0.6 | 0.625 | 0.625 | 0.625 |
cosine_precision@3 | 0.2667 | 0.2667 | 0.2667 | 0.25 | 0.2417 |
cosine_precision@5 | 0.17 | 0.18 | 0.175 | 0.16 | 0.155 |
cosine_precision@10 | 0.09 | 0.0925 | 0.09 | 0.09 | 0.0825 |
cosine_recall@1 | 0.5875 | 0.5875 | 0.6125 | 0.6125 | 0.6125 |
cosine_recall@3 | 0.775 | 0.775 | 0.775 | 0.725 | 0.7 |
cosine_recall@5 | 0.825 | 0.875 | 0.85 | 0.775 | 0.75 |
cosine_recall@10 | 0.875 | 0.9 | 0.875 | 0.875 | 0.8 |
cosine_ndcg@10 | 0.7465 | 0.7552 | 0.751 | 0.7407 | 0.7078 |
cosine_mrr@10 | 0.7042 | 0.7083 | 0.7108 | 0.6992 | 0.6793 |
cosine_map@100 | 0.7103 | 0.7124 | 0.7161 | 0.7051 | 0.6907 |
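The table reports averages over the evaluation queries. For a single query with binary relevance labels, the metrics can be sketched as below. These are the textbook definitions, which we expect to match InformationRetrievalEvaluator's conventions. The function is illustrative and is not that class's code. MAP@100 is omitted for brevity.

```python
import math
from typing import Dict, List, Set

def ir_metrics(ranked_ids: List[str], relevant: Set[str], k: int) -> Dict[str, float]:
    """Binary-relevance retrieval metrics for one query at cutoff k."""
    top_k = ranked_ids[:k]
    hits = [1 if doc_id in relevant else 0 for doc_id in top_k]
    num_hits = sum(hits)
    # Reciprocal rank of the first relevant document inside the top k (0 if none).
    mrr = next((1.0 / (rank + 1) for rank, hit in enumerate(hits) if hit), 0.0)
    # DCG discounts each hit by log2(rank + 2); IDCG places all relevant documents first.
    dcg = sum(hit / math.log2(rank + 2) for rank, hit in enumerate(hits))
    idcg = sum(1.0 / math.log2(rank + 2) for rank in range(min(len(relevant), k)))
    return {
        "accuracy": 1.0 if num_hits > 0 else 0.0,
        "precision": num_hits / k,
        "recall": num_hits / len(relevant),
        "mrr": mrr,
        "ndcg": dcg / idcg if idcg > 0 else 0.0,
    }

# Hypothetical query whose single relevant document is ranked second.
print(ir_metrics(["d7", "d3", "d9"], relevant={"d3"}, k=3))
```

At k = 1, precision and accuracy coincide by definition. Recall@1 can fall below them when some queries have more than one relevant document, which is consistent with the 0.6 versus 0.5875 values in the dim_768 column.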
Training Details
Training Dataset
json
- Dataset: json
- Size: 357 training samples
- Columns: positive and anchor
- Approximate statistics based on the first 357 samples:
Statistic | positive | anchor |
---|---|---|
type | string | string |
min | 21 tokens | 6 tokens |
mean | 322.47 tokens | 32.25 tokens |
max | 512 tokens | 512 tokens |
- Samples:
positive: I'd like to discuss common issues that users face when using ExpressVPN.
1. Slow speeds and connectivity issues.
2. Difficulty in setting up ExpressVPN on various devices such as routers, smart TVs, and gaming consoles.
3. Issues with unblocking geo-restricted content on popular streaming services like Netflix, Hulu, and BBC iPlayer.
4. Troubleshooting failed connections and unable to connect to a VPN server.
Which one of these topics would you like to discuss further, or is there something else you'd like to bring up?
anchor: I'd be happy to help with any questions or concerns you have about ExpressVPN. What would you like to know or discuss?

positive: I'd like to provide information about ExpressVPN, but I think it would be more helpful to get some assistance from you.
I'd like to know more about the process of setting up ExpressVPN on a router. Could you explain the general steps to follow and any potential issues that users may encounter during the setup process? Additionally, are there any specific router models that are known to be compatible with ExpressVPN?
anchor: I can help you with any question you have about ExpressVPN. What is it that you need help with?

positive: Last updated: January 11, 2023
Important: If your ExpressVPN free trial or subscription was initiated via the iOS App Store, refer to this guide.
This guide will explain how to get or extend an ExpressVPN subscription for iOS users who did not get a free trial or subscription via the App Store.
Note: Upgrades and renewals are not currently available within the ExpressVPN app for iOS.
anchor: ExpressVPN iOS free trial or subscription expiring
- Loss: MatryoshkaLoss with these parameters:
{
    "loss": "MultipleNegativesRankingLoss",
    "matryoshka_dims": [768, 512, 256, 128, 64],
    "matryoshka_weights": [1, 1, 1, 1, 1],
    "n_dims_per_step": -1
}
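MatryoshkaLoss wraps the base loss and applies it to several truncated copies of each embedding. The sketch below illustrates the idea in plain Python for a batch of (anchor, positive) pairs with in-batch negatives. It rests on three assumptions:

- The helpers are hypothetical.
- scale=20.0 with cosine similarity is assumed to be the MultipleNegativesRankingLoss default; the card does not list it.
- n_dims_per_step: -1 is read as evaluating every dimension at every step.

```python
import math
from typing import List, Sequence

def cos_sim(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine similarity of two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    return dot / (math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b)))

def mnrl(anchors: List[List[float]], positives: List[List[float]], scale: float = 20.0) -> float:
    """In-batch softmax cross-entropy: anchor i should score highest against positive i."""
    total = 0.0
    for i, anchor in enumerate(anchors):
        logits = [scale * cos_sim(anchor, pos) for pos in positives]
        # Log-sum-exp, shifted by the max logit for numerical stability.
        m = max(logits)
        log_denominator = m + math.log(sum(math.exp(z - m) for z in logits))
        total += log_denominator - logits[i]
    return total / len(anchors)

def matryoshka_loss(anchors: List[List[float]], positives: List[List[float]],
                    dims: List[int], weights: List[float]) -> float:
    """Weighted sum of the base loss evaluated on embeddings truncated to each dim."""
    return sum(
        w * mnrl([a[:d] for a in anchors], [p[:d] for p in positives])
        for d, w in zip(dims, weights)
    )

# Toy batch of two pairs with 4-dimensional embeddings, truncated to 4 and 2 dims.
A = [[1.0, 0.0, 0.2, 0.0], [0.0, 1.0, 0.0, 0.3]]
P = [[0.9, 0.1, 0.1, 0.0], [0.1, 0.9, 0.0, 0.2]]
print(round(matryoshka_loss(A, P, dims=[4, 2], weights=[1.0, 1.0]), 4))
```

Every other positive in the batch acts as a negative, so a duplicate text inside one batch would be scored as a false negative. The no_duplicates batch sampler listed under the hyperparameters avoids this.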
Training Hyperparameters
Non-Default Hyperparameters
- eval_strategy: epoch
- per_device_train_batch_size: 2
- per_device_eval_batch_size: 2
- gradient_accumulation_steps: 4
- learning_rate: 2e-05
- num_train_epochs: 7
- lr_scheduler_type: cosine
- warmup_ratio: 0.1
- bf16: True
- tf32: False
- load_best_model_at_end: True
- optim: adamw_torch_fused
- batch_sampler: no_duplicates
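These settings can be cross-checked against the Step column of the training logs. The arithmetic below assumes a single device and the Hugging Face Trainer convention of one optimizer step per gradient_accumulation_steps micro-batches. The learning-rate function is our reconstruction of a linear-warmup, half-cosine schedule, not code taken from the training run.

```python
import math

NUM_SAMPLES = 357          # training samples, from "Training Dataset"
PER_DEVICE_BATCH = 2       # per_device_train_batch_size
GRAD_ACCUM = 4             # gradient_accumulation_steps
EPOCHS = 7                 # num_train_epochs
WARMUP_RATIO = 0.1         # warmup_ratio
PEAK_LR = 2e-05            # learning_rate

# Assumes one device and that the last partial batch is kept (dataloader_drop_last: False).
batches_per_epoch = math.ceil(NUM_SAMPLES / PER_DEVICE_BATCH)   # 179 micro-batches
steps_per_epoch = batches_per_epoch // GRAD_ACCUM               # 44 optimizer steps
total_steps = steps_per_epoch * EPOCHS                          # 308
warmup_steps = math.ceil(total_steps * WARMUP_RATIO)            # 31
effective_batch = PER_DEVICE_BATCH * GRAD_ACCUM                 # 8 pairs per update

def lr_at(step: int) -> float:
    """Linear warmup to PEAK_LR, then a half-cosine decay to zero at total_steps."""
    if step < warmup_steps:
        return PEAK_LR * step / max(1, warmup_steps)
    progress = (step - warmup_steps) / max(1, total_steps - warmup_steps)
    return PEAK_LR * max(0.0, 0.5 * (1.0 + math.cos(math.pi * progress)))

print(effective_batch, steps_per_epoch, total_steps, warmup_steps)
# Fraction of an epoch consumed after 44 optimizer steps (176 of 179 micro-batches).
print(round(steps_per_epoch * GRAD_ACCUM / batches_per_epoch, 4))
```

The 44 steps per epoch and 308 total steps agree with the logged steps. The ratio of 176 to 179 micro-batches reproduces the .9832 fractional epoch shown on each evaluation row.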
All Hyperparameters
- overwrite_output_dir: False
- do_predict: False
- eval_strategy: epoch
- prediction_loss_only: True
- per_device_train_batch_size: 2
- per_device_eval_batch_size: 2
- per_gpu_train_batch_size: None
- per_gpu_eval_batch_size: None
- gradient_accumulation_steps: 4
- eval_accumulation_steps: None
- torch_empty_cache_steps: None
- learning_rate: 2e-05
- weight_decay: 0.0
- adam_beta1: 0.9
- adam_beta2: 0.999
- adam_epsilon: 1e-08
- max_grad_norm: 1.0
- num_train_epochs: 7
- max_steps: -1
- lr_scheduler_type: cosine
- lr_scheduler_kwargs: {}
- warmup_ratio: 0.1
- warmup_steps: 0
- log_level: passive
- log_level_replica: warning
- log_on_each_node: True
- logging_nan_inf_filter: True
- save_safetensors: True
- save_on_each_node: False
- save_only_model: False
- restore_callback_states_from_checkpoint: False
- no_cuda: False
- use_cpu: False
- use_mps_device: False
- seed: 42
- data_seed: None
- jit_mode_eval: False
- use_ipex: False
- bf16: True
- fp16: False
- fp16_opt_level: O1
- half_precision_backend: auto
- bf16_full_eval: False
- fp16_full_eval: False
- tf32: False
- local_rank: 0
- ddp_backend: None
- tpu_num_cores: None
- tpu_metrics_debug: False
- debug: []
- dataloader_drop_last: False
- dataloader_num_workers: 0
- dataloader_prefetch_factor: None
- past_index: -1
- disable_tqdm: False
- remove_unused_columns: True
- label_names: None
- load_best_model_at_end: True
- ignore_data_skip: False
- fsdp: []
- fsdp_min_num_params: 0
- fsdp_config: {'min_num_params': 0, 'xla': False, 'xla_fsdp_v2': False, 'xla_fsdp_grad_ckpt': False}
- fsdp_transformer_layer_cls_to_wrap: None
- accelerator_config: {'split_batches': False, 'dispatch_batches': None, 'even_batches': True, 'use_seedable_sampler': True, 'non_blocking': False, 'gradient_accumulation_kwargs': None}
- deepspeed: None
- label_smoothing_factor: 0.0
- optim: adamw_torch_fused
- optim_args: None
- adafactor: False
- group_by_length: False
- length_column_name: length
- ddp_find_unused_parameters: None
- ddp_bucket_cap_mb: None
- ddp_broadcast_buffers: False
- dataloader_pin_memory: True
- dataloader_persistent_workers: False
- skip_memory_metrics: True
- use_legacy_prediction_loop: False
- push_to_hub: False
- resume_from_checkpoint: None
- hub_model_id: None
- hub_strategy: every_save
- hub_private_repo: None
- hub_always_push: False
- gradient_checkpointing: False
- gradient_checkpointing_kwargs: None
- include_inputs_for_metrics: False
- include_for_metrics: []
- eval_do_concat_batches: True
- fp16_backend: auto
- push_to_hub_model_id: None
- push_to_hub_organization: None
- mp_parameters:
- auto_find_batch_size: False
- full_determinism: False
- torchdynamo: None
- ray_scope: last
- ddp_timeout: 1800
- torch_compile: False
- torch_compile_backend: None
- torch_compile_mode: None
- dispatch_batches: None
- split_batches: None
- include_tokens_per_second: False
- include_num_input_tokens_seen: False
- neftune_noise_alpha: None
- optim_target_modules: None
- batch_eval_metrics: False
- eval_on_start: False
- use_liger_kernel: False
- eval_use_gather_object: False
- average_tokens_across_devices: False
- prompts: None
- batch_sampler: no_duplicates
- multi_dataset_batch_sampler: proportional
Training Logs
Epoch | Step | Training Loss | dim_768_cosine_ndcg@10 | dim_512_cosine_ndcg@10 | dim_256_cosine_ndcg@10 | dim_128_cosine_ndcg@10 | dim_64_cosine_ndcg@10 |
---|---|---|---|---|---|---|---|
0.2235 | 10 | 2.9921 | - | - | - | - | - |
0.4469 | 20 | 0.9824 | - | - | - | - | - |
0.6704 | 30 | 0.6762 | - | - | - | - | - |
0.8939 | 40 | 0.0133 | - | - | - | - | - |
0.9832 | 44 | - | 0.7669 | 0.7701 | - | - | - |
0.2235 | 10 | 0.0179 | - | - | - | - | - |
0.4469 | 20 | 0.2714 | - | - | - | - | - |
0.6704 | 30 | 0.0104 | - | - | - | - | - |
0.8939 | 40 | 0.0015 | - | - | - | - | - |
0.9832 | 44 | - | 0.7442 | 0.7594 | 0.7465 | 0.7149 | 0.7046 |
1.1341 | 50 | 0.2207 | - | - | - | - | - |
1.3575 | 60 | 0.48 | - | - | - | - | - |
1.5810 | 70 | 0.003 | - | - | - | - | - |
1.8045 | 80 | 0.2985 | - | - | - | - | - |
1.9832 | 88 | - | 0.7751 | 0.774 | 0.7821 | 0.7746 | 0.7365 |
2.0447 | 90 | 0.0168 | - | - | - | - | - |
2.2682 | 100 | 0.0698 | - | - | - | - | - |
2.4916 | 110 | 0.0054 | - | - | - | - | - |
2.7151 | 120 | 0.0112 | - | - | - | - | - |
2.9385 | 130 | 0.0031 | - | - | - | - | - |
2.9832 | 132 | - | 0.7569 | 0.7537 | 0.7565 | 0.7588 | 0.7251 |
3.1788 | 140 | 0.1794 | - | - | - | - | - |
3.4022 | 150 | 0.3266 | - | - | - | - | - |
3.6257 | 160 | 0.0006 | - | - | - | - | - |
3.8492 | 170 | 0.0003 | - | - | - | - | - |
3.9832 | 176 | - | 0.7491 | 0.7613 | 0.7526 | 0.7513 | 0.7206 |
4.0894 | 180 | 0.2622 | - | - | - | - | - |
4.3128 | 190 | 0.0004 | - | - | - | - | - |
4.5363 | 200 | 0.0392 | - | - | - | - | - |
4.7598 | 210 | 0.3312 | - | - | - | - | - |
4.9832 | 220 | 0.0021 | 0.7548 | 0.7527 | 0.7466 | 0.7568 | 0.7101 |
5.2235 | 230 | 0.7593 | - | - | - | - | - |
5.4469 | 240 | 0.0004 | - | - | - | - | - |
5.6704 | 250 | 0.0003 | - | - | - | - | - |
5.8939 | 260 | 0.0154 | - | - | - | - | - |
5.9832 | 264 | - | 0.7498 | 0.7545 | 0.7510 | 0.7407 | 0.7147 |
6.1341 | 270 | 0.0162 | - | - | - | - | - |
6.3575 | 280 | 0.447 | - | - | - | - | - |
6.5810 | 290 | 0.001 | - | - | - | - | - |
6.8045 | 300 | 0.1628 | - | - | - | - | - |
6.9832 | 308 | - | 0.7465 | 0.7552 | 0.7510 | 0.7407 | 0.7078 |
- The bold row denotes the saved checkpoint.
Framework Versions
- Python: 3.11.11
- Sentence Transformers: 3.4.1
- Transformers: 4.48.3
- PyTorch: 2.3.1+cu121
- Accelerate: 1.3.0
- Datasets: 3.3.2
- Tokenizers: 0.21.0
Citation
BibTeX
Sentence Transformers
@inproceedings{reimers-2019-sentence-bert,
title = "Sentence-BERT: Sentence Embeddings using Siamese BERT-Networks",
author = "Reimers, Nils and Gurevych, Iryna",
booktitle = "Proceedings of the 2019 Conference on Empirical Methods in Natural Language Processing",
month = "11",
year = "2019",
publisher = "Association for Computational Linguistics",
url = "https://arxiv.org/abs/1908.10084",
}
MatryoshkaLoss
@misc{kusupati2024matryoshka,
title={Matryoshka Representation Learning},
author={Aditya Kusupati and Gantavya Bhatt and Aniket Rege and Matthew Wallingford and Aditya Sinha and Vivek Ramanujan and William Howard-Snyder and Kaifeng Chen and Sham Kakade and Prateek Jain and Ali Farhadi},
year={2024},
eprint={2205.13147},
archivePrefix={arXiv},
primaryClass={cs.LG}
}
MultipleNegativesRankingLoss
@misc{henderson2017efficient,
title={Efficient Natural Language Response Suggestion for Smart Reply},
author={Matthew Henderson and Rami Al-Rfou and Brian Strope and Yun-hsuan Sung and Laszlo Lukacs and Ruiqi Guo and Sanjiv Kumar and Balint Miklos and Ray Kurzweil},
year={2017},
eprint={1705.00652},
archivePrefix={arXiv},
primaryClass={cs.CL}
}
Model tree for tsss1/expressvpn_embeddingmodel
FacebookAI/xlm-roberta-base (base model) → nomic-ai/nomic-xlm-2048 (fine-tuned) → nomic-ai/nomic-embed-text-v2-moe (fine-tuned) → tsss1/expressvpn_embeddingmodel