SentenceTransformer based on sentence-transformers/all-MiniLM-L6-v2

This is a sentence-transformers model fine-tuned from sentence-transformers/all-MiniLM-L6-v2 on the bigbio/pubhealth dataset. It maps sentences and paragraphs to a 384-dimensional dense vector space and can be used for semantic textual similarity, semantic search, paraphrase mining, text classification, clustering, and more.

Model Details

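Model Description

  • Model Type: Sentence Transformer
  • Base model: sentence-transformers/all-MiniLM-L6-v2
  • Maximum Sequence Length: 256 tokens
  • Output Dimensionality: 384 dimensions
  • Training Dataset: bigbio/pubhealth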

Model Sources

Full Model Architecture

SentenceTransformer(
  (0): Transformer({'max_seq_length': 256, 'do_lower_case': False}) with Transformer model: BertModel 
  (1): Pooling({'word_embedding_dimension': 384, 'pooling_mode_cls_token': False, 'pooling_mode_mean_tokens': True, 'pooling_mode_max_tokens': False, 'pooling_mode_mean_sqrt_len_tokens': False, 'pooling_mode_weightedmean_tokens': False, 'pooling_mode_lasttoken': False, 'include_prompt': True})
  (2): Normalize()
)
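
The Pooling and Normalize stages above can be illustrated with a minimal, dependency-free sketch. It assumes mean pooling over non-padding tokens followed by L2 normalization, as the configuration indicates; the function names and the toy 4-dimensional vectors are hypothetical and stand in for the model's 384-dimensional token embeddings.

```python
import math


def mean_pool(token_embeddings, attention_mask):
    """Average the token vectors whose mask entry is 1 (padding is skipped)."""
    dim = len(token_embeddings[0])
    summed = [0.0] * dim
    count = 0
    for vector, keep in zip(token_embeddings, attention_mask):
        if keep:
            count += 1
            for i, value in enumerate(vector):
                summed[i] += value
    return [s / max(count, 1) for s in summed]


def l2_normalize(vector, eps=1e-12):
    """Scale a vector to unit Euclidean length."""
    norm = math.sqrt(sum(v * v for v in vector))
    return [v / max(norm, eps) for v in vector]


# Toy input: three token vectors of dimension 4; the last one is padding.
tokens = [[1.0, 2.0, 0.0, 0.0], [3.0, 0.0, 0.0, 4.0], [9.0, 9.0, 9.0, 9.0]]
mask = [1, 1, 0]

sentence_embedding = l2_normalize(mean_pool(tokens, mask))
```

Because the final vector has unit length, the dot product of two such embeddings equals their cosine similarity.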

Usage

Direct Usage (Sentence Transformers)

First install the Sentence Transformers library:

pip install -U sentence-transformers

Then you can load this model and run inference.

from sentence_transformers import SentenceTransformer

# Download from the 🤗 Hub
model = SentenceTransformer("vladargunov/pubhealth-sentence-similarity")
# Run inference
sentences = [
    '"""A chain message circulating on messaging apps claims the United States is about to enter a period of federally mandated quarantine. The source: """"my aunt’s friend"""" who works for the government. There is no evidence of this. The message, which a reader sent us a screenshot of on March 16, appears in a group chat on iMessage. The sender claims to have information from """"my aunt\'s friend"""" who works for the Centers for Disease Control and Prevention and """"just got out of a meeting with Trump."""" """"He’s announcing tomorrow that the U.S. is going into quarantine for the next 14 days,"""" the message reads. """"Meaning everyone needs to stay in their homes/where they are."""" We’ve seen screenshots of similar messages circulating on WhatsApp, a private messaging app that’s popular abroad. Misinformation tends to get passed around via chain messages during major news events, so we looked into this one. (Screenshots) There is no evidence that the federal government is set to announce a nationwide lockdown like the ones seen in France, Italy and Spain. President Donald Trump and the National Security Council have both refuted the claim. So far, officials have advised Americans to practice """"social distancing,"""" or avoiding crowded public spaces. In a press conference March 16, Trump outlined several recommendations to prevent the spread of the coronavirus. Among them is avoiding gatherings of 10 or more people. """"My administration is recommending that all Americans, including the young and healthy, work to engage in schooling from home when possible, avoid gathering in groups of more than 10 people, avoid discretionary travel and avoid eating and drinking in bars, restaurants and public food courts,"""" he said. In response to a question, he said the administration is not considering a national curfew or quarantine. He reiterated that point in another press conference March 17. """"It’s a very big step. It’s something we talk about, but we haven’t decided to do that,"""" he said. Andrew Cuomo ordered a one-mile containment zone on March 10. Large gathering spots were closed for 14 days and National Guard troops are delivering food to people. In the San Francisco Bay Area, local officials on March 16 announced sweeping measures to try to contain the coronavirus. Residents of six counties have been ordered to """"shelter in place"""" in their homes and stay away from others as much as possible for the next three weeks. The move falls short of a total lockdown. At the federal level, the CDC does have the power to quarantine people who may have come in contact with someone infected by the coronavirus, but most quarantines are done voluntarily. And decisions are usually left up to states and localities. We reached out to the CDC for comment on the chain message, but we haven’t heard back. The chain message is inaccurate. If you receive a chain message that you want us to fact-check, send a screenshot to [email\xa0protected]."""',
    'Drug overdoses are now the second-most common cause of death in New Hampshire.',
    'Treadmill classes mix it up with workhorse of the gym.',
]
embeddings = model.encode(sentences)
print(embeddings.shape)
# (3, 384)

# Get the similarity scores for the embeddings
similarities = model.similarity(embeddings, embeddings)
print(similarities.shape)
# torch.Size([3, 3])
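
To make the similarity step concrete, here is a small sketch of what a pairwise similarity matrix contains, assuming cosine similarity is the scoring function (to my understanding this is the default for model.similarity, but treat that as an assumption). The helper names and the 2-dimensional toy vectors are hypothetical; real embeddings from this model have 384 dimensions.

```python
import math


def cosine_similarity(a, b):
    """Cosine of the angle between two vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


def similarity_matrix(embeddings):
    """Pairwise cosine similarities; entry [i][j] compares item i with item j."""
    return [[cosine_similarity(a, b) for b in embeddings] for a in embeddings]


# Toy stand-ins for three sentence embeddings.
toy_embeddings = [[1.0, 0.0], [0.6, 0.8], [0.0, 1.0]]
scores = similarity_matrix(toy_embeddings)

# Index of the item most similar to item 0, ignoring item 0 itself.
best_match = max(
    (j for j in range(len(scores)) if j != 0),
    key=lambda j: scores[0][j],
)
```

The diagonal of the matrix is always 1, since every embedding is identical to itself, so ranking code usually skips the query's own index as done here.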

Training Details

Training Dataset

bigbio/pubhealth

  • Dataset: bigbio/pubhealth
  • Size: 16,158 training samples
  • Columns: sentence2, sentence1, and score
  • Approximate statistics based on the first 1000 samples:
              sentence2             sentence1            score
    type      string                string               int
    details   min: 91 tokens        min: 6 tokens        0: 100.00%
              mean: 246.21 tokens   mean: 21.43 tokens
              max: 256 tokens       max: 96 tokens
  • Samples:
    sentence2 sentence1 score
    """Hillary Clinton is in the political crosshairs as the author of a new book alleges improper financial ties between her public and personal life. At issue in conservative author Peter Schweizer’s forthcoming book Clinton Cash are donations from foreign governments to the Clinton Foundation during the four years she served as secretary of state. George Stephanopoulos used an interview with Schweizer on ABC This Week to point out what other nonpartisan journalists have found: There is no """"smoking gun"""" showing that donations to the foundation influenced her foreign policy decisions. Still, former Republican House Speaker Newt Gingrich says the donations are """"clearly illegal"""" under federal law. In his view, a donation by a foreign government to the Clinton Foundation while Clinton was secretary of state is the same as money sent directly to her, he said, even though she did not join the foundation’s board until she left her post. """"The Constitution of the United States says you cannot take money from foreign governments without explicit permission of the Congress. They wrote that in there because they knew the danger of corrupting our system by foreign money is enormous,"""" Gingrich said. """"You had a sitting secretary of state whose husband radically increased his speech fees, you have a whole series of dots on the wall now where people gave millions of dollars — oh, by the way, they happen to get taken care of by the State Department."""" He continued, """"My point is they took money from foreign governments while she was secretary of State. That is clearly illegal."""" PunditFact wanted to know if a criminal case against Clinton is that open and shut. Is what happened """"clearly illegal""""? A spokesman for the Clinton Foundation certainly disagreed, calling Gingrich’s accusation """"a baseless leap"""" because Clinton was not part of her husband’s foundation while serving as a senator or secretary of state. 
We did not hear from Gingrich by our deadline. Foundation basics Former President Clinton started the William J. Clinton Foundation in 2001, the year after Hillary Clinton won her first term as a New York senator. The foundation works with non-governmental organizations, the private sector and governments around the world on health, anti-poverty, HIV/AIDS and climate change initiatives. Spokesman Craig Minassian said it’s reasonable for the foundation to accept money from foreign governments because of the global scope of its programs, and the donations are usually in the form of tailored grants for specific missions. Hillary Clinton was not part of her husband’s foundation while she was a senator or secretary of state. Her appointment to the latter post required Senate confirmation and came with an agreement between the White House and Clinton Foundation that the foundation would be more transparent about its donors. According to the 2008 memorandum of understanding, the foundation would release information behind new donations and could continue to collect donations from countries with which it had existing relationships or running grant programs. If countries with existing contributions significantly stepped up their contributions, or if a new foreign government wanted to donate, the State Department would have to approve. Clinton took an active role in fundraising when she left the State Department and the foundation became the Bill, Hillary & Chelsea Clinton Foundation in 2013. But she left the board when she announced her run for the presidency in April 2015. The Emoluments Clause So how does Gingrich come up with the claim that Clinton Foundation donations are """"clearly illegal"""" and unconstitutional? The answer is something known as the Emoluments Clause. A few conservative websites have made similar arguments in recent days, including the Federalist blog. 
The Emoluments Clause, found in Article 1, Section 9 of the Constitution, reads in part: """"No Title of Nobility shall be granted by the United States: And no Person holding any Office of Profit or Trust under them, shall, without the Consent of the Congress, accept of any present, Emolument, Office, or Title, of any kind whatever, from any King, Prince, or foreign State."""" The framers came up with this clause to prevent the government and leaders from granting or receiving titles of nobility and to keep leaders free of external influence. (An emolument, per Merriam-Webster Dictionary, is """"the returns arising from office or employment usually in the form of compensation or perquisites."""") Lest you think the law is no longer relevant, the Pentagon ethics office in 2013 warned employees the """"little known provision"""" applies to all federal employees and military retirees. There’s no mention of spouses in the memo. J. Peter Pham, director of the Atlantic Council’s Africa Center, said interpretation of the clause has evolved since its adoption at the Constitutional Convention, when the primary concern was about overseas diplomats not seeking gifts from foreign powers they were dealing with. The Defense Department memo, in his view, goes beyond what the framers envisioned for the part of the memo dealing with gifts. """"I think that, aside from the unambiguous parts, the burden would be on those invoking the clause to show actual causality that would be in violation of the clause,"""" Pham said. Expert discussion We asked seven different constitutional law experts on whether the Clinton Foundation foreign donations were """"clearly illegal"""" and a violation of the Emoluments Clause. We did not reach a consensus with their responses, though a majority thought the layers of separation between the foundation and Hillary Clinton work against Gingrich. 
The American system often distinguishes between public officers and private foundations, """"even if real life tends to blur some of those distinctions,"""" said American University law professor Steve Vladeck. Vladeck added that the Emoluments Clause has never been enforced. """"I very much doubt that the first case in its history would be because a foreign government made charitable donations to a private foundation controlled by a government employee’s relative,"""" he said. """"Gingrich may think that giving money to the Clinton Foundation and giving money to then-Secretary Clinton are the same thing. Unfortunately for him, for purposes of federal regulations, statutes, and the Constitution, they’re formally — and, thus, legally — distinct."""" Robert Delahunty, a University of St. Thomas constitutional law professor who worked in the Justice Department’s Office of Legal Counsel from 1989 to 2003, also called Gingrich’s link between Clinton and the foreign governments’ gifts to the Clinton Foundation as """"implausible, and in any case I don’t think we have the facts to support it."""" """"The truth is that we establish corporate bodies like the Clinton Foundation because the law endows these entities with a separate and distinct legal personhood,"""" Delahunty said. John Harrison, University of Virginia law professor and former deputy assistant attorney general in the Office of Legal Counsel from 1990 to 1993, pointed to the Foreign Gifts Act, 5 U.S.C. 7432, which sets rules for how the Emoluments Clause should work in practice. 
The statute spells out the minimal value for acceptable gifts, and says it applies to spouses of the individuals covered, but """"it doesn’t say anything about receipt of foreign gifts by other entities such as the Clinton Foundation."""" """"I don’t know whether there’s any other provision of federal law that would treat a foreign gift to the foundation as having made to either of the Clintons personally,"""" Harrison said, who added that agencies have their own supplemental rules for this section, and he did not know if the State Department addressed this. Other experts on the libertarian side of the scale thought Gingrich was more right in his assertion. Clinton violates the clause because of its intentionally broad phrasing about gifts of """"any kind whatever,"""" which would cover indirect gifts via the foundation, said Dave Kopel, a constitutional law professor at Denver University and research director at the libertarian Independence Institute. Kopel also brought up bribery statutes, which would require that a gift had some influence in Clinton’s decision while secretary of state. Delahunty thought Kopel’s reasoning would have """"strange consequences,"""" such as whether a state-owned airline flying Bill Clinton to a conference of former heads of state counted as a gift to Hillary Clinton. Our ruling Gingrich said the Clinton Foundation """"took money from from foreign governments while (Hillary Clinton) was secretary of state. It is clearly illegal. … The Constitution says you can’t take this stuff."""" A clause in the Constitution does prohibit U.S. officials such as former Secretary of State Hillary Clinton from receiving gifts, or emoluments, from foreign governments. But the gifts in this case were donations from foreign governments that went to the Clinton Foundation, not Hillary Clinton. She was not part of the foundation her husband founded while she was secretary of state. Does that violate the Constitution? 
Some libertarian-minded constitutional law experts say it very well could. Others are skeptical. What’s clear is there is room for ambiguity, and the donations are anything but """"clearly illegal."""" The reality is this a hazy part of U.S. constitutional law." Britain plans for opt-out organ donation scheme to save lives. 0
    The story does discuss costs, but the framing is problematic. The story, based on a conversation with one source, the study’s lead investigator, says, “It’s difficult at this point to predict costs. However, he expects costs will not approach those for Provenge, the pricey treatment vaccine for prostate cancer approved by the FDA in 2010. Provenge costs $93,000 for the one-month, three-dose treatment. Medicare covers it.” This tells readers that, no matter what the drug costs, Medicare likely will cover it. We appreciate the effort to bring cost information into the story, but this type of information is misleading. The story does explain that only one patient remains cancer free following the study. It then details how for most of the patients cancer continued to progress after 2 months. It says that the median overall survival in both the breast cancer and ovarian cancer patients was less than 16 months. But the story is framed in such a way to highlight the one potentially positive outcome of the study and to downplay the negative. We read more sooner about the one patient who may have responded well to the vaccine than we do about the 25 other patients who did not. The story mentions side effects in a satisfactory way. Technically, the story provides readers with much of the information they would need to assess the validity of the study, but it comes out in bits and pieces. For example, we only find out near the end of the story that “The woman, who remains disease-free, had a previous treatment with a different treatment vaccine. ‘That might have primed her immune system,’ Gulley speculates. She also had only one regimen of chemotherapy, perhaps keeping her immune system stronger.” This casts much doubt on the study’s design, and it would have been nice to have seen some outside expertise brought in to either discuss those design problems or to torpedo the story altogether. 
Again, the story deserves high marks for being very specific in the lead and throughout the story. It says, that the vaccine is “for breast and ovarian cancer that has spread to other parts of the body” in the lead and later details the particular circumstances of the study cohort. It says, “The patients had already undergone a variety of treatments but the cancer was progressing. Twenty one of the 26 had undergone three or more chemotherapy regimens.” This is the root of the story’s main shortcoming. Almost all of the information in the story comes from one source: Dr. James Gulley, who oversaw the study. Gulley is quite enthusiastic about this vaccine, despite the evidence, and the story needed more perspectives to put this vaccine into a broader context. At the very end, there are a few comments from Dr. Vincent K. Tuohy, who also is working on a breast cancer vaccine. Because of his competing research, he seems to have a conflict, but even putting that aside, his comments were not used to their best effect. There was no comparison in the story to existing alternatives. The median survival, for example, is presented without the context of how long these patients might have lived had they been undergoing standard chemotherapy and radiation treatments. We give high marks to the story for saying right in the lead that the findings are from “a preliminary study in 26 patients.” That tells readers both that the findings need to be interpreted with caution and that the treatment is not available to most people. The concept of vaccines for breast/ovarian cancer is indeed novel, and the story acknowledges that other vaccines are being studied. The story does not rely on a news release. Virus raises specter of gravest attacks in modern US times. 0
    """Although the story didn’t cite the cost of appendectomy – emergency or urgent surgery – and we wish it had, we nonetheless will give it a satisfactory score because it at least cited what the editorial writer wrote, """"A secondary benefit is the savings to the hospital generated by minimizing staff and anesthesiologist presence late in the evening and during the wee hours of the morning."""" As with our harms score above, although the story didn’t give absolute numbers, in this case we think it was sufficient for it to report that """"The scientists found no significant difference among the groups in the patients’ condition 30 days after surgery or in the length of their operation or hospital stay."""" Although the story didn’t give absolute numbers, in this case we think it was sufficient for it to report that """"The scientists found no significant difference among the groups in the patients’ condition 30 days after surgery or in the length of their operation or hospital stay."""" Despite running less than 300 words, this story did an adequate job in explaining the quality of the evidence, including pointing out limitations. No disease-mongering here. The story meets the bare minimum requirement for this criterion in that it at least cited what an editorial stated. The focus of the story was on a study comparing emergency appendectomy with surgery done up to 12 hours later or beyond. This is the whole focus of the story – and one we applaud – when it begins:  """"Appendectomy is the most common emergency surgery in the world, but it doesn’t have to be."""" There were no claims made about the novelty of this research, and we may have wished for a bit more context on this. Nonetheless, the potential for guiding future care decisions was made clear. Not applicable. 
Given that the story only pulled excerpts from the journal article and the accompanying editorial, and didn’t include any fresh quotes from interviews, we can’t be sure of the extent to which it may have been influenced by a news release.""" Legionnaires’ case identified at Quincy veterans’ home. 0
  • Loss: CosineSimilarityLoss with these parameters:
    {
        "loss_fct": "torch.nn.modules.loss.MSELoss"
    }
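
As a rough illustration of the objective named above, the sketch below computes the mean squared error between the cosine similarity of each embedding pair and its gold score. It is a simplified stand-in under that assumption, not the library's implementation; the function names and toy vectors are hypothetical.

```python
import math


def cosine_similarity(a, b):
    """Cosine of the angle between two vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


def cosine_similarity_mse(pairs, gold_scores):
    """Mean squared error between predicted cosine similarities and gold scores."""
    errors = [
        (cosine_similarity(u, v) - score) ** 2
        for (u, v), score in zip(pairs, gold_scores)
    ]
    return sum(errors) / len(errors)


# Two toy pairs: identical vectors (similarity 1) and orthogonal vectors (similarity 0).
pairs = [([1.0, 0.0], [1.0, 0.0]), ([1.0, 0.0], [0.0, 1.0])]

loss_matching = cosine_similarity_mse(pairs, [1.0, 0.0])  # targets agree with predictions
loss_all_zero = cosine_similarity_mse(pairs, [0.0, 0.0])  # first pair is penalized
```

With a target of 0 for every pair, the loss pushes the two embeddings of a pair toward orthogonality, which is worth keeping in mind given the score statistics listed above.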
    

Training Hyperparameters

Non-Default Hyperparameters

  • per_device_train_batch_size: 128
  • learning_rate: 2e-05
  • num_train_epochs: 10
  • warmup_ratio: 0.1
  • batch_sampler: no_duplicates
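
The hyperparameters above imply a concrete step count and learning-rate curve. The sketch below works through the arithmetic under several assumptions: a single device, no gradient accumulation, the last partial batch kept, a batch sampler that does not change the number of batches, and a linear scheduler that warms up from zero and then decays to zero. The names are hypothetical and the code does not call the training library.

```python
import math

TRAIN_SAMPLES = 16_158  # training set size reported in this card
BATCH_SIZE = 128        # per_device_train_batch_size
EPOCHS = 10             # num_train_epochs
BASE_LR = 2e-5          # learning_rate
WARMUP_RATIO = 0.1      # warmup_ratio

steps_per_epoch = math.ceil(TRAIN_SAMPLES / BATCH_SIZE)
total_steps = steps_per_epoch * EPOCHS
# Assumption: warmup length is the warmup ratio times the total step count.
warmup_steps = round(WARMUP_RATIO * total_steps)


def linear_lr(step):
    """Learning rate at a given optimizer step: linear warmup, then linear decay."""
    if step < warmup_steps:
        return BASE_LR * step / warmup_steps
    remaining = max(0, total_steps - step)
    return BASE_LR * remaining / (total_steps - warmup_steps)
```

Under these assumptions the run has 127 steps per epoch and 1,270 steps in total, with the learning rate peaking at 2e-05 after the first 127 steps.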

All Hyperparameters

  • overwrite_output_dir: False
  • do_predict: False
  • eval_strategy: no
  • prediction_loss_only: True
  • per_device_train_batch_size: 128
  • per_device_eval_batch_size: 8
  • per_gpu_train_batch_size: None
  • per_gpu_eval_batch_size: None
  • gradient_accumulation_steps: 1
  • eval_accumulation_steps: None
  • learning_rate: 2e-05
  • weight_decay: 0.0
  • adam_beta1: 0.9
  • adam_beta2: 0.999
  • adam_epsilon: 1e-08
  • max_grad_norm: 1.0
  • num_train_epochs: 10
  • max_steps: -1
  • lr_scheduler_type: linear
  • lr_scheduler_kwargs: {}
  • warmup_ratio: 0.1
  • warmup_steps: 0
  • log_level: passive
  • log_level_replica: warning
  • log_on_each_node: True
  • logging_nan_inf_filter: True
  • save_safetensors: True
  • save_on_each_node: False
  • save_only_model: False
  • restore_callback_states_from_checkpoint: False
  • no_cuda: False
  • use_cpu: False
  • use_mps_device: False
  • seed: 42
  • data_seed: None
  • jit_mode_eval: False
  • use_ipex: False
  • bf16: False
  • fp16: False
  • fp16_opt_level: O1
  • half_precision_backend: auto
  • bf16_full_eval: False
  • fp16_full_eval: False
  • tf32: None
  • local_rank: 0
  • ddp_backend: None
  • tpu_num_cores: None
  • tpu_metrics_debug: False
  • debug: []
  • dataloader_drop_last: False
  • dataloader_num_workers: 0
  • dataloader_prefetch_factor: None
  • past_index: -1
  • disable_tqdm: False
  • remove_unused_columns: True
  • label_names: None
  • load_best_model_at_end: False
  • ignore_data_skip: False
  • fsdp: []
  • fsdp_min_num_params: 0
  • fsdp_config: {'min_num_params': 0, 'xla': False, 'xla_fsdp_v2': False, 'xla_fsdp_grad_ckpt': False}
  • fsdp_transformer_layer_cls_to_wrap: None
  • accelerator_config: {'split_batches': False, 'dispatch_batches': None, 'even_batches': True, 'use_seedable_sampler': True, 'non_blocking': False, 'gradient_accumulation_kwargs': None}
  • deepspeed: None
  • label_smoothing_factor: 0.0
  • optim: adamw_torch
  • optim_args: None
  • adafactor: False
  • group_by_length: False
  • length_column_name: length
  • ddp_find_unused_parameters: None
  • ddp_bucket_cap_mb: None
  • ddp_broadcast_buffers: False
  • dataloader_pin_memory: True
  • dataloader_persistent_workers: False
  • skip_memory_metrics: True
  • use_legacy_prediction_loop: False
  • push_to_hub: False
  • resume_from_checkpoint: None
  • hub_model_id: None
  • hub_strategy: every_save
  • hub_private_repo: False
  • hub_always_push: False
  • gradient_checkpointing: False
  • gradient_checkpointing_kwargs: None
  • include_inputs_for_metrics: False
  • eval_do_concat_batches: True
  • fp16_backend: auto
  • push_to_hub_model_id: None
  • push_to_hub_organization: None
  • mp_parameters:
  • auto_find_batch_size: False
  • full_determinism: False
  • torchdynamo: None
  • ray_scope: last
  • ddp_timeout: 1800
  • torch_compile: False
  • torch_compile_backend: None
  • torch_compile_mode: None
  • dispatch_batches: None
  • split_batches: None
  • include_tokens_per_second: False
  • include_num_input_tokens_seen: False
  • neftune_noise_alpha: None
  • optim_target_modules: None
  • batch_eval_metrics: False
  • batch_sampler: no_duplicates
  • multi_dataset_batch_sampler: proportional

Training Logs

Epoch Step Training Loss
0.7874 100 0.0603
1.5748 200 0.131
2.3622 300 0.1188
3.1496 400 0.1173
3.9370 500 0.0551
4.7244 600 0.0622
5.5118 700 0.0454
6.2992 800 0.0521
7.0866 900 0.0478
7.8740 1000 0.0403
8.6614 1100 0.035
9.4488 1200 0.0386
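
The fractional values in the Epoch column can be reproduced from the Step column. The sketch below assumes 127 optimizer steps per epoch (16,158 samples at batch size 128, rounded up), which is an inference from the numbers in this card rather than something the logs state; the variable names are hypothetical.

```python
STEPS_PER_EPOCH = 127  # assumption: ceil(16,158 samples / batch size 128)

# (step, logged epoch) pairs copied from the table above.
logged = [
    (100, 0.7874), (200, 1.5748), (300, 2.3622), (400, 3.1496),
    (500, 3.9370), (600, 4.7244), (700, 5.5118), (800, 6.2992),
    (900, 7.0866), (1000, 7.8740), (1100, 8.6614), (1200, 9.4488),
]


def epoch_at(step):
    """Fractional epoch reached after a given number of optimizer steps."""
    return step / STEPS_PER_EPOCH


# Rows whose logged epoch differs from step / 127 beyond 4-decimal rounding.
mismatches = [
    (step, epoch) for step, epoch in logged
    if abs(epoch_at(step) - epoch) >= 5e-5
]
```

An empty mismatches list means every logged row is consistent with logging every 100 steps at 127 steps per epoch.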

Framework Versions

  • Python: 3.10.13
  • Sentence Transformers: 3.0.1
  • Transformers: 4.41.2
  • PyTorch: 2.1.2
  • Accelerate: 0.30.1
  • Datasets: 2.19.2
  • Tokenizers: 0.19.1

Citation

BibTeX

Sentence Transformers

@inproceedings{reimers-2019-sentence-bert,
    title = "Sentence-BERT: Sentence Embeddings using Siamese BERT-Networks",
    author = "Reimers, Nils and Gurevych, Iryna",
    booktitle = "Proceedings of the 2019 Conference on Empirical Methods in Natural Language Processing",
    month = "11",
    year = "2019",
    publisher = "Association for Computational Linguistics",
    url = "https://arxiv.org/abs/1908.10084",
}