# SetFit with sentence-transformers/all-mpnet-base-v2
This is a SetFit model that can be used for Text Classification. It uses sentence-transformers/all-mpnet-base-v2 as the Sentence Transformer embedding model and a LogisticRegression instance as the classification head.
The model has been trained using an efficient few-shot learning technique that involves:
- Fine-tuning a Sentence Transformer with contrastive learning.
- Training a classification head with features from the fine-tuned Sentence Transformer.
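As a rough sketch of that two-phase procedure (not this card's actual training script; the two example texts below are invented stand-ins for the real training data), the `setfit` Trainer runs both phases in a single call:

```python
from datasets import Dataset
from setfit import SetFitModel, Trainer, TrainingArguments

# Invented stand-in texts; the card's real "yes"/"no" examples are shown under Model Labels
train_dataset = Dataset.from_dict({
    "text": [
        "Microsoft doubles down on its investment in OpenAI and ChatGPT.",
        "Gas appliances kept my house running through the winter storm.",
    ],
    "label": ["yes", "no"],
})

# The body is the Sentence Transformer; SetFit attaches a LogisticRegression head by default
model = SetFitModel.from_pretrained(
    "sentence-transformers/all-mpnet-base-v2",
    labels=["no", "yes"],
)

trainer = Trainer(
    model=model,
    args=TrainingArguments(batch_size=16, num_epochs=1),
    train_dataset=train_dataset,
)

# Phase 1: contrastive fine-tuning of the embedding body on sentence pairs;
# Phase 2: fitting the classification head on the fine-tuned embeddings
trainer.train()
```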
## Model Details

### Model Description
- Model type: SetFit
- Sentence Transformer body: sentence-transformers/all-mpnet-base-v2
- Classification head: a LogisticRegression instance
- Number of classes: 2 (yes, no)

### Model Sources
- Paper: Efficient Few-Shot Learning Without Prompts (https://arxiv.org/abs/2209.11055)
- Repository: https://github.com/huggingface/setfit
### Model Labels

| Label | Examples |
|:------|:---------|
| yes | <ul><li>'MS: Invests $10B into ChatGPT and then immediately lays off 10,000 workers to pay for it.\n'</li><li>'Skepticism aside, it's way too late to stop or even realistically control A.I. The genie is literally out of the bottle, with more sophisticated iterations of A.I. to come. There's too much financial momentum behind it. OpenAI, the research lab behind the viral ChatGPT chatbot, is in talks to sell existing shares in a tender offer that would value the company at around $29 billion, making it one of the most valuable U.S. startups on paper. Microsoft Corp. has also been in advanced talks to increase its investment in OpenAI. In 2019, Microsoft invested $1 billion in OpenAI and became its preferred partner for commercializing new technologies for services like the search engine Bing and the design app Microsoft Design. Other backers include Tesla CEO Elon Musk, LinkedIn co-founder Reid Hoffman. There are over 100 AI companies developing various Machine learning tasks, new features coming daily. ChatGPT is a genuine productivity boost and a technological wonder. It can write code in Python, TypeScript, and many other languages at my command. It does have bugs in the code, but they are fixable. The possibilities are endless. I can't imagine what version 2.0 or 3.0 would look like. For better and/or worse, this is the future. It is incredible, even at this early stage. This technology is mind-blowing and will unquestionably change the world. As Victor Hugo said, " A force more powerful than all of the armies in the world is an idea whose time has come." Indeed it has.\n'</li><li>'Microsoft Bets Big on the Creator of ChatGPT in Race to Dominate A.I. As a new chatbot wows the world with its conversational talents, a resurgent tech giant is poised to reap the benefits while doubling down on a relationship with the start-up OpenAI. When a chatbot called ChatGPT hit the internet late last year, executives at a number of Silicon Valley companies worried they were suddenly dealing with new artificial intelligence technology that could disrupt their businesses. As a new chatbot wows the world with its conversational talents, a resurgent tech giant is poised to reap the benefits while doubling down on a relationship with the start-up OpenAI.\n'</li></ul> |
| no | <ul><li>"The tragedy of this war, any war, is overwhelming. A city of 100,000 reduced to ruble and the smell of corpses. One can easily imagine all the families who went about their lives prior to the invasion. Schools ringing with children sounds. Shops and eateries filled with patrons, exchanging smiles, saying hello, friends getting together. Homes secure, places of family warmth, humor, love. All gone. Gone in this lifetime. Gone in the blink of a mad man's perverted notion of his needs. We have our mad men and women too - in our congress. We just saw their shameful show. Just the appetizer for a lousy meal to come. In response to the brave Ukrainians who resist, who fight and die, will the mad ones in the new congress stand for freedom or turn away?Will they do as the French did 250 years ago when they came to our aid against a king or will they allow King Putin to have his way?Americans have freedom in their blood. Make that blood boil if this congress forgets that and turns its back on the fight against a king.\n"</li><li>'The dangers of gas stoves are found in only a few studies funded by anti-fossil fuel groups. Anyone who distrusts studies by Exxon, big pharma, big tobacco, should be skeptical of these as well."The science" (tm) does not support these studies that proport to say that gas stoves are a specific problem. NO(x) forms at 2800 F under high pressure, and typically from Nitrogen in the fuel, not the air, where it is relatively stable, being bound to another Nitrogen as N2. Natural gas does not contain Nitrogen, and cooktops do not operate at high pressure. Likewise, natural gas, burning in excess air (open flame) does not produce significant CO. It is indeed a clean burning fuel.Cooking does release particulates and gasses, smoke and smells, but that does not depend on how the food is heated. Cooking bacon smells the same on electric or gas or charcoal or wood (may actually smell better on wood and charcoal) or dung (well maybe not dung).\n'</li><li>'When my electricity goes down due to winter storms, I still have hot water for showers, a place to cook food and heat all via my gas water heater, gas fireplace and gas cooktop. Easy to ignite with a match. We can briefly open windows to air out fumes. I’ll never willingly go all electric.\n'</li></ul> |
## Evaluation

### Metrics

## Uses

### Direct Use for Inference
First install the SetFit library:
```bash
pip install setfit
```
Then you can load this model and run inference.
```python
from setfit import SetFitModel

# Download the model from the 🤗 Hub
model = SetFitModel.from_pretrained("davidadamczyk/my-awesome-setfit-model")
# Run inference on a single text
preds = model("“Amid this dynamic environment, we delivered record results in fiscal year 2022: We reported $198 billion in revenue and $83 billion in operating income. And the Microsoft Cloud surpassed $100 billion in annualized revenue for the first time.”- From Microsoft’s 2022 Annual Report Shareholder’s Letter\n")
```
## Training Details

### Training Set Metrics

| Training set | Min | Median  | Max |
|:-------------|:----|:--------|:----|
| Word count   | 13  | 132.875 | 296 |
| Label | Training Sample Count |
|:------|:----------------------|
| no    | 18                    |
| yes   | 22                    |
### Training Hyperparameters
- batch_size: (16, 16)
- num_epochs: (1, 1)
- max_steps: -1
- sampling_strategy: oversampling
- num_iterations: 20
- body_learning_rate: (2e-05, 2e-05)
- head_learning_rate: 2e-05
- loss: CosineSimilarityLoss
- distance_metric: cosine_distance
- margin: 0.25
- end_to_end: False
- use_amp: False
- warmup_proportion: 0.1
- l2_weight: 0.01
- seed: 42
- eval_max_steps: -1
- load_best_model_at_end: False
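These entries map one-to-one onto `setfit`'s TrainingArguments; a sketch of reconstructing the same configuration (the loss and distance metric are the sentence-transformers classes named above):

```python
from sentence_transformers.losses import BatchHardTripletLossDistanceFunction, CosineSimilarityLoss
from setfit import TrainingArguments

# Tuple values apply to the (embedding body, classification head) phases respectively
args = TrainingArguments(
    batch_size=(16, 16),
    num_epochs=(1, 1),
    max_steps=-1,
    sampling_strategy="oversampling",
    num_iterations=20,
    body_learning_rate=(2e-05, 2e-05),
    head_learning_rate=2e-05,
    loss=CosineSimilarityLoss,
    distance_metric=BatchHardTripletLossDistanceFunction.cosine_distance,
    margin=0.25,
    end_to_end=False,
    use_amp=False,
    warmup_proportion=0.1,
    l2_weight=0.01,
    seed=42,
    eval_max_steps=-1,
    load_best_model_at_end=False,
)
```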
### Training Results

| Epoch | Step | Training Loss | Validation Loss |
|:------|:-----|:--------------|:----------------|
| 0.01  | 1    | 0.3469        | -               |
| 0.5   | 50   | 0.0603        | -               |
| 1.0   | 100  | 0.0011        | -               |
### Framework Versions
- Python: 3.10.13
- SetFit: 1.1.0
- Sentence Transformers: 3.0.1
- Transformers: 4.45.2
- PyTorch: 2.4.0+cu124
- Datasets: 2.21.0
- Tokenizers: 0.20.0
## Citation

### BibTeX
```bibtex
@article{https://doi.org/10.48550/arxiv.2209.11055,
    doi = {10.48550/ARXIV.2209.11055},
    url = {https://arxiv.org/abs/2209.11055},
    author = {Tunstall, Lewis and Reimers, Nils and Jo, Unso Eun Seo and Bates, Luke and Korat, Daniel and Wasserblat, Moshe and Pereg, Oren},
    keywords = {Computation and Language (cs.CL), FOS: Computer and information sciences, FOS: Computer and information sciences},
    title = {Efficient Few-Shot Learning Without Prompts},
    publisher = {arXiv},
    year = {2022},
    copyright = {Creative Commons Attribution 4.0 International}
}
```