SetFit

This is a SetFit model for text classification. It uses a LogisticRegression instance as its classification head.

The model has been trained using an efficient few-shot learning technique that involves:

  1. Fine-tuning a Sentence Transformer with contrastive learning.
  2. Training a classification head with features from the fine-tuned Sentence Transformer.
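The first step trains on sentence pairs rather than on single labeled sentences. Below is a minimal sketch of how such pairs can be built. It assumes the classic SetFit scheme: in every iteration, each sentence gets one partner with the same label (target similarity 1.0) and one with a different label (target 0.0). A CosineSimilarityLoss then pushes the cosine similarity of the two embeddings toward that target. `generate_pairs` is a hypothetical helper, not part of the setfit API.

```python
import random
from typing import Dict, List, Tuple


def generate_pairs(
    sentences: List[str],
    labels: List[int],
    num_iterations: int,
    seed: int = 42,
) -> List[Tuple[str, str, float]]:
    """Hypothetical helper: build (anchor, partner, target) contrastive pairs.

    Each iteration adds one positive pair (same label, target 1.0) and one
    negative pair (different label, target 0.0) per sentence.
    Assumes every label has at least one sentence with a different label.
    """
    rng = random.Random(seed)

    # Group sentences by label so same-label partners can be sampled.
    same: Dict[int, List[str]] = {}
    for sentence, label in zip(sentences, labels):
        same.setdefault(label, []).append(sentence)
    # For each label, collect every sentence that carries a different label.
    other: Dict[int, List[str]] = {
        label: [s for s, y in zip(sentences, labels) if y != label] for label in same
    }

    pairs: List[Tuple[str, str, float]] = []
    for _ in range(num_iterations):
        for sentence, label in zip(sentences, labels):
            # The positive partner may be the anchor itself in this sketch.
            pairs.append((sentence, rng.choice(same[label]), 1.0))
            pairs.append((sentence, rng.choice(other[label]), 0.0))
    return pairs


sentences = ["first", "second", "third", "fourth"]
labels = [0, 0, 1, 1]
pairs = generate_pairs(sentences, labels, num_iterations=5)
print(len(pairs))  # 40: 2 pairs x 4 sentences x 5 iterations
```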

Model Details

Model Description

  • Model Type: SetFit
  • Classification head: a LogisticRegression instance
  • Maximum Sequence Length: 512 tokens
  • Number of Classes: 2 classes

Model Sources

Model Labels

Label Examples
0.0
  • 'A Jewish student at McGill University has been kicked off the student government board for having “conflicts of interest” due to his pro-Israel activism.\n'
  • 'How else to describe the decision by Big Brother USA and junior sidekick South Korea to stage major air force exercises on North Korea’s border.\n'
  • 'DB: It was hysterical to watch these four armed guards who kept shouting “Stop resisting, stop resisting!” and they are beating the hell out of him!\n'
1.0
  • 'The UK should never become a stage for inflammatory speakers who promote hate."\n'
  • 'In a nation guided by fairness and law, a person is innocent until proven guilty.\n'
  • 'Speaking of Mastercard, the David Horowitz Freedom Center just recently won a major battle with the credit card, defeating well-financed leftwing groups that are trying to run the Center out of business and suffocate free speech in America.\n'

Evaluation

Metrics

Label F1
all 0.7514
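The card does not state how the F1 score was averaged. As a point of reference, here is a minimal sketch of binary F1, assuming label 1 is the positive class (the default `average="binary"`, `pos_label=1` behaviour of scikit-learn's `f1_score`). Since label 1 is the minority class in the training data (240 of 4,159 samples), such a score reflects minority-class performance much better than accuracy would. `binary_f1` is a hypothetical helper, and the toy labels are not the model's evaluation data.

```python
from typing import Sequence


def binary_f1(y_true: Sequence[float], y_pred: Sequence[float], positive: float = 1.0) -> float:
    """Hypothetical helper: F1 score for a single positive class."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == positive and p == positive)
    fp = sum(1 for t, p in pairs if t != positive and p == positive)
    fn = sum(1 for t, p in pairs if t == positive and p != positive)
    if tp == 0:
        return 0.0  # no true positives: precision or recall is zero or undefined
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)


# Toy labels for illustration only.
y_true = [0.0, 0.0, 1.0, 1.0, 1.0]
y_pred = [0.0, 1.0, 1.0, 1.0, 0.0]
print(round(binary_f1(y_true, y_pred), 4))  # 0.6667 (precision 2/3, recall 2/3)
```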

Uses

Direct Use for Inference

First install the SetFit library:

pip install setfit

Then you can load this model and run inference.

from setfit import SetFitModel

# Download from the 🤗 Hub
model = SetFitModel.from_pretrained("anismahmahi/G1-setfit-model")
# Run inference on a single sentence (the training texts end with "\n")
preds = model("Are you people serious?\n")

Training Details

Training Set Metrics

Training set  Min  Mean     Max
Word count    1    26.2775  129

Label  Training Sample Count
0      3919
1      240
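Statistics like these can be recomputed from the raw training texts. This is a minimal sketch, assuming a word is a whitespace-separated token (the card does not say how words were counted). It reports both the median and the mean, since 26.2775 cannot be the median of integer word counts. `training_set_stats` is a hypothetical helper, and the toy texts are not the real training data.

```python
import statistics
from collections import Counter
from typing import Dict, List, Tuple


def training_set_stats(texts: List[str], labels: List[int]) -> Tuple[Dict[str, float], Counter]:
    """Hypothetical helper: word-count summary plus per-label sample counts."""
    # Assumption: a "word" is a whitespace-separated token.
    counts = [len(text.split()) for text in texts]
    summary = {
        "min": min(counts),
        "median": statistics.median(counts),
        "mean": statistics.mean(counts),
        "max": max(counts),
    }
    return summary, Counter(labels)


texts = ["one two three", "a", "x y", "p q r s t u v w"]
labels = [0, 0, 0, 1]
summary, per_label = training_set_stats(texts, labels)
print(summary)    # {'min': 1, 'median': 2.5, 'mean': 3.5, 'max': 8}
print(per_label)  # Counter({0: 3, 1: 1})
```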

Training Hyperparameters

  • batch_size: (16, 16)
  • num_epochs: (2, 2)
  • max_steps: -1
  • sampling_strategy: oversampling
  • num_iterations: 5
  • body_learning_rate: (2e-05, 1e-05)
  • head_learning_rate: 0.01
  • loss: CosineSimilarityLoss
  • distance_metric: cosine_distance
  • margin: 0.25
  • end_to_end: False
  • use_amp: False
  • warmup_proportion: 0.1
  • seed: 42
  • eval_max_steps: -1
  • load_best_model_at_end: True
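For reference, the hyperparameters above map onto setfit's `TrainingArguments` (v1.0). The sketch below is an assumed reconstruction, not the author's training script. `evaluation_strategy` and `save_strategy` are not listed on the card. They are set to "epoch" here because the validation losses in the training results appear once per epoch, and `load_best_model_at_end=True` expects the two strategies to match. Because `num_iterations` is set, it determines the number of sentence pairs, so `sampling_strategy` has no effect.

```python
from sentence_transformers.losses import (
    BatchHardTripletLossDistanceFunction,
    CosineSimilarityLoss,
)
from setfit import TrainingArguments

# Tuples are (embedding fine-tuning phase, classification-head phase).
args = TrainingArguments(
    batch_size=(16, 16),
    num_epochs=(2, 2),
    max_steps=-1,
    sampling_strategy="oversampling",
    num_iterations=5,  # when set, overrides sampling_strategy
    body_learning_rate=(2e-05, 1e-05),
    head_learning_rate=0.01,
    loss=CosineSimilarityLoss,
    # distance_metric and margin are only used by triplet-style losses.
    distance_metric=BatchHardTripletLossDistanceFunction.cosine_distance,
    margin=0.25,
    end_to_end=False,
    use_amp=False,
    warmup_proportion=0.1,
    seed=42,
    eval_max_steps=-1,
    load_best_model_at_end=True,
    # Assumption: not listed on the card.
    evaluation_strategy="epoch",
    save_strategy="epoch",
)
```

These arguments would then be passed to `setfit.Trainer(model=..., args=args, train_dataset=..., eval_dataset=...)` together with a base Sentence Transformer checkpoint, which the card does not name.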

Training Results

Epoch Step Training Loss Validation Loss
0.0004 1 0.3542 -
0.0192 50 0.2957 -
0.0385 100 0.2509 -
0.0577 150 0.1691 -
0.0769 200 0.2145 -
0.0962 250 0.0861 -
0.1154 300 0.0677 -
0.1346 350 0.0554 -
0.1538 400 0.0169 -
0.1731 450 0.0621 -
0.1923 500 0.0024 -
0.2115 550 0.0405 -
0.2308 600 0.0724 -
0.25 650 0.0557 -
0.2692 700 0.0007 -
0.2885 750 0.0011 -
0.3077 800 0.0005 -
0.3269 850 0.0103 -
0.3462 900 0.0618 -
0.3654 950 0.0003 -
0.3846 1000 0.0046 -
0.4038 1050 0.0006 -
0.4231 1100 0.0003 -
0.4423 1150 0.0004 -
0.4615 1200 0.0006 -
0.4808 1250 0.0002 -
0.5 1300 0.0001 -
0.5192 1350 0.0002 -
0.5385 1400 0.0003 -
0.5577 1450 0.0002 -
0.5769 1500 0.0002 -
0.5962 1550 0.0003 -
0.6154 1600 0.0001 -
0.6346 1650 0.0067 -
0.6538 1700 0.0003 -
0.6731 1750 0.0001 -
0.6923 1800 0.0003 -
0.7115 1850 0.0001 -
0.7308 1900 0.0001 -
0.75 1950 0.0006 -
0.7692 2000 0.0001 -
0.7885 2050 0.0001 -
0.8077 2100 0.0 -
0.8269 2150 0.0 -
0.8462 2200 0.0 -
0.8654 2250 0.0 -
0.8846 2300 0.0002 -
0.9038 2350 0.0001 -
0.9231 2400 0.0001 -
0.9423 2450 0.0003 -
0.9615 2500 0.0001 -
0.9808 2550 0.0005 -
1.0 2600 0.0 0.1875
1.0192 2650 0.0 -
1.0385 2700 0.0003 -
1.0577 2750 0.0 -
1.0769 2800 0.0001 -
1.0962 2850 0.0472 -
1.1154 2900 0.0 -
1.1346 2950 0.0 -
1.1538 3000 0.0001 -
1.1731 3050 0.0001 -
1.1923 3100 0.0 -
1.2115 3150 0.0003 -
1.2308 3200 0.0 -
1.25 3250 0.0 -
1.2692 3300 0.0245 -
1.2885 3350 0.0 -
1.3077 3400 0.0 -
1.3269 3450 0.0 -
1.3462 3500 0.0001 -
1.3654 3550 0.0 -
1.3846 3600 0.0 -
1.4038 3650 0.0 -
1.4231 3700 0.0 -
1.4423 3750 0.0 -
1.4615 3800 0.0 -
1.4808 3850 0.0 -
1.5 3900 0.0 -
1.5192 3950 0.0 -
1.5385 4000 0.0 -
1.5577 4050 0.0 -
1.5769 4100 0.0 -
1.5962 4150 0.0 -
1.6154 4200 0.0 -
1.6346 4250 0.0001 -
1.6538 4300 0.0 -
1.6731 4350 0.0 -
1.6923 4400 0.0 -
1.7115 4450 0.0 -
1.7308 4500 0.0 -
1.75 4550 0.0 -
1.7692 4600 0.0 -
1.7885 4650 0.0 -
1.8077 4700 0.0 -
1.8269 4750 0.0 -
1.8462 4800 0.0001 -
1.8654 4850 0.0 -
1.8846 4900 0.0 -
1.9038 4950 0.0 -
1.9231 5000 0.0 -
1.9423 5050 0.0 -
1.9615 5100 0.0 -
1.9808 5150 0.0 -
2.0 5200 0.0 0.1393
  • The saved checkpoint is the final row (epoch 2.0, step 5200), which has the lowest validation loss (0.1393).
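The Epoch and Step columns are consistent with the hyperparameters. Assuming one positive and one negative pair per training sample per iteration, and a partial final batch that still counts as a step, 4,159 samples × 5 iterations × 2 pairs gives 41,590 pairs, or 2,600 batches of 16 per epoch. `steps_per_epoch` is a hypothetical helper for this check.

```python
import math


def steps_per_epoch(num_samples: int, num_iterations: int, batch_size: int) -> int:
    """Hypothetical helper: optimizer steps in one embedding fine-tuning epoch."""
    # Assumption: each iteration yields one positive and one negative pair per sample.
    num_pairs = 2 * num_iterations * num_samples
    # A partial final batch still counts as a step.
    return math.ceil(num_pairs / batch_size)


num_samples = 3919 + 240  # per-label counts from the training set table
per_epoch = steps_per_epoch(num_samples, num_iterations=5, batch_size=16)
print(per_epoch)                 # 2600, where the table reaches epoch 1.0
print(2 * per_epoch)             # 5200, the last step after num_epochs=2
print(round(50 / per_epoch, 4))  # 0.0192, the epoch value logged at step 50
```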

Framework Versions

  • Python: 3.10.12
  • SetFit: 1.0.1
  • Sentence Transformers: 2.2.2
  • Transformers: 4.35.2
  • PyTorch: 2.1.0+cu121
  • Datasets: 2.16.1
  • Tokenizers: 0.15.0

Citation

BibTeX

@article{https://doi.org/10.48550/arxiv.2209.11055,
    doi = {10.48550/ARXIV.2209.11055},
    url = {https://arxiv.org/abs/2209.11055},
    author = {Tunstall, Lewis and Reimers, Nils and Jo, Unso Eun Seo and Bates, Luke and Korat, Daniel and Wasserblat, Moshe and Pereg, Oren},
    keywords = {Computation and Language (cs.CL), FOS: Computer and information sciences},
    title = {Efficient Few-Shot Learning Without Prompts},
    publisher = {arXiv},
    year = {2022},
    copyright = {Creative Commons Attribution 4.0 International}
}
Safetensors

  • Model size: 109M params
  • Tensor type: F32