SetFit

This is a SetFit model for text classification. It uses an SVC instance as its classification head.

The model has been trained using an efficient few-shot learning technique that involves:

  1. Fine-tuning a Sentence Transformer with contrastive learning.
  2. Training a classification head with features from the fine-tuned Sentence Transformer.
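
As a rough illustration of step 1, the sketch below builds the sentence pairs that contrastive fine-tuning learns from: pairs with the same label get a target of 1.0, and pairs with different labels get 0.0. The function name `make_contrastive_pairs` and the toy texts are hypothetical, and the sketch simply enumerates every unordered pair. The SetFit library uses its own pair sampler (see `sampling_strategy` under Training Hyperparameters), so this is not its actual implementation.

```python
from itertools import combinations


def make_contrastive_pairs(texts, labels):
    """Return (text_a, text_b, target) triples for every unordered pair.

    target is 1.0 when both texts share a label and 0.0 otherwise.
    """
    pairs = []
    for (text_a, label_a), (text_b, label_b) in combinations(zip(texts, labels), 2):
        target = 1.0 if label_a == label_b else 0.0
        pairs.append((text_a, text_b, target))
    return pairs


# Hypothetical toy data: two examples per class.
toy_texts = ["first text", "second text", "third text", "fourth text"]
toy_labels = [1, 1, 0, 0]

pairs = make_contrastive_pairs(toy_texts, toy_labels)
positive_pairs = [p for p in pairs if p[2] == 1.0]
negative_pairs = [p for p in pairs if p[2] == 0.0]
```

Even four labelled sentences produce six training pairs, which is why this style of fine-tuning can work with few labelled examples.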

Model Details

Model Description

  • Model Type: SetFit
  • Classification head: an SVC instance
  • Maximum Sequence Length: 384 tokens
  • Number of Classes: 2

Model Labels

Label 1 examples:
  • 'Gone are the days when they led the world in recession-busting'
  • 'Who so mean that he will not himself be taxed, who so mindful of wealth that he will not favor increasing the popular taxes, in aid of these defective children?'
  • 'That state has sixty-two counties and sixty cities … In addition there are 932 towns, 507 villages, and, at the last count, 9,600 school districts … Just try to render efficient service … amid the diffused identities and inevitable jealousies of, roughly, 11,000 independent administrative officers or boards!'

Label 0 examples:
  • 'Is this a warning of what’s to come?'
  • 'This unique set of circumstances has brought PCL back into focus as the safe haven of choice for global players seeking somewhere to stash their cash.'
  • 'Socialists believe that, if everyone cannot have something, no one shall.'

Uses

Direct Use for Inference

First install the SetFit library:

pip install setfit

Then you can load this model and run inference.

from setfit import SetFitModel

# Download from the 🤗 Hub
model = SetFitModel.from_pretrained("SOUMYADEEPSAR/Setfit_designed_sample_svm_head")
# Run inference
preds = model("What could possibly go wrong?")
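
To classify several texts at once, the loaded model can be called with a list of strings. The sketch below wraps that call in a hypothetical helper, `classify_texts`, that pairs each input with its predicted integer label. It assumes the model returns one 0/1 prediction per input text. The `stub_model` function is only a stand-in so the sketch runs without downloading anything; in practice you would pass the `model` object loaded above.

```python
from typing import Callable, Dict, List, Sequence


def classify_texts(model: Callable[[List[str]], Sequence], texts: List[str]) -> Dict[str, int]:
    """Run the model once over a batch and map each text to its integer label."""
    predictions = model(texts)
    return {text: int(pred) for text, pred in zip(texts, predictions)}


def stub_model(texts: List[str]) -> List[int]:
    """Stand-in classifier (NOT the real model): label 1 if the text ends with '!'."""
    return [1 if text.endswith("!") else 0 for text in texts]


results = classify_texts(
    stub_model,  # replace with the SetFitModel instance loaded above
    ["What could possibly go wrong?", "Just try to render efficient service!"],
)
```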

Training Details

Training Set Metrics

Training set   Min   Median    Max
Word count     3     36.5327   97

Label   Training Sample Count
0       100
1       114
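
Statistics like these can be recomputed from a labelled dataset with a few lines of standard-library Python. The sketch below is an illustration under two assumptions: words are split on whitespace, and the helper name `training_set_metrics` and the toy data are hypothetical. It reports both the mean and the median, because a median over an even number of whole-number word counts is always a whole or half number, so a fractional value such as 36.5327 over 214 samples looks more like a mean.

```python
from collections import Counter
from statistics import mean, median


def training_set_metrics(texts, labels):
    """Summarize whitespace-delimited word counts and per-label sample counts."""
    word_counts = [len(text.split()) for text in texts]
    return {
        "min": min(word_counts),
        "mean": mean(word_counts),
        "median": median(word_counts),
        "max": max(word_counts),
        "label_counts": dict(Counter(labels)),
    }


# Hypothetical toy data with word counts 3, 5, 4, and 10.
toy_texts = [
    "one two three",
    "one two three four five",
    "one two three four",
    "one two three four five six seven eight nine ten",
]
toy_labels = [0, 1, 1, 0]

metrics = training_set_metrics(toy_texts, toy_labels)
```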

Training Hyperparameters

  • batch_size: (8, 8)
  • num_epochs: (1, 1)
  • max_steps: -1
  • sampling_strategy: oversampling
  • body_learning_rate: (2e-05, 2e-05)
  • head_learning_rate: 2e-05
  • loss: CosineSimilarityLoss
  • distance_metric: cosine_distance
  • margin: 0.25
  • end_to_end: False
  • use_amp: False
  • warmup_proportion: 0.1
  • seed: 42
  • eval_max_steps: -1
  • load_best_model_at_end: False
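
To make the `loss: CosineSimilarityLoss` setting concrete, here is a minimal pure-Python sketch of the idea: compute the cosine similarity of two embeddings and penalize its squared distance from the pair's target (1.0 for same-class pairs, 0.0 for different-class pairs). The function names and the two-dimensional toy vectors are hypothetical. The real loss in Sentence Transformers operates on batched tensors, so treat this as an illustration rather than the library's code.

```python
import math


def cosine_similarity(u, v):
    """Cosine of the angle between two non-zero vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)


def cosine_similarity_loss(pairs):
    """Mean squared error between cosine similarity and target over (u, v, target) triples."""
    errors = [(cosine_similarity(u, v) - target) ** 2 for u, v, target in pairs]
    return sum(errors) / len(errors)


# Hypothetical toy embeddings.
toy_pairs = [
    ([1.0, 0.0], [1.0, 0.0], 1.0),  # identical vectors, same class: error 0
    ([1.0, 0.0], [0.0, 1.0], 0.0),  # orthogonal vectors, different classes: error 0
    ([1.0, 0.0], [1.0, 0.0], 0.0),  # identical vectors, different classes: error 1
]

loss = cosine_similarity_loss(toy_pairs)
```

A training loss near zero therefore means the embedding body places same-class texts close together and different-class texts far apart on the training pairs.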

Training Results

Epoch Step Training Loss Validation Loss
0.0003 1 0.3597 -
0.0161 50 0.2693 -
0.0323 100 0.2501 -
0.0484 150 0.2691 -
0.0645 200 0.063 -
0.0806 250 0.0179 -
0.0968 300 0.0044 -
0.1129 350 0.0003 -
0.1290 400 0.0005 -
0.1452 450 0.0002 -
0.1613 500 0.0003 -
0.1774 550 0.0001 -
0.1935 600 0.0001 -
0.2097 650 0.0001 -
0.2258 700 0.0001 -
0.2419 750 0.0001 -
0.2581 800 0.0 -
0.2742 850 0.0001 -
0.2903 900 0.0002 -
0.3065 950 0.0 -
0.3226 1000 0.0 -
0.3387 1050 0.0002 -
0.3548 1100 0.0 -
0.3710 1150 0.0001 -
0.3871 1200 0.0001 -
0.4032 1250 0.0 -
0.4194 1300 0.0 -
0.4355 1350 0.0 -
0.4516 1400 0.0001 -
0.4677 1450 0.0 -
0.4839 1500 0.0 -
0.5 1550 0.0001 -
0.5161 1600 0.0001 -
0.5323 1650 0.0 -
0.5484 1700 0.0 -
0.5645 1750 0.0 -
0.5806 1800 0.0 -
0.5968 1850 0.0 -
0.6129 1900 0.0 -
0.6290 1950 0.0001 -
0.6452 2000 0.0 -
0.6613 2050 0.0 -
0.6774 2100 0.0 -
0.6935 2150 0.0001 -
0.7097 2200 0.0 -
0.7258 2250 0.0 -
0.7419 2300 0.0001 -
0.7581 2350 0.0001 -
0.7742 2400 0.0001 -
0.7903 2450 0.0 -
0.8065 2500 0.0 -
0.8226 2550 0.0 -
0.8387 2600 0.0 -
0.8548 2650 0.0001 -
0.8710 2700 0.0001 -
0.8871 2750 0.0 -
0.9032 2800 0.0 -
0.9194 2850 0.0 -
0.9355 2900 0.0001 -
0.9516 2950 0.0 -
0.9677 3000 0.0001 -
0.9839 3050 0.0 -
1.0 3100 0.0 -
0.0003 1 0.326 -
0.0172 50 0.2514 -
0.0345 100 0.434 -
0.0517 150 0.1265 -
0.0689 200 0.125 -
0.0861 250 0.2375 -
0.1034 300 0.0014 -
0.1206 350 0.1192 -
0.1378 400 0.0166 -
0.1551 450 0.0002 -
0.1723 500 0.0001 -
0.1895 550 0.0 -
0.2068 600 0.0 -
0.2240 650 0.0001 -
0.2412 700 0.0 -
0.2584 750 0.0 -
0.2757 800 0.0 -
0.2929 850 0.0 -
0.3101 900 0.0 -
0.3274 950 0.0001 -
0.3446 1000 0.0 -
0.3618 1050 0.0001 -
0.3790 1100 0.0 -
0.3963 1150 0.0001 -
0.4135 1200 0.0 -
0.4307 1250 0.0001 -
0.4480 1300 0.0 -
0.4652 1350 0.0 -
0.4824 1400 0.0 -
0.4997 1450 0.0 -
0.5169 1500 0.0 -
0.5341 1550 0.0001 -
0.5513 1600 0.0 -
0.5686 1650 0.0 -
0.5858 1700 0.0 -
0.6030 1750 0.0 -
0.6203 1800 0.0 -
0.6375 1850 0.0 -
0.6547 1900 0.0001 -
0.6720 1950 0.0001 -
0.6892 2000 0.0 -
0.7064 2050 0.0 -
0.7236 2100 0.0 -
0.7409 2150 0.0 -
0.7581 2200 0.0 -
0.7753 2250 0.0 -
0.7926 2300 0.0 -
0.8098 2350 0.0 -
0.8270 2400 0.0 -
0.8442 2450 0.0001 -
0.8615 2500 0.0 -
0.8787 2550 0.0 -
0.8959 2600 0.0 -
0.9132 2650 0.0 -
0.9304 2700 0.0 -
0.9476 2750 0.0 -
0.9649 2800 0.0 -
0.9821 2850 0.0 -
0.9993 2900 0.0 -
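
The Epoch column is the step number divided by the total number of steps in the epoch. The sketch below reproduces a few values from the first log, which reaches epoch 1.0 at step 3100, and derives two related quantities from the hyperparameters listed earlier. The helper name `epoch_fraction` is hypothetical, and the warmup figure is only an approximation that assumes warmup covers `warmup_proportion` of the steps in that log.

```python
def epoch_fraction(step, total_steps):
    """Fraction of the epoch completed at a given step, rounded as in the log."""
    return round(step / total_steps, 4)


TOTAL_STEPS = 3100        # step at which the first log reaches epoch 1.0
BATCH_SIZE = 8            # embedding-phase batch size from the hyperparameters
WARMUP_PROPORTION = 0.1   # from the hyperparameters

logged_fractions = [epoch_fraction(step, TOTAL_STEPS) for step in (50, 1550, 3050)]

# Sentence pairs processed in one epoch, assuming every batch is full.
pairs_seen = TOTAL_STEPS * BATCH_SIZE

# Approximate number of learning-rate warmup steps (an assumption, see above).
approx_warmup_steps = round(TOTAL_STEPS * WARMUP_PROPORTION)
```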

Framework Versions

  • Python: 3.10.12
  • SetFit: 1.0.3
  • Sentence Transformers: 3.0.1
  • Transformers: 4.39.0
  • PyTorch: 2.3.0+cu121
  • Datasets: 2.20.0
  • Tokenizers: 0.15.2

Citation

BibTeX

@article{https://doi.org/10.48550/arxiv.2209.11055,
    doi = {10.48550/ARXIV.2209.11055},
    url = {https://arxiv.org/abs/2209.11055},
    author = {Tunstall, Lewis and Reimers, Nils and Jo, Unso Eun Seo and Bates, Luke and Korat, Daniel and Wasserblat, Moshe and Pereg, Oren},
    keywords = {Computation and Language (cs.CL), FOS: Computer and information sciences},
    title = {Efficient Few-Shot Learning Without Prompts},
    publisher = {arXiv},
    year = {2022},
    copyright = {Creative Commons Attribution 4.0 International}
}

Model size: 109M params (Safetensors, tensor type F32)