SetFit with sentence-transformers/paraphrase-mpnet-base-v2

This is a SetFit model for text classification. It uses sentence-transformers/paraphrase-mpnet-base-v2 as the Sentence Transformer embedding model and a LogisticRegression instance as the classification head.

The model has been trained using an efficient few-shot learning technique that involves:

  1. Fine-tuning a Sentence Transformer with contrastive learning.
  2. Training a classification head with features from the fine-tuned Sentence Transformer.
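
The contrastive step works on sentence pairs rather than single sentences. The following is a minimal sketch of how such pairs could be built from a handful of labeled examples, assuming a pair gets target 1.0 when both sentences share a label and 0.0 otherwise. The helper name `make_contrastive_pairs` is hypothetical and is not part of the SetFit API; the library's own sampler may differ in detail.

```python
from itertools import combinations


def make_contrastive_pairs(texts, labels):
    """Pair every two distinct examples (hypothetical helper, not SetFit API).

    The target is 1.0 when both sentences share a label and 0.0 otherwise.
    """
    pairs = []
    for (text_a, label_a), (text_b, label_b) in combinations(zip(texts, labels), 2):
        target = 1.0 if label_a == label_b else 0.0
        pairs.append((text_a, text_b, target))
    return pairs


# A few sentences taken from the label examples in this card.
texts = ["Sure.", "Exactly.", "Stop the elevator.", "What floor is this?"]
labels = ["Confirm", "Confirm", "Stop", "CurrentFloor"]

pairs = make_contrastive_pairs(texts, labels)
positives = [pair for pair in pairs if pair[2] == 1.0]
```

Even four sentences yield six pairs, which is why a few examples per class can produce enough training signal for the embedding model.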

Model Details

Model Description

Model Sources

Model Labels

Label Examples
RequestMoveToFloor
  • 'Please go to the 3rd floor.'
  • 'Can you take me to floor 5?'
  • 'I need to go to the 8th floor.'
RequestMoveUp
  • 'Go one floor up'
  • 'Take me up two floors'
  • 'Go up three floors, please'
RequestMoveDown
  • 'Move me down one level'
  • 'Can you take me down two floors?'
  • 'Go down three levels'
Confirm
  • "Yes, that's right."
  • 'Sure.'
  • 'Exactly.'
RequestEmployeeLocation
  • 'Where is Erik Velldal’s office?'
  • 'Which floor is Andreas Austeng on?'
  • 'Can you tell me where Birthe Soppe’s office is?'
CurrentFloor
  • 'Which floor are we on?'
  • 'What floor is this?'
  • 'Are we on the 5th floor?'
Stop
  • 'Stop the elevator.'
  • "Wait, don't go to that floor."
  • 'No, not that floor.'
OutOfCoverage
  • "What's the capital of France?"
  • 'How many floors does this building have?'
  • 'Can you make a phone call for me?'

Uses

Direct Use for Inference

First install the SetFit library:

pip install setfit

Then you can load this model and run inference.

from setfit import SetFitModel

# Download from the 🤗 Hub
model = SetFitModel.from_pretrained("victomoe/setfit-intent-classifier-3")
# Run inference
preds = model("Okay, go ahead.")
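
For several utterances at once, a small wrapper can pair each input with its predicted intent and flag out-of-domain requests. This is a sketch only: it assumes that calling the model on a list of strings returns one label per input, in order, and that the `OutOfCoverage` label marks requests the system should not act on. The names `classify_batch` and `load_model` are hypothetical helpers, not part of SetFit.

```python
def classify_batch(model, texts, fallback="OutOfCoverage"):
    """Return one record per input text (hypothetical helper).

    `model` is any callable that maps a list of strings to a sequence of
    labels of the same length, which is assumed to hold for SetFitModel.
    """
    predictions = model(texts)
    results = []
    for text, label in zip(texts, predictions):
        label = str(label)  # normalise in case labels are not plain strings
        results.append(
            {"text": text, "intent": label, "in_domain": label != fallback}
        )
    return results


def load_model():
    """Load the classifier from the Hub (requires setfit and network access)."""
    from setfit import SetFitModel

    return SetFitModel.from_pretrained("victomoe/setfit-intent-classifier-3")
```

A call such as `classify_batch(load_model(), ["Take me up two floors", "What's the capital of France?"])` would then return a list of dictionaries with the keys `text`, `intent`, and `in_domain`.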

Training Details

Training Set Metrics

Training set  Min  Median  Max
Word count    1    5.2118  9

Label                    Training Sample Count
Confirm                  22
CurrentFloor             21
OutOfCoverage            22
RequestEmployeeLocation  22
RequestMoveDown          20
RequestMoveToFloor       23
RequestMoveUp            20
Stop                     20

Training Hyperparameters

  • batch_size: (32, 32)
  • num_epochs: (10, 10)
  • max_steps: -1
  • sampling_strategy: oversampling
  • body_learning_rate: (2e-05, 1e-05)
  • head_learning_rate: 0.01
  • loss: CosineSimilarityLoss
  • distance_metric: cosine_distance
  • margin: 0.25
  • end_to_end: False
  • use_amp: False
  • warmup_proportion: 0.1
  • l2_weight: 0.01
  • seed: 42
  • eval_max_steps: -1
  • load_best_model_at_end: False
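
The values above could be passed to SetFit's `TrainingArguments` roughly as follows. This is a sketch, not the author's training script. The tuple values are assumed to be (embedding phase, classifier phase) settings. `loss` and `distance_metric` are left out because they take a class and a callable rather than plain values, and the listed settings are assumed here to be the library defaults. The names `HYPERPARAMETERS` and `build_training_arguments` are hypothetical.

```python
# Hyperparameters copied from the list above.
HYPERPARAMETERS = {
    "batch_size": (32, 32),
    "num_epochs": (10, 10),
    "max_steps": -1,
    "sampling_strategy": "oversampling",
    "body_learning_rate": (2e-05, 1e-05),
    "head_learning_rate": 0.01,
    "margin": 0.25,
    "end_to_end": False,
    "use_amp": False,
    "warmup_proportion": 0.1,
    "l2_weight": 0.01,
    "seed": 42,
    "eval_max_steps": -1,
    "load_best_model_at_end": False,
}


def build_training_arguments():
    """Build a TrainingArguments object (requires the setfit package)."""
    from setfit import TrainingArguments

    return TrainingArguments(**HYPERPARAMETERS)
```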

Training Results

Epoch Step Training Loss Validation Loss
0.0013 1 0.195 -
0.0633 50 0.1877 -
0.1266 100 0.1592 -
0.1899 150 0.1141 -
0.2532 200 0.0603 -
0.3165 250 0.0283 -
0.3797 300 0.0104 -
0.4430 350 0.0043 -
0.5063 400 0.0027 -
0.5696 450 0.0021 -
0.6329 500 0.0017 -
0.6962 550 0.0015 -
0.7595 600 0.0011 -
0.8228 650 0.001 -
0.8861 700 0.0011 -
0.9494 750 0.0008 -
1.0127 800 0.0007 -
1.0759 850 0.0006 -
1.1392 900 0.0006 -
1.2025 950 0.0005 -
1.2658 1000 0.0005 -
1.3291 1050 0.0005 -
1.3924 1100 0.0004 -
1.4557 1150 0.0004 -
1.5190 1200 0.0004 -
1.5823 1250 0.0004 -
1.6456 1300 0.0004 -
1.7089 1350 0.0003 -
1.7722 1400 0.0003 -
1.8354 1450 0.0003 -
1.8987 1500 0.0003 -
1.9620 1550 0.0003 -
2.0253 1600 0.0003 -
2.0886 1650 0.0003 -
2.1519 1700 0.0003 -
2.2152 1750 0.0003 -
2.2785 1800 0.0003 -
2.3418 1850 0.0002 -
2.4051 1900 0.0002 -
2.4684 1950 0.0002 -
2.5316 2000 0.0002 -
2.5949 2050 0.0002 -
2.6582 2100 0.0002 -
2.7215 2150 0.0002 -
2.7848 2200 0.0002 -
2.8481 2250 0.0002 -
2.9114 2300 0.0002 -
2.9747 2350 0.0002 -
3.0380 2400 0.0002 -
3.1013 2450 0.0009 -
3.1646 2500 0.0003 -
3.2278 2550 0.0002 -
3.2911 2600 0.0002 -
3.3544 2650 0.0002 -
3.4177 2700 0.0002 -
3.4810 2750 0.0002 -
3.5443 2800 0.0002 -
3.6076 2850 0.0002 -
3.6709 2900 0.0002 -
3.7342 2950 0.0002 -
3.7975 3000 0.0002 -
3.8608 3050 0.0002 -
3.9241 3100 0.0001 -
3.9873 3150 0.0002 -
4.0506 3200 0.0001 -
4.1139 3250 0.0001 -
4.1772 3300 0.0001 -
4.2405 3350 0.0001 -
4.3038 3400 0.0001 -
4.3671 3450 0.0001 -
4.4304 3500 0.0005 -
4.4937 3550 0.0001 -
4.5570 3600 0.0001 -
4.6203 3650 0.0001 -
4.6835 3700 0.0001 -
4.7468 3750 0.0001 -
4.8101 3800 0.0001 -
4.8734 3850 0.0001 -
4.9367 3900 0.0001 -
5.0 3950 0.0001 -
5.0633 4000 0.0001 -
5.1266 4050 0.0001 -
5.1899 4100 0.0001 -
5.2532 4150 0.0001 -
5.3165 4200 0.0001 -
5.3797 4250 0.0001 -
5.4430 4300 0.0001 -
5.5063 4350 0.0001 -
5.5696 4400 0.0001 -
5.6329 4450 0.0001 -
5.6962 4500 0.0001 -
5.7595 4550 0.0001 -
5.8228 4600 0.0001 -
5.8861 4650 0.0001 -
5.9494 4700 0.0001 -
6.0127 4750 0.0001 -
6.0759 4800 0.0001 -
6.1392 4850 0.0001 -
6.2025 4900 0.0001 -
6.2658 4950 0.0001 -
6.3291 5000 0.0001 -
6.3924 5050 0.0001 -
6.4557 5100 0.0001 -
6.5190 5150 0.0001 -
6.5823 5200 0.0001 -
6.6456 5250 0.0001 -
6.7089 5300 0.0001 -
6.7722 5350 0.0001 -
6.8354 5400 0.0001 -
6.8987 5450 0.0001 -
6.9620 5500 0.0001 -
7.0253 5550 0.0001 -
7.0886 5600 0.0001 -
7.1519 5650 0.0001 -
7.2152 5700 0.0001 -
7.2785 5750 0.0001 -
7.3418 5800 0.0001 -
7.4051 5850 0.0001 -
7.4684 5900 0.0001 -
7.5316 5950 0.0001 -
7.5949 6000 0.0001 -
7.6582 6050 0.0001 -
7.7215 6100 0.0001 -
7.7848 6150 0.0001 -
7.8481 6200 0.0001 -
7.9114 6250 0.0001 -
7.9747 6300 0.0001 -
8.0380 6350 0.0001 -
8.1013 6400 0.0001 -
8.1646 6450 0.0001 -
8.2278 6500 0.0001 -
8.2911 6550 0.0001 -
8.3544 6600 0.0001 -
8.4177 6650 0.0001 -
8.4810 6700 0.0001 -
8.5443 6750 0.0001 -
8.6076 6800 0.0001 -
8.6709 6850 0.0001 -
8.7342 6900 0.0001 -
8.7975 6950 0.0001 -
8.8608 7000 0.0001 -
8.9241 7050 0.0001 -
8.9873 7100 0.0001 -
9.0506 7150 0.0001 -
9.1139 7200 0.0001 -
9.1772 7250 0.0001 -
9.2405 7300 0.0001 -
9.3038 7350 0.0001 -
9.3671 7400 0.0001 -
9.4304 7450 0.0001 -
9.4937 7500 0.0001 -
9.5570 7550 0.0001 -
9.6203 7600 0.0001 -
9.6835 7650 0.0001 -
9.7468 7700 0.0001 -
9.8101 7750 0.0001 -
9.8734 7800 0.0001 -
9.9367 7850 0.0001 -
10.0 7900 0.0001 -
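
The step counts in this table can be cross-checked against the training sample counts and hyperparameters. The sketch below assumes that the oversampling strategy repeats same-label pairs until they match the number of different-label pairs, so that one epoch covers twice the number of different-label pairs. Under that assumption the arithmetic reproduces the 7900 steps reached at epoch 10.0.

```python
import math

# Training sample counts per label, from the Training Set Metrics section.
sample_counts = {
    "Confirm": 22,
    "CurrentFloor": 21,
    "OutOfCoverage": 22,
    "RequestEmployeeLocation": 22,
    "RequestMoveDown": 20,
    "RequestMoveToFloor": 23,
    "RequestMoveUp": 20,
    "Stop": 20,
}
batch_size = 32
num_epochs = 10

total_samples = sum(sample_counts.values())

# Unordered pairs whose two sentences carry different labels.
negative_pairs = (
    total_samples**2 - sum(n * n for n in sample_counts.values())
) // 2

# Assumption: oversampling draws as many positive pairs as negative pairs.
pairs_per_epoch = 2 * negative_pairs
steps_per_epoch = math.ceil(pairs_per_epoch / batch_size)
total_steps = steps_per_epoch * num_epochs

print(total_samples, negative_pairs, steps_per_epoch, total_steps)
```

This also explains the fractional epoch values in the first column: step 50 out of 790 steps per epoch corresponds to epoch 0.0633.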

Framework Versions

  • Python: 3.10.8
  • SetFit: 1.1.0
  • Sentence Transformers: 3.1.1
  • Transformers: 4.38.2
  • PyTorch: 2.1.2
  • Datasets: 2.17.1
  • Tokenizers: 0.15.0

Citation

BibTeX

@article{https://doi.org/10.48550/arxiv.2209.11055,
    doi = {10.48550/ARXIV.2209.11055},
    url = {https://arxiv.org/abs/2209.11055},
    author = {Tunstall, Lewis and Reimers, Nils and Jo, Unso Eun Seo and Bates, Luke and Korat, Daniel and Wasserblat, Moshe and Pereg, Oren},
    keywords = {Computation and Language (cs.CL), FOS: Computer and information sciences},
    title = {Efficient Few-Shot Learning Without Prompts},
    publisher = {arXiv},
    year = {2022},
    copyright = {Creative Commons Attribution 4.0 International}
}