---
library_name: setfit
tags:
- setfit
- sentence-transformers
- text-classification
- generated_from_setfit_trainer
base_model: avsolatorio/GIST-Embedding-v0
metrics:
- accuracy
widget:
- text: >-
CON - Conversion - Socure Integration and Optimization: The project aims
to integrate and optimize the Socure identity verification system into the
existing platform. This includes updating the user interface, improving
the mobile verification flow, and addressing issues related to the
transition from the Berbix system to Socure. The project also involves
enhancing the admin experience, refining the verification status polling
route, and ensuring the system handles errors and failures effectively.
- text: >-
Webhook Optimization and Error Handling Improvement: This project focuses
on enhancing the reliability and efficiency of webhook integrations within
a software platform. The main objectives include preventing webhooks from
retrying upon failure, addressing specific webhook issues, improving error
message clarity, and reducing error-related noise. Additionally, the
project aims to ensure critical errors are monitored effectively, and
error displays in sync operations are corrected. This initiative will lead
to a more stable and user-friendly integration experience, minimizing
disruptions and improving overall system performance.
- text: >-
Groups and Home: Build group feature with group chat, and important chat
features (reactions, reply, repost).
Added calendaring and polling capability (related to scheduling).
Ability to create a group, invite users to a group, manage the group
(change settings, remove members, admin functions, invitations process,
etc.).
Built split bill, photo/video sharing.
Also built DM/Subchat capability (for users to communicate separate from
the group).
Build engagement features such as typing indicators, "who read message"
read status, delivered status and more.
- text: >-
CX - Customer Experience - Enhanced Security and User Experience: This
project focuses on enhancing the reliability and quality of the user
experience and security across the platform. This involved updating the
2FA system to increase security for users with new reset request features
and enhance the usability of them. Further work improved the KYC process
which is central to the customer use of the product by streamlining KYC
reminder and review queues and ensuring customer information is compliant
and up-to-date with changing regulations.
Other aspects focused on developing functionality for managing documents
securely. The system will also provide secure document transfer and a
dashboard for both users and admins. It ensures appropriate audit logging
and includes the development of routes, handlers, and data models for
efficient workflow. Finally the project improved search and feedback
functionality in the help center to improve reliability for users.
- text: >-
AI Content Generation Engine: Company needed to invent a programmatically
usable, reliable way to generate learning content for professionals. No
existing solutions satisfy the requirements of being able to ask a
question, and repeatedly product reliable content tailored towards a
target audience.
pipeline_tag: text-classification
inference: true
---
SetFit with avsolatorio/GIST-Embedding-v0
This is a SetFit model that can be used for Text Classification. This SetFit model uses avsolatorio/GIST-Embedding-v0 as the Sentence Transformer embedding model. A LogisticRegression instance is used for classification.
The model has been trained using an efficient few-shot learning technique that involves:
- Fine-tuning a Sentence Transformer with contrastive learning.
- Training a classification head with features from the fine-tuned Sentence Transformer (see the sketch after this list).
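As a rough illustration of the second step, the classification head is simply a scikit-learn LogisticRegression fit on embeddings produced by the Sentence Transformer body. The sketch below skips the contrastive fine-tuning stage and uses hypothetical placeholder texts and labels, so it is illustrative only, not the card's actual pipeline or data:

```python
from sentence_transformers import SentenceTransformer
from sklearn.linear_model import LogisticRegression

# Embed texts with the (ideally contrastively fine-tuned) Sentence Transformer body ...
body = SentenceTransformer("avsolatorio/GIST-Embedding-v0")
texts = ["Webhook error-handling improvements", "Group chat and calendaring features"]  # placeholders
labels = [0, 1]  # placeholders

# ... then fit the LogisticRegression head on those embeddings.
head = LogisticRegression().fit(body.encode(texts), labels)
print(head.predict(body.encode(["Identity verification integration"])))
```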
Model Details
Model Description
- Model Type: SetFit
- Sentence Transformer body: avsolatorio/GIST-Embedding-v0
- Classification head: a LogisticRegression instance
- Maximum Sequence Length: 512 tokens
- Number of Classes: 2 classes
Model Sources
- Repository: [SetFit on GitHub](https://github.com/huggingface/setfit)
- Paper: [Efficient Few-Shot Learning Without Prompts](https://arxiv.org/abs/2209.11055)
- Blogpost: [SetFit: Efficient Few-Shot Learning Without Prompts](https://huggingface.co/blog/setfit)
Model Labels
Label | Examples |
---|---|
1 | |
0 | |
Uses
Direct Use for Inference
First install the SetFit library:
pip install setfit
Then you can load this model and run inference.
from setfit import SetFitModel
# Download from the 🤗 Hub
model = SetFitModel.from_pretrained("setfit_model_id")
# Run inference
preds = model("AI Content Generation Engine: Company needed to invent a programmatically usable, reliable way to generate learning content for professionals. No existing solutions satisfy the requirements of being able to ask a question, and repeatedly product reliable content tailored towards a target audience.")
Training Details
Training Set Metrics
Training set | Min | Median | Max |
---|---|---|---|
Word count | 18 | 75.5789 | 397 |

Label | Training Sample Count |
---|---|
0 | 146 |
1 | 82 |
Training Hyperparameters
- batch_size: (8, 8)
- num_epochs: (3, 3)
- max_steps: -1
- sampling_strategy: oversampling
- num_iterations: 20
- body_learning_rate: (2e-05, 1e-05)
- head_learning_rate: 0.01
- loss: CosineSimilarityLoss
- distance_metric: cosine_distance
- margin: 0.25
- end_to_end: False
- use_amp: True
- warmup_proportion: 0.1
- seed: 42
- eval_max_steps: -1
- load_best_model_at_end: False
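For reproducibility, these values map directly onto SetFit's TrainingArguments. The following is a minimal sketch under that assumption; the tiny dataset is a hypothetical placeholder, not the card's actual training data:

```python
from datasets import Dataset
from setfit import SetFitModel, Trainer, TrainingArguments
from sentence_transformers.losses import CosineSimilarityLoss

# Hypothetical placeholder dataset (the real training data is not included in this card).
train_dataset = Dataset.from_dict({
    "text": [
        "Webhook error-handling improvements for integrations",
        "Reduce error noise and improve retry behaviour",
        "Group chat with reactions, replies and polls",
        "Calendaring and scheduling for groups",
    ],
    "label": [0, 0, 1, 1],
})

# Start from the same Sentence Transformer body; a fresh LogisticRegression head is created.
model = SetFitModel.from_pretrained("avsolatorio/GIST-Embedding-v0")

# Values copied from the hyperparameter list above.
args = TrainingArguments(
    batch_size=(8, 8),
    num_epochs=(3, 3),
    sampling_strategy="oversampling",
    num_iterations=20,
    body_learning_rate=(2e-05, 1e-05),
    head_learning_rate=0.01,
    loss=CosineSimilarityLoss,
    margin=0.25,
    end_to_end=False,
    use_amp=True,
    warmup_proportion=0.1,
    seed=42,
)

trainer = Trainer(model=model, args=args, train_dataset=train_dataset, metric="accuracy")
trainer.train()
```

The `(x, y)` tuples give separate values for the two phases: the first element applies to the contrastive embedding fine-tuning, the second to training the classification head.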
Training Results
Epoch | Step | Training Loss | Validation Loss |
---|---|---|---|
0.0009 | 1 | 0.2391 | - |
0.8772 | 1000 | 0.0011 | - |
1.7544 | 2000 | 0.0009 | - |
2.6316 | 3000 | 0.0008 | - |
Framework Versions
- Python: 3.9.16
- SetFit: 1.0.3
- Sentence Transformers: 3.1.1
- Transformers: 4.39.0
- PyTorch: 2.4.1+cu121
- Datasets: 3.0.0
- Tokenizers: 0.15.2
Citation
BibTeX
@article{https://doi.org/10.48550/arxiv.2209.11055,
doi = {10.48550/ARXIV.2209.11055},
url = {https://arxiv.org/abs/2209.11055},
author = {Tunstall, Lewis and Reimers, Nils and Jo, Unso Eun Seo and Bates, Luke and Korat, Daniel and Wasserblat, Moshe and Pereg, Oren},
keywords = {Computation and Language (cs.CL), FOS: Computer and information sciences, FOS: Computer and information sciences},
title = {Efficient Few-Shot Learning Without Prompts},
publisher = {arXiv},
year = {2022},
copyright = {Creative Commons Attribution 4.0 International}
}