metadata

library_name: setfit
tags:
  - setfit
  - sentence-transformers
  - text-classification
  - generated_from_setfit_trainer
base_model: avsolatorio/GIST-Embedding-v0
metrics:
  - accuracy
widget:
  - text: >-
      CON - Conversion - Socure Integration and Optimization: The project aims
      to integrate and optimize the Socure identity verification system into the
      existing platform. This includes updating the user interface, improving
      the mobile verification flow, and addressing issues related to the
      transition from the Berbix system to Socure. The project also involves
      enhancing the admin experience, refining the verification status polling
      route, and ensuring the system handles errors and failures effectively.
  - text: >-
      Webhook Optimization and Error Handling Improvement: This project focuses
      on enhancing the reliability and efficiency of webhook integrations within
      a software platform. The main objectives include preventing webhooks from
      retrying upon failure, addressing specific webhook issues, improving error
      message clarity, and reducing error-related noise. Additionally, the
      project aims to ensure critical errors are monitored effectively, and
      error displays in sync operations are corrected. This initiative will lead
      to a more stable and user-friendly integration experience, minimizing
      disruptions and improving overall system performance.
  - text: >-
      Groups and Home: Build group feature with group chat, and important chat
      features (reactions, reply, repost). 

      Added calendaring and polling capability (related to scheduling). 

      Ability to create a group, invite users to a group, manage the group
      (change settings, remove members, admin functions, invitations process,
      etc.). 

      Built split bill, photo/video sharing.  

      Also built DM/Subchat capability (for users to communicate separate from
      the group).  

      Build engagement features such as typing indicators, "who read message"
      read status, delivered status and more.  
  - text: >-
      CX - Customer Experience - Enhanced Security and User Experience: This
      project focuses on enhancing the reliability and quality of the user
      experience and security across the platform. This involved updating the
      2FA system to increase security for users with new reset request features
      and enhance the usability of them. Further work improved the KYC process
      which is central to the customer use of the product by streamlining KYC
      reminder and review queues and ensuring customer information is compliant
      and up-to-date with changing regulations. 


      Other aspects focused on developing functionality for managing documents
      securely. The system will also provide secure document transfer and a
      dashboard for both users and admins. It ensures appropriate audit logging
      and includes the development of routes, handlers, and data models for
      efficient workflow. Finally the project improved search and feedback
      functionality in the help center to improve reliability for users.
  - text: >-
      AI Content Generation Engine: Company needed to invent a programmatically
      usable, reliable way to generate learning content for professionals. No
      existing solutions satisfy the requirements of being able to ask a
      question, and repeatedly product reliable content tailored towards a
      target audience.
pipeline_tag: text-classification
inference: true

SetFit with avsolatorio/GIST-Embedding-v0

This is a SetFit model that can be used for Text Classification. This SetFit model uses avsolatorio/GIST-Embedding-v0 as the Sentence Transformer embedding model. A LogisticRegression instance is used for classification.

The model has been trained using an efficient few-shot learning technique that involves:

1. Fine-tuning a Sentence Transformer with contrastive learning.
2. Training a classification head with features from the fine-tuned Sentence Transformer.
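
The contrastive step relies on turning a few labeled texts into sentence pairs. The sketch below is a simplified illustration of that idea, not SetFit's actual sampler (which also balances positive and negative pairs). The helper name `make_contrastive_pairs` and the shortened toy texts are hypothetical. Pairs that share a label get a target similarity of 1.0, and all other pairs get 0.0.

```python
import random
from itertools import combinations


def make_contrastive_pairs(texts, labels, seed=42):
    """Return (text_a, text_b, target) triples for contrastive fine-tuning.

    target is 1.0 when both texts share a label, 0.0 otherwise.
    Hypothetical helper for illustration only.
    """
    rng = random.Random(seed)
    pairs = []
    for (i, a), (j, b) in combinations(enumerate(texts), 2):
        target = 1.0 if labels[i] == labels[j] else 0.0
        pairs.append((a, b, target))
    rng.shuffle(pairs)  # mix positives and negatives before batching
    return pairs


# Toy data (hypothetical, abbreviated project titles)
texts = [
    "Tax system revamp",
    "Financial reconciliation system",
    "Infrastructure enhancement",
    "Seller refurbishment program",
]
labels = [1, 1, 0, 0]

pairs = make_contrastive_pairs(texts, labels)
positives = [p for p in pairs if p[2] == 1.0]
negatives = [p for p in pairs if p[2] == 0.0]
print(len(pairs), len(positives), len(negatives))
```

Triples like these are what a loss such as CosineSimilarityLoss consumes: the embedding model is pushed to give same-label pairs a high cosine similarity and mixed-label pairs a low one. The classification head is then fitted on the resulting embeddings.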

Model Details

Model Description

Model Type: SetFit
Sentence Transformer body: avsolatorio/GIST-Embedding-v0
Classification head: a LogisticRegression instance
Maximum Sequence Length: 512 tokens
Number of Classes: 2 classes

Model Sources

Repository: SetFit on GitHub
Paper: Efficient Few-Shot Learning Without Prompts
Blogpost: SetFit: Efficient Few-Shot Learning Without Prompts

Model Labels

Label	Examples
1	'Tax System Revamp: Enhancing the tax filing experience by overhauling the system with new functionalities, including state-specific updates, direct deposit, and advanced error detection. Integrates cutting-edge features like data prefill, OCR for form uploads, and improved PDF management for a seamless, secure, and efficient filing process across devices.' 'CHREC - Choice Reconciliation - Financial Reconciliation System: This project involves the development of a system to reconcile financial transactions. The system will handle various types of transactions such as wires, checks, and ACH transfers. It will identify and resolve discrepancies in transaction amounts, ensuring the accuracy and integrity of financial data.' 'DPROIC Electronics: Creation of a composite focal plane array (CFPA) using multiple Digital Pixel Readout Integrated Circuits (DPROICs), capable of yielding the performance of a very large imaging chip but comprised of multiple readily available smaller chips. This CFPA design is an innovative custom product able to be substituted for large format ROICs using existing hardware.'
0	"Infrastructure and Data Environment Enhancement: This project focuses on enhancing the company's data processing and storage infrastructure across various environments (development, QA, stage, production). It involves creating and managing resources like Amazon ECR repositories, Kubernetes namespaces, and AWS S3 buckets to support various services such as Cerebrum, Reconciler, and Graphcast. Additionally, it includes setting up OpenSearch clusters for improved search capabilities, configuring access permissions, and ensuring seamless deployment and integration of services like neo4j, PostgreSQL databases, and FastAPI applications. The goal is to optimize data management, search functionality, and application performance, facilitating better risk analysis and compliance monitoring." 'eBay Seller Refurbishment Receive and Grade: A program that secret shops sellers that want to be part of the eBay Seller Refurbished program. Additionally items are re-listed on the eBay platform to be sold.' 'Quality Automation Resources MVP: This project aims to significantly improve the quality and efficiency of software testing processes. By expanding test scenario capabilities using advanced AI, enhancing educational content for automation, conducting performance/load and API testing, improving documentation, and implementing data collection with dashboards, the project seeks to provide a comprehensive upgrade to the current testing framework. This will enable more robust, efficient, and insightful testing practices, ensuring higher software quality for users.'

Uses

Direct Use for Inference

First install the SetFit library:

pip install setfit

Then you can load this model and run inference.

from setfit import SetFitModel

# Download from the 🤗 Hub ("setfit_model_id" is a placeholder; replace it with this model's Hub repository ID)
model = SetFitModel.from_pretrained("setfit_model_id")
# Run inference
preds = model("AI Content Generation Engine: Company needed to invent a programmatically usable, reliable way to generate learning content for professionals. No existing solutions satisfy the requirements of being able to ask a question, and repeatedly product reliable content tailored towards a target audience.")

Training Details

Training Set Metrics

Training set	Min	Mean	Max
Word count	18	75.5789	397

Label	Training Sample Count
0	146
1	82
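
As a rough sketch, statistics like these can be recomputed from a list of training texts and labels. The example assumes words are counted by whitespace splitting, which may differ from how the card was generated, and uses hypothetical toy texts. The reported central value of 75.5789 equals 17232 / 228, so it is consistent with an arithmetic mean over the 228 training samples (a median of an even number of integer counts would end in .0 or .5).

```python
from collections import Counter
from statistics import mean, median


def word_count_summary(texts):
    """Summarize whitespace-delimited word counts (assumed tokenization)."""
    counts = [len(t.split()) for t in texts]
    return {
        "min": min(counts),
        "mean": mean(counts),
        "median": median(counts),
        "max": max(counts),
    }


# Toy data (hypothetical), not the actual training set
texts = ["one two three", "one two", "one two three four five six seven"]
labels = [0, 1, 0]

summary = word_count_summary(texts)
label_counts = Counter(labels)
print(summary, dict(label_counts))

# Consistency check on the reported figures: 146 + 82 samples in total
total_samples = 146 + 82
implied_total_words = round(75.5789 * total_samples)
print(total_samples, implied_total_words)
```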

Training Hyperparameters

batch_size: (8, 8)
num_epochs: (3, 3)
max_steps: -1
sampling_strategy: oversampling
num_iterations: 20
body_learning_rate: (2e-05, 1e-05)
head_learning_rate: 0.01
loss: CosineSimilarityLoss
distance_metric: cosine_distance
margin: 0.25
end_to_end: False
use_amp: True
warmup_proportion: 0.1
seed: 42
eval_max_steps: -1
load_best_model_at_end: False
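
Several values above are tuples. In SetFit's `TrainingArguments`, a tuple is documented as giving the value for the embedding fine-tuning phase and the classifier training phase respectively, while a single value applies to both. The sketch below only illustrates that convention with a plain dictionary. The helper `split_phases` is hypothetical and not part of SetFit.

```python
# Subset of the reported hyperparameters, copied from the list above
reported = {
    "batch_size": (8, 8),
    "num_epochs": (3, 3),
    "body_learning_rate": (2e-05, 1e-05),
    "seed": 42,
}


def split_phases(params):
    """Expand (embedding, classifier) tuples into one dict per phase.

    Hypothetical helper: scalar values are copied to both phases.
    """
    embedding, classifier = {}, {}
    for name, value in params.items():
        if isinstance(value, tuple):
            embedding[name], classifier[name] = value
        else:
            embedding[name] = classifier[name] = value
    return embedding, classifier


embedding_phase, classifier_phase = split_phases(reported)
print("embedding fine-tuning:", embedding_phase)
print("classifier training:", classifier_phase)
```

With `end_to_end: False` and a LogisticRegression head, the second `body_learning_rate` value would not be expected to take effect, since SetFit documents it as applying only to a differentiable head trained end to end.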

Training Results

Epoch	Step	Training Loss	Validation Loss
0.0009	1	0.2391	-
0.8772	1000	0.0011	-
1.7544	2000	0.0009	-
2.6316	3000	0.0008	-
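
The Epoch column can be related to the Step column with a short calculation. The sketch assumes that `num_iterations: 20` yields 20 positive and 20 negative pairs per training sample, so 2 × 20 × 228 pairs per epoch. This is an assumption about the pair sampler rather than something the card states, but the resulting fractions match the table.

```python
import math

n_samples = 146 + 82   # training sample counts for labels 0 and 1
num_iterations = 20    # from the hyperparameters above
batch_size = 8         # embedding-phase batch size

# Assumption: num_iterations positive and negative pairs per sample
n_pairs = 2 * num_iterations * n_samples
steps_per_epoch = math.ceil(n_pairs / batch_size)


def epoch_at(step):
    """Fractional epoch reached after a given number of optimizer steps."""
    return round(step / steps_per_epoch, 4)


for step in (1, 1000, 2000, 3000):
    print(step, epoch_at(step))
print("steps per epoch:", steps_per_epoch)
```

Under the same assumption, the three embedding epochs correspond to 3 × 1140 = 3420 steps, so the last logged row (step 3000) falls partway through the third epoch.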

Framework Versions

Python: 3.9.16
SetFit: 1.0.3
Sentence Transformers: 3.1.1
Transformers: 4.39.0
PyTorch: 2.4.1+cu121
Datasets: 3.0.0
Tokenizers: 0.15.2

Citation

BibTeX

@article{https://doi.org/10.48550/arxiv.2209.11055,
    doi = {10.48550/ARXIV.2209.11055},
    url = {https://arxiv.org/abs/2209.11055},
    author = {Tunstall, Lewis and Reimers, Nils and Jo, Unso Eun Seo and Bates, Luke and Korat, Daniel and Wasserblat, Moshe and Pereg, Oren},
    keywords = {Computation and Language (cs.CL), FOS: Computer and information sciences, FOS: Computer and information sciences},
    title = {Efficient Few-Shot Learning Without Prompts},
    publisher = {arXiv},
    year = {2022},
    copyright = {Creative Commons Attribution 4.0 International}
}