metadata

base_model: BAAI/bge-base-en-v1.5
library_name: setfit
metrics:
  - accuracy
pipeline_tag: text-classification
tags:
  - setfit
  - sentence-transformers
  - text-classification
  - generated_from_setfit_trainer
widget:
  - text: >-
      Reasoning:


      **Why the answer may be good:**

      1. **Context Grounding**: The answer lists different types of toys
      suitable for rabbits, such as small animal toys, wooden blocks, and
      non-toxic plastic balls, which aligns well with the document's mentions of
      "wooden rabbit chew toys," "cat jingles balls," and other similar options.

      2. **Relevance**: The answer is directly related to the question of how to
      choose toys for a rabbit, providing clear examples and suggestions on
      suitable toys and their benefits for the rabbit.

      3. **Conciseness**: The answer is generally concise and informative,
      discussing specific toy types and why they are beneficial, in a
      straightforward manner.


      **Why the answer may be bad:**

      1. **Context Grounding**: While the answer is generally aligned with the
      suggestions from the document, it does not mention some crucial points
      from the document such as avoiding toxic materials, small parts, or the
      importance of regularly checking the toys for wear and tear.

      2. **Relevance**: The answer omits some key advice from the document
      regarding safety considerations (e.g., materials and types of wood to
      avoid).

      3. **Conciseness**: The answer is mostly concise; however, it could be
      more comprehensive by including some of the safety tips and other
      precautions mentioned in the document, without making it overly lengthy.


      Final Result:

      ****
  - text: >-
      **Reasoning:**


      ### Why the Answer May Be Good:

      1. **Context Grounding**: The answer correctly references Martin Allen
      signing Kieron Freeman when he went on loan to Notts County, which is
      supported by the document.

      2. **Relevance**: It directly names the manager who signed Kieron Freeman,
      which is related to the document’s content.


      ### Why the Answer May Be Bad:

      1. **Context Grounding**: The provided document discusses Kieron Freeman
      and his football career; however, the document does not mention Aaron
      Pryor or any details about his boxing career.

      2. **Relevance**: The question specifically asks about Aaron Pryor’s
      manager during his boxing career, not about Kieron Freeman's football
      career or the managers associated with Freeman.

      3. **Conciseness**: The answer, although addressing the content correctly,
      does not respond to the specific question asked.


      ### Final Result:

      **Bad**


      The answer does not address the question about Aaron Pryor's boxing career
      and his manager, making it irrelevant despite being correct within its own
      footballing context.
  - text: >-
      Reasoning for why the answer may be good:

      1. The provided answer does touch on a general concern associated with
      online casinos, which is user data security.

      2. Security concerns, especially loss or compromise of data, are common
      issues related to online activities including online gambling.


      Reasoning for why the answer may be bad:

      1. The document provided does not seem to contain any content from July
      10, 2011, or any specific message related to that date.

      2. The answer does not address the specific question about the concern of
      the husband of the person who wrote the message on July 10, 2011.

      3. There is no indication or context from the provided document that
      discusses the husband or a message from July 10, 2011.


      Final result:
  - text: >-
      Reasoning why the answer may be good:


      1. The answer attempts to provide steps on how to train a dog from running
      out of the house, which is directly related to the question.

      2. It suggests using a command ("fly") and a toy as a reward, aligning
      with typical training methods where behavior is rewarded.

      3. It touches on the aspect of preventing the dog from running out of the
      house, addressing the core issue.


      Reasoning why the answer may be bad:


      1. Context Grounding: The answer is not well-supported by the provided
      document. The document mentions commands such as "sit" and "stay," and
      strategies like using treats, clickers, and physical barriers, but does
      not mention a "fly" command or holding hands above the dog's tail.

      2. Relevance: The answer deviates by suggesting a "fly" command, holding
      toys above the tail, and the use of a magic spell, which are not grounded
      in the document or part of standard dog training techniques.

      3. Conciseness: The mention of using a magic spell and providing minimal
      exercise at night are unnecessary and irrelevant, making the answer less
      clear and to the point.


      Final Result:
  - text: >-
      Reasoning:


      The answer is directly taken from the provided document, specifically from
      the line "Allan Cox's First Class Delivery on a H128-10W for his Level 1
      certification flight." This indicates that the information is
      well-supported and context-grounded.


      The answer is relevant as it directly addresses the specific question
      about the type of engine used for Allan Cox's Level 1 certification
      flight.


      The answer is concise and to the point without any unnecessary
      information.


      Final Result:
inference: true

SetFit with BAAI/bge-base-en-v1.5

This is a SetFit model that can be used for Text Classification. This SetFit model uses BAAI/bge-base-en-v1.5 as the Sentence Transformer embedding model. A LogisticRegression instance is used for classification.

The model has been trained using an efficient few-shot learning technique that involves:

1. Fine-tuning a Sentence Transformer with contrastive learning.
2. Training a classification head with features from the fine-tuned Sentence Transformer.
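The contrastive step works on pairs of training texts rather than single texts. The following is a minimal sketch of how such pairs can be built from labeled samples, assuming a pair is scored 1.0 when both texts share a label and 0.0 otherwise. The function name and the toy texts are illustrative, not part of SetFit, and this sketch enumerates every pair, whereas SetFit samples them.

```python
from itertools import combinations


def make_contrastive_pairs(texts, labels):
    """Return (text_a, text_b, target) triples.

    target is 1.0 when both texts share a label and 0.0 otherwise.
    Hypothetical helper for illustration; not a SetFit API.
    """
    pairs = []
    for (i, text_a), (j, text_b) in combinations(enumerate(texts), 2):
        target = 1.0 if labels[i] == labels[j] else 0.0
        pairs.append((text_a, text_b, target))
    return pairs


# Toy data (made up for this sketch)
texts = [
    "answer is grounded in the document",
    "answer is relevant and concise",
    "answer is off-topic",
    "answer invents unsupported facts",
]
labels = [1, 1, 0, 0]

pairs = make_contrastive_pairs(texts, labels)
for text_a, text_b, target in pairs:
    print(f"{target:.1f}  {text_a!r} <-> {text_b!r}")
```

The embedding model is then tuned so that same-label pairs end up close together and different-label pairs far apart, which gives the classification head well-separated features to work with.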

Model Details

Model Description

Model Type: SetFit
Sentence Transformer body: BAAI/bge-base-en-v1.5
Classification head: a LogisticRegression instance
Maximum Sequence Length: 512 tokens
Number of Classes: 2

Model Sources

Repository: SetFit on GitHub
Paper: Efficient Few-Shot Learning Without Prompts
Blogpost: SetFit: Efficient Few-Shot Learning Without Prompts

Model Labels

Label	Examples
0	"Reasoning:\n\nWhy the Answer May Be Good:\n1. Context Grounding: The answer references the points made in the document, such as Coach Brian Shaw's strategy of pushing the ball after makes and misses as well as encouraging players to take the first available shot within the rhythm of the offense.\n2. Relevance: The answer directly addresses why the Nuggets are having an offensive outburst, highlighting the coaching strategy and players' adaptation.\n3. Conciseness: The answer is mostly to the point and focuses on the main question.\n\nWhy the Answer May Be Bad:\n1. Context Grounding: The mention of a new training technique using virtual reality is not supported by any information within the document provided.\n2. Conciseness: The additional detail about the virtual reality training is unnecessary given that it is not referenced in the document and does not contribute to answering the specific question about the offensive outburst.\n \nFinal Result:\nBased on the evaluation criteria, the inclusion of fictitious or unsupported information about the virtual reality training significantly detracts from the answer’s credibility and relevance.\n\n" 'Reasoning why the answer may be good:\n1. Context Grounding: The provided answer cites specific information about film and digital photography directly from the provided document, showing a good grounding.\n2. Relevance: The answer addresses the specific question by discussing different aspects such as exposure tolerance, color capture, and overall image resolution between film and digital photography.\n3. Conciseness: The answer is relatively concise and sticks to the main points relevant to the question without unnecessary elaboration.\n\nReasoning why the answer may be bad:\n1. Overly Detailed: The answer could be seen as too detailed in certain segments, which might slightly detract from conciseness.\n2. Possible Confusion: The mention of specific technical details like "5MP digital sensors" could confuse readers who are not familiar with the technical specifications, detracting from clarity.\n3. Omission of Key Comparison Points: The answer does not touch upon some of the more subjective observations made by the author, like the practical advantages in using film for certain types of photography.\n\nFinal Result:' 'Reasoning:\n1. Context Grounding: The answer provided does not reference the third book of the Arcana Chronicles by Kresley Cole or even discuss any content relevant to it. Instead, it discusses an MMA event in Calgary, Alberta, Canada.\n2. Relevance: The answer is entirely irrelevant to the question. The question is about the main conflict in the third book of a specific book series, but the answer describes an MMA fight event.\n3. Conciseness: While the answer is concise in its context, it is entirely off-topic and therefore does not satisfy the conciseness criterion in a meaningful way.\n\nThe answer may be deemed bad because it does not address the question about the Arcana Chronicles at all and instead provides unrelated information about an MMA event.\n\nFinal result:'
1	'Reasoning:\n\n1. Context Grounding:\n - Good: The answer is supported by the document. The suggestions mentioned (getting to know the client, signing a contract, and showcasing honesty and diplomacy) are directly referenced in the text provided.\n - Bad: There is no significant bad aspect in terms of context grounding; the answer sticks closely to the source material.\n\n2. Relevance:\n - Good: The answer is highly relevant to the question about best practices to avoid unnecessary revisions and conflicts. It addresses client understanding, contractual agreements, and the handling of extra charges—all crucial for minimizing conflicts.\n - Bad: There is no deviation from the topic. The answer is focused solely on the best practices, as asked in the question.\n\n3. Conciseness:\n - Good: The answer is concise and to the point, effectively summarizing the practices without unnecessary details.\n - Bad: The level of detail might be too succinct for some readers looking for more in-depth discussion, but this is minor given the criteria.\n\nFinal Result:' "Reasoning for why the answer may be good:\n- The answer references the author’s emphasis on drawing from personal experiences of pain and emotion to create genuine and relatable characters, which is well-supported by the document.\n- It highlights the importance of genuineness and relatability, which aligns directly with the content provided in the document.\n- The answer stays focused on the specific question about creating a connection between the reader and the characters.\n\nReasoning for why the answer may be bad:\n- The answer could be seen as slightly verbose and might include more detail than necessary, rather than being extremely concise.\n- It does not explicitly mention the document's use of pain for romance authors specifically, which might add to the context.\n\nFinal result:" "Reasoning:\n\nPros:\n1. Context Grounding: The document explicitly states that Mauro Rubin is the CEO of JoinPad and mentions that he was speaking at the event, which directly supports the answer.\n2. Relevance: The answer directly and correctly responds to the question about the CEO's identity during the event.\n3. Conciseness: The answer is brief and to the point, providing only the necessary information.\n\nCons:\n- There are no significant cons as the answer fulfills all criteria effectively.\n\nFinal Result:"

Uses

Direct Use for Inference

First install the SetFit library:

pip install setfit

Then you can load this model and run inference.

from setfit import SetFitModel

# Download from the 🤗 Hub
model = SetFitModel.from_pretrained("Netta1994/setfit_baai_gpt-4o_improved-cot-instructions_two_reasoning_remove_final_evaluation_e1")
# Run inference
preds = model(
    "Reasoning:\n\n"
    "The answer is directly taken from the provided document, specifically from the line \"Allan Cox's First Class Delivery on a H128-10W for his Level 1 certification flight.\" This indicates that the information is well-supported and context-grounded.\n\n"
    "The answer is relevant as it directly addresses the specific question about the type of engine used for Allan Cox's Level 1 certification flight.\n\n"
    "The answer is concise and to the point without any unnecessary information.\n\n"
    "Final Result:"
)
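To score several reasoning texts at once, the model can be called with a list of strings, which should return one prediction per input. The sketch below wraps that call in a small helper. `classify_batch` and `fake_model` are hypothetical names introduced here, and the "bad"/"good" label names are an assumption inferred from the label examples above, since the card itself only numbers the classes 0 and 1. A stand-in callable replaces the real model so the sketch runs offline.

```python
def classify_batch(model, texts, label_names=None):
    """Run `model` on a list of texts and return (label_id, label_name) pairs.

    `model` is any callable that maps a list of strings to one prediction
    per string, such as a loaded SetFitModel. Hypothetical helper.
    """
    label_names = label_names or {}
    results = []
    for pred in model(texts):
        label_id = int(pred)  # also handles 0-dim tensors / NumPy scalars
        results.append((label_id, label_names.get(label_id, str(label_id))))
    return results


def fake_model(texts):
    """Stand-in for a loaded SetFitModel (assumption, for offline use only)."""
    return [1 if "well-supported" in text else 0 for text in texts]


# Assumed label names; the card does not state what 0 and 1 mean.
ASSUMED_LABEL_NAMES = {0: "bad", 1: "good"}

texts = [
    "Reasoning:\n\nThe information is well-supported and context-grounded.\n\nFinal Result:",
    "Reasoning:\n\nThe answer does not address the question.\n\nFinal Result:",
]
results = classify_batch(fake_model, texts, ASSUMED_LABEL_NAMES)
print(results)
```

With the real model, pass the `model` object loaded via `SetFitModel.from_pretrained(...)` in place of `fake_model`.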

Training Details

Training Set Metrics

Training set	Min	Median	Max
Word count	45	125.3464	274

Label	Training Sample Count
0	199
1	208

Training Hyperparameters

batch_size: (16, 16)
num_epochs: (1, 1)
max_steps: -1
sampling_strategy: oversampling
num_iterations: 20
body_learning_rate: (2e-05, 2e-05)
head_learning_rate: 2e-05
loss: CosineSimilarityLoss
distance_metric: cosine_distance
margin: 0.25
end_to_end: False
use_amp: False
warmup_proportion: 0.1
l2_weight: 0.01
seed: 42
eval_max_steps: -1
load_best_model_at_end: False
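The `loss: CosineSimilarityLoss` setting determines what the embedding body is optimized for. As a rough sketch, assuming the default behaviour of the Sentence Transformers loss of that name (mean squared error between the cosine similarity of two embeddings and the pair's target), the computation looks like the following. The two-dimensional vectors are toy values, not real embeddings.

```python
import math


def cosine_similarity(u, v):
    """Cosine of the angle between two non-zero vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)


def cosine_similarity_loss(pairs):
    """Mean squared error between cosine similarity and target.

    `pairs` is a list of (embedding_a, embedding_b, target) triples.
    """
    errors = [(cosine_similarity(u, v) - target) ** 2 for u, v, target in pairs]
    return sum(errors) / len(errors)


toy_pairs = [
    ([1.0, 0.0], [1.0, 0.0], 1.0),  # same direction, same label: error 0
    ([1.0, 0.0], [0.0, 1.0], 0.0),  # orthogonal, different labels: error 0
    ([1.0, 0.0], [1.0, 0.0], 0.0),  # same direction, different labels: error 1
]
loss = cosine_similarity_loss(toy_pairs)
print(f"{loss:.4f}")
```

Only the third pair contributes to the loss here, which is the case training is meant to correct: two texts with different labels whose embeddings point the same way.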

Training Results

Epoch	Step	Training Loss	Validation Loss
0.0010	1	0.1714	-
0.0491	50	0.2595	-
0.0982	100	0.2566	-
0.1473	150	0.2328	-
0.1965	200	0.0808	-
0.2456	250	0.0481	-
0.2947	300	0.0402	-
0.3438	350	0.0251	-
0.3929	400	0.0204	-
0.4420	450	0.0194	-
0.4912	500	0.0175	-
0.5403	550	0.0146	-
0.5894	600	0.0088	-
0.6385	650	0.0052	-
0.6876	700	0.0053	-
0.7367	750	0.0033	-
0.7859	800	0.0028	-
0.8350	850	0.0026	-
0.8841	900	0.0031	-
0.9332	950	0.0022	-
0.9823	1000	0.0021	-
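The epoch fractions in this table can be reproduced from the training set size and the hyperparameters listed earlier. The sketch below assumes that each training sample contributes `num_iterations` positive and `num_iterations` negative pairs, which is an assumption about how pairs are counted rather than something the card states. Under that assumption the numbers line up with the logged steps.

```python
import math

# Values taken from the card
label_counts = {0: 199, 1: 208}
num_iterations = 20
batch_size = 16

num_samples = sum(label_counts.values())

# Assumption: num_iterations positive + num_iterations negative pairs per sample
num_pairs = 2 * num_iterations * num_samples
steps_per_epoch = math.ceil(num_pairs / batch_size)


def epoch_at(step):
    """Fraction of the single epoch completed at a given step."""
    return round(step / steps_per_epoch, 4)


print(num_samples, num_pairs, steps_per_epoch)
for step in (1, 50, 500, 1000):
    print(step, epoch_at(step))
```

This yields 1018 steps for the one epoch, so step 1000 corresponds to epoch 0.9823, matching the last logged row.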

Framework Versions

Python: 3.10.14
SetFit: 1.1.0
Sentence Transformers: 3.1.1
Transformers: 4.44.0
PyTorch: 2.4.0+cu121
Datasets: 3.0.0
Tokenizers: 0.19.1

Citation

BibTeX

@article{https://doi.org/10.48550/arxiv.2209.11055,
    doi = {10.48550/ARXIV.2209.11055},
    url = {https://arxiv.org/abs/2209.11055},
    author = {Tunstall, Lewis and Reimers, Nils and Jo, Unso Eun Seo and Bates, Luke and Korat, Daniel and Wasserblat, Moshe and Pereg, Oren},
    keywords = {Computation and Language (cs.CL), FOS: Computer and information sciences},
    title = {Efficient Few-Shot Learning Without Prompts},
    publisher = {arXiv},
    year = {2022},
    copyright = {Creative Commons Attribution 4.0 International}
}