---
library_name: setfit
tags:
  - setfit
  - sentence-transformers
  - text-classification
  - generated_from_setfit_trainer
metrics:
  - accuracy
  - precision
  - recall
  - f1
widget:
  - text: >
      <p>I'm having a problem serving my text classification model on
      <code>Tensorflow 1.12</code>. I'm using
      <code>tf.estimator.inputs.pandas_input_fn</code> to read in my data, and
      <code>tf.estimator.DNNClassifier</code> to train/evaluate. I'd then like
      to serve my model.

      (Apologies in advance, it's tough to provide a full working example here,
      but it's very much like the example TF provides at <a
      href="https://www.tensorflow.org/api_docs/python/tf/estimator/DNNClassifier"
      rel="nofollow
      noreferrer">https://www.tensorflow.org/api_docs/python/tf/estimator/DNNClassifier</a> 
      )</p>


      <p>I'm currently saving my model with ...</p>


      <pre class="lang-py prettyprint-override"><code>...

      estimator.export_savedmodel("./TEST_SERVING/",
      self.serving_input_receiver_fn, strip_default_attrs=True)

      ...

      def serving_input_receiver_fn(self):
            """An input receiver that expects a serialized tf.Example."""

            # feature spec dictionary  determines our input parameters for the model
            feature_spec = {
                'Headline': tf.VarLenFeature(dtype=tf.string),
                'Description': tf.VarLenFeature(dtype=tf.string)
            }

            # the inputs will be initially fed as strings with data serialized by
            # Google ProtoBuffers
            serialized_tf_example = tf.placeholder(
                dtype=tf.string, shape=None, name='input_example_tensor')
            receiver_tensors = {'examples': serialized_tf_example}

            # deserialize input
            features = tf.parse_example(serialized_tf_example, feature_spec)
            return tf.estimator.export.ServingInputReceiver(features, receiver_tensors)


      </code></pre>


      <p>This actually fails to run with the error:</p>


      <pre class="lang-sh prettyprint-override"><code>TypeError: Failed to
      convert object of type &lt;class
      'tensorflow.python.framework.sparse_tensor.SparseTensor'&gt; to Tensor.
      Contents: SparseTensor(indices=Tensor("ParseExample/ParseExample:0",
      shape=(?, 2), 

      dtype=int64), values=Tensor("ParseExample/ParseExample:2", shape=(?,),
      dtype=string), dense_shape=Tensor("ParseExample/ParseExample:4",
      shape=(2,), dtype=int64)). Consider casting elements to a supported type.


      </code></pre>


      <p>I tried to save a second way doing:</p>


      <pre class="lang-py prettyprint-override"><code>def
      serving_input_receiver_fn(self):
        """Build the serving inputs."""
        INPUT_COLUMNS = ["Headline","Description"]
        inputs = {}
        for feat in INPUT_COLUMNS:
          inputs[feat] = tf.placeholder(shape=[None], dtype=tf.string, name=feat)
        return tf.estimator.export.ServingInputReceiver(inputs, inputs)
      </code></pre>


      <p>This actually works, until I try testing it with the
      <code>saved_model_cli</code>.

      Some output for <code>saved_model_cli show --all --dir
      TEST_SERVING/1553879255/</code>:</p>


      <pre class="lang-sh prettyprint-override"><code>MetaGraphDef with tag-set:
      'serve' contains the following SignatureDefs:


      signature_def['predict']:
        The given SavedModel SignatureDef contains the following input(s):
          inputs['Description'] tensor_info:
              dtype: DT_STRING
              shape: (-1)
              name: Description:0
          inputs['Headline'] tensor_info:
              dtype: DT_STRING
              shape: (-1)
              name: Headline:0
        The given SavedModel SignatureDef contains the following output(s):
          outputs['class_ids'] tensor_info:
              dtype: DT_INT64
              shape: (-1, 1)
              name: dnn/head/predictions/ExpandDims:0
          outputs['classes'] tensor_info:
              dtype: DT_STRING
              shape: (-1, 1)
              name: dnn/head/predictions/str_classes:0
          outputs['logits'] tensor_info:
              dtype: DT_FLOAT
              shape: (-1, 3)
              name: dnn/logits/BiasAdd:0
          outputs['probabilities'] tensor_info:
              dtype: DT_FLOAT
              shape: (-1, 3)
              name: dnn/head/predictions/probabilities:0
        Method name is: tensorflow/serving/predict

      </code></pre>


      <p>But now I can't seem to test it.</p>


      <pre class="lang-sh prettyprint-override"><code>&gt;&gt;&gt;
      saved_model_cli run --dir TEST_SERVING/1553879255/ --tag_set serve
      --signature_def predict --input_examples 'inputs=[{"Description":["What is
      going on"],"Headline":["Help me"]}]'

      Traceback (most recent call last):
       ...
        File "/Users/Josh/miniconda3/envs/python36/lib/python3.6/site-packages/tensorflow/python/tools/saved_model_cli.py", line 489, in _create_example_string
          feature_list)
      TypeError: 'What is going on' has type str, but expected one of: bytes


      </code></pre>


      <p>Ok, lets turn it into a bytes object by changing to <code>b["What is
      going on"]</code> and <code>b["Help me"]</code>...</p>


      <pre class="lang-sh prettyprint-override"><code>ValueError: Type &lt;class
      'bytes'&gt; for value b'What is going on' is not supported for
      tf.train.Feature.

      </code></pre>


      <p>Any ideas/thoughts??

      Thanks!</p>
  - text: >
      <p>In tensorflow <code>tf.keras.Model.compile</code>, you can pass a
      <code>lambda y_true, y_pred: val</code> function as a metric (though, it
      seems not documented), but I asked my self : "How does it aggregate it
      over the batches" ?</p>


      <p>I searched the documentation, but I've found nowhere how it is done
      ?</p>


      <p>By the way, I don't even know if it is an undefined behavior to do so
      and one should instead subclass the Metric class ? ( or at least provide
      the required methods).</p>


      <p>Also, is it pertinent to pass a loss as a metric (and in this case,
      same question : how is it aggregated over the batches ? )</p>
  - text: >
      <p>I'm working on a project where I have trained a series of binary
      classifiers with <strong>Keras</strong>, with <strong>Tensorflow</strong>
      as the backend engine. The input data I have is a series of images, where
      each binary classifier must make the prediction on the images, later I
      save the predictions on a CSV file.</p>

      <p>The problem I have is when I get the predictions from the first series
      of binary classifiers there isn't any warning, but when the 5th or 6th
      binary classifier calls the method <strong>predict</strong> on the input
      data I get the following warning:</p>

      <blockquote>

      <p>WARNING:tensorflow:5 out of the last 5 calls to &lt;function

      Model.make_predict_function..predict_function at

      0x2b280ff5c158&gt; triggered tf.function retracing. Tracing is expensive

      and the excessive number of tracings could be due to (1) creating

      @tf.function repeatedly in a loop, (2) passing tensors with different

      shapes, (3) passing Python objects instead of tensors. For (1), please

      define your @tf.function outside of the loop. For (2), @tf.function

      has experimental_relax_shapes=True option that relaxes argument shapes

      that can avoid unnecessary retracing. For (3), please refer to

      <a
      href="https://www.tensorflow.org/tutorials/customization/performance#python_or_tensor_args"
      rel="noreferrer">https://www.tensorflow.org/tutorials/customization/performance#python_or_tensor_args</a>

      and <a href="https://www.tensorflow.org/api_docs/python/tf/function"
      rel="noreferrer">https://www.tensorflow.org/api_docs/python/tf/function</a>
      for  more

      details.</p>

      </blockquote>

      <p>To answer each point in the parenthesis, here are my answers:</p>

      <ol>

      <li>The <strong>predict</strong> method is called inside a for loop.</li>

      <li>I don't pass tensors but a list of <strong>NumPy arrays</strong> of
      gray scale images, all of them with the same size in width and height. The
      only thing that can change is the batch size because the list can have
      only 1 image or more than one.</li>

      <li>As I wrote in point 2, I pass a list of NumPy arrays.</li>

      </ol>

      <p>I have debugged my program and found that this warning always happens
      when the method predict is called. To summarize the code I have written is
      the following:</p>

      <pre><code>import cv2 as cv

      import tensorflow as tf

      from tensorflow.keras.models import load_model

      # Load the models

      binary_classifiers = [load_model(path) for path in path2models]

      # Get the images

      images = [#Load the images with OpenCV]

      # Apply the resizing and reshapes on the images.

      my_list = list()

      for image in images:
          image_reworked = # Apply the resizing and reshaping on images
          my_list.append(image_reworked)

      # Get the prediction from each model

      # This is where I get the warning

      predictions = [model.predict(x=my_list,verbose=0) for model in
      binary_classifiers]

      </code></pre>

      <h3>What I have tried</h3>

      <p>I have defined a function as tf.function and putted the code of the
      predictions inside the tf.function like this</p>

      <pre><code>@tf.function

      def testing(models, faces):
          return [model.predict(x=faces,verbose=0) for model in models]
      </code></pre>

      <p>But I ended up getting the following error:</p>

      <blockquote>

      <p>RuntimeError: Detected a call to <code>Model.predict</code> inside a

      <code>tf.function</code>. Model.predict is a high-level endpoint that
      manages

      its own <code>tf.function</code>. Please move the call to
      <code>Model.predict</code> outside

      of all enclosing <code>tf.function</code>s. Note that you can call a
      <code>Model</code>

      directly on Tensors inside a <code>tf.function</code> like:
      <code>model(x)</code>.</p>

      </blockquote>

      <p>So calling the method <code>predict</code> is basically already a
      tf.function. So it's useless to define a tf.function when the warning I
      get it's from that method.</p>

      <p>I have also checked those other two questions:</p>

      <ol>

      <li><a
      href="https://stackoverflow.com/questions/61647404/tensorflow-2-getting-warningtensorflow9-out-of-the-last-9-calls-to-function">Tensorflow
      2: Getting &quot;WARNING:tensorflow:9 out of the last 9 calls to 
      triggered tf.function retracing. Tracing is expensive&quot;</a></li>

      <li><a
      href="https://stackoverflow.com/questions/65563185/loading-multiple-saved-tensorflow-keras-models-for-prediction">Loading
      multiple saved tensorflow/keras models for prediction</a></li>

      </ol>

      <p>But neither of the two questions answers my question about how to avoid
      this warning. Plus I have also checked the links in the warning message
      but I couldn't solve my problem.</p>

      <h3>What I want</h3>

      <p>I simply want to avoid this warning. While I'm still getting the
      predictions from the models I noticed that the python program takes way
      too much time on doing predictions for a list of images.</p>

      <h3>What I'm using</h3>

      <ul>

      <li>Python 3.6.13</li>

      <li>Tensorflow 2.3.0</li>

      </ul>

      <h3>Solution</h3>

      <p>After some tries to suppress the warning from the <code>predict</code>
      method, I have checked the documentation of Tensorflow and in one of the
      first tutorials on how to use Tensorflow it is explained that, by default,
      Tensorflow is executed in eager mode, which is useful for testing and
      debugging the network models. Since I have already tested my models many
      times, it was only required to disable the eager mode by writing this
      single python line of code:</p>

      <p><code>tf.compat.v1.disable_eager_execution()</code></p>

      <p>Now the warning doesn't show up anymore.</p>
  - text: >
      <p>Where one can find the github source code for
      <code>tf.quantization.fake_quant_with_min_max_args</code>. Checking the <a
      href="https://www.tensorflow.org/api_docs/python/tf/quantization/fake_quant_with_min_max_args"
      rel="nofollow noreferrer">TF API documentation</a>, there is no link to
      the github source file, and I could not find one on github.</p>
  - text: >
      <p>I'm trying to use <code>tf.Dataset</code> for a 3D image CNN where the
      shape of the 3D image fed into it from the training set and the validation
      set are different (training: (64, 64, 64), validation: (176, 176, 160)). I
      didn't even know this was possible, but I'm recreating this network based
      on a paper, and using the classic <code>feed_dict</code> method the
      network indeed works. For performance reasons (and just to learn) I'm
      trying to switch the network to use <code>tf.Dataset</code> instead.</p>


      <p>I have two datasets and iterators built like the following:</p>


      <pre class="lang-py prettyprint-override"><code>def _data_parser(dataset,
      shape):
              features = {"input": tf.FixedLenFeature((), tf.string),
                          "label": tf.FixedLenFeature((), tf.string)}
              parsed_features = tf.parse_single_example(dataset, features)

              image = tf.decode_raw(parsed_features["input"], tf.float32)
              image = tf.reshape(image, shape + (1,))

              label = tf.decode_raw(parsed_features["label"], tf.float32)
              label = tf.reshape(label, shape + (1,))
              return image, label

      train_datasets = ["train.tfrecord"]

      train_dataset = tf.data.TFRecordDataset(train_datasets)

      train_dataset = train_dataset.map(lambda x: _data_parser(x, (64, 64, 64)))

      train_dataset = train_dataset.batch(batch_size) # batch_size = 16

      train_iterator = train_dataset.make_initializable_iterator()


      val_datasets = ["validation.tfrecord"]

      val_dataset = tf.data.TFRecordDataset(val_datasets)

      val_dataset = val_dataset.map(lambda x: _data_parser(x, (176, 176, 160)))

      val_dataset = val_dataset.batch(1)

      val_iterator = val_dataset.make_initializable_iterator()

      </code></pre>


      <p><a
      href="https://www.tensorflow.org/guide/datasets#creating_an_iterator"
      rel="nofollow noreferrer">TensorFlow documentation</a> has examples
      regarding switching between datasets using
      <code>reinitializable_iterator</code> or <code>feedable_iterator</code>,
      but they all switch between iterators of <strong>same</strong> output
      shape, which is not the case here.</p>


      <p>How should I switch between training set and validation set using
      <code>tf.Dataset</code> and <code>tf.data.Iterator</code> in my case
      then?</p>
pipeline_tag: text-classification
inference: true
base_model: sentence-transformers/all-MiniLM-L6-v2
model-index:
  - name: SetFit with sentence-transformers/all-MiniLM-L6-v2
    results:
      - task:
          type: text-classification
          name: Text Classification
        dataset:
          name: Unknown
          type: unknown
          split: test
        metrics:
          - type: accuracy
            value: 0.71
            name: Accuracy
          - type: precision
            value: 0.7100840336134453
            name: Precision
          - type: recall
            value: 0.71
            name: Recall
          - type: f1
            value: 0.70997099709971
            name: F1
---

# SetFit with sentence-transformers/all-MiniLM-L6-v2

This is a SetFit model that can be used for Text Classification. This SetFit model uses sentence-transformers/all-MiniLM-L6-v2 as the Sentence Transformer embedding model. A SetFitHead instance is used for classification.

The model has been trained using an efficient few-shot learning technique that involves two steps (a code sketch follows the list):

  1. Fine-tuning a Sentence Transformer with contrastive learning.
  2. Training a classification head with features from the fine-tuned Sentence Transformer.
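
For illustration, here is a minimal sketch of that two-step procedure using the SetFit `Trainer` API (SetFit 1.0.x). The toy dataset below is invented for the example; the actual training data is not distributed with this card.

```python
from datasets import Dataset
from setfit import SetFitModel, Trainer, TrainingArguments

# Hypothetical toy dataset standing in for the real training set
# (500 examples per label; see "Training Set Metrics" below).
train_dataset = Dataset.from_dict({
    "text": ["<p>How do I export a SavedModel from an Estimator?</p>",
             "<p>How can I generate augmented image batches in Keras?</p>"],
    "label": [0, 1],
})

# Same Sentence Transformer body as this model, with a differentiable
# SetFitHead for the two classes.
model = SetFitModel.from_pretrained(
    "sentence-transformers/all-MiniLM-L6-v2",
    use_differentiable_head=True,
    head_params={"out_features": 2},
)

args = TrainingArguments(
    batch_size=(16, 16),   # (embedding fine-tuning, head training)
    num_epochs=(2, 2),
)

trainer = Trainer(model=model, args=args, train_dataset=train_dataset)
trainer.train()  # step 1: contrastive fine-tuning; step 2: head training
```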

## Model Details

### Model Description

- **Model Type:** SetFit
- **Sentence Transformer body:** sentence-transformers/all-MiniLM-L6-v2
- **Classification head:** a SetFitHead instance
- **Maximum Sequence Length:** 256 tokens
- **Number of Classes:** 2

### Model Sources

- **Repository:** [SetFit on GitHub](https://github.com/huggingface/setfit)
- **Paper:** [Efficient Few-Shot Learning Without Prompts](https://arxiv.org/abs/2209.11055)

### Model Labels

#### Label 0

**Example 1:**

**The Problem**

I am converting my Tensorflow 1.14 estimator to TensorFlow 2.1. My current workflow involves training my tensorflow model on gcloud's ai-platform (training on gcloud) and using their model service to deploy my model for online predictions (model service).

The issue when upgrading to TensorFlow 2 is that they have done away with placeholders, which is affecting my serving_input_fn and how I export my estimator model. With tensorflow 2, if I export a model without the use of placeholders, my model's "predict" SignatureDef only has a single "examples" tensor, whereas previously it had many inputs named appropriately through my serving_input_fn.

The previous set up for my estimator was as follows:

```python
def serving_input_fn():

    inputs = {
        'feature1': tf.compat.v1.placeholder(shape=None, dtype=tf.string),
        'feature2': tf.compat.v1.placeholder(shape=None, dtype=tf.string),
        'feature3': tf.compat.v1.placeholder(shape=None, dtype=tf.string),
        ...
    }

    return tf.estimator.export.ServingInputReceiver(features=split_features, receiver_tensors=inputs)

exporter = tf.estimator.LatestExporter('exporter', serving_input_fn)

eval_spec = tf.estimator.EvalSpec(
    input_fn=lambda: input_eval_fn(args.test_dir),
    exporters=[exporter],
    start_delay_secs=10,
    throttle_secs=0)

...

tf.estimator.train_and_evaluate(estimator, train_spec, eval_spec)
```

And this has worked fine in the past; it has allowed me to have a multi-input "predict" SignatureDef where I can send a json of the inputs to ai-platform's model service and get predictions back. But since I am trying to not rely on the tf.compat.v1 library, I want to avoid using placeholders.

**What I've tried**

Following the documentation linked here, I've replaced my serving_input_fn with the tf.estimator.export.build_parsing_serving_input_receiver_fn method:

```python
feature_columns = ...  # list of feature columns
serving_input_fn = tf.estimator.export.build_parsing_serving_input_receiver_fn(
    tf.feature_column.make_parse_example_spec(feature_columns))
```

However, this gives me the following "predict" SignatureDef:

```
signature_def['predict']:
  The given SavedModel SignatureDef contains the following input(s):
    inputs['examples'] tensor_info:
        dtype: DT_STRING
        shape: (-1)
        name: input_example_tensor:0
```

whereas before my "predict" SignatureDef was as follows:

```
signature_def['predict']:
  The given SavedModel SignatureDef contains the following input(s):
    inputs['feature1'] tensor_info:
        dtype: DT_STRING
        shape: unknown_rank
        name: Placeholder:0
    inputs['feature2'] tensor_info:
        dtype: DT_STRING
        shape: unknown_rank
        name: Placeholder_1:0
    inputs['feature3'] tensor_info:
        dtype: DT_STRING
        shape: unknown_rank
        name: Placeholder_2:0
```

I've also tried using the tf.estimator.export.build_raw_serving_input_receiver_fn, but my understanding is that this method requires actual Tensors rather than a feature spec. Unless I use placeholders, I don't really understand where to grab these serving Tensors from.

So my main questions are:

- Is it possible to create a multi-input "predict" signature def from an estimator model without using placeholders in Tensorflow 2?
- If it is not possible, how am I supposed to provide the instances to gcloud's prediction service for the "examples" tensor in the "predict" signature def?

Thanks!

**Example 2:**

For the computation of Intersection over Union (IoU) I want to find the coordinates of minimum and maximum values (the border pixels) in a segmentation image image_pred that is represented by a float32 3D tensor. In particular, I aim at finding the top left and bottom right corner coordinates of objects in an image. The image is entirely comprised of black pixels (value 0.0) except where the object is located, where I have color pixels (0.0 < values < 1.0). Here's an example for such a bounding box (in my case, the object is the traffic sign and the environment is blacked out):

[image: a traffic sign with the surrounding environment blacked out]

My approach so far is to use tf.boolean_mask for setting every pixel to False except for the color pixels:

```python
zeros = tf.zeros_like(image_pred)
mask = tf.greater(image_pred, zeros)
boolean_mask_pred = tf.boolean_mask(image_pred, mask)
```

and then use tf.where to find the coordinates of the masked image. To determine the horizontal and vertical coordinate values of the top left and bottom right corners of the rectangle, I thought about using tf.reduce_max and tf.reduce_min, but since these do not return a single value if I provide an axis, I am unsure if this is the correct function to use. According to the docs, if I do not specify axis, the function will reduce all dimensions, which is not what I want either. Which is the correct function to do this? The IoU in the end is a single 1D float value.

```python
coordinates_pred = tf.where(boolean_mask_pred)
x21 = tf.reduce_min(coordinates_pred, axis=1)
y21 = tf.reduce_min(coordinates_pred, axis=0)
x22 = tf.reduce_max(coordinates_pred, axis=1)
y22 = tf.reduce_max(coordinates_pred, axis=0)
```

**Example 3:**

Computing the mean, total, etc. of each feature in a dataset seems quite trivial in Pandas and NumPy, but I couldn't find any similarly easy functions/operations for tf.data.Dataset. Actually I found tf.data.Dataset.reduce, which allows me to compute a running sum, but it's not that easy for other operations (min, max, std, etc.).

So, my question is: is there a simple way to compute statistics for a tf.data.Dataset? Moreover, is there a way to standardize/normalize an entire (i.e. not in batches) tf.data.Dataset, especially if not using tf.data.Dataset.reduce?

#### Label 1

**Example 1:**

The TensorFlow documentation has the following example that illustrates how to create a batch generator to feed a training set in batches to a model when the training set is too large to fit in memory:

```python
from skimage.io import imread
from skimage.transform import resize
import tensorflow as tf
import numpy as np
import math

# Here, x_set is list of path to the images
# and y_set are the associated classes.

class CIFAR10Sequence(tf.keras.utils.Sequence):

    def __init__(self, x_set, y_set, batch_size):
        self.x, self.y = x_set, y_set
        self.batch_size = batch_size

    def __len__(self):
        return math.ceil(len(self.x) / self.batch_size)

    def __getitem__(self, idx):
        batch_x = self.x[idx * self.batch_size:(idx + 1) *
        self.batch_size]
        batch_y = self.y[idx * self.batch_size:(idx + 1) *
        self.batch_size]

        return np.array([
            resize(imread(file_name), (200, 200))
               for file_name in batch_x]), np.array(batch_y)
```

My intention is to further increase the diversity of the training set by rotating each image 3x by 90º. In each epoch of the training process, the model would first be fed with the "0º training set" and next with the 90º, 180º and 270º rotated sets, respectively.

How can I modify the previous piece of code to perform this operation inside the CIFAR10Sequence() data generator?

Please don't use tf.keras.preprocessing.image.ImageDataGenerator() so that the answer does not lose its generality for other similar problems of a different nature.

NB: The idea would be to create the new data "in real time" as the model is fed, instead of creating (in advance) and storing on disk a new and augmented training set, bigger than the original one, to be used later (also in batches) during the training process of the model.

Thx in advance

**Example 2:**

I am trying to import a pretrained model from Huggingface's transformers library and extend it with a few layers for classification using tensorflow keras. When I directly use the transformers model (Method 1), the model trains well and reaches a validation accuracy of 0.93 after 1 epoch. However, when trying to use the model as a layer within a tf.keras model (Method 2), the model can't get above 0.32 accuracy. As far as I can tell based on the documentation, the two approaches should be equivalent. My goal is to get Method 2 working so that I can add more layers to it instead of directly using the logits produced by Huggingface's classifier head, but I'm stuck at this stage.

```python
import tensorflow as tf

from transformers import TFRobertaForSequenceClassification
```

Method 1:

```python
model = TFRobertaForSequenceClassification.from_pretrained("roberta-base", num_labels=6)
```

Method 2:

```python
input_ids = tf.keras.Input(shape=(128,), dtype='int32')

attention_mask = tf.keras.Input(shape=(128, ), dtype='int32')

transformer = TFRobertaForSequenceClassification.from_pretrained("roberta-base", num_labels=6)

encoded = transformer([input_ids, attention_mask])

logits = encoded[0]

model = tf.keras.models.Model(inputs = [input_ids, attention_mask], outputs = logits)
```

The rest of the code for either method is identical:

```python
model.compile(optimizer=tf.keras.optimizers.Adam(learning_rate=3e-5, epsilon=1e-08, clipnorm=1.0),
loss=tf.keras.losses.SparseCategoricalCrossentropy(from_logits=True),
              metrics=[tf.keras.metrics.SparseCategoricalAccuracy('accuracy')])
```

I am using Tensorflow 2.3.0 and have tried with transformers versions 3.5.0 and 4.0.0.

**Example 3:**

In the official tf.custom_gradient documentation it shows how to define custom gradients for log(1 + exp(x)):

```python
@tf.custom_gradient
def log1pexp(x):
  e = tf.exp(x)
  def grad(dy):
    return dy * (1 - 1 / (1 + e))
  return tf.math.log(1 + e), grad
```

When y = log(1 + exp(x)), analytically the derivative comes out to be dy/dx = (1 - 1 / (1 + exp(x))).

However, in the code, def grad says it is dy * (1 - 1 / (1 + exp(x))).
dy/dx = dy * (1 - 1 / (1 + exp(x))) is not a valid equation, while dx = dy * (1 - 1 / (1 + exp(x))) is wrong as it should be the reciprocal.

What does the grad function equate to?

## Evaluation

### Metrics

| Label | Accuracy | Precision | Recall | F1     |
|:------|:---------|:----------|:-------|:-------|
| all   | 0.71     | 0.7101    | 0.71   | 0.7100 |
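
Assuming access to a labeled held-out split, figures like these can be recomputed with scikit-learn. The test texts and labels below are placeholders (the evaluation dataset is listed as "unknown"), and the averaging mode is an assumption, since the card does not state it.

```python
from setfit import SetFitModel
from sklearn.metrics import accuracy_score, f1_score, precision_score, recall_score

model = SetFitModel.from_pretrained("sharukat/sbert-questionclassifier")

# Placeholder test split; the actual evaluation data is not published here.
test_texts = ["<p>Example TensorFlow question body.</p>"]
test_labels = [0]

y_pred = model.predict(test_texts)
print("accuracy: ", accuracy_score(test_labels, y_pred))
print("precision:", precision_score(test_labels, y_pred, average="weighted"))
print("recall:   ", recall_score(test_labels, y_pred, average="weighted"))
print("f1:       ", f1_score(test_labels, y_pred, average="weighted"))
```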

## Uses

### Direct Use for Inference

First install the SetFit library:

```bash
pip install setfit
```

Then you can load this model and run inference:

```python
from setfit import SetFitModel

# Download from the 🤗 Hub
model = SetFitModel.from_pretrained("sharukat/sbert-questionclassifier")
# Run inference
preds = model("<p>Where one can find the github source code for <code>tf.quantization.fake_quant_with_min_max_args</code>. Checking the <a href=\"https://www.tensorflow.org/api_docs/python/tf/quantization/fake_quant_with_min_max_args\" rel=\"nofollow noreferrer\">TF API documentation</a>, there is no link to the github source file, and I could not find one on github.</p>")
```

## Training Details

### Training Set Metrics

| Training set | Min | Median  | Max  |
|:-------------|:----|:--------|:-----|
| Word count   | 15  | 336.203 | 3755 |

| Label | Training Sample Count |
|:------|:----------------------|
| 0     | 500                   |
| 1     | 500                   |
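
The word-count statistics above can be reproduced with plain Python; a sketch with a stand-in corpus (the real training texts are not distributed here):

```python
import statistics

# Stand-in for the actual training texts.
train_texts = [
    "<p>Short question body.</p>",
    "<p>A somewhat longer question body with more words in it.</p>",
]

word_counts = [len(text.split()) for text in train_texts]
print(min(word_counts), statistics.median(word_counts), max(word_counts))
```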

### Training Hyperparameters

- batch_size: (16, 16)
- num_epochs: (2, 2)
- max_steps: -1
- sampling_strategy: oversampling
- num_iterations: 20
- body_learning_rate: (2e-05, 1e-05)
- head_learning_rate: 0.01
- loss: CosineSimilarityLoss
- distance_metric: cosine_distance
- margin: 0.25
- end_to_end: False
- use_amp: False
- warmup_proportion: 0.1
- max_length: 256
- seed: 42
- eval_max_steps: -1
- load_best_model_at_end: False
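
These values map onto SetFit's `TrainingArguments`; below is a sketch of the configuration as it would appear in code (not the original training script; `distance_metric` is left at its cosine-distance default):

```python
from sentence_transformers.losses import CosineSimilarityLoss
from setfit import TrainingArguments

args = TrainingArguments(
    batch_size=(16, 16),
    num_epochs=(2, 2),
    max_steps=-1,
    sampling_strategy="oversampling",
    num_iterations=20,
    body_learning_rate=(2e-05, 1e-05),
    head_learning_rate=0.01,
    loss=CosineSimilarityLoss,
    margin=0.25,
    end_to_end=False,
    use_amp=False,
    warmup_proportion=0.1,
    max_length=256,
    seed=42,
    eval_max_steps=-1,
    load_best_model_at_end=False,
)
```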

### Training Results

| Epoch  | Step | Training Loss | Validation Loss |
|:-------|:-----|:--------------|:----------------|
| 0.0004 | 1    | 0.26          | -               |
| 0.02   | 50   | 0.2486        | -               |
| 0.04   | 100  | 0.2383        | -               |
| 0.06   | 150  | 0.309         | -               |
| 0.08   | 200  | 0.2551        | -               |
| 0.1    | 250  | 0.2675        | -               |
| 0.12   | 300  | 0.2344        | -               |
| 0.14   | 350  | 0.2686        | -               |
| 0.16   | 400  | 0.2447        | -               |
| 0.18   | 450  | 0.2317        | -               |
| 0.2    | 500  | 0.2233        | -               |
| 0.22   | 550  | 0.1999        | -               |
| 0.24   | 600  | 0.2443        | -               |
| 0.26   | 650  | 0.1667        | -               |
| 0.28   | 700  | 0.2975        | -               |
| 0.3    | 750  | 0.0902        | -               |
| 0.32   | 800  | 0.1965        | -               |
| 0.34   | 850  | 0.1571        | -               |
| 0.36   | 900  | 0.1247        | -               |
| 0.38   | 950  | 0.0494        | -               |
| 0.4    | 1000 | 0.1222        | -               |
| 0.42   | 1050 | 0.0828        | -               |
| 0.44   | 1100 | 0.0393        | -               |
| 0.46   | 1150 | 0.0104        | -               |
| 0.48   | 1200 | 0.0143        | -               |
| 0.5    | 1250 | 0.0505        | -               |
| 0.52   | 1300 | 0.0053        | -               |
| 0.54   | 1350 | 0.0337        | -               |
| 0.56   | 1400 | 0.0013        | -               |
| 0.58   | 1450 | 0.0061        | -               |
| 0.6    | 1500 | 0.0519        | -               |
| 0.62   | 1550 | 0.0068        | -               |
| 0.64   | 1600 | 0.001         | -               |
| 0.66   | 1650 | 0.0004        | -               |
| 0.68   | 1700 | 0.0008        | -               |
| 0.7    | 1750 | 0.0018        | -               |
| 0.72   | 1800 | 0.0018        | -               |
| 0.74   | 1850 | 0.0022        | -               |
| 0.76   | 1900 | 0.0005        | -               |
| 0.78   | 1950 | 0.0008        | -               |
| 0.8    | 2000 | 0.0005        | -               |
| 0.82   | 2050 | 0.0003        | -               |
| 0.84   | 2100 | 0.0004        | -               |
| 0.86   | 2150 | 0.0002        | -               |
| 0.88   | 2200 | 0.0003        | -               |
| 0.9    | 2250 | 0.0001        | -               |
| 0.92   | 2300 | 0.0001        | -               |
| 0.94   | 2350 | 0.0002        | -               |
| 0.96   | 2400 | 0.0005        | -               |
| 0.98   | 2450 | 0.0002        | -               |
| 1.0    | 2500 | 0.0002        | -               |
| 1.02   | 2550 | 0.0001        | -               |
| 1.04   | 2600 | 0.0001        | -               |
| 1.06   | 2650 | 0.0003        | -               |
| 1.08   | 2700 | 0.0002        | -               |
| 1.1    | 2750 | 0.0002        | -               |
| 1.12   | 2800 | 0.0001        | -               |
| 1.1400 | 2850 | 0.0001        | -               |
| 1.16   | 2900 | 0.0002        | -               |
| 1.18   | 2950 | 0.0594        | -               |
| 1.2    | 3000 | 0.0002        | -               |
| 1.22   | 3050 | 0.0002        | -               |
| 1.24   | 3100 | 0.0001        | -               |
| 1.26   | 3150 | 0.0262        | -               |
| 1.28   | 3200 | 0.0001        | -               |
| 1.3    | 3250 | 0.0001        | -               |
| 1.32   | 3300 | 0.0001        | -               |
| 1.34   | 3350 | 0.0001        | -               |
| 1.3600 | 3400 | 0.0001        | -               |
| 1.38   | 3450 | 0.0002        | -               |
| 1.4    | 3500 | 0.0           | -               |
| 1.42   | 3550 | 0.0001        | -               |
| 1.44   | 3600 | 0.0001        | -               |
| 1.46   | 3650 | 0.0001        | -               |
| 1.48   | 3700 | 0.0001        | -               |
| 1.5    | 3750 | 0.0001        | -               |
| 1.52   | 3800 | 0.0001        | -               |
| 1.54   | 3850 | 0.0001        | -               |
| 1.56   | 3900 | 0.0001        | -               |
| 1.58   | 3950 | 0.0001        | -               |
| 1.6    | 4000 | 0.0001        | -               |
| 1.62   | 4050 | 0.0002        | -               |
| 1.6400 | 4100 | 0.0044        | -               |
| 1.6600 | 4150 | 0.0001        | -               |
| 1.6800 | 4200 | 0.0002        | -               |
| 1.7    | 4250 | 0.0001        | -               |
| 1.72   | 4300 | 0.0001        | -               |
| 1.74   | 4350 | 0.0001        | -               |
| 1.76   | 4400 | 0.0001        | -               |
| 1.78   | 4450 | 0.0           | -               |
| 1.8    | 4500 | 0.0001        | -               |
| 1.8200 | 4550 | 0.0001        | -               |
| 1.8400 | 4600 | 0.0           | -               |
| 1.8600 | 4650 | 0.061         | -               |
| 1.88   | 4700 | 0.0002        | -               |
| 1.9    | 4750 | 0.0001        | -               |
| 1.92   | 4800 | 0.0001        | -               |
| 1.94   | 4850 | 0.0001        | -               |
| 1.96   | 4900 | 0.0001        | -               |
| 1.98   | 4950 | 0.0001        | -               |
| 2.0    | 5000 | 0.0001        | -               |

### Framework Versions

- Python: 3.10.12
- SetFit: 1.0.3
- Sentence Transformers: 2.4.0
- Transformers: 4.37.2
- PyTorch: 2.1.0+cu121
- Datasets: 2.17.1
- Tokenizers: 0.15.2

## Citation

### BibTeX

```bibtex
@article{https://doi.org/10.48550/arxiv.2209.11055,
    doi = {10.48550/ARXIV.2209.11055},
    url = {https://arxiv.org/abs/2209.11055},
    author = {Tunstall, Lewis and Reimers, Nils and Jo, Unso Eun Seo and Bates, Luke and Korat, Daniel and Wasserblat, Moshe and Pereg, Oren},
    keywords = {Computation and Language (cs.CL), FOS: Computer and information sciences, FOS: Computer and information sciences},
    title = {Efficient Few-Shot Learning Without Prompts},
    publisher = {arXiv},
    year = {2022},
    copyright = {Creative Commons Attribution 4.0 International}
}
```