sharukat committed
Commit f7c265f · verified · 1 Parent(s): f46a59d

Add SetFit model
1_Pooling/config.json ADDED
@@ -0,0 +1,10 @@
+ {
+   "word_embedding_dimension": 384,
+   "pooling_mode_cls_token": false,
+   "pooling_mode_mean_tokens": true,
+   "pooling_mode_max_tokens": false,
+   "pooling_mode_mean_sqrt_len_tokens": false,
+   "pooling_mode_weightedmean_tokens": false,
+   "pooling_mode_lasttoken": false,
+   "include_prompt": true
+ }
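
This config enables mean-token pooling over the 384-dimensional MiniLM token embeddings (all other modes are off). A minimal sketch of what that computation looks like; the tensor names and shapes are illustrative, not part of this repo:

```python
import torch

def mean_pooling(token_embeddings: torch.Tensor, attention_mask: torch.Tensor) -> torch.Tensor:
    """Average token embeddings over the sequence, ignoring padding positions.

    token_embeddings: (batch, seq_len, 384); attention_mask: (batch, seq_len).
    """
    mask = attention_mask.unsqueeze(-1).expand(token_embeddings.size()).float()
    summed = (token_embeddings * mask).sum(dim=1)
    counts = mask.sum(dim=1).clamp(min=1e-9)  # avoid division by zero on empty rows
    return summed / counts
```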
README.md ADDED
@@ -0,0 +1,481 @@
+ ---
+ library_name: setfit
+ tags:
+ - setfit
+ - sentence-transformers
+ - text-classification
+ - generated_from_setfit_trainer
+ metrics:
+ - accuracy
+ - precision
+ - recall
+ - f1
+ widget:
+ - text: "<p>I'm having a problem serving my text classification model on <code>Tensorflow\
+     \ 1.12</code>. I'm using <code>tf.estimator.inputs.pandas_input_fn</code> to read\
+     \ in my data, and <code>tf.estimator.DNNClassifier</code> to train/evaluate. I'd\
+     \ then like to serve my model.\n(Apologies in advance, it's tough to provide a\
+     \ full working example here, but it's very much like the example TF provides at\
+     \ <a href=\"https://www.tensorflow.org/api_docs/python/tf/estimator/DNNClassifier\"\
+     \ rel=\"nofollow noreferrer\">https://www.tensorflow.org/api_docs/python/tf/estimator/DNNClassifier</a>\
+     \ )</p>\n\n<p>I'm currently saving my model with ...</p>\n\n<pre class=\"lang-py\
+     \ prettyprint-override\"><code>...\nestimator.export_savedmodel(\"./TEST_SERVING/\"\
+     , self.serving_input_receiver_fn, strip_default_attrs=True)\n...\ndef serving_input_receiver_fn(self):\n\
+     \    \"\"\"An input receiver that expects a serialized tf.Example.\"\"\"\n\n\
+     \    # feature spec dictionary determines our input parameters for the model\n\
+     \    feature_spec = {\n        'Headline': tf.VarLenFeature(dtype=tf.string),\n\
+     \        'Description': tf.VarLenFeature(dtype=tf.string)\n    }\n\n    \
+     \ # the inputs will be initially fed as strings with data serialized by\n    \
+     \ # Google ProtoBuffers\n    serialized_tf_example = tf.placeholder(\n        \
+     \ dtype=tf.string, shape=None, name='input_example_tensor')\n    receiver_tensors\
+     \ = {'examples': serialized_tf_example}\n\n    # deserialize input\n    features\
+     \ = tf.parse_example(serialized_tf_example, feature_spec)\n    return tf.estimator.export.ServingInputReceiver(features,\
+     \ receiver_tensors)\n\n\n</code></pre>\n\n<p>This actually fails to run with the\
+     \ error:</p>\n\n<pre class=\"lang-sh prettyprint-override\"><code>TypeError: Failed\
+     \ to convert object of type &lt;class 'tensorflow.python.framework.sparse_tensor.SparseTensor'&gt;\
+     \ to Tensor. Contents: SparseTensor(indices=Tensor(\"ParseExample/ParseExample:0\"\
+     , shape=(?, 2), \ndtype=int64), values=Tensor(\"ParseExample/ParseExample:2\"\
+     , shape=(?,), dtype=string), dense_shape=Tensor(\"ParseExample/ParseExample:4\"\
+     , shape=(2,), dtype=int64)). Consider casting elements to a supported type.\n\n\
+     </code></pre>\n\n<p>I tried to save a second way doing:</p>\n\n<pre class=\"lang-py\
+     \ prettyprint-override\"><code>def serving_input_receiver_fn(self):\n    \"\"\"\
+     Build the serving inputs.\"\"\"\n    INPUT_COLUMNS = [\"Headline\",\"Description\"\
+     ]\n    inputs = {}\n    for feat in INPUT_COLUMNS:\n        inputs[feat] = tf.placeholder(shape=[None],\
+     \ dtype=tf.string, name=feat)\n    return tf.estimator.export.ServingInputReceiver(inputs,\
+     \ inputs)\n</code></pre>\n\n<p>This actually works, until I try testing it with\
+     \ the <code>saved_model_cli</code>.\nSome output for <code>saved_model_cli show\
+     \ --all --dir TEST_SERVING/1553879255/</code>:</p>\n\n<pre class=\"lang-sh prettyprint-override\"\
+     ><code>MetaGraphDef with tag-set: 'serve' contains the following SignatureDefs:\n\
+     \nsignature_def['predict']:\n  The given SavedModel SignatureDef contains the\
+     \ following input(s):\n    inputs['Description'] tensor_info:\n        dtype:\
+     \ DT_STRING\n        shape: (-1)\n        name: Description:0\n    inputs['Headline']\
+     \ tensor_info:\n        dtype: DT_STRING\n        shape: (-1)\n        name: Headline:0\n\
+     \  The given SavedModel SignatureDef contains the following output(s):\n    outputs['class_ids']\
+     \ tensor_info:\n        dtype: DT_INT64\n        shape: (-1, 1)\n        name:\
+     \ dnn/head/predictions/ExpandDims:0\n    outputs['classes'] tensor_info:\n    \
+     \    dtype: DT_STRING\n        shape: (-1, 1)\n        name: dnn/head/predictions/str_classes:0\n\
+     \    outputs['logits'] tensor_info:\n        dtype: DT_FLOAT\n        shape: (-1,\
+     \ 3)\n        name: dnn/logits/BiasAdd:0\n    outputs['probabilities'] tensor_info:\n\
+     \        dtype: DT_FLOAT\n        shape: (-1, 3)\n        name: dnn/head/predictions/probabilities:0\n\
+     \  Method name is: tensorflow/serving/predict\n\n</code></pre>\n\n<p>But now I\
+     \ can't seem to test it.</p>\n\n<pre class=\"lang-sh prettyprint-override\"><code>&gt;&gt;&gt;\
+     \ saved_model_cli run --dir TEST_SERVING/1553879255/ --tag_set serve --signature_def\
+     \ predict --input_examples 'inputs=[{\"Description\":[\"What is going on\"],\"\
+     Headline\":[\"Help me\"]}]'\nTraceback (most recent call last):\n  ...\n  File\
+     \ \"/Users/Josh/miniconda3/envs/python36/lib/python3.6/site-packages/tensorflow/python/tools/saved_model_cli.py\"\
+     , line 489, in _create_example_string\n    feature_list)\nTypeError: 'What is\
+     \ going on' has type str, but expected one of: bytes\n\n</code></pre>\n\n<p>Ok,\
+     \ lets turn it into a bytes object by changing to <code>b[\"What is going on\"\
+     ]</code> and <code>b[\"Help me\"]</code>...</p>\n\n<pre class=\"lang-sh prettyprint-override\"\
+     ><code>ValueError: Type &lt;class 'bytes'&gt; for value b'What is going on' is\
+     \ not supported for tf.train.Feature.\n</code></pre>\n\n<p>Any ideas/thoughts??\n\
+     Thanks!</p>\n"
+ - text: '<p>In tensorflow <code>tf.keras.Model.compile</code>, you can pass a <code>lambda
+     y_true, y_pred: val</code> function as a metric (though, it seems not documented),
+     but I asked my self : "How does it aggregate it over the batches" ?</p>
+
+
+     <p>I searched the documentation, but I''ve found nowhere how it is done ?</p>
+
+
+     <p>By the way, I don''t even know if it is an undefined behavior to do so and
+     one should instead subclass the Metric class ? ( or at least provide the required
+     methods).</p>
+
+
+     <p>Also, is it pertinent to pass a loss as a metric (and in this case, same question
+     : how is it aggregated over the batches ? )</p>
+
+     '
+ - text: "<p>I'm working on a project where I have trained a series of binary classifiers\
+     \ with <strong>Keras</strong>, with <strong>Tensorflow</strong> as the backend\
+     \ engine. The input data I have is a series of images, where each binary classifier\
+     \ must make the prediction on the images, later I save the predictions on a CSV\
+     \ file.</p>\n<p>The problem I have is when I get the predictions from the first\
+     \ series of binary classifiers there isn't any warning, but when the 5th or 6th\
+     \ binary classifier calls the method <strong>predict</strong> on the input data\
+     \ I get the following warning:</p>\n<blockquote>\n<p>WARNING:tensorflow:5 out\
+     \ of the last 5 calls to &lt;function\nModel.make_predict_function..predict_function\
+     \ at\n0x2b280ff5c158&gt; triggered tf.function retracing. Tracing is expensive\n\
+     and the excessive number of tracings could be due to (1) creating\n@tf.function\
+     \ repeatedly in a loop, (2) passing tensors with different\nshapes, (3) passing\
+     \ Python objects instead of tensors. For (1), please\ndefine your @tf.function\
+     \ outside of the loop. For (2), @tf.function\nhas experimental_relax_shapes=True\
+     \ option that relaxes argument shapes\nthat can avoid unnecessary retracing. For\
+     \ (3), please refer to\n<a href=\"https://www.tensorflow.org/tutorials/customization/performance#python_or_tensor_args\"\
+     \ rel=\"noreferrer\">https://www.tensorflow.org/tutorials/customization/performance#python_or_tensor_args</a>\n\
+     and <a href=\"https://www.tensorflow.org/api_docs/python/tf/function\" rel=\"\
+     noreferrer\">https://www.tensorflow.org/api_docs/python/tf/function</a> for more\n\
+     details.</p>\n</blockquote>\n<p>To answer each point in the parenthesis, here\
+     \ are my answers:</p>\n<ol>\n<li>The <strong>predict</strong> method is called\
+     \ inside a for loop.</li>\n<li>I don't pass tensors but a list of <strong>NumPy\
+     \ arrays</strong> of gray scale images, all of them with the same size in width\
+     \ and height. The only thing that can change is the batch size because the list\
+     \ can have only 1 image or more than one.</li>\n<li>As I wrote in point 2, I pass\
+     \ a list of NumPy arrays.</li>\n</ol>\n<p>I have debugged my program and found\
+     \ that this warning always happens when the method predict is called. To summarize\
+     \ the code I have written is the following:</p>\n<pre><code>import cv2 as cv\n\
+     import tensorflow as tf\nfrom tensorflow.keras.models import load_model\n# Load\
+     \ the models\nbinary_classifiers = [load_model(path) for path in path2models]\n\
+     # Get the images\nimages = [#Load the images with OpenCV]\n# Apply the resizing\
+     \ and reshapes on the images.\nmy_list = list()\nfor image in images:\n    image_reworked\
+     \ = # Apply the resizing and reshaping on images\n    my_list.append(image_reworked)\n\
+     \n# Get the prediction from each model\n# This is where I get the warning\npredictions\
+     \ = [model.predict(x=my_list,verbose=0) for model in binary_classifiers]\n</code></pre>\n\
+     <h3>What I have tried</h3>\n<p>I have defined a function as tf.function and putted\
+     \ the code of the predictions inside the tf.function like this</p>\n<pre><code>@tf.function\n\
+     def testing(models, faces):\n    return [model.predict(x=faces,verbose=0) for\
+     \ model in models]\n</code></pre>\n<p>But I ended up getting the following error:</p>\n\
+     <blockquote>\n<p>RuntimeError: Detected a call to <code>Model.predict</code> inside\
+     \ a\n<code>tf.function</code>. Model.predict is a high-level endpoint that manages\n\
+     its own <code>tf.function</code>. Please move the call to <code>Model.predict</code>\
+     \ outside\nof all enclosing <code>tf.function</code>s. Note that you can call\
+     \ a <code>Model</code>\ndirectly on Tensors inside a <code>tf.function</code>\
+     \ like: <code>model(x)</code>.</p>\n</blockquote>\n<p>So calling the method <code>predict</code>\
+     \ is basically already a tf.function. So it's useless to define a tf.function\
+     \ when the warning I get it's from that method.</p>\n<p>I have also checked those\
+     \ other two questions:</p>\n<ol>\n<li><a href=\"https://stackoverflow.com/questions/61647404/tensorflow-2-getting-warningtensorflow9-out-of-the-last-9-calls-to-function\"\
+     >Tensorflow 2: Getting &quot;WARNING:tensorflow:9 out of the last 9 calls to \
+     \ triggered tf.function retracing. Tracing is expensive&quot;</a></li>\n<li><a\
+     \ href=\"https://stackoverflow.com/questions/65563185/loading-multiple-saved-tensorflow-keras-models-for-prediction\"\
+     >Loading multiple saved tensorflow/keras models for prediction</a></li>\n</ol>\n\
+     <p>But neither of the two questions answers my question about how to avoid this\
+     \ warning. Plus I have also checked the links in the warning message but I couldn't\
+     \ solve my problem.</p>\n<h3>What I want</h3>\n<p>I simply want to avoid this\
+     \ warning. While I'm still getting the predictions from the models I noticed that\
+     \ the python program takes way too much time on doing predictions for a list of\
+     \ images.</p>\n<h3>What I'm using</h3>\n<ul>\n<li>Python 3.6.13</li>\n<li>Tensorflow\
+     \ 2.3.0</li>\n</ul>\n<h3>Solution</h3>\n<p>After some tries to suppress the warning\
+     \ from the <code>predict</code> method, I have checked the documentation of Tensorflow\
+     \ and in one of the first tutorials on how to use Tensorflow it is explained that,\
+     \ by default, Tensorflow is executed in eager mode, which is useful for testing\
+     \ and debugging the network models. Since I have already tested my models many\
+     \ times, it was only required to disable the eager mode by writing this single\
+     \ python line of code:</p>\n<p><code>tf.compat.v1.disable_eager_execution()</code></p>\n\
+     <p>Now the warning doesn't show up anymore.</p>\n"
+ - text: '<p>Where one can find the github source code for <code>tf.quantization.fake_quant_with_min_max_args</code>.
+     Checking the <a href="https://www.tensorflow.org/api_docs/python/tf/quantization/fake_quant_with_min_max_args"
+     rel="nofollow noreferrer">TF API documentation</a>, there is no link to the github
+     source file, and I could not find one on github.</p>
+
+     '
+ - text: "<p>I'm trying to use <code>tf.Dataset</code> for a 3D image CNN where the\
+     \ shape of the 3D image fed into it from the training set and the validation set\
+     \ are different (training: (64, 64, 64), validation: (176, 176, 160)). I didn't\
+     \ even know this was possible, but I'm recreating this network based on a paper,\
+     \ and using the classic <code>feed_dict</code> method the network indeed works.\
+     \ For performance reasons (and just to learn) I'm trying to switch the network\
+     \ to use <code>tf.Dataset</code> instead.</p>\n\n<p>I have two datasets and iterators\
+     \ built like the following:</p>\n\n<pre class=\"lang-py prettyprint-override\"\
+     ><code>def _data_parser(dataset, shape):\n        features = {\"input\": tf.FixedLenFeature((),\
+     \ tf.string),\n                    \"label\": tf.FixedLenFeature((), tf.string)}\n\
+     \        parsed_features = tf.parse_single_example(dataset, features)\n\n    \
+     \    image = tf.decode_raw(parsed_features[\"input\"], tf.float32)\n        image\
+     \ = tf.reshape(image, shape + (1,))\n\n        label = tf.decode_raw(parsed_features[\"\
+     label\"], tf.float32)\n        label = tf.reshape(label, shape + (1,))\n      \
+     \  return image, label\n\ntrain_datasets = [\"train.tfrecord\"]\ntrain_dataset\
+     \ = tf.data.TFRecordDataset(train_datasets)\ntrain_dataset = train_dataset.map(lambda\
+     \ x: _data_parser(x, (64, 64, 64)))\ntrain_dataset = train_dataset.batch(batch_size)\
+     \ # batch_size = 16\ntrain_iterator = train_dataset.make_initializable_iterator()\n\
+     \nval_datasets = [\"validation.tfrecord\"]\nval_dataset = tf.data.TFRecordDataset(val_datasets)\n\
+     val_dataset = val_dataset.map(lambda x: _data_parser(x, (176, 176, 160)))\nval_dataset\
+     \ = val_dataset.batch(1)\nval_iterator = val_dataset.make_initializable_iterator()\n\
+     </code></pre>\n\n<p><a href=\"https://www.tensorflow.org/guide/datasets#creating_an_iterator\"\
+     \ rel=\"nofollow noreferrer\">TensorFlow documentation</a> has examples regarding\
+     \ switching between datasets using <code>reinitializable_iterator</code> or <code>feedable_iterator</code>,\
+     \ but they all switch between iterators of <strong>same</strong> output shape,\
+     \ which is not the case here.</p>\n\n<p>How should I switch between training set\
+     \ and validation set using <code>tf.Dataset</code> and <code>tf.data.Iterator</code>\
+     \ in my case then?</p>\n"
+ pipeline_tag: text-classification
+ inference: true
+ base_model: sentence-transformers/all-MiniLM-L6-v2
+ model-index:
+ - name: SetFit with sentence-transformers/all-MiniLM-L6-v2
+   results:
+   - task:
+       type: text-classification
+       name: Text Classification
+     dataset:
+       name: Unknown
+       type: unknown
+       split: test
+     metrics:
+     - type: accuracy
+       value: 0.71
+       name: Accuracy
+     - type: precision
+       value: 0.7100840336134453
+       name: Precision
+     - type: recall
+       value: 0.71
+       name: Recall
+     - type: f1
+       value: 0.70997099709971
+       name: F1
+ ---
+
+ # SetFit with sentence-transformers/all-MiniLM-L6-v2
+
+ This is a [SetFit](https://github.com/huggingface/setfit) model that can be used for Text Classification. This SetFit model uses [sentence-transformers/all-MiniLM-L6-v2](https://huggingface.co/sentence-transformers/all-MiniLM-L6-v2) as the Sentence Transformer embedding model. A [SetFitHead](https://huggingface.co/docs/setfit/reference/main#setfit.SetFitHead) instance is used for classification.
+
+ The model has been trained using an efficient few-shot learning technique that involves:
+
+ 1. Fine-tuning a [Sentence Transformer](https://www.sbert.net) with contrastive learning.
+ 2. Training a classification head with features from the fine-tuned Sentence Transformer.
+
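
For intuition, here is a rough sketch of those two phases. This is illustrative only: the real training loop lives in the `setfit` library, and this model's head is a `SetFitHead` rather than the logistic-regression stand-in below; the example texts are made up.

```python
from sentence_transformers import InputExample, SentenceTransformer, losses
from sklearn.linear_model import LogisticRegression
from torch.utils.data import DataLoader

body = SentenceTransformer("sentence-transformers/all-MiniLM-L6-v2")

# Phase 1: contrastive fine-tuning. Pairs drawn from the few-shot set are
# labeled 1.0 when both texts share a class and 0.0 otherwise.
pairs = [
    InputExample(texts=["api question A", "api question B"], label=1.0),
    InputExample(texts=["api question A", "off-topic question"], label=0.0),
]
loader = DataLoader(pairs, shuffle=True, batch_size=16)
body.fit(train_objectives=[(loader, losses.CosineSimilarityLoss(body))], epochs=1)

# Phase 2: train a classification head on embeddings from the tuned body
# (logistic regression stands in for the differentiable SetFitHead here).
X = body.encode(["api question A", "off-topic question"])
head = LogisticRegression().fit(X, [1, 0])
```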
+ ## Model Details
+
+ ### Model Description
+ - **Model Type:** SetFit
+ - **Sentence Transformer body:** [sentence-transformers/all-MiniLM-L6-v2](https://huggingface.co/sentence-transformers/all-MiniLM-L6-v2)
+ - **Classification head:** a [SetFitHead](https://huggingface.co/docs/setfit/reference/main#setfit.SetFitHead) instance
+ - **Maximum Sequence Length:** 256 tokens
+ - **Number of Classes:** 2 classes
+ <!-- - **Training Dataset:** [Unknown](https://huggingface.co/datasets/unknown) -->
+ <!-- - **Language:** Unknown -->
+ <!-- - **License:** Unknown -->
+
+ ### Model Sources
+
+ - **Repository:** [SetFit on GitHub](https://github.com/huggingface/setfit)
+ - **Paper:** [Efficient Few-Shot Learning Without Prompts](https://arxiv.org/abs/2209.11055)
+ - **Blogpost:** [SetFit: Efficient Few-Shot Learning Without Prompts](https://huggingface.co/blog/setfit)
+
+ ### Model Labels
+ | Label | Examples |
+ |:------|:---------|
+ | 0 | <ul><li>'<p><strong>The Problem</strong></p>\n\n<p>I am converting my Tensorflow 1.14 estimator to TensorFlow 2.1. My current workflow involves training my tensorflow model on gcloud\'s ai-platform <a href="https://cloud.google.com/ai-platform/training/docs/training-jobs" rel="nofollow noreferrer">(training on gcloud)</a> and using their model service to deploy my model for online predictions <a href="https://cloud.google.com/ai-platform/prediction/docs/deploying-models" rel="nofollow noreferrer">(model service)</a>. </p>\n\n<p>The issue when upgrading to TensorFlow 2 is that they have done away with placeholders, which is affecting my <code>serving_input_fn</code> and how I export my estimator model. With tensorflow 2, if I export a model without the use of placeholders, my model\'s "predict" <code>SignatureDef</code> only has a single "examples" tensor whereas previously it had many inputs named appropriately through my <code>serving_input_fn</code>. </p>\n\n<p>The previous set up for my estimator was as follows: </p>\n\n<pre><code>def serving_input_fn():\n\n inputs = {\n \'feature1\': tf.compat.v1.placeholder(shape=None, dtype=tf.string),\n \'feature2\': tf.compat.v1.placeholder(shape=None, dtype=tf.string),\n \'feature3\': tf.compat.v1.placeholder(shape=None, dtype=tf.string),\n ...\n }\n\n return tf.estimator.export.ServingInputReceiver(features=split_features, receiver_tensors=inputs)\n\nexporter = tf.estimator.LatestExporter(\'exporter\', serving_input_fn)\n\neval_spec = tf.estimator.EvalSpec(\n input_fn=lambda: input_eval_fn(args.test_dir),\n exporters=[exporter],\n start_delay_secs=10,\n throttle_secs=0)\n\n...\n\ntf.estimator.train_and_evaluate(estimator, train_spec, eval_spec)\n</code></pre>\n\n<p>And this has worked fine in the past, it has allowed me to have a multi-input "predict" SignatureDef where I can send a json of the inputs to ai-platforms model service and get predictions back. But since I am trying to not rely on the <code>tf.compat.v1</code> library, I want to avoid using placeholders. </p>\n\n<p><strong>What I\'ve tried</strong></p>\n\n<p>Following the documentation linked <a href="https://www.tensorflow.org/guide/saved_model#savedmodels_from_estimators" rel="nofollow noreferrer">here</a> I\'ve replaced my serving_input_fn with the <code>tf.estimator.export.build_parsing_serving_input_receiver_fn</code> method: </p>\n\n<pre><code>feature_columns = ... # list of feature columns \nserving_input_fn = tf.estimator.export.build_parsing_serving_input_receiver_fn(\n tf.feature_column.make_parse_example_spec(feature_columns))\n</code></pre>\n\n<p>However, this gives me the following "predict" SignatureDef:</p>\n\n<pre><code>signature_def[\'predict\']:\n The given SavedModel SignatureDef contains the following input(s):\n inputs[\'examples\'] tensor_info:\n dtype: DT_STRING\n shape: (-1)\n name: input_example_tensor:0\n</code></pre>\n\n<p>whereas before my "predict" SignatureDef was as follows:</p>\n\n<pre><code>signature_def[\'predict\']:\n The given SavedModel SignatureDef contains the following input(s):\n inputs[\'feature1\'] tensor_info:\n dtype: DT_STRING\n shape: unknown_rank\n name: Placeholder:0\n inputs[\'feature2\'] tensor_info:\n dtype: DT_STRING\n shape: unknown_rank\n name: Placeholder_1:0\n inputs[\'feature3\'] tensor_info:\n dtype: DT_STRING\n shape: unknown_rank\n name: Placeholder_2:0\n</code></pre>\n\n<p>I\'ve also tried using the <code>tf.estimator.export.build_raw_serving_input_receiver_fn</code>, but my understanding is that this method requires actual Tensors in order to be used instead of a feature spec. Unless I use placeholders, I don\'t really understand where to grab these serving Tensors from.</p>\n\n<p><strong>So my main questions are:</strong></p>\n\n<ul>\n<li>Is it possible to create a multi-input "predict" signature def from an estimator model without using placeholders in Tensorflow 2? </li>\n<li>If it is not possible, how am I supposed to provide the instances to gcloud predictions service for the "examples" tensor in the "predict" signature def? </li>\n</ul>\n\n<p>Thanks! </p>\n'</li><li>'<p>For the computation of Intersection over Union (IoU) I want to find coordinates of minimum and maximum values (the border pixels) in a segmentation image <code>image_pred</code> that is represented by a float32 3D tensor. In particular, I aim at finding top left and bottom right corner coordinates of objects in an image. The image is entirely comprised of black pixels (value 0.0) except where the object is located, I have color pixels (0.0 &lt; values &lt; 1.0). Here\'s an example for such a bounding box (in my case, the object is the traffic sign and the environment is blacked out):</p>\n\n<p><a href="https://i.stack.imgur.com/QU04v.jpg" rel="nofollow noreferrer"><img src="https://i.stack.imgur.com/QU04v.jpg" alt="enter image description here"></a></p>\n\n<p>My approach so far is to <code>tf.boolean_mask</code> for setting every pixel to False except for the color pixels:</p>\n\n<pre><code>zeros = tf.zeros_like(image_pred)\nmask = tf.greater(image_pred, zeros)\nboolean_mask_pred = tf.boolean_mask(image_pred, mask)\n</code></pre>\n\n<p>and then use <code>tf.where</code> to find the coordinates of the masked image. To determine the <strong>horizontal and vertical coordinate values</strong> of the top left and bottom right corners of the rectangle, I thought about using <code>tf.recude_max</code> and <code>tf.reduce_min</code>, but since these do not return a single value if I provide an <code>axis</code>, I am unsure if this is the correct function to use. According to the docs, if I do not specify <code>axis</code>, the function will reduce all dimensions which is not what I want either. Which is the correct function to do this? The IoU in the end is a single 1D float value.</p>\n\n<pre><code>coordinates_pred = tf.where(boolean_mask_pred)\nx21 = tf.reduce_min(coordinates_pred, axis=1)\ny21 = tf.reduce_min(coordinates_pred, axis=0)\nx22 = tf.reduce_max(coordinates_pred, axis=1)\ny22 = tf.reduce_max(coordinates_pred, axis=0)\n</code></pre>\n'</li><li>'<p>Computing mean, total, etc. of each feature in a dataset seems quite trivial in <code>Pandas</code> and <code>Numpy</code>, but I couldn\'t find any similarly easy functions/operations for <a href="https://www.tensorflow.org/api_docs/python/tf/data/Dataset" rel="noreferrer"><code>tf.data.Dataset</code></a>. Actually I found <a href="https://www.tensorflow.org/api_docs/python/tf/data/Dataset#reduce" rel="noreferrer"><code>tf.data.Dataset.reduce</code></a> which allows me to compute running <code>sum</code>, but it\'s not that easy for other operation (<code>min</code>, <code>max</code>, <code>std</code>, etc.)\n<br>\n<br>So, my question is, is there a simple way to compute statistics for <code>tf.data.Dataset</code>? Moreover, is there a way to standardize/normalize (an entire, i.e. not in batch) <code>tf.data.Dataset</code>, especially if not using <code>tf.data.Dataset.reduce</code>?</p>\n'</li></ul> |
+ | 1 | <ul><li>'<p>TensorFlow documentation have the following example that can illustrate how to create a batch generator to feed a training set in batches to a model when the training set is too large to fit in memory:</p>\n<pre class="lang-py prettyprint-override"><code>from skimage.io import imread\nfrom skimage.transform import resize\nimport tensorflow as tf\nimport numpy as np\nimport math\n\n# Here, `x_set` is list of path to the images\n# and `y_set` are the associated classes.\n\nclass CIFAR10Sequence(tf.keras.utils.Sequence):\n\n def __init__(self, x_set, y_set, batch_size):\n self.x, self.y = x_set, y_set\n self.batch_size = batch_size\n\n def __len__(self):\n return math.ceil(len(self.x) / self.batch_size)\n\n def __getitem__(self, idx):\n batch_x = self.x[idx * self.batch_size:(idx + 1) *\n self.batch_size]\n batch_y = self.y[idx * self.batch_size:(idx + 1) *\n self.batch_size]\n\n return np.array([\n resize(imread(file_name), (200, 200))\n for file_name in batch_x]), np.array(batch_y)\n</code></pre>\n<p>My intention is to further increase the diversity of the training set by rotating each image 3x by 90º. In each Epoch of the training process, the model would first be fed with the &quot;0º training set&quot; and next with the 90º, 180º and 270º rotating sets, respectively.</p>\n<p>How can I modify the previous piece of code to perform this operation inside the <code>CIFAR10Sequence()</code> data generator?</p>\n<p>Please don\'t use <code>tf.keras.preprocessing.image.ImageDataGenerator()</code> so that the answer does not lose its generality for another type of similar problems that are of a different nature.</p>\n<p>NB: The idea would be to create the new data &quot;in real time&quot; as the model is fed instead of creating (in advance) and storing on disk a new and augmented training set bigger than the original one to be used later (also in batches) during the training process of the model.</p>\n<p>Thx in advance</p>\n'</li><li>"<p>I am trying to import a pretrained model from Huggingface's transformers library and extend it with a few layers for classification using tensorflow keras. When I directly use transformers model (Method 1), the model trains well and reaches a validation accuracy of 0.93 after 1 epoch. However, when trying to use the model as a layer within a tf.keras model (Method 2), the model can't get above 0.32 accuracy. As far as I can tell based on the documentation, the two approaches should be equivalent. My goal is to get Method 2 working so that I can add more layers to it instead of directly using the logits produced by Huggingface's classifier head but I'm stuck at this stage.</p>\n<pre><code>import tensorflow as tf\n\nfrom transformers import TFRobertaForSequenceClassification\n</code></pre>\n<p>Method 1:</p>\n<pre><code>model = TFRobertaForSequenceClassification.from_pretrained(&quot;roberta-base&quot;, num_labels=6)\n</code></pre>\n<p>Method 2:</p>\n<pre><code>input_ids = tf.keras.Input(shape=(128,), dtype='int32')\n\nattention_mask = tf.keras.Input(shape=(128, ), dtype='int32')\n\ntransformer = TFRobertaForSequenceClassification.from_pretrained(&quot;roberta-base&quot;, num_labels=6)\n\nencoded = transformer([input_ids, attention_mask])\n\nlogits = encoded[0]\n\nmodel = tf.keras.models.Model(inputs = [input_ids, attention_mask], outputs = logits)\n\n</code></pre>\n<p>Rest of the code for either method is identical,</p>\n<pre><code>model.compile(optimizer=tf.keras.optimizers.Adam(learning_rate=3e-5, epsilon=1e-08, clipnorm=1.0),\nloss=tf.keras.losses.SparseCategoricalCrossentropy(from_logits=True), \n metrics=[tf.keras.metrics.SparseCategoricalAccuracy('accuracy')])\n</code></pre>\n<p>I am using Tensorflow 2.3.0 and have tried with transformers versions 3.5.0 and 4.0.0.</p>\n"</li><li>'<p>In the official <a href="https://www.tensorflow.org/api_docs/python/tf/custom_gradient" rel="nofollow noreferrer">tf.custom_gradient</a> documentation it shows how to define custom gradients for <code>log(1 + exp(x))</code></p>\n<pre class="lang-py prettyprint-override"><code>@tf.custom_gradient\ndef log1pexp(x):\n e = tf.exp(x)\n def grad(dy):\n return dy * (1 - 1 / (1 + e))\n return tf.math.log(1 + e), grad\n</code></pre>\n<p>When <code>y = log(1 + exp(x))</code>, analytically the derivative comes out to be <code>dy/dx = (1 - 1 / (1 + exp(x)))</code>.</p>\n<p>However in the code <code>def grad</code> says its <code>dy * (1 - 1 / (1 + exp(x)))</code>.\n<code>dy/dx = dy * (1 - 1 / (1 + exp(x)))</code> is not a valid equation. While <code>dx = dy * (1 - 1 / (1 + exp(x)))</code> is wrong as it should be the reciprocal.</p>\n<p>What does the <code>grad</code> function equate to?</p>\n'</li></ul> |
+
+ ## Evaluation
+
+ ### Metrics
+ | Label | Accuracy | Precision | Recall | F1 |
+ |:--------|:---------|:----------|:-------|:-------|
+ | **all** | 0.71 | 0.7101 | 0.71 | 0.7100 |
+
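
For reference, a sketch of how metrics like these could be recomputed on a held-out split. `test_texts` and `test_labels` are placeholders, and the weighted averaging below is an assumption on my part; the card does not state which averaging was used.

```python
from sklearn.metrics import accuracy_score, precision_recall_fscore_support

# `model` loaded as in the inference snippet below; convert predictions to ints
preds = [int(p) for p in model.predict(test_texts)]
accuracy = accuracy_score(test_labels, preds)
precision, recall, f1, _ = precision_recall_fscore_support(
    test_labels, preds, average="weighted"  # assumed averaging strategy
)
```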
+ ## Uses
+
+ ### Direct Use for Inference
+
+ First install the SetFit library:
+
+ ```bash
+ pip install setfit
+ ```
+
+ Then you can load this model and run inference.
+
+ ```python
+ from setfit import SetFitModel
+
+ # Download from the 🤗 Hub
+ model = SetFitModel.from_pretrained("sharukat/sbert-questionclassifier")
+ # Run inference
+ preds = model("<p>Where one can find the github source code for <code>tf.quantization.fake_quant_with_min_max_args</code>. Checking the <a href=\"https://www.tensorflow.org/api_docs/python/tf/quantization/fake_quant_with_min_max_args\" rel=\"nofollow noreferrer\">TF API documentation</a>, there is no link to the github source file, and I could not find one on github.</p>
+ ")
+ ```
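
With this two-class head, `preds` is the predicted label (0 or 1). The model should also accept a batch of strings, e.g. `preds = model(["first question", "second question"])`, returning one label per input (SetFit 1.0.x behavior).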
+
+ <!--
+ ### Downstream Use
+
+ *List how someone could finetune this model on their own dataset.*
+ -->
+
+ <!--
+ ### Out-of-Scope Use
+
+ *List how the model may foreseeably be misused and address what users ought not to do with the model.*
+ -->
+
+ <!--
+ ## Bias, Risks and Limitations
+
+ *What are the known or foreseeable issues stemming from this model? You could also flag here known failure cases or weaknesses of the model.*
+ -->
+
+ <!--
+ ### Recommendations
+
+ *What are recommendations with respect to the foreseeable issues? For example, filtering explicit content.*
+ -->
+
+ ## Training Details
+
+ ### Training Set Metrics
+ | Training set | Min | Median | Max |
+ |:-------------|:----|:--------|:-----|
+ | Word count | 15 | 336.203 | 3755 |
+
+ | Label | Training Sample Count |
+ |:------|:----------------------|
+ | 0 | 500 |
+ | 1 | 500 |
+
+ ### Training Hyperparameters
+ - batch_size: (16, 16)
+ - num_epochs: (2, 2)
+ - max_steps: -1
+ - sampling_strategy: oversampling
+ - num_iterations: 20
+ - body_learning_rate: (2e-05, 1e-05)
+ - head_learning_rate: 0.01
+ - loss: CosineSimilarityLoss
+ - distance_metric: cosine_distance
+ - margin: 0.25
+ - end_to_end: False
+ - use_amp: False
+ - warmup_proportion: 0.1
+ - max_length: 256
+ - seed: 42
+ - eval_max_steps: -1
+ - load_best_model_at_end: False
+
+ ### Training Results
336
+ | Epoch | Step | Training Loss | Validation Loss |
337
+ |:------:|:----:|:-------------:|:---------------:|
338
+ | 0.0004 | 1 | 0.26 | - |
339
+ | 0.02 | 50 | 0.2486 | - |
340
+ | 0.04 | 100 | 0.2383 | - |
341
+ | 0.06 | 150 | 0.309 | - |
342
+ | 0.08 | 200 | 0.2551 | - |
343
+ | 0.1 | 250 | 0.2675 | - |
344
+ | 0.12 | 300 | 0.2344 | - |
345
+ | 0.14 | 350 | 0.2686 | - |
346
+ | 0.16 | 400 | 0.2447 | - |
347
+ | 0.18 | 450 | 0.2317 | - |
348
+ | 0.2 | 500 | 0.2233 | - |
349
+ | 0.22 | 550 | 0.1999 | - |
350
+ | 0.24 | 600 | 0.2443 | - |
351
+ | 0.26 | 650 | 0.1667 | - |
352
+ | 0.28 | 700 | 0.2975 | - |
353
+ | 0.3 | 750 | 0.0902 | - |
354
+ | 0.32 | 800 | 0.1965 | - |
355
+ | 0.34 | 850 | 0.1571 | - |
356
+ | 0.36 | 900 | 0.1247 | - |
357
+ | 0.38 | 950 | 0.0494 | - |
358
+ | 0.4 | 1000 | 0.1222 | - |
359
+ | 0.42 | 1050 | 0.0828 | - |
360
+ | 0.44 | 1100 | 0.0393 | - |
361
+ | 0.46 | 1150 | 0.0104 | - |
362
+ | 0.48 | 1200 | 0.0143 | - |
363
+ | 0.5 | 1250 | 0.0505 | - |
364
+ | 0.52 | 1300 | 0.0053 | - |
365
+ | 0.54 | 1350 | 0.0337 | - |
366
+ | 0.56 | 1400 | 0.0013 | - |
367
+ | 0.58 | 1450 | 0.0061 | - |
368
+ | 0.6 | 1500 | 0.0519 | - |
369
+ | 0.62 | 1550 | 0.0068 | - |
370
+ | 0.64 | 1600 | 0.001 | - |
371
+ | 0.66 | 1650 | 0.0004 | - |
372
+ | 0.68 | 1700 | 0.0008 | - |
373
+ | 0.7 | 1750 | 0.0018 | - |
374
+ | 0.72 | 1800 | 0.0018 | - |
375
+ | 0.74 | 1850 | 0.0022 | - |
376
+ | 0.76 | 1900 | 0.0005 | - |
377
+ | 0.78 | 1950 | 0.0008 | - |
378
+ | 0.8 | 2000 | 0.0005 | - |
379
+ | 0.82 | 2050 | 0.0003 | - |
380
+ | 0.84 | 2100 | 0.0004 | - |
381
+ | 0.86 | 2150 | 0.0002 | - |
382
+ | 0.88 | 2200 | 0.0003 | - |
383
+ | 0.9 | 2250 | 0.0001 | - |
384
+ | 0.92 | 2300 | 0.0001 | - |
385
+ | 0.94 | 2350 | 0.0002 | - |
386
+ | 0.96 | 2400 | 0.0005 | - |
387
+ | 0.98 | 2450 | 0.0002 | - |
388
+ | 1.0 | 2500 | 0.0002 | - |
389
+ | 1.02 | 2550 | 0.0001 | - |
390
+ | 1.04 | 2600 | 0.0001 | - |
391
+ | 1.06 | 2650 | 0.0003 | - |
392
+ | 1.08 | 2700 | 0.0002 | - |
393
+ | 1.1 | 2750 | 0.0002 | - |
394
+ | 1.12 | 2800 | 0.0001 | - |
395
+ | 1.1400 | 2850 | 0.0001 | - |
396
+ | 1.16 | 2900 | 0.0002 | - |
397
+ | 1.18 | 2950 | 0.0594 | - |
398
+ | 1.2 | 3000 | 0.0002 | - |
399
+ | 1.22 | 3050 | 0.0002 | - |
400
+ | 1.24 | 3100 | 0.0001 | - |
401
+ | 1.26 | 3150 | 0.0262 | - |
402
+ | 1.28 | 3200 | 0.0001 | - |
403
+ | 1.3 | 3250 | 0.0001 | - |
404
+ | 1.32 | 3300 | 0.0001 | - |
405
+ | 1.34 | 3350 | 0.0001 | - |
406
+ | 1.3600 | 3400 | 0.0001 | - |
407
+ | 1.38 | 3450 | 0.0002 | - |
408
+ | 1.4 | 3500 | 0.0 | - |
409
+ | 1.42 | 3550 | 0.0001 | - |
410
+ | 1.44 | 3600 | 0.0001 | - |
411
+ | 1.46 | 3650 | 0.0001 | - |
412
+ | 1.48 | 3700 | 0.0001 | - |
413
+ | 1.5 | 3750 | 0.0001 | - |
414
+ | 1.52 | 3800 | 0.0001 | - |
415
+ | 1.54 | 3850 | 0.0001 | - |
416
+ | 1.56 | 3900 | 0.0001 | - |
417
+ | 1.58 | 3950 | 0.0001 | - |
418
+ | 1.6 | 4000 | 0.0001 | - |
419
+ | 1.62 | 4050 | 0.0002 | - |
420
+ | 1.6400 | 4100 | 0.0044 | - |
421
+ | 1.6600 | 4150 | 0.0001 | - |
422
+ | 1.6800 | 4200 | 0.0002 | - |
423
+ | 1.7 | 4250 | 0.0001 | - |
424
+ | 1.72 | 4300 | 0.0001 | - |
425
+ | 1.74 | 4350 | 0.0001 | - |
426
+ | 1.76 | 4400 | 0.0001 | - |
427
+ | 1.78 | 4450 | 0.0 | - |
428
+ | 1.8 | 4500 | 0.0001 | - |
429
+ | 1.8200 | 4550 | 0.0001 | - |
430
+ | 1.8400 | 4600 | 0.0 | - |
431
+ | 1.8600 | 4650 | 0.061 | - |
432
+ | 1.88 | 4700 | 0.0002 | - |
433
+ | 1.9 | 4750 | 0.0001 | - |
434
+ | 1.92 | 4800 | 0.0001 | - |
435
+ | 1.94 | 4850 | 0.0001 | - |
436
+ | 1.96 | 4900 | 0.0001 | - |
437
+ | 1.98 | 4950 | 0.0001 | - |
438
+ | 2.0 | 5000 | 0.0001 | - |
439
+
+ ### Framework Versions
+ - Python: 3.10.12
+ - SetFit: 1.0.3
+ - Sentence Transformers: 2.4.0
+ - Transformers: 4.37.2
+ - PyTorch: 2.1.0+cu121
+ - Datasets: 2.17.1
+ - Tokenizers: 0.15.2
+
+ ## Citation
+
+ ### BibTeX
+ ```bibtex
+ @article{https://doi.org/10.48550/arxiv.2209.11055,
+     doi = {10.48550/ARXIV.2209.11055},
+     url = {https://arxiv.org/abs/2209.11055},
+     author = {Tunstall, Lewis and Reimers, Nils and Jo, Unso Eun Seo and Bates, Luke and Korat, Daniel and Wasserblat, Moshe and Pereg, Oren},
+     keywords = {Computation and Language (cs.CL), FOS: Computer and information sciences, FOS: Computer and information sciences},
+     title = {Efficient Few-Shot Learning Without Prompts},
+     publisher = {arXiv},
+     year = {2022},
+     copyright = {Creative Commons Attribution 4.0 International}
+ }
+ ```
+
+ <!--
+ ## Glossary
+
+ *Clearly define terms in order to be accessible across audiences.*
+ -->
+
+ <!--
+ ## Model Card Authors
+
+ *Lists the people who create the model card, providing recognition and accountability for the detailed work that goes into its construction.*
+ -->
+
+ <!--
+ ## Model Card Contact
+
+ *Provides a way for people who have updates to the Model Card, suggestions, or questions, to contact the Model Card authors.*
+ -->
config.json ADDED
@@ -0,0 +1,26 @@
+ {
+   "_name_or_path": "sentence-transformers/all-MiniLM-L6-v2",
+   "architectures": [
+     "BertModel"
+   ],
+   "attention_probs_dropout_prob": 0.1,
+   "classifier_dropout": null,
+   "gradient_checkpointing": false,
+   "hidden_act": "gelu",
+   "hidden_dropout_prob": 0.1,
+   "hidden_size": 384,
+   "initializer_range": 0.02,
+   "intermediate_size": 1536,
+   "layer_norm_eps": 1e-12,
+   "max_position_embeddings": 512,
+   "model_type": "bert",
+   "num_attention_heads": 12,
+   "num_hidden_layers": 6,
+   "pad_token_id": 0,
+   "position_embedding_type": "absolute",
+   "torch_dtype": "float32",
+   "transformers_version": "4.37.2",
+   "type_vocab_size": 2,
+   "use_cache": true,
+   "vocab_size": 30522
+ }
config_sentence_transformers.json ADDED
@@ -0,0 +1,9 @@
+ {
+   "__version__": {
+     "sentence_transformers": "2.0.0",
+     "transformers": "4.6.1",
+     "pytorch": "1.8.1"
+   },
+   "prompts": {},
+   "default_prompt_name": null
+ }
config_setfit.json ADDED
@@ -0,0 +1,4 @@
+ {
+   "normalize_embeddings": false,
+   "labels": null
+ }
model.safetensors ADDED
@@ -0,0 +1,3 @@
+ version https://git-lfs.github.com/spec/v1
+ oid sha256:38dce997b57eb194580761b809421adbb6954804fa762b940905be33a22278dc
+ size 90864192
model_head.pkl ADDED
@@ -0,0 +1,3 @@
+ version https://git-lfs.github.com/spec/v1
+ oid sha256:1d6e16c12e0211511769d651cc2d97cef6703e028da39cbd378afeb53cdaf7ce
+ size 4630
modules.json ADDED
@@ -0,0 +1,20 @@
+ [
+   {
+     "idx": 0,
+     "name": "0",
+     "path": "",
+     "type": "sentence_transformers.models.Transformer"
+   },
+   {
+     "idx": 1,
+     "name": "1",
+     "path": "1_Pooling",
+     "type": "sentence_transformers.models.Pooling"
+   },
+   {
+     "idx": 2,
+     "name": "2",
+     "path": "2_Normalize",
+     "type": "sentence_transformers.models.Normalize"
+   }
+ ]
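
Together these three modules define the embedding pipeline: transformer body → mean pooling (see `1_Pooling/config.json` above) → L2 normalization. The body should also load on its own as a plain Sentence Transformer, sketched here using the repo id from the README's inference example:

```python
from sentence_transformers import SentenceTransformer

# Repo id as used in the README's inference snippet.
body = SentenceTransformer("sharukat/sbert-questionclassifier")
embeddings = body.encode(["How do I feed a tf.data.Dataset to Keras?"])
print(embeddings.shape)  # (1, 384); rows are unit-normalized by the Normalize module
```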
sentence_bert_config.json ADDED
@@ -0,0 +1,4 @@
+ {
+   "max_seq_length": 256,
+   "do_lower_case": false
+ }
special_tokens_map.json ADDED
@@ -0,0 +1,37 @@
+ {
+   "cls_token": {
+     "content": "[CLS]",
+     "lstrip": false,
+     "normalized": false,
+     "rstrip": false,
+     "single_word": false
+   },
+   "mask_token": {
+     "content": "[MASK]",
+     "lstrip": false,
+     "normalized": false,
+     "rstrip": false,
+     "single_word": false
+   },
+   "pad_token": {
+     "content": "[PAD]",
+     "lstrip": false,
+     "normalized": false,
+     "rstrip": false,
+     "single_word": false
+   },
+   "sep_token": {
+     "content": "[SEP]",
+     "lstrip": false,
+     "normalized": false,
+     "rstrip": false,
+     "single_word": false
+   },
+   "unk_token": {
+     "content": "[UNK]",
+     "lstrip": false,
+     "normalized": false,
+     "rstrip": false,
+     "single_word": false
+   }
+ }
tokenizer.json ADDED
The diff for this file is too large to render.
tokenizer_config.json ADDED
@@ -0,0 +1,64 @@
+ {
+   "added_tokens_decoder": {
+     "0": {
+       "content": "[PAD]",
+       "lstrip": false,
+       "normalized": false,
+       "rstrip": false,
+       "single_word": false,
+       "special": true
+     },
+     "100": {
+       "content": "[UNK]",
+       "lstrip": false,
+       "normalized": false,
+       "rstrip": false,
+       "single_word": false,
+       "special": true
+     },
+     "101": {
+       "content": "[CLS]",
+       "lstrip": false,
+       "normalized": false,
+       "rstrip": false,
+       "single_word": false,
+       "special": true
+     },
+     "102": {
+       "content": "[SEP]",
+       "lstrip": false,
+       "normalized": false,
+       "rstrip": false,
+       "single_word": false,
+       "special": true
+     },
+     "103": {
+       "content": "[MASK]",
+       "lstrip": false,
+       "normalized": false,
+       "rstrip": false,
+       "single_word": false,
+       "special": true
+     }
+   },
+   "clean_up_tokenization_spaces": true,
+   "cls_token": "[CLS]",
+   "do_basic_tokenize": true,
+   "do_lower_case": true,
+   "mask_token": "[MASK]",
+   "max_length": 128,
+   "model_max_length": 512,
+   "never_split": null,
+   "pad_to_multiple_of": null,
+   "pad_token": "[PAD]",
+   "pad_token_type_id": 0,
+   "padding_side": "right",
+   "sep_token": "[SEP]",
+   "stride": 0,
+   "strip_accents": null,
+   "tokenize_chinese_chars": true,
+   "tokenizer_class": "BertTokenizer",
+   "truncation_side": "right",
+   "truncation_strategy": "longest_first",
+   "unk_token": "[UNK]"
+ }
vocab.txt ADDED
The diff for this file is too large to render.