Add new SentenceTransformer model.

Changed files:
- README.md: +174 -8
- model.safetensors: +1 -1

README.md CHANGED
@@ -8,6 +8,22 @@ datasets:
 language:
 - en
 library_name: sentence-transformers
+metrics:
+- pearson_cosine
+- spearman_cosine
+- pearson_manhattan
+- spearman_manhattan
+- pearson_euclidean
+- spearman_euclidean
+- pearson_dot
+- spearman_dot
+- pearson_max
+- spearman_max
+- cosine_accuracy
+- dot_accuracy
+- manhattan_accuracy
+- euclidean_accuracy
+- max_accuracy
 pipeline_tag: sentence-similarity
 tags:
 - sentence-transformers
@@ -48,6 +64,105 @@ widget:
 - It is meant to stimulate root growth - in particular to stimulate the creation
   of roots.
 - A person folds a piece of paper.
+model-index:
+- name: SentenceTransformer based on allenai/longformer-base-4096
+  results:
+  - task:
+      type: semantic-similarity
+      name: Semantic Similarity
+    dataset:
+      name: sts dev
+      type: sts-dev
+    metrics:
+    - type: pearson_cosine
+      value: .nan
+      name: Pearson Cosine
+    - type: spearman_cosine
+      value: .nan
+      name: Spearman Cosine
+    - type: pearson_manhattan
+      value: 0.1953366031192939
+      name: Pearson Manhattan
+    - type: spearman_manhattan
+      value: 0.18628029922412706
+      name: Spearman Manhattan
+    - type: pearson_euclidean
+      value: 0.12038330059026879
+      name: Pearson Euclidean
+    - type: spearman_euclidean
+      value: 0.11701423250889276
+      name: Spearman Euclidean
+    - type: pearson_dot
+      value: -0.020898059060793592
+      name: Pearson Dot
+    - type: spearman_dot
+      value: -0.019267171663208498
+      name: Spearman Dot
+    - type: pearson_max
+      value: .nan
+      name: Pearson Max
+    - type: spearman_max
+      value: .nan
+      name: Spearman Max
+  - task:
+      type: triplet
+      name: Triplet
+    dataset:
+      name: triplet dev
+      type: triplet-dev
+    metrics:
+    - type: cosine_accuracy
+      value: 0.5089611178614823
+      name: Cosine Accuracy
+    - type: dot_accuracy
+      value: 0.24939246658566222
+      name: Dot Accuracy
+    - type: manhattan_accuracy
+      value: 0.511543134872418
+      name: Manhattan Accuracy
+    - type: euclidean_accuracy
+      value: 0.5103280680437424
+      name: Euclidean Accuracy
+    - type: max_accuracy
+      value: 0.511543134872418
+      name: Max Accuracy
+  - task:
+      type: semantic-similarity
+      name: Semantic Similarity
+    dataset:
+      name: label accuracy dev
+      type: label-accuracy-dev
+    metrics:
+    - type: pearson_cosine
+      value: .nan
+      name: Pearson Cosine
+    - type: spearman_cosine
+      value: .nan
+      name: Spearman Cosine
+    - type: pearson_manhattan
+      value: 0.049476403113581605
+      name: Pearson Manhattan
+    - type: spearman_manhattan
+      value: 0.05279290870444774
+      name: Spearman Manhattan
+    - type: pearson_euclidean
+      value: 0.03906753540286213
+      name: Pearson Euclidean
+    - type: spearman_euclidean
+      value: 0.04333503769885663
+      name: Spearman Euclidean
+    - type: pearson_dot
+      value: -0.011658647110881755
+      name: Pearson Dot
+    - type: spearman_dot
+      value: -0.009275521591297707
+      name: Spearman Dot
+    - type: pearson_max
+      value: .nan
+      name: Pearson Max
+    - type: spearman_max
+      value: .nan
+      name: Spearman Max
 ---
 
 # SentenceTransformer based on allenai/longformer-base-4096
@@ -144,6 +259,56 @@ You can finetune this model on your own dataset.
 *List how the model may foreseeably be misused and address what users ought not to do with the model.*
 -->

+## Evaluation
+
+### Metrics
+
+#### Semantic Similarity
+* Dataset: `sts-dev`
+* Evaluated with [<code>EmbeddingSimilarityEvaluator</code>](https://sbert.net/docs/package_reference/sentence_transformer/evaluation.html#sentence_transformers.evaluation.EmbeddingSimilarityEvaluator)
+
+| Metric | Value |
+|:-------------------|:--------|
+| pearson_cosine | nan |
+| spearman_cosine | nan |
+| pearson_manhattan | 0.1953 |
+| spearman_manhattan | 0.1863 |
+| pearson_euclidean | 0.1204 |
+| spearman_euclidean | 0.117 |
+| pearson_dot | -0.0209 |
+| spearman_dot | -0.0193 |
+| pearson_max | nan |
+| **spearman_max** | **nan** |
+
+#### Triplet
+* Dataset: `triplet-dev`
+* Evaluated with [<code>TripletEvaluator</code>](https://sbert.net/docs/package_reference/sentence_transformer/evaluation.html#sentence_transformers.evaluation.TripletEvaluator)
+
+| Metric | Value |
+|:-------------------|:-----------|
+| cosine_accuracy | 0.509 |
+| dot_accuracy | 0.2494 |
+| manhattan_accuracy | 0.5115 |
+| euclidean_accuracy | 0.5103 |
+| **max_accuracy** | **0.5115** |
+
+#### Semantic Similarity
+* Dataset: `label-accuracy-dev`
+* Evaluated with [<code>EmbeddingSimilarityEvaluator</code>](https://sbert.net/docs/package_reference/sentence_transformer/evaluation.html#sentence_transformers.evaluation.EmbeddingSimilarityEvaluator)
+
+| Metric | Value |
+|:-------------------|:--------|
+| pearson_cosine | nan |
+| spearman_cosine | nan |
+| pearson_manhattan | 0.0495 |
+| spearman_manhattan | 0.0528 |
+| pearson_euclidean | 0.0391 |
+| spearman_euclidean | 0.0433 |
+| pearson_dot | -0.0117 |
+| spearman_dot | -0.0093 |
+| pearson_max | nan |
+| **spearman_max** | **nan** |
+
 <!--
 ## Bias, Risks and Limitations

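The evaluators linked in the added Evaluation section come from `sentence_transformers.evaluation`. A rough sketch of how metrics like these are produced; the sentence pairs, gold scores, and repo id below are illustrative placeholders, not the actual `sts-dev` or `triplet-dev` splits:

```python
# Illustrative sketch only: toy data and a placeholder repo id.
from sentence_transformers import SentenceTransformer
from sentence_transformers.evaluation import (
    EmbeddingSimilarityEvaluator,
    TripletEvaluator,
)

model = SentenceTransformer("your-username/longformer-base-4096-sts")

sts_eval = EmbeddingSimilarityEvaluator(
    sentences1=["A man is playing a guitar.", "A woman reads a book.", "Kids play soccer outside."],
    sentences2=["Someone plays an instrument.", "A person is cooking dinner.", "Children are playing football."],
    scores=[0.9, 0.1, 0.8],  # gold similarity scores, normalized to [0, 1]
    name="sts-dev",
)
triplet_eval = TripletEvaluator(
    anchors=["A person folds a piece of paper."],
    positives=["Someone is folding paper."],
    negatives=["A dog runs across a field."],
    name="triplet-dev",
)

print(sts_eval(model))      # Pearson/Spearman correlations per similarity function
print(triplet_eval(model))  # accuracy of anchor-positive vs. anchor-negative distances
```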
@@ -425,9 +590,9 @@ You can finetune this model on your own dataset.

 - `overwrite_output_dir`: True
 - `eval_strategy`: steps
-- `learning_rate`:
+- `learning_rate`: 3.304439853025411e-05
 - `num_train_epochs`: 10
-- `warmup_steps`:
+- `warmup_steps`: 1
 - `load_best_model_at_end`: True

 #### All Hyperparameters
@@ -444,7 +609,7 @@ You can finetune this model on your own dataset.
 - `gradient_accumulation_steps`: 1
 - `eval_accumulation_steps`: None
 - `torch_empty_cache_steps`: None
-- `learning_rate`:
+- `learning_rate`: 3.304439853025411e-05
 - `weight_decay`: 0.0
 - `adam_beta1`: 0.9
 - `adam_beta2`: 0.999
@@ -455,7 +620,7 @@ You can finetune this model on your own dataset.
 - `lr_scheduler_type`: linear
 - `lr_scheduler_kwargs`: {}
 - `warmup_ratio`: 0.0
-- `warmup_steps`:
+- `warmup_steps`: 1
 - `log_level`: passive
 - `log_level_replica`: warning
 - `log_on_each_node`: True
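For context, the non-default values changed in the three hunks above map onto the trainer configuration roughly as sketched below (assuming the sentence-transformers v3 training API; `output_dir` and everything not listed are placeholders or defaults):

```python
# Sketch of the non-default training arguments shown in the diff.
from sentence_transformers import SentenceTransformerTrainingArguments

args = SentenceTransformerTrainingArguments(
    output_dir="output/longformer-base-4096-sts",  # placeholder
    overwrite_output_dir=True,
    eval_strategy="steps",
    learning_rate=3.304439853025411e-05,
    num_train_epochs=10,
    warmup_steps=1,
    load_best_model_at_end=True,
)
```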
@@ -549,10 +714,11 @@ You can finetune this model on your own dataset.
 </details>

 ### Training Logs
-| Epoch | Step | Training Loss |
-|:------:|:----:|:-------------:|
-| 0.0487 | 200 | 3.
-| 0.0973 | 400 | 3.
+| Epoch | Step | Training Loss | stsb loss | quora loss | all-nli-triplet loss | natural-questions loss | label-accuracy-dev_spearman_max | sts-dev_spearman_max | triplet-dev_max_accuracy |
+|:------:|:----:|:-------------:|:---------:|:----------:|:--------------------:|:----------------------:|:-------------------------------:|:--------------------:|:------------------------:|
+| 0.0487 | 200 | 3.3109 | - | - | - | - | - | - | - |
+| 0.0973 | 400 | 3.5823 | - | - | - | - | - | - | - |
+| 0.1217 | 500 | - | 4.7553 | 2.7670 | 3.4649 | 2.7670 | nan | nan | 0.5115 |


 ### Framework Versions
model.safetensors CHANGED
@@ -1,3 +1,3 @@
 version https://git-lfs.github.com/spec/v1
-oid sha256:
+oid sha256:045c131132c7123db999814287f8b2d08c841dacc9bc6aa11413997282d31ac7
 size 594668880
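The safetensors weights themselves are stored via Git LFS; only the pointer (hash and size) changes in this commit. If desired, a downloaded copy can be checked against the pointer, e.g.:

```python
# Sketch: compare a local model.safetensors against the LFS pointer above.
import hashlib

EXPECTED_SHA256 = "045c131132c7123db999814287f8b2d08c841dacc9bc6aa11413997282d31ac7"
EXPECTED_SIZE = 594668880  # bytes, from the pointer file

h = hashlib.sha256()
size = 0
with open("model.safetensors", "rb") as f:  # placeholder local path
    for chunk in iter(lambda: f.read(1 << 20), b""):
        h.update(chunk)
        size += len(chunk)

assert size == EXPECTED_SIZE and h.hexdigest() == EXPECTED_SHA256, "file does not match pointer"
print("model.safetensors matches the LFS pointer")
```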