Updated model to the best trained one.
Files changed:
- 2_Dense/config.json +0 -1
- 2_Dense/pytorch_model.bin +0 -3
- README.md +51 -13
- eval/similarity_evaluation_sts-test_results.csv +10 -20
- loss_digest.json +0 -0
- modules.json +0 -6
- pytorch_model.bin +1 -1
2_Dense/config.json DELETED
@@ -1 +0,0 @@
-{"in_features": 768, "out_features": 512, "bias": true, "activation_function": "torch.nn.modules.activation.Tanh"}
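The module deleted here is the sentence-transformers Dense projection head. For context, a minimal sketch of what it computed, assuming the standard `sentence_transformers.models.Dense` semantics implied by this config (the input tensor is a stand-in for a pooled sentence embedding):

```python
import torch

# Sketch of the deleted 2_Dense head: a learned affine map from the 768-dim
# pooled sentence embedding down to 512 dims, followed by Tanh.
dense = torch.nn.Linear(in_features=768, out_features=512, bias=True)
pooled = torch.randn(1, 768)           # placeholder pooled sentence embedding
projected = torch.tanh(dense(pooled))  # shape: (1, 512)
```

With this head removed, embeddings leave the model straight after mean pooling, at 768 dimensions, which matches the updated README below.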
2_Dense/pytorch_model.bin DELETED
@@ -1,3 +0,0 @@
-version https://git-lfs.github.com/spec/v1
-oid sha256:318abfeb7ac3562dae47bd5126150009554f49c4704ea18ecc8903dfd970d857
-size 1575975
README.md CHANGED
@@ -17,7 +17,8 @@ widget:
 
 # bertin-roberta-base-finetuning-esnli
 
-This is a [sentence-transformers](https://www.SBERT.net) model trained on a
+This is a [sentence-transformers](https://www.SBERT.net) model trained on a
+collection of NLI tasks for Spanish. It maps sentences & paragraphs to a 768 dimensional dense vector space and can be used for tasks like clustering or semantic search.
 
 Based around the siamese networks approach from [this paper](https://arxiv.org/pdf/1908.10084.pdf).
 <!--- Describe your model here -->
@@ -41,6 +42,43 @@ embeddings = model.encode(sentences)
 print(embeddings)
 ```
 
+## Usage (HuggingFace Transformers)
+Without [sentence-transformers](https://www.SBERT.net), you can use the model like this: first pass your input through the transformer model, then apply the right pooling operation on top of the contextualized word embeddings.
+
+```python
+from transformers import AutoTokenizer, AutoModel
+import torch
+
+
+# Mean pooling - take the attention mask into account for correct averaging
+def mean_pooling(model_output, attention_mask):
+    token_embeddings = model_output[0]  # first element of model_output contains all token embeddings
+    input_mask_expanded = attention_mask.unsqueeze(-1).expand(token_embeddings.size()).float()
+    return torch.sum(token_embeddings * input_mask_expanded, 1) / torch.clamp(input_mask_expanded.sum(1), min=1e-9)
+
+
+# Sentences we want sentence embeddings for
+sentences = ['This is an example sentence', 'Each sentence is converted']
+
+# Load model from HuggingFace Hub
+tokenizer = AutoTokenizer.from_pretrained('{MODEL_NAME}')
+model = AutoModel.from_pretrained('{MODEL_NAME}')
+
+# Tokenize sentences
+encoded_input = tokenizer(sentences, padding=True, truncation=True, return_tensors='pt')
+
+# Compute token embeddings
+with torch.no_grad():
+    model_output = model(**encoded_input)
+
+# Perform pooling. In this case, mean pooling.
+sentence_embeddings = mean_pooling(model_output, encoded_input['attention_mask'])
+
+print("Sentence embeddings:")
+print(sentence_embeddings)
+```
+
+
 ## Evaluation Results
 
 <!--- Describe how your model was evaluated -->
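The two usage paths added above should agree once the model ends at mean pooling. A hypothetical end-to-end check, not part of the commit; `{MODEL_NAME}` is the same placeholder the README uses:

```python
import torch
from transformers import AutoTokenizer, AutoModel
from sentence_transformers import SentenceTransformer

def mean_pooling(model_output, attention_mask):
    token_embeddings = model_output[0]
    mask = attention_mask.unsqueeze(-1).expand(token_embeddings.size()).float()
    return torch.sum(token_embeddings * mask, 1) / torch.clamp(mask.sum(1), min=1e-9)

sentences = ['This is an example sentence']

# Path 1: plain transformers + manual mean pooling
tokenizer = AutoTokenizer.from_pretrained('{MODEL_NAME}')
hf_model = AutoModel.from_pretrained('{MODEL_NAME}')
encoded = tokenizer(sentences, padding=True, truncation=True, return_tensors='pt')
with torch.no_grad():
    manual = mean_pooling(hf_model(**encoded), encoded['attention_mask'])

# Path 2: sentence-transformers' own encode()
st_model = SentenceTransformer('{MODEL_NAME}')
auto = torch.tensor(st_model.encode(sentences))

print(torch.allclose(manual, auto, atol=1e-5))  # expected: True
```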
@@ -48,14 +86,14 @@ Our model was evaluated on the task of Semantic Textual Similarity using the [Se
 
 | | [BETO STS](https://huggingface.co/espejelomar/sentece-embeddings-BETO) | BERTIN STS (this model) | Relative improvement |
 |-------------------:|---------:|-----------:|---------------------:|
-| cosine_pearson | 0.609803 | 0.
-| cosine_spearman | 0.528776 | 0.
-| euclidean_pearson | 0.590613 | 0.
-| euclidean_spearman | 0.526529 | 0.
-| manhattan_pearson | 0.589108 | 0.
-| manhattan_spearman | 0.525910 | 0.
-| dot_pearson | 0.544078 | 0.
-| dot_spearman | 0.460427 | 0.
+| cosine_pearson | 0.609803 | 0.683188 | +12.03 |
+| cosine_spearman | 0.528776 | 0.615916 | +16.48 |
+| euclidean_pearson | 0.590613 | 0.672601 | +13.88 |
+| euclidean_spearman | 0.526529 | 0.611539 | +16.15 |
+| manhattan_pearson | 0.589108 | 0.672040 | +14.08 |
+| manhattan_spearman | 0.525910 | 0.610517 | +16.09 |
+| dot_pearson | 0.544078 | 0.600517 | +10.37 |
+| dot_spearman | 0.460427 | 0.521260 | +13.21 |
 
 
 ## Training
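The new "Relative improvement" column is consistent with a plain percentage gain over the BETO baseline; a worked check of the first row, using only values from the table:

```python
# Relative improvement = (BERTIN - BETO) / BETO, expressed in percent.
beto, bertin = 0.609803, 0.683188  # cosine_pearson row
print(f"+{(bertin - beto) / beto * 100:.2f}")  # +12.03
```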
@@ -72,7 +110,8 @@ The whole dataset used is available [here](https://huggingface.co/datasets/hacka
 
 **DataLoader**:
 
-`sentence_transformers.datasets.NoDuplicatesDataLoader.NoDuplicatesDataLoader`
+`sentence_transformers.datasets.NoDuplicatesDataLoader.NoDuplicatesDataLoader`
+of length 1818 with parameters:
 ```
 {'batch_size': 64}
 ```
@@ -87,7 +126,7 @@ The whole dataset used is available [here](https://huggingface.co/datasets/hacka
 Parameters of the fit()-Method:
 ```
 {
-    "epochs":
+    "epochs": 10,
     "evaluation_steps": 0,
     "evaluator": "sentence_transformers.evaluation.EmbeddingSimilarityEvaluator.EmbeddingSimilarityEvaluator",
     "max_grad_norm": 1,
@@ -97,7 +136,7 @@ Parameters of the fit()-Method:
     },
     "scheduler": "WarmupLinear",
     "steps_per_epoch": null,
-    "warmup_steps":
+    "warmup_steps": 909,
     "weight_decay": 0.01
 }
 ```
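Taken together, the filled-in values are internally consistent: 1818 batches per epoch over 10 epochs gives 18,180 total steps, and 909 is exactly 5% of that, suggesting a 5% warmup ratio. A hypothetical reconstruction of the training call; the tiny `train_samples` list, the loss choice, and `{MODEL_NAME}` are placeholders, not taken from the commit:

```python
from sentence_transformers import SentenceTransformer, InputExample, losses
from sentence_transformers.datasets import NoDuplicatesDataLoader

# Tiny placeholder dataset; the real run used the hackathon NLI collection
# (1818 batches per epoch at batch_size=64).
train_samples = [
    InputExample(texts=['Un hombre camina por la playa', 'Una persona pasea junto al mar']),
    InputExample(texts=['El gato duerme en el sofá', 'Un felino descansa en el sillón']),
]

model = SentenceTransformer('{MODEL_NAME}')
train_dataloader = NoDuplicatesDataLoader(train_samples, batch_size=64)
train_loss = losses.MultipleNegativesRankingLoss(model)  # assumed; the usual pairing with NoDuplicatesDataLoader

model.fit(
    train_objectives=[(train_dataloader, train_loss)],
    epochs=10,
    scheduler='WarmupLinear',
    warmup_steps=909,   # 5% of 1818 batches x 10 epochs = 18,180 steps
    weight_decay=0.01,
    max_grad_norm=1,
    evaluation_steps=0,
)
```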
@@ -108,7 +147,6 @@ Parameters of the fit()-Method:
 SentenceTransformer(
   (0): Transformer({'max_seq_length': 514, 'do_lower_case': False}) with Transformer model: RobertaModel
   (1): Pooling({'word_embedding_dimension': 768, 'pooling_mode_cls_token': False, 'pooling_mode_mean_tokens': True, 'pooling_mode_max_tokens': False, 'pooling_mode_mean_sqrt_len_tokens': False})
-  (2): Dense({'in_features': 768, 'out_features': 512, 'bias': True, 'activation_function': 'torch.nn.modules.activation.Tanh'})
 )
 ```
 
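With the Dense block dropped from the architecture, the pooling output is the final embedding. A quick check, sketched with the README's `{MODEL_NAME}` placeholder:

```python
from sentence_transformers import SentenceTransformer

# After this commit the pipeline ends at mean pooling, so the sentence
# embedding dimension is the transformer's hidden size (768); it was 512
# while the Tanh Dense head mapped 768 -> 512.
model = SentenceTransformer('{MODEL_NAME}')
print(model.get_sentence_embedding_dimension())  # 768
```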
eval/similarity_evaluation_sts-test_results.csv CHANGED
@@ -1,21 +1,11 @@
 epoch,steps,cosine_pearson,cosine_spearman,euclidean_pearson,euclidean_spearman,manhattan_pearson,manhattan_spearman,dot_pearson,dot_spearman
-0,-1,0.
-1,-1,0.
-2,-1,0.
-3,-1,0.
-4,-1,0.
-5,-1,0.
-6,-1,0.
-7,-1,0.
-8,-1,0.
-9,-1,0.
-10,-1,0.6434909937737626,0.5713701447625351,0.6301188859529719,0.5709692410446885,0.6309719329436375,0.5701395230401529,0.593913567963774,0.522557073939444
-11,-1,0.641203878405462,0.5722014251907718,0.6284168875038928,0.5737909498411451,0.6295168797303964,0.5728132601653629,0.5893572348665002,0.5218607585112776
-12,-1,0.6405665479784053,0.5712144563426479,0.6258392075727873,0.5693129298830195,0.6262440363392721,0.5679223727890534,0.593952756495054,0.5268886237775188
-13,-1,0.6390052346365416,0.5686395678794071,0.6258537618625887,0.5685859625426081,0.6265438374367317,0.5677389726542497,0.591956305872708,0.5218520657539587
-14,-1,0.6401240726804178,0.5711650411421381,0.6278602450688386,0.5727693520022645,0.628050553113738,0.5709335183573409,0.5937276661244524,0.5234451981826964
-15,-1,0.6398403358896347,0.5692425497972115,0.6246306232307527,0.5691193313826032,0.6255512511477327,0.5683736149577787,0.5940274286246308,0.5237160798409092
-16,-1,0.640328214937794,0.5708227567207858,0.6255762617684392,0.5716483840159948,0.6265171469104598,0.569976860529018,0.5945084491171609,0.5247411860311914
-17,-1,0.6404406282410006,0.5712850352823815,0.6251102831494417,0.5715652062596898,0.6257154822798084,0.5695532590559501,0.5939059512747178,0.525733381896788
-18,-1,0.64141106615211,0.572578991980065,0.6261621835757434,0.5725016418579003,0.6268679024101312,0.5700271563683411,0.5965506884620715,0.5278869557071051
-19,-1,0.6406208759751268,0.5720221725807018,0.625890984121176,0.5726780638465656,0.6263716719579958,0.5694595420415887,0.5959571243848296,0.5275360779553968
+0,-1,0.6831884913062921,0.6159162222541099,0.6726005233636806,0.6115392058863335,0.6720401096771059,0.6105173097665644,0.6005167896896939,0.5212600492097655
+1,-1,0.6706171111332979,0.6008531510212776,0.6565912032452935,0.5949169636344843,0.6555142909342582,0.5935398433843475,0.5765151466955727,0.49637768476198035
+2,-1,0.6763825624896551,0.6087882606796842,0.6627392144068636,0.6053590389366899,0.6612759395162868,0.6030838801547247,0.5826990236692152,0.5088888493638298
+3,-1,0.66260616452593,0.5913823777186296,0.6469213245153994,0.5891702556310773,0.6449471942861446,0.5872578064093931,0.5818409585899842,0.5052892808258618
+4,-1,0.6566925461921814,0.5871384798501856,0.6379456634562074,0.5819500400390282,0.6356299181697714,0.5793092883148608,0.5725533633222645,0.5005210619710372
+5,-1,0.6560126958746472,0.584645192515697,0.6375859060277993,0.5799601798248812,0.6358427415811263,0.578232849404072,0.5777523875165609,0.5017760148916008
+6,-1,0.6503433461367746,0.578081436343585,0.6326739453456565,0.5758382504320848,0.6308846572628577,0.5745397200941126,0.571361965152683,0.49444579046714365
+7,-1,0.6511867735121081,0.5769374865250576,0.6323147897935092,0.5744373103224324,0.6309669803317294,0.573106665075477,0.57342064744336,0.4975609366385161
+8,-1,0.6506119610377241,0.5781030546060674,0.6326539782626099,0.5757848865607669,0.6310415147465013,0.5743098307522757,0.5723862516745356,0.49789660206491654
+9,-1,0.6488271901388144,0.5782767677139244,0.6287620409812228,0.5742694918130841,0.6272343282453402,0.5729337473833224,0.5685335534384852,0.4968351056062509
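The commit message ("best trained one") lines up with this file: the new epoch-0 row holds the highest cosine_pearson, 0.683188, the exact value now reported in the README table. A small check, assuming pandas is available:

```python
import pandas as pd

# Pick the best checkpoint by STS test-set cosine Pearson correlation.
df = pd.read_csv('eval/similarity_evaluation_sts-test_results.csv')
best = df.loc[df['cosine_pearson'].idxmax()]
print(int(best['epoch']), round(best['cosine_pearson'], 6))  # 0 0.683188
```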
loss_digest.json CHANGED
The diff for this file is too large to render. See the raw diff.
modules.json CHANGED
@@ -10,11 +10,5 @@
     "name": "1",
     "path": "1_Pooling",
     "type": "sentence_transformers.models.Pooling"
-  },
-  {
-    "idx": 2,
-    "name": "2",
-    "path": "2_Dense",
-    "type": "sentence_transformers.models.Dense"
   }
 ]
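modules.json is the manifest sentence-transformers uses to chain the pipeline blocks, which is why dropping the 2_Dense entry here (together with the 2_Dense/ folder above) is all it takes to remove the projection head. A simplified sketch of the loading logic, not the library's exact code; `model_path` is an assumed local checkout:

```python
import importlib
import json
import os

# Simplified sketch: instantiate each module listed in modules.json and
# chain them in order, roughly what SentenceTransformer does at load time.
model_path = '.'  # root of the downloaded model repository (assumption)
with open(os.path.join(model_path, 'modules.json')) as f:
    module_specs = json.load(f)

modules = []
for spec in module_specs:
    module_name, class_name = spec['type'].rsplit('.', 1)
    module_class = getattr(importlib.import_module(module_name), class_name)
    modules.append(module_class.load(os.path.join(model_path, spec['path'])))
# After this commit: [Transformer, Pooling] -- no Dense stage.
```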
pytorch_model.bin CHANGED
@@ -1,3 +1,3 @@
 version https://git-lfs.github.com/spec/v1
-oid sha256:
+oid sha256:eaff1c454271166e40db8096964f269f9b5de9fad5e056c455e5de9be3404ba9
 size 498664817
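The pointer swap above is plain Git LFS metadata: the byte size is unchanged while the content hash is new, consistent with retrained weights of the same shape. A small sketch for verifying a downloaded pytorch_model.bin against the new oid:

```python
import hashlib

# Stream-hash the weight file and compare with the LFS pointer's sha256 oid.
sha = hashlib.sha256()
with open('pytorch_model.bin', 'rb') as f:
    for chunk in iter(lambda: f.read(1 << 20), b''):
        sha.update(chunk)
print(sha.hexdigest() == 'eaff1c454271166e40db8096964f269f9b5de9fad5e056c455e5de9be3404ba9')
```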