Updated model to the best trained one.
Files changed:
- 2_Dense/config.json +0 -1
- 2_Dense/pytorch_model.bin +0 -3
- README.md +51 -13
- eval/similarity_evaluation_sts-test_results.csv +10 -20
- loss_digest.json +0 -0
- modules.json +0 -6
- pytorch_model.bin +1 -1
2_Dense/config.json DELETED
@@ -1 +0,0 @@
-{"in_features": 768, "out_features": 512, "bias": true, "activation_function": "torch.nn.modules.activation.Tanh"}
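The module deleted here is the sentence-transformers Dense projection head. For context, a minimal sketch of what it computed, assuming the standard `sentence_transformers.models.Dense` semantics implied by this config (the input tensor is a stand-in for a pooled sentence embedding):

```python
import torch

# Sketch of the deleted 2_Dense head: a learned affine map from the 768-dim
# pooled sentence embedding down to 512 dims, followed by Tanh.
dense = torch.nn.Linear(in_features=768, out_features=512, bias=True)
pooled = torch.randn(1, 768)           # placeholder pooled sentence embedding
projected = torch.tanh(dense(pooled))  # shape: (1, 512)
```

With this head removed, embeddings leave the model straight after mean pooling, at 768 dimensions, which matches the updated README below.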
2_Dense/pytorch_model.bin DELETED
@@ -1,3 +0,0 @@
-version https://git-lfs.github.com/spec/v1
-oid sha256:318abfeb7ac3562dae47bd5126150009554f49c4704ea18ecc8903dfd970d857
-size 1575975
README.md CHANGED
@@ -17,7 +17,8 @@ widget:
 
 # bertin-roberta-base-finetuning-esnli
 
-This is a [sentence-transformers](https://www.SBERT.net) model trained on a
+This is a [sentence-transformers](https://www.SBERT.net) model trained on a
+collection of NLI tasks for Spanish. It maps sentences & paragraphs to a 768 dimensional dense vector space and can be used for tasks like clustering or semantic search.
 
 Based around the siamese networks approach from [this paper](https://arxiv.org/pdf/1908.10084.pdf).
 <!--- Describe your model here -->
@@ -41,6 +42,43 @@ embeddings = model.encode(sentences)
 print(embeddings)
 ```
 
+## Usage (HuggingFace Transformers)
+Without [sentence-transformers](https://www.SBERT.net), you can use the model like this: first pass your input through the transformer model, then apply the right pooling operation on top of the contextualized word embeddings.
+
+```python
+from transformers import AutoTokenizer, AutoModel
+import torch
+
+
+# Mean pooling - take the attention mask into account for correct averaging
+def mean_pooling(model_output, attention_mask):
+    token_embeddings = model_output[0]  # first element of model_output contains all token embeddings
+    input_mask_expanded = attention_mask.unsqueeze(-1).expand(token_embeddings.size()).float()
+    return torch.sum(token_embeddings * input_mask_expanded, 1) / torch.clamp(input_mask_expanded.sum(1), min=1e-9)
+
+
+# Sentences we want sentence embeddings for
+sentences = ['This is an example sentence', 'Each sentence is converted']
+
+# Load model from HuggingFace Hub
+tokenizer = AutoTokenizer.from_pretrained('{MODEL_NAME}')
+model = AutoModel.from_pretrained('{MODEL_NAME}')
+
+# Tokenize sentences
+encoded_input = tokenizer(sentences, padding=True, truncation=True, return_tensors='pt')
+
+# Compute token embeddings
+with torch.no_grad():
+    model_output = model(**encoded_input)
+
+# Perform pooling. In this case, mean pooling.
+sentence_embeddings = mean_pooling(model_output, encoded_input['attention_mask'])
+
+print("Sentence embeddings:")
+print(sentence_embeddings)
+```
+
+
 ## Evaluation Results
 
 <!--- Describe how your model was evaluated -->
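The two usage paths added above should agree once the model ends at mean pooling. A hypothetical end-to-end check, not part of the commit; `{MODEL_NAME}` is the same placeholder the README uses:

```python
import torch
from transformers import AutoTokenizer, AutoModel
from sentence_transformers import SentenceTransformer

def mean_pooling(model_output, attention_mask):
    token_embeddings = model_output[0]
    mask = attention_mask.unsqueeze(-1).expand(token_embeddings.size()).float()
    return torch.sum(token_embeddings * mask, 1) / torch.clamp(mask.sum(1), min=1e-9)

sentences = ['This is an example sentence']

# Path 1: plain transformers + manual mean pooling
tokenizer = AutoTokenizer.from_pretrained('{MODEL_NAME}')
hf_model = AutoModel.from_pretrained('{MODEL_NAME}')
encoded = tokenizer(sentences, padding=True, truncation=True, return_tensors='pt')
with torch.no_grad():
    manual = mean_pooling(hf_model(**encoded), encoded['attention_mask'])

# Path 2: sentence-transformers' own encode()
st_model = SentenceTransformer('{MODEL_NAME}')
auto = torch.tensor(st_model.encode(sentences))

print(torch.allclose(manual, auto, atol=1e-5))  # expected: True
```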
@@ -48,14 +86,14 @@ Our model was evaluated on the task of Semantic Textual Similarity using the [Se
 
 | | [BETO STS](https://huggingface.co/espejelomar/sentece-embeddings-BETO) | BERTIN STS (this model) | Relative improvement |
 |-------------------:|---------:|-----------:|---------------------:|
-| cosine_pearson | 0.609803 | 0.
-| cosine_spearman | 0.528776 | 0.
-| euclidean_pearson | 0.590613 | 0.
-| euclidean_spearman | 0.526529 | 0.
-| manhattan_pearson | 0.589108 | 0.
-| manhattan_spearman | 0.525910 | 0.
-| dot_pearson | 0.544078 | 0.
-| dot_spearman | 0.460427 | 0.
+| cosine_pearson | 0.609803 | 0.683188 | +12.03 |
+| cosine_spearman | 0.528776 | 0.615916 | +16.48 |
+| euclidean_pearson | 0.590613 | 0.672601 | +13.88 |
+| euclidean_spearman | 0.526529 | 0.611539 | +16.15 |
+| manhattan_pearson | 0.589108 | 0.672040 | +14.08 |
+| manhattan_spearman | 0.525910 | 0.610517 | +16.09 |
+| dot_pearson | 0.544078 | 0.600517 | +10.37 |
+| dot_spearman | 0.460427 | 0.521260 | +13.21 |
 
 
 ## Training
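The new "Relative improvement" column is consistent with a plain percentage gain over the BETO baseline; a worked check of the first row, using only values from the table:

```python
# Relative improvement = (BERTIN - BETO) / BETO, expressed in percent.
beto, bertin = 0.609803, 0.683188  # cosine_pearson row
print(f"+{(bertin - beto) / beto * 100:.2f}")  # +12.03
```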
@@ -72,7 +110,8 @@ The whole dataset used is available [here](https://huggingface.co/datasets/hacka
 
 **DataLoader**:
 
-`sentence_transformers.datasets.NoDuplicatesDataLoader.NoDuplicatesDataLoader`
+`sentence_transformers.datasets.NoDuplicatesDataLoader.NoDuplicatesDataLoader`
+of length 1818 with parameters:
 ```
 {'batch_size': 64}
 ```
@@ -87,7 +126,7 @@ The whole dataset used is available [here](https://huggingface.co/datasets/hacka
 Parameters of the fit()-Method:
 ```
 {
-    "epochs":
+    "epochs": 10,
     "evaluation_steps": 0,
     "evaluator": "sentence_transformers.evaluation.EmbeddingSimilarityEvaluator.EmbeddingSimilarityEvaluator",
     "max_grad_norm": 1,
@@ -97,7 +136,7 @@ Parameters of the fit()-Method:
     },
     "scheduler": "WarmupLinear",
     "steps_per_epoch": null,
-    "warmup_steps":
+    "warmup_steps": 909,
     "weight_decay": 0.01
 }
 ```
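Taken together, the filled-in values are internally consistent: 1818 batches per epoch over 10 epochs gives 18,180 total steps, and 909 is exactly 5% of that, suggesting a 5% warmup ratio. A hypothetical reconstruction of the training call; the tiny `train_samples` list, the loss choice, and `{MODEL_NAME}` are placeholders, not taken from the commit:

```python
from sentence_transformers import SentenceTransformer, InputExample, losses
from sentence_transformers.datasets import NoDuplicatesDataLoader

# Tiny placeholder dataset; the real run used the hackathon NLI collection
# (1818 batches per epoch at batch_size=64).
train_samples = [
    InputExample(texts=['Un hombre camina por la playa', 'Una persona pasea junto al mar']),
    InputExample(texts=['El gato duerme en el sofá', 'Un felino descansa en el sillón']),
]

model = SentenceTransformer('{MODEL_NAME}')
train_dataloader = NoDuplicatesDataLoader(train_samples, batch_size=64)
train_loss = losses.MultipleNegativesRankingLoss(model)  # assumed; the usual pairing with NoDuplicatesDataLoader

model.fit(
    train_objectives=[(train_dataloader, train_loss)],
    epochs=10,
    scheduler='WarmupLinear',
    warmup_steps=909,   # 5% of 1818 batches x 10 epochs = 18,180 steps
    weight_decay=0.01,
    max_grad_norm=1,
    evaluation_steps=0,
)
```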
@@ -108,7 +147,6 @@ Parameters of the fit()-Method:
 SentenceTransformer(
   (0): Transformer({'max_seq_length': 514, 'do_lower_case': False}) with Transformer model: RobertaModel
   (1): Pooling({'word_embedding_dimension': 768, 'pooling_mode_cls_token': False, 'pooling_mode_mean_tokens': True, 'pooling_mode_max_tokens': False, 'pooling_mode_mean_sqrt_len_tokens': False})
-  (2): Dense({'in_features': 768, 'out_features': 512, 'bias': True, 'activation_function': 'torch.nn.modules.activation.Tanh'})
 )
 ```
 
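With the Dense block dropped from the architecture, the pooling output is the final embedding. A quick check, sketched with the README's `{MODEL_NAME}` placeholder:

```python
from sentence_transformers import SentenceTransformer

# After this commit the pipeline ends at mean pooling, so the sentence
# embedding dimension is the transformer's hidden size (768); it was 512
# while the Tanh Dense head mapped 768 -> 512.
model = SentenceTransformer('{MODEL_NAME}')
print(model.get_sentence_embedding_dimension())  # 768
```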
eval/similarity_evaluation_sts-test_results.csv CHANGED
@@ -1,21 +1,11 @@
 epoch,steps,cosine_pearson,cosine_spearman,euclidean_pearson,euclidean_spearman,manhattan_pearson,manhattan_spearman,dot_pearson,dot_spearman
-0,-1,0.
-1,-1,0.
-2,-1,0.
-3,-1,0.
-4,-1,0.
-5,-1,0.
-6,-1,0.
-7,-1,0.
-8,-1,0.
-9,-1,0.
-10,-1,0.6434909937737626,0.5713701447625351,0.6301188859529719,0.5709692410446885,0.6309719329436375,0.5701395230401529,0.593913567963774,0.522557073939444
-11,-1,0.641203878405462,0.5722014251907718,0.6284168875038928,0.5737909498411451,0.6295168797303964,0.5728132601653629,0.5893572348665002,0.5218607585112776
-12,-1,0.6405665479784053,0.5712144563426479,0.6258392075727873,0.5693129298830195,0.6262440363392721,0.5679223727890534,0.593952756495054,0.5268886237775188
-13,-1,0.6390052346365416,0.5686395678794071,0.6258537618625887,0.5685859625426081,0.6265438374367317,0.5677389726542497,0.591956305872708,0.5218520657539587
-14,-1,0.6401240726804178,0.5711650411421381,0.6278602450688386,0.5727693520022645,0.628050553113738,0.5709335183573409,0.5937276661244524,0.5234451981826964
-15,-1,0.6398403358896347,0.5692425497972115,0.6246306232307527,0.5691193313826032,0.6255512511477327,0.5683736149577787,0.5940274286246308,0.5237160798409092
-16,-1,0.640328214937794,0.5708227567207858,0.6255762617684392,0.5716483840159948,0.6265171469104598,0.569976860529018,0.5945084491171609,0.5247411860311914
-17,-1,0.6404406282410006,0.5712850352823815,0.6251102831494417,0.5715652062596898,0.6257154822798084,0.5695532590559501,0.5939059512747178,0.525733381896788
-18,-1,0.64141106615211,0.572578991980065,0.6261621835757434,0.5725016418579003,0.6268679024101312,0.5700271563683411,0.5965506884620715,0.5278869557071051
-19,-1,0.6406208759751268,0.5720221725807018,0.625890984121176,0.5726780638465656,0.6263716719579958,0.5694595420415887,0.5959571243848296,0.5275360779553968
+0,-1,0.6831884913062921,0.6159162222541099,0.6726005233636806,0.6115392058863335,0.6720401096771059,0.6105173097665644,0.6005167896896939,0.5212600492097655
+1,-1,0.6706171111332979,0.6008531510212776,0.6565912032452935,0.5949169636344843,0.6555142909342582,0.5935398433843475,0.5765151466955727,0.49637768476198035
+2,-1,0.6763825624896551,0.6087882606796842,0.6627392144068636,0.6053590389366899,0.6612759395162868,0.6030838801547247,0.5826990236692152,0.5088888493638298
+3,-1,0.66260616452593,0.5913823777186296,0.6469213245153994,0.5891702556310773,0.6449471942861446,0.5872578064093931,0.5818409585899842,0.5052892808258618
+4,-1,0.6566925461921814,0.5871384798501856,0.6379456634562074,0.5819500400390282,0.6356299181697714,0.5793092883148608,0.5725533633222645,0.5005210619710372
+5,-1,0.6560126958746472,0.584645192515697,0.6375859060277993,0.5799601798248812,0.6358427415811263,0.578232849404072,0.5777523875165609,0.5017760148916008
+6,-1,0.6503433461367746,0.578081436343585,0.6326739453456565,0.5758382504320848,0.6308846572628577,0.5745397200941126,0.571361965152683,0.49444579046714365
+7,-1,0.6511867735121081,0.5769374865250576,0.6323147897935092,0.5744373103224324,0.6309669803317294,0.573106665075477,0.57342064744336,0.4975609366385161
+8,-1,0.6506119610377241,0.5781030546060674,0.6326539782626099,0.5757848865607669,0.6310415147465013,0.5743098307522757,0.5723862516745356,0.49789660206491654
+9,-1,0.6488271901388144,0.5782767677139244,0.6287620409812228,0.5742694918130841,0.6272343282453402,0.5729337473833224,0.5685335534384852,0.4968351056062509
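The commit message ("best trained one") lines up with this file: the new epoch-0 row holds the highest cosine_pearson, 0.683188, the exact value now reported in the README table. A small check, assuming pandas is available:

```python
import pandas as pd

# Pick the best checkpoint by STS test-set cosine Pearson correlation.
df = pd.read_csv('eval/similarity_evaluation_sts-test_results.csv')
best = df.loc[df['cosine_pearson'].idxmax()]
print(int(best['epoch']), round(best['cosine_pearson'], 6))  # 0 0.683188
```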
loss_digest.json CHANGED
The diff for this file is too large to render. See the raw diff.
modules.json CHANGED
@@ -10,11 +10,5 @@
     "name": "1",
     "path": "1_Pooling",
     "type": "sentence_transformers.models.Pooling"
-  },
-  {
-    "idx": 2,
-    "name": "2",
-    "path": "2_Dense",
-    "type": "sentence_transformers.models.Dense"
   }
 ]
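modules.json is the manifest sentence-transformers uses to chain the pipeline blocks, which is why dropping the 2_Dense entry here (together with the 2_Dense/ folder above) is all it takes to remove the projection head. A simplified sketch of the loading logic, not the library's exact code; `model_path` is an assumed local checkout:

```python
import importlib
import json
import os

# Simplified sketch: instantiate each module listed in modules.json and
# chain them in order, roughly what SentenceTransformer does at load time.
model_path = '.'  # root of the downloaded model repository (assumption)
with open(os.path.join(model_path, 'modules.json')) as f:
    module_specs = json.load(f)

modules = []
for spec in module_specs:
    module_name, class_name = spec['type'].rsplit('.', 1)
    module_class = getattr(importlib.import_module(module_name), class_name)
    modules.append(module_class.load(os.path.join(model_path, spec['path'])))
# After this commit: [Transformer, Pooling] -- no Dense stage.
```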
pytorch_model.bin CHANGED
@@ -1,3 +1,3 @@
 version https://git-lfs.github.com/spec/v1
-oid sha256:
+oid sha256:eaff1c454271166e40db8096964f269f9b5de9fad5e056c455e5de9be3404ba9
 size 498664817
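The pointer swap above is plain Git LFS metadata: the byte size is unchanged while the content hash is new, consistent with retrained weights of the same shape. A small sketch for verifying a downloaded pytorch_model.bin against the new oid:

```python
import hashlib

# Stream-hash the weight file and compare with the LFS pointer's sha256 oid.
sha = hashlib.sha256()
with open('pytorch_model.bin', 'rb') as f:
    for chunk in iter(lambda: f.read(1 << 20), b''):
        sha.update(chunk)
print(sha.hexdigest() == 'eaff1c454271166e40db8096964f269f9b5de9fad5e056c455e5de9be3404ba9')
```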