Update README.md
README.md
We used a collection of Natural Language Inference datasets as training data:
- [SNLI](https://nlp.stanford.edu/projects/snli/), automatically translated
- [MultiNLI](https://cims.nyu.edu/~sbowman/multinli/), automatically translated
The whole dataset used is available [here](https://huggingface.co/datasets/hackathon-pln-es/nli-es).
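As a quick check, the corpus can be pulled directly from the Hub with the `datasets` library. This is a minimal sketch; the `train` split name is an assumption, so check the dataset card for the actual splits:

```python
from datasets import load_dataset

# Load the Spanish NLI corpus from the Hugging Face Hub.
# "train" is an assumed split name; see the dataset card for the actual splits.
nli_es = load_dataset("hackathon-pln-es/nli-es", split="train")
print(nli_es[0])
```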
Here is the trick we used to increase the amount of training data:
```python
for row in reader:
    if row['language'] == 'es':
        sent1 = row['sentence1'].strip()
        sent2 = row['sentence2'].strip()

        add_to_samples(sent1, sent2, row['gold_label'])
        add_to_samples(sent2, sent1, row['gold_label'])  # Also add the opposite
```
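For context, `reader` and `add_to_samples` are not defined in the snippet above. Below is a minimal sketch of how they could be set up, assuming a tab-separated dump of the corpus with `sentence1`, `sentence2`, `gold_label` and `language` columns; the file name and the pair-grouping structure are assumptions that mirror the usual sentence-transformers NLI training recipe:

```python
import csv

# Assumed grouping structure: pairs indexed by the first sentence and gold label,
# which is what add_to_samples() in the loop above relies on.
train_data = {}

def add_to_samples(sent1, sent2, label):
    if sent1 not in train_data:
        train_data[sent1] = {'contradiction': set(), 'entailment': set(), 'neutral': set()}
    train_data[sent1][label].add(sent2)

# Assumed reader: a tab-separated file with sentence1, sentence2, gold_label
# and language columns (hypothetical file name).
fIn = open('nli-es.tsv', encoding='utf8')
reader = csv.DictReader(fIn, delimiter='\t', quoting=csv.QUOTE_NONE)
```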
**DataLoader**: