Update README.md
README.md
@@ -20,7 +20,7 @@ Developed to be language-agnostic, this model supports both French and English.
 influenced by its behavior in a monolingual context (English or French).
 
 ## Dataset
-The training dataset is composed of the [mMARCO
+The training dataset is composed of the [mMARCO](https://huggingface.co/datasets/unicamp-dl/mmarco) dataset, consisting of query/positive/hard negative triplets.
 Additionally, we have included [SQuAD](https://huggingface.co/datasets/rajpurkar/squad) data from the "train" split, forming query/positive/hard negative triplets. In
 order to generate hard negative data for SQuAD, we considered contexts from the same theme as the query but from a different set of queries. Hence, the negative
 observations belong to the same themes as the queries but presumably do not contain the answer to the question.
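
The README text in this diff describes the SQuAD hard-negative procedure only in prose. A minimal sketch of one way to build such triplets is shown below. It is not the authors' code: it assumes that "theme" corresponds to the SQuAD `title` field and that a hard negative is any other paragraph (`context`) under the same title. The function name `build_triplets` and the sample rows are hypothetical.

```python
import random
from collections import defaultdict


def build_triplets(rows, seed=0):
    """Build query/positive/hard-negative triplets from SQuAD-style rows.

    Assumption: each row is a dict with "title", "context" and "question"
    keys, and the "theme" of a question is its "title".
    """
    rng = random.Random(seed)

    # Collect the distinct paragraphs available under each theme.
    contexts_by_title = defaultdict(set)
    for row in rows:
        contexts_by_title[row["title"]].add(row["context"])

    triplets = []
    for row in rows:
        # Candidate negatives: same theme, but not the positive paragraph.
        candidates = sorted(contexts_by_title[row["title"]] - {row["context"]})
        if not candidates:
            continue  # the theme has a single paragraph, so no negative exists
        triplets.append(
            {
                "query": row["question"],
                "positive": row["context"],
                "negative": rng.choice(candidates),
            }
        )
    return triplets


# Hypothetical sample rows, for illustration only.
rows = [
    {
        "title": "Paris",
        "context": "Paris is the capital of France.",
        "question": "What is the capital of France?",
    },
    {
        "title": "Paris",
        "context": "The Eiffel Tower was completed in 1889.",
        "question": "When was the Eiffel Tower completed?",
    },
    {
        "title": "Rust",
        "context": "Rust is a systems programming language.",
        "question": "What kind of language is Rust?",
    },
]

triplets = build_triplets(rows)
for triplet in triplets:
    print(triplet["query"], "->", triplet["negative"])
```

As the README notes, such negatives only presumably lack the answer: two paragraphs under the same title can overlap, so a stricter pipeline might also filter out candidates that contain the answer string.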