Update README.md
README.md
CHANGED
@@ -3,13 +3,14 @@ license: apache-2.0
 language:
 - en
 library_name: sentence-transformers
-pipeline_tag:
+pipeline_tag: translation
 widget:
 - text: How are you
 ---
 
 # Dataset Collection:
 * The English-French Translation Dataset is collected from Kaggle.[Dataset](https://www.kaggle.com/datasets/dhruvildave/en-fr-translation-dataset).
+
 About Dataset:
 French/English parallel texts for training translation models.
 Over 22.5 million sentences in French and English.Dataset created
@@ -17,4 +18,6 @@ by Chris Callison-Burch, who crawled millions of web pages and
 then used a set of simple heuristics to transform French URLs onto English URLs,
 and assumed that these documents are translations of each other.
 This is the main dataset of Workshop on Statistical Machine Translation (WML) 2015 Dataset
-that can be used for Machine Translation and Language Models.
+that can be used for Machine Translation and Language Models.
+
+Refer to the paper here:[PDF](https://www.statmt.org/wmt15/pdf/WMT01.pdf)