---
license: apache-2.0
language:
- en
library_name: sentence-transformers
pipeline_tag: translation
---

# Dataset Collection
* The English-French Translation Dataset was collected from Kaggle: [Dataset](https://www.kaggle.com/datasets/dhruvildave/en-fr-translation-dataset).

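The Kaggle release is a large CSV file, and the description states it has over 22.5 million sentences. Streaming it row by row is usually safer than loading everything into memory at once. The Python sketch below assumes the downloaded file has a header row with columns named `en` and `fr`, so check the actual file first. The `iter_sentence_pairs` helper and both column names are assumptions for illustration and are not part of the dataset or of this model.

```python
import csv
from typing import Iterable, Iterator, Tuple


def iter_sentence_pairs(
    lines: Iterable[str],
    en_col: str = "en",  # assumed name of the English column
    fr_col: str = "fr",  # assumed name of the French column
    skip_empty: bool = True,
) -> Iterator[Tuple[str, str]]:
    """Yield (english, french) pairs one CSV row at a time.

    `lines` can be an open file handle or any iterable of CSV lines, so
    memory use stays flat no matter how many rows the file holds.
    """
    reader = csv.DictReader(lines)
    for row in reader:
        # Short rows produce None values; normalise them to empty strings.
        en = (row.get(en_col) or "").strip()
        fr = (row.get(fr_col) or "").strip()
        # Web-crawled parallel text is noisy: optionally drop half-empty rows.
        if skip_empty and (not en or not fr):
            continue
        yield en, fr
```

As an example, you can open the (hypothetically named) `en-fr.csv` with `open("en-fr.csv", newline="", encoding="utf-8")` and pass the file handle straight to `iter_sentence_pairs`. Using `newline=""` lets the `csv` module handle quoted fields that contain line breaks.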
About the dataset:

French/English parallel texts for training translation models, comprising over 22.5 million sentences in French and English. The dataset was created by Chris Callison-Burch, who crawled millions of web pages, used a set of simple heuristics to map French URLs to English URLs, and assumed that the documents at each matched pair of URLs are translations of each other.

This is the main dataset of the 2015 Workshop on Statistical Machine Translation (WMT15), and it can be used to train machine translation systems and language models. Refer to the paper here: [PDF](https://www.statmt.org/wmt15/pdf/WMT01.pdf)
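The URL-matching idea behind the corpus can be illustrated with a small Python sketch. The two rewrite rules used here are assumptions chosen for illustration: a `fr.` subdomain becomes `en.`, and a `/fr/` path segment becomes `/en/`. They are not the actual heuristics used to build the dataset. The helpers `french_to_english_url` and `pair_documents` are also hypothetical. A French page is paired with an English page only when the rewritten URL was actually crawled, and the pair is then *assumed* to be a translation.

```python
from typing import Iterable, List, Optional, Tuple
from urllib.parse import urlsplit, urlunsplit


def french_to_english_url(url: str) -> Optional[str]:
    """Guess the English counterpart of a French URL with toy rewrite rules.

    Returns None when no rule applies. The rules are illustrative only.
    """
    parts = urlsplit(url)
    host = parts.netloc
    # Assumed rule 1: language subdomain, fr.example.org -> en.example.org
    if host.startswith("fr."):
        return urlunsplit(parts._replace(netloc="en." + host[3:]))
    # Assumed rule 2: language path segment, /fr/page -> /en/page
    segments = parts.path.split("/")
    if "fr" in segments:
        segments[segments.index("fr")] = "en"
        return urlunsplit(parts._replace(path="/".join(segments)))
    return None


def pair_documents(crawled: Iterable[str]) -> List[Tuple[str, str]]:
    """Pair French URLs with crawled English URLs produced by the rules."""
    seen = set(crawled)
    pairs = []
    for url in sorted(seen):
        en_url = french_to_english_url(url)
        # Keep the pair only if the guessed English page was also crawled.
        if en_url is not None and en_url in seen:
            pairs.append((url, en_url))
    return pairs
```

Because matching relies only on URLs, a matched pair may not actually be a translation. This is one reason crawled parallel corpora tend to be noisy.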