---
license: apache-2.0
language:
- en
library_name: sentence-transformers
pipeline_tag: sentence-similarity
widget:
- text: How are you
---

# Dataset Collection:
* The English-French translation dataset was collected from Kaggle: [Dataset](https://www.kaggle.com/datasets/dhruvildave/en-fr-translation-dataset).

About Dataset:
French/English parallel texts for training translation models, with
over 22.5 million sentences in French and English. The dataset was created
by Chris Callison-Burch, who crawled millions of web pages,
used a set of simple heuristics to transform French URLs into English URLs,
and assumed that the resulting document pairs are translations of each other.
This is the main dataset of the 2015 Workshop on Statistical Machine Translation (WMT 2015)
and can be used for machine translation and language modeling.
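
With over 22.5 million sentence pairs, the data is easier to handle as a stream than as one in-memory table. The following is a minimal sketch of reading the pairs row by row. It assumes the Kaggle download is a single CSV file with a header row and two columns named `en` and `fr`; the column names, the file name `en-fr.csv`, and the helper names `iter_pairs` and `head` are assumptions for illustration, not something this card specifies.

```python
import csv
from itertools import islice
from typing import Iterator, List, Tuple


def iter_pairs(
    csv_path: str, en_col: str = "en", fr_col: str = "fr"
) -> Iterator[Tuple[str, str]]:
    """Yield (english, french) pairs one row at a time.

    Rows where either side is missing or empty are skipped.
    The column names are assumptions; adjust them to match the file.
    """
    with open(csv_path, newline="", encoding="utf-8") as f:
        for row in csv.DictReader(f):
            en = (row.get(en_col) or "").strip()
            fr = (row.get(fr_col) or "").strip()
            if en and fr:
                yield en, fr


def head(csv_path: str, n: int = 5) -> List[Tuple[str, str]]:
    """Return the first n complete pairs without reading the whole file."""
    return list(islice(iter_pairs(csv_path), n))
```

For example, `head("en-fr.csv", 3)` would return the first three complete pairs, which is a quick way to check the column layout before training on the full file.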

Refer to the paper here: [PDF](https://www.statmt.org/wmt15/pdf/WMT01.pdf)