Update README.md
README.md
CHANGED
@@ -67,14 +67,14 @@ The model takes procurement descriptions written in any of [104 languages](https
 | Transport services (excl. Waste transport). 💺
 
 ## Intended uses & limitations
-Input description should be written in any of [the 104 languages](https://github.com/google-research/bert/blob/master/multilingual.md#list-of-languages) that MBERT supports.
-The model is just evaluated in 22 languages. Thus there is no information about the performances in other languages.
-The domain is also restricted by the awarded procurement notice descriptions in European Union. Evaluating on whole document texts might change the performance.
+- Input description should be written in any of [the 104 languages](https://github.com/google-research/bert/blob/master/multilingual.md#list-of-languages) that MBERT supports.
+- The model is just evaluated in 22 languages. Thus there is no information about the performances in other languages.
+- The domain is also restricted by the awarded procurement notice descriptions in European Union. Evaluating on whole document texts might change the performance.
 
 ## Training and evaluation data
-The whole data consists of 744,360 rows. Shuffled and split into train and validation sets by using 80%/20% manner.
-Each description represents a unique contract notice description awarded between 2011 and 2018.
-Both training and validation data have contract notice descriptions written in 22 European Languages. (Malta and Irish are extracted due to scarcity compared to whole data)
+- The whole data consists of 744,360 rows. Shuffled and split into train and validation sets by using 80%/20% manner.
+- Each description represents a unique contract notice description awarded between 2011 and 2018.
+- Both training and validation data have contract notice descriptions written in 22 European Languages. (Malta and Irish are extracted due to scarcity compared to whole data)
 
 ## Training procedure
 The training procedure has been completed on Google Cloud V3-8 TPUs. Thanks [Google](https://sites.research.google/trc/about/) for giving the access to Cloud TPUs
@@ -117,5 +117,4 @@ The following hyperparameters were used during training:
 | SV| 0.607| 3326|
 | DA| 0.603| 1925|
 | FR| 0.601| 33113|
-| ET| 0.572| 458||
-
+| ET| 0.572| 458||
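
The README's "Training and evaluation data" section describes shuffling 744,360 rows and splitting them 80%/20% into train and validation sets. The commit does not show how that was done, so the following is only a minimal sketch of one way to perform such a split. The function name `shuffle_split`, the `train_percent` parameter, and the fixed seed are assumptions made for illustration and do not come from the repository.

```python
import random


def shuffle_split(rows, train_percent=80, seed=0):
    """Shuffle rows and split them into (train, validation) lists.

    Hypothetical helper: the name, the seed, and the integer-percent
    parameter are illustrative choices, not the authors' code.
    """
    shuffled = list(rows)
    # A seeded generator keeps the split reproducible between runs.
    random.Random(seed).shuffle(shuffled)
    # Integer arithmetic avoids floating-point rounding at the cut point.
    cut = len(shuffled) * train_percent // 100
    return shuffled[:cut], shuffled[cut:]


# Row indices stand in for the 744,360 descriptions named in the README.
train, validation = shuffle_split(range(744_360))
print(len(train), len(validation))
```

With 744,360 rows, an 80%/20% split gives 595,488 training rows and 148,872 validation rows.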