MKaan/multilingual-cpv-sector-classifier

Text Classification

public procurement


MKaan committed on Nov 28, 2021

Commit 68a94aa · 1 Parent(s): 31e3bfe

Update README.md

Files changed (1)

README.md +7 -8


@@ -67,14 +67,14 @@ The model takes procurement descriptions written in any of [104 languages](https
  | Transport services (excl. Waste transport). 💺
 ## Intended uses & limitations
-Input description should be written in any of [the 104 languages](https://github.com/google-research/bert/blob/master/multilingual.md#list-of-languages) that MBERT supports.
-The model is just evaluated in 22 languages. Thus there is no information about the performances in other languages.
-The domain is also restricted by the awarded procurement notice descriptions in European Union. Evaluating on whole document texts might change the performance.
+- Input description should be written in any of [the 104 languages](https://github.com/google-research/bert/blob/master/multilingual.md#list-of-languages) that MBERT supports.
+- The model is just evaluated in 22 languages. Thus there is no information about the performances in other languages.
+- The domain is also restricted by the awarded procurement notice descriptions in European Union. Evaluating on whole document texts might change the performance.
 ## Training and evaluation data
-The whole data consists of 744,360 rows. Shuffled and split into train and validation sets by using 80%/20% manner.
-Each description represents a unique contract notice description awarded between 2011 and 2018.
-Both training and validation data have contract notice descriptions written in 22 European Languages. (Malta and Irish are extracted due to scarcity compared to whole data)
+- The whole data consists of 744,360 rows. Shuffled and split into train and validation sets by using 80%/20% manner.
+- Each description represents a unique contract notice description awarded between 2011 and 2018.
+- Both training and validation data have contract notice descriptions written in 22 European Languages. (Malta and Irish are extracted due to scarcity compared to whole data)
 ## Training procedure
 The training procedure has been completed on Google Cloud V3-8 TPUs. Thanks [Google](https://sites.research.google/trc/about/) for giving the access to Cloud TPUs
@@ -117,5 +117,4 @@ The following hyperparameters were used during training:
 | SV| 0.607| 3326|
 | DA| 0.603| 1925|
 | FR| 0.601| 33113|
-| ET| 0.572| 458||
+| ET| 0.572| 458||
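
The README's "Training and evaluation data" section says the 744,360 rows were shuffled and split 80%/20% into train and validation sets, but the commit does not show how. A minimal sketch of such a split, assuming a plain in-memory list of descriptions, is given below. The function name `shuffle_and_split`, the `seed` parameter, and the floor-based cut point are illustrative assumptions, not the author's actual code.

```python
import random


def shuffle_and_split(rows, train_percent=80, seed=0):
    """Shuffle rows and split them into (train, validation) lists.

    Hypothetical helper: the seed and the integer-floor cut point are
    assumptions; the README only states an 80%/20% shuffled split.
    """
    shuffled = list(rows)
    # A dedicated Random instance keeps the shuffle reproducible.
    random.Random(seed).shuffle(shuffled)
    # Integer arithmetic avoids floating-point rounding at the cut point.
    cut = len(shuffled) * train_percent // 100
    return shuffled[:cut], shuffled[cut:]


# Toy stand-in for the procurement descriptions.
descriptions = [f"notice {i}" for i in range(10)]
train, validation = shuffle_and_split(descriptions)
print(len(train), len(validation))
```

Under these assumptions, 744,360 rows would give 595,488 training rows and 148,872 validation rows. The exact counts in the original run may differ if another rounding rule was used.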