Update README.md
README.md
CHANGED
@@ -30,17 +30,17 @@ For training, we used all Turkish data that was present in the monolingual Turki
 
 # Benchmark performance
 
-We tested the performance of **XLMR-MaCoCu-tr** on benchmarks of XPOS, UPOS and NER from the [Universal Dependencies](https://universaldependencies.org/) project.
+We tested the performance of **XLMR-MaCoCu-tr** on benchmarks of XPOS, UPOS and NER from the [Universal Dependencies](https://universaldependencies.org/) project. For COPA, we train on a machine-translated (MT) version of the data (for details see our [Github repo](https://github.com/RikVN/COPA)) and evaluate both on a similar MT set and on the human-translated (HT) test set from the [XCOPA](https://github.com/cambridgeltl/xcopa) project. We compare performance to the strong multilingual models XLM-R-base and XLM-R-large, as well as to the monolingual [BERTurk](https://huggingface.co/dbmdz/bert-base-turkish-cased) model. For details regarding the fine-tuning procedure, see our [Github](https://github.com/macocu/LanguageModels).
 
 Scores are averages of three runs, except for COPA, for which we use 10 runs. We use the same hyperparameter settings for all models for POS/NER; for COPA we optimized each learning rate on the dev set.
 
-|                    | **UPOS** | **UPOS** | **XPOS** | **XPOS** | **NER** | **NER**  | **COPA** |
-|--------------------|:--------:|:--------:|:--------:|:--------:|---------|----------|----------|
-|                    | **Dev**  | **Test** | **Dev**  | **Test** | **Dev** | **Test** | **Test** |
-| **XLM-R-base**     | 89.0     | 89.0     | 90.4     | 90.6     | 92.8    | 92.6     | 56.0     |
-| **XLM-R-large**    | 89.4     | 89.3     | 90.8     | 90.7     | 94.1    | 94.1     | 52.1     |
-| **BERTurk**        | 88.2     | 88.4     | 89.7     | 89.6     | 92.6    | 92.6     | 57.0     |
-| **XLMR-MaCoCu-tr** | 89.1     | 89.4     | 90.7     | 90.5     | 94.4    | 94.4     | 60.7     |
+|                    | **UPOS** | **UPOS** | **XPOS** | **XPOS** | **NER** | **NER**  | **COPA**      | **COPA**      |
+|--------------------|:--------:|:--------:|:--------:|:--------:|---------|----------|---------------|---------------|
+|                    | **Dev**  | **Test** | **Dev**  | **Test** | **Dev** | **Test** | **Test (MT)** | **Test (HT)** |
+| **XLM-R-base**     | 89.0     | 89.0     | 90.4     | 90.6     | 92.8    | 92.6     | 56.0          | 56.4          |
+| **XLM-R-large**    | 89.4     | 89.3     | 90.8     | 90.7     | 94.1    | 94.1     | 52.1          | 53.2          |
+| **BERTurk**        | 88.2     | 88.4     | 89.7     | 89.6     | 92.6    | 92.6     | 57.0          |               |
+| **XLMR-MaCoCu-tr** | 89.1     | 89.4     | 90.7     | 90.5     | 94.4    | 94.4     | 60.7          |               |
 
 # Acknowledgements
 
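For readers who want to reproduce or extend the POS/NER numbers above, the sketch below shows one way to load **XLMR-MaCoCu-tr** for token-classification fine-tuning with the Hugging Face `transformers` library. This is a minimal illustration, not the authors' training script: the Hub identifier and the label count are assumptions (check the model card for the exact name), and the actual fine-tuning recipe lives in the [Github repo](https://github.com/macocu/LanguageModels).

```python
# Minimal sketch: loading XLMR-MaCoCu-tr for token-classification fine-tuning
# (UPOS / XPOS / NER). Not the authors' exact training setup.
from transformers import AutoModelForTokenClassification, AutoTokenizer

# Assumed Hub identifier -- verify against the model card.
MODEL_NAME = "MaCoCu/XLMR-MaCoCu-tr"

tokenizer = AutoTokenizer.from_pretrained(MODEL_NAME)
model = AutoModelForTokenClassification.from_pretrained(
    MODEL_NAME,
    num_labels=17,  # e.g. the 17 UPOS tags; adjust for the XPOS or NER label set
)

# UD-style tasks provide pre-split tokens; is_split_into_words keeps word boundaries.
words = ["Bu", "bir", "örnek", "cümledir", "."]
encoding = tokenizer(words, is_split_into_words=True, return_tensors="pt")
outputs = model(**encoding)
print(outputs.logits.shape)  # (1, num_subword_tokens, num_labels)
```

The scores in the table would then correspond to running such a fine-tuning job three times (ten times for COPA) and averaging the dev/test metrics, as described above.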