Commit 2d9a5d7 · Update README.md
Parent(s): 27b518a

README.md CHANGED
@@ -1,4 +1,5 @@
 # AfroLM: A Self-Active Learning-based Multilingual Pretrained Language Model for 23 African Languages
+- [GitHub Repository of the Paper](https://github.com/bonaventuredossou/MLM_AL)
 
 This repository contains the model for our paper `AfroLM: A Self-Active Learning-based Multilingual Pretrained Language Model for 23 African Languages` which will appear at the Third Simple and Efficient Natural Language Processing, at EMNLP 2022.
 
@@ -9,7 +10,7 @@ This repository contains the model for our paper `AfroLM: A Self-Active Learning
 AfroLM has been pretrained from scratch on 23 African Languages: Amharic, Afan Oromo, Bambara, Ghomalá, Éwé, Fon, Hausa, Ìgbò, Kinyarwanda, Lingala, Luganda, Luo, Mooré, Chewa, Naija, Shona, Swahili, Setswana, Twi, Wolof, Xhosa, Yorùbá, and Zulu.
 
 ## Evaluation Results
-AfroLM was evaluated on MasakhaNER1.0 (10 African Languages) and MasakhaNER2.0 (21 African Languages) datasets; on text classification and sentiment analysis. AfroLM outperformed AfriBERTa, mBERT, and XLMR-base, and was very competitive with AfroXLMR. AfroLM is also very data efficient because it was pretrained on a dataset 14x+ smaller than its competitors' datasets. Below is the average
+AfroLM was evaluated on MasakhaNER1.0 (10 African Languages) and MasakhaNER2.0 (21 African Languages) datasets; on text classification and sentiment analysis. AfroLM outperformed AfriBERTa, mBERT, and XLMR-base, and was very competitive with AfroXLMR. AfroLM is also very data efficient because it was pretrained on a dataset 14x+ smaller than its competitors' datasets. Below is the average performance of various models, across various datasets. Please consult our paper for more language-level performance.
 
 Model | MasakhaNER | MasakhaNER2.0* | Text Classification (Yoruba/Hausa) | Sentiment Analysis (YOSM) | OOD Sentiment Analysis (Twitter -> YOSM) |
 |:---: |:---: |:---: | :---: |:---: | :---: |
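Since the README documented by this commit describes a pretrained masked language model, a quick way to try the checkpoint is through the Hugging Face `transformers` library. This is a minimal sketch, not the authors' official loading code: the model ID `bonadossou/afrolm_active_learning` is an assumption (it does not appear on this commit page), and `AutoModelForMaskedLM` is used on the assumption that AfroLM follows a standard masked-LM architecture.

```python
# Minimal sketch for loading AfroLM with Hugging Face transformers.
# MODEL_ID is an assumption, not stated in this commit page; substitute
# the actual repository ID that hosts the AfroLM checkpoint.
import torch
from transformers import AutoModelForMaskedLM, AutoTokenizer

MODEL_ID = "bonadossou/afrolm_active_learning"  # hypothetical ID

tokenizer = AutoTokenizer.from_pretrained(MODEL_ID)
model = AutoModelForMaskedLM.from_pretrained(MODEL_ID)
model.eval()

# Fill in a masked token in a Swahili sentence
# (Swahili is one of the 23 pretraining languages).
text = f"Mji mkuu wa Kenya ni {tokenizer.mask_token}."
inputs = tokenizer(text, return_tensors="pt")

with torch.no_grad():
    logits = model(**inputs).logits

# Decode the highest-scoring token at the mask position.
mask_index = (inputs["input_ids"] == tokenizer.mask_token_id).nonzero(as_tuple=True)[1]
predicted_id = logits[0, mask_index].argmax(dim=-1)
print(tokenizer.decode(predicted_id))
```

For downstream tasks like the MasakhaNER evaluations mentioned above, the same checkpoint could presumably be loaded with `AutoModelForTokenClassification` and fine-tuned; consult the linked GitHub repository for the authors' actual training and evaluation code.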