Commit 2d9a5d7 · Update README.md
Parent(s): 27b518a

README.md CHANGED
@@ -1,4 +1,5 @@
 # AfroLM: A Self-Active Learning-based Multilingual Pretrained Language Model for 23 African Languages
+- [GitHub Repository of the Paper](https://github.com/bonaventuredossou/MLM_AL)
 
 This repository contains the model for our paper `AfroLM: A Self-Active Learning-based Multilingual Pretrained Language Model for 23 African Languages` which will appear at the Third Simple and Efficient Natural Language Processing, at EMNLP 2022.
 
@@ -9,7 +10,7 @@ This repository contains the model for our paper `AfroLM: A Self-Active Learning
 AfroLM has been pretrained from scratch on 23 African Languages: Amharic, Afan Oromo, Bambara, Ghomalá, Éwé, Fon, Hausa, Ìgbò, Kinyarwanda, Lingala, Luganda, Luo, Mooré, Chewa, Naija, Shona, Swahili, Setswana, Twi, Wolof, Xhosa, Yorùbá, and Zulu.
 
 ## Evaluation Results
-AfroLM was evaluated on MasakhaNER1.0 (10 African Languages) and MasakhaNER2.0 (21 African Languages) datasets; on text classification and sentiment analysis. AfroLM outperformed AfriBERTa, mBERT, and XLMR-base, and was very competitive with AfroXLMR. AfroLM is also very data efficient because it was pretrained on a dataset 14x+ smaller than its competitors' datasets. Below is the average
+AfroLM was evaluated on MasakhaNER1.0 (10 African Languages) and MasakhaNER2.0 (21 African Languages) datasets; on text classification and sentiment analysis. AfroLM outperformed AfriBERTa, mBERT, and XLMR-base, and was very competitive with AfroXLMR. AfroLM is also very data efficient because it was pretrained on a dataset 14x+ smaller than its competitors' datasets. Below is the average performance of various models, across various datasets. Please consult our paper for more language-level performance.
 
 Model | MasakhaNER | MasakhaNER2.0* | Text Classification (Yoruba/Hausa) | Sentiment Analysis (YOSM) | OOD Sentiment Analysis (Twitter -> YOSM) |
 |:---: |:---: |:---: | :---: |:---: | :---: |
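Since the README documented by this commit describes a pretrained masked language model, a quick way to try the checkpoint is through the Hugging Face `transformers` library. This is a minimal sketch, not the authors' official loading code: the model ID `bonadossou/afrolm_active_learning` is an assumption (it does not appear on this commit page), and `AutoModelForMaskedLM` is used on the assumption that AfroLM follows a standard masked-LM architecture.

```python
# Minimal sketch for loading AfroLM with Hugging Face transformers.
# MODEL_ID is an assumption, not stated in this commit page; substitute
# the actual repository ID that hosts the AfroLM checkpoint.
import torch
from transformers import AutoModelForMaskedLM, AutoTokenizer

MODEL_ID = "bonadossou/afrolm_active_learning"  # hypothetical ID

tokenizer = AutoTokenizer.from_pretrained(MODEL_ID)
model = AutoModelForMaskedLM.from_pretrained(MODEL_ID)
model.eval()

# Fill in a masked token in a Swahili sentence
# (Swahili is one of the 23 pretraining languages).
text = f"Mji mkuu wa Kenya ni {tokenizer.mask_token}."
inputs = tokenizer(text, return_tensors="pt")

with torch.no_grad():
    logits = model(**inputs).logits

# Decode the highest-scoring token at the mask position.
mask_index = (inputs["input_ids"] == tokenizer.mask_token_id).nonzero(as_tuple=True)[1]
predicted_id = logits[0, mask_index].argmax(dim=-1)
print(tokenizer.decode(predicted_id))
```

For downstream tasks like the MasakhaNER evaluations mentioned above, the same checkpoint could presumably be loaded with `AutoModelForTokenClassification` and fine-tuned; consult the linked GitHub repository for the authors' actual training and evaluation code.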