AliShahroor committed c244610 (verified) · 1 parent: d9ff944

update table

Files changed (1): README.md (+61, -59)
README.md CHANGED
@@ -21,7 +21,7 @@ tags:
 # LlamaLens: Specialized Multilingual LLM for Analyzing News and Social Media Content
 
 ## Overview
- LlamaLens is a specialized multilingual LLM designed for analyzing news and social media content. It focuses on 19 NLP tasks, leveraging 52 datasets across Arabic, English, and Hindi.
+ LlamaLens is a specialized multilingual LLM designed for analyzing news and social media content. It focuses on 18 NLP tasks, leveraging 52 datasets across Arabic, English, and Hindi.
 
 <p align="center">
 <picture>
@@ -77,80 +77,82 @@ print(generated_text)
 
 ## Results
 
- Below, we present the performance of **LlamaLens** compared to the existing SOTA (if available) and the Llama-Instruct baseline. The "Δ" (Delta) column is
+ Below, we present the performance of **LlamaLens**, where *"English"* refers to the English-instructed model and *"Native"* refers to the model trained with native-language instructions, compared to the existing SOTA (if available) and the Llama-Instruct baseline. The "Δ" (Delta) column is
 calculated as **(LlamaLens – SOTA)**.
 
 ---
 
 ## Arabic
 
- | **Task** | **Dataset** | **Metric** | **SOTA** | **Llama-instruct** | **LlamaLens** | **Δ** (LlamaLens - SOTA) |
- |---|---|---:|---:|---:|---:|---:|
- | News Summarization | xlsum | R-2 | 0.137 | 0.034 | 0.075 | -0.062 |
- | News Genre | ASND | Ma-F1 | 0.770 | 0.587 | 0.938 | 0.168 |
- | News Genre | SANADAkhbarona | Acc | 0.940 | 0.784 | 0.922 | -0.018 |
- | News Genre | SANADAlArabiya | Acc | 0.974 | 0.893 | 0.986 | 0.012 |
- | News Genre | SANADAlkhaleej | Acc | 0.986 | 0.865 | 0.967 | -0.019 |
- | News Genre | UltimateDataset | Ma-F1 | 0.970 | 0.376 | 0.883 | -0.087 |
- | News Credibility | NewsCredibility | Acc | 0.899 | 0.455 | 0.494 | -0.405 |
- | Emotion | Emotional-Tone | W-F1 | 0.658 | 0.358 | 0.748 | 0.090 |
- | Emotion | NewsHeadline | Acc | 1.000 | 0.406 | 0.551 | -0.449 |
- | Sarcasm | ArSarcasm-v2 | F1_Pos | 0.584 | 0.477 | 0.307 | -0.277 |
- | Sentiment | ar_reviews_100k | F1_Pos | – | 0.343 | 0.665 | – |
- | Sentiment | ArSAS | Acc | 0.920 | 0.603 | 0.795 | -0.125 |
- | Stance | stance | Ma-F1 | 0.767 | 0.608 | 0.936 | 0.169 |
- | Stance | Mawqif-Arabic-Stance | Ma-F1 | 0.789 | 0.764 | 0.867 | 0.078 |
- | Att.worthiness | CT22Attentionworthy | W-F1 | 0.412 | 0.158 | 0.544 | 0.132 |
- | Checkworthiness | CT24_T1 | F1_Pos | 0.569 | 0.404 | 0.877 | 0.308 |
- | Claim | CT22Claim | Acc | 0.703 | 0.581 | 0.778 | 0.075 |
- | Factuality | Arafacts | Mi-F1 | 0.850 | 0.210 | 0.534 | -0.316 |
- | Factuality | COVID19Factuality | W-F1 | 0.831 | 0.492 | 0.781 | -0.050 |
- | Propaganda | ArPro | Mi-F1 | 0.767 | 0.597 | 0.762 | -0.005 |
- | Cyberbullying | ArCyc_CB | Acc | 0.863 | 0.766 | 0.753 | -0.110 |
- | Harmfulness | CT22Harmful | F1_Pos | 0.557 | 0.507 | 0.508 | -0.049 |
- | Hate Speech | annotated-hatetweets-4 | W-F1 | 0.630 | 0.257 | 0.549 | -0.081 |
- | Hate Speech | OSACT4SubtaskB | Mi-F1 | 0.950 | 0.819 | 0.802 | -0.148 |
- | Offensive | ArCyc_OFF | Ma-F1 | 0.878 | 0.489 | 0.652 | -0.226 |
- | Offensive | OSACT4SubtaskA | Ma-F1 | 0.905 | 0.782 | 0.899 | -0.006 |
+ | **Task** | **Dataset** | **Metric** | **SOTA** | **Llama3.1-instruct** | **LlamaLens-English** | **LlamaLens-Native** | **Δ (LlamaLens - SOTA)** |
+ |:---:|:---:|:---:|:---:|:---:|:---:|:---:|:---:|
+ | Attentionworthiness Detection | CT22Attentionworthy | W-F1 | 0.412 | 0.158 | 0.425 | 0.454 | 0.013 |
+ | Checkworthiness Detection | CT24_checkworthy | F1_Pos | 0.569 | 0.610 | 0.502 | 0.509 | -0.067 |
+ | Claim Detection | CT22Claim | Acc | 0.703 | 0.581 | 0.734 | 0.756 | 0.031 |
+ | Cyberbullying Detection | ArCyc_CB | Acc | 0.863 | 0.766 | 0.870 | 0.833 | 0.007 |
+ | Emotion Detection | Emotional-Tone | W-F1 | 0.658 | 0.358 | 0.705 | 0.736 | 0.047 |
+ | Emotion Detection | NewsHeadline | Acc | 1.000 | 0.406 | 0.480 | 0.458 | -0.520 |
+ | Factuality | Arafacts | Mi-F1 | 0.850 | 0.210 | 0.771 | 0.738 | -0.079 |
+ | Factuality | COVID19Factuality | W-F1 | 0.831 | 0.492 | 0.800 | 0.840 | -0.031 |
+ | Harmfulness Detection | CT22Harmful | F1_Pos | 0.557 | 0.507 | 0.523 | 0.535 | -0.034 |
+ | Hate Speech Detection | annotated-hatetweets-4-classes | W-F1 | 0.630 | 0.257 | 0.526 | 0.517 | -0.104 |
+ | Hate Speech Detection | OSACT4SubtaskB | Mi-F1 | 0.950 | 0.819 | 0.955 | 0.955 | 0.005 |
+ | News Categorization | ASND | Ma-F1 | 0.770 | 0.587 | 0.919 | 0.929 | 0.149 |
+ | News Categorization | SANADAkhbarona-news-categorization | Acc | 0.940 | 0.784 | 0.954 | 0.953 | 0.014 |
+ | News Categorization | SANADAlArabiya-news-categorization | Acc | 0.974 | 0.893 | 0.987 | 0.985 | 0.013 |
+ | News Categorization | SANADAlkhaleej-news-categorization | Acc | 0.986 | 0.865 | 0.984 | 0.982 | -0.002 |
+ | News Categorization | UltimateDataset | Ma-F1 | 0.970 | 0.376 | 0.865 | 0.880 | -0.105 |
+ | News Credibility | NewsCredibilityDataset | Acc | 0.899 | 0.455 | 0.935 | 0.933 | 0.036 |
+ | News Summarization | xlsum | R-2 | 0.137 | 0.034 | 0.129 | 0.130 | -0.009 |
+ | Offensive Language Detection | ArCyc_OFF | Ma-F1 | 0.878 | 0.489 | 0.877 | 0.879 | -0.001 |
+ | Offensive Language Detection | OSACT4SubtaskA | Ma-F1 | 0.905 | 0.782 | 0.896 | 0.882 | -0.009 |
+ | Propaganda Detection | ArPro | Mi-F1 | 0.767 | 0.597 | 0.747 | 0.731 | -0.020 |
+ | Sarcasm Detection | ArSarcasm-v2 | F1_Pos | 0.584 | 0.477 | 0.520 | 0.542 | -0.064 |
+ | Sentiment Classification | ar_reviews_100k | F1_Pos | -- | 0.681 | 0.785 | 0.779 | -- |
+ | Sentiment Classification | ArSAS | Acc | 0.920 | 0.603 | 0.800 | 0.804 | -0.120 |
+ | Stance Detection | stance | Ma-F1 | 0.767 | 0.608 | 0.926 | 0.881 | 0.159 |
+ | Stance Detection | Mawqif-Arabic-Stance-main | Ma-F1 | 0.789 | 0.764 | 0.853 | 0.826 | 0.065 |
+ | Subjectivity Detection | ThatiAR | F1_Pos | 0.800 | 0.562 | 0.441 | 0.383 | -0.359 |
 
 ---
 
 ## English
 
- | **Task** | **Dataset** | **Metric** | **SOTA** | **Llama-instruct** | **LlamaLens** | **Δ** (LlamaLens - SOTA) |
- |---|---|---:|---:|---:|---:|---:|
- | News Summarization | xlsum | R-2 | 0.152 | 0.074 | 0.141 | -0.011 |
- | News Genre | CNN_News_Articles | Acc | 0.940 | 0.644 | 0.915 | -0.025 |
- | News Genre | News_Category | Ma-F1 | 0.769 | 0.970 | 0.505 | -0.264 |
- | News Genre | SemEval23T3-ST1 | Mi-F1 | 0.815 | 0.687 | 0.241 | -0.574 |
- | Subjectivity | CT24_T2 | Ma-F1 | 0.744 | 0.535 | 0.508 | -0.236 |
- | Emotion | emotion | Ma-F1 | 0.790 | 0.353 | 0.878 | 0.088 |
- | Sarcasm | News-Headlines | Acc | 0.897 | 0.668 | 0.956 | 0.059 |
- | Sentiment | NewsMTSC | Ma-F1 | 0.817 | 0.628 | 0.627 | -0.190 |
- | Checkworthiness | CT24_T1 | F1_Pos | 0.753 | 0.404 | 0.877 | 0.124 |
- | Claim | claim-detection | Mi-F1 | – | 0.545 | 0.915 | – |
- | Factuality | News_dataset | Acc | 0.920 | 0.654 | 0.946 | 0.026 |
- | Factuality | Politifact | W-F1 | 0.490 | 0.121 | 0.290 | -0.200 |
- | Propaganda | QProp | Ma-F1 | 0.667 | 0.759 | 0.851 | 0.184 |
- | Cyberbullying | Cyberbullying | Acc | 0.907 | 0.175 | 0.847 | -0.060 |
- | Offensive | Offensive_Hateful | Mi-F1 | – | 0.692 | 0.805 | – |
- | Offensive | offensive_language | Mi-F1 | 0.994 | 0.646 | 0.884 | -0.110 |
- | Offensive & Hate | hate-offensive-speech | Acc | 0.945 | 0.602 | 0.924 | -0.021 |
+ | **Task** | **Dataset** | **Metric** | **SOTA** | **Llama3.1-instruct** | **LlamaLens-English** | **LlamaLens-Native** | **Δ (LlamaLens - SOTA)** |
+ |:---:|:---:|:---:|:---:|:---:|:---:|:---:|:---:|
+ | Checkworthiness Detection | CT24_checkworthy | F1_Pos | 0.753 | 0.404 | 0.942 | 0.942 | 0.189 |
+ | Claim Detection | claim-detection | Mi-F1 | -- | 0.545 | 0.864 | 0.889 | -- |
+ | Cyberbullying Detection | Cyberbullying | Acc | 0.907 | 0.175 | 0.836 | 0.855 | -0.071 |
+ | Emotion Detection | emotion | Ma-F1 | 0.790 | 0.353 | 0.803 | 0.808 | 0.013 |
+ | Factuality | News_dataset | Acc | 0.920 | 0.654 | 1.000 | 1.000 | 0.080 |
+ | Factuality | Politifact | W-F1 | 0.490 | 0.121 | 0.287 | 0.311 | -0.203 |
+ | News Categorization | CNN_News_Articles_2011-2022 | Acc | 0.940 | 0.644 | 0.970 | 0.970 | 0.030 |
+ | News Categorization | News_Category_Dataset | Ma-F1 | 0.769 | 0.970 | 0.824 | 0.520 | 0.055 |
+ | News Genre Categorization | SemEval23T3-subtask1 | Mi-F1 | 0.815 | 0.687 | 0.241 | 0.253 | -0.574 |
+ | News Summarization | xlsum | R-2 | 0.152 | 0.074 | 0.182 | 0.181 | 0.030 |
+ | Offensive Language Detection | Offensive_Hateful_Dataset_New | Mi-F1 | -- | 0.692 | 0.814 | 0.813 | -- |
+ | Offensive Language Detection | offensive_language_dataset | Mi-F1 | 0.994 | 0.646 | 0.899 | 0.893 | -0.095 |
+ | Offensive Language and Hate Speech | hate-offensive-speech | Acc | 0.945 | 0.602 | 0.931 | 0.935 | -0.014 |
+ | Propaganda Detection | QProp | Ma-F1 | 0.667 | 0.759 | 0.963 | 0.973 | 0.296 |
+ | Sarcasm Detection | News-Headlines-Dataset-For-Sarcasm-Detection | Acc | 0.897 | 0.668 | 0.936 | 0.947 | 0.039 |
+ | Sentiment Classification | NewsMTSC-dataset | Ma-F1 | 0.817 | 0.628 | 0.751 | 0.748 | -0.066 |
+ | Subjectivity Detection | clef2024-checkthat-lab | Ma-F1 | 0.744 | 0.535 | 0.642 | 0.628 | -0.102 |
 
 ---
 
 ## Hindi
 
- | **Task** | **Dataset** | **Metric** | **SOTA** | **Llama-instruct** | **LlamaLens** | **Δ** (LlamaLens - SOTA) |
- |---|---|---:|---:|---:|---:|---:|
- | NLI | NLI_dataset | W-F1 | 0.646 | 0.633 | 0.655 | 0.009 |
- | News Summarization | xlsum | R-2 | 0.136 | 0.078 | 0.117 | -0.019 |
- | Sentiment | Sentiment Analysis | Acc | 0.697 | 0.552 | 0.669 | -0.028 |
- | Factuality | fake-news | Mi-F1 | – | 0.759 | 0.713 | – |
- | Hate Speech | hate-speech-detection | Mi-F1 | 0.639 | 0.750 | 0.994 | 0.355 |
- | Hate Speech | Hindi-Hostility | W-F1 | 0.841 | 0.469 | 0.720 | -0.121 |
- | Offensive | Offensive Speech | Mi-F1 | 0.723 | 0.621 | 0.847 | 0.124 |
- | Cyberbullying | MC_Hinglish1 | Acc | 0.609 | 0.233 | 0.587 | -0.022 |
+ | **Task** | **Dataset** | **Metric** | **SOTA** | **Llama3.1-instruct** | **LlamaLens-English** | **LlamaLens-Native** | **Δ (LlamaLens - SOTA)** |
+ |:---:|:---:|:---:|:---:|:---:|:---:|:---:|:---:|
+ | Factuality | fake-news | Mi-F1 | -- | 0.759 | 0.994 | 0.993 | -- |
+ | Hate Speech Detection | hate-speech-detection | Mi-F1 | 0.639 | 0.750 | 0.963 | 0.963 | 0.324 |
+ | Hate Speech Detection | Hindi-Hostility-Detection-CONSTRAINT-2021 | W-F1 | 0.841 | 0.469 | 0.753 | 0.753 | -0.088 |
+ | Natural Language Inference | Natural Language Inference | W-F1 | 0.646 | 0.633 | 0.568 | 0.679 | -0.078 |
+ | News Summarization | xlsum | R-2 | 0.136 | 0.078 | 0.171 | 0.170 | 0.035 |
+ | Offensive Language Detection | Offensive Speech Detection | Mi-F1 | 0.723 | 0.621 | 0.862 | 0.865 | 0.139 |
+ | Cyberbullying Detection | MC_Hinglish1 | Acc | 0.609 | 0.233 | 0.625 | 0.627 | 0.016 |
+ | Sentiment Classification | Sentiment Analysis | Acc | 0.697 | 0.552 | 0.647 | 0.654 | -0.050 |
 
 ## Paper
 For an in-depth understanding, refer to our paper: [**LlamaLens: Specialized Multilingual LLM for Analyzing News and Social Media Content**](https://arxiv.org/pdf/2410.15308).
 
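---

A note on reading the Δ column in the updated tables: the reported Δ values line up with the **LlamaLens-English** column (for example, ASND: 0.919 - 0.770 = 0.149), which suggests Δ is computed against the English-instructed variant. Below is a minimal sanity-check sketch in Python; the `delta` helper is illustrative only and is not part of the model card or its code.

```python
# Sanity check for the Δ ("Delta") column in the updated tables.
# Assumption (inferred from the numbers, not stated on the card):
# Δ = LlamaLens-English score - SOTA score.

def delta(llamalens_english: float, sota: float) -> float:
    """Positive when LlamaLens beats the reported SOTA, negative otherwise."""
    return round(llamalens_english - sota, 3)

# Arabic, News Categorization on ASND (Ma-F1): 0.919 vs SOTA 0.770
assert delta(0.919, 0.770) == 0.149   # matches the Δ column

# Arabic, Emotion Detection on NewsHeadline (Acc): 0.480 vs SOTA 1.000
assert delta(0.480, 1.000) == -0.520  # matches the Δ column
```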