add results
README.md (changed)
@@ -71,6 +71,84 @@ response = tokenizer.decode(output[0], skip_special_tokens=True)
print(response)
```

## Results

Below, we present the performance of **LlamaLens** compared to the existing SOTA (where available) and the Llama-Instruct baseline. The **Delta** column is calculated as **(LlamaLens – SOTA)**.
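
As a quick reference for how these numbers relate, the sketch below shows one way the metric abbreviations used in the tables (Acc, Ma-F1, Mi-F1, W-F1, F1_Pos, R-2) and the Delta column could be computed with scikit-learn and the `rouge_score` package. This is an illustration only, not the evaluation code used for the paper; the labels and summary strings are placeholders.

```python
# Illustrative sketch only: assumes scikit-learn and rouge_score are installed.
# The labels and summaries below are placeholders, not items from the datasets.
from sklearn.metrics import accuracy_score, f1_score
from rouge_score import rouge_scorer

# Classification metrics abbreviated in the tables.
y_true = ["politics", "sports", "sports", "economy"]
y_pred = ["politics", "sports", "economy", "economy"]
acc = accuracy_score(y_true, y_pred)                 # Acc
ma_f1 = f1_score(y_true, y_pred, average="macro")    # Ma-F1
mi_f1 = f1_score(y_true, y_pred, average="micro")    # Mi-F1
w_f1 = f1_score(y_true, y_pred, average="weighted")  # W-F1
# F1_Pos is the F1 of the positive class (average="binary" with pos_label set).

# R-2 = ROUGE-2 F-measure for the summarization tasks.
scorer = rouge_scorer.RougeScorer(["rouge2"])
r2 = scorer.score("reference summary", "generated summary")["rouge2"].fmeasure

# Delta = LlamaLens score minus SOTA score,
# e.g. Arabic xlsum R-2: 0.075 - 0.137 = -0.062, matching the table below.
delta = round(0.075 - 0.137, 3)
print(acc, ma_f1, mi_f1, w_f1, r2, delta)
```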

---

## Arabic

| **Task** | **Dataset** | **Metric** | **SOTA** | **Llama-Instruct** | **LlamaLens** | **Delta** (LlamaLens - SOTA) |
|------------------------|---------------------------|-----------:|---------:|-------------------:|--------------:|-----------------------------:|
| News Summarization | xlsum | R-2 | 0.137 | 0.034 | 0.075 | -0.062 |
| News Genre | ASND | Ma-F1 | 0.770 | 0.587 | 0.938 | 0.168 |
| News Genre | SANADAkhbarona | Acc | 0.940 | 0.784 | 0.922 | -0.018 |
| News Genre | SANADAlArabiya | Acc | 0.974 | 0.893 | 0.986 | 0.012 |
| News Genre | SANADAlkhaleej | Acc | 0.986 | 0.865 | 0.967 | -0.019 |
| News Genre | UltimateDataset | Ma-F1 | 0.970 | 0.376 | 0.883 | -0.087 |
| News Credibility | NewsCredibility | Acc | 0.899 | 0.455 | 0.494 | -0.405 |
| Emotion | Emotional-Tone | W-F1 | 0.658 | 0.358 | 0.748 | 0.090 |
| Emotion | NewsHeadline | Acc | 1.000 | 0.406 | 0.551 | -0.449 |
| Sarcasm | ArSarcasm-v2 | F1_Pos | 0.584 | 0.477 | 0.307 | -0.277 |
| Sentiment | ar_reviews_100k | F1_Pos | – | 0.343 | 0.665 | – |
| Sentiment | ArSAS | Acc | 0.920 | 0.603 | 0.795 | -0.125 |
| Stance | stance | Ma-F1 | 0.767 | 0.608 | 0.936 | 0.169 |
| Stance | Mawqif-Arabic-Stance | Ma-F1 | 0.789 | 0.764 | 0.867 | 0.078 |
| Att.worthiness | CT22Attentionworthy | W-F1 | 0.412 | 0.158 | 0.544 | 0.132 |
| Checkworthiness | CT24_T1 | F1_Pos | 0.569 | 0.404 | 0.877 | 0.308 |
| Claim | CT22Claim | Acc | 0.703 | 0.581 | 0.778 | 0.075 |
| Factuality | Arafacts | Mi-F1 | 0.850 | 0.210 | 0.534 | -0.316 |
| Factuality | COVID19Factuality | W-F1 | 0.831 | 0.492 | 0.781 | -0.050 |
| Propaganda | ArPro | Mi-F1 | 0.767 | 0.597 | 0.762 | -0.005 |
| Cyberbullying | ArCyc_CB | Acc | 0.863 | 0.766 | 0.753 | -0.110 |
| Harmfulness | CT22Harmful | F1_Pos | 0.557 | 0.507 | 0.508 | -0.049 |
| Hate Speech | annotated-hatetweets-4 | W-F1 | 0.630 | 0.257 | 0.549 | -0.081 |
| Hate Speech | OSACT4SubtaskB | Mi-F1 | 0.950 | 0.819 | 0.802 | -0.148 |
| Offensive | ArCyc_OFF | Ma-F1 | 0.878 | 0.489 | 0.652 | -0.226 |
| Offensive | OSACT4SubtaskA | Ma-F1 | 0.905 | 0.782 | 0.899 | -0.006 |

---

## English

| **Task** | **Dataset** | **Metric** | **SOTA** | **Llama-Instruct** | **LlamaLens** | **Delta** (LlamaLens - SOTA) |
|----------------------|---------------------------|-----------:|---------:|-------------------:|--------------:|-----------------------------:|
| News Summarization | xlsum | R-2 | 0.152 | 0.074 | 0.141 | -0.011 |
| News Genre | CNN_News_Articles | Acc | 0.940 | 0.644 | 0.915 | -0.025 |
| News Genre | News_Category | Ma-F1 | 0.769 | 0.970 | 0.505 | -0.264 |
| News Genre | SemEval23T3-ST1 | Mi-F1 | 0.815 | 0.687 | 0.241 | -0.574 |
| Subjectivity | CT24_T2 | Ma-F1 | 0.744 | 0.535 | 0.508 | -0.236 |
| Emotion | emotion | Ma-F1 | 0.790 | 0.353 | 0.878 | 0.088 |
| Sarcasm | News-Headlines | Acc | 0.897 | 0.668 | 0.956 | 0.059 |
| Sentiment | NewsMTSC | Ma-F1 | 0.817 | 0.628 | 0.627 | -0.190 |
| Checkworthiness | CT24_T1 | F1_Pos | 0.753 | 0.404 | 0.877 | 0.124 |
| Claim | claim-detection | Mi-F1 | – | 0.545 | 0.915 | – |
| Factuality | News_dataset | Acc | 0.920 | 0.654 | 0.946 | 0.026 |
| Factuality | Politifact | W-F1 | 0.490 | 0.121 | 0.290 | -0.200 |
| Propaganda | QProp | Ma-F1 | 0.667 | 0.759 | 0.851 | 0.184 |
| Cyberbullying | Cyberbullying | Acc | 0.907 | 0.175 | 0.847 | -0.060 |
| Offensive | Offensive_Hateful | Mi-F1 | – | 0.692 | 0.805 | – |
| Offensive | offensive_language | Mi-F1 | 0.994 | 0.646 | 0.884 | -0.110 |
| Offensive & Hate | hate-offensive-speech | Acc | 0.945 | 0.602 | 0.924 | -0.021 |

---

## Hindi

| **Task** | **Dataset** | **Metric** | **SOTA** | **Llama-Instruct** | **LlamaLens** | **Delta** (LlamaLens - SOTA) |
|------------------------|------------------------|-----------:|---------:|-------------------:|--------------:|-----------------------------:|
| NLI | NLI_dataset | W-F1 | 0.646 | 0.633 | 0.655 | 0.009 |
| News Summarization | xlsum | R-2 | 0.136 | 0.078 | 0.117 | -0.019 |
| Sentiment | Sentiment Analysis | Acc | 0.697 | 0.552 | 0.669 | -0.028 |
| Factuality | fake-news | Mi-F1 | – | 0.759 | 0.713 | – |
| Hate Speech | hate-speech-detection | Mi-F1 | 0.639 | 0.750 | 0.994 | 0.355 |
| Hate Speech | Hindi-Hostility | W-F1 | 0.841 | 0.469 | 0.720 | -0.121 |
| Offensive | Offensive Speech | Mi-F1 | 0.723 | 0.621 | 0.847 | 0.124 |
| Cyberbullying | MC_Hinglish1 | Acc | 0.609 | 0.233 | 0.587 | -0.022 |

## Paper

For an in-depth understanding, refer to our paper: [**LlamaLens: Specialized Multilingual LLM for Analyzing News and Social Media Content**](https://arxiv.org/pdf/2410.15308).