AliShahroor committed c244610 (verified) · 1 parent: d9ff944

update table

Files changed (1): README.md (+61, -59)
README.md CHANGED
@@ -21,7 +21,7 @@ tags:
 # LlamaLens: Specialized Multilingual LLM for Analyzing News and Social Media Content
 
 ## Overview
- LlamaLens is a specialized multilingual LLM designed for analyzing news and social media content. It focuses on 19 NLP tasks, leveraging 52 datasets across Arabic, English, and Hindi.
+ LlamaLens is a specialized multilingual LLM designed for analyzing news and social media content. It focuses on 18 NLP tasks, leveraging 52 datasets across Arabic, English, and Hindi.
 
 <p align="center">
 <picture>
@@ -77,80 +77,82 @@ print(generated_text)
 
 ## Results
 
- Below, we present the performance of **LlamaLens** compared to the existing SOTA (if available) and the Llama-Instruct baseline. The "Δ" (Delta) column is
+ Below, we present the performance of **LlamaLens**, where *"English"* refers to the English-instructed model and *"Native"* refers to the model trained with native-language instructions, compared to the existing SOTA (if available) and the Llama-Instruct baseline. The "Δ" (Delta) column is
 calculated as **(LlamaLens – SOTA)**.
 
 ---
 
 ## Arabic
 
- | **Task** | **Dataset** | **Metric** | **SOTA** | **Llama-instruct** | **LlamaLens** | **Δ** (LlamaLens - SOTA) |
- |---|---|---:|---:|---:|---:|---:|
- | News Summarization | xlsum | R-2 | 0.137 | 0.034 | 0.075 | -0.062 |
- | News Genre | ASND | Ma-F1 | 0.770 | 0.587 | 0.938 | 0.168 |
- | News Genre | SANADAkhbarona | Acc | 0.940 | 0.784 | 0.922 | -0.018 |
- | News Genre | SANADAlArabiya | Acc | 0.974 | 0.893 | 0.986 | 0.012 |
- | News Genre | SANADAlkhaleej | Acc | 0.986 | 0.865 | 0.967 | -0.019 |
- | News Genre | UltimateDataset | Ma-F1 | 0.970 | 0.376 | 0.883 | -0.087 |
- | News Credibility | NewsCredibility | Acc | 0.899 | 0.455 | 0.494 | -0.405 |
- | Emotion | Emotional-Tone | W-F1 | 0.658 | 0.358 | 0.748 | 0.090 |
- | Emotion | NewsHeadline | Acc | 1.000 | 0.406 | 0.551 | -0.449 |
- | Sarcasm | ArSarcasm-v2 | F1_Pos | 0.584 | 0.477 | 0.307 | -0.277 |
- | Sentiment | ar_reviews_100k | F1_Pos | – | 0.343 | 0.665 | – |
- | Sentiment | ArSAS | Acc | 0.920 | 0.603 | 0.795 | -0.125 |
- | Stance | stance | Ma-F1 | 0.767 | 0.608 | 0.936 | 0.169 |
- | Stance | Mawqif-Arabic-Stance | Ma-F1 | 0.789 | 0.764 | 0.867 | 0.078 |
- | Att.worthiness | CT22Attentionworthy | W-F1 | 0.412 | 0.158 | 0.544 | 0.132 |
- | Checkworthiness | CT24_T1 | F1_Pos | 0.569 | 0.404 | 0.877 | 0.308 |
- | Claim | CT22Claim | Acc | 0.703 | 0.581 | 0.778 | 0.075 |
- | Factuality | Arafacts | Mi-F1 | 0.850 | 0.210 | 0.534 | -0.316 |
- | Factuality | COVID19Factuality | W-F1 | 0.831 | 0.492 | 0.781 | -0.050 |
- | Propaganda | ArPro | Mi-F1 | 0.767 | 0.597 | 0.762 | -0.005 |
- | Cyberbullying | ArCyc_CB | Acc | 0.863 | 0.766 | 0.753 | -0.110 |
- | Harmfulness | CT22Harmful | F1_Pos | 0.557 | 0.507 | 0.508 | -0.049 |
- | Hate Speech | annotated-hatetweets-4 | W-F1 | 0.630 | 0.257 | 0.549 | -0.081 |
- | Hate Speech | OSACT4SubtaskB | Mi-F1 | 0.950 | 0.819 | 0.802 | -0.148 |
- | Offensive | ArCyc_OFF | Ma-F1 | 0.878 | 0.489 | 0.652 | -0.226 |
- | Offensive | OSACT4SubtaskA | Ma-F1 | 0.905 | 0.782 | 0.899 | -0.006 |
+ | **Task** | **Dataset** | **Metric** | **SOTA** | **Llama3.1-instruct** | **LlamaLens-English** | **LlamaLens-Native** | **Δ (LlamaLens - SOTA)** |
+ |:---:|:---:|:---:|:---:|:---:|:---:|:---:|:---:|
+ | Attentionworthiness Detection | CT22Attentionworthy | W-F1 | 0.412 | 0.158 | 0.425 | 0.454 | 0.013 |
+ | Checkworthiness Detection | CT24_checkworthy | F1_Pos | 0.569 | 0.610 | 0.502 | 0.509 | -0.067 |
+ | Claim Detection | CT22Claim | Acc | 0.703 | 0.581 | 0.734 | 0.756 | 0.031 |
+ | Cyberbullying Detection | ArCyc_CB | Acc | 0.863 | 0.766 | 0.870 | 0.833 | 0.007 |
+ | Emotion Detection | Emotional-Tone | W-F1 | 0.658 | 0.358 | 0.705 | 0.736 | 0.047 |
+ | Emotion Detection | NewsHeadline | Acc | 1.000 | 0.406 | 0.480 | 0.458 | -0.520 |
+ | Factuality | Arafacts | Mi-F1 | 0.850 | 0.210 | 0.771 | 0.738 | -0.079 |
+ | Factuality | COVID19Factuality | W-F1 | 0.831 | 0.492 | 0.800 | 0.840 | -0.031 |
+ | Harmfulness Detection | CT22Harmful | F1_Pos | 0.557 | 0.507 | 0.523 | 0.535 | -0.034 |
+ | Hate Speech Detection | annotated-hatetweets-4-classes | W-F1 | 0.630 | 0.257 | 0.526 | 0.517 | -0.104 |
+ | Hate Speech Detection | OSACT4SubtaskB | Mi-F1 | 0.950 | 0.819 | 0.955 | 0.955 | 0.005 |
+ | News Categorization | ASND | Ma-F1 | 0.770 | 0.587 | 0.919 | 0.929 | 0.149 |
+ | News Categorization | SANADAkhbarona-news-categorization | Acc | 0.940 | 0.784 | 0.954 | 0.953 | 0.014 |
+ | News Categorization | SANADAlArabiya-news-categorization | Acc | 0.974 | 0.893 | 0.987 | 0.985 | 0.013 |
+ | News Categorization | SANADAlkhaleej-news-categorization | Acc | 0.986 | 0.865 | 0.984 | 0.982 | -0.002 |
+ | News Categorization | UltimateDataset | Ma-F1 | 0.970 | 0.376 | 0.865 | 0.880 | -0.105 |
+ | News Credibility | NewsCredibilityDataset | Acc | 0.899 | 0.455 | 0.935 | 0.933 | 0.036 |
+ | News Summarization | xlsum | R-2 | 0.137 | 0.034 | 0.129 | 0.130 | -0.009 |
+ | Offensive Language Detection | ArCyc_OFF | Ma-F1 | 0.878 | 0.489 | 0.877 | 0.879 | -0.001 |
+ | Offensive Language Detection | OSACT4SubtaskA | Ma-F1 | 0.905 | 0.782 | 0.896 | 0.882 | -0.009 |
+ | Propaganda Detection | ArPro | Mi-F1 | 0.767 | 0.597 | 0.747 | 0.731 | -0.020 |
+ | Sarcasm Detection | ArSarcasm-v2 | F1_Pos | 0.584 | 0.477 | 0.520 | 0.542 | -0.064 |
+ | Sentiment Classification | ar_reviews_100k | F1_Pos | -- | 0.681 | 0.785 | 0.779 | -- |
+ | Sentiment Classification | ArSAS | Acc | 0.920 | 0.603 | 0.800 | 0.804 | -0.120 |
+ | Stance Detection | stance | Ma-F1 | 0.767 | 0.608 | 0.926 | 0.881 | 0.159 |
+ | Stance Detection | Mawqif-Arabic-Stance-main | Ma-F1 | 0.789 | 0.764 | 0.853 | 0.826 | 0.065 |
+ | Subjectivity Detection | ThatiAR | F1_Pos | 0.800 | 0.562 | 0.441 | 0.383 | -0.359 |
 
 ---
 
 ## English
 
- | **Task** | **Dataset** | **Metric** | **SOTA** | **Llama-instruct** | **LlamaLens** | **Δ** (LlamaLens - SOTA) |
- |---|---|---:|---:|---:|---:|---:|
- | News Summarization | xlsum | R-2 | 0.152 | 0.074 | 0.141 | -0.011 |
- | News Genre | CNN_News_Articles | Acc | 0.940 | 0.644 | 0.915 | -0.025 |
- | News Genre | News_Category | Ma-F1 | 0.769 | 0.970 | 0.505 | -0.264 |
- | News Genre | SemEval23T3-ST1 | Mi-F1 | 0.815 | 0.687 | 0.241 | -0.574 |
- | Subjectivity | CT24_T2 | Ma-F1 | 0.744 | 0.535 | 0.508 | -0.236 |
- | Emotion | emotion | Ma-F1 | 0.790 | 0.353 | 0.878 | 0.088 |
- | Sarcasm | News-Headlines | Acc | 0.897 | 0.668 | 0.956 | 0.059 |
- | Sentiment | NewsMTSC | Ma-F1 | 0.817 | 0.628 | 0.627 | -0.190 |
- | Checkworthiness | CT24_T1 | F1_Pos | 0.753 | 0.404 | 0.877 | 0.124 |
- | Claim | claim-detection | Mi-F1 | – | 0.545 | 0.915 | – |
- | Factuality | News_dataset | Acc | 0.920 | 0.654 | 0.946 | 0.026 |
- | Factuality | Politifact | W-F1 | 0.490 | 0.121 | 0.290 | -0.200 |
- | Propaganda | QProp | Ma-F1 | 0.667 | 0.759 | 0.851 | 0.184 |
- | Cyberbullying | Cyberbullying | Acc | 0.907 | 0.175 | 0.847 | -0.060 |
- | Offensive | Offensive_Hateful | Mi-F1 | – | 0.692 | 0.805 | – |
- | Offensive | offensive_language | Mi-F1 | 0.994 | 0.646 | 0.884 | -0.110 |
- | Offensive & Hate | hate-offensive-speech | Acc | 0.945 | 0.602 | 0.924 | -0.021 |
+ | **Task** | **Dataset** | **Metric** | **SOTA** | **Llama3.1-instruct** | **LlamaLens-English** | **LlamaLens-Native** | **Δ (LlamaLens - SOTA)** |
+ |:---:|:---:|:---:|:---:|:---:|:---:|:---:|:---:|
+ | Checkworthiness Detection | CT24_checkworthy | F1_Pos | 0.753 | 0.404 | 0.942 | 0.942 | 0.189 |
+ | Claim Detection | claim-detection | Mi-F1 | -- | 0.545 | 0.864 | 0.889 | -- |
+ | Cyberbullying Detection | Cyberbullying | Acc | 0.907 | 0.175 | 0.836 | 0.855 | -0.071 |
+ | Emotion Detection | emotion | Ma-F1 | 0.790 | 0.353 | 0.803 | 0.808 | 0.013 |
+ | Factuality | News_dataset | Acc | 0.920 | 0.654 | 1.000 | 1.000 | 0.080 |
+ | Factuality | Politifact | W-F1 | 0.490 | 0.121 | 0.287 | 0.311 | -0.203 |
+ | News Categorization | CNN_News_Articles_2011-2022 | Acc | 0.940 | 0.644 | 0.970 | 0.970 | 0.030 |
+ | News Categorization | News_Category_Dataset | Ma-F1 | 0.769 | 0.970 | 0.824 | 0.520 | 0.055 |
+ | News Genre Categorization | SemEval23T3-subtask1 | Mi-F1 | 0.815 | 0.687 | 0.241 | 0.253 | -0.574 |
+ | News Summarization | xlsum | R-2 | 0.152 | 0.074 | 0.182 | 0.181 | 0.030 |
+ | Offensive Language Detection | Offensive_Hateful_Dataset_New | Mi-F1 | -- | 0.692 | 0.814 | 0.813 | -- |
+ | Offensive Language Detection | offensive_language_dataset | Mi-F1 | 0.994 | 0.646 | 0.899 | 0.893 | -0.095 |
+ | Offensive Language and Hate Speech | hate-offensive-speech | Acc | 0.945 | 0.602 | 0.931 | 0.935 | -0.014 |
+ | Propaganda Detection | QProp | Ma-F1 | 0.667 | 0.759 | 0.963 | 0.973 | 0.296 |
+ | Sarcasm Detection | News-Headlines-Dataset-For-Sarcasm-Detection | Acc | 0.897 | 0.668 | 0.936 | 0.947 | 0.039 |
+ | Sentiment Classification | NewsMTSC-dataset | Ma-F1 | 0.817 | 0.628 | 0.751 | 0.748 | -0.066 |
+ | Subjectivity Detection | clef2024-checkthat-lab | Ma-F1 | 0.744 | 0.535 | 0.642 | 0.628 | -0.102 |
 
 ---
 
 ## Hindi
 
- | **Task** | **Dataset** | **Metric** | **SOTA** | **Llama-instruct** | **LlamaLens** | **Δ** (LlamaLens - SOTA) |
- |---|---|---:|---:|---:|---:|---:|
- | NLI | NLI_dataset | W-F1 | 0.646 | 0.633 | 0.655 | 0.009 |
- | News Summarization | xlsum | R-2 | 0.136 | 0.078 | 0.117 | -0.019 |
- | Sentiment | Sentiment Analysis | Acc | 0.697 | 0.552 | 0.669 | -0.028 |
- | Factuality | fake-news | Mi-F1 | – | 0.759 | 0.713 | – |
- | Hate Speech | hate-speech-detection | Mi-F1 | 0.639 | 0.750 | 0.994 | 0.355 |
- | Hate Speech | Hindi-Hostility | W-F1 | 0.841 | 0.469 | 0.720 | -0.121 |
- | Offensive | Offensive Speech | Mi-F1 | 0.723 | 0.621 | 0.847 | 0.124 |
- | Cyberbullying | MC_Hinglish1 | Acc | 0.609 | 0.233 | 0.587 | -0.022 |
+ | **Task** | **Dataset** | **Metric** | **SOTA** | **Llama3.1-instruct** | **LlamaLens-English** | **LlamaLens-Native** | **Δ (LlamaLens - SOTA)** |
+ |:---:|:---:|:---:|:---:|:---:|:---:|:---:|:---:|
+ | Factuality | fake-news | Mi-F1 | -- | 0.759 | 0.994 | 0.993 | -- |
+ | Hate Speech Detection | hate-speech-detection | Mi-F1 | 0.639 | 0.750 | 0.963 | 0.963 | 0.324 |
+ | Hate Speech Detection | Hindi-Hostility-Detection-CONSTRAINT-2021 | W-F1 | 0.841 | 0.469 | 0.753 | 0.753 | -0.088 |
+ | Natural Language Inference | Natural Language Inference | W-F1 | 0.646 | 0.633 | 0.568 | 0.679 | -0.078 |
+ | News Summarization | xlsum | R-2 | 0.136 | 0.078 | 0.171 | 0.170 | 0.035 |
+ | Offensive Language Detection | Offensive Speech Detection | Mi-F1 | 0.723 | 0.621 | 0.862 | 0.865 | 0.139 |
+ | Cyberbullying Detection | MC_Hinglish1 | Acc | 0.609 | 0.233 | 0.625 | 0.627 | 0.016 |
+ | Sentiment Classification | Sentiment Analysis | Acc | 0.697 | 0.552 | 0.647 | 0.654 | -0.050 |
 
 ## Paper
 For an in-depth understanding, refer to our paper: [**LlamaLens: Specialized Multilingual LLM for Analyzing News and Social Media Content**](https://arxiv.org/pdf/2410.15308).
 
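---

A note on reading the Δ column in the updated tables: the reported Δ values line up with the **LlamaLens-English** column (for example, ASND: 0.919 - 0.770 = 0.149), which suggests Δ is computed against the English-instructed variant. Below is a minimal sanity-check sketch in Python; the `delta` helper is illustrative only and is not part of the model card or its code.

```python
# Sanity check for the Δ ("Delta") column in the updated tables.
# Assumption (inferred from the numbers, not stated on the card):
# Δ = LlamaLens-English score - SOTA score.

def delta(llamalens_english: float, sota: float) -> float:
    """Positive when LlamaLens beats the reported SOTA, negative otherwise."""
    return round(llamalens_english - sota, 3)

# Arabic, News Categorization on ASND (Ma-F1): 0.919 vs SOTA 0.770
assert delta(0.919, 0.770) == 0.149   # matches the Δ column

# Arabic, Emotion Detection on NewsHeadline (Acc): 0.480 vs SOTA 1.000
assert delta(0.480, 1.000) == -0.520  # matches the Δ column
```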