Update README.md

README.md

---
base_model: mistralai/Mistral-7B-v0.1
library_name: peft
license: apache-2.0
language:
- en
tags:
- propaganda
---

# Model Card for identrics/wasper_propaganda_classifier_en

## Model Details

- **Developed by:** [`Identrics`](https://identrics.ai/)
- **Language:** English
- **License:** apache-2.0
- **Finetuned from model:** [`mistralai/Mistral-7B-v0.1`](https://huggingface.co/mistralai/Mistral-7B-v0.1)
- **Context window:** 8192 tokens
## Model Description

This model is a fine-tuned version of mistralai/Mistral-7B-v0.1 for propaganda detection. It is a multilabel classifier that determines whether a given English text contains any of five predefined propaganda types.

This model was created by [`Identrics`](https://identrics.ai/) as part of the WASPer project. The detailed taxonomy of the full pipeline can be found [here](https://github.com/Identrics/wasper/).
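
Because the card lists `library_name: peft`, the repository may ship PEFT adapter weights on top of the base model rather than a fully merged checkpoint. A minimal sketch of the adapter-aware load path, assuming that is the case (the card itself only documents the plain `transformers` load shown further below):

```py
# Hypothetical alternative load path via PEFT's auto classes; this
# assumes the repo stores adapter weights on top of
# mistralai/Mistral-7B-v0.1 rather than a merged checkpoint.
from peft import AutoPeftModelForSequenceClassification

model = AutoPeftModelForSequenceClassification.from_pretrained(
    "identrics/wasper_propaganda_classifier_en",
    num_labels=5,
)
```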
## Propaganda taxonomy

The propaganda techniques identifiable with this model are classified into five categories (a sketch for reading the model's label mapping follows the list):
1. **Self-Identification Techniques**:
These techniques exploit the audience's feelings of association (or desire to be associated) with a larger group. They suggest that the audience should feel united, motivated, or threatened by the same factors that unite, motivate, or threaten that group.
2. **Defamation Techniques**:
These techniques represent direct or indirect attacks against an entity's reputation and worth.
3. **Legitimisation Techniques**:
These techniques attempt to prove and legitimise the propagandist's statements by using arguments that cannot be falsified because they are based on moral values or personal experiences.
4. **Logical Fallacies**:
These techniques appeal to the audience's reason and masquerade as objective and factual arguments, but in reality, they exploit distractions and flawed logic.
5. **Rhetorical Devices**:
These techniques seek to influence the audience and control the conversation by using linguistic methods.
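
The classifier's five output logits presumably correspond to these five categories, but the card does not state the index order. A minimal sketch for recovering the mapping that ships with the model config (if the authors did not save custom names, this prints generic placeholders):

```py
# Inspect the label mapping stored in the model config; prints
# {0: "LABEL_0", ...} when no custom label names were saved.
from transformers import AutoConfig

config = AutoConfig.from_pretrained("identrics/wasper_propaganda_classifier_en")
print(config.id2label)
```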

Then the model can be downloaded and used for inference:

```py
from transformers import AutoModelForSequenceClassification, AutoTokenizer
model = AutoModelForSequenceClassification.from_pretrained("identrics/wasper_propaganda_classifier_en", num_labels=5)
tokenizer = AutoTokenizer.from_pretrained("identrics/wasper_propaganda_classifier_en")
tokens = tokenizer("Our country is the most powerful country in the world!", return_tensors="pt")
output = model(**tokens)
print(output.logits)
```
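
Because this is a multilabel classifier, the logits are typically mapped to independent per-category probabilities with a sigmoid rather than a softmax. A minimal sketch, assuming a 0.5 decision threshold (the card does not document one):

```py
import torch

# Sigmoid yields one independent probability per propaganda category;
# the 0.5 threshold below is illustrative, not a documented default.
probs = torch.sigmoid(output.logits)
predictions = (probs > 0.5).int()
print(probs, predictions)
```
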
## Training Details

During the training stage, the objective was to develop a multi-label classifier that identifies different types of propaganda, trained on a dataset containing both real and artificially generated samples.

The data was carefully annotated by domain experts according to a predefined taxonomy covering five primary categories. Some examples are assigned to a single category, while others fall into multiple categories, reflecting the nuanced nature of propaganda, where several techniques can appear within a single text.

The model reached a weighted F1 score of **0.464** during training.
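
For reference, a weighted F1 averages the per-label F1 scores weighted by each label's frequency. A minimal sketch of computing it, assuming scikit-learn and toy multilabel arrays (the card does not name its evaluation code):

```py
import numpy as np
from sklearn.metrics import f1_score

# Toy stand-ins for gold and predicted multilabel vectors over the
# five propaganda categories; real evaluation data is not published here.
y_true = np.array([[1, 0, 0, 1, 0], [0, 1, 1, 0, 1]])
y_pred = np.array([[1, 0, 1, 1, 0], [0, 1, 1, 0, 1]])
print(f1_score(y_true, y_pred, average="weighted"))
```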
## Compute Infrastructure

This model was fine-tuned on **2x NVIDIA Tesla V100 32 GB** GPUs.

## Citation

*This section is to be updated soon.*
If you find our work useful, please consider citing WASPer: