boryana committed
Commit 8d07ed6 · verified · 1 Parent(s): b67453b

Update README.md

Files changed (1): README.md (+26 -19)
README.md CHANGED
@@ -1,14 +1,14 @@
  ---
- base_model: INSAIT-Institute/BgGPT-7B-Instruct-v0.2
  library_name: peft
  license: apache-2.0
  language:
- - bg
  tags:
  - propaganda
  ---

- # Model Card for identrics/BG_propaganda_detector

@@ -16,35 +16,37 @@ tags:
  ## Model Description

  - **Developed by:** [`Identrics`](https://identrics.ai/)
- - **Language:** Bulgarian
  - **License:** apache-2.0
- - **Finetuned from model:** [`INSAIT-Institute/BgGPT-7B-Instruct-v0.2`](https://huggingface.co/INSAIT-Institute/BgGPT-7B-Instruct-v0.2)
  - **Context window :** 8192 tokens

  ## Model Description

- This model consists of a fine-tuned version of BgGPT-7B-Instruct-v0.2 for a propaganda detection task. It is effectively a multilabel classifier, determining wether a given propaganda text in English contains or not 5 predefined propaganda types.
- This model was created by [`Identrics`](https://identrics.ai/), in the scope of the WASPer project. The detailed taxonomy could be found [here](https://github.com/Identrics/wasper/).

  ## Propaganda taxonomy

- The propaganda techniques we want to identify are classified in 5 categories:

- 1. Self-Identification Techniques:
  These techniques exploit the audience's feelings of association (or desire to be associated) with a larger group. They suggest that the audience should feel united, motivated, or threatened by the same factors that unite, motivate, or threaten that group.

- 2. Defamation Techniques:
  These techniques represent direct or indirect attacks against an entity's reputation and worth.

- 3. Legitimisation Techniques:
  These techniques attempt to prove and legitimise the propagandist's statements by using arguments that cannot be falsified because they are based on moral values or personal experiences.

- 4. Logical Fallacies:
  These techniques appeal to the audience's reason and masquerade as objective and factual arguments, but in reality, they exploit distractions and flawed logic.

- 5. Rhetorical Devices:
  These techniques seek to influence the audience and control the conversation by using linguistic methods.

@@ -68,8 +70,8 @@ Then the model can be downloaded and used for inference:
  ```py
  from transformers import AutoModelForSequenceClassification, AutoTokenizer

- model = AutoModelForSequenceClassification.from_pretrained("identrics/BG_propaganda_classifier", num_labels=5)
- tokenizer = AutoTokenizer.from_pretrained("identrics/EN_propaganda_classifier")

  tokens = tokenizer("Our country is the most powerful country in the world!", return_tensors="pt")
  output = model(**tokens)
@@ -92,14 +94,19 @@ print(output.logits)
  ## Training Details

- During the training stage, our objective is to train the multi-label classifier on different types of propaganda using a dataset that includes both real and artificially generated samples. In the case of English, there are 214 organic examples and 206 synthetic examples.
- The data is carefully classified by domain experts based on our predetermined taxonomy, which covers five primary classifications. Certain examples are classified under just one category, others have several different groups, highlighting the complex structure of propaganda, where multiple techniques can be found inside a single text.

- The model was then tested on a smaller evaluation dataset, achieving an F1 score of

- ## Citation

  If you find our work useful, please consider citing WASPer:
 
 
  ---
+ base_model: mistralai/Mistral-7B-v0.1
  library_name: peft
  license: apache-2.0
  language:
+ - en
  tags:
  - propaganda
  ---

+ # Model Card for identrics/wasper_propaganda_classifier_en

  ## Model Description

  - **Developed by:** [`Identrics`](https://identrics.ai/)
+ - **Language:** English
  - **License:** apache-2.0
+ - **Finetuned from model:** [`mistralai/Mistral-7B-v0.1`](https://huggingface.co/mistralai/Mistral-7B-v0.1)
  - **Context window :** 8192 tokens

  ## Model Description

+ This model is a fine-tuned version of mistralai/Mistral-7B-v0.1 for a propaganda detection task. It is effectively a multilabel classifier, determining whether a given English text contains any of five predefined propaganda types.
+
+ This model was created by [`Identrics`](https://identrics.ai/) within the scope of the WASPer project. The detailed taxonomy of the full pipeline can be found [here](https://github.com/Identrics/wasper/).

  ## Propaganda taxonomy

+ The propaganda techniques identifiable with this model are classified into five categories:

+ 1. **Self-Identification Techniques**:
  These techniques exploit the audience's feelings of association (or desire to be associated) with a larger group. They suggest that the audience should feel united, motivated, or threatened by the same factors that unite, motivate, or threaten that group.

+ 2. **Defamation Techniques**:
  These techniques represent direct or indirect attacks against an entity's reputation and worth.

+ 3. **Legitimisation Techniques**:
  These techniques attempt to prove and legitimise the propagandist's statements by using arguments that cannot be falsified because they are based on moral values or personal experiences.

+ 4. **Logical Fallacies**:
  These techniques appeal to the audience's reason and masquerade as objective and factual arguments, but in reality, they exploit distractions and flawed logic.

+ 5. **Rhetorical Devices**:
  These techniques seek to influence the audience and control the conversation by using linguistic methods.

  ```py
  from transformers import AutoModelForSequenceClassification, AutoTokenizer

+ model = AutoModelForSequenceClassification.from_pretrained("identrics/wasper_propaganda_classifier_en", num_labels=5)
+ tokenizer = AutoTokenizer.from_pretrained("identrics/wasper_propaganda_classifier_en")

  tokens = tokenizer("Our country is the most powerful country in the world!", return_tensors="pt")
  output = model(**tokens)
  print(output.logits)
  ```
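
Because this is a multilabel head, the five logits are independent per class, so they are typically mapped through a sigmoid (rather than a softmax) and thresholded per class. A minimal post-processing sketch, assuming the label order follows the taxonomy list above and using an assumed 0.5 threshold (the card does not publish an official id2label mapping or threshold):

```py
import torch

# Assumed label order, taken from the taxonomy section of this card,
# not from the model's config; treat as illustrative.
LABELS = [
    "Self-Identification Techniques",
    "Defamation Techniques",
    "Legitimisation Techniques",
    "Logical Fallacies",
    "Rhetorical Devices",
]

probs = torch.sigmoid(output.logits)[0]  # one probability per class
detected = [label for label, p in zip(LABELS, probs) if p.item() > 0.5]  # 0.5 is an assumed cut-off
print(detected)
```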
 
  ## Training Details

+ During the training stage, the objective was to train the multi-label classifier to identify different types of propaganda, using a dataset containing both real and artificially generated samples.
+
+ The data was carefully annotated by domain experts based on a predefined taxonomy covering five primary categories. Some examples are assigned to a single category, while others are classified into multiple categories, reflecting the nuanced nature of propaganda, where multiple techniques can be found within a single text.
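
Concretely, multi-category annotations of this kind are usually encoded as multi-hot target vectors. The sketch below is illustrative only, not the project's actual training code; it uses the `problem_type="multi_label_classification"` option that `transformers` classification heads support:

```py
import torch
from transformers import AutoModelForSequenceClassification

# Illustrative only: a text annotated as both "Defamation Techniques" (2)
# and "Logical Fallacies" (4) becomes a multi-hot vector over 5 classes.
target = torch.tensor([0.0, 1.0, 0.0, 1.0, 0.0])

# With problem_type="multi_label_classification", the head is trained with
# a per-class BCE-with-logits loss instead of a single cross-entropy.
model = AutoModelForSequenceClassification.from_pretrained(
    "mistralai/Mistral-7B-v0.1",  # base model named in this card
    num_labels=5,
    problem_type="multi_label_classification",
)
```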

+ The model reached a weighted F1 score of **0.464** during training.
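
For reference, a weighted F1 over multilabel outputs can be computed with scikit-learn. The data below is made up, since the card does not publish the evaluation split or predictions:

```py
import numpy as np
from sklearn.metrics import f1_score

# Toy multi-hot ground truth and predictions (2 samples, 5 classes).
y_true = np.array([[0, 1, 0, 1, 0],
                   [1, 0, 0, 0, 0]])
y_pred = np.array([[0, 1, 0, 0, 0],
                   [1, 0, 1, 0, 0]])

# average="weighted" averages per-class F1 scores, weighted by class support.
print(f1_score(y_true, y_pred, average="weighted", zero_division=0))
```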

+ ## Compute Infrastructure

+ This model was fine-tuned on **2× NVIDIA Tesla V100 32 GB** GPUs.

+ ## Citation [this section is to be updated soon]

  If you find our work useful, please consider citing WASPer: