Commit 304130b (verified) by Nikola299 · 1 Parent(s): da173ea

Update README.md

Files changed (1)
  1. README.md +7 -8
README.md CHANGED
@@ -23,8 +23,8 @@ tags:
 
 ## Model Description
 
-This model consists of a fine-tuned version of BgGPT-7B-Instruct-v0.2 for a propaganda detection task. It is effectively a multilabel classifier, determining wether a given propaganda text contains or not 5 predefined propaganda types.
-This model was created by [`Identrics`](https://identrics.ai/), in the scope of the Wasper project.
+This model is a fine-tuned version of BgGPT-7B-Instruct-v0.2 for a propaganda detection task. It is effectively a multilabel classifier, determining whether a given propaganda text in English contains any of five predefined propaganda types.
+This model was created by [`Identrics`](https://identrics.ai/) within the scope of the WASPer project.
 
 
 ## Propaganda taxonomy
@@ -52,7 +52,7 @@ These techniques seek to influence the audience and control the conversation by
 
 ## Uses
 
-To be used as a multilabel classifier to identify if the sample text contains one or more of the five propaganda techniques mentioned above.
+To be used as a multilabel classifier to identify whether the English sample text contains one or more of the five propaganda techniques mentioned above.
 
 ### Example
 
@@ -69,7 +69,7 @@ Then the model can be downloaded and used for inference:
 from transformers import AutoModelForSequenceClassification, AutoTokenizer
 
 model = AutoModelForSequenceClassification.from_pretrained("identrics/BG_propaganda_classifier", num_labels=5)
-tokenizer = AutoTokenizer.from_pretrained("identrics/BG_propaganda_classifier")
+tokenizer = AutoTokenizer.from_pretrained("identrics/EN_propaganda_classifier")
 
 tokens = tokenizer("Our country is the most powerful country in the world!", return_tensors="pt")
 output = model(**tokens)
@@ -91,12 +91,11 @@ print(output.logits)
 
 ## Training Details
 
-The training datasets for the model consist of a balanced set totaling 734 Bulgarian examples that include both propaganda and non-propaganda content. These examples are collected from a variety of traditional media and social media sources, ensuring a diverse range of content. Aditionally, the training dataset is enriched with AI-generated samples. The total distribution of the training data is shown in the table below:
-
-![image/png](https://cdn-uploads.huggingface.co/production/uploads/66741cdd8123010b8f63f965/71vN4yLV9vyA5Cqc_WRRD.png)
-
-The model was then tested on a smaller evaluation dataset, achieving an f1 score of 0.836. The evaluation dataset is distributed as such:
-![image/png](https://cdn-uploads.huggingface.co/production/uploads/66741cdd8123010b8f63f965/DunBsCJMZSFezNVB0Vo3a.png)
+During the training stage, our objective was to train the multilabel classifier on different types of propaganda using a dataset that includes both real and artificially generated samples. For English, there are 214 organic examples and 206 synthetic examples.
+The data was carefully labeled by domain experts according to our predetermined taxonomy, which covers five primary categories. Some examples fall under just one category, while others belong to several, highlighting the complex structure of propaganda, where multiple techniques can appear within a single text.
+
+
+The model was then tested on a smaller evaluation dataset, achieving an f1 score of
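The inference snippet in the README stops at printing raw logits. Since the model is a five-label multilabel classifier, each logit is typically passed through a sigmoid independently and thresholded to decide which propaganda types are present; the 0.5 cutoff below is an assumed default, not something the model card specifies. A minimal sketch using a dummy logits row in place of `output.logits`:

```python
import math

def multilabel_predict(logits, threshold=0.5):
    """Sigmoid each logit independently and threshold it.

    In a multilabel setup the labels are not mutually exclusive,
    so each of the five propaganda types gets its own decision.
    """
    return [1 if 1 / (1 + math.exp(-x)) > threshold else 0 for x in logits]

# Dummy values standing in for output.logits[0] (one score per type).
print(multilabel_predict([2.1, -1.3, 0.4, -0.7, -2.5]))  # → [1, 0, 1, 0, 0]
```

A softmax/argmax would instead force exactly one label per text, which would contradict the card's note that a single text can contain several techniques.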