cornelius committed on
Commit 5094afd
1 Parent(s): 8beacc4

Upload TFBertForSequenceClassification

Files changed (3):
  1. README.md +23 -75
  2. config.json +1 -1
  3. tf_model.h5 +3 -0
README.md CHANGED
@@ -1,100 +1,48 @@
  ---
  license: cc-by-sa-4.0
- language:
- - de
- - en
- - es
- - da
- - pl
- - sv
- - nl
- metrics:
- - accuracy
- pipeline_tag: text-classification
  tags:
- - partypress
- - political science
- - parties
- - press releases
  ---

- *currently the model only works on German texts*

- # PARTYPRESS multilingual

- Fine-tuned model in seven languages on texts from nine countries, based on [bert-base-multilingual-cased](https://huggingface.co/bert-base-multilingual-cased). It is used in Erfort et al. (2023).

  ## Model description

- The PARTYPRESS multilingual model builds on [bert-base-multilingual-cased](https://huggingface.co/bert-base-multilingual-cased) but has a supervised component. That is, it was fine-tuned using texts labeled by humans. The labels indicate 23 different political issue categories derived from the Comparative Agendas Project (CAP).
-
- ## Model variations
-
- We plan to release monolingual models for each of the languages covered by this multilingual model.

  ## Intended uses & limitations

- The main use of the model is text classification of press releases from political parties. It may also be useful for other political texts.
-
- ### How to use
-
- This model can be used directly with a pipeline for text classification:
-
- ```python
- >>> from transformers import pipeline
- >>> partypress = pipeline("text-classification", model="cornelius/partypress-multilingual", tokenizer="cornelius/partypress-multilingual")
- >>> partypress("We urgently need to fight climate change and reduce carbon emissions. This is what our party stands for.")
- ```
-
- ### Limitations and bias
-
- The model was trained with data from parties in nine countries. For use in other countries, the model may be further fine-tuned; without further fine-tuning, its performance may be lower.
-
- The model may make biased predictions. We discuss some biases by country, party, and over time in the release paper for the PARTYPRESS database.
-
- ## Training data
-
- The PARTYPRESS multilingual model was fine-tuned with 27,243 press releases in seven languages from 68 European parties in nine countries. The press releases were labeled by two expert human coders per country.
-
- For the training data of the underlying model, please refer to [bert-base-multilingual-cased](https://huggingface.co/bert-base-multilingual-cased).

  ## Training procedure

- ### Preprocessing
-
- For the preprocessing, please refer to [bert-base-multilingual-cased](https://huggingface.co/bert-base-multilingual-cased).
-
- ### Pretraining
-
- For the pretraining, please refer to [bert-base-multilingual-cased](https://huggingface.co/bert-base-multilingual-cased).
-
- ### Fine-tuning
-
- ## Evaluation results
-
- Fine-tuned on our downstream task, this model achieves the following results in a five-fold cross-validation:
-
- | Accuracy | Precision | Recall | F1 score |
- |:--------:|:---------:|:------:|:--------:|
- | 69.52    | 67.99     | 67.60  | 66.77    |
-
- ### BibTeX entry and citation info
-
- ```bibtex
- @article{erfort_partypress_2023,
-   author  = {Cornelius Erfort and Lukas F. Stoetzer and Heike Klüver},
-   title   = {The PARTYPRESS Database: A New Comparative Database of Parties’ Press Releases},
-   journal = {Research and Politics},
-   volume  = {forthcoming},
-   year    = {2023},
- }
- ```
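The removed evaluation table reports accuracy, precision, recall, and F1 from five-fold cross-validation. As a minimal pure-Python sketch, this is one common way such multi-class scores are computed (macro averaging; the exact averaging scheme used by the authors is not stated here, and `macro_scores` is an illustrative helper, not their code):

```python
def macro_scores(y_true, y_pred):
    """Macro-averaged precision, recall, and F1 over all observed labels.

    Each label's scores are computed one-vs-rest, then averaged with
    equal weight per label (macro averaging).
    """
    labels = sorted(set(y_true) | set(y_pred))
    precisions, recalls, f1s = [], [], []
    for lab in labels:
        tp = sum(t == lab and p == lab for t, p in zip(y_true, y_pred))
        fp = sum(t != lab and p == lab for t, p in zip(y_true, y_pred))
        fn = sum(t == lab and p != lab for t, p in zip(y_true, y_pred))
        prec = tp / (tp + fp) if tp + fp else 0.0
        rec = tp / (tp + fn) if tp + fn else 0.0
        f1 = 2 * prec * rec / (prec + rec) if prec + rec else 0.0
        precisions.append(prec)
        recalls.append(rec)
        f1s.append(f1)
    n = len(labels)
    return sum(precisions) / n, sum(recalls) / n, sum(f1s) / n
```

Because macro averaging weights each of the 23 issue categories equally, the macro F1 can sit below accuracy when rare categories are predicted less reliably, which is consistent with the pattern in the table.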
 
 
 
 
 
 
  ---
  license: cc-by-sa-4.0
  tags:
+ - generated_from_keras_callback
+ model-index:
+ - name: partypress-multilingual
+   results: []
  ---

+ <!-- This model card has been generated automatically according to the information Keras had access to. You should
+ probably proofread and complete it, then remove this comment. -->

+ # partypress-multilingual

+ This model is a fine-tuned version of [cornelius/partypress-multilingual](https://huggingface.co/cornelius/partypress-multilingual) on an unknown dataset.
+ It achieves the following results on the evaluation set:

  ## Model description

+ More information needed

  ## Intended uses & limitations

+ More information needed

+ ## Training and evaluation data

+ More information needed

  ## Training procedure

+ ### Training hyperparameters

+ The following hyperparameters were used during training:
+ - optimizer: None
+ - training_precision: float32

+ ### Training results

+ ### Framework versions

+ - Transformers 4.28.0
+ - TensorFlow 2.12.0
+ - Datasets 2.12.0
+ - Tokenizers 0.13.3
config.json CHANGED
@@ -1,5 +1,5 @@
  {
-   "_name_or_path": "bert-base-multilingual-cased",
+   "_name_or_path": "cornelius/partypress-multilingual",
    "architectures": [
      "BertForSequenceClassification"
    ],
tf_model.h5 ADDED
@@ -0,0 +1,3 @@
+ version https://git-lfs.github.com/spec/v1
+ oid sha256:7b54a95666573fd9b29e4d7c1e6ec4bcc92a43a078357a5d1ca6f0fa4b1f6d4f
+ size 711772524
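The three added lines are not the weights themselves but a Git LFS pointer file: key-value lines naming the spec version, the SHA-256 object ID, and the byte size of the real `tf_model.h5`. A minimal sketch of reading such a pointer (`parse_lfs_pointer` is a hypothetical helper for illustration, not part of this repo or of Git LFS):

```python
# Example pointer text, copied verbatim from the diff above.
POINTER = """\
version https://git-lfs.github.com/spec/v1
oid sha256:7b54a95666573fd9b29e4d7c1e6ec4bcc92a43a078357a5d1ca6f0fa4b1f6d4f
size 711772524
"""

def parse_lfs_pointer(text: str) -> dict:
    """Split each 'key value' line of a Git LFS pointer into a dict."""
    fields = {}
    for line in text.splitlines():
        key, _, value = line.partition(" ")
        fields[key] = value
    return fields

pointer = parse_lfs_pointer(POINTER)
size_mib = int(pointer["size"]) / (1024 * 1024)  # ~679 MiB of TF weights
```

When cloning without LFS support, only this small pointer file is checked out; the 711,772,524-byte HDF5 file is fetched separately by the LFS client.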