r1

Files changed (7)

README.md +70 -0
config.json +35 -0
pytorch_model.bin +3 -0
special_tokens_map.json +1 -0
tf_model.h5 +3 -0
tokenizer_config.json +1 -0
vocab.txt +0 -0

README.md ADDED

	@@ -0,0 +1,70 @@

+---
+language: sv
+---
+## Swedish BERT models for sentiment analysis
+[Recorded Future](https://www.recordedfuture.com/), together with [AI Sweden](https://www.ai.se/en), releases two language models for sentiment analysis in Swedish. Both models are based on the [KB/bert-base-swedish-cased](https://huggingface.co/KB/bert-base-swedish-cased) model and have been fine-tuned to solve a multi-label sentiment analysis task.
+One model has been fine-tuned for the sentiment fear and the other for violence. Each model outputs three floats corresponding to the labels "Negative", "Weak sentiment", and "Strong Sentiment" at indices 0, 1, and 2, respectively.
+The models have been trained on Swedish data with a conversational focus, collected from various internet sources and forums.
+The models are trained only on Swedish data and only support inference on Swedish input texts. Inference metrics for non-Swedish inputs are not defined; such inputs are considered out-of-domain data.
+The current models are supported with Transformers version >= 4.3.3 and Torch version 1.8.0; compatibility with older versions has not been verified.
+### Swedish-Sentiment-Fear
+The model can be imported from the transformers library by running:
+    from transformers import BertForSequenceClassification, BertTokenizerFast
+    # Load the tokenizer and the fine-tuned sequence classifier from the Hub
+    tokenizer = BertTokenizerFast.from_pretrained("fredrikmollerRF/Swedish-Sentiment-Fear")
+    classifier_fear = BertForSequenceClassification.from_pretrained("fredrikmollerRF/Swedish-Sentiment-Fear")
+Once the model and tokenizer are initialized, the model can be used for inference.
+#### Sentiment definitions
+#### The strong sentiment includes but is not limited to
+Texts that:
+- Hold an expressive emphasis on fear and/or anxiety
+#### The weak sentiment includes but is not limited to
+Texts that:
+- Express fear and/or anxiety in a neutral way
+#### Verification metrics
+During training, the model's validation metrics peaked at the following classification breakpoint.
+| Classification Breakpoint | F-score | Precision | Recall |
+|:-------------------------:|:-------:|:---------:|:------:|
+|            0.45           |  0.8754 |   0.8618  | 0.8895 |
+### Swedish-Sentiment-Violence
+The model can be imported from the transformers library by running:
+    from transformers import BertForSequenceClassification, BertTokenizerFast
+    # Load the tokenizer and the fine-tuned sequence classifier from the Hub
+    tokenizer = BertTokenizerFast.from_pretrained("fredrikmollerRF/Swedish-Sentiment-Violence")
+    classifier_violence = BertForSequenceClassification.from_pretrained("fredrikmollerRF/Swedish-Sentiment-Violence")
+Once the model and tokenizer are initialized, the model can be used for inference.
+#### Sentiment definitions
+#### The strong sentiment includes but is not limited to
+Texts that:
+- Reference highly violent acts
+- Hold an aggressive tone
+#### The weak sentiment includes but is not limited to
+Texts that:
+- Include general violent statements that do not fall under the strong sentiment
+#### Verification metrics
+During training, the model's validation metrics peaked at the following classification breakpoint.
+| Classification Breakpoint | F-score | Precision | Recall |
+|:-------------------------:|:-------:|:---------:|:------:|
+|            0.35           |  0.7677 |   0.7456  |  0.791 |
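
The README states that each model returns three floats in the order "Negative", "Weak sentiment", "Strong Sentiment", and reports one classification breakpoint per model. The following is a minimal sketch of how such an output might be interpreted. It assumes the scores already lie in [0, 1] and that the breakpoint is applied to each label independently; neither detail is confirmed by the README, and `interpret_scores`, `LABELS`, and `BREAKPOINTS` are illustrative names that are not part of the released models.

```python
# Label order as stated in the README (indices 0, 1, 2).
LABELS = ("Negative", "Weak sentiment", "Strong Sentiment")

# Classification breakpoints from the verification tables in the README.
BREAKPOINTS = {"fear": 0.45, "violence": 0.35}


def interpret_scores(scores, threshold):
    """Pair each score with its label and flag scores at or above the threshold.

    Assumption: `scores` are already in [0, 1] and the threshold is applied
    to every label independently (multi-label reading of the README).
    """
    if len(scores) != len(LABELS):
        raise ValueError(f"expected {len(LABELS)} scores, got {len(scores)}")
    return {
        label: (score, score >= threshold)
        for label, score in zip(LABELS, scores)
    }


# Hypothetical scores for illustration only, not real model output.
example = interpret_scores([0.10, 0.30, 0.62], BREAKPOINTS["fear"])
```

If the models return raw logits rather than probabilities, a sigmoid would have to be applied before thresholding; the README does not say which is the case.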

config.json ADDED

	@@ -0,0 +1,35 @@

+{
+  "_name_or_path": "fredrikmollerRF/Swedish-Sentiment-Violence",
+  "architectures": [
+    "BertForSequenceClassification"
+  ],
+  "attention_probs_dropout_prob": 0.1,
+  "gradient_checkpointing": false,
+  "hidden_act": "gelu",
+  "hidden_dropout_prob": 0.1,
+  "hidden_size": 768,
+  "id2label": {
+    "0": "LABEL_0",
+    "1": "LABEL_1",
+    "2": "LABEL_2"
+  },
+  "initializer_range": 0.02,
+  "intermediate_size": 3072,
+  "label2id": {
+    "LABEL_0": 0,
+    "LABEL_1": 1,
+    "LABEL_2": 2
+  },
+  "layer_norm_eps": 1e-12,
+  "max_position_embeddings": 512,
+  "model_type": "bert",
+  "num_attention_heads": 12,
+  "num_hidden_layers": 12,
+  "output_past": true,
+  "pad_token_id": 0,
+  "position_embedding_type": "absolute",
+  "transformers_version": "4.3.3",
+  "type_vocab_size": 2,
+  "use_cache": true,
+  "vocab_size": 50325
+}
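
The committed config keeps the generic label names `LABEL_0` to `LABEL_2`. As a sketch, and assuming the index order given in the README, the mapping could be made readable as shown below. `CONFIG_EXCERPT` reproduces only the fields used here, and `README_LABELS` is a hypothetical relabelling that is not part of this commit.

```python
import json

# Excerpt of config.json from this commit (only the fields used here).
CONFIG_EXCERPT = """
{
  "id2label": {"0": "LABEL_0", "1": "LABEL_1", "2": "LABEL_2"},
  "label2id": {"LABEL_0": 0, "LABEL_1": 1, "LABEL_2": 2}
}
"""

# Label names in the index order stated in the README (assumed mapping).
README_LABELS = ["Negative", "Weak sentiment", "Strong Sentiment"]

config = json.loads(CONFIG_EXCERPT)

# Replace the generic names in both directions so the two maps stay consistent.
config["id2label"] = {str(i): name for i, name in enumerate(README_LABELS)}
config["label2id"] = {name: i for i, name in enumerate(README_LABELS)}
```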

pytorch_model.bin ADDED

	@@ -0,0 +1,3 @@

+version https://git-lfs.github.com/spec/v1
+oid sha256:70569a503ae2ea6655de1a6155670302623e61d39d73e9a9f9133deb0ea63d7c
+size 498860293

special_tokens_map.json ADDED

	@@ -0,0 +1 @@


+ {"unk_token": "[UNK]", "sep_token": "[SEP]", "pad_token": "[PAD]", "cls_token": "[CLS]", "mask_token": "[MASK]"}

tf_model.h5 ADDED

	@@ -0,0 +1,3 @@

+version https://git-lfs.github.com/spec/v1
+oid sha256:ccd15274620a8b45ea3bb872927493e94928e7635325d9a817f2271c87ad8502
+size 499044084

tokenizer_config.json ADDED

	@@ -0,0 +1 @@

+ {"do_lower_case": false, "unk_token": "[UNK]", "sep_token": "[SEP]", "pad_token": "[PAD]", "cls_token": "[CLS]", "mask_token": "[MASK]", "tokenize_chinese_chars": true, "strip_accents": false, "special_tokens_map_file": "C:\\Users\\Fredrik Möller/.cache\\huggingface\\transformers\\37f2eab7cd9b3716ce0160ea9562138ae9247fb3ea61a2fd0190b16d0970444e.dd8bd9bfd3664b530ea4e645105f557769387b3da9f79bdb55ed556bdd80611d", "name_or_path": "KB/bert-base-swedish-cased", "do_basic_tokenize": true, "never_split": null}

vocab.txt ADDED

The diff for this file is too large to render.