  - name: Relative polarity precision
    type: Relative polarity precision
    value: 93.19%
---

This repository contains a pretrained model (and an easy-to-run wrapper for it) for structured sentiment analysis in Norwegian, pre-trained on the [NoReC dataset](https://huggingface.co/datasets/norec).
It implements the method described in "Direct parsing to sentiment graphs" (Samuel _et al._, ACL 2022). The main repository, which also contains the training scripts, can be found on the project [GitHub](https://github.com/jerbarnes/direct_parsing_to_sent_graph).
The model is also available as a [HF space](https://huggingface.co/spaces/ltg/ssa-perin).

The sentiment graph model is based on an underlying masked language model, [NorBERT 2](https://huggingface.co/ltg/norbert2).
The method proposes three ways to encode the sentiment graph: "node-centric", "labeled-edge", and "opinion-tuple".
The current model
- uses the "labeled-edge" graph encoding,
- does not use character-level embeddings,
- keeps all other hyperparameters at their [default values](https://github.com/jerbarnes/direct_parsing_to_sent_graph/blob/main/perin/config/edge_norec.yaml),

and it achieves the following results on the held-out set of the NoReC dataset:

| Unlabeled sentiment tuple F1 | Target F1 | Relative polarity precision |
|:----------------------------:|:---------:|:---------------------------:|
| 0.434                        | 0.541     | 0.926                       |


In "Word Substitution with Masked Language Models as Data Augmentation for Sentiment Analysis", we analyzed data augmentation strategies for improving the performance of the model. Using masked language modeling (MLM), we augmented the sentences with MLM-substituted words inside, outside, or both inside and outside the actual sentiment tuples. The results below show that augmentation can improve model performance. This repository, however, contains the original model trained without augmentation.

|                | Augmentation rate | Unlabeled sentiment tuple F1 | Target F1 | Relative polarity precision |
|----------------|-------------------|------------------------------|-----------|-----------------------------|
| Baseline       | 0%                | 43.39                        | 54.13     | 92.59                       |
| Outside        | 59%               | **45.08**                    | 56.18     | 92.95                       |
| Inside         | 9%                | 43.38                        | 55.62     | 92.49                       |
| Inside+Outside | 27%               | 44.12                        | **56.44** | **93.19**                   |

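The "outside" strategy can be illustrated with a small sketch. The helper below is hypothetical (it is not part of this repository): it finds words whose character spans fall outside all protected sentiment-tuple offsets and replaces a fraction of them with a mask token, which a fill-mask model would then substitute to produce an augmented sentence.

```python
import random

def mask_outside(text, protected_offsets, rate=0.5, mask_token="[MASK]", seed=0):
    """Replace a fraction of the words lying outside all protected
    'start:end' character spans with `mask_token`.

    Hypothetical helper for illustration only; the actual augmentation
    pipeline is described in the paper referenced above.
    """
    protected = [tuple(map(int, o.split(":"))) for o in protected_offsets]
    rng = random.Random(seed)
    out, pos = [], 0
    for word in text.split():
        start = text.index(word, pos)  # character span of this word
        end = start + len(word)
        pos = end
        overlaps = any(s < end and start < e for s, e in protected)
        if not overlaps and rng.random() < rate:
            out.append(mask_token)     # candidate for MLM substitution
        else:
            out.append(word)
    return " ".join(out)
```

A full augmentation pipeline would then run a masked language model (e.g. NorBERT 2) over the masked sentences and keep the substituted variants as extra training data.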


The model can be used to predict sentiment tuples as follows:

```python
>>> import model_wrapper
>>> model = model_wrapper.PredictionModel()
>>> model.predict(['vi liker svart kaffe'])
[{'sent_id': '0',
  'text': 'vi liker svart kaffe',
  'opinions': [{'Source': [['vi'], ['0:2']],
    'Target': [['svart', 'kaffe'], ['9:14', '15:20']],
    'Polar_expression': [['liker'], ['3:8']],
    'Polarity': 'Positive'}]}]
```
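Each opinion pairs the extracted tokens with `'start:end'` character offsets into `text`. A small hypothetical helper (not part of `model_wrapper`) shows how to recover the spans by slicing the sentence with those offsets:

```python
def spans_from_offsets(text, offsets):
    """Slice `text` with 'start:end' offset strings, e.g. '9:14' -> text[9:14].

    Hypothetical helper for working with the output format shown above.
    """
    out = []
    for off in offsets:
        start, end = map(int, off.split(":"))
        out.append(text[start:end])
    return out

# The prediction from the example above, reproduced as a literal:
prediction = {
    "sent_id": "0",
    "text": "vi liker svart kaffe",
    "opinions": [{
        "Source": [["vi"], ["0:2"]],
        "Target": [["svart", "kaffe"], ["9:14", "15:20"]],
        "Polar_expression": [["liker"], ["3:8"]],
        "Polarity": "Positive",
    }],
}

# The offsets are consistent with the token lists:
for opinion in prediction["opinions"]:
    for key in ("Source", "Target", "Polar_expression"):
        tokens, offsets = opinion[key]
        assert spans_from_offsets(prediction["text"], offsets) == tokens
```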