---
license: mit
language:
- nl
base_model:
- pdelobelle/robbert-v2-dutch-base
pipeline_tag: text-classification
tags:
- Robbert
- Angry
- finetune
---
# Model Card for AngryBERT
<!-- Provide a quick summary of what the model is/does. -->
This model is a fine-tuning of [pdelobelle/robbert-v2-dutch-base](https://huggingface.co/pdelobelle/robbert-v2-dutch-base) for the classification of text as angry or non-angry.
## Model Details
### Model Description
This model is a fine-tuning of [pdelobelle/robbert-v2-dutch-base](https://huggingface.co/pdelobelle/robbert-v2-dutch-base) on a selection of paragraphs mined from the Dutch novel *Ik ga leven* by Lale Gül (Lale Gül, *Ik ga leven*. 2021. Amsterdam: Prometheus. ISBN 978-9044646870. An English translation of the novel exists: Lale Gül, *I Will Live*. 2023. London: Little, Brown Book Group. ISBN 978-1408716809). The model is intended to classify sentences and paragraphs of the book as angry or non-angry. A selection of 55 paragraphs was annotated for angriness by two independent annotators (Cohen's kappa of 0.48).
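For readers unfamiliar with the inter-annotator agreement figure above, the following sketch shows how Cohen's kappa is computed for two annotators' binary angry (1) / non-angry (0) labels. The label sequences below are made up for illustration; they are not the actual annotations behind the reported kappa of 0.48.

```python
# Illustration only: Cohen's kappa for two annotators' binary labels.
# The label values below are hypothetical, not the project's data.
from collections import Counter

def cohens_kappa(a, b):
    """Cohen's kappa for two equally long label sequences."""
    assert len(a) == len(b)
    n = len(a)
    # Observed agreement: fraction of items both annotators label identically
    po = sum(x == y for x, y in zip(a, b)) / n
    # Expected chance agreement, from each annotator's marginal label frequencies
    ca, cb = Counter(a), Counter(b)
    pe = sum(ca[lab] * cb[lab] for lab in set(a) | set(b)) / (n * n)
    return (po - pe) / (1 - pe)

annotator_1 = [1, 1, 0, 0, 1]
annotator_2 = [1, 0, 0, 0, 1]
print(round(cohens_kappa(annotator_1, annotator_2), 4))  # → 0.6154
```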
<!-- Provide a longer summary of what this model is. -->
- **Developed by:** Joris J. van Zundert and Julia Neugarten
- **Funded by [optional]:** Huygens Institute
- **Shared by [optional]:** [More Information Needed]
- **Model type:** text classification
- **Language(s) (NLP):** Dutch
- **License:** MIT
- **Finetuned from model [optional]:** robbert-v2-dutch-base
## Uses
<!-- Address questions around how the model is intended to be used, including the foreseeable users of the model and those affected by the model. -->
This model should really **only** be used in the context of research on the full text of the Dutch version of Lale Gül's *Ik ga leven*. Any other application is discouraged, as the model has only been fine-tuned on this specific novel. Results obtained with this model in any other context should be treated with the greatest care and skepticism.
## Bias, Risks, and Limitations
<!-- This section is meant to convey both technical and sociotechnical limitations. -->
The model is biased towards the language of Lale Gül in her novel "Ik ga leven". This may include skew towards explicit and aggressive language.
### Recommendations
<!-- This section is meant to convey recommendations with respect to the bias, risk, and technical limitations. -->
This model should really **only** be used in the context of research on the full text of the Dutch version of Lale Gül's *Ik ga leven*. Any other application is discouraged, as the model has only been fine-tuned on this specific novel. Results obtained with this model in any other context should be treated with the greatest care and skepticism.
## How to Get Started with the Model
Use the code below to get started with the model.
```python
from transformers import RobertaTokenizer, RobertaForSequenceClassification
from transformers import TextClassificationPipeline
model = RobertaForSequenceClassification.from_pretrained( "./model/angryBERT-v1" )
tokenizer = RobertaTokenizer.from_pretrained( "./model/angryBERT-v1" )
# Quick check that the model works
# LABEL_1 means angry
# LABEL_0 means non-angry
input_text = "Ik was kwaad." # en.: "I was angry."
pipe = TextClassificationPipeline(model=model, tokenizer=tokenizer, return_all_scores=True)
pipe( input_text )
# =>
# [[{'label': 'LABEL_0', 'score': 0.026506226509809494},
# {'label': 'LABEL_1', 'score': 0.9734938144683838}]]
```
## Training Details
### Training Data
<!-- This should link to a Dataset Card, perhaps with a short stub of information on what the training data is all about as well as documentation related to data pre-processing or additional filtering. -->
All paragraphs of Lale Gül's Dutch novel *Ik ga leven*. Paratext (copyright page, title page, etc.) was removed, as was the section of poems at the back of the book.
### Training Procedure
<!-- This relates heavily to the Technical Specifications. Content here should link to that section when it is relevant to the training procedure. -->
Trained on 55 paragraphs labeled as either angry (1) or non-angry (0).
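The card does not document how the 55 labeled paragraphs were divided between training and evaluation. As a hypothetical illustration of one reasonable approach for such a small, imbalanced set, the sketch below performs a stratified shuffle split that preserves the label balance; the class counts used are invented, not the project's actual figures.

```python
# Hypothetical sketch: a stratified train/eval split over 55 labeled
# paragraphs. The actual split used for AngryBERT is not documented.
import random

def stratified_split(labels, eval_frac=0.2, seed=42):
    """Return (train_idx, eval_idx), keeping label proportions similar."""
    rng = random.Random(seed)
    by_label = {}
    for i, lab in enumerate(labels):
        by_label.setdefault(lab, []).append(i)
    train_idx, eval_idx = [], []
    for lab, idxs in by_label.items():
        rng.shuffle(idxs)
        # Hold out eval_frac of each class (at least one example per class)
        n_eval = max(1, round(len(idxs) * eval_frac))
        eval_idx.extend(idxs[:n_eval])
        train_idx.extend(idxs[n_eval:])
    return sorted(train_idx), sorted(eval_idx)

labels = [1] * 25 + [0] * 30  # invented counts: 25 angry, 30 non-angry
train_idx, eval_idx = stratified_split(labels)
print(len(train_idx), len(eval_idx))  # → 44 11
```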
## Model Card Authors [optional]
Joris J. van Zundert, Julia Neugarten
## Model Card Contact
[Joris J. van Zundert](https://huggingface.co/jorisvanzundert)