---
language:
- en
tags:
- formality
datasets:
- GYAFC
- Pavlick-Tetreault-2016
license: cc-by-nc-sa-4.0
---

This model is trained to predict whether an English sentence is formal or informal.

Base model: `roberta-base`

Datasets: [GYAFC](https://github.com/raosudha89/GYAFC-corpus) from [Rao and Tetreault, 2018](https://aclanthology.org/N18-1012) and the [online formality corpus](http://www.seas.upenn.edu/~nlp/resources/formality-corpus.tgz) from [Pavlick and Tetreault, 2016](https://aclanthology.org/Q16-1005).

Data augmentation: converting texts to upper or lower case, removing all punctuation, and adding a period at the end of a sentence. It was applied because the model otherwise over-relies on punctuation and capitalization and pays too little attention to other features.
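
The augmentations listed above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the helper names, the probability `p`, and the rule of skipping the final period when the text already ends in `.`, `!` or `?` are hypothetical and are not taken from the training code.

```python
import random
import string


def remove_punctuation(text: str) -> str:
    """Delete every ASCII punctuation character from the text."""
    return text.translate(str.maketrans("", "", string.punctuation))


def add_final_dot(text: str) -> str:
    """Append a period, unless the text already ends in '.', '!' or '?' (assumed rule)."""
    stripped = text.rstrip()
    if stripped and stripped[-1] not in ".!?":
        return stripped + "."
    return stripped


# The four operations named in the model card: upper case, lower case,
# punctuation removal, and a final period.
AUGMENTATIONS = [str.upper, str.lower, remove_punctuation, add_final_dot]


def augment(text: str, rng: random.Random, p: float = 0.5) -> str:
    """With probability p (an assumed value), apply one randomly chosen augmentation."""
    if rng.random() < p:
        return rng.choice(AUGMENTATIONS)(text)
    return text
```

Applying `augment` to each training sentence leaves the label unchanged while varying the surface cues (case and punctuation) that the classifier might otherwise lean on.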

Loss: binary classification (on GYAFC) and in-batch ranking (on the Pavlick and Tetreault data).
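
A minimal sketch of the two objectives, in plain Python for readability. The model card does not give the exact formulation, so the pairwise logistic form of the ranking loss, the function names, and the convention that label 1 means "formal" are all assumptions.

```python
import math


def softplus(z: float) -> float:
    """Numerically stable log(1 + exp(z))."""
    return max(z, 0.0) + math.log1p(math.exp(-abs(z)))


def binary_cross_entropy(logit: float, label: int) -> float:
    """Binary classification loss for one example (label 1 = formal, assumed convention)."""
    # -log(sigmoid(logit)) for label 1, -log(1 - sigmoid(logit)) for label 0.
    return softplus(-logit) if label == 1 else softplus(logit)


def in_batch_ranking_loss(scores, targets):
    """Mean pairwise logistic loss over in-batch pairs whose gold scores differ."""
    losses = []
    for i in range(len(scores)):
        for j in range(len(scores)):
            if targets[i] > targets[j]:
                # Item i is rated more formal than item j, so the loss is small
                # when the model scores i well above j.
                losses.append(softplus(-(scores[i] - scores[j])))
    return sum(losses) / len(losses) if losses else 0.0
```

A ranking objective of this kind only needs the relative order of items within a batch, which suits data annotated with graded formality ratings; the classification loss fits data with binary formal/informal labels.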

Performance metrics on the test data:

| dataset                                      | ROC AUC | precision | recall | F-score | accuracy | Spearman |
|----------------------------------------------|---------|-----------|--------|---------|----------|----------|
| GYAFC                                        | 0.9779  | 0.90      | 0.91   | 0.90    | 0.9087   | 0.8233   |
| GYAFC normalized (lowercase + remove punct.) | 0.9234  | 0.85      | 0.81   | 0.82    | 0.8218   | 0.7294   |

| P&T subset | Spearman R |
| - | - |
| news | 0.4003 |
| answers | 0.7500 |
| blog | 0.7334 |
| email | 0.7606 |
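
For readers unfamiliar with the Spearman figures reported above, the following sketch shows how such a rank correlation between predicted scores and gold ratings can be computed. It is a generic textbook implementation, not the evaluation script used for this model, and the function names are hypothetical.

```python
import math


def average_ranks(values):
    """Return 1-based ranks, giving tied values the mean of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    start = 0
    while start < len(order):
        end = start
        # Extend the group while the next value equals the current one.
        while end + 1 < len(order) and values[order[end + 1]] == values[order[start]]:
            end += 1
        mean_rank = (start + end) / 2 + 1
        for k in range(start, end + 1):
            ranks[order[k]] = mean_rank
        start = end + 1
    return ranks


def spearman(x, y):
    """Spearman correlation: the Pearson correlation of the two rank vectors."""
    rx, ry = average_ranks(x), average_ranks(y)
    n = len(rx)
    mx, my = sum(rx) / n, sum(ry) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(rx, ry))
    var_x = sum((a - mx) ** 2 for a in rx)
    var_y = sum((b - my) ** 2 for b in ry)
    # Undefined (division by zero) when either input is constant.
    return cov / math.sqrt(var_x * var_y)
```

Because only the ordering of the scores matters, the metric is insensitive to the scale of the model's output and can be compared across subsets with different rating distributions.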

## Citation

If you use the model in your research, please cite the [paper](https://doi.org/10.1007/978-3-031-35320-8_4) in which it was introduced:

```
@InProceedings{10.1007/978-3-031-35320-8_4,
  author="Babakov, Nikolay
    and Dale, David
    and Gusev, Ilya
    and Krotova, Irina
    and Panchenko, Alexander",
  editor="M{\'e}tais, Elisabeth
    and Meziane, Farid
    and Sugumaran, Vijayan
    and Manning, Warren
    and Reiff-Marganiec, Stephan",
  title="Don't Lose the Message While Paraphrasing: A Study on Content Preserving Style Transfer",
  booktitle="Natural Language Processing and Information Systems",
  year="2023",
  publisher="Springer Nature Switzerland",
  address="Cham",
  pages="47--61",
  isbn="978-3-031-35320-8"
}
```

## Licensing Information

[Creative Commons Attribution-NonCommercial-ShareAlike 4.0 International License][cc-by-nc-sa].

[![CC BY-NC-SA 4.0][cc-by-nc-sa-image]][cc-by-nc-sa]

[cc-by-nc-sa]: http://creativecommons.org/licenses/by-nc-sa/4.0/
[cc-by-nc-sa-image]: https://i.creativecommons.org/l/by-nc-sa/4.0/88x31.png