---
license: mit
datasets:
- abullard1/steam-reviews-constructiveness-binary-label-annotations-1.5k
language:
- en
base_model: albert/albert-base-v2
pipeline_tag: text-classification
library_name: transformers
tags:
- steam-reviews
- BERT
- albert-base-v2
- text-classification
- sentiment-analysis
- constructiveness
- gaming
- fine-tuned
model-index:
- name: albert-v2-steam-review-constructiveness-classifier
  results:
  - task:
      type: text-classification
    dataset:
      name: abullard1/steam-reviews-constructiveness-binary-label-annotations-1.5k
      type: abullard1/steam-reviews-constructiveness-binary-label-annotations-1.5k
    metrics:
    - name: Accuracy
      type: accuracy
      value: 0.796
    - name: Precision
      type: precision
      value: 0.800
    - name: Recall
      type: recall
      value: 0.818
    - name: F1-score
      type: f1
      value: 0.794
---
# Fine-tuned ALBERT Model for Constructiveness Detection in Steam Reviews
## Model Summary
This model is a fine-tuned version of **albert-base-v2** that classifies Steam game reviews as constructive or non-constructive. It was trained on the [1.5K Steam Reviews Binary Labeled for Constructiveness dataset](https://huggingface.co/datasets/abullard1/steam-reviews-constructiveness-binary-label-annotations-1.5k), which consists of user-generated game reviews (along with other features) annotated with binary labels (`1` for constructive, `0` for non-constructive).
The dataset's features were concatenated into strings with the following format: "Review: **{review}**, Playtime: **{author_playtime_at_review}**, Voted Up: **{voted_up}**, Upvotes: **{votes_up}**, Votes Funny: **{votes_funny}**" and then fed to the model together with the respective ***constructive*** labels. Concatenating the features into a single string offers a good trade-off between complexity and performance compared to other options.
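The input format described above can be reproduced with a small helper. This is a minimal sketch, not the author's preprocessing code: the function name `format_review_input` is hypothetical, and it assumes each feature is rendered with Python's default string conversion (for example `True`/`False` for `voted_up`), which the card does not confirm.

```python
def format_review_input(review, author_playtime_at_review, voted_up, votes_up, votes_funny):
    """Build the single-string model input from one dataset row.

    Assumption: values are rendered with Python's default str() conversion.
    """
    return (
        f"Review: {review}, "
        f"Playtime: {author_playtime_at_review}, "
        f"Voted Up: {voted_up}, "
        f"Upvotes: {votes_up}, "
        f"Votes Funny: {votes_funny}"
    )


# Illustrative values only; they are not taken from the dataset.
example = format_review_input(
    review="Great combat, but the camera clips through walls in tight rooms.",
    author_playtime_at_review=120,
    voted_up=True,
    votes_up=5,
    votes_funny=0,
)
print(example)
```

Inputs passed to the model at inference time should follow the same field order and labels as the training strings.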
### Intended Use
The model can be applied in any scenario where it's important to distinguish between helpful and unhelpful textual feedback, particularly in the context of gaming communities or online reviews. Potential use cases are platforms like **Steam**, **Discord**, or any community-driven feedback systems where understanding the quality of feedback is critical.
### Limitations
The model may be less effective in domains outside of gaming, as it was trained specifically on Steam reviews. Additionally, a slightly **imbalanced dataset** was used for training (approximately 63% non-constructive, 37% constructive).
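To put the class imbalance in context, the sketch below computes the accuracy of a trivial classifier that always predicts the majority class. It assumes the approximate 63/37 shares quoted above also hold for the data being scored, which the card does not state for the test split; the helper name `majority_baseline_accuracy` is hypothetical.

```python
def majority_baseline_accuracy(class_shares):
    """Accuracy obtained by always predicting the most frequent class."""
    return max(class_shares.values())


# Approximate class shares quoted in this card.
shares = {"non-constructive": 0.63, "constructive": 0.37}
baseline = majority_baseline_accuracy(shares)
print(f"Majority-class baseline accuracy: {baseline:.2f}")
```

Under that assumption, any reported accuracy should be read against this baseline rather than against 50%.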
## Evaluation Results
The model was trained and evaluated using an 80/10/10 Train/Dev/Test split, achieving the following performance metrics during evaluation using the test set:
- **Accuracy**: 0.80
- **Precision**: 0.80
- **Recall**: 0.82
- **F1-score**: 0.79
These results indicate that the model performs reasonably well, assigning the correct label to roughly 80% of the test reviews.
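For reference, the four metrics can be computed from true and predicted labels as sketched below. This is an illustrative implementation of the plain binary variants with `1` (constructive) as the positive class; the card does not state which averaging scheme produced the reported numbers, so they may have been computed differently. The function name `binary_metrics` and the toy labels are assumptions made for the example.

```python
def binary_metrics(y_true, y_pred, positive=1):
    """Compute accuracy, precision, recall and F1 for a binary task."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == positive and p == positive)
    fp = sum(1 for t, p in pairs if t != positive and p == positive)
    fn = sum(1 for t, p in pairs if t == positive and p != positive)
    tn = sum(1 for t, p in pairs if t != positive and p != positive)

    accuracy = (tp + tn) / len(pairs)
    # Guard against division by zero when a class is never predicted or never present.
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    f1 = (2 * precision * recall / (precision + recall)) if (precision + recall) else 0.0
    return {"accuracy": accuracy, "precision": precision, "recall": recall, "f1": f1}


# Toy labels for illustration; not taken from the test set.
y_true = [1, 0, 1, 1, 0, 0, 1, 0]
y_pred = [1, 0, 0, 1, 0, 1, 1, 0]
metrics = binary_metrics(y_true, y_pred)
print(metrics)
```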
## How to Use
### Via the Hugging Face Space
The easiest way to try out the model is via its [Hugging Face Space](https://huggingface.co/spaces/abullard1/steam-review-constructiveness-classifier).
### Via the HF Transformers Library
You can also use this model through the Hugging Face transformers `pipeline` API for easy classification. Here's how to do it in Python:
```python
from transformers import pipeline
import torch

# Use the first GPU if one is available, otherwise fall back to the CPU.
device = 0 if torch.cuda.is_available() else -1
# Half precision on GPU, full precision on CPU.
torch_d_type = torch.float16 if torch.cuda.is_available() else torch.float32

base_model_name = "albert-base-v2"
finetuned_model_name = "abullard1/albert-v2-steam-review-constructiveness-classifier"

classifier = pipeline(
    task="text-classification",
    model=finetuned_model_name,
    tokenizer=base_model_name,
    device=device,
    top_k=None,  # return scores for all labels
    truncation=True,
    max_length=512,
    torch_dtype=torch_d_type,
)