---
language: vi
tags:
- spam-detection
- vietnamese
- bartpho
license: apache-2.0
datasets:
- visolex/ViSpamReviews
metrics:
- accuracy
- f1
model-index:
- name: bartpho-spam-binary
  results:
  - task:
      type: text-classification
      name: Spam Detection (Binary)
    dataset:
      name: ViSpamReviews
      type: custom
    metrics:
    - name: Accuracy
      type: accuracy
      value: <INSERT_ACCURACY>
    - name: F1 Score
      type: f1
      value: <INSERT_F1_SCORE>
base_model:
- vinai/bartpho-syllable
pipeline_tag: text-classification
---

# BARTPho-Spam-Binary

Fine-tuned from [`vinai/bartpho-syllable`](https://huggingface.co/vinai/bartpho-syllable) on the **ViSpamReviews** dataset for binary spam detection (spam vs. non-spam) in Vietnamese reviews.

* **Task**: Binary classification
* **Dataset**: [ViSpamReviews](https://huggingface.co/datasets/visolex/ViSpamReviews)
* **Hyperparameters**

  * Batch size: 32
  * Learning rate: 3e-5
  * Epochs: 100
  * Maximum sequence length: 256 tokens

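For reference, the hyperparameters above can be collected into a small config object. This is a minimal sketch, not the authors' training script: the class `FineTuneConfig` and its methods are hypothetical names, and the card does not say whether the `transformers` `Trainer` was used. The dictionary keys mirror `transformers.TrainingArguments` parameter names only so the values can be passed along if that API is chosen.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class FineTuneConfig:
    """Hyperparameters listed on this model card (hypothetical container)."""

    batch_size: int = 32
    learning_rate: float = 3e-5
    num_epochs: int = 100
    max_seq_len: int = 256

    def training_kwargs(self) -> dict:
        # Keys follow transformers.TrainingArguments parameter names.
        # Assumption: the card does not state that Trainer was actually used.
        return {
            "per_device_train_batch_size": self.batch_size,
            "learning_rate": self.learning_rate,
            "num_train_epochs": self.num_epochs,
        }

    def tokenizer_kwargs(self) -> dict:
        # Truncate inputs to the maximum sequence length from the card.
        return {"truncation": True, "max_length": self.max_seq_len}


config = FineTuneConfig()
print(config.training_kwargs())
print(config.tokenizer_kwargs())
```

If the `Trainer` API is used, the first dictionary could be unpacked into `TrainingArguments(output_dir=..., **config.training_kwargs())`, and the second into the tokenizer call.
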
## Usage

```python
import torch
from transformers import AutoTokenizer, AutoModelForSequenceClassification

# Load the fine-tuned tokenizer and classifier from the Hugging Face Hub
tokenizer = AutoTokenizer.from_pretrained("visolex/bartpho-spam-binary")
model = AutoModelForSequenceClassification.from_pretrained("visolex/bartpho-spam-binary")

text = "Review này không có thật."  # "This review is not real."
# Truncate to the 256-token limit used during fine-tuning
inputs = tokenizer(text, return_tensors="pt", truncation=True, max_length=256)

# Inference only, so skip gradient tracking
with torch.no_grad():
    pred = model(**inputs).logits.argmax(dim=-1).item()

# Label 1 is spam; label 0 is non-spam
print("Spam" if pred == 1 else "Non-spam")
```
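The usage snippet returns only the argmax label. When a confidence score is also useful, the two logits can be passed through a softmax first. The sketch below uses plain Python so it runs without a model download; `LABELS`, `softmax`, and `classify_logits` are hypothetical helper names, the example logits are made up, and the mapping of index 1 to "Spam" is taken from the usage snippet.

```python
import math

# Label mapping assumed from the usage snippet: 1 is spam, 0 is non-spam.
LABELS = {0: "Non-spam", 1: "Spam"}


def softmax(logits):
    """Convert raw logits to probabilities, shifted by the max for stability."""
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def classify_logits(logits):
    """Return (label, probability of that label) for one example's logits."""
    probs = softmax(logits)
    pred = max(range(len(probs)), key=probs.__getitem__)
    return LABELS[pred], probs[pred]


# Made-up logits for illustration, not real model output.
label, confidence = classify_logits([0.2, 1.5])
print(label, round(confidence, 3))
```

With the real model, the input list could come from `model(**inputs).logits[0].tolist()`.
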