---
license: apache-2.0
base_model: distilbert-base-uncased
tags:
- generated_from_trainer
metrics:
- accuracy
model-index:
- name: infoquality
  results: []
---
# infoquality

This model is a fine-tuned version of [distilbert-base-uncased](https://huggingface.co/distilbert-base-uncased) on a custom dataset curated by the model engineer. It achieves the following results on the evaluation set:
- Loss: 0.0015
- Accuracy: 0.9999
## Model description

A binary classifier of text inputs (messages) that rates the quality of information in a message with one of two categories:

- **High**: meaningful natural language
- **Low**: cliché or otherwise meaningless natural language
## Intended uses & limitations

Designed for detecting whether a message is meaningful natural language and/or for weighting natural-language messages by information quality.
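A minimal inference sketch using the `transformers` pipeline API; the model path shown is a placeholder for wherever this checkpoint is saved or hosted:

```python
from transformers import pipeline

# Placeholder path/id; substitute the actual location of this checkpoint
classifier = pipeline("text-classification", model="path/to/infoquality")

print(classifier("The quarterly report details revenue growth across regions."))
# Hypothetical output: [{'label': 'high', 'score': 0.99}]
```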
## Training and evaluation data

Algorithmically curated from millions of publicly available social messages and, in some cases, programmatically generated to reflect theoretical design principles.
## Training procedure

```python
from transformers import AutoModelForSequenceClassification

# Label maps between class ids and human-readable names
id2label = {0: "low", 1: "high"}
label2id = {"low": 0, "high": 1}

# Load the base model with a two-class classification head
model = AutoModelForSequenceClassification.from_pretrained(
    "distilbert-base-uncased",
    num_labels=2,
    id2label=id2label,
    label2id=label2id,
)
```
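The matching tokenizer is loaded the same way; pairing it with the same base checkpoint is a standard assumption rather than something stated above:

```python
from transformers import AutoTokenizer

# Assumed: the tokenizer that ships with the base checkpoint
tokenizer = AutoTokenizer.from_pretrained("distilbert-base-uncased")
```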
### Training hyperparameters
The following hyperparameters were used during training:
- learning_rate: 5e-05
- train_batch_size: 10
- eval_batch_size: 10
- seed: 42
- optimizer: Adam with betas=(0.9,0.999) and epsilon=1e-08
- lr_scheduler_type: linear
- num_epochs: 0.2
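A minimal sketch of how these hyperparameters map onto `TrainingArguments`; the `output_dir` and dataset variables are illustrative, and evaluating every 10 steps is inferred from the results table below:

```python
from transformers import Trainer, TrainingArguments

training_args = TrainingArguments(
    output_dir="infoquality",      # illustrative output path
    learning_rate=5e-5,
    per_device_train_batch_size=10,
    per_device_eval_batch_size=10,
    num_train_epochs=0.2,
    seed=42,
    lr_scheduler_type="linear",    # Adam betas/epsilon are the defaults listed above
    evaluation_strategy="steps",
    eval_steps=10,                 # inferred from the eval cadence in the results table
)

trainer = Trainer(
    model=model,                   # from the snippet above
    args=training_args,
    train_dataset=train_dataset,   # assumed pre-tokenized dataset variables
    eval_dataset=eval_dataset,
)
trainer.train()
```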
### Training results
| Epoch | Step | Validation Loss | Accuracy |
|:-----:|:----:|:---------------:|:--------:|
| 0.01  | 10   | 0.4780          | 0.96     |
| 0.02  | 20   | 0.1759          | 0.965    |
| 0.03  | 30   | 0.0477          | 0.995    |
| 0.04  | 40   | 0.1199          | 0.95     |
| 0.05  | 50   | 0.0413          | 0.99     |
| 0.06  | 60   | 0.0068          | 1.0      |
| 0.07  | 70   | 0.0056          | 1.0      |
| 0.08  | 80   | 0.0220          | 0.995    |
| 0.09  | 90   | 0.0081          | 1.0      |
| 0.1   | 100  | 0.0074          | 0.995    |
| 0.11  | 110  | 0.0035          | 1.0      |
| 0.12  | 120  | 0.0030          | 1.0      |
| 0.13  | 130  | 0.0022          | 1.0      |
| 0.14  | 140  | 0.0024          | 1.0      |
| 0.15  | 150  | 0.0021          | 1.0      |
| 0.16  | 160  | 0.0016          | 1.0      |
| 0.17  | 170  | 0.0016          | 1.0      |
| 0.18  | 180  | 0.0016          | 1.0      |
| 0.19  | 190  | 0.0015          | 1.0      |
| 0.2   | 200  | 0.0015          | 1.0      |
### Framework versions
- Transformers 4.32.1
- Pytorch 2.0.1
- Datasets 2.14.4
- Tokenizers 0.13.3