metadata
language:
  - en
thumbnail: null
tags:
  - text-classification
license: mit
datasets:
  - trec
metrics: null
model-index:
  - name: aychang/distilbert-base-cased-trec-coarse
    results:
      - task:
          type: text-classification
          name: Text Classification
        dataset:
          name: trec
          type: trec
          config: default
          split: test
        metrics:
          - name: Accuracy
            type: accuracy
            value: 0.97
            verified: true
          - name: Precision Macro
            type: precision
            value: 0.9742915631870833
            verified: true
          - name: Precision Micro
            type: precision
            value: 0.97
            verified: true
          - name: Precision Weighted
            type: precision
            value: 0.9699546283251607
            verified: true
          - name: Recall Macro
            type: recall
            value: 0.972626762268805
            verified: true
          - name: Recall Micro
            type: recall
            value: 0.97
            verified: true
          - name: Recall Weighted
            type: recall
            value: 0.97
            verified: true
          - name: F1 Macro
            type: f1
            value: 0.9729834427867218
            verified: true
          - name: F1 Micro
            type: f1
            value: 0.97
            verified: true
          - name: F1 Weighted
            type: f1
            value: 0.9694196751375908
            verified: true
          - name: loss
            type: loss
            value: 0.14272506535053253
            verified: true

TREC 6-class Task: distilbert-base-cased

Model description

A distilbert-base-cased model fine-tuned on the "trec" dataset for coarse-grained (6-class) question classification.

Intended uses & limitations

How to use

Transformers
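# Load model and tokenizer
from transformers import AutoModelForSequenceClassification, AutoTokenizer

model_name = "aychang/distilbert-base-cased-trec-coarse"

# This checkpoint has a sequence-classification head, so load it with the matching class
model = AutoModelForSequenceClassification.from_pretrained(model_name)
tokenizer = AutoTokenizer.from_pretrained(model_name)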

# Use pipeline
from transformers import pipeline

model_name = "aychang/distilbert-base-cased-trec-coarse"

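# "sentiment-analysis" is an alias of "text-classification"; the latter describes this task better
nlp = pipeline("text-classification", model=model_name, tokenizer=model_name)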

results = nlp(["Where did the queen go?", "Why did the Queen hire 1000 ML Engineers?"])
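
For a single-label classifier like this one, the pipeline is expected to return one dictionary per input, with a `label` and a `score`. The sketch below reproduces that post-processing step in plain Python (softmax over the logits, then the highest-probability class) so it runs without downloading the model. The logits and the `LABEL_i` names are made up for illustration; the real label names come from the model's config.

```python
import math

def softmax(logits):
    """Numerically stable softmax over a list of raw class scores."""
    peak = max(logits)
    exps = [math.exp(x - peak) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]

def top_prediction(logits, id2label):
    """Return the most probable class in the {'label', 'score'} shape."""
    probs = softmax(logits)
    best = max(range(len(probs)), key=probs.__getitem__)
    return {"label": id2label[best], "score": probs[best]}

# Hypothetical label names and logits for the six coarse TREC classes.
id2label = {i: f"LABEL_{i}" for i in range(6)}
example_logits = [0.2, -1.3, 0.1, 0.4, -0.5, 3.9]

prediction = top_prediction(example_logits, id2label)
print(prediction)
```
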
AdaptNLP
from adaptnlp import EasySequenceClassifier

model_name = "aychang/distilbert-base-cased-trec-coarse"
texts = ["Where did the queen go?", "Why did the Queen hire 1000 ML Engineers?"]

classifier = EasySequenceClassifier()
results = classifier.tag_text(text=texts, model_name_or_path=model_name, mini_batch_size=2)

Limitations and bias

This is a minimal language model trained on a benchmark dataset.

Training data

TREC https://huggingface.co/datasets/trec

Training procedure

Preprocessing, hardware used, hyperparameters...

Hardware

One V100

Hyperparameters and Training Args

from transformers import TrainingArguments

training_args = TrainingArguments(
    output_dir='./models',
    overwrite_output_dir=False,
    num_train_epochs=2,
    per_device_train_batch_size=16,
    per_device_eval_batch_size=16,
    warmup_steps=500,
    weight_decay=0.01,
    evaluation_strategy="steps",
    logging_dir='./logs',
    fp16=False,
    eval_steps=500,
    save_steps=300000
)
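
To see how these step-based settings interact, it helps to estimate the length of the run. The sketch below is a rough calculation, not the author's own accounting. It assumes the standard TREC training split of 5,452 questions (not stated in this card), a single device (the card lists one V100), and no gradient accumulation (the `TrainingArguments` default).

```python
import math

# Values taken from the TrainingArguments above.
num_train_epochs = 2
per_device_train_batch_size = 16
warmup_steps = 500
eval_steps = 500
save_steps = 300000

# Assumptions, not stated in the card.
num_train_examples = 5452        # standard TREC training split
num_devices = 1                  # "One V100"
gradient_accumulation_steps = 1  # library default

effective_batch = per_device_train_batch_size * num_devices * gradient_accumulation_steps
steps_per_epoch = math.ceil(num_train_examples / effective_batch)
total_steps = steps_per_epoch * num_train_epochs

# How often the step-based triggers fire during the run.
evals_during_training = total_steps // eval_steps
checkpoints_during_training = total_steps // save_steps
warmup_fraction = min(warmup_steps, total_steps) / total_steps

print(f"steps per epoch: {steps_per_epoch}, total steps: {total_steps}")
print(f"evaluations: {evals_during_training}, checkpoints: {checkpoints_during_training}")
print(f"share of training spent in warm-up: {warmup_fraction:.0%}")
```

Under these assumptions the run is 682 optimizer steps long. Warm-up then covers roughly three quarters of training, evaluation fires once mid-run at step 500, and `save_steps=300000` never triggers an intermediate checkpoint.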

Eval results

{'epoch': 2.0,
 'eval_accuracy': 0.97,
 'eval_f1': array([0.98220641, 0.91620112, 1.        , 0.97709924, 0.98678414,
        0.97560976]),
 'eval_loss': 0.14275787770748138,
 'eval_precision': array([0.96503497, 0.96470588, 1.        , 0.96969697, 0.98245614,
        0.96385542]),
 'eval_recall': array([1.        , 0.87234043, 1.        , 0.98461538, 0.99115044,
        0.98765432]),
 'eval_runtime': 0.9731,
 'eval_samples_per_second': 513.798}
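
The precision, recall, and F1 arrays above hold one value per class, while the metadata at the top of this card reports macro averages. A macro average is the unweighted mean of the per-class scores, so the two can be cross-checked with a few lines of Python. This is a consistency check and makes no claim about how either set of numbers was produced. The per-class values are copied from the output above and the reported values from the card's metadata.

```python
# Per-class scores copied from the eval results above (six coarse classes).
per_class = {
    "precision": [0.96503497, 0.96470588, 1.0, 0.96969697, 0.98245614, 0.96385542],
    "recall": [1.0, 0.87234043, 1.0, 0.98461538, 0.99115044, 0.98765432],
    "f1": [0.98220641, 0.91620112, 1.0, 0.97709924, 0.98678414, 0.97560976],
}

# Macro-averaged values reported in the card's metadata.
reported_macro = {
    "precision": 0.9742915631870833,
    "recall": 0.972626762268805,
    "f1": 0.9729834427867218,
}

def macro_average(scores):
    """Unweighted mean over classes: every class counts equally."""
    return sum(scores) / len(scores)

macro = {name: macro_average(scores) for name, scores in per_class.items()}

for name, value in macro.items():
    gap = abs(value - reported_macro[name])
    print(f"{name}: macro={value:.6f} reported={reported_macro[name]:.6f} gap={gap:.1e}")
```

The recomputed means agree with the metadata to within the rounding of the printed arrays. Because every class counts equally, the weakest class (index 1, recall 0.872) pulls the macro scores down as much as any other class would, regardless of how many test questions it contains.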