---
license: mit
tags:
- generated_from_trainer
metrics:
- accuracy
model-index:
- name: deberta-v3-base__sst2__all-train
  results: []
---
# deberta-v3-base__sst2__all-train

This model is a fine-tuned version of [microsoft/deberta-v3-base](https://huggingface.co/microsoft/deberta-v3-base) on an unknown dataset. It achieves the following results on the evaluation set:
- Loss: 0.6964
- Accuracy: 0.49
## Model description
More information needed
## Intended uses & limitations
More information needed
## Training and evaluation data
More information needed
## Training procedure

### Training hyperparameters
The following hyperparameters were used during training:
- learning_rate: 2e-05
- train_batch_size: 16
- eval_batch_size: 16
- seed: 42
- optimizer: Adam with betas=(0.9,0.999) and epsilon=1e-08
- lr_scheduler_type: linear
- num_epochs: 50
- mixed_precision_training: Native AMP
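The linear schedule above can be checked by hand: the learning rate decays from 2e-05 toward 0 over the total number of optimizer steps (7 steps per epoch, per the training-results table, times 50 epochs = 350 steps). A minimal pure-Python sketch, assuming zero warmup steps (the card does not report a warmup value):

```python
# Sketch of the linear LR schedule implied by the hyperparameters above.
# Assumption: zero warmup steps (not stated in the card).

LEARNING_RATE = 2e-05
STEPS_PER_EPOCH = 7      # from the training-results table
NUM_EPOCHS = 50
TOTAL_STEPS = STEPS_PER_EPOCH * NUM_EPOCHS  # 350

def linear_lr(step: int) -> float:
    """Learning rate at a given optimizer step under linear decay to 0."""
    remaining = max(0, TOTAL_STEPS - step)
    return LEARNING_RATE * remaining / TOTAL_STEPS

print(linear_lr(0))    # 2e-05 at the start of training
print(linear_lr(175))  # 1e-05 halfway through
print(linear_lr(350))  # 0.0 at the end
```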
### Training results

| Training Loss | Epoch | Step | Validation Loss | Accuracy |
|:-------------:|:-----:|:----:|:---------------:|:--------:|
| No log        | 1.0   | 7    | 0.6964          | 0.49     |
| No log        | 2.0   | 14   | 0.7010          | 0.49     |
| No log        | 3.0   | 21   | 0.7031          | 0.49     |
| No log        | 4.0   | 28   | 0.7054          | 0.49     |
### Framework versions
- Transformers 4.15.0
- Pytorch 1.10.2+cu102
- Datasets 1.18.2
- Tokenizers 0.10.3
## Model Recycling

Evaluation on 36 datasets using SetFit/deberta-v3-base__sst2__all-train as a base model yields an average score of 79.14, compared to 79.04 for microsoft/deberta-v3-base.

The model is ranked 3rd among all tested models for the microsoft/deberta-v3-base architecture as of 09/01/2023. Results:
| 20_newsgroup | ag_news | amazon_reviews_multi | anli | boolq | cb | cola | copa | dbpedia | esnli | financial_phrasebank | imdb | isear | mnli | mrpc | multirc | poem_sentiment | qnli | qqp | rotten_tomatoes | rte | sst2 | sst_5bins | stsb | trec_coarse | trec_fine | tweet_ev_emoji | tweet_ev_emotion | tweet_ev_hate | tweet_ev_irony | tweet_ev_offensive | tweet_ev_sentiment | wic | wnli | wsc | yahoo_answers |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 86.4711 | 90.8 | 66.94 | 59.4063 | 84.4343 | 78.5714 | 86.9607 | 57 | 80 | 91.3986 | 86 | 94.452 | 71.6428 | 89.5952 | 90.1961 | 64.2533 | 87.5 | 93.3187 | 91.9936 | 90.2439 | 81.5884 | 94.7248 | 56.3801 | 89.96 | 98 | 90.8 | 47.014 | 84.4476 | 52.2896 | 78.8265 | 84.8837 | 70.8401 | 72.4138 | 67.6056 | 66.3462 | 71.7667 |
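The reported average can be verified directly from the row above; a quick sketch with the scores copied from the table:

```python
# Recompute the average model-recycling score from the per-dataset results above.
scores = [
    86.4711, 90.8, 66.94, 59.4063, 84.4343, 78.5714, 86.9607, 57, 80,
    91.3986, 86, 94.452, 71.6428, 89.5952, 90.1961, 64.2533, 87.5,
    93.3187, 91.9936, 90.2439, 81.5884, 94.7248, 56.3801, 89.96, 98,
    90.8, 47.014, 84.4476, 52.2896, 78.8265, 84.8837, 70.8401, 72.4138,
    67.6056, 66.3462, 71.7667,
]
assert len(scores) == 36  # one score per evaluated dataset

average = sum(scores) / len(scores)
print(round(average, 2))  # 79.14, matching the reported average
```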
For more information, see: [Model Recycling](https://ibm.github.io/model-recycling/)