metadata
license: apache-2.0
base_model: google/vit-base-patch16-224-in21k
tags:
  - generated_from_trainer
datasets:
  - FastJobs/Visual_Emotional_Analysis
metrics:
  - accuracy
  - precision
  - f1
model-index:
  - name: emotion_classification
    results:
      - task:
          name: Image Classification
          type: image-classification
        dataset:
          name: FastJobs/Visual_Emotional_Analysis
          type: FastJobs/Visual_Emotional_Analysis
          config: FastJobs--Visual_Emotional_Analysis
          split: train
          args: FastJobs--Visual_Emotional_Analysis
        metrics:
          - name: Accuracy
            type: accuracy
            value: 0.66875
          - name: Precision
            type: precision
            value: 0.7104119480438352
          - name: F1
            type: f1
            value: 0.6712765732314218

Emotion Classification

This model is a fine-tuned version of google/vit-base-patch16-224-in21k on the FastJobs/Visual_Emotional_Analysis dataset.

For reference, a uniform random guess over the dataset's 8 labels has an expected accuracy of 1/8 = 0.125.

It achieves the following results on the evaluation set (these match epoch 30 in the training results):

  • Loss: 1.0511
  • Accuracy: 0.6687
  • Precision: 0.7104
  • F1: 0.6713
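The card does not state how precision and F1 are averaged across the 8 classes. The sketch below assumes support-weighted averaging, which is an assumption and not something the source confirms. The function name `classification_metrics` and the toy labels are hypothetical. It shows how precision can come out higher than accuracy, as it does in the results above.

```python
from collections import Counter


def classification_metrics(y_true, y_pred):
    """Return (accuracy, weighted precision, weighted F1).

    Assumption: per-class scores are averaged with weights proportional
    to each class's support. The model card does not confirm this.
    """
    n = len(y_true)
    support = Counter(y_true)
    labels = sorted(set(y_true) | set(y_pred))
    pairs = list(zip(y_true, y_pred))

    accuracy = sum(t == p for t, p in pairs) / n

    weighted_precision = 0.0
    weighted_f1 = 0.0
    for label in labels:
        true_pos = sum(t == label and p == label for t, p in pairs)
        predicted = sum(p == label for p in y_pred)
        actual = support[label]
        precision = true_pos / predicted if predicted else 0.0
        recall = true_pos / actual if actual else 0.0
        denom = precision + recall
        f1 = 2 * precision * recall / denom if denom else 0.0
        weight = actual / n
        weighted_precision += weight * precision
        weighted_f1 += weight * f1

    return accuracy, weighted_precision, weighted_f1


# Toy example with 3 classes (not data from the model).
y_true = [0, 0, 1, 1, 2, 2]
y_pred = [0, 1, 1, 1, 2, 0]
acc, prec, f1 = classification_metrics(y_true, y_pred)
print(f"{acc:.4f} {prec:.4f} {f1:.4f}")  # 0.6667 0.7222 0.6556
```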

Model description

The base model is the Vision Transformer (ViT) base variant, pre-trained on ImageNet-21k and released by Google. Further details can be found in their repo.

Training and evaluation data

Data Split

The model was trained on the FastJobs/Visual_Emotional_Analysis dataset, split 4:1 into training and development sets with a random seed of 42. Batching also used a seed of 42, which is set independently of the split seed.
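A minimal sketch of a seeded 4:1 split, written with the standard library only. It is an illustration, not the author's actual splitting code. The helper name `split_indices` is hypothetical, and the dataset size of 800 is an assumption inferred from the training log (10 steps per epoch at batch size 64 gives at most 640 training images, and an evaluation set of 160 is consistent with the reported accuracy, since 107/160 = 0.66875).

```python
import random


def split_indices(n_items, eval_fraction=0.2, seed=42):
    """Shuffle indices with a fixed seed and split them 4:1.

    Hypothetical helper: the shuffling here will not reproduce the
    exact split used for the model, only the same proportions.
    """
    indices = list(range(n_items))
    random.Random(seed).shuffle(indices)
    n_eval = round(n_items * eval_fraction)
    return indices[n_eval:], indices[:n_eval]


# Assumed dataset size of 800 images (not stated in the card).
train_idx, eval_idx = split_indices(800)
print(len(train_idx), len(eval_idx))  # 640 160
```

Reusing the same seed returns the same partition on every run, which is what makes the development-set metrics comparable across experiments.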

Pre-processing and Augmentation

The main pre-processing phase for both training and evaluation includes:

  • Resizing each image to (224, 224, 3) with bilinear interpolation, the input resolution the original model was pre-trained at
  • Normalizing each channel with a mean and standard deviation of [0.5, 0.5, 0.5], matching the original model
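To make the normalization step concrete, here is a small sketch of the per-pixel arithmetic. It assumes 8-bit pixel values are first rescaled to [0, 1] by dividing by 255, a step the card does not spell out. The function name `normalize_pixel` is hypothetical.

```python
def normalize_pixel(value, mean=0.5, std=0.5):
    """Map an 8-bit pixel value to the normalized range.

    Assumption: pixels are rescaled to [0, 1] before normalization.
    """
    scaled = value / 255.0
    return (scaled - mean) / std


print(normalize_pixel(0))      # -1.0
print(normalize_pixel(127.5))  # 0.0
print(normalize_pixel(255))    # 1.0
```

With a mean and standard deviation of 0.5, the full [0, 255] range lands in [-1, 1], centered on zero.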

Other than the aforementioned pre-processing, the training set was augmented using:

  • Random horizontal & vertical flip
  • Color jitter
  • Random resized crop
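The flip augmentation can be sketched on a plain nested list standing in for an image. This is illustrative only: the flip probability of 0.5 is an assumption, since the card gives no augmentation parameters, and the names `hflip`, `vflip`, and `random_flip` are hypothetical. In practice a library such as torchvision (`RandomHorizontalFlip`, `RandomVerticalFlip`) would usually do this on tensors.

```python
import random


def hflip(image):
    """Mirror each row left to right."""
    return [row[::-1] for row in image]


def vflip(image):
    """Reverse the row order, top to bottom."""
    return image[::-1]


def random_flip(image, rng, p=0.5):
    """Apply each flip independently with probability p (assumed 0.5)."""
    if rng.random() < p:
        image = hflip(image)
    if rng.random() < p:
        image = vflip(image)
    return image


image = [[1, 2],
         [3, 4]]
rng = random.Random(42)
print(hflip(image))  # [[2, 1], [4, 3]]
print(vflip(image))  # [[3, 4], [1, 2]]
augmented = random_flip(image, rng)
```

Augmentations like this apply to the training set only, so the evaluation images stay fixed between epochs.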

Training procedure

Training hyperparameters

The following hyperparameters were used during training:

  • learning_rate: 5e-05
  • train_batch_size: 64
  • eval_batch_size: 64
  • seed: 42
  • optimizer: Adam with betas=(0.9,0.999) and epsilon=1e-08
  • lr_scheduler_type: cosine_with_restarts
  • lr_scheduler_warmup_steps: 150
  • num_epochs: 300
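The learning-rate schedule implied by these settings can be sketched in plain Python. The function below follows the shape of a linear warmup followed by cosine decay with hard restarts. Two values are assumptions not stated in the card: `total_steps=3000` (300 epochs times the 10 steps per epoch seen in the training log) and `num_cycles=1`. The function name `lr_at` is hypothetical.

```python
import math


def lr_at(step, base_lr=5e-5, warmup_steps=150, total_steps=3000, num_cycles=1):
    """Learning rate at a given optimizer step.

    Assumptions: total_steps and num_cycles are not given in the card.
    """
    if step < warmup_steps:
        # Linear warmup from 0 to base_lr.
        return base_lr * step / max(1, warmup_steps)
    progress = (step - warmup_steps) / max(1, total_steps - warmup_steps)
    if progress >= 1.0:
        return 0.0
    # Cosine decay that restarts from base_lr at the start of each cycle.
    cycle_position = (num_cycles * progress) % 1.0
    return base_lr * max(0.0, 0.5 * (1.0 + math.cos(math.pi * cycle_position)))


for step in (0, 75, 150, 600, 3000):
    print(step, lr_at(step))
```

Under these assumptions the warmup ends at step 150, which is epoch 15 in the training log, and with a single cycle the schedule decays once without restarting.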

Training results

| Training Loss | Epoch | Step | Validation Loss | Accuracy | Precision | F1 |
|---|---|---|---|---|---|---|
| 2.079 | 1.0 | 10 | 2.0895 | 0.0563 | 0.0604 | 0.0521 |
| 2.0789 | 2.0 | 20 | 2.0851 | 0.0563 | 0.0602 | 0.0529 |
| 2.0717 | 3.0 | 30 | 2.0773 | 0.0813 | 0.0858 | 0.0783 |
| 2.0613 | 4.0 | 40 | 2.0658 | 0.125 | 0.1997 | 0.1333 |
| 2.0445 | 5.0 | 50 | 2.0483 | 0.1875 | 0.2569 | 0.1934 |
| 2.0176 | 6.0 | 60 | 2.0206 | 0.2313 | 0.2692 | 0.2384 |
| 1.9894 | 7.0 | 70 | 1.9763 | 0.3063 | 0.3033 | 0.2983 |
| 1.9232 | 8.0 | 80 | 1.8912 | 0.3625 | 0.3307 | 0.3194 |
| 1.8256 | 9.0 | 90 | 1.7775 | 0.4062 | 0.3531 | 0.3600 |
| 1.732 | 10.0 | 100 | 1.6580 | 0.4688 | 0.4158 | 0.4133 |
| 1.6406 | 11.0 | 110 | 1.5597 | 0.5 | 0.4358 | 0.4370 |
| 1.5584 | 12.0 | 120 | 1.4855 | 0.5125 | 0.4792 | 0.4784 |
| 1.4898 | 13.0 | 130 | 1.4248 | 0.5437 | 0.5011 | 0.5098 |
| 1.4216 | 14.0 | 140 | 1.3692 | 0.5687 | 0.5255 | 0.5289 |
| 1.3701 | 15.0 | 150 | 1.3158 | 0.5687 | 0.5346 | 0.5360 |
| 1.3438 | 16.0 | 160 | 1.2842 | 0.5437 | 0.5451 | 0.5098 |
| 1.2799 | 17.0 | 170 | 1.2620 | 0.5625 | 0.5169 | 0.5194 |
| 1.2481 | 18.0 | 180 | 1.2321 | 0.5938 | 0.6003 | 0.5811 |
| 1.1993 | 19.0 | 190 | 1.2108 | 0.5687 | 0.5640 | 0.5412 |
| 1.1599 | 20.0 | 200 | 1.1853 | 0.55 | 0.5434 | 0.5259 |
| 1.1087 | 21.0 | 210 | 1.1839 | 0.5563 | 0.5670 | 0.5380 |
| 1.0757 | 22.0 | 220 | 1.1905 | 0.55 | 0.5682 | 0.5308 |
| 0.9985 | 23.0 | 230 | 1.1509 | 0.6375 | 0.6714 | 0.6287 |
| 0.9776 | 24.0 | 240 | 1.1048 | 0.6188 | 0.6222 | 0.6127 |
| 0.9331 | 25.0 | 250 | 1.1196 | 0.6125 | 0.6345 | 0.6072 |
| 0.8887 | 26.0 | 260 | 1.1424 | 0.5938 | 0.6174 | 0.5867 |
| 0.879 | 27.0 | 270 | 1.1232 | 0.6062 | 0.6342 | 0.5978 |
| 0.8369 | 28.0 | 280 | 1.1172 | 0.6 | 0.6480 | 0.5865 |
| 0.7864 | 29.0 | 290 | 1.1285 | 0.5938 | 0.6819 | 0.5763 |
| 0.7775 | 30.0 | 300 | 1.0511 | 0.6687 | 0.7104 | 0.6713 |
| 0.7281 | 31.0 | 310 | 1.0295 | 0.6562 | 0.6596 | 0.6514 |
| 0.7348 | 32.0 | 320 | 1.0398 | 0.6375 | 0.6353 | 0.6319 |
| 0.6896 | 33.0 | 330 | 1.0729 | 0.6062 | 0.6205 | 0.6062 |
| 0.613 | 34.0 | 340 | 1.0505 | 0.6438 | 0.6595 | 0.6421 |
| 0.6034 | 35.0 | 350 | 1.0827 | 0.6375 | 0.6593 | 0.6376 |
| 0.6236 | 36.0 | 360 | 1.1271 | 0.6125 | 0.6238 | 0.6087 |
| 0.5607 | 37.0 | 370 | 1.0985 | 0.6062 | 0.6254 | 0.6015 |
| 0.5835 | 38.0 | 380 | 1.0791 | 0.6375 | 0.6624 | 0.6370 |
| 0.5889 | 39.0 | 390 | 1.1300 | 0.6062 | 0.6529 | 0.6092 |
| 0.5137 | 40.0 | 400 | 1.1062 | 0.625 | 0.6457 | 0.6226 |
| 0.4804 | 41.0 | 410 | 1.1452 | 0.6188 | 0.6403 | 0.6158 |
| 0.4811 | 42.0 | 420 | 1.1271 | 0.6375 | 0.6478 | 0.6347 |
| 0.5179 | 43.0 | 430 | 1.1942 | 0.5875 | 0.6185 | 0.5874 |
| 0.4744 | 44.0 | 440 | 1.1515 | 0.6125 | 0.6329 | 0.6160 |
| 0.4327 | 45.0 | 450 | 1.1321 | 0.6375 | 0.6669 | 0.6412 |
| 0.4565 | 46.0 | 460 | 1.1742 | 0.625 | 0.6478 | 0.6251 |
| 0.4006 | 47.0 | 470 | 1.1675 | 0.6062 | 0.6361 | 0.6079 |
| 0.4541 | 48.0 | 480 | 1.1542 | 0.6125 | 0.6404 | 0.6152 |
| 0.3689 | 49.0 | 490 | 1.2190 | 0.5875 | 0.6134 | 0.5896 |
| 0.3794 | 50.0 | 500 | 1.2002 | 0.6062 | 0.6155 | 0.6005 |
| 0.429 | 51.0 | 510 | 1.2904 | 0.575 | 0.6207 | 0.5849 |
| 0.431 | 52.0 | 520 | 1.2416 | 0.5875 | 0.6028 | 0.5794 |
| 0.3813 | 53.0 | 530 | 1.2073 | 0.6125 | 0.6449 | 0.6142 |
| 0.365 | 54.0 | 540 | 1.2083 | 0.6062 | 0.6454 | 0.6075 |
| 0.3714 | 55.0 | 550 | 1.1627 | 0.6375 | 0.6576 | 0.6390 |
| 0.3393 | 56.0 | 560 | 1.1620 | 0.6438 | 0.6505 | 0.6389 |
| 0.3676 | 57.0 | 570 | 1.1501 | 0.625 | 0.6294 | 0.6258 |
| 0.3371 | 58.0 | 580 | 1.2779 | 0.5875 | 0.6000 | 0.5792 |
| 0.3325 | 59.0 | 590 | 1.2719 | 0.575 | 0.5843 | 0.5651 |
| 0.3509 | 60.0 | 600 | 1.2956 | 0.6 | 0.6422 | 0.6059 |

Framework versions

  • Transformers 4.33.0
  • PyTorch 2.0.0
  • Datasets 2.1.0
  • Tokenizers 0.13.3