---
license: apache-2.0
base_model: google/vit-base-patch16-224-in21k
tags:
- generated_from_trainer
metrics:
- accuracy
model-index:
- name: Facial Expression Recognition
  results:
  - task:
      name: Image Classification
      type: image-classification
    metrics:
    - name: Accuracy
      type: accuracy
      value: 0.8571428571428571
---

<!-- This model card has been generated automatically according to the information the Trainer had access to. You
should probably proofread and complete it, then remove this comment. -->

# Vision Transformer (ViT) for Facial Expression Recognition Model Card

## Model Overview

- **Model Name:** [motheecreator/vit-Facial-Expression-Recognition](https://huggingface.co/motheecreator/vit-Facial-Expression-Recognition)
- **Task:** Facial Expression/Emotion Recognition
- **Datasets:** [FER2013](https://www.kaggle.com/datasets/msambare/fer2013), [MMI Facial Expression Database](https://mmifacedb.eu)
- **Model Architecture:** [Vision Transformer (ViT)](https://huggingface.co/docs/transformers/model_doc/vit)
- **Finetuned from model:** [vit-base-patch16-224-in21k](https://huggingface.co/google/vit-base-patch16-224-in21k)
- **Validation Loss:** 0.4353
- **Validation Accuracy:** 0.8571

## Model description

The vit-Facial-Expression-Recognition model is a Vision Transformer fine-tuned for facial emotion recognition. It is trained on the FER2013 and MMI Facial Expression datasets, whose facial images are each labeled with one of seven emotions:

- Angry
- Disgust
- Fear
- Happy
- Sad
- Surprise
- Neutral

## Data Preprocessing

The input images are preprocessed before being fed into the model. The preprocessing steps include:

- **Resizing:** Images are resized to the model's input size (224×224 for the `vit-base-patch16-224-in21k` base checkpoint).
- **Normalization:** Pixel values are normalized to a specific range.
- **Data Augmentation:** Random transformations such as rotations, flips, and zooms are applied to augment the training dataset.

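
The preprocessing steps listed above can be sketched roughly as follows. This is a minimal NumPy illustration, not the pipeline actually used for training: the 224-pixel input size is taken from the base checkpoint name, while the 0.5 mean/std normalization, the nearest-neighbour resize, and the helper names (`resize_nearest`, `normalize`, `augment`, `preprocess`) are assumptions made for the example. Only the horizontal flip is shown for augmentation; rotations and zooms are omitted.

```python
import numpy as np

# 224 comes from the base checkpoint name (patch16-224).
# MEAN and STD are assumptions; this card does not state the values used.
IMAGE_SIZE = 224
MEAN = 0.5
STD = 0.5


def resize_nearest(image: np.ndarray, size: int) -> np.ndarray:
    """Resize an (H, W, C) image to (size, size, C) with nearest-neighbour sampling."""
    height, width = image.shape[:2]
    rows = np.arange(size) * height // size
    cols = np.arange(size) * width // size
    return image[rows[:, None], cols[None, :]]


def normalize(image: np.ndarray) -> np.ndarray:
    """Scale uint8 pixels to [0, 1], then shift and scale to [-1, 1]."""
    scaled = image.astype(np.float32) / 255.0
    return (scaled - MEAN) / STD


def augment(image: np.ndarray, rng: np.random.Generator) -> np.ndarray:
    """Randomly flip the image left-to-right with probability 0.5."""
    if rng.random() < 0.5:
        image = image[:, ::-1]
    return image


def preprocess(image: np.ndarray, rng=None) -> np.ndarray:
    """Resize, augment (only when an rng is given), normalize, and return (C, H, W)."""
    image = resize_nearest(image, IMAGE_SIZE)
    if rng is not None:
        image = augment(image, rng)
    return normalize(image).transpose(2, 0, 1)


# Usage: a dummy 48x48 RGB array stands in for a face crop.
face = np.zeros((48, 48, 3), dtype=np.uint8)
pixel_values = preprocess(face, rng=np.random.default_rng(42))
print(pixel_values.shape)
```

Passing `rng=None` skips augmentation, which is the usual choice at evaluation and inference time.
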
## Intended uses & limitations

More information needed

## Training and evaluation data

More information needed

## Training procedure

### Training hyperparameters

The following hyperparameters were used during training:

- learning_rate: 5e-05
- train_batch_size: 32
- eval_batch_size: 32
- seed: 42
- gradient_accumulation_steps: 4
- total_train_batch_size: 128
- optimizer: Adam with betas=(0.9,0.999) and epsilon=1e-08
- lr_scheduler_type: linear
- lr_scheduler_warmup_ratio: 0.1
- num_epochs: 10

### Training results

| Training Loss | Epoch | Step | Accuracy | Validation Loss |
|:-------------:|:-----:|:----:|:--------:|:---------------:|
| 0.7964        | 1.0   | 798  | 0.7271   | 0.7869          |
| 0.6567        | 2.0   | 1596 | 0.7380   | 0.7539          |
| 0.6842        | 3.0   | 2394 | 0.7837   | 0.6287          |
| 0.5242        | 4.0   | 3192 | 0.7839   | 0.6282          |
| 0.4321        | 5.0   | 3990 | 0.7823   | 0.6423          |
| 0.3129        | 6.0   | 4788 | 0.7838   | 0.6533          |
| 0.4245        | 7.0   | 5586 | 0.8542   | 0.4382          |
| 0.3806        | 8.0   | 6384 | 0.8531   | 0.4375          |
| 0.3112        | 9.0   | 7182 | 0.8557   | 0.4372          |
| 0.2692        | 10.0  | 7980 | 0.8571   | 0.4353          |

### Framework versions

- Transformers 4.36.0
- Pytorch 2.0.0
- Datasets 2.1.0
- Tokenizers 0.15.0
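
The numbers under "Training hyperparameters" relate to one another by simple arithmetic, which the sketch below works through in plain Python. It shows how `total_train_batch_size` follows from `train_batch_size` and `gradient_accumulation_steps`, and how a linear schedule with a 0.1 warmup ratio would shape the learning rate over the 7980 steps in the results table. The single-device assumption and the function names `effective_batch_size` and `linear_lr` are illustrative, and the schedule is the common linear warmup and decay form, assumed rather than confirmed to match the exact training run.

```python
import math

# Values copied from the hyperparameter list and the Step column.
LEARNING_RATE = 5e-05
TRAIN_BATCH_SIZE = 32
GRADIENT_ACCUMULATION_STEPS = 4
WARMUP_RATIO = 0.1
TOTAL_STEPS = 7980  # last entry in the Step column (10 epochs x 798 steps)


def effective_batch_size(per_device: int, accumulation_steps: int, num_devices: int = 1) -> int:
    """Samples contributing to one optimizer update (num_devices=1 is an assumption)."""
    return per_device * accumulation_steps * num_devices


def linear_lr(step: int,
              base_lr: float = LEARNING_RATE,
              total_steps: int = TOTAL_STEPS,
              warmup_ratio: float = WARMUP_RATIO) -> float:
    """Linear warmup from 0 to base_lr, then linear decay back to 0."""
    warmup_steps = math.ceil(total_steps * warmup_ratio)
    if step < warmup_steps:
        return base_lr * step / warmup_steps
    remaining = max(0, total_steps - step)
    return base_lr * remaining / max(1, total_steps - warmup_steps)


print(effective_batch_size(TRAIN_BATCH_SIZE, GRADIENT_ACCUMULATION_STEPS))
for step in (0, 399, 798, 3990, 7980):
    print(step, linear_lr(step))
```

Under these assumptions the warmup phase spans the first 798 optimizer steps, which coincides with the first epoch in the results table.
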