metadata
datasets:
  - fuliucansheng/pascal_voc
metrics:
  - roc_auc
base_model:
  - BobMcDear/swin_s3_base_224
pipeline_tag: image-classification
license: apache-2.0

🦒 Swin S3 Base (224) - Pascal VOC

A Swin S3 Base model fine-tuned on the Pascal VOC 2012 dataset for multi-label image classification (an image can contain objects from several of the 20 categories).


🧠 Model Details

  • Architecture: Swin S3 Base (224x224 input size)
  • Pretrained on: ImageNet-1k
  • Fine-tuned on: Pascal VOC 2012
  • Framework: PyTorch (timm implementation)
  • Format: safetensors

🎯 Intended Use

  • Primary task: Image classification of natural scenes featuring objects from 20 Pascal VOC categories.
  • Users: Researchers and developers working on computer vision applications or model benchmarking.
  • Not intended for: Real-time decision making in critical applications (e.g., autonomous vehicles, medical diagnosis).

⚠️ Limitations and Ethical Considerations

  • Biases: The model inherits biases present in Pascal VOC, such as underrepresentation of certain object types, contexts, or demographics. It may perform poorly on out-of-distribution samples.
  • Ethical Use: Avoid using this model for applications that could reinforce harmful stereotypes, cause social harm, or violate privacy (e.g., surveillance).
  • Transparency: This model is shared for research and educational use and should not be deployed without thorough fairness, robustness, and security evaluations.

⚙️ Training Details

  • Training library: timm + PyTorch
  • Epochs: 5
  • Batch size: 16
  • Optimizer: AdamW
  • Learning rate: 5e-5
  • Scheduler: Cosine Annealing
  • Loss function: Binary cross-entropy (BCE)
  • Hardware: 1x NVIDIA A100 on Google Colab Pro
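
As an illustration of the loss and schedule listed above, here is a minimal pure-Python sketch of binary cross-entropy over independent labels and of a cosine annealing learning rate. It is not the training code used for this model. The per-epoch stepping, the minimum learning rate of 0, and the function names are assumptions made for the example; only the base learning rate (5e-5) and the epoch count (5) come from this card.

```python
import math


def bce_with_logits(logits, targets):
    """Mean binary cross-entropy over independent labels, computed from raw logits."""
    total = 0.0
    for z, y in zip(logits, targets):
        # Numerically stable form: max(z, 0) - z*y + log(1 + exp(-|z|))
        total += max(z, 0.0) - z * y + math.log1p(math.exp(-abs(z)))
    return total / len(logits)


def cosine_annealing_lr(step, total_steps, base_lr=5e-5, min_lr=0.0):
    """Learning rate after `step` of `total_steps` under cosine annealing.

    min_lr=0.0 is an assumption; the card does not state a minimum.
    """
    cos_term = 1.0 + math.cos(math.pi * step / total_steps)
    return min_lr + 0.5 * (base_lr - min_lr) * cos_term


# Three hypothetical logits with their binary targets.
loss = bce_with_logits([2.0, -1.0, 0.0], [1.0, 0.0, 1.0])

# Learning rate at the start of each of the 5 epochs, plus the final value
# (assumes one scheduler step per epoch).
schedule = [cosine_annealing_lr(epoch, 5) for epoch in range(6)]
```

If the scheduler were stepped per batch rather than per epoch, `total_steps` would be the total number of batches instead of 5.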



📊 Evaluation Results

Evaluated on held-out Pascal VOC 2012 data (the Dataset section lists the validation split as the evaluation split):

Metric     Value
ROC AUC    98.9%

Note: The evaluation used a standard classification metric (ROC AUC, reported above). The model was not evaluated for cross-domain generalization.
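
For readers unfamiliar with the metric, the following sketch shows one common way to compute ROC AUC for a multi-label classifier: per-class AUC via pairwise ranking, then an unweighted (macro) mean. The card does not say which averaging was used for the reported value, so the macro average, the function names, and the toy data are assumptions for illustration only.

```python
def roc_auc(scores, labels):
    """ROC AUC for one class: the fraction of positive/negative pairs ranked correctly."""
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    if not pos or not neg:
        raise ValueError("ROC AUC needs at least one positive and one negative example")
    # A pair counts 1 if the positive scores higher, 0.5 on a tie, 0 otherwise.
    wins = sum((p > n) + 0.5 * (p == n) for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


def macro_roc_auc(score_rows, label_rows):
    """Unweighted mean of per-class ROC AUC; rows are images, columns are classes."""
    num_classes = len(score_rows[0])
    per_class = [
        roc_auc([row[c] for row in score_rows], [row[c] for row in label_rows])
        for c in range(num_classes)
    ]
    return sum(per_class) / num_classes


# Toy data: 4 images, 2 classes (hypothetical scores and labels).
scores = [[0.9, 0.2], [0.8, 0.7], [0.3, 0.6], [0.1, 0.4]]
labels = [[1, 1], [1, 1], [0, 0], [0, 0]]
macro = macro_roc_auc(scores, labels)
```

The pairwise count is quadratic in the number of examples, which is fine for a sketch; a sort-based implementation is preferable for a full evaluation set.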


📚 Dataset

  • Name: Pascal VOC 2012
  • License: Creative Commons Attribution 4.0 International
  • Labels: 20 object categories (person, car, dog, etc.)
  • Split used: Training for fine-tuning, validation for evaluation
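
Because a Pascal VOC image can contain several categories at once, classification targets are typically encoded as a 20-element multi-hot vector. The sketch below shows that encoding with the 20 standard VOC class names in alphabetical order. The index order actually used by this model is not stated in the card, so the ordering and the helper name `encode_multi_hot` are assumptions.

```python
# The 20 Pascal VOC object categories, in alphabetical order (assumed index order).
VOC_CLASSES = [
    "aeroplane", "bicycle", "bird", "boat", "bottle",
    "bus", "car", "cat", "chair", "cow",
    "diningtable", "dog", "horse", "motorbike", "person",
    "pottedplant", "sheep", "sofa", "train", "tvmonitor",
]


def encode_multi_hot(present_classes):
    """Return a 20-element 0/1 target vector marking the categories present in an image."""
    unknown = set(present_classes) - set(VOC_CLASSES)
    if unknown:
        raise ValueError(f"Unknown class names: {sorted(unknown)}")
    present = set(present_classes)
    return [1.0 if name in present else 0.0 for name in VOC_CLASSES]


# An image containing a person and a dog sets two positions to 1.
target = encode_multi_hot(["person", "dog"])
```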

💾 Files in This Repository

  • model.safetensors: Model weights
  • README.md: Model card (this file)
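
A possible way to use these files is sketched below. It assumes that `model.safetensors` has been downloaded locally, that its keys match timm's `swin_s3_base_224` architecture with a 20-output head, and that the outputs are per-class logits to be passed through a sigmoid (consistent with BCE training). None of this is confirmed by the card, and the function names and the 0.5 threshold are hypothetical.

```python
import math


def load_finetuned_model(weights_path="model.safetensors", num_classes=20):
    """Build the timm Swin S3 Base architecture and load local fine-tuned weights.

    Assumes the checkpoint keys match timm's `swin_s3_base_224` model.
    """
    # Imported here so the helpers below work without timm or safetensors installed.
    import timm
    from safetensors.torch import load_file

    model = timm.create_model("swin_s3_base_224", pretrained=False, num_classes=num_classes)
    model.load_state_dict(load_file(weights_path))
    return model.eval()


def predict_labels(logits, class_names, threshold=0.5):
    """Apply a sigmoid to each logit and keep the classes at or above the threshold."""
    probabilities = [1.0 / (1.0 + math.exp(-z)) for z in logits]
    return [name for name, p in zip(class_names, probabilities) if p >= threshold]


# Hypothetical logits for three classes; no model is needed for this step.
predicted = predict_labels([2.0, -2.0, 0.0], ["cat", "dog", "sofa"])
```

Input images would also need the same resizing and normalization used during fine-tuning, which the card does not specify beyond the 224x224 input size.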

🔗 Citations

@inproceedings{liu2021swin,
  title={Swin Transformer: Hierarchical Vision Transformer using Shifted Windows},
  author={Liu, Ze and Lin, Yutong and Cao, Yu and Hu, Han and Wei, Yixuan and Zhang, Zheng and Lin, Stephen and Guo, Baining},
  booktitle={ICCV},
  year={2021}
}

@article{Everingham10,
  author = {Everingham, M. and Van Gool, L. and Williams, C. K. I. and Winn, J. and Zisserman, A.},
  title = {The Pascal Visual Object Classes (VOC) Challenge},
  journal = {IJCV},
  year = {2010},
  volume = {88},
  number = {2},
  pages = {303--338}
}