---
datasets:
- fuliucansheng/pascal_voc
metrics:
- roc_auc
base_model:
- BobMcDear/swin_s3_base_224
pipeline_tag: image-classification
license: apache-2.0
---

# 🦢 Swin S3 Base (224) - Pascal VOC

A Swin S3 Base model fine-tuned on the Pascal VOC 2012 dataset for multi-label image classification.

---

## 🧠 Model Details

- **Architecture**: Swin S3 Base (`224x224` input size)
- **Pretrained on**: ImageNet-1k
- **Fine-tuned on**: Pascal VOC 2012
- **Framework**: PyTorch (`timm` implementation)
- **Format**: `safetensors`

---

## 🎯 Intended Use

- **Primary task**: Classification of natural images containing objects from the 20 Pascal VOC categories.
- **Users**: Researchers and developers working on computer vision applications or model benchmarking.
- **Not intended for**: Real-time decision making in critical applications (e.g., autonomous vehicles, medical diagnosis).

---

## ⚠️ Limitations and Ethical Considerations

- **Biases**: The model inherits biases present in Pascal VOC, such as underrepresentation of certain object types, contexts, or demographics. It may perform poorly on out-of-distribution samples.
- **Ethical Use**: Avoid using this model for applications that could reinforce harmful stereotypes, cause social harm, or violate privacy (e.g., surveillance).
- **Transparency**: This model is shared for research and educational use and should not be deployed without thorough fairness, robustness, and security evaluations.

---

## ⚙️ Training Details

- **Training library**: `timm` + PyTorch
- **Epochs**: 5
- **Batch size**: 16
- **Optimizer**: AdamW
- **Learning rate**: 5e-5
- **Scheduler**: Cosine annealing
- **Loss function**: Binary cross-entropy (BCE)
- **Hardware**: 1x NVIDIA A100 on Google Colab Pro

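To make the schedule and loss listed above concrete, here is a minimal pure-Python sketch. It is not the training script: it assumes the learning rate is stepped once per epoch and decays to zero (`ETA_MIN = 0.0`), and the function names are illustrative. Neither detail is stated in this card.

```python
import math
from typing import Sequence

BASE_LR = 5e-5   # learning rate from the list above
EPOCHS = 5       # epochs from the list above
ETA_MIN = 0.0    # assumption: the schedule decays all the way to zero


def cosine_annealing_lr(epoch: int, base_lr: float = BASE_LR,
                        total_epochs: int = EPOCHS,
                        eta_min: float = ETA_MIN) -> float:
    """Learning rate at the start of `epoch` (0-based) under cosine annealing."""
    cosine = math.cos(math.pi * epoch / total_epochs)
    return eta_min + 0.5 * (base_lr - eta_min) * (1.0 + cosine)


def bce_with_logits(logits: Sequence[float], targets: Sequence[int]) -> float:
    """Mean binary cross-entropy over independent per-class logits."""
    total = 0.0
    for z, y in zip(logits, targets):
        # Numerically stable form of -[y*log(s) + (1-y)*log(1-s)], s = sigmoid(z).
        total += max(z, 0.0) - z * y + math.log1p(math.exp(-abs(z)))
    return total / len(logits)


# Per-epoch learning rates under the assumptions above.
schedule = [cosine_annealing_lr(epoch) for epoch in range(EPOCHS)]
```

Because BCE scores each class independently, an image can be positive for several categories at once, which suits Pascal VOC images that contain more than one object type.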
---

## Evaluation Results

Evaluated on the Pascal VOC 2012 validation split:

| Metric  | Value |
|---------|-------|
| ROC AUC | 98.9% |

> *Note: Evaluation used standard multi-label classification metrics. The model was not evaluated for cross-domain generalization.*

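For readers unfamiliar with ROC AUC in a multi-label setting, the sketch below computes it per class and then averages across classes. The macro averaging is an assumption, since this card does not say how the reported value was aggregated. The toy labels and scores are invented for illustration and are not model outputs.

```python
from typing import List, Sequence


def binary_roc_auc(labels: Sequence[int], scores: Sequence[float]) -> float:
    """ROC AUC as the chance that a positive outranks a negative (ties count 0.5)."""
    positives = [s for y, s in zip(labels, scores) if y == 1]
    negatives = [s for y, s in zip(labels, scores) if y == 0]
    if not positives or not negatives:
        raise ValueError("ROC AUC needs at least one positive and one negative")
    wins = 0.0
    for p in positives:
        for n in negatives:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(positives) * len(negatives))


def macro_roc_auc(label_matrix: Sequence[Sequence[int]],
                  score_matrix: Sequence[Sequence[float]]) -> float:
    """Unweighted mean of per-class ROC AUC; rows are images, columns are classes."""
    num_classes = len(label_matrix[0])
    per_class: List[float] = []
    for c in range(num_classes):
        class_labels = [row[c] for row in label_matrix]
        class_scores = [row[c] for row in score_matrix]
        per_class.append(binary_roc_auc(class_labels, class_scores))
    return sum(per_class) / num_classes


# Toy example: 4 images, 2 classes (illustrative values only).
toy_labels = [[1, 0], [0, 1], [1, 1], [0, 0]]
toy_scores = [[0.9, 0.2], [0.1, 0.8], [0.4, 0.3], [0.6, 0.1]]
toy_auc = macro_roc_auc(toy_labels, toy_scores)
```

The pairwise loop is quadratic in the number of images, which is fine for illustration; a rank-based implementation from an established library is a better choice for a full validation set.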
---

## Dataset

- **Name**: Pascal VOC 2012
- **License**: Creative Commons Attribution 4.0 International
- **Labels**: 20 object categories (person, car, dog, etc.)
- **Splits used**: Training split for fine-tuning, validation split for evaluation

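The sketch below lists the 20 category names and shows how per-class probabilities might be turned into labels. Two things here are assumptions rather than facts from this card: that the model's output indices follow the conventional alphabetical VOC order, and the 0.5 decision threshold. The helper names are illustrative.

```python
from typing import List, Sequence

# The 20 Pascal VOC categories in their conventional alphabetical order.
# Assumption: the checkpoint's output indices follow this order.
VOC_CLASSES = [
    "aeroplane", "bicycle", "bird", "boat", "bottle",
    "bus", "car", "cat", "chair", "cow",
    "diningtable", "dog", "horse", "motorbike", "person",
    "pottedplant", "sheep", "sofa", "train", "tvmonitor",
]


def to_multi_hot(present: Sequence[str]) -> List[int]:
    """Encode the categories present in an image as a 20-element 0/1 vector."""
    present_set = set(present)
    return [1 if name in present_set else 0 for name in VOC_CLASSES]


def decode_predictions(probabilities: Sequence[float],
                       threshold: float = 0.5) -> List[str]:
    """Return every category whose probability reaches the threshold."""
    if len(probabilities) != len(VOC_CLASSES):
        raise ValueError("expected one probability per VOC category")
    return [name for name, p in zip(VOC_CLASSES, probabilities) if p >= threshold]


# An image containing a person and a dog has two active labels.
target = to_multi_hot(["person", "dog"])
```

A single threshold for all classes is the simplest choice; per-class thresholds tuned on held-out data may work better for rare categories.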
---

## 💾 Files in This Repository

- `model.safetensors`: Model weights
- `README.md`: Model card (this file)

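One plausible way to use `model.safetensors` is to rebuild the architecture in `timm` and load the weights with the `safetensors` library. This is a sketch under assumptions, not a documented loading recipe: it assumes the checkpoint keys match the parameter names of `timm`'s `swin_s3_base_224`, that the classifier head has 20 outputs, and that the file has already been downloaded to the working directory. The function names are illustrative.

```python
MODEL_NAME = "swin_s3_base_224"     # timm architecture name of the base model
NUM_CLASSES = 20                    # Pascal VOC categories
WEIGHTS_PATH = "model.safetensors"  # assumed local path to the downloaded weights


def load_model(weights_path: str = WEIGHTS_PATH):
    """Build the timm architecture and load the fine-tuned weights."""
    # Imported lazily so this module can be imported without the heavy dependencies.
    import timm
    from safetensors.torch import load_file

    model = timm.create_model(MODEL_NAME, pretrained=False, num_classes=NUM_CLASSES)
    state_dict = load_file(weights_path)
    # Assumption: checkpoint keys match timm's parameter names exactly.
    model.load_state_dict(state_dict)
    return model.eval()


def predict_probabilities(model, image_batch):
    """Return per-class probabilities for a float tensor of shape (N, 3, 224, 224)."""
    import torch

    with torch.no_grad():
        logits = model(image_batch)
    # Sigmoid rather than softmax, because training used a BCE loss.
    return torch.sigmoid(logits)
```

Inputs should be resized to 224x224 and normalized the same way as during fine-tuning. This card does not state the normalization; ImageNet mean and standard deviation are a reasonable first guess given the ImageNet-1k pretraining.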
---

## Citations

```bibtex
@inproceedings{liu2021swin,
  title={Swin Transformer: Hierarchical Vision Transformer using Shifted Windows},
  author={Liu, Ze and Lin, Yutong and Cao, Yue and Hu, Han and Wei, Yixuan and Zhang, Zheng and Lin, Stephen and Guo, Baining},
  booktitle={ICCV},
  year={2021}
}

@article{Everingham10,
  author = {Everingham, M. and Van Gool, L. and Williams, C. K. I. and Winn, J. and Zisserman, A.},
  title = {The Pascal Visual Object Classes (VOC) Challenge},
  journal = {IJCV},
  year = {2010},
  volume = {88},
  number = {2},
  pages = {303--338}
}