---
datasets:
- fuliucansheng/pascal_voc
metrics:
- roc_auc
base_model:
- BobMcDear/swin_s3_base_224
pipeline_tag: image-classification
license: apache-2.0
---
# 🦢 Swin S3 Base (224) - Pascal VOC
A Swin S3 Base model fine-tuned on the Pascal VOC 2012 dataset for multi-label image classification (an image may contain objects from several of the 20 categories).
---
## 🧠 Model Details
- **Architecture**: Swin S3 Base (`224x224` input size)
- **Pretrained on**: ImageNet-1k
- **Fine-tuned on**: Pascal VOC 2012
- **Framework**: PyTorch (`timm` implementation)
- **Format**: `safetensors`
---
## 🎯 Intended Use
- **Primary task**: Image classification of natural scenes featuring objects from 20 Pascal VOC categories.
- **Users**: Researchers and developers working on computer vision applications or model benchmarking.
- **Not intended for**: Real-time decision making in critical applications (e.g., autonomous vehicles, medical diagnosis).
---
## ⚠️ Limitations and Ethical Considerations
- **Biases**: The model inherits biases present in Pascal VOC, such as underrepresentation of certain object types, contexts, or demographics. It may perform poorly on out-of-distribution samples.
- **Ethical Use**: Avoid using this model for applications that could reinforce harmful stereotypes, cause social harm, or violate privacy (e.g., surveillance).
- **Transparency**: This model is shared for research and educational use and should not be deployed without thorough fairness, robustness, and security evaluations.
---
## ⚙️ Training Details
- **Training library**: `timm` + PyTorch
- **Epochs**: 5
- **Batch size**: 16
- **Optimizer**: AdamW
- **Learning rate**: 5e-5
- **Scheduler**: Cosine Annealing
- **Loss function**: BCE
- **Hardware**: 1x NVIDIA A100 on Google Colab Pro
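
As a rough illustration of the hyperparameters listed above, the following sketch computes a cosine-annealed learning rate per epoch and a binary cross-entropy loss on logits in plain Python. It is not the training script used for this model: the minimum learning rate of 0, the per-epoch (rather than per-step) schedule, and the helper names `cosine_lr` and `bce_with_logits` are assumptions made for the example.

```python
import math

BASE_LR = 5e-5  # learning rate stated in this card
EPOCHS = 5      # number of epochs stated in this card
ETA_MIN = 0.0   # assumption: the card does not state a minimum learning rate


def cosine_lr(epoch, base_lr=BASE_LR, total=EPOCHS, eta_min=ETA_MIN):
    """Cosine-annealed learning rate at a given epoch (hypothetical helper)."""
    return eta_min + 0.5 * (base_lr - eta_min) * (1 + math.cos(math.pi * epoch / total))


def bce_with_logits(logits, targets):
    """Mean binary cross-entropy over per-class logits, in a numerically stable form."""
    losses = [
        max(x, 0.0) - x * y + math.log1p(math.exp(-abs(x)))
        for x, y in zip(logits, targets)
    ]
    return sum(losses) / len(losses)


# Learning rate at the start of each of the 5 epochs.
schedule = [cosine_lr(epoch) for epoch in range(EPOCHS)]

# One image, with only three of the 20 classes shown: targets are independent 0/1 flags.
example_loss = bce_with_logits([2.0, -1.0, 0.0], [1.0, 0.0, 1.0])
```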
---
## Evaluation Results
Evaluated on the Pascal VOC 2012 validation split:
| Metric  | Value |
|---------|-------|
| ROC AUC | 98.9% |
> *Note: Evaluation used a standard classification metric (ROC AUC) computed over the 20 classes. The model was not evaluated on cross-domain generalization.*
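
For readers who want to reproduce a comparable number, here is a minimal pure-Python sketch of ROC AUC averaged over classes. The card does not say how the score was aggregated, so the macro average, the 0.5 credit for tied scores, and the function names `binary_roc_auc` and `macro_roc_auc` are assumptions for illustration only.

```python
def binary_roc_auc(labels, scores):
    """AUC as the fraction of (positive, negative) pairs ranked correctly; ties count 0.5."""
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        return None  # undefined when only one label value is present
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0 for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


def macro_roc_auc(label_rows, score_rows):
    """Unweighted mean of per-class AUCs, skipping classes where AUC is undefined."""
    num_classes = len(label_rows[0])
    aucs = []
    for c in range(num_classes):
        auc = binary_roc_auc([row[c] for row in label_rows],
                             [row[c] for row in score_rows])
        if auc is not None:
            aucs.append(auc)
    return sum(aucs) / len(aucs)


# Toy data: four images, two classes; each row holds one 0/1 label or score per class.
toy_labels = [[1, 0], [0, 1], [1, 1], [0, 0]]
toy_scores = [[0.9, 0.2], [0.1, 0.8], [0.8, 0.3], [0.3, 0.7]]
toy_auc = macro_roc_auc(toy_labels, toy_scores)
```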
---
## Dataset
- **Name**: Pascal VOC 2012
- **License**: Creative Commons Attribution 4.0 International
- **Labels**: 20 object categories (person, car, dog, etc.)
- **Split used**: Training for fine-tuning, validation for evaluation
---
## 💾 Files in This Repository
- `model.safetensors`: Model weights
- `README.md`: Model card (this file)
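
A minimal loading sketch follows, assuming `timm`, `torch`, and `safetensors` are installed and that `model.safetensors` has been downloaded locally. The timm architecture name is taken from the `base_model` entry of this card, and the sketch assumes the parameter names in the file match timm's, which has not been verified here; `load_finetuned` is a hypothetical helper name. Because the card lists BCE as the loss, applying a sigmoid to the returned logits should give independent per-class scores.

```python
from pathlib import Path

NUM_CLASSES = 20           # Pascal VOC object categories, per this card
ARCH = "swin_s3_base_224"  # assumed timm model name, taken from the base_model id


def load_finetuned(weights_path="model.safetensors"):
    """Build the architecture in timm and load the fine-tuned weights (hypothetical helper)."""
    # Imported lazily so this module can be imported without the heavy dependencies.
    import timm
    from safetensors.torch import load_file

    model = timm.create_model(ARCH, pretrained=False, num_classes=NUM_CLASSES)
    state_dict = load_file(str(Path(weights_path)))
    # Assumption: the keys in the file match timm's parameter names.
    model.load_state_dict(state_dict)
    return model.eval()
```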
---
## Citations
```bibtex
@inproceedings{liu2021swin,
title={Swin Transformer: Hierarchical Vision Transformer using Shifted Windows},
author={Liu, Ze and Lin, Yutong and Cao, Yu and Hu, Han and Wei, Yixuan and Zhang, Zheng and Lin, Stephen and Guo, Baining},
booktitle={ICCV},
year={2021}
}
@article{Everingham10,
author = {Everingham, M. and Van Gool, L. and Williams, C. K. I. and Winn, J. and Zisserman, A.},
title = {The Pascal Visual Object Classes (VOC) Challenge},
journal = {IJCV},
year = {2010},
volume = {88},
number = {2},
pages = {303--338}
}
```