---
datasets:
- fuliucansheng/pascal_voc
metrics:
- roc_auc
base_model:
- BobMcDear/swin_s3_base_224
pipeline_tag: image-classification
license: apache-2.0
---
# 🦢 Swin S3 Base (224) - Pascal VOC

A Swin S3 Base model fine-tuned on the Pascal VOC 2012 dataset for multi-label image classification (an image can contain objects from several categories at once).

---

## 🧠 Model Details

- **Architecture**: Swin S3 Base (`224x224` input size)
- **Pretrained on**: ImageNet-1k
- **Fine-tuned on**: Pascal VOC 2012
- **Framework**: PyTorch (`timm` implementation)
- **Format**: `safetensors`

---

## 🎯 Intended Use

- **Primary task**: Image classification of natural scenes featuring objects from 20 Pascal VOC categories.
- **Users**: Researchers and developers working on computer vision applications or model benchmarking.
- **Not intended for**: Real-time decision making in critical applications (e.g., autonomous vehicles, medical diagnosis).

---

## โš ๏ธ Limitations and Ethical Considerations

- **Biases**: The model inherits biases present in Pascal VOC, such as underrepresentation of certain object types, contexts, or demographics. It may perform poorly on out-of-distribution samples.
- **Ethical Use**: Avoid using this model for applications that could reinforce harmful stereotypes, cause social harm, or violate privacy (e.g., surveillance).
- **Transparency**: This model is shared for research and educational use and should not be deployed without thorough fairness, robustness, and security evaluations.

---

## โš™๏ธ Training Details

- **Training library**: `timm` + PyTorch
- **Epochs**: 5
- **Batch size**: 16
- **Optimizer**: AdamW
- **Learning rate**: 5e-5
- **Scheduler**: Cosine Annealing
- **Loss function**: BCE
- **Hardware**: 1x NVIDIA A100 on Google Colab Pro
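
To make the schedule and loss above concrete, here is a minimal pure-Python sketch of the two formulas. It assumes the cosine schedule is stepped once per epoch over the 5 epochs and decays to a minimum learning rate of 0; the card states neither, so `T_MAX` and `ETA_MIN` are illustrative assumptions. It also assumes the BCE loss is averaged over the 20 class outputs.

```python
import math

BASE_LR = 5e-5  # learning rate from the card
T_MAX = 5       # assumption: one scheduler step per epoch, 5 epochs total
ETA_MIN = 0.0   # assumption: the card does not state a minimum learning rate


def cosine_annealing_lr(step, base_lr=BASE_LR, t_max=T_MAX, eta_min=ETA_MIN):
    """Closed-form cosine annealing: decays from base_lr to eta_min over t_max steps."""
    return eta_min + 0.5 * (base_lr - eta_min) * (1.0 + math.cos(math.pi * step / t_max))


def binary_cross_entropy(probs, targets, eps=1e-7):
    """Mean BCE over classes for one image; probs and targets have equal length."""
    total = 0.0
    for p, t in zip(probs, targets):
        p = min(max(p, eps), 1.0 - eps)  # clamp to avoid log(0)
        total += -(t * math.log(p) + (1.0 - t) * math.log(1.0 - p))
    return total / len(probs)


if __name__ == "__main__":
    for epoch in range(T_MAX + 1):
        print(epoch, cosine_annealing_lr(epoch))
```

In PyTorch, the same schedule is available as `torch.optim.lr_scheduler.CosineAnnealingLR`, and the loss as `torch.nn.BCELoss` (or `torch.nn.BCEWithLogitsLoss` when the model outputs raw logits). Which of the two was used here is not stated in the card.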

> โ„น๏ธ [Link to experiment tracking dashboard (e.g., Weights & Biases)](https://wandb.ai/your-project/your-run-id) *(optional)*

---

## 📊 Evaluation Results

Evaluated on the Pascal VOC 2012 validation split (the split listed for evaluation in the Dataset section):

| Metric         | Value       |
|----------------|-------------|
| ROC AUC        | 98.9%       |

> *Note: Evaluation was performed using standard multi-label metrics. The model was not evaluated on cross-domain generalization.*
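
The card does not say how the ROC AUC was aggregated across the 20 classes. As a hedged illustration, the sketch below computes a macro-averaged ROC AUC: a per-class AUC from pairwise ranking of positive against negative scores, followed by an unweighted mean. The macro averaging and the toy data are assumptions, not the author's evaluation script.

```python
def binary_roc_auc(labels, scores):
    """AUC as the fraction of (positive, negative) pairs ranked correctly; ties count 0.5."""
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        return None  # AUC is undefined when a class has only one label value
    wins = sum((p > n) + 0.5 * (p == n) for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


def macro_roc_auc(y_true, y_score):
    """Unweighted mean of per-class AUCs, skipping classes where AUC is undefined."""
    num_classes = len(y_true[0])
    per_class = []
    for c in range(num_classes):
        auc = binary_roc_auc([row[c] for row in y_true], [row[c] for row in y_score])
        if auc is not None:
            per_class.append(auc)
    return sum(per_class) / len(per_class)


# Toy example with 4 images and 2 classes (illustrative data only).
y_true = [[1, 0], [0, 1], [1, 1], [0, 0]]
y_score = [[0.9, 0.4], [0.1, 0.8], [0.8, 0.3], [0.3, 0.1]]

if __name__ == "__main__":
    print(macro_roc_auc(y_true, y_score))  # 0.875
```

For real evaluation runs, `sklearn.metrics.roc_auc_score(y_true, y_score, average="macro")` computes the same quantity more efficiently. The pairwise version here is quadratic in the number of images and is meant only to show the definition.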

---

## 📚 Dataset

- **Name**: Pascal VOC 2012
- **License**: Creative Commons Attribution 4.0 International
- **Labels**: 20 object categories (person, car, dog, etc.)
- **Split used**: Training for fine-tuning, validation for evaluation
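
Because an image can contain several of the 20 categories, training with a BCE loss typically uses multi-hot target vectors. The sketch below shows one way to encode and decode them. The class order is the alphabetical order used by the official VOC devkit, which is an assumption here: the card does not document the label order of this checkpoint. The 0.5 threshold is likewise illustrative.

```python
# Assumption: alphabetical VOC devkit order; this checkpoint's order is not documented.
VOC_CLASSES = [
    "aeroplane", "bicycle", "bird", "boat", "bottle",
    "bus", "car", "cat", "chair", "cow",
    "diningtable", "dog", "horse", "motorbike", "person",
    "pottedplant", "sheep", "sofa", "train", "tvmonitor",
]


def encode_multi_hot(present):
    """Turn a collection of class names into a 20-dimensional 0/1 target vector."""
    present = set(present)
    unknown = present - set(VOC_CLASSES)
    if unknown:
        raise ValueError(f"Unknown class names: {sorted(unknown)}")
    return [1.0 if name in present else 0.0 for name in VOC_CLASSES]


def decode_predictions(probabilities, threshold=0.5):
    """Return the names of all classes whose probability reaches the threshold."""
    return [name for name, p in zip(VOC_CLASSES, probabilities) if p >= threshold]


if __name__ == "__main__":
    target = encode_multi_hot({"person", "dog"})
    print(decode_predictions(target))
```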

---

## 💾 Files in This Repository

- `model.safetensors`: Model weights
- `README.md`: Model card (this file)
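
A minimal loading sketch follows, assuming that `model.safetensors` holds a state dict whose keys match timm's `swin_s3_base_224` module names and that the classification head has 20 outputs. Neither is confirmed by the card. The function names `load_model` and `predict_probabilities` are hypothetical helpers, and the sigmoid assumes the model returns raw logits.

```python
NUM_CLASSES = 20                 # Pascal VOC object categories
MODEL_NAME = "swin_s3_base_224"  # timm architecture name, assumed to match the checkpoint


def load_model(weights_path="model.safetensors"):
    """Build the timm architecture and load the fine-tuned weights (hypothetical helper)."""
    # Imported inside the function so the sketch can be read and imported
    # without timm, torch, and safetensors installed.
    import timm
    from safetensors.torch import load_file

    model = timm.create_model(MODEL_NAME, pretrained=False, num_classes=NUM_CLASSES)
    state_dict = load_file(weights_path)
    model.load_state_dict(state_dict)  # fails loudly if the key names do not match
    return model.eval()


def predict_probabilities(model, image_batch):
    """Return per-class probabilities for a float tensor of shape (N, 3, 224, 224)."""
    import torch

    with torch.no_grad():
        logits = model(image_batch)
    # Sigmoid rather than softmax: with BCE training each class is scored independently.
    return torch.sigmoid(logits)
```

Input images should be resized to 224x224 and normalized the same way as during fine-tuning. The card does not state the preprocessing, so check it against the model's timm data configuration before relying on the outputs.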

---

## 🔗 Citations

```bibtex
@inproceedings{liu2021swin,
  title={Swin Transformer: Hierarchical Vision Transformer using Shifted Windows},
  author={Liu, Ze and Lin, Yutong and Cao, Yu and Hu, Han and Wei, Yixuan and Zhang, Zheng and Lin, Stephen and Guo, Baining},
  booktitle={ICCV},
  year={2021}
}

@article{Everingham10,
  author = {Everingham, M. and Van Gool, L. and Williams, C. K. I. and Winn, J. and Zisserman, A.},
  title = {The Pascal Visual Object Classes (VOC) Challenge},
  journal = {IJCV},
  year = {2010},
  volume = {88},
  number = {2},
  pages = {303--338}
}
```