fylex/swin-s3-base-pascal_test


	---
	datasets:
	- fuliucansheng/pascal_voc
	metrics:
	- roc_auc
	base_model:
	- BobMcDear/swin_s3_base_224
	pipeline_tag: image-classification
	license: apache-2.0
	---
	# 🦢 Swin S3 Base (224) - Pascal VOC

	A Swin S3 Base model fine-tuned on the Pascal VOC 2012 dataset for multi-class image classification.

	---

	## 🧠 Model Details

	- Architecture: Swin S3 Base (`224x224` input size)
	- Pretrained on: ImageNet-1k
	- Fine-tuned on: Pascal VOC 2012
	- Framework: PyTorch (`timm` implementation)
	- Format: `safetensors`
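A minimal loading sketch for the details above. It assumes that the `timm` architecture name `swin_s3_base_224` matches this checkpoint and that the keys in `model.safetensors` line up with timm's state dict; the card confirms neither. The repo id is taken from the repository path, and `load_model` is a hypothetical helper, not part of any library.

```python
REPO_ID = "fylex/swin-s3-base-pascal_test"  # taken from the repository path
ARCH = "swin_s3_base_224"                   # assumed timm model name
NUM_CLASSES = 20                            # Pascal VOC categories


def load_model(repo_id: str = REPO_ID, arch: str = ARCH):
    """Hypothetical helper: build the architecture, then load the fine-tuned weights."""
    # Imported lazily so this file can be read without the heavy dependencies.
    import timm
    from huggingface_hub import hf_hub_download
    from safetensors.torch import load_file

    # Build an untrained network with a 20-way classification head.
    model = timm.create_model(arch, pretrained=False, num_classes=NUM_CLASSES)
    weights_path = hf_hub_download(repo_id=repo_id, filename="model.safetensors")
    # strict=True raises if the checkpoint keys differ from timm's,
    # which is the assumption to verify first.
    model.load_state_dict(load_file(weights_path), strict=True)
    return model.eval()
```

Calling `load_model()` downloads the weights on first use. If `load_state_dict` reports missing or unexpected keys, the checkpoint was probably saved from a differently named module hierarchy and needs a key remapping.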

	---

	## 🎯 Intended Use

	- Primary task: Image classification of natural scenes featuring objects from 20 Pascal VOC categories.
	- Users: Researchers and developers working on computer vision applications or model benchmarking.
	- Not intended for: Real-time decision making in critical applications (e.g., autonomous vehicles, medical diagnosis).

	---

	## ⚠️ Limitations and Ethical Considerations

	- Biases: The model inherits biases present in Pascal VOC, such as underrepresentation of certain object types, contexts, or demographics. It may perform poorly on out-of-distribution samples.
	- Ethical Use: Avoid using this model for applications that could reinforce harmful stereotypes, cause social harm, or violate privacy (e.g., surveillance).
	- Transparency: This model is shared for research and educational use and should not be deployed without thorough fairness, robustness, and security evaluations.

	---

	## ⚙️ Training Details

	- Training library: `timm` + PyTorch
	- Epochs: 5
	- Batch size: 16
	- Optimizer: AdamW
	- Learning rate: 5e-5
	- Scheduler: Cosine Annealing
	- Loss function: Binary cross-entropy (BCE)
	- Hardware: 1x NVIDIA A100 on Google Colab Pro
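The schedule and loss listed above can be written out explicitly. The sketch below assumes per-epoch annealing down to a minimum learning rate of 0 with no warmup, and BCE applied to raw logits against a multi-hot target vector; the card states none of these details, and the function names are illustrative. The schedule follows $\eta_t = \eta_{\min} + \tfrac{1}{2}(\eta_0 - \eta_{\min})(1 + \cos(\pi t / T))$.

```python
import math

BASE_LR = 5e-5  # from the card
EPOCHS = 5      # from the card


def cosine_lr(epoch: int, base_lr: float = BASE_LR, total: int = EPOCHS,
              min_lr: float = 0.0) -> float:
    """Cosine-annealed learning rate at the start of `epoch` (0-based)."""
    return min_lr + 0.5 * (base_lr - min_lr) * (1.0 + math.cos(math.pi * epoch / total))


def bce_with_logits(logits: list[float], targets: list[float]) -> float:
    """Mean binary cross-entropy over one vector of per-class logits."""
    total = 0.0
    for z, y in zip(logits, targets):
        # Numerically stable form: max(z, 0) - z*y + log(1 + exp(-|z|)).
        total += max(z, 0.0) - z * y + math.log1p(math.exp(-abs(z)))
    return total / len(logits)


# Learning rate used in each of the 5 epochs under the assumptions above.
schedule = [cosine_lr(e) for e in range(EPOCHS)]
```

In a PyTorch training loop the same roles would typically be filled by `torch.optim.lr_scheduler.CosineAnnealingLR` and `torch.nn.BCEWithLogitsLoss`, though the card does not say which classes were actually used.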

	---

	## 📊 Evaluation Results

	Evaluated on the Pascal VOC 2012 validation split:

	| Metric  | Value |
	|---------|-------|
	| roc_auc | 98.9% |

	> Note: Evaluation used the ROC AUC metric only. The model was not evaluated for cross-domain generalization.
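To make the reported metric concrete, here is a small sketch of how a ROC AUC over several classes could be computed. It assumes macro averaging (an unweighted mean of per-class AUCs), which the card does not specify, and it uses the rank-based definition: the probability that a positive sample scores higher than a negative one. The function names are hypothetical.

```python
def binary_roc_auc(labels: list[int], scores: list[float]) -> float:
    """ROC AUC for one class: P(positive score > negative score), ties count 0.5."""
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("ROC AUC needs at least one positive and one negative")
    wins = 0.0
    for p in pos:
        for n in neg:
            wins += 1.0 if p > n else 0.5 if p == n else 0.0
    return wins / (len(pos) * len(neg))


def macro_roc_auc(label_rows: list[list[int]], score_rows: list[list[float]]) -> float:
    """Unweighted mean of per-class AUCs for (n_samples x n_classes) inputs."""
    n_classes = len(label_rows[0])
    aucs = [
        binary_roc_auc([row[c] for row in label_rows], [row[c] for row in score_rows])
        for c in range(n_classes)
    ]
    return sum(aucs) / n_classes


# Toy example with 4 images and 2 classes.
y_true = [[1, 0], [0, 1], [1, 1], [0, 0]]
y_score = [[0.9, 0.2], [0.1, 0.8], [0.4, 0.3], [0.6, 0.1]]
print(macro_roc_auc(y_true, y_score))  # 0.875
```

The pairwise loop is quadratic in the number of samples and is meant only to show the definition; for a full validation set, a library routine such as `sklearn.metrics.roc_auc_score` with `average="macro"` is the usual choice.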

	---

	## 📚 Dataset

	- Name: Pascal VOC 2012
	- License: Creative Commons Attribution 4.0 International
	- Labels: 20 object categories (person, car, dog, etc.)
	- Split used: Training for fine-tuning, validation for evaluation
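Because a Pascal VOC image can contain objects from several categories, targets for a BCE loss are usually encoded as multi-hot vectors. The sketch below assumes that encoding and the conventional alphabetical VOC class order; the card does not state the label order used in training, so the index mapping here is an assumption, and `to_multi_hot` is a hypothetical helper.

```python
# The 20 Pascal VOC object categories in their conventional alphabetical order.
VOC_CLASSES = [
    "aeroplane", "bicycle", "bird", "boat", "bottle", "bus", "car", "cat",
    "chair", "cow", "diningtable", "dog", "horse", "motorbike", "person",
    "pottedplant", "sheep", "sofa", "train", "tvmonitor",
]
CLASS_TO_INDEX = {name: i for i, name in enumerate(VOC_CLASSES)}


def to_multi_hot(object_names: list[str]) -> list[float]:
    """Turn the object names annotated in one image into a 20-dim 0/1 target."""
    target = [0.0] * len(VOC_CLASSES)
    for name in object_names:
        # Repeated objects of the same class still set a single entry.
        target[CLASS_TO_INDEX[name]] = 1.0
    return target


# An image annotated with two people and a dog.
target = to_multi_hot(["person", "dog", "person"])
```

Here `target` has ones at index 11 (`dog`) and index 14 (`person`) and zeros elsewhere.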

	---

	## 💾 Files in This Repository

	- `model.safetensors`: Model weights
	- `README.md`: Model card (this file)
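Since the repository ships only the weights file, it can be useful to inspect the tensor names and shapes before loading them into a model. A `safetensors` file starts with an 8-byte little-endian length followed by a JSON header, so the sketch below reads that header with the standard library alone. The helper name `read_safetensors_header` is hypothetical.

```python
import json
import struct


def read_safetensors_header(path: str) -> dict:
    """Return {tensor_name: {"dtype", "shape", "data_offsets"}} without loading weights."""
    with open(path, "rb") as f:
        # First 8 bytes: header size as a little-endian unsigned 64-bit integer.
        (header_len,) = struct.unpack("<Q", f.read(8))
        header = json.loads(f.read(header_len).decode("utf-8"))
    # "__metadata__" is an optional free-form entry, not a tensor.
    header.pop("__metadata__", None)
    return header
```

For example, `sorted(read_safetensors_header("model.safetensors"))` lists the parameter names, which can then be compared against the keys of the target model's `state_dict()`.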

	---

	## 🔗 Citations

	```bibtex
	@inproceedings{liu2021swin,
	title={Swin Transformer: Hierarchical Vision Transformer using Shifted Windows},
	author={Liu, Ze and Lin, Yutong and Cao, Yu and Hu, Han and Wei, Yixuan and Zhang, Zheng and Lin, Stephen and Guo, Baining},
	booktitle={ICCV},
	year={2021}
	}

	@article{Everingham10,
	author = {Everingham, M. and Van Gool, L. and Williams, C. K. I. and Winn, J. and Zisserman, A.},
	title = {The Pascal Visual Object Classes (VOC) Challenge},
	journal = {IJCV},
	year = {2010},
	volume = {88},
	number = {2},
	pages = {303--338}
	}
	```