fylex/swin-s3-base-pascal_test

Image Classification

Safetensors


fylex committed on Jun 27

Commit a8b5702 (verified) · 1 Parent(s): 01db41f

Update README.md


Files changed (1)

README.md +95 -2


@@ -6,6 +6,99 @@ metrics:
 base_model:
 - BobMcDear/swin_s3_base_224
 pipeline_tag: image-classification
+license: apache-2.0
 ---
-# Simple Image Classification Swin Model
-### Trained on Pascal VOC2017
+# 🦢 Swin S3 Base (224) - Pascal VOC
+A Swin S3 Base model fine-tuned on the Pascal VOC 2012 dataset for multi-class image classification.
+---
+## 🧠 Model Details
+- **Architecture**: Swin S3 Base (`224x224` input size)
+- **Pretrained on**: ImageNet-1k
+- **Fine-tuned on**: Pascal VOC 2012
+- **Framework**: PyTorch (`timm` implementation)
+- **Format**: `safetensors`
+---
+## 🎯 Intended Use
+- **Primary task**: Image classification of natural scenes featuring objects from 20 Pascal VOC categories.
+- **Users**: Researchers and developers working on computer vision applications or model benchmarking.
+- **Not intended for**: Real-time decision making in critical applications (e.g., autonomous vehicles, medical diagnosis).
+---
+## ⚠️ Limitations and Ethical Considerations
+- **Biases**: The model inherits biases present in Pascal VOC, such as underrepresentation of certain object types, contexts, or demographics. It may perform poorly on out-of-distribution samples.
+- **Ethical Use**: Avoid using this model for applications that could reinforce harmful stereotypes, cause social harm, or violate privacy (e.g., surveillance).
+- **Transparency**: This model is shared for research and educational use and should not be deployed without thorough fairness, robustness, and security evaluations.
+---
+## ⚙️ Training Details
+- **Training library**: `timm` + PyTorch
+- **Epochs**: 5
+- **Batch size**: 16
+- **Optimizer**: AdamW
+- **Learning rate**: 5e-5
+- **Scheduler**: Cosine Annealing
+- **Loss function**: BCE
+- **Hardware**: 1x NVIDIA A100 on Google Colab Pro
+> ℹ️ [Link to experiment tracking dashboard (e.g., Weights & Biases)](https://wandb.ai/your-project/your-run-id) *(optional)*
+---
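
The hyperparameters listed under Training Details can be illustrated with a small, dependency-free sketch. It is not the author's training code. It assumes the cosine schedule is stepped once per epoch with a minimum learning rate of 0, and that "BCE" means binary cross-entropy applied to raw per-class logits; the card states none of this. The names `cosine_annealing_lr`, `bce_with_logits`, and `ETA_MIN` are hypothetical.

```python
import math

BASE_LR = 5e-5   # "Learning rate" from the card
EPOCHS = 5       # "Epochs" from the card
ETA_MIN = 0.0    # assumption: the card does not state a minimum learning rate


def cosine_annealing_lr(epoch, base_lr=BASE_LR, t_max=EPOCHS, eta_min=ETA_MIN):
    """Closed-form cosine annealing: base_lr at epoch 0, eta_min at epoch t_max."""
    return eta_min + 0.5 * (base_lr - eta_min) * (1 + math.cos(math.pi * epoch / t_max))


def bce_with_logits(logits, targets):
    """Mean binary cross-entropy over independent per-class logits."""
    total = 0.0
    for z, y in zip(logits, targets):
        # Numerically stable form of -[y*log(s) + (1-y)*log(1-s)], with s = sigmoid(z)
        total += max(z, 0.0) - z * y + math.log1p(math.exp(-abs(z)))
    return total / len(logits)


# Learning rate at the start of each of the 5 epochs (assumed per-epoch stepping).
schedule = [cosine_annealing_lr(e) for e in range(EPOCHS)]

# Toy example: three class logits against a multi-hot target (hypothetical values).
loss = bce_with_logits([2.0, -1.0, 0.0], [1.0, 0.0, 1.0])
```

In PyTorch, the corresponding building blocks would likely be `torch.optim.AdamW`, `torch.optim.lr_scheduler.CosineAnnealingLR`, and `torch.nn.BCEWithLogitsLoss`, though the card does not confirm which exact classes were used.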
+## 📊 Evaluation Results
+Evaluated on the Pascal VOC 2012 validation split:
+| Metric         | Value       |
+|----------------|-------------|
+| roc_auc        | 98.9%       |
+> *Note: Evaluation performed using standard multi-class metrics. Model was not evaluated on cross-domain generalization.*
+---
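
The card reports a single `roc_auc` value without saying how it was aggregated over the 20 classes. One common convention is a macro average of per-class ROC AUC; the sketch below shows that computation in plain Python under that assumption. The function names and the toy scores are hypothetical and are not taken from the model's outputs.

```python
def binary_roc_auc(scores, labels):
    """ROC AUC as the probability that a positive outranks a negative (ties count 0.5)."""
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    if not pos or not neg:
        raise ValueError("ROC AUC needs at least one positive and one negative")
    wins = sum((p > n) + 0.5 * (p == n) for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


def macro_roc_auc(score_rows, label_rows):
    """Average the per-class AUC over the columns of two (samples x classes) lists."""
    n_classes = len(score_rows[0])
    aucs = [
        binary_roc_auc([row[c] for row in score_rows], [row[c] for row in label_rows])
        for c in range(n_classes)
    ]
    return sum(aucs) / n_classes


# Toy data: 4 images, 2 classes (hypothetical values, not from the model).
scores = [[0.9, 0.2], [0.8, 0.7], [0.3, 0.6], [0.1, 0.4]]
labels = [[1, 0], [1, 1], [0, 0], [0, 1]]
macro_auc = macro_roc_auc(scores, labels)
```

The pairwise comparison is quadratic in the number of samples, which is fine for illustration; for a full evaluation set, `sklearn.metrics.roc_auc_score` with `average="macro"` computes the same quantity more efficiently.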
+## 📚 Dataset
+- **Name**: Pascal VOC 2012
+- **License**: Creative Commons Attribution 4.0 International
+- **Labels**: 20 object categories (person, car, dog, etc.)
+- **Split used**: Training for fine-tuning, validation for evaluation
+---
+## 💾 Files in This Repository
+- `model.safetensors`: Model weights
+- `README.md`: Model card (this file)
+---
+## 🔗 Citations
+```bibtex
+@inproceedings{liu2021swin,
+  title={Swin Transformer: Hierarchical Vision Transformer using Shifted Windows},
+  author={Liu, Ze and Lin, Yutong and Cao, Yu and Hu, Han and Wei, Yixuan and Zhang, Zheng and Lin, Stephen and Guo, Baining},
+  booktitle={ICCV},
+  year={2021}
+}
+@article{Everingham10,
+  author = {Everingham, M. and Van Gool, L. and Williams, C. K. I. and Winn, J. and Zisserman, A.},
+  title = {The Pascal Visual Object Classes (VOC) Challenge},
+  journal = {IJCV},
+  year = {2010},
+  volume = {88},
+  number = {2},
+  pages = {303--338}
+}
+```