---
license: apache-2.0
datasets:
- prithivMLmods/OpenScene-Classification
language:
- en
base_model:
- google/siglip-base-patch16-512
pipeline_tag: image-classification
library_name: transformers
tags:
- SigLIP2
- Scene-Detection
- buildings
- forest
- glacier
- mountain
- sea
- street
---

# open-scene-detection
> open-scene-detection is a vision-language encoder model fine-tuned from [`siglip2-base-patch16-512`](https://huggingface.co/google/siglip-base-patch16-512) for multi-class scene classification. It is trained to recognize and categorize natural and urban scenes using a curated visual dataset. The model uses the `SiglipForImageClassification` architecture.
```py
Classification Report:
              precision    recall  f1-score   support

   buildings     0.9755    0.9570    0.9662      2625
      forest     0.9989    0.9955    0.9972      2694
     glacier     0.9564    0.9517    0.9540      2671
    mountain     0.9540    0.9592    0.9566      2723
         sea     0.9934    0.9898    0.9916      2758
      street     0.9595    0.9819    0.9706      2874

    accuracy                         0.9728     16345
   macro avg     0.9730    0.9725    0.9727     16345
weighted avg     0.9729    0.9728    0.9728     16345
```
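
The averaged rows in the report can be re-derived from the per-class rows. The sketch below is only a consistency check, not the evaluation script used for this model. It copies the per-class values from the report and recomputes the macro and support-weighted averages. The names `report`, `macro_avg`, and `weighted_avg` are illustrative.

```python
# Per-class rows copied from the classification report:
# (precision, recall, f1-score, support)
report = {
    "buildings": (0.9755, 0.9570, 0.9662, 2625),
    "forest":    (0.9989, 0.9955, 0.9972, 2694),
    "glacier":   (0.9564, 0.9517, 0.9540, 2671),
    "mountain":  (0.9540, 0.9592, 0.9566, 2723),
    "sea":       (0.9934, 0.9898, 0.9916, 2758),
    "street":    (0.9595, 0.9819, 0.9706, 2874),
}


def macro_avg(column):
    """Unweighted mean of one metric column across all classes."""
    values = [row[column] for row in report.values()]
    return sum(values) / len(values)


def weighted_avg(column):
    """Mean of one metric column, weighted by each class's support."""
    total = sum(row[3] for row in report.values())
    return sum(row[column] * row[3] for row in report.values()) / total


total_support = sum(row[3] for row in report.values())
macro = [macro_avg(c) for c in range(3)]        # precision, recall, f1
weighted = [weighted_avg(c) for c in range(3)]  # precision, recall, f1

print("support:", total_support)
print("macro avg:", [f"{v:.4f}" for v in macro])
print("weighted avg:", [f"{v:.4f}" for v in weighted])
```

Support-weighted recall equals overall accuracy by definition, which is why both appear as 0.9728 in the report.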

---
## Label Space: 6 Classes
The model classifies an image into one of the following scenes:
```
Class 0: Buildings
Class 1: Forest
Class 2: Glacier
Class 3: Mountain
Class 4: Sea
Class 5: Street
```
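
As a minimal sketch of how a class index maps to a scene name, the snippet below applies a softmax to a vector of six raw scores and looks up the highest-scoring index in the list above. It uses only the standard library, and the logits are made-up values for illustration, not real model output. `LABELS`, `softmax`, and `predict_label` are hypothetical helper names.

```python
import math

# Class order as listed above; index i corresponds to "Class i".
LABELS = ["Buildings", "Forest", "Glacier", "Mountain", "Sea", "Street"]


def softmax(logits):
    """Convert raw scores into probabilities that sum to 1."""
    peak = max(logits)  # subtract the max for numerical stability
    exps = [math.exp(x - peak) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def predict_label(logits):
    """Return (label, probability) for the highest-scoring class."""
    probs = softmax(logits)
    best = max(range(len(probs)), key=probs.__getitem__)
    return LABELS[best], probs[best]


# Hypothetical logits for one image; index 4 ("Sea") has the largest score.
example_logits = [0.2, -1.3, 1.1, 0.4, 3.7, -0.5]
label, prob = predict_label(example_logits)
print(label, round(prob, 3))
```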
---
## Install Dependencies
```bash
pip install -q transformers torch pillow gradio hf_xet
```
---
## Inference Code
```python
import gradio as gr
from transformers import AutoImageProcessor, SiglipForImageClassification
from PIL import Image
import torch

# Load model and processor
model_name = "prithivMLmods/open-scene-detection"  # Updated model name
model = SiglipForImageClassification.from_pretrained(model_name)
processor = AutoImageProcessor.from_pretrained(model_name)

# Updated label mapping
id2label = {
    "0": "Buildings",
    "1": "Forest",
    "2": "Glacier",
    "3": "Mountain",
    "4": "Sea",
    "5": "Street"
}

def classify_image(image):
    # Gradio passes a NumPy array; convert it to an RGB PIL image
    image = Image.fromarray(image).convert("RGB")
    inputs = processor(images=image, return_tensors="pt")

    with torch.no_grad():
        outputs = model(**inputs)
        logits = outputs.logits
        probs = torch.nn.functional.softmax(logits, dim=1).squeeze().tolist()

    # Map each class index to its label and rounded probability
    prediction = {
        id2label[str(i)]: round(probs[i], 3) for i in range(len(probs))
    }
    return prediction

# Gradio Interface
iface = gr.Interface(
    fn=classify_image,
    inputs=gr.Image(type="numpy"),
    outputs=gr.Label(num_top_classes=6, label="Scene Classification"),
    title="open-scene-detection",
    description="Upload an image to classify the scene into one of six categories: Buildings, Forest, Glacier, Mountain, Sea, or Street."
)

if __name__ == "__main__":
    iface.launch()
```
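
`classify_image` returns a dictionary that maps each label to a rounded probability. Outside the Gradio UI, a caller will often want a single answer instead. The sketch below shows one way to do that, assuming a confidence threshold is acceptable for the application. `top_scene` and `min_confidence` are hypothetical names that are not part of the model or of any library, and the example dictionaries are made-up values.

```python
def top_scene(prediction, min_confidence=0.5):
    """Return the best (label, score) pair, or None if no score reaches the threshold."""
    if not prediction:
        return None
    label, score = max(prediction.items(), key=lambda item: item[1])
    return (label, score) if score >= min_confidence else None


# Hypothetical output in the same shape that classify_image returns.
example = {
    "Buildings": 0.031,
    "Forest": 0.002,
    "Glacier": 0.004,
    "Mountain": 0.011,
    "Sea": 0.001,
    "Street": 0.951,
}
print(top_scene(example))  # ('Street', 0.951)

# An ambiguous case: no class reaches the default 0.5 threshold.
print(top_scene({"Glacier": 0.48, "Mountain": 0.46, "Sea": 0.06}))  # None
```

Returning `None` for low-confidence images lets downstream code route them to manual review instead of forcing a label.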
---
## Intended Use
`open-scene-detection` is designed for:
* **Scene Recognition** – Automatically classify natural and urban scenes.
* **Environmental Mapping** – Support geographic and ecological analysis from visual data.
* **Dataset Annotation** – Efficiently label large-scale image datasets by scene.
* **Visual Search and Organization** – Enable smart scene-based filtering or retrieval.
* **Autonomous Systems** – Assist navigation and perception modules with scene understanding.
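
For the dataset annotation use case, a minimal sketch of a labeling loop is shown below. It assumes a classifier callable that takes an image path and returns a label-to-probability dictionary, for example a wrapper around the model that opens the file first. `annotate` and `fake_classify` are hypothetical names, and `fake_classify` is a stand-in so the example runs without downloading the model.

```python
from collections import defaultdict


def annotate(paths, classify):
    """Group image paths by the top-scoring label returned by `classify`."""
    groups = defaultdict(list)
    for path in paths:
        scores = classify(path)             # expected: dict of label -> probability
        best = max(scores, key=scores.get)  # label with the highest probability
        groups[best].append(path)
    return dict(groups)


# Stand-in classifier for demonstration only; replace it with a function
# that loads the image and runs the model.
def fake_classify(path):
    if "trail" in path:
        return {"Forest": 0.9, "Street": 0.1}
    return {"Forest": 0.2, "Street": 0.8}


result = annotate(["trail_01.jpg", "avenue_02.jpg", "trail_03.jpg"], fake_classify)
print(result)
```

Passing the classifier in as an argument keeps the grouping logic independent of how images are loaded or batched.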