---
license: apache-2.0
datasets:
- prithivMLmods/OpenScene-Classification
language:
- en
base_model:
- google/siglip-base-patch16-512
pipeline_tag: image-classification
library_name: transformers
tags:
- SigLIP2
- Scene-Detection
- buildings
- forest
- glacier
- mountain
- sea
- street
---
 
![scene.png](https://cdn-uploads.huggingface.co/production/uploads/65bb837dbfb878f46c77de4c/31-sygJAsY1LaPKIeCylh.png)

# open-scene-detection

> open-scene-detection is an image classification model fine-tuned from [`siglip-base-patch16-512`](https://huggingface.co/google/siglip-base-patch16-512) for multi-class scene classification. It is trained to recognize and categorize natural and urban scenes using a curated visual dataset, and uses the `SiglipForImageClassification` architecture.

```
Classification Report:
              precision    recall  f1-score   support

   buildings     0.9755    0.9570    0.9662      2625
      forest     0.9989    0.9955    0.9972      2694
     glacier     0.9564    0.9517    0.9540      2671
    mountain     0.9540    0.9592    0.9566      2723
         sea     0.9934    0.9898    0.9916      2758
      street     0.9595    0.9819    0.9706      2874

    accuracy                         0.9728     16345
   macro avg     0.9730    0.9725    0.9727     16345
weighted avg     0.9729    0.9728    0.9728     16345
```
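As a sanity check, the macro and weighted averages in the report follow directly from the per-class F1 scores and supports; a quick sketch (values copied from the report above):

```python
# Per-class F1 scores and supports, copied from the classification report
f1 = {"buildings": 0.9662, "forest": 0.9972, "glacier": 0.9540,
      "mountain": 0.9566, "sea": 0.9916, "street": 0.9706}
support = {"buildings": 2625, "forest": 2694, "glacier": 2671,
           "mountain": 2723, "sea": 2758, "street": 2874}

total = sum(support.values())                               # 16345 test images
macro_f1 = sum(f1.values()) / len(f1)                       # unweighted mean
weighted_f1 = sum(f1[c] * support[c] for c in f1) / total   # support-weighted mean

print(round(macro_f1, 4), round(weighted_f1, 4))  # 0.9727 0.9728
```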

![download.png](https://cdn-uploads.huggingface.co/production/uploads/65bb837dbfb878f46c77de4c/oqlb8a1p6zJuNZSI9PgZO.png)

---

## Label Space: 6 Classes

The model classifies an image into one of the following scenes:

```
Class 0: Buildings  
Class 1: Forest  
Class 2: Glacier  
Class 3: Mountain  
Class 4: Sea  
Class 5: Street
```
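For reference, the softmax step that turns the model's raw logits into per-class probabilities can be sketched in plain Python. The logits below are illustrative values, not real model output:

```python
import math

labels = ["Buildings", "Forest", "Glacier", "Mountain", "Sea", "Street"]

# Illustrative logits -- NOT real model output
logits = [1.2, -0.3, 0.5, 3.1, 0.2, -1.0]

# Numerically stable softmax: subtract the max before exponentiating
m = max(logits)
exps = [math.exp(x - m) for x in logits]
total = sum(exps)
probs = [e / total for e in exps]

# Softmax is monotonic, so the argmax of probs equals the argmax of logits
top = max(range(len(probs)), key=probs.__getitem__)
print(labels[top])  # Mountain
```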

---

## Install Dependencies

```bash
pip install -q transformers torch pillow gradio hf_xet
```

---

## Inference Code

```python
import gradio as gr
from transformers import AutoImageProcessor, SiglipForImageClassification
from PIL import Image
import torch

# Load model and processor
model_name = "prithivMLmods/open-scene-detection"
model = SiglipForImageClassification.from_pretrained(model_name)
processor = AutoImageProcessor.from_pretrained(model_name)

# Map class indices to scene labels
id2label = {
    0: "Buildings",
    1: "Forest",
    2: "Glacier",
    3: "Mountain",
    4: "Sea",
    5: "Street"
}

def classify_image(image):
    """Classify a scene image and return per-class probabilities."""
    image = Image.fromarray(image).convert("RGB")
    inputs = processor(images=image, return_tensors="pt")

    with torch.no_grad():
        outputs = model(**inputs)
        logits = outputs.logits
        probs = torch.nn.functional.softmax(logits, dim=1).squeeze().tolist()

    prediction = {
        id2label[i]: round(probs[i], 3) for i in range(len(probs))
    }

    return prediction

# Gradio Interface
iface = gr.Interface(
    fn=classify_image,
    inputs=gr.Image(type="numpy"),
    outputs=gr.Label(num_top_classes=6, label="Scene Classification"),
    title="open-scene-detection",
    description="Upload an image to classify the scene into one of six categories: Buildings, Forest, Glacier, Mountain, Sea, or Street."
)

if __name__ == "__main__":
    iface.launch()
```

---

## Intended Use

`open-scene-detection` is designed for:

* **Scene Recognition** – Automatically classify natural and urban scenes.
* **Environmental Mapping** – Support geographic and ecological analysis from visual data.
* **Dataset Annotation** – Efficiently label large-scale image datasets by scene.
* **Visual Search and Organization** – Enable smart scene-based filtering or retrieval.
* **Autonomous Systems** – Assist navigation and perception modules with scene understanding.