Segformer-b0-scene-parse-150

This model is a fine-tuned version of nvidia/mit-b0, trained on the scene_parse_150 dataset. It performs semantic segmentation for scene parsing: every pixel of an input image is assigned one of the dataset's 150 category labels.

Evaluation Results:

The model achieved the following results on the evaluation dataset:

  • Loss: 1.8435
  • Mean IoU: 0.0881
  • Mean Accuracy: 0.1619
  • Overall Accuracy: 0.6663

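Per-category IoU and accuracy values are available but sparse, which indicates that performance varies widely across categories. The gap between overall accuracy (0.6663) and mean IoU (0.0881) points the same way: overall accuracy is weighted by pixel count, while mean IoU weights every category equally, so a few frequent categories likely account for most of the correctly labeled pixels.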
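To illustrate how the three reported metrics relate, here is a minimal pure-Python sketch. It is not the evaluation code used for this model; the function name `segmentation_metrics`, the `ignore_index` value of 255, and the toy labels are assumptions made for illustration.

```python
def segmentation_metrics(pred, target, num_classes, ignore_index=255):
    """Compute mean IoU, mean accuracy, and overall accuracy.

    `pred` and `target` are flat lists of integer class labels, one per pixel.
    Pixels whose target equals `ignore_index` (an assumed convention) are skipped.
    """
    intersect = [0] * num_classes   # pixels where prediction and label agree
    pred_area = [0] * num_classes   # pixels predicted as each class
    label_area = [0] * num_classes  # pixels labeled as each class

    for p, t in zip(pred, target):
        if t == ignore_index:
            continue
        label_area[t] += 1
        pred_area[p] += 1
        if p == t:
            intersect[t] += 1

    ious, accs = [], []
    for c in range(num_classes):
        union = pred_area[c] + label_area[c] - intersect[c]
        if union > 0:
            ious.append(intersect[c] / union)
        if label_area[c] > 0:
            accs.append(intersect[c] / label_area[c])

    total = sum(label_area)
    return {
        "mean_iou": sum(ious) / len(ious) if ious else 0.0,
        "mean_accuracy": sum(accs) / len(accs) if accs else 0.0,
        "overall_accuracy": sum(intersect) / total if total else 0.0,
    }


# Toy example: one frequent class (0) and two rare classes (1, 2).
target = [0] * 8 + [1, 2]
pred = [0] * 10  # the model predicts the frequent class everywhere

metrics = segmentation_metrics(pred, target, num_classes=3)
print({k: round(v, 4) for k, v in metrics.items()})
# {'mean_iou': 0.2667, 'mean_accuracy': 0.3333, 'overall_accuracy': 0.8}
```

In this toy case the overall accuracy is high because the frequent class dominates the pixel count, while the two missed classes pull the class-averaged metrics down. The same pattern, at a larger scale, would be consistent with the figures reported above.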

Model Description

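SegFormer-b0 pairs a hierarchical Transformer encoder (the Mix Transformer, MiT-b0, provided by the nvidia/mit-b0 checkpoint) with a lightweight all-MLP decode head. Unlike the plain Vision Transformer (ViT), the encoder produces feature maps at several resolutions, and the decode head fuses them into a segmentation map. b0 is the smallest variant in the SegFormer family.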

More detailed model descriptions, including architectural adjustments or preprocessing requirements, are needed.

Intended Uses & Limitations

  • Use Cases: Suitable for scene parsing and segmentation tasks in environments with diverse visual categories.
  • Limitations: Performance varies significantly between categories, as the sparse per-category accuracy and IoU values show, and the mean IoU of 0.0881 is low. The model may struggle with underrepresented classes and with categories that are visually hard to tell apart.
  • Further details on intended domains and limitations are needed.
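For readers who want to try the model on a scene image, the following is a minimal inference sketch using the Transformers library. It makes several assumptions: the repository name is taken from this page, the image processor is loaded from the base checkpoint nvidia/mit-b0 because it is not confirmed that the fine-tuned repository ships its own preprocessing configuration, and `segment_image` and its `image_path` argument are hypothetical names. Running it requires `torch`, `transformers`, and `Pillow`, plus network access to download the weights.

```python
MODEL_ID = "ashaduzzaman/segformer-b0-scene-parse-150"  # repository name from this card
PROCESSOR_ID = "nvidia/mit-b0"  # assumption: reuse the base checkpoint's preprocessing


def segment_image(image_path):
    """Return a (height, width) tensor of predicted class indices for one image."""
    # Imports are kept inside the function so the module loads without heavy dependencies.
    import torch
    from PIL import Image
    from transformers import AutoImageProcessor, SegformerForSemanticSegmentation

    processor = AutoImageProcessor.from_pretrained(PROCESSOR_ID)
    model = SegformerForSemanticSegmentation.from_pretrained(MODEL_ID)
    model.eval()

    image = Image.open(image_path).convert("RGB")
    inputs = processor(images=image, return_tensors="pt")

    with torch.no_grad():
        outputs = model(**inputs)

    # SegFormer logits are lower resolution than the input, so the processor
    # upsamples them. PIL reports (width, height); target_sizes expects (height, width).
    segmentation = processor.post_process_semantic_segmentation(
        outputs, target_sizes=[image.size[::-1]]
    )[0]
    return segmentation
```

A call such as `mask = segment_image("scene.jpg")`, where `scene.jpg` is a placeholder path, would return one class index per pixel. Given the low mean IoU reported above, predictions for rare categories should be treated with caution.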

Training and Evaluation Data

The model was trained on the scene_parse_150 dataset, which consists of diverse visual scenes with 150 unique semantic categories. Further information on dataset specifics and any preprocessing steps is needed.
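The preprocessing used for this model is not documented. As one illustration of a step that is common for ADE20K-style annotations, the sketch below shifts labels so that the 150 categories occupy indices 0 to 149 and unlabeled pixels are mapped to an ignore value. It assumes that label 0 marks unlabeled pixels and that 255 is the ignore index; neither is confirmed for this training run, and `reduce_labels` is a hypothetical helper name.

```python
def reduce_labels(mask, ignore_index=255):
    """Shift ADE20K-style labels down by one and mark label 0 as ignored.

    `mask` is a 2D list of integer labels. Assumed convention (not confirmed
    for this model): 0 means "unlabeled", and 1..150 are the real categories.
    """
    return [
        [ignore_index if value in (0, ignore_index) else value - 1 for value in row]
        for row in mask
    ]


# Toy 2x2 annotation: one unlabeled pixel and three labeled ones.
annotation = [[0, 1], [150, 3]]
print(reduce_labels(annotation))
# [[255, 0], [149, 2]]
```

If such a step was applied during training, the same mapping would be needed when comparing predictions against raw annotations.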

Training Procedure

Hyperparameters:

  • Learning Rate: 6e-05
  • Training Batch Size: 2
  • Evaluation Batch Size: 2
  • Seed: 42
  • Optimizer: Adam (betas=(0.9, 0.999), epsilon=1e-08)
  • Learning Rate Scheduler: Linear
  • Number of Epochs: 50
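To show what the linear learning rate scheduler does with the listed learning rate, here is a small pure-Python sketch. It assumes no warmup and a hypothetical total of 1000 optimizer steps; the actual step count depends on the dataset size and batch size, which are not stated here. The function name `linear_lr` is also an assumption.

```python
def linear_lr(step, total_steps, base_lr=6e-05, warmup_steps=0):
    """Learning rate at `step` under linear warmup followed by linear decay to zero."""
    if step < warmup_steps:
        # Ramp up linearly from 0 to base_lr during warmup.
        return base_lr * step / max(1, warmup_steps)
    # Decay linearly from base_lr to 0 over the remaining steps.
    remaining = max(0, total_steps - step)
    return base_lr * remaining / max(1, total_steps - warmup_steps)


TOTAL_STEPS = 1000  # hypothetical value, for illustration only

for step in (0, 250, 500, 750, 1000):
    print(step, f"{linear_lr(step, TOTAL_STEPS):.2e}")
```

With these assumptions, the rate starts at 6e-05, reaches half that value at the midpoint, and ends at zero on the final step.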

Training Results:

The model was trained for 50 epochs. Its convergence behavior, training duration, and hardware environment are not reported.

Framework Versions:

  • Transformers 4.44.2
  • PyTorch 2.4.0+cu121
  • Datasets 2.21.0
  • Tokenizers 0.19.1

Model Size:

  • Parameters: 3.75M
  • Tensor Type: F32
  • Weights Format: Safetensors

Model Tree:

  • Base Model: nvidia/mit-b0
  • Fine-tuned Model: ashaduzzaman/segformer-b0-scene-parse-150
  • Training Dataset: scene_parse_150