Object-segmentation / README.md
Alex Hortua
Readme
e98ae33
|
raw
history blame
4.07 kB

3D Person Segmentation and Anaglyph Generation

title: Object Segmentation emoji: πŸ‘ colorFrom: gray colorTo: pink sdk: gradio sdk_version: 5.22.0 app_file: src/app.py pinned: false

Lab Report

Introduction

This project implements a sophisticated 3D image processing system that combines person segmentation with stereoscopic and anaglyph image generation. The main objectives were to:

  1. Accurately segment people from images using advanced AI models
  2. Generate stereoscopic 3D effects from 2D images
  3. Create red-cyan anaglyph images for 3D viewing
  4. Provide an interactive web interface for real-time processing

Methodology

Tools and Technologies Used

  • SegFormer (nvidia/segformer-b0): State-of-the-art transformer-based model for semantic segmentation
  • PyTorch: Deep learning framework for running the SegFormer model
  • OpenCV: Image processing operations and mask refinement
  • Gradio: Web interface development
  • NumPy: Efficient array operations for image manipulation
  • PIL (Python Imaging Library): Image loading and basic transformations

Implementation Steps

  1. Person Segmentation

    • Utilized SegFormer model fine-tuned on ADE20K dataset
    • Applied post-processing with erosion and Gaussian blur for mask refinement
    • Implemented mask scaling and centering for various input sizes
  2. Stereoscopic Processing

    • Created depth simulation through horizontal pixel shifting
    • Implemented parallel view stereo pair generation
    • Added configurable interaxial distance for 3D effect adjustment
  3. Anaglyph Generation

    • Combined left and right eye views into red-cyan anaglyph
    • Implemented color channel separation and recombination
    • Added background image support with proper masking
  4. User Interface

    • Developed interactive web interface using Gradio
    • Added real-time parameter adjustment capabilities
    • Implemented support for custom background images

Results

The system produces three main outputs:

  1. Segmentation mask showing the isolated person
  2. Side-by-side stereo pair for parallel viewing
  3. Red-cyan anaglyph image for 3D glasses viewing

Key Features:

  • Adjustable person size (10-200%)
  • Configurable interaxial distance (0-10 pixels)
  • Optional custom background support
  • Real-time processing and preview

Discussion

Technical Challenges

  1. Mask Alignment: Ensuring proper alignment between segmentation masks and background images required careful consideration of image dimensions and aspect ratios.
  2. Stereo Effect Quality: Balancing the interaxial distance for comfortable viewing while maintaining the 3D effect.
  3. Performance Optimization: Efficient processing of large images while maintaining real-time interaction.

Learning Outcomes

  • Deep understanding of stereoscopic image generation
  • Experience with state-of-the-art segmentation models
  • Practical knowledge of image processing techniques
  • Web interface development for ML applications

Conclusion

This project successfully demonstrates the integration of modern AI-powered segmentation with classical stereoscopic image processing techniques. The system provides an accessible way to create 3D effects from regular 2D images.

Future Work

  • Implementation of depth-aware 3D effect generation
  • Support for video processing
  • Additional 3D viewing formats (side-by-side, over-under)
  • Enhanced background replacement options
  • Mobile device optimization

Setup

pip install -r requirements.txt

Usage

cd src
python app.py

Parameters

  • Person Image: Upload an image containing a person
  • Background Image: (Optional) Custom background image
  • Interaxial Distance: Adjust the 3D effect strength (0-10)
  • Person Size: Adjust the size of the person in the output (10-200%)

Output Types

  1. Segmentation Mask: Shows the isolated person
  2. Stereo Pair: Side-by-side stereo image for parallel viewing
  3. Anaglyph: Red-cyan 3D image viewable with anaglyph glasses