3D Person Segmentation and Anaglyph Generation

title: Object Segmentation
emoji: 👁
colorFrom: gray
colorTo: pink
sdk: gradio
sdk_version: 5.22.0
app_file: src/app.py
pinned: false

Lab Report

Introduction

This project implements an image processing system that combines person segmentation with stereoscopic and anaglyph image generation to create 3D effects from ordinary 2D images. The main objectives were to:

Accurately segment people from images using a pretrained transformer-based segmentation model
Generate stereoscopic 3D effects from 2D images
Create red-cyan anaglyph images for 3D viewing
Provide an interactive web interface for real-time processing

Methodology

Tools and Technologies Used

SegFormer (nvidia/segformer-b0): Lightweight transformer-based model for semantic segmentation (B0 is the smallest SegFormer variant)
PyTorch: Deep learning framework for running the SegFormer model
OpenCV: Image processing operations and mask refinement
Gradio: Web interface development
NumPy: Efficient array operations for image manipulation
PIL (Python Imaging Library): Image loading and basic transformations

Implementation Steps

Person Segmentation
- Utilized SegFormer model fine-tuned on ADE20K dataset
- Applied post-processing with erosion and Gaussian blur for mask refinement
- Implemented mask scaling and centering for various input sizes
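The segmentation and refinement steps above can be sketched as follows. This is an illustrative sketch, not the code in `src/app.py`. Several details are assumptions: the checkpoint id `nvidia/segformer-b0-finetuned-ade-512-512`, ADE20K class index 12 being "person", the erosion radius, the blur sigma, and every helper name. Erosion and Gaussian blur are written in plain NumPy so the example runs without OpenCV. With OpenCV installed, `cv2.erode` and `cv2.GaussianBlur` would do the same work.

```python
import numpy as np

# Assumption: class 12 is "person" in the ADE20K label map of the Hugging Face
# SegFormer ADE20K checkpoints; check model.config.id2label before relying on it.
PERSON_ID = 12


def segformer_logits(image):
    """Run SegFormer on a PIL image and return logits of shape (classes, h, w).

    Hypothetical helper; the checkpoint id below is assumed (the project only
    names "nvidia/segformer-b0"). Imports are local so the NumPy helpers work alone.
    """
    import torch
    from transformers import SegformerForSemanticSegmentation, SegformerImageProcessor

    ckpt = "nvidia/segformer-b0-finetuned-ade-512-512"  # assumed checkpoint id
    processor = SegformerImageProcessor.from_pretrained(ckpt)
    model = SegformerForSemanticSegmentation.from_pretrained(ckpt)
    inputs = processor(images=image, return_tensors="pt")
    with torch.no_grad():
        logits = model(**inputs).logits  # (1, classes, H/4, W/4) of the resized input
    return logits[0].numpy()


def logits_to_person_mask(logits, out_h, out_w, person_id=PERSON_ID):
    """Pick the top class per pixel, keep the person class, upsample (nearest)."""
    labels = logits.argmax(axis=0)
    h, w = labels.shape
    rows = np.arange(out_h) * h // out_h  # source row for each output row
    cols = np.arange(out_w) * w // out_w  # source column for each output column
    return (labels[rows][:, cols] == person_id).astype(np.float32)


def erode(mask, radius=1):
    """Erosion with a (2r+1)x(2r+1) square kernel: minimum over each window."""
    padded = np.pad(mask, radius, mode="edge")
    h, w = mask.shape
    out = mask.copy()
    for dy in range(2 * radius + 1):
        for dx in range(2 * radius + 1):
            out = np.minimum(out, padded[dy:dy + h, dx:dx + w])
    return out


def gaussian_blur(mask, sigma=1.0):
    """Separable Gaussian blur (rows, then columns) with edge padding."""
    radius = max(1, int(3 * sigma))
    x = np.arange(-radius, radius + 1)
    kernel = np.exp(-(x ** 2) / (2 * sigma ** 2))
    kernel /= kernel.sum()
    padded = np.pad(mask, radius, mode="edge")

    def blur_1d(v):
        return np.convolve(v, kernel, mode="valid")  # "valid" undoes the padding

    tmp = np.apply_along_axis(blur_1d, 1, padded)  # blur along rows
    return np.apply_along_axis(blur_1d, 0, tmp)    # then along columns


def refine_mask(mask, erode_radius=1, sigma=1.0):
    """Trim a thin band along the mask boundary, then soften the edge."""
    return np.clip(gaussian_blur(erode(mask, erode_radius), sigma), 0.0, 1.0)
```

`segformer_logits` needs `torch` and `transformers` and downloads the checkpoint on first use. The NumPy helpers work on their own. For a PIL image `img`, a full pass would be `refine_mask(logits_to_person_mask(segformer_logits(img), img.height, img.width))`.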
Stereoscopic Processing
- Created depth simulation through horizontal pixel shifting
- Generated side-by-side stereo pairs for parallel (uncrossed) viewing
- Added configurable interaxial distance for 3D effect adjustment
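The following NumPy sketch illustrates horizontal pixel shifting. It assumes a soft person mask (`alpha`, values in [0, 1]) and uint8 RGB images of equal size. The background stays fixed while the person layer shifts in opposite directions for the two eyes, so the total disparity equals the interaxial distance in pixels. The sign convention (the person appears in front of the background) and all function names are assumptions, not taken from the project.

```python
import numpy as np


def shift_x(arr, dx):
    """Shift an (H, W, ...) array horizontally by dx pixels, filling the gap with zeros."""
    out = np.zeros_like(arr)
    if dx > 0:
        out[:, dx:] = arr[:, :-dx]
    elif dx < 0:
        out[:, :dx] = arr[:, -dx:]
    else:
        out[:] = arr
    return out


def composite(fg, bg, alpha):
    """Alpha-blend a uint8 foreground over a uint8 background; alpha is (H, W) in [0, 1]."""
    a = alpha[..., None].astype(np.float64)
    blended = a * fg + (1.0 - a) * bg
    return np.clip(np.rint(blended), 0, 255).astype(np.uint8)


def stereo_pair(person, alpha, background, interaxial=4):
    """Return (left, right, side_by_side) views for parallel viewing.

    The person layer moves right in the left view and left in the right view
    (crossed disparity, so it appears in front of the fixed background).
    """
    left_dx = interaxial // 2             # split the disparity between the eyes
    right_dx = -(interaxial - left_dx)    # odd values still sum to `interaxial`
    left = composite(shift_x(person, left_dx), background, shift_x(alpha, left_dx))
    right = composite(shift_x(person, right_dx), background, shift_x(alpha, right_dx))
    side_by_side = np.concatenate([left, right], axis=1)  # left-eye image on the left
    return left, right, side_by_side
```

Because only the foreground is shifted, the background has zero disparity and sits at the screen plane, which makes the person appear detached from it. Larger `interaxial` values strengthen the effect but become harder for the eyes to fuse.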
Anaglyph Generation
- Combined left and right eye views into red-cyan anaglyph
- Implemented color channel separation and recombination
- Added background image support with proper masking
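Below is a minimal sketch of the red-cyan channel recombination. It assumes both views are RGB arrays of the same shape, such as the composited left and right views from the stereo step. The red channel comes from the left-eye view and the green and blue channels come from the right-eye view, which matches glasses with the red filter over the left eye. The function name is hypothetical.

```python
import numpy as np


def red_cyan_anaglyph(left, right):
    """Merge two RGB views: red from the left eye, green and blue from the right."""
    if left.shape != right.shape:
        raise ValueError("left and right views must have the same shape")
    anaglyph = right.copy()            # keeps the right view's green and blue channels
    anaglyph[..., 0] = left[..., 0]    # channel 0 is red in RGB order (PIL); OpenCV arrays are BGR
    return anaglyph
```

Images loaded with OpenCV would need `cv2.cvtColor(img, cv2.COLOR_BGR2RGB)` first, or the red channel index changed to 2.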
User Interface
- Developed interactive web interface using Gradio
- Added real-time parameter adjustment capabilities
- Implemented support for custom background images

Results

The system produces three main outputs:

Segmentation mask showing the isolated person
Side-by-side stereo pair for parallel viewing
Red-cyan anaglyph image for 3D glasses viewing

Key Features:

Adjustable person size (10-200%)
Configurable interaxial distance (0-10 pixels)
Optional custom background support
Real-time processing and preview

Discussion

Technical Challenges

Mask Alignment: Ensuring proper alignment between segmentation masks and background images required careful consideration of image dimensions and aspect ratios.
Stereo Effect Quality: Choosing an interaxial distance large enough to produce a visible 3D effect but small enough to remain comfortable to view.
Performance Optimization: Efficient processing of large images while maintaining real-time interaction.
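A common way to handle the dimension and aspect-ratio mismatch described above is to center-crop the background to the person image's aspect ratio and then resize it. The mask and background then share one pixel grid. The sketch below assumes that approach, though the project may align images differently. It uses nearest-neighbor resizing to stay NumPy-only, and the function names are hypothetical.

```python
import numpy as np


def center_crop_to_aspect(img, target_h, target_w):
    """Crop the center of img so that its aspect ratio matches target_w / target_h."""
    h, w = img.shape[:2]
    # Compare aspect ratios by cross-multiplication to avoid float division.
    if w * target_h > h * target_w:        # too wide: trim columns
        new_w = h * target_w // target_h
        x0 = (w - new_w) // 2
        return img[:, x0:x0 + new_w]
    new_h = w * target_h // target_w       # too tall (or equal): trim rows
    y0 = (h - new_h) // 2
    return img[y0:y0 + new_h]


def resize_nearest(img, out_h, out_w):
    """Nearest-neighbor resize of an (H, W, ...) array."""
    h, w = img.shape[:2]
    rows = np.arange(out_h) * h // out_h
    cols = np.arange(out_w) * w // out_w
    return img[rows][:, cols]


def fit_background(background, target_h, target_w):
    """Crop, then resize, so the background matches the person image exactly."""
    cropped = center_crop_to_aspect(background, target_h, target_w)
    return resize_nearest(cropped, target_h, target_w)
```

For photographic backgrounds, a higher-quality filter such as Pillow's `Image.resize(size, Image.Resampling.LANCZOS)` would usually look better than nearest-neighbor sampling.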

Learning Outcomes

Deep understanding of stereoscopic image generation
Experience with state-of-the-art segmentation models
Practical knowledge of image processing techniques
Web interface development for ML applications

Conclusion

This project successfully demonstrates the integration of modern AI-powered segmentation with classical stereoscopic image processing techniques. The system provides an accessible way to create 3D effects from regular 2D images.

Future Work

Implementation of depth-aware 3D effect generation
Support for video processing
Additional 3D viewing formats, such as over-under and display-oriented side-by-side layouts
Enhanced background replacement options
Mobile device optimization

Setup

pip install -r requirements.txt

Usage

cd src
python app.py

Parameters

Person Image: Upload an image containing a person
Background Image: (Optional) Custom background image
Interaxial Distance: Adjust the 3D effect strength (0-10 pixels)
Person Size: Adjust the size of the person in the output (10-200%)
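The Person Size parameter can be read as a scale factor applied to the segmented person and its mask before both are centered on the output canvas. The minimal sketch below makes three assumptions: nearest-neighbor scaling, centering on the canvas, and clipping when the scaled person is larger than the canvas. `scale_and_center` is a hypothetical name.

```python
import numpy as np


def scale_and_center(person, alpha, size_percent, out_h, out_w):
    """Scale person (H, W, C) and alpha (H, W) by size_percent, then center them."""
    if not 10 <= size_percent <= 200:
        raise ValueError("size_percent must be in [10, 200]")
    scale = size_percent / 100.0
    h, w = alpha.shape
    new_h, new_w = max(1, round(h * scale)), max(1, round(w * scale))
    rows = np.arange(new_h) * h // new_h   # nearest-neighbor source rows
    cols = np.arange(new_w) * w // new_w
    p, a = person[rows][:, cols], alpha[rows][:, cols]

    canvas = np.zeros((out_h, out_w) + person.shape[2:], dtype=person.dtype)
    canvas_a = np.zeros((out_h, out_w), dtype=alpha.dtype)
    # Top-left corner that centers the scaled layer (negative if it overflows).
    y0, x0 = (out_h - new_h) // 2, (out_w - new_w) // 2
    # Overlap of the scaled layer with the canvas, in canvas coordinates.
    cy0, cx0 = max(y0, 0), max(x0, 0)
    cy1, cx1 = min(y0 + new_h, out_h), min(x0 + new_w, out_w)
    canvas[cy0:cy1, cx0:cx1] = p[cy0 - y0:cy1 - y0, cx0 - x0:cx1 - x0]
    canvas_a[cy0:cy1, cx0:cx1] = a[cy0 - y0:cy1 - y0, cx0 - x0:cx1 - x0]
    return canvas, canvas_a
```

Scaling the mask with the same row and column indices as the image keeps the two aligned, so the scaled pair can go straight into the stereo and anaglyph steps.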

Output Types

Segmentation Mask: Shows the isolated person
Stereo Pair: Side-by-side stereo image for parallel viewing
Anaglyph: Red-cyan 3D image viewable with anaglyph glasses