---
title: Object Segmentation
emoji: π
colorFrom: gray
colorTo: pink
sdk: gradio
sdk_version: 5.22.0
app_file: src/app.py
pinned: false
---

# 3D Person Segmentation and Anaglyph Generation
## Lab Report
### Introduction
This project implements a 3D image processing system that combines person segmentation with stereoscopic and anaglyph image generation. The main objectives were to:
- Accurately segment people from images using advanced AI models
- Generate stereoscopic 3D effects from 2D images
- Create red-cyan anaglyph images for 3D viewing
- Provide an interactive web interface for real-time processing
### Methodology
#### Tools and Technologies Used
- SegFormer (nvidia/segformer-b0): Lightweight transformer-based model for semantic segmentation
- PyTorch: Deep learning framework for running the SegFormer model
- OpenCV: Image processing operations and mask refinement
- Gradio: Web interface development
- NumPy: Efficient array operations for image manipulation
- PIL (Pillow): Image loading and basic transformations
#### Implementation Steps
**Person Segmentation**
- Used a SegFormer model fine-tuned on the ADE20K dataset
- Applied post-processing with erosion and Gaussian blur for mask refinement
- Implemented mask scaling and centering for various input sizes (a code sketch of this step follows below)
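
A minimal sketch of this segmentation step. The checkpoint name and the ADE20K "person" class index are assumptions consistent with "SegFormer-B0 fine-tuned on ADE20K"; the project's actual values may differ:

```python
import cv2
import numpy as np
import torch
from PIL import Image
from transformers import SegformerForSemanticSegmentation, SegformerImageProcessor

# Assumed checkpoint and class index; adjust to the project's actual values.
CHECKPOINT = "nvidia/segformer-b0-finetuned-ade-512-512"
PERSON_CLASS = 12  # "person" in the ADE20K label map

processor = SegformerImageProcessor.from_pretrained(CHECKPOINT)
model = SegformerForSemanticSegmentation.from_pretrained(CHECKPOINT).eval()

def person_mask(image: Image.Image) -> np.ndarray:
    """Return a soft 0-255 person mask refined with erosion and Gaussian blur."""
    inputs = processor(images=image, return_tensors="pt")
    with torch.no_grad():
        logits = model(**inputs).logits  # (1, 150, h, w) at reduced resolution
    # Upsample logits to the input resolution before taking the argmax.
    logits = torch.nn.functional.interpolate(
        logits, size=image.size[::-1], mode="bilinear", align_corners=False
    )
    labels = logits.argmax(dim=1)[0].cpu().numpy()
    mask = (labels == PERSON_CLASS).astype(np.uint8) * 255
    # Erode to trim halo pixels at the silhouette, then blur to soften the edge.
    mask = cv2.erode(mask, np.ones((3, 3), np.uint8), iterations=1)
    mask = cv2.GaussianBlur(mask, (5, 5), 0)
    return mask
```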
**Stereoscopic Processing**
- Created depth simulation through horizontal pixel shifting
- Implemented parallel view stereo pair generation
- Added configurable interaxial distance for 3D effect adjustment (see the sketch below)
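
A sketch of the horizontal-shift approach. `stereo_pair` and its arguments are illustrative names, and the person, mask, and background arrays are assumed to share the same height and width (RGB, uint8):

```python
import numpy as np

def stereo_pair(person: np.ndarray, mask: np.ndarray,
                background: np.ndarray, interaxial: int = 5) -> np.ndarray:
    """Composite the person onto the background twice, shifted left and right,
    and return the two views side by side for parallel viewing."""
    views = []
    for shift in (-interaxial, interaxial):
        # Shifting only the person simulates it sitting in front of the background.
        shifted_person = np.roll(person, shift, axis=1)
        alpha = np.roll(mask, shift, axis=1).astype(np.float32)[..., None] / 255.0
        view = background.astype(np.float32) * (1.0 - alpha) + \
               shifted_person.astype(np.float32) * alpha
        views.append(view.astype(np.uint8))
    return np.hstack(views)  # left view | right view
```

`np.roll` wraps pixels around the border, which keeps the sketch short; a real implementation would pad or crop instead.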
**Anaglyph Generation**
- Combined left and right eye views into a red-cyan anaglyph
- Implemented color channel separation and recombination
- Added background image support with proper masking (illustrated below)
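
The two views are merged by taking the red channel from the left eye and the green/blue channels from the right eye; this short sketch assumes RGB channel order:

```python
import numpy as np

def make_anaglyph(left: np.ndarray, right: np.ndarray) -> np.ndarray:
    """Build a red-cyan anaglyph from same-sized RGB left/right views."""
    anaglyph = right.copy()          # keep green and blue from the right view
    anaglyph[..., 0] = left[..., 0]  # replace red with the left view's red
    return anaglyph
```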
**User Interface**
- Developed an interactive web interface using Gradio
- Added real-time parameter adjustment capabilities
- Implemented support for custom background images (example wiring below)
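
The Gradio wiring might look roughly like the following; `process` is a hypothetical placeholder for the pipeline above, and the component choices mirror the parameters documented later in this README:

```python
import gradio as gr

def process(person_img, background_img, interaxial, person_size):
    # Placeholder: the real function runs segmentation, stereo and anaglyph
    # generation and returns (mask, stereo_pair, anaglyph).
    ...

demo = gr.Interface(
    fn=process,
    inputs=[
        gr.Image(type="pil", label="Person Image"),
        gr.Image(type="pil", label="Background Image (optional)"),
        gr.Slider(0, 10, value=5, step=1, label="Interaxial Distance"),
        gr.Slider(10, 200, value=100, step=5, label="Person Size (%)"),
    ],
    outputs=[
        gr.Image(label="Segmentation Mask"),
        gr.Image(label="Stereo Pair"),
        gr.Image(label="Anaglyph"),
    ],
)

if __name__ == "__main__":
    demo.launch()
```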
### Results
The system produces three main outputs:
- Segmentation mask showing the isolated person
- Side-by-side stereo pair for parallel viewing
- Red-cyan anaglyph image for 3D glasses viewing
**Key Features:**
- Adjustable person size (10-200%)
- Configurable interaxial distance (0-10 pixels)
- Optional custom background support
- Real-time processing and preview
### Discussion
#### Technical Challenges
- Mask Alignment: Ensuring proper alignment between segmentation masks and background images required careful consideration of image dimensions and aspect ratios (see the sketch after this list).
- Stereo Effect Quality: Choosing an interaxial distance that keeps viewing comfortable while still producing a convincing 3D effect.
- Performance Optimization: Processing large images efficiently enough to keep the interface responsive.
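
One way to handle the alignment issue, assuming the person is scaled to fit inside the background and anchored to its bottom edge; `fit_to_background` is an illustrative helper, not the project's exact implementation:

```python
import cv2
import numpy as np

def fit_to_background(person: np.ndarray, mask: np.ndarray,
                      bg_shape: tuple, person_scale: float = 1.0):
    """Resize the person and mask, then paste them onto a canvas that matches
    the background, so later steps operate on same-sized arrays."""
    bg_h, bg_w = bg_shape[:2]
    h, w = mask.shape[:2]
    # Preserve aspect ratio while fitting inside the background, then apply
    # the user-controlled person size (10-200%).
    scale = min(bg_w / w, bg_h / h) * person_scale
    new_w, new_h = max(1, int(w * scale)), max(1, int(h * scale))
    person = cv2.resize(person, (new_w, new_h))
    mask = cv2.resize(mask, (new_w, new_h))

    person_canvas = np.zeros((bg_h, bg_w, 3), dtype=np.uint8)
    mask_canvas = np.zeros((bg_h, bg_w), dtype=np.uint8)
    x0 = (bg_w - new_w) // 2   # centre horizontally
    y0 = bg_h - new_h          # rest the person on the bottom edge
    # Clip in case a large person_scale pushes the person past the background.
    src_x, src_y = max(0, -x0), max(0, -y0)
    x0, y0 = max(0, x0), max(0, y0)
    fit_w = min(new_w - src_x, bg_w - x0)
    fit_h = min(new_h - src_y, bg_h - y0)
    person_canvas[y0:y0 + fit_h, x0:x0 + fit_w] = \
        person[src_y:src_y + fit_h, src_x:src_x + fit_w]
    mask_canvas[y0:y0 + fit_h, x0:x0 + fit_w] = \
        mask[src_y:src_y + fit_h, src_x:src_x + fit_w]
    return person_canvas, mask_canvas
```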
#### Learning Outcomes
- Deep understanding of stereoscopic image generation
- Experience with state-of-the-art segmentation models
- Practical knowledge of image processing techniques
- Web interface development for ML applications
### Conclusion
This project successfully demonstrates the integration of modern AI-powered segmentation with classical stereoscopic image processing techniques. The system provides an accessible way to create 3D effects from regular 2D images.
### Future Work
- Implementation of depth-aware 3D effect generation
- Support for video processing
- Additional 3D viewing formats beyond the current side-by-side pair (e.g., over-under)
- Enhanced background replacement options
- Mobile device optimization
## Setup
```bash
pip install -r requirements.txt
```
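
A requirements file for the stack above might look like this; only the Gradio version is taken from the front matter, and the remaining entries (and lack of version pins) are assumptions:

```text
gradio==5.22.0
transformers
torch
opencv-python
numpy
Pillow
```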
## Usage
```bash
cd src
python app.py
```
## Parameters
- Person Image: Upload an image containing a person
- Background Image: (Optional) Custom background image
- Interaxial Distance: Adjust the 3D effect strength (0-10)
- Person Size: Adjust the size of the person in the output (10-200%)
## Output Types
- Segmentation Mask: Shows the isolated person
- Stereo Pair: Side-by-side stereo image for parallel viewing
- Anaglyph: Red-cyan 3D image viewable with anaglyph glasses