Commit e98ae33 by Alex Hortua · 1 parent: ef7ef27
Readme
README.md CHANGED
@@ -1,12 +1,116 @@
-
 title: Object Segmentation
 emoji: π
 colorFrom: gray
 colorTo: pink
 sdk: gradio
 sdk_version: 5.22.0
-app_file: app.py
 pinned: false
----
 
-
# 3D Person Segmentation and Anaglyph Generation

title: Object Segmentation
emoji: π
colorFrom: gray
colorTo: pink
sdk: gradio
sdk_version: 5.22.0
app_file: src/app.py
pinned: false

## Lab Report

### Introduction
This project implements a 3D image processing system that combines person segmentation with stereoscopic and anaglyph image generation. The main objectives were to:
1. Segment people from images using a pretrained SegFormer model
2. Generate stereoscopic 3D effects from 2D images
3. Create red-cyan anaglyph images for 3D viewing
4. Provide an interactive web interface for real-time processing

### Methodology

#### Tools and Technologies Used
- **SegFormer (nvidia/segformer-b0)**: Transformer-based model for semantic segmentation
- **PyTorch**: Deep learning framework for running the SegFormer model
- **OpenCV**: Image processing operations and mask refinement
- **Gradio**: Web interface development
- **NumPy**: Efficient array operations for image manipulation
- **PIL (Python Imaging Library)**: Image loading and basic transformations

#### Implementation Steps

1. **Person Segmentation**
   - Used a SegFormer model fine-tuned on the ADE20K dataset
   - Applied post-processing with erosion and Gaussian blur for mask refinement
   - Implemented mask scaling and centering for various input sizes

2. **Stereoscopic Processing**
   - Simulated depth through horizontal pixel shifting
   - Implemented parallel-view stereo pair generation
   - Added a configurable interaxial distance for adjusting the 3D effect

3. **Anaglyph Generation**
   - Combined left-eye and right-eye views into a red-cyan anaglyph
   - Implemented color channel separation and recombination
   - Added background image support with proper masking

4. **User Interface**
   - Developed an interactive web interface using Gradio
   - Added real-time parameter adjustment
   - Implemented support for custom background images
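
The mask refinement in step 1 can be illustrated without any image library. The following is a minimal NumPy-only sketch, not the project's code: the report lists OpenCV for this step, where `cv2.erode` and `cv2.GaussianBlur` would be the usual calls. The function names, the erosion radius, and the blur sigma are assumptions chosen for illustration.

```python
import numpy as np


def erode(mask: np.ndarray, radius: int = 1) -> np.ndarray:
    """Binary erosion with a (2*radius+1) square window.

    A pixel stays 1 only if every pixel in its window is 1. The image is
    zero-padded, so foreground touching the border is eroded as well.
    """
    padded = np.pad(mask, radius, mode="constant", constant_values=0)
    h, w = mask.shape
    out = np.ones_like(mask)
    for dy in range(2 * radius + 1):
        for dx in range(2 * radius + 1):
            out = np.minimum(out, padded[dy:dy + h, dx:dx + w])
    return out


def gaussian_blur(mask: np.ndarray, sigma: float = 1.0) -> np.ndarray:
    """Separable Gaussian blur; returns a float mask with values in [0, 1]."""
    radius = max(1, int(3 * sigma))
    x = np.arange(-radius, radius + 1)
    kernel = np.exp(-(x ** 2) / (2 * sigma ** 2))
    kernel /= kernel.sum()
    padded = np.pad(mask.astype(np.float64), radius, mode="edge")
    # Blur along rows first, then along columns.
    rows = np.apply_along_axis(np.convolve, 1, padded, kernel, mode="valid")
    return np.apply_along_axis(np.convolve, 0, rows, kernel, mode="valid")


def refine_mask(mask: np.ndarray, radius: int = 1, sigma: float = 1.0) -> np.ndarray:
    """Shrink the mask slightly, then soften its edge (assumed parameters)."""
    return gaussian_blur(erode(mask, radius), sigma)
```

Eroding before blurring trims the thin halo of background pixels that segmentation often leaves around a person, and the blur turns the hard edge into a soft alpha ramp for compositing.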

### Results

The system produces three main outputs:
1. Segmentation mask showing the isolated person
2. Side-by-side stereo pair for parallel viewing
3. Red-cyan anaglyph image for 3D glasses viewing
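
One way the stereo pair and the anaglyph could be assembled from a segmented person is sketched below. This is an illustration under stated assumptions, not the project's implementation: images are assumed to be RGB NumPy arrays, the interaxial distance is split between the two views, and the shift direction (left view shifted right, right view shifted left) is one possible sign convention. All function names are hypothetical.

```python
import numpy as np


def shift_horizontal(image: np.ndarray, offset: int) -> np.ndarray:
    """Shift an image along x (positive = right), zero-filling vacated columns."""
    shifted = np.zeros_like(image)
    if offset > 0:
        shifted[:, offset:] = image[:, :-offset]
    elif offset < 0:
        shifted[:, :offset] = image[:, -offset:]
    else:
        shifted[:] = image
    return shifted


def composite(person, mask, background, offset):
    """Shift the person and its mask together, then blend over the background."""
    p = shift_horizontal(person, offset).astype(np.float64)
    m = shift_horizontal(mask, offset).astype(np.float64)[..., None]
    return (m * p + (1.0 - m) * background).astype(np.uint8)


def make_stereo_and_anaglyph(person, mask, background, interaxial):
    """Return (side-by-side stereo pair, red-cyan anaglyph).

    Assumption: the total disparity equals `interaxial` pixels, split
    between the two views; only the person is shifted, not the background.
    """
    left_offset = interaxial // 2
    right_offset = -(interaxial - left_offset)
    left = composite(person, mask, background, left_offset)
    right = composite(person, mask, background, right_offset)

    stereo_pair = np.concatenate([left, right], axis=1)

    # Red channel from the left view, green and blue (cyan) from the right.
    anaglyph = right.copy()
    anaglyph[..., 0] = left[..., 0]
    return stereo_pair, anaglyph
```

With an interaxial distance of 0 both views are identical and the anaglyph equals the plain composite; larger values increase the red-cyan fringe around the person and hence the perceived depth.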

Key Features:
- Adjustable person size (10-200%)
- Configurable interaxial distance (0-10 pixels)
- Optional custom background support
- Real-time processing and preview

### Discussion

#### Technical Challenges
1. **Mask Alignment**: Aligning segmentation masks with background images required careful handling of image dimensions and aspect ratios.
2. **Stereo Effect Quality**: The interaxial distance had to balance comfortable viewing against a visible 3D effect.
3. **Performance Optimization**: Large images had to be processed efficiently enough to keep the interaction real-time.
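
The mask alignment problem can be made concrete with a small sketch. The idea shown here, which is an assumption about how such alignment might be done rather than the project's code, is to apply one scale-and-center transform to both the person image and its mask, so the two cannot drift apart. Nearest-neighbor resizing is used only to keep the example dependency-free; `scale_and_center` and `resize_nearest` are hypothetical names.

```python
import numpy as np


def resize_nearest(image: np.ndarray, new_h: int, new_w: int) -> np.ndarray:
    """Nearest-neighbor resize for H x W or H x W x C arrays."""
    rows = np.arange(new_h) * image.shape[0] // new_h
    cols = np.arange(new_w) * image.shape[1] // new_w
    return image[rows][:, cols]


def scale_and_center(layer: np.ndarray, canvas_h: int, canvas_w: int,
                     size_percent: float) -> np.ndarray:
    """Scale `layer` by size_percent and paste it centered on a zero canvas.

    Parts of the scaled layer that fall outside the canvas are cropped.
    """
    new_h = max(1, round(layer.shape[0] * size_percent / 100))
    new_w = max(1, round(layer.shape[1] * size_percent / 100))
    scaled = resize_nearest(layer, new_h, new_w)

    canvas = np.zeros((canvas_h, canvas_w) + layer.shape[2:], dtype=layer.dtype)

    # Top-left corner of the scaled layer in canvas coordinates (may be negative).
    top = (canvas_h - new_h) // 2
    left = (canvas_w - new_w) // 2

    # Overlap between the canvas and the scaled layer.
    y0, x0 = max(top, 0), max(left, 0)
    y1, x1 = min(top + new_h, canvas_h), min(left + new_w, canvas_w)
    canvas[y0:y1, x0:x1] = scaled[y0 - top:y1 - top, x0 - left:x1 - left]
    return canvas
```

Calling the function once for the RGB image and once for the mask, with the same canvas size and `size_percent`, yields a pair that matches the background dimensions pixel for pixel.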

#### Learning Outcomes
- Deep understanding of stereoscopic image generation
- Experience with state-of-the-art segmentation models
- Practical knowledge of image processing techniques
- Web interface development for ML applications

### Conclusion

This project successfully demonstrates the integration of modern AI-powered segmentation with classical stereoscopic image processing techniques. The system provides an accessible way to create 3D effects from regular 2D images.

#### Future Work
- Implementation of depth-aware 3D effect generation
- Support for video processing
- Additional 3D viewing formats (side-by-side, over-under)
- Enhanced background replacement options
- Mobile device optimization

## Setup

```bash
pip install -r requirements.txt
```

## Usage

```bash
cd src
python app.py
```

## Parameters

- **Person Image**: Upload an image containing a person
- **Background Image**: (Optional) Custom background image
- **Interaxial Distance**: Adjust the 3D effect strength (0-10 pixels)
- **Person Size**: Adjust the size of the person in the output (10-200%)
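
The documented ranges can be enforced before any image processing runs. The sketch below is a hypothetical helper, not part of the project: `RenderParams`, `from_sliders`, and the default values are assumptions, while the ranges (0-10 and 10-200%) come from the parameter list above.

```python
from dataclasses import dataclass


def clamp(value, low, high):
    """Restrict value to the closed interval [low, high]."""
    return max(low, min(high, value))


@dataclass(frozen=True)
class RenderParams:
    interaxial: int = 5      # pixels, assumed default
    person_size: int = 100   # percent, assumed default

    @classmethod
    def from_sliders(cls, interaxial, person_size):
        """Round raw slider values and clamp them to the documented ranges."""
        return cls(
            interaxial=int(clamp(round(interaxial), 0, 10)),
            person_size=int(clamp(round(person_size), 10, 200)),
        )
```

For example, `RenderParams.from_sliders(12.4, 5)` yields an interaxial distance of 10 and a person size of 10, so out-of-range input never reaches the compositing code.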

## Output Types

1. **Segmentation Mask**: Shows the isolated person
2. **Stereo Pair**: Side-by-side stereo image for parallel viewing
3. **Anaglyph**: Red-cyan 3D image viewable with anaglyph glasses