Alex Hortua committed on
Commit 8b89a64 · 1 Parent(s): b4454fe

Adding Readme With the latest documentation

Files changed (1)
  1. README.md +129 -7
README.md CHANGED
@@ -19,6 +19,7 @@ This project implements a sophisticated 3D image processing system that combines
19
  2. Generate stereoscopic 3D effects from 2D images
20
  3. Create red-cyan anaglyph images for 3D viewing
21
  4. Provide an interactive web interface for real-time processing
 
22
 
23
  ### Methodology
24
 
@@ -30,33 +31,125 @@ This project implements a sophisticated 3D image processing system that combines
30
  - **NumPy**: Efficient array operations for image manipulation
31
  - **PIL (Python Imaging Library)**: Image loading and basic transformations
32
 
33
  #### Implementation Steps
34
 
35
  1. **Person Segmentation**
36
  - Utilized SegFormer model fine-tuned on ADE20K dataset
37
  - Applied post-processing with erosion and Gaussian blur for mask refinement
38
  - Implemented mask scaling and centering for various input sizes
39
 
40
- 2. **Stereoscopic Processing**
41
  - Created depth simulation through horizontal pixel shifting
42
  - Implemented parallel view stereo pair generation
43
  - Added configurable interaxial distance for 3D effect adjustment
 
44
 
45
- 3. **Anaglyph Generation**
46
  - Combined left and right eye views into red-cyan anaglyph
47
  - Implemented color channel separation and recombination
48
  - Added background image support with proper masking
 
49
 
50
- 4. **User Interface**
51
  - Developed interactive web interface using Gradio
52
  - Added real-time parameter adjustment capabilities
53
  - Implemented support for custom background images
 
54
 
55
  ### Results
56
 
57
  The system produces three main outputs:
58
- 1. Segmentation mask showing the isolated person
59
- 2. Side-by-side stereo pair for parallel viewing
60
  3. Red-cyan anaglyph image for 3D glasses viewing
61
 
62
  Key Features:
@@ -64,6 +157,8 @@ Key Features:
64
  - Configurable interaxial distance (0-10 pixels)
65
  - Optional custom background support
66
  - Real-time processing and preview
 
 
67
 
68
  ### Discussion
69
 
@@ -71,16 +166,19 @@ Key Features:
71
  1. **Mask Alignment**: Ensuring proper alignment between segmentation masks and background images required careful consideration of image dimensions and aspect ratios.
72
  2. **Stereo Effect Quality**: Balancing the interaxial distance for comfortable viewing while maintaining the 3D effect.
73
  3. **Performance Optimization**: Efficient processing of large images while maintaining real-time interaction.
 
 
74
 
75
  #### Learning Outcomes
76
  - Deep understanding of stereoscopic image generation
77
  - Experience with state-of-the-art segmentation models
78
  - Practical knowledge of image processing techniques
79
  - Web interface development for ML applications
 
80
 
81
  ### Conclusion
82
 
83
- This project successfully demonstrates the integration of modern AI-powered segmentation with classical stereoscopic image processing techniques. The system provides an accessible way to create 3D effects from regular 2D images.
84
 
85
  #### Future Work
86
  - Implementation of depth-aware 3D effect generation
@@ -88,6 +186,8 @@ This project successfully demonstrates the integration of modern AI-powered segm
88
  - Additional 3D viewing formats (side-by-side, over-under)
89
  - Enhanced background replacement options
90
  - Mobile device optimization
 
 
91
 
92
  ## Setup
93
 
@@ -111,7 +211,29 @@ python app.py
111
 
112
  ## Output Types
113
 
114
- 1. **Segmentation Mask**: Shows the isolated person
115
  2. **Stereo Pair**: Side-by-side stereo image for parallel viewing
116
  3. **Anaglyph**: Red-cyan 3D image viewable with anaglyph glasses
117
 
19
  2. Generate stereoscopic 3D effects from 2D images
20
  3. Create red-cyan anaglyph images for 3D viewing
21
  4. Provide an interactive web interface for real-time processing
22
+ 5. Handle varying image sizes with intelligent mask alignment
23
 
24
  ### Methodology
25
 
 
31
  - **NumPy**: Efficient array operations for image manipulation
32
  - **PIL (Python Imaging Library)**: Image loading and basic transformations
33
 
34
+ #### Mask Processing Deep Dive
35
+
36
+ Mask processing is a core component of the system. It addresses the following issues, each of which affects the quality of the 3D effect:
37
+
38
+ 1. **Why Mask Resizing is Necessary**
39
+ - **Input Variability**: User-uploaded images come in different sizes and aspect ratios
40
+ - **Model Constraints**: In this setup the SegFormer mask comes out at a fixed resolution (512x512), whatever the input size
41
+ - **Background Compatibility**: Backgrounds may have different dimensions than person images
42
+ - **3D Effect Quality**: Proper alignment is crucial for convincing stereoscopic effects
43
+
44
+ 2. **Mask Processing Pipeline**
45
+ ```
46
+ Original Image → SegFormer Segmentation → Initial Mask (512x512)
47
+ ↓
48
+ Resize to Match Background
49
+ ↓
50
+ Add Transparent Padding
51
+ ↓
52
+ Center Alignment
53
+ ↓
54
+ Final Processed Mask
55
+ ```
56
+
57
+ 3. **Technical Implementation**
58
+ ```python
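+ import numpy as np
+
+ # Sketch of mask processing. Assumes background_size is (height, width)
+ # and that the mask already fits inside the background.
+ def process_mask(mask, background_size):
+     background_height, background_width = background_size
+     mask_height, mask_width = mask.shape[:2]
+
+     # Calculate padding dimensions (any odd remainder goes to the bottom/right)
+     pad_top = (background_height - mask_height) // 2
+     pad_bottom = background_height - mask_height - pad_top
+     pad_left = (background_width - mask_width) // 2
+     pad_right = background_width - mask_width - pad_left
+
+     # Pad only the spatial axes; zero-filled padding is treated as transparent
+     pad_width = [(pad_top, pad_bottom), (pad_left, pad_right)]
+     pad_width += [(0, 0)] * (mask.ndim - 2)
+     padded_mask = np.pad(mask, pad_width, mode='constant')
+
+     return padded_mask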
75
+ ```
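
The pipeline above (resize, pad, center) can be sketched end to end as follows. This is a minimal NumPy-only illustration, not the project's code: `fit_and_center_mask` is a hypothetical name, nearest-neighbour resizing stands in for whatever interpolation the app actually uses, and `background_size` is assumed to be `(height, width)`.

```python
import numpy as np

def fit_and_center_mask(mask, background_size):
    """Scale a mask to fit inside the background, then center it on a zero canvas."""
    bg_h, bg_w = background_size
    m_h, m_w = mask.shape[:2]

    # Largest uniform scale that keeps the whole mask inside the background.
    scale = min(bg_h / m_h, bg_w / m_w)
    new_h = max(1, min(bg_h, int(round(m_h * scale))))
    new_w = max(1, min(bg_w, int(round(m_w * scale))))

    # Nearest-neighbour resize via index lookup (assumption: a stand-in
    # for the interpolation used by the real application).
    rows = np.arange(new_h) * m_h // new_h
    cols = np.arange(new_w) * m_w // new_w
    resized = mask[rows][:, cols]

    # Zero-filled canvas: everything outside the resized mask stays transparent.
    pad_top = (bg_h - new_h) // 2
    pad_left = (bg_w - new_w) // 2
    out = np.zeros((bg_h, bg_w) + mask.shape[2:], dtype=mask.dtype)
    out[pad_top:pad_top + new_h, pad_left:pad_left + new_w] = resized
    return out

# Usage: a 512x512 all-foreground mask placed on a 600x800 background.
mask = np.full((512, 512), 255, dtype=np.uint8)
result = fit_and_center_mask(mask, (600, 800))
```

Scaling by the smaller of the two ratios is what keeps the aspect ratio intact; the leftover space on the other axis is split evenly by the padding.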
76
+
77
+ #### Visual Process Explanation
78
+
79
+ ```
80
+ +----------------+ +----------------+ +----------------+
81
+ | Original | | Segmented | | Padded |
82
+ | Image | --> | Mask | --> | Mask |
83
+ | (Variable) | | (512x512) | | (Background) |
84
+ +----------------+ +----------------+ +----------------+
85
+ | |
86
+ v v
87
+ +----------------+ +----------------+ +----------------+
88
+ | Left View | | Stereo Pair | | Anaglyph |
89
+ | Shifted | --> | Combined | --> | Output |
90
+ | | | | | |
91
+ +----------------+ +----------------+ +----------------+
92
+ ```
93
+
94
+ **Key Processing Steps Visualization:**
95
+
96
+ 1. **Mask Generation and Sizing:**
97
+ ```
98
+ +------------+ +-----------+ +-------------+
99
+ | Raw Image | | Raw Mask | | Sized Mask |
100
+ | ****** | -> | ######## | -> | ######## |
101
+ | *Image * | | #Mask # | | #Mask # |
102
+ | ****** | | ######## | | ######## |
103
+ +------------+ +-----------+ +-------------+
104
+ ```
105
+
106
+ 2. **Transparency Handling:**
107
+ ```
108
+ Original Padded Final
109
+ +----+ +------+ +------+
110
+ |####| | | | ## |
111
+ |####| -> |#### | -> |######|
112
+ |####| |#### | | ## |
113
+ +----+ +------+ +------+
114
+ ```
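
One way to read the transparency diagram above: the mask becomes the alpha channel of the person image, and every padded pixel gets alpha 0. The sketch below assumes a `uint8` RGB image and a 2D mask in the 0-255 range with the same height and width; `person_to_rgba` and `pad_rgba_centered` are hypothetical helper names, not functions from the project.

```python
import numpy as np

def person_to_rgba(image_rgb, mask):
    """Attach a 2D mask (0-255) to an RGB image as its alpha channel."""
    if image_rgb.shape[:2] != mask.shape:
        raise ValueError("image and mask must share height and width")
    return np.dstack([image_rgb, mask.astype(np.uint8)])

def pad_rgba_centered(rgba, target_size):
    """Center an RGBA image on a larger canvas; padding is fully transparent."""
    target_h, target_w = target_size
    h, w = rgba.shape[:2]
    top = (target_h - h) // 2
    left = (target_w - w) // 2
    # A zero-initialised canvas means alpha = 0 everywhere outside the image.
    canvas = np.zeros((target_h, target_w, 4), dtype=np.uint8)
    canvas[top:top + h, left:left + w] = rgba
    return canvas

# Usage: a 4x4 opaque patch centered on a 6x8 canvas.
image = np.full((4, 4, 3), 200, dtype=np.uint8)
mask = np.full((4, 4), 255, dtype=np.uint8)
padded = pad_rgba_centered(person_to_rgba(image, mask), (6, 8))
```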
115
+
116
  #### Implementation Steps
117
 
118
  1. **Person Segmentation**
119
  - Utilized SegFormer model fine-tuned on ADE20K dataset
120
  - Applied post-processing with erosion and Gaussian blur for mask refinement
121
  - Implemented mask scaling and centering for various input sizes
122
+ - Added transparent padding for proper background integration
123
+
124
+ 2. **Mask Processing and Alignment**
125
+ - Implemented dynamic mask resizing to match background dimensions
126
+ - Added centered padding for smaller masks
127
+ - Preserved transparency in padded regions
128
+ - Ensured proper aspect ratio maintenance
129
 
130
+ 3. **Stereoscopic Processing**
131
  - Created depth simulation through horizontal pixel shifting
132
  - Implemented parallel view stereo pair generation
133
  - Added configurable interaxial distance for 3D effect adjustment
134
+ - Enhanced alignment between stereo pairs with mask centering
135
 
136
+ 4. **Anaglyph Generation**
137
  - Combined left and right eye views into red-cyan anaglyph
138
  - Implemented color channel separation and recombination
139
  - Added background image support with proper masking
140
+ - Improved blending between foreground and background
141
 
142
+ 5. **User Interface**
143
  - Developed interactive web interface using Gradio
144
  - Added real-time parameter adjustment capabilities
145
  - Implemented support for custom background images
146
+ - Added size adjustment controls
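
Step 1 above mentions erosion and Gaussian blur for mask refinement. The following NumPy-only sketch shows what those two operations do to a mask; the 3x3 neighbourhood, the `sigma` default, and the function names are assumptions for illustration, and the project may well use a library routine instead.

```python
import numpy as np

def erode(mask, iterations=1):
    """3x3 erosion: each pixel becomes the minimum of its neighbourhood."""
    out = mask.astype(np.float32)
    h, w = out.shape
    for _ in range(iterations):
        padded = np.pad(out, 1, mode="edge")
        shifted = [padded[dy:dy + h, dx:dx + w] for dy in range(3) for dx in range(3)]
        out = np.min(shifted, axis=0)
    return out

def gaussian_blur(mask, sigma=1.0):
    """Separable Gaussian blur: a 1D kernel applied along rows, then columns."""
    radius = max(1, int(3 * sigma))
    x = np.arange(-radius, radius + 1)
    kernel = np.exp(-(x ** 2) / (2 * sigma ** 2))
    kernel /= kernel.sum()
    padded = np.pad(mask.astype(np.float32), radius, mode="edge")
    rows = np.apply_along_axis(lambda r: np.convolve(r, kernel, mode="valid"), 1, padded)
    return np.apply_along_axis(lambda c: np.convolve(c, kernel, mode="valid"), 0, rows)

def refine_mask(mask, erode_iterations=1, sigma=1.0):
    """Shrink the mask slightly, then soften its edge."""
    return gaussian_blur(erode(mask, erode_iterations), sigma)

# Usage: a 5x5 square inside a 9x9 mask.
mask = np.zeros((9, 9), dtype=np.uint8)
mask[2:7, 2:7] = 255
eroded = erode(mask)
refined = refine_mask(mask)
```

Erosion trims stray pixels along the person's outline, and the blur turns the hard edge into a gradient so the composite does not show a visible seam.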
147
 
148
  ### Results
149
 
150
  The system produces three main outputs:
151
+ 1. Segmentation mask showing the isolated person with proper transparency
152
+ 2. Side-by-side stereo pair for parallel viewing with centered alignment
153
  3. Red-cyan anaglyph image for 3D glasses viewing
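
The stereo pair and anaglyph outputs listed above can be illustrated with a short NumPy sketch. It assumes an `(H, W, 3)` RGB array and shifts the whole image rather than only the segmented person, which is a simplification; `shift_horizontal` and `make_stereo_outputs` are hypothetical names. The red channel is taken from the left view and green/blue from the right, which is the usual red-cyan convention.

```python
import numpy as np

def shift_horizontal(image, shift):
    """Shift an (H, W, 3) image horizontally; vacated columns are filled with zeros."""
    out = np.zeros_like(image)
    if shift > 0:
        out[:, shift:] = image[:, :-shift]
    elif shift < 0:
        out[:, :shift] = image[:, -shift:]
    else:
        out[:] = image
    return out

def make_stereo_outputs(image, interaxial=4):
    """Return (side-by-side stereo pair, red-cyan anaglyph) for one image."""
    # Assumption: the interaxial distance is split between the two views.
    half = interaxial // 2
    left = shift_horizontal(image, half)
    right = shift_horizontal(image, -(interaxial - half))

    stereo_pair = np.concatenate([left, right], axis=1)

    anaglyph = np.empty_like(image)
    anaglyph[..., 0] = left[..., 0]     # red channel from the left view
    anaglyph[..., 1:] = right[..., 1:]  # green and blue from the right view
    return stereo_pair, anaglyph

# Usage: a single white column makes the channel separation easy to see.
image = np.zeros((2, 8, 3), dtype=np.uint8)
image[:, 4] = 255
pair, anaglyph = make_stereo_outputs(image, interaxial=4)
```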
154
 
155
  Key Features:
 
157
  - Configurable interaxial distance (0-10 pixels)
158
  - Optional custom background support
159
  - Real-time processing and preview
160
+ - Intelligent mask alignment and padding
161
+ - Transparent background handling
162
 
163
  ### Discussion
164
 
 
166
  1. **Mask Alignment**: Ensuring proper alignment between segmentation masks and background images required careful consideration of image dimensions and aspect ratios.
167
  2. **Stereo Effect Quality**: Balancing the interaxial distance for comfortable viewing while maintaining the 3D effect.
168
  3. **Performance Optimization**: Efficient processing of large images while maintaining real-time interaction.
169
+ 4. **Transparency Handling**: Implementing proper transparency in padded regions while maintaining mask quality.
170
+ 5. **Size Adaptation**: Managing different input image sizes while preserving aspect ratios and alignment.
171
 
172
  #### Learning Outcomes
173
  - Deep understanding of stereoscopic image generation
174
  - Experience with state-of-the-art segmentation models
175
  - Practical knowledge of image processing techniques
176
  - Web interface development for ML applications
177
+ - Advanced mask manipulation and alignment strategies
178
 
179
  ### Conclusion
180
 
181
+ This project successfully demonstrates the integration of modern AI-powered segmentation with classical stereoscopic image processing techniques. The system provides an accessible way to create 3D effects from regular 2D images, with robust handling of different image sizes and proper transparency management.
182
 
183
  #### Future Work
184
  - Implementation of depth-aware 3D effect generation
 
186
  - Additional 3D viewing formats (side-by-side, over-under)
187
  - Enhanced background replacement options
188
  - Mobile device optimization
189
+ - Advanced depth map generation
190
+ - Multi-person segmentation support
191
 
192
  ## Setup
193
 
 
211
 
212
  ## Output Types
213
 
214
+ 1. **Segmentation Mask**: Shows the isolated person with proper transparency
215
  2. **Stereo Pair**: Side-by-side stereo image for parallel viewing
216
  3. **Anaglyph**: Red-cyan 3D image viewable with anaglyph glasses
217
 
218
+ ## Technical Notes
219
+
220
+ - **Mask Processing Details**:
221
+ - Initial mask is generated at 512x512 resolution
222
+ - Dynamic padding calculation: `pad = (background_size - mask_size) // 2`
223
+ - Transparency preservation using NumPy's constant padding mode
224
+ - Aspect ratio maintained through centered scaling
225
+ - Real-time size adjustments (10-200%) applied before padding
226
+
227
+ - **Size Handling Algorithm**:
228
+ 1. Calculate target dimensions based on background
229
+ 2. Resize mask while maintaining aspect ratio
230
+ 3. Apply any user-specified size adjustments (10-200%)
231
+ 4. Add transparent padding to match background
232
+ 5. Center the mask content
233
+
234
+ - The system automatically handles different input image sizes
235
+ - Masks are dynamically padded and centered for optimal alignment
236
+ - Transparent regions are properly preserved in the final output
237
+ - Background images are automatically scaled to match the person image
238
+ - Real-time preview updates as parameters are adjusted
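
The notes above say size adjustments (10-200%) are applied before padding. One consequence is that a mask scaled above 100% can exceed the background, in which case `(background_size - mask_size) // 2` goes negative and plain padding no longer works. The sketch below handles both cases by cropping the overflow before centering. It is an assumption about reasonable behaviour, not a description of the project's code, and `scale_mask` and `place_centered` are hypothetical names.

```python
import numpy as np

def scale_mask(mask, percent):
    """Nearest-neighbour rescale of a 2D mask by a percentage clamped to 10-200."""
    percent = min(200, max(10, percent))
    h, w = mask.shape
    new_h = max(1, int(h * percent / 100))
    new_w = max(1, int(w * percent / 100))
    rows = np.arange(new_h) * h // new_h
    cols = np.arange(new_w) * w // new_w
    return mask[rows][:, cols]

def place_centered(mask, background_size):
    """Center a 2D mask on a zero canvas, cropping whatever overflows the background."""
    bg_h, bg_w = background_size
    h, w = mask.shape

    # Crop symmetrically if the mask is larger than the background.
    crop_top = max(0, (h - bg_h) // 2)
    crop_left = max(0, (w - bg_w) // 2)
    visible = mask[crop_top:crop_top + bg_h, crop_left:crop_left + bg_w]

    # Pad symmetrically if it is smaller.
    vh, vw = visible.shape
    pad_top = (bg_h - vh) // 2
    pad_left = (bg_w - vw) // 2
    canvas = np.zeros((bg_h, bg_w), dtype=mask.dtype)
    canvas[pad_top:pad_top + vh, pad_left:pad_left + vw] = visible
    return canvas

# Usage: the same 100x100 mask at 50% and at 200% on a 120x160 background.
mask = np.full((100, 100), 255, dtype=np.uint8)
small = place_centered(scale_mask(mask, 50), (120, 160))
large = place_centered(scale_mask(mask, 200), (120, 160))
```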
239
+