Alex Hortua committed
Commit · 8b89a64
1 Parent(s): b4454fe
Adding Readme With the latest documentation
README.md CHANGED
@@ -19,6 +19,7 @@ This project implements a sophisticated 3D image processing system that combines
|
|
19 |
2. Generate stereoscopic 3D effects from 2D images
|
20 |
3. Create red-cyan anaglyph images for 3D viewing
|
21 |
4. Provide an interactive web interface for real-time processing
|
|
|
22 |
|
23 |
### Methodology
|
24 |
|
@@ -30,33 +31,125 @@ This project implements a sophisticated 3D image processing system that combines
|
|
30 |
- **NumPy**: Efficient array operations for image manipulation
|
31 |
- **PIL (Python Imaging Library)**: Image loading and basic transformations
|
32 |
|
33 |
#### Implementation Steps
|
34 |
|
35 |
1. **Person Segmentation**
|
36 |
- Utilized a SegFormer model fine-tuned on the ADE20K dataset
|
37 |
- Applied post-processing with erosion and Gaussian blur for mask refinement
|
38 |
- Implemented mask scaling and centering for various input sizes
|
39 |
|
40 |
-
|
41 |
- Created depth simulation through horizontal pixel shifting
|
42 |
- Implemented parallel view stereo pair generation
|
43 |
- Added configurable interaxial distance for 3D effect adjustment
|
|
|
44 |
|
45 |
-
|
46 |
- Combined left and right eye views into red-cyan anaglyph
|
47 |
- Implemented color channel separation and recombination
|
48 |
- Added background image support with proper masking
|
|
|
49 |
|
50 |
-
|
51 |
- Developed interactive web interface using Gradio
|
52 |
- Added real-time parameter adjustment capabilities
|
53 |
- Implemented support for custom background images
|
|
|
54 |
|
55 |
### Results
|
56 |
|
57 |
The system produces three main outputs:
|
58 |
-
1. Segmentation mask showing the isolated person
|
59 |
-
2. Side-by-side stereo pair for parallel viewing
|
60 |
3. Red-cyan anaglyph image for 3D glasses viewing
|
61 |
|
62 |
Key Features:
|
@@ -64,6 +157,8 @@ Key Features:
|
|
64 |
- Configurable interaxial distance (0-10 pixels)
|
65 |
- Optional custom background support
|
66 |
- Real-time processing and preview
|
|
|
|
|
67 |
|
68 |
### Discussion
|
69 |
|
@@ -71,16 +166,19 @@ Key Features:
|
|
71 |
1. **Mask Alignment**: Ensuring proper alignment between segmentation masks and background images required careful consideration of image dimensions and aspect ratios.
|
72 |
2. **Stereo Effect Quality**: Balancing the interaxial distance for comfortable viewing while maintaining the 3D effect.
|
73 |
3. **Performance Optimization**: Efficient processing of large images while maintaining real-time interaction.
|
|
|
|
|
74 |
|
75 |
#### Learning Outcomes
|
76 |
- Deep understanding of stereoscopic image generation
|
77 |
- Experience with state-of-the-art segmentation models
|
78 |
- Practical knowledge of image processing techniques
|
79 |
- Web interface development for ML applications
|
|
|
80 |
|
81 |
### Conclusion
|
82 |
|
83 |
-
This project successfully demonstrates the integration of modern AI-powered segmentation with classical stereoscopic image processing techniques. The system provides an accessible way to create 3D effects from regular 2D images.
|
84 |
|
85 |
#### Future Work
|
86 |
- Implementation of depth-aware 3D effect generation
|
@@ -88,6 +186,8 @@ This project successfully demonstrates the integration of modern AI-powered segm
|
|
88 |
- Additional 3D viewing formats (side-by-side, over-under)
|
89 |
- Enhanced background replacement options
|
90 |
- Mobile device optimization
|
|
|
|
|
91 |
|
92 |
## Setup
|
93 |
|
@@ -111,7 +211,29 @@ python app.py
|
|
111 |
|
112 |
## Output Types
|
113 |
|
114 |
-
1. **Segmentation Mask**: Shows the isolated person
|
115 |
2. **Stereo Pair**: Side-by-side stereo image for parallel viewing
|
116 |
3. **Anaglyph**: Red-cyan 3D image viewable with anaglyph glasses
|
117 |
|
19 |
2. Generate stereoscopic 3D effects from 2D images
|
20 |
3. Create red-cyan anaglyph images for 3D viewing
|
21 |
4. Provide an interactive web interface for real-time processing
|
22 |
+
5. Handle varying image sizes with intelligent mask alignment
|
23 |
|
24 |
### Methodology
|
25 |
|
|
|
31 |
- **NumPy**: Efficient array operations for image manipulation
|
32 |
- **PIL (Python Imaging Library)**: Image loading and basic transformations
|
33 |
|
34 |
+
#### Mask Processing Deep Dive
|
35 |
+
|
36 |
+
Mask processing is a crucial component of the system. It resizes, pads, and centers the segmentation mask so that it lines up with the background regardless of input size:
|
37 |
+
|
38 |
+
1. **Why Mask Resizing is Necessary**
|
39 |
+
- **Input Variability**: User-uploaded images come in different sizes and aspect ratios
|
40 |
+
- **Model Constraints**: SegFormer outputs masks at a fixed resolution (512x512)
|
41 |
+
- **Background Compatibility**: Backgrounds may have different dimensions than person images
|
42 |
+
- **3D Effect Quality**: Proper alignment is crucial for convincing stereoscopic effects
|
43 |
+
|
44 |
+
2. **Mask Processing Pipeline**
|
45 |
+
```
Original Image → SegFormer Segmentation → Initial Mask (512x512)
                                                    ↓
                                        Resize to Match Background
                                                    ↓
                                         Add Transparent Padding
                                                    ↓
                                            Center Alignment
                                                    ↓
                                          Final Processed Mask
```
|
56 |
+
|
57 |
+
3. **Technical Implementation**
|
58 |
+
```python
import numpy as np

def process_mask(mask, background_size):
    """Center an (H, W, C) mask on a zero-filled canvas of background_size."""
    # Assumption: background_size is given as (height, width)
    background_height, background_width = background_size
    mask_height, mask_width = mask.shape[:2]

    # Calculate padding dimensions (the mask must fit inside the background)
    pad_top = (background_height - mask_height) // 2
    pad_bottom = background_height - mask_height - pad_top
    pad_left = (background_width - mask_width) // 2
    pad_right = background_width - mask_width - pad_left

    # Add padding with transparency (constant mode fills with zeros)
    padded_mask = np.pad(mask,
                         ((pad_top, pad_bottom),
                          (pad_left, pad_right),
                          (0, 0)),
                         mode='constant')

    return padded_mask
```
|
76 |
+
|
77 |
+
#### Visual Process Explanation
|
78 |
+
|
79 |
+
```
+----------------+     +----------------+     +----------------+
|    Original    |     |   Segmented    |     |     Padded     |
|     Image      | --> |      Mask      | --> |      Mask      |
|   (Variable)   |     |   (512x512)    |     |  (Background)  |
+----------------+     +----------------+     +----------------+
        |                                             |
        v                                             v
+----------------+     +----------------+     +----------------+
|   Left View    |     |  Stereo Pair   |     |    Anaglyph    |
|    Shifted     | --> |    Combined    | --> |     Output     |
|                |     |                |     |                |
+----------------+     +----------------+     +----------------+
```
|
93 |
+
|
94 |
+
**Key Processing Steps Visualization:**
|
95 |
+
|
96 |
+
1. **Mask Generation and Sizing:**
|
97 |
+
```
+------------+    +-----------+    +-------------+
| Raw Image  |    | Raw Mask  |    | Sized Mask  |
|   ******   | -> | ########  | -> |  ########   |
|  *Image *  |    | #Mask  #  |    |  #Mask  #   |
|   ******   |    | ########  |    |  ########   |
+------------+    +-----------+    +-------------+
```
|
105 |
+
|
106 |
+
2. **Transparency Handling:**
|
107 |
+
```
Original      Padded        Final
 +----+      +------+      +------+
 |####|      |      |      |  ##  |
 |####|  ->  |####  |  ->  |######|
 |####|      |####  |      |  ##  |
 +----+      +------+      +------+
```
|
115 |
+
|
116 |
#### Implementation Steps
|
117 |
|
118 |
1. **Person Segmentation**
|
119 |
- Utilized a SegFormer model fine-tuned on the ADE20K dataset
|
120 |
- Applied post-processing with erosion and Gaussian blur for mask refinement
|
121 |
- Implemented mask scaling and centering for various input sizes
|
122 |
+
- Added transparent padding for proper background integration
|
123 |
+
|
124 |
+
2. **Mask Processing and Alignment**
|
125 |
+
- Implemented dynamic mask resizing to match background dimensions
|
126 |
+
- Added centered padding for smaller masks
|
127 |
+
- Preserved transparency in padded regions
|
128 |
+
- Ensured proper aspect ratio maintenance
|
129 |
|
130 |
+
3. **Stereoscopic Processing**
|
131 |
- Created depth simulation through horizontal pixel shifting
|
132 |
- Implemented parallel view stereo pair generation
|
133 |
- Added configurable interaxial distance for 3D effect adjustment
|
134 |
+
- Enhanced alignment between stereo pairs with mask centering
|
135 |
|
136 |
+
4. **Anaglyph Generation**
|
137 |
- Combined left and right eye views into red-cyan anaglyph
|
138 |
- Implemented color channel separation and recombination
|
139 |
- Added background image support with proper masking
|
140 |
+
- Improved blending between foreground and background
|
141 |
|
142 |
+
5. **User Interface**
|
143 |
- Developed interactive web interface using Gradio
|
144 |
- Added real-time parameter adjustment capabilities
|
145 |
- Implemented support for custom background images
|
146 |
+
- Added size adjustment controls
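
The stereoscopic step above (horizontal pixel shifting with a configurable interaxial distance) could be sketched roughly as follows. This is a minimal illustration, not the project's actual code: the function names `shift_horizontal` and `make_stereo_pair` are hypothetical, and splitting the interaxial distance evenly between the two eyes, as well as the shift direction chosen for each eye, are assumptions.

```python
import numpy as np

def shift_horizontal(image, offset):
    """Shift an (H, W, C) image by `offset` pixels along the width axis.

    Positive offsets move content right, negative offsets move it left.
    Vacated columns are filled with zeros instead of wrapping around.
    """
    width = image.shape[1]
    # Clamp so that extreme offsets simply produce an empty (all-zero) view
    offset = max(-width, min(width, offset))
    shifted = np.zeros_like(image)
    if offset > 0:
        shifted[:, offset:] = image[:, :width - offset]
    elif offset < 0:
        shifted[:, :width + offset] = image[:, -offset:]
    else:
        shifted[:] = image
    return shifted

def make_stereo_pair(person, interaxial=4):
    """Return (left, right, side_by_side) views for parallel viewing.

    Assumption: the interaxial distance is split between the two views,
    which are shifted in opposite directions.
    """
    half = interaxial // 2
    left = shift_horizontal(person, half)
    right = shift_horizontal(person, -(interaxial - half))
    side_by_side = np.concatenate([left, right], axis=1)
    return left, right, side_by_side
```

Swapping the sign of the two shifts changes whether the subject appears in front of or behind the screen plane, so the direction is worth checking against the viewing method.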
|
147 |
|
148 |
### Results
|
149 |
|
150 |
The system produces three main outputs:
|
151 |
+
1. Segmentation mask showing the isolated person with proper transparency
|
152 |
+
2. Side-by-side stereo pair for parallel viewing with centered alignment
|
153 |
3. Red-cyan anaglyph image for 3D glasses viewing
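
As a rough sketch of how the third output can be produced, a red-cyan anaglyph is commonly built by taking the red channel from the left-eye view and the green and blue channels from the right-eye view. The helper name `make_anaglyph` is hypothetical, and the inputs are assumed to be RGB arrays of identical shape.

```python
import numpy as np

def make_anaglyph(left, right):
    """Combine two (H, W, 3) RGB views into a red-cyan anaglyph.

    Red comes from the left view; green and blue (cyan) come from the right view.
    """
    if left.shape != right.shape:
        raise ValueError("left and right views must have the same shape")
    anaglyph = right.copy()          # keeps green and blue from the right eye
    anaglyph[..., 0] = left[..., 0]  # red channel from the left eye
    return anaglyph
```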
|
154 |
|
155 |
Key Features:
|
|
|
157 |
- Configurable interaxial distance (0-10 pixels)
|
158 |
- Optional custom background support
|
159 |
- Real-time processing and preview
|
160 |
+
- Intelligent mask alignment and padding
|
161 |
+
- Transparent background handling
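
To illustrate the background and transparency features listed above, the following sketch blends a person image over a background using a soft mask. It assumes the mask has already been padded to the background size and normalized to the range 0 to 1; the function name `composite` is hypothetical and the project may blend differently.

```python
import numpy as np

def composite(person, background, mask):
    """Blend `person` over `background` with a soft mask in [0, 1].

    person, background: (H, W, 3) uint8 arrays of equal shape.
    mask: (H, W) float array; 1.0 keeps the person, 0.0 keeps the background.
    """
    alpha = np.clip(mask, 0.0, 1.0)[..., np.newaxis]  # broadcast over channels
    blended = (alpha * person.astype(np.float32)
               + (1.0 - alpha) * background.astype(np.float32))
    return np.rint(blended).astype(np.uint8)
```

Because padded regions of the mask are zero, they fall through to the background unchanged, which is what makes zero padding behave like transparency.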
|
162 |
|
163 |
### Discussion
|
164 |
|
|
|
166 |
1. **Mask Alignment**: Ensuring proper alignment between segmentation masks and background images required careful consideration of image dimensions and aspect ratios.
|
167 |
2. **Stereo Effect Quality**: Balancing the interaxial distance for comfortable viewing while maintaining the 3D effect.
|
168 |
3. **Performance Optimization**: Efficient processing of large images while maintaining real-time interaction.
|
169 |
+
4. **Transparency Handling**: Implementing proper transparency in padded regions while maintaining mask quality.
|
170 |
+
5. **Size Adaptation**: Managing different input image sizes while preserving aspect ratios and alignment.
|
171 |
|
172 |
#### Learning Outcomes
|
173 |
- Deep understanding of stereoscopic image generation
|
174 |
- Experience with state-of-the-art segmentation models
|
175 |
- Practical knowledge of image processing techniques
|
176 |
- Web interface development for ML applications
|
177 |
+
- Advanced mask manipulation and alignment strategies
|
178 |
|
179 |
### Conclusion
|
180 |
|
181 |
+
This project successfully demonstrates the integration of modern AI-powered segmentation with classical stereoscopic image processing techniques. The system provides an accessible way to create 3D effects from regular 2D images, with robust handling of different image sizes and proper transparency management.
|
182 |
|
183 |
#### Future Work
|
184 |
- Implementation of depth-aware 3D effect generation
|
|
|
186 |
- Additional 3D viewing formats (side-by-side, over-under)
|
187 |
- Enhanced background replacement options
|
188 |
- Mobile device optimization
|
189 |
+
- Advanced depth map generation
|
190 |
+
- Multi-person segmentation support
|
191 |
|
192 |
## Setup
|
193 |
|
|
|
211 |
|
212 |
## Output Types
|
213 |
|
214 |
+
1. **Segmentation Mask**: Shows the isolated person with proper transparency
|
215 |
2. **Stereo Pair**: Side-by-side stereo image for parallel viewing
|
216 |
3. **Anaglyph**: Red-cyan 3D image viewable with anaglyph glasses
|
217 |
|
218 |
+
## Technical Notes
|
219 |
+
|
220 |
+
- **Mask Processing Details**:
|
221 |
+
- Initial mask is generated at 512x512 resolution
|
222 |
+
- Dynamic padding calculation: `pad = (background_size - mask_size) // 2`
|
223 |
+
- Transparency preservation using NumPy's constant padding mode
|
224 |
+
- Aspect ratio maintained through centered scaling
|
225 |
+
- Real-time size adjustments (10-200%) applied before padding
|
226 |
+
|
227 |
+
- **Size Handling Algorithm**:
|
228 |
+
1. Calculate target dimensions based on background
|
229 |
+
2. Resize mask while maintaining aspect ratio
|
230 |
+
3. Add transparent padding to match background
|
231 |
+
4. Center the mask content
|
232 |
+
5. Apply any user-specified size adjustments (10-200%); as noted above, these take effect before the padding step
|
233 |
+
|
234 |
+
- The system automatically handles different input image sizes
|
235 |
+
- Masks are dynamically padded and centered for optimal alignment
|
236 |
+
- Transparent regions are properly preserved in the final output
|
237 |
+
- Background images are automatically scaled to match the person image
|
238 |
+
- Real-time preview updates as parameters are adjusted
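
The size handling described in these notes can be sketched as a small pure-Python helper that computes the resized mask dimensions and the centered padding. This is an illustration under stated assumptions: the name `fit_and_center` is hypothetical, sizes are `(height, width)` tuples, the percentage is taken relative to the mask's own size, and the scale is capped so the mask never exceeds the background (a real implementation might crop instead).

```python
def fit_and_center(mask_size, background_size, scale_percent=100):
    """Return the resized mask size and (top, bottom, left, right) padding.

    The mask is scaled uniformly (aspect ratio preserved), capped so it
    fits inside the background, and then centered with padding.
    """
    mask_h, mask_w = mask_size
    bg_h, bg_w = background_size

    # Largest uniform scale at which the mask still fits the background
    fit = min(bg_h / mask_h, bg_w / mask_w)
    scale = min(fit, scale_percent / 100)

    new_h = max(1, min(bg_h, round(mask_h * scale)))
    new_w = max(1, min(bg_w, round(mask_w * scale)))

    # Centered padding; any odd leftover pixel goes to the bottom/right
    pad_top = (bg_h - new_h) // 2
    pad_left = (bg_w - new_w) // 2
    padding = (pad_top, bg_h - new_h - pad_top,
               pad_left, bg_w - new_w - pad_left)
    return (new_h, new_w), padding
```

For example, a 512x512 mask on a 1080x1920 background at 100% keeps its size and receives 284 rows of padding above and below and 704 columns on each side.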
|
239 |
+
|