Commit 2f4c8a9 by Alex Hortua (1 parent: 6eb4ed6)
Adding Report and additional information
Files changed:
- public/Project 4 Report.pdf +0 -0
- public/lab_report.txt +131 -0

public/Project 4 Report.pdf
ADDED
Binary file (91.5 kB).

public/lab_report.txt
ADDED
@@ -0,0 +1,131 @@
3D Person Segmentation and Anaglyph Generation - Lab Report
===========================================================

Introduction
------------
In this project, I developed a 3D image processing system that combines AI-based person segmentation with classical stereoscopic image processing. The project met its five main objectives:

1. Accurate person segmentation using the SegFormer model
2. Creation of stereoscopic 3D effects from 2D images
3. Generation of red-cyan anaglyph images for 3D viewing
4. Development of an interactive web interface
5. Mask alignment that adapts to varying image sizes

The project is accessible at: https://huggingface.co/spaces/axelhortua/Object-segmentation

Methodology
-----------
The implementation used the tools listed below and proceeded in four stages:

1. Tools Selection:
- SegFormer (nvidia/segformer-b0) for semantic segmentation
- PyTorch for deep learning implementation
- OpenCV for image processing
- Gradio for web interface
- NumPy for array operations
- PIL for image handling

2. Implementation Process:

a) Person Segmentation:
- Used a SegFormer model fine-tuned on the ADE20K dataset
- Post-processed the mask with erosion and Gaussian blur
- Scaled and centered the mask dynamically
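
The erosion and Gaussian blur post-processing mentioned above can be sketched as follows. This is a minimal NumPy-only illustration, not the project's code (the report lists OpenCV for image processing). The function names, the 3x3 structuring element, the sigma value, and person_id=12 are assumptions; 12 is assumed to be the "person" index in the 150-class ADE20K label set and should be checked against the model's id2label mapping.

```python
import numpy as np

def erode(mask: np.ndarray, iterations: int = 1) -> np.ndarray:
    """Binary erosion with a 3x3 square structuring element."""
    out = mask.astype(bool)
    h, w = out.shape
    for _ in range(iterations):
        # Pixels outside the image count as background
        padded = np.pad(out, 1, mode="constant", constant_values=False)
        result = np.ones_like(out)
        for dy in range(3):
            for dx in range(3):
                # A pixel survives only if all 9 neighbours are foreground
                result &= padded[dy:dy + h, dx:dx + w]
        out = result
    return out

def gaussian_blur(img: np.ndarray, sigma: float = 1.0) -> np.ndarray:
    """Separable Gaussian blur with edge replication."""
    h, w = img.shape
    radius = max(1, int(round(3 * sigma)))
    x = np.arange(-radius, radius + 1)
    kernel = np.exp(-(x ** 2) / (2 * sigma ** 2))
    kernel /= kernel.sum()
    padded = np.pad(img.astype(np.float64), radius, mode="edge")
    # Horizontal pass, then vertical pass
    rows = sum(k * padded[:, i:i + w] for i, k in enumerate(kernel))
    return sum(k * rows[i:i + h, :] for i, k in enumerate(kernel))

def soften_mask(labels: np.ndarray, person_id: int = 12,
                erode_iters: int = 1, sigma: float = 1.0) -> np.ndarray:
    """Turn a per-pixel class map into a feathered alpha mask in [0, 1]."""
    binary = labels == person_id          # person_id is an assumed label index
    eroded = erode(binary, erode_iters)   # trims the halo around the person
    alpha = gaussian_blur(eroded.astype(np.float64), sigma)
    return np.clip(alpha, 0.0, 1.0)
```

Eroding before blurring pulls the mask edge slightly inside the person, so the feathered border fades out over person pixels rather than over leftover background.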

b) Mask Processing:
- Developed a dynamic mask resizing system
- Implemented transparent padding
- Preserved the aspect ratio during resizing
- Created a centered alignment algorithm
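
The resizing, padding, and centering steps above can be illustrated with a short sketch. It assumes the person layer is an RGBA NumPy array and uses nearest-neighbour resampling for brevity; the helper names (fit_size, resize_nearest, center_on_canvas) and the scale parameter are hypothetical and are not taken from the project.

```python
import numpy as np

def fit_size(src_w, src_h, max_w, max_h, scale=1.0):
    """Size that fits inside (max_w, max_h) at the source aspect ratio, times scale."""
    ratio = min(max_w / src_w, max_h / src_h) * scale
    return max(1, round(src_w * ratio)), max(1, round(src_h * ratio))

def resize_nearest(img, new_w, new_h):
    """Nearest-neighbour resize of an (H, W, C) array."""
    h, w = img.shape[:2]
    ys = np.minimum(np.arange(new_h) * h // new_h, h - 1)
    xs = np.minimum(np.arange(new_w) * w // new_w, w - 1)
    return img[ys[:, None], xs[None, :]]

def center_on_canvas(rgba, canvas_w, canvas_h):
    """Paste an RGBA layer centered on a fully transparent canvas, cropping overflow."""
    canvas = np.zeros((canvas_h, canvas_w, 4), dtype=rgba.dtype)
    h, w = rgba.shape[:2]
    # Offsets are negative when the layer is larger than the canvas
    off_x = (canvas_w - w) // 2
    off_y = (canvas_h - h) // 2
    dst_x0, dst_y0 = max(off_x, 0), max(off_y, 0)
    src_x0, src_y0 = max(-off_x, 0), max(-off_y, 0)
    copy_w = min(w - src_x0, canvas_w - dst_x0)
    copy_h = min(h - src_y0, canvas_h - dst_y0)
    canvas[dst_y0:dst_y0 + copy_h, dst_x0:dst_x0 + copy_w] = \
        rgba[src_y0:src_y0 + copy_h, src_x0:src_x0 + copy_w]
    return canvas
```

With a scale above 1.0 the layer can exceed the canvas, which is why the paste crops symmetrically instead of failing.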

c) Stereoscopic Processing:
- Implemented horizontal pixel shifting for depth simulation
- Created parallel-view stereo pair generation
- Added a configurable interaxial distance
- Improved stereo pair alignment
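
The horizontal shifting and side-by-side pairing above can be sketched as below. This is an illustration under stated assumptions, not the project's implementation: the person layer is assumed to be an RGBA array the same size as the RGB background, the background is kept at zero disparity, and the function names are hypothetical.

```python
import numpy as np

def shift_horizontal(layer, dx):
    """Shift an (H, W, C) layer by dx pixels; vacated columns become zero."""
    out = np.zeros_like(layer)
    w = layer.shape[1]
    if abs(dx) >= w:
        return out                      # shifted entirely out of frame
    if dx > 0:
        out[:, dx:] = layer[:, :w - dx]
    elif dx < 0:
        out[:, :w + dx] = layer[:, -dx:]
    else:
        out[:] = layer
    return out

def composite(person_rgba, background_rgb):
    """Alpha-blend an RGBA layer over an RGB background of the same size."""
    alpha = person_rgba[..., 3:4].astype(np.float64) / 255.0
    fg = person_rgba[..., :3].astype(np.float64)
    bg = background_rgb.astype(np.float64)
    return np.rint(alpha * fg + (1.0 - alpha) * bg).astype(np.uint8)

def stereo_pair(person_rgba, background_rgb, interaxial=4):
    """Return (left, right, side_by_side) views for a given pixel disparity."""
    half = interaxial // 2
    # Left view: person moved right; right view: person moved left
    left = composite(shift_horizontal(person_rgba, half), background_rgb)
    right = composite(shift_horizontal(person_rgba, -(interaxial - half)),
                      background_rgb)
    return left, right, np.concatenate([left, right], axis=1)
```

With this sign convention the person gets crossed disparity and appears in front of the background plane; swapping the signs would push the person behind it.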

d) Anaglyph Generation:
- Implemented color channel separation
- Created a background integration system
- Developed foreground-background blending
- Tuned the quality of the 3D effect
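
The color channel separation above follows the usual red-cyan scheme: the red channel comes from the left view, and the green and blue channels come from the right view. A minimal sketch, assuming both views are same-sized uint8 arrays in RGB channel order (images loaded with OpenCV are BGR and would need converting first); the function name make_anaglyph is hypothetical.

```python
import numpy as np

def make_anaglyph(left_rgb: np.ndarray, right_rgb: np.ndarray) -> np.ndarray:
    """Combine a stereo pair into a red-cyan anaglyph (RGB channel order)."""
    if left_rgb.shape != right_rgb.shape:
        raise ValueError("left and right views must have the same shape")
    anaglyph = right_rgb.copy()          # green and blue from the right view
    anaglyph[..., 0] = left_rgb[..., 0]  # red from the left view
    return anaglyph
```

Viewed through red-cyan glasses with the red filter over the left eye, each eye then receives mostly its own view.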

Results
-------
The system produces three main outputs:

1. Segmentation Mask:
- Clean person isolation
- Proper transparency handling
- Accurate edge detection
- Smooth mask transitions

2. Stereo Pair:
- Side-by-side stereo image
- Configurable depth effect
- Proper alignment between pairs
- Maintained image quality

3. Anaglyph Output:
- Red-cyan 3D image
- Adjustable 3D effect strength
- Clean color separation
- Minimal ghosting artifacts

Key Features Achieved:
- Person size adjustment (10-200%)
- Interaxial distance control (0-10 pixels)
- Custom background support
- Real-time processing and preview
- Intelligent mask alignment
- Transparent background handling

Discussion
----------
Technical Challenges Faced:

1. Mask Alignment:
- Complex handling of different image dimensions
- Maintaining proper aspect ratios
- Ensuring consistent centering
- Handling edge cases

2. Stereo Effect Quality:
- Balancing interaxial distance
- Minimizing visual artifacts
- Maintaining a comfortable viewing experience
- Preserving image details

3. Performance Optimization:
- Efficient large image processing
- Real-time interface responsiveness
- Memory management
- Processing speed optimization

4. Transparency Handling:
- Proper alpha channel management
- Clean edge preservation
- Consistent transparency across operations
- Background integration

Learning Outcomes:
- Deep understanding of stereoscopic image generation
- Practical experience with AI models
- Advanced image processing techniques
- Web interface development skills
- Complex system integration experience

Conclusion
----------
The project demonstrates that AI-based segmentation can be integrated with classical stereoscopic techniques. The system provides an accessible way to create 3D effects from regular 2D images, and it handles different image sizes and transparency consistently.

Future Work:
1. Implementation of depth-aware 3D effect generation
2. Addition of video processing capabilities
3. Support for additional 3D viewing formats
4. Enhanced background replacement options
5. Mobile device optimization
6. Advanced depth map generation
7. Multi-person segmentation support

The project provides a foundation for further work in 3D image processing and shows the potential of combining AI with traditional image processing techniques.