Alex Hortua committed on
Commit ef7ef27
2 Parent(s): 9df842f 2ec985a

Merge branch 'master' into main

Files changed (7)
  1. .gitignore +3 -0
  2. README.MD +116 -0
  3. requirements.txt +8 -0
  4. src/anaglyphGenerator.py +39 -0
  5. src/app.py +88 -0
  6. src/testing.py +4 -0
  7. src/utils.py +96 -0
.gitignore ADDED
@@ -0,0 +1,3 @@
+.qodo
+/src/__pycache__
+/venv
README.MD ADDED
@@ -0,0 +1,116 @@
+# 3D Person Segmentation and Anaglyph Generation
+
+title: Object Segmentation
+emoji: 👁
+colorFrom: gray
+colorTo: pink
+sdk: gradio
+sdk_version: 5.22.0
+app_file: src/app.py
+pinned: false
+
+
+## Lab Report
+
+### Introduction
+This project implements a 3D image-processing pipeline that combines person segmentation with stereoscopic and anaglyph image generation. The main objectives were to:
+1. Segment people from images using a pretrained transformer-based segmentation model
+2. Generate stereoscopic 3D effects from 2D images
+3. Create red-cyan anaglyph images for 3D viewing
+4. Provide an interactive web interface for processing images and adjusting parameters
+
+### Methodology
+
+#### Tools and Technologies Used
+- **SegFormer (nvidia/segformer-b0-finetuned-ade-512-512)**: Lightweight (B0) transformer-based semantic segmentation model fine-tuned on ADE20K
+- **PyTorch**: Deep learning framework for running SegFormer inference
+- **OpenCV**: Mask refinement (erosion and Gaussian blur)
+- **Gradio**: Web interface
+- **NumPy**: Array operations for masking, shifting, and channel recombination
+- **Pillow (PIL)**: Image loading, resizing, and conversion
+
+#### Implementation Steps
+
+1. **Person Segmentation**
+   - Used a SegFormer model fine-tuned on the ADE20K dataset, keeping class 12 (person)
+   - Refined the mask with erosion (7×7 kernel) and a 3×3 Gaussian blur to soften edges
+   - Scaled and centered the person image to support different person sizes
+
+2. **Stereoscopic Processing**
+   - Simulated depth by shifting the segmented person horizontally in opposite directions for the left and right views
+   - Generated side-by-side stereo pairs for parallel viewing
+   - Made the shift ("interaxial distance") configurable to adjust the strength of the 3D effect
+
+3. **Anaglyph Generation**
+   - Combined the left- and right-eye views into a red-cyan anaglyph
+   - Took the red channel from the left-eye view and the green and blue channels from the right-eye view
+   - Composited the person over an optional background image using the segmentation mask
+
+4. **User Interface**
+   - Developed an interactive web interface using Gradio
+   - Exposed the interaxial distance and person size as sliders
+   - Added support for custom background images, including side-by-side stereo backgrounds
+
+### Results
+
+The system produces three main outputs:
+1. Segmentation output showing the isolated person on a black background
+2. Side-by-side stereo pair for parallel viewing
+3. Red-cyan anaglyph image for viewing with 3D glasses
+
+Key Features:
+- Adjustable person size (10-200%)
+- Configurable interaxial distance (0-10 pixels)
+- Optional custom background support
+- In-browser preview of all three outputs
+
+### Discussion
+
+#### Technical Challenges
+1. **Mask Alignment**: Keeping segmentation masks aligned with background images required careful handling of image dimensions and aspect ratios.
+2. **Stereo Effect Quality**: Choosing an interaxial distance that keeps viewing comfortable while preserving a noticeable 3D effect.
+3. **Performance Optimization**: Processing large images quickly enough for interactive use.
+
+#### Learning Outcomes
+- Understanding of stereoscopic image generation
+- Experience with transformer-based segmentation models
+- Practical knowledge of image processing techniques
+- Web interface development for ML applications
+
+### Conclusion
+
+This project demonstrates the integration of transformer-based segmentation with classical stereoscopic image processing techniques. The system provides an accessible way to create 3D effects from ordinary 2D images.
+
+#### Future Work
+- Depth-aware 3D effect generation (per-pixel disparity instead of a single uniform shift)
+- Support for video processing
+- Additional 3D viewing formats (e.g., over-under)
+- Enhanced background replacement options
+- Mobile device optimization
+
+## Setup
+
+```bash
+pip install -r requirements.txt
+```
+
+## Usage
+
+```bash
+cd src
+python app.py
+```
+
+## Parameters
+
+- **Person Image**: Upload an image containing a person
+- **Background Image**: (Optional) Custom background image; a side-by-side stereo image is detected and split into left and right halves
+- **Interaxial Distance**: Adjust the 3D effect strength (0-10 pixels)
+- **Person Size**: Adjust the size of the person in the output (10-200%)
+
+## Output Types
+
+1. **Segmentation Mask**: Shows the isolated person
+2. **Stereo Pair**: Side-by-side stereo image for parallel viewing
+3. **Anaglyph**: Red-cyan 3D image viewable with anaglyph glasses
+
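The stereoscopic step in the README shifts the segmented person horizontally in opposite directions for each eye and places the two views side by side. The minimal sketch below illustrates that idea under a few assumptions: images are H×W×3 NumPy arrays, `shift_horizontal` and `make_stereo_pair` are hypothetical helpers (not part of this commit), and vacated columns are filled with zeros instead of using `np.roll`, which wraps pixels pushed past one edge back in at the other.

```python
import numpy as np


def shift_horizontal(img: np.ndarray, dx: int) -> np.ndarray:
    """Shift an H x W x C image by dx pixels (positive = right), filling with zeros.

    Unlike np.roll, pixels pushed past one edge do not reappear on the other.
    """
    out = np.zeros_like(img)
    w = img.shape[1]
    if abs(dx) >= w:
        return out  # everything shifted out of frame
    if dx > 0:
        out[:, dx:] = img[:, :w - dx]
    elif dx < 0:
        out[:, :w + dx] = img[:, -dx:]
    else:
        out[:] = img
    return out


def make_stereo_pair(layer: np.ndarray, shift: int) -> np.ndarray:
    """Build a side-by-side (left | right) pair from one image layer.

    The left view is shifted right and the right view left (the same
    directions app.py uses), so the offset between the views is 2 * shift.
    """
    left = shift_horizontal(layer, shift)
    right = shift_horizontal(layer, -shift)
    return np.concatenate([left, right], axis=1)


# Usage: a single bright column at x = 4 in an 8-pixel-wide image
layer = np.zeros((2, 8, 3), dtype=np.uint8)
layer[:, 4] = 255
pair = make_stereo_pair(layer, shift=2)
print(pair.shape)  # (2, 16, 3)
```

Because each view moves by `shift` in opposite directions, the relative offset between the two views is `2 * shift`, so under this scheme the 0-10 slider corresponds to 0-20 pixels of disparity.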
requirements.txt ADDED
@@ -0,0 +1,8 @@
+transformers
+torch
+Pillow
+datasets
+opencv-python
+gradio
+numpy
+scikit-image
src/anaglyphGenerator.py ADDED
@@ -0,0 +1,39 @@
+import os
+import numpy as np
+from PIL import Image
+from utils import load_model, segment_person
+
+def create_anaglyph(person_img_path, background_img_path, output_path="output_anaglyph.png"):
+    image = Image.open(person_img_path).convert("RGB")
+    background = Image.open(background_img_path).convert("RGB").resize(image.size)
+
+    processor, model = load_model()
+    mask = segment_person(image, processor, model)
+
+    image_np = np.array(image)
+    background_np = np.array(background)
+
+    person_only = image_np * mask
+    background_only = background_np * (1 - mask)
+
+    # Stereoscopic shift
+    shift_pixels = 10
+    person_left = np.roll(person_only, shift=-shift_pixels, axis=1)
+    person_right = np.roll(person_only, shift=shift_pixels, axis=1)
+
+    left_eye = np.clip(person_left + background_only, 0, 255).astype(np.uint8)
+    right_eye = np.clip(person_right + background_only, 0, 255).astype(np.uint8)
+
+    # Merge into red-cyan anaglyph
+    anaglyph = np.stack([
+        left_eye[:, :, 0],   # Red from left
+        right_eye[:, :, 1],  # Green from right
+        right_eye[:, :, 2]   # Blue from right
+    ], axis=2)
+
+    anaglyph_img = Image.fromarray(anaglyph.astype(np.uint8))
+    anaglyph_img.save(output_path)
+    print(f"✅ Anaglyph image saved to: {output_path}")
+
+if __name__ == "__main__":
+    create_anaglyph("person.png", "bg.png")
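The channel selection in `create_anaglyph` (and in `app.py`) is the basic color anaglyph: red from the left-eye view, green and blue (cyan) from the right-eye view. A standalone sketch of just that step follows; `red_cyan_anaglyph` is a hypothetical name, and the explicit shape check is an addition that gives a clearer error when the two views differ in size.

```python
import numpy as np


def red_cyan_anaglyph(left_eye: np.ndarray, right_eye: np.ndarray) -> np.ndarray:
    """Merge two H x W x 3 RGB views into a red-cyan anaglyph.

    Red comes from the left-eye view; green and blue from the right-eye view.
    """
    if left_eye.shape != right_eye.shape:
        raise ValueError(f"shape mismatch: {left_eye.shape} vs {right_eye.shape}")
    anaglyph = np.empty_like(right_eye)
    anaglyph[..., 0] = left_eye[..., 0]     # red   <- left eye
    anaglyph[..., 1:] = right_eye[..., 1:]  # green, blue <- right eye
    return anaglyph


left = np.full((2, 2, 3), (200, 10, 20), dtype=np.uint8)
right = np.full((2, 2, 3), (30, 120, 220), dtype=np.uint8)
print(red_cyan_anaglyph(left, right)[0, 0].tolist())  # [200, 120, 220]
```

Copying channel 0 and the slice `1:` into a preallocated array produces the same result as stacking three single-channel slices, while keeping the input dtype.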
src/app.py ADDED
@@ -0,0 +1,88 @@
+import gradio as gr
+import numpy as np
+from PIL import Image
+from utils import load_model, segment_person, resize_image, split_stereo_image
+
+# Load model and processor once
+processor, model = load_model()
+
+# Default background (solid color)
+default_bg = Image.new("RGB", (512, 512), color=(95, 147, 89))
+
+
+
+
+
+def generate_3d_outputs(person_img, background_img=None, shift_pixels=10, person_size=100):
+    # Scale the person within the original frame (resize_image keeps the canvas size)
+    image = resize_image(person_img, person_size)
+    background_img = background_img if background_img is not None else default_bg
+
+
+    # Split the background into left/right halves (split_stereo_image accepts PIL or NumPy input)
+    leftBackground, rightBackground = split_stereo_image(background_img)
+
+    # Resize image to match background dimensions
+
+
+    image = image.resize((leftBackground.shape[1], leftBackground.shape[0]))
+    # Step 1: Segment person
+    mask = segment_person(image, processor, model)
+
+    image_np = np.array(image)
+
+    leftBackground_np = np.array(leftBackground)
+    rightBackground_np = np.array(rightBackground)
+
+
+    person_only = image_np * mask
+    leftBackground_only = leftBackground_np * (1 - mask)
+    rightBackground_only = rightBackground_np * (1 - mask)
+
+    # Step 2: Create stereo pair
+    person_left = np.roll(person_only, shift=-shift_pixels, axis=1)
+    person_right = np.roll(person_only, shift=shift_pixels, axis=1)
+
+    # Left eye gets the right-shifted person (crossed disparity), placing the person in front of the screen
+    left_eye = np.clip(person_right + leftBackground_only, 0, 255).astype(np.uint8)
+    right_eye = np.clip(person_left + rightBackground_only, 0, 255).astype(np.uint8)
+    person_segmentation = np.clip(person_only, 0, 255).astype(np.uint8)
+
+    # --- Combine left and right images side by side ---
+    stereo_pair = np.concatenate([left_eye, right_eye], axis=1)
+    stereo_image = Image.fromarray(stereo_pair)
+
+    # Step 3: Create anaglyph
+    anaglyph = np.stack([
+        left_eye[:, :, 0],   # Red from left
+        right_eye[:, :, 1],  # Green from right
+        right_eye[:, :, 2]   # Blue from right
+    ], axis=2)
+
+    anaglyph_img = Image.fromarray(anaglyph.astype(np.uint8))
+    left_img = Image.fromarray(left_eye)
+    right_img = Image.fromarray(right_eye)
+
+    return person_segmentation, stereo_image, anaglyph_img
+
+# Gradio Interface
+demo = gr.Interface(
+    fn=generate_3d_outputs,
+    inputs=[
+        gr.Image(label="Person Image"),
+        gr.Image(label="Optional Background Image"),
+        gr.Slider(minimum=0, maximum=10, step=1, value=10, label="Interaxial Distance"),
+        gr.Slider(minimum=10, maximum=200, step=10, value=100, label="Person Size %"),
+
+    ],
+    outputs=[
+        gr.Image(label="Segmentation Mask"),
+        gr.Image(label="Stereo Pair"),
+        gr.Image(label="3D Anaglyph Image")
+    ],
+    title="3D Person Segmentation Viewer",
+    description="Upload a person photo and optionally a background image. Outputs anaglyph and stereo views."
+)
+
+if __name__ == "__main__":
+    demo.launch()
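In `generate_3d_outputs`, the background is masked with the unshifted `mask` while the person layer is rolled, so each eye view can show a black, person-shaped gap where the person used to be, and over-bright (clipped) pixels where the shifted person is added on top of unmasked background. A minimal sketch of one alternative, assuming the H×W×3 float mask in [0, 1] that `segment_person` returns: shift the mask together with the person and alpha-composite. `composite_eye` is a hypothetical helper, not part of this commit; it keeps `np.roll` to match the committed code.

```python
import numpy as np


def composite_eye(person: np.ndarray, background: np.ndarray,
                  mask: np.ndarray, dx: int) -> np.ndarray:
    """Alpha-composite a horizontally shifted person over a background.

    person, background: H x W x 3 uint8 arrays of the same size.
    mask: H x W x 3 float alpha in [0, 1] (1 = person).
    The person and its mask move together, so the background shows through
    where the person used to be instead of leaving a black gap.
    """
    shifted_person = np.roll(person.astype(np.float32), dx, axis=1)
    shifted_mask = np.roll(mask.astype(np.float32), dx, axis=1)
    out = shifted_person * shifted_mask + background.astype(np.float32) * (1.0 - shifted_mask)
    return np.clip(out, 0, 255).astype(np.uint8)


# Usage: a 1 x 6 image where the person occupies columns 1-2
person = np.full((1, 6, 3), 250, dtype=np.uint8)
background = np.full((1, 6, 3), 40, dtype=np.uint8)
mask = np.zeros((1, 6, 3), dtype=np.float32)
mask[:, 1:3] = 1.0
left_eye = composite_eye(person, background, mask, dx=2)  # person moves to columns 3-4
print(left_eye[0, :, 0].tolist())  # [40, 40, 40, 250, 250, 40]
```

With the committed approach, the same input would leave columns 1-2 black and clip columns 3-4 at 255 (250 + 40). In `generate_3d_outputs` terms, this sketch would correspond to `composite_eye(image_np, leftBackground_np, mask, shift_pixels)` for the left eye and `composite_eye(image_np, rightBackground_np, mask, -shift_pixels)` for the right.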
src/testing.py ADDED
@@ -0,0 +1,4 @@
+from anaglyphGenerator import create_anaglyph
+
+# Provide paths to your test images
+create_anaglyph("person.png", "bg.png", "test_anaglyph.png")
src/utils.py ADDED
@@ -0,0 +1,96 @@
+import torch
+import numpy as np
+from PIL import Image
+import cv2
+from transformers import AutoImageProcessor, SegformerForSemanticSegmentation
+from imagehash import average_hash
+
+def load_model():
+    processor = AutoImageProcessor.from_pretrained("nvidia/segformer-b0-finetuned-ade-512-512")
+    model = SegformerForSemanticSegmentation.from_pretrained("nvidia/segformer-b0-finetuned-ade-512-512")
+    return processor, model
+
+def segment_person(image: Image.Image, processor, model):
+    inputs = processor(images=image, return_tensors="pt")
+    with torch.no_grad():
+        outputs = model(**inputs)
+
+    logits = outputs.logits
+    upsampled_logits = torch.nn.functional.interpolate(
+        logits,
+        size=image.size[::-1],  # PIL size is (W, H); interpolate expects (H, W)
+        mode="bilinear",
+        align_corners=False,
+    )
+    pred_classes = upsampled_logits.argmax(dim=1)[0].cpu().numpy()
+    mask = (pred_classes == 12).astype(np.uint8) * 255  # Class 12 = person in ADE20K
+
+    # Clean mask: erode to trim edge pixels, then blur to soften the boundary
+    kernel = np.ones((7, 7), np.uint8)
+    eroded_mask = cv2.erode(mask, kernel, iterations=1)
+    blurred_mask = cv2.GaussianBlur(eroded_mask, (3, 3), sigmaX=0, sigmaY=0)
+
+    final_mask = blurred_mask.astype(np.float32) / 255.0
+    final_mask_3ch = np.stack([final_mask]*3, axis=-1)
+
+    return final_mask_3ch
+
+
+def resize_image(image, size_percent):
+    # Convert the NumPy input to an RGB PIL image (drops any alpha channel)
+    image = Image.fromarray(image).convert("RGB")
+    width, height = image.size
+    new_width = int(width * size_percent / 100)
+    new_height = int(height * size_percent / 100)
+
+    # Create a black canvas with the original dimensions
+    resized_image = Image.new('RGB', (width, height), (0, 0, 0))
+
+    # Resize original image
+    scaled_content = image.resize((new_width, new_height))
+
+    # Calculate position to paste resized content in center
+    x = (width - new_width) // 2
+    y = (height - new_height) // 2
+
+    # Paste resized content onto the black canvas
+    resized_image.paste(scaled_content, (x, y))
+
+    return resized_image
+
+# Check whether two images are similar (average-hash Hamming distance below 10)
+def check_image_similarity(image1, image2):
+
+    hash1 = average_hash(Image.fromarray(image1))
+    hash2 = average_hash(Image.fromarray(image2))
+    return hash1 - hash2 < 10
+
+
+def split_stereo_image(image):
+    """
+    Splits a side-by-side stereo image into left and right halves for stereoscopic viewing.
+
+    Args:
+        image: PIL Image or numpy array
+
+    Returns:
+        tuple: (left_half, right_half) as numpy arrays for a stereo image; otherwise (full image array, 99%-scaled PIL copy)
+    """
+    # Convert to numpy array if PIL Image
+    if isinstance(image, Image.Image):
+        image = np.array(image)
+
+    # Get width and calculate split point
+    width = image.shape[1]
+    split_point = width // 2
+
+    # Split into left and right halves
+    left_half = image[:, :split_point]
+    right_half = image[:, split_point:2 * split_point]  # same width as left_half even if width is odd
+
+    # If a stereo image is provided, return left and right halves
+    if check_image_similarity(left_half, right_half):
+        return left_half, right_half
+    else:  # Mono background: whole image for the left eye, a 99%-scaled copy for the right
+        return image, resize_image(image, 99)
+
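`split_stereo_image` decides whether a background is a side-by-side stereo image by comparing the average hashes of its two halves, and `imagehash` comes from the ImageHash package, which `requirements.txt` does not list. The sketch below reimplements the same idea with NumPy only, under stated assumptions: block averaging replaces the resize that `imagehash.average_hash` performs (so the bits will not match it exactly), a plain channel mean stands in for luminance, and `average_hash_bits` and `split_if_stereo` are hypothetical helpers. It keeps the `< 10` Hamming-distance threshold from `check_image_similarity` and trims both halves to the same width, so an odd-width image still yields halves that one mask can cover.

```python
import numpy as np


def average_hash_bits(img: np.ndarray, hash_size: int = 8) -> np.ndarray:
    """Average hash: block-average to hash_size x hash_size, then threshold at the mean."""
    gray = img.astype(np.float64)
    if gray.ndim == 3:
        gray = gray.mean(axis=2)  # plain channel mean as a stand-in for luminance
    if gray.shape[0] < hash_size or gray.shape[1] < hash_size:
        raise ValueError("image is smaller than the hash grid")
    small = np.array([
        [block.mean() for block in np.array_split(row_band, hash_size, axis=1)]
        for row_band in np.array_split(gray, hash_size, axis=0)
    ])
    return small > small.mean()


def split_if_stereo(image: np.ndarray, max_distance: int = 10):
    """Return equal-width (left, right) halves if they look alike, else None."""
    half = image.shape[1] // 2
    left, right = image[:, :half], image[:, half:2 * half]
    # Hamming distance between the two boolean hash grids
    distance = np.count_nonzero(average_hash_bits(left) != average_hash_bits(right))
    return (left, right) if distance < max_distance else None


rng = np.random.default_rng(0)
view = rng.integers(0, 256, size=(64, 64, 3), dtype=np.uint8)
odd_width = np.concatenate([view, view, view[:, :1]], axis=1)  # 129 columns
result = split_if_stereo(odd_width)
print(result[0].shape, result[1].shape)  # (64, 64, 3) (64, 64, 3)
```

Returning `None` for a mono image leaves the fallback (for example, the 99%-scaled copy that `split_stereo_image` uses) to the caller.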