Merge branch 'master' into main
- .gitignore +3 -0
- README.MD +116 -0
- requirements.txt +8 -0
- src/anaglyphGenerator.py +39 -0
- src/app.py +88 -0
- src/testing.py +4 -0
- src/utils.py +96 -0
.gitignore
ADDED
@@ -0,0 +1,3 @@
.qodo
/src/__pycache__
/venv
README.MD
ADDED
@@ -0,0 +1,116 @@
# 3D Person Segmentation and Anaglyph Generation

title: Object Segmentation
emoji: 👁
colorFrom: gray
colorTo: pink
sdk: gradio
sdk_version: 5.22.0
app_file: src/app.py
pinned: false


## Lab Report

### Introduction
This project implements a 3D image processing system that combines person segmentation with stereoscopic and anaglyph image generation. The main objectives were to:
1. Segment people from images using a pretrained semantic segmentation model
2. Generate stereoscopic 3D effects from 2D images
3. Create red-cyan anaglyph images for 3D viewing
4. Provide an interactive web interface for real-time processing

### Methodology

#### Tools and Technologies Used
- **SegFormer (nvidia/segformer-b0-finetuned-ade-512-512)**: Transformer-based model for semantic segmentation
- **PyTorch**: Deep learning framework for running the SegFormer model
- **OpenCV**: Image processing operations and mask refinement
- **Gradio**: Web interface development
- **NumPy**: Efficient array operations for image manipulation
- **PIL (Python Imaging Library)**: Image loading and basic transformations

#### Implementation Steps

1. **Person Segmentation**
   - Used a SegFormer model fine-tuned on the ADE20K dataset
   - Applied post-processing with erosion and Gaussian blur for mask refinement
   - Implemented mask scaling and centering for various input sizes

2. **Stereoscopic Processing**
   - Simulated depth through horizontal pixel shifting
   - Implemented parallel-view stereo pair generation
   - Added a configurable interaxial distance for 3D effect adjustment

3. **Anaglyph Generation**
   - Combined left and right eye views into a red-cyan anaglyph
   - Implemented color channel separation and recombination
   - Added background image support with masking

4. **User Interface**
   - Developed an interactive web interface using Gradio
   - Added real-time parameter adjustment capabilities
   - Implemented support for custom background images
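The stereoscopic and anaglyph steps above can be sketched with NumPy alone. This is a minimal illustration under stated assumptions, not the project's exact pipeline: the helper names `shift_horizontal` and `make_anaglyph`, the synthetic image, and the sign convention for the left and right views are all chosen for the example.

```python
import numpy as np


def shift_horizontal(layer: np.ndarray, pixels: int) -> np.ndarray:
    """Shift an H x W x 3 layer along the width axis (np.roll wraps at the edges)."""
    return np.roll(layer, shift=pixels, axis=1)


def make_anaglyph(left_eye: np.ndarray, right_eye: np.ndarray) -> np.ndarray:
    """Take the red channel from the left view and green and blue from the right view."""
    return np.stack(
        [left_eye[:, :, 0], right_eye[:, :, 1], right_eye[:, :, 2]], axis=2
    )


# Synthetic 4 x 8 RGB image with a single white column at x = 3.
person = np.zeros((4, 8, 3), dtype=np.uint8)
person[:, 3, :] = 255

interaxial = 2  # hypothetical shift in pixels
left_eye = shift_horizontal(person, -interaxial)   # white column moves to x = 1
right_eye = shift_horizontal(person, interaxial)   # white column moves to x = 5

anaglyph = make_anaglyph(left_eye, right_eye)
stereo_pair = np.concatenate([left_eye, right_eye], axis=1)  # side-by-side pair

print(anaglyph[0, 1], anaglyph[0, 5])  # red fringe at x = 1, cyan fringe at x = 5
```

In the anaglyph the shifted copies show up as a red fringe and a cyan fringe; a larger shift moves the fringes further apart, which is what the interaxial distance controls.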
### Results

The system produces three main outputs:
1. Segmentation mask showing the isolated person
2. Side-by-side stereo pair for parallel viewing
3. Red-cyan anaglyph image for 3D glasses viewing

Key Features:
- Adjustable person size (10-200%)
- Configurable interaxial distance (0-10 pixels)
- Optional custom background support
- Real-time processing and preview
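For the custom background feature, the compositing step can be sketched as a soft-mask blend. The example assumes a mask with values in [0, 1] that matches the height and width of both images; `composite` is a hypothetical helper name, and the shape checks stand in for whatever resizing happens before this point.

```python
import numpy as np


def composite(person: np.ndarray, background: np.ndarray, mask: np.ndarray) -> np.ndarray:
    """Blend person over background using a soft mask with values in [0, 1]."""
    if person.shape != background.shape:
        raise ValueError(f"shape mismatch: {person.shape} vs {background.shape}")
    if mask.shape != person.shape[:2]:
        raise ValueError(f"mask shape {mask.shape} does not match image {person.shape[:2]}")
    alpha = mask.astype(np.float32)[:, :, None]  # H x W x 1, broadcasts over RGB
    blended = person * alpha + background * (1.0 - alpha)
    return np.clip(blended, 0, 255).astype(np.uint8)


person = np.full((2, 2, 3), 200, dtype=np.uint8)
background = np.full((2, 2, 3), 100, dtype=np.uint8)
mask = np.array([[1.0, 0.0], [0.5, 0.0]], dtype=np.float32)

result = composite(person, background, mask)
print(result[:, :, 0])
```

Mask values between 0 and 1, such as those produced by blurring the mask edge, give a mix of person and background pixels instead of a hard cut.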
### Discussion

#### Technical Challenges
1. **Mask Alignment**: Ensuring proper alignment between segmentation masks and background images required careful consideration of image dimensions and aspect ratios.
2. **Stereo Effect Quality**: The interaxial distance had to be balanced so that viewing stays comfortable while the 3D effect remains visible.
3. **Performance Optimization**: Large images had to be processed efficiently enough to keep the interface responsive.
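One detail related to stereo quality: `np.roll`, which the code in this commit uses for the horizontal shift, moves pixels that leave one edge back in at the opposite edge. Whether that is visible depends on where the person sits in the frame. A zero-filled shift is one possible alternative; the sketch below is an assumption for illustration, and `shift_zero_fill` is not part of the project.

```python
import numpy as np


def shift_zero_fill(layer: np.ndarray, pixels: int) -> np.ndarray:
    """Shift along the width axis; vacated columns are zero instead of wrapped."""
    shifted = np.zeros_like(layer)
    if pixels > 0:
        shifted[:, pixels:] = layer[:, :-pixels]
    elif pixels < 0:
        shifted[:, :pixels] = layer[:, -pixels:]
    else:
        shifted[:] = layer
    return shifted


row = np.arange(1, 6, dtype=np.uint8).reshape(1, 5)  # [[1 2 3 4 5]]

wrapped = np.roll(row, shift=2, axis=1)   # [[4 5 1 2 3]]
filled = shift_zero_fill(row, 2)          # [[0 0 1 2 3]]
filled_left = shift_zero_fill(row, -2)    # [[3 4 5 0 0]]
```

The same function works on H x W x 3 arrays because only the second axis is sliced.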

#### Learning Outcomes
- Deep understanding of stereoscopic image generation
- Experience with state-of-the-art segmentation models
- Practical knowledge of image processing techniques
- Web interface development for ML applications

### Conclusion

This project successfully demonstrates the integration of modern AI-powered segmentation with classical stereoscopic image processing techniques. The system provides an accessible way to create 3D effects from regular 2D images.

#### Future Work
- Implementation of depth-aware 3D effect generation
- Support for video processing
- Additional 3D viewing formats (side-by-side, over-under)
- Enhanced background replacement options
- Mobile device optimization

## Setup

```bash
pip install -r requirements.txt
```

## Usage

```bash
cd src
python app.py
```

## Parameters

- **Person Image**: Upload an image containing a person
- **Background Image**: (Optional) Custom background image
- **Interaxial Distance**: Adjust the 3D effect strength (0-10)
- **Person Size**: Adjust the size of the person in the output (10-200%)
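The Person Size percentage maps to a scaled image pasted at the center of a canvas that keeps the original dimensions. The sketch below mirrors the arithmetic of `resize_image` in `src/utils.py` without touching any pixels; the function name `scaled_box` is made up for the example.

```python
def scaled_box(width: int, height: int, size_percent: int) -> tuple[int, int, int, int]:
    """Return (new_width, new_height, x, y) for content scaled and centered on the canvas."""
    new_width = int(width * size_percent / 100)
    new_height = int(height * size_percent / 100)
    x = (width - new_width) // 2   # left offset of the pasted content
    y = (height - new_height) // 2  # top offset of the pasted content
    return new_width, new_height, x, y


print(scaled_box(512, 512, 50))   # (256, 256, 128, 128)
print(scaled_box(512, 512, 200))  # (1024, 1024, -256, -256)
```

For values above 100% the offsets become negative, so the scaled content extends past the canvas and only its central part is expected to remain visible.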

## Output Types

1. **Segmentation Mask**: Shows the isolated person
2. **Stereo Pair**: Side-by-side stereo image for parallel viewing
3. **Anaglyph**: Red-cyan 3D image viewable with anaglyph glasses
requirements.txt
ADDED
@@ -0,0 +1,8 @@
transformers
torch
Pillow
datasets
opencv-python
gradio
numpy
scikit-image
src/anaglyphGenerator.py
ADDED
@@ -0,0 +1,39 @@
import os
import numpy as np
from PIL import Image
from utils import load_model, segment_person

def create_anaglyph(person_img_path, background_img_path, output_path="output_anaglyph.png"):
    image = Image.open(person_img_path).convert("RGB")
    background = Image.open(background_img_path).convert("RGB").resize(image.size)

    processor, model = load_model()
    mask = segment_person(image, processor, model)

    image_np = np.array(image)
    background_np = np.array(background)

    person_only = image_np * mask
    background_only = background_np * (1 - mask)

    # Stereoscopic shift
    shift_pixels = 10
    person_left = np.roll(person_only, shift=-shift_pixels, axis=1)
    person_right = np.roll(person_only, shift=shift_pixels, axis=1)

    left_eye = np.clip(person_left + background_only, 0, 255).astype(np.uint8)
    right_eye = np.clip(person_right + background_only, 0, 255).astype(np.uint8)

    # Merge into red-cyan anaglyph
    anaglyph = np.stack([
        left_eye[:, :, 0],
        right_eye[:, :, 1],
        right_eye[:, :, 2]
    ], axis=2)

    anaglyph_img = Image.fromarray(anaglyph.astype(np.uint8))
    anaglyph_img.save(output_path)
    print(f"✅ Anaglyph image saved to: {output_path}")

if __name__ == "__main__":
    create_anaglyph("person.png", "bg.png")
src/app.py
ADDED
@@ -0,0 +1,88 @@
import gradio as gr
import numpy as np
from PIL import Image
from utils import load_model, segment_person, resize_image, split_stereo_image

# Load model and processor once
processor, model = load_model()

# Default background (solid color)
default_bg = Image.new("RGB", (512, 512), color=(95, 147, 89))


def generate_3d_outputs(person_img, background_img=None, shift_pixels=10, person_size=100):
    # Resize images to match
    image = resize_image(person_img, person_size)
    background_img = background_img if background_img is not None else default_bg

    # Split background image into left and right halves
    leftBackground, rightBackground = split_stereo_image(Image.fromarray(background_img))

    # Resize image to match background dimensions
    image = Image.fromarray(np.array(image)).resize((leftBackground.shape[1], leftBackground.shape[0]))

    # Step 1: Segment person
    mask = segment_person(image, processor, model)

    image_np = np.array(image)

    leftBackground_np = np.array(leftBackground)
    rightBackground_np = np.array(rightBackground)

    person_only = image_np * mask
    leftBackground_only = leftBackground_np * (1 - mask)
    rightBackground_only = rightBackground_np * (1 - mask)

    # Step 2: Create stereo pair
    person_left = np.roll(person_only, shift=-shift_pixels, axis=1)
    person_right = np.roll(person_only, shift=shift_pixels, axis=1)

    left_eye = np.clip(person_right + leftBackground_only, 0, 255).astype(np.uint8)
    right_eye = np.clip(person_left + rightBackground_only, 0, 255).astype(np.uint8)
    person_segmentation = np.clip(person_only, 0, 255).astype(np.uint8)

    # --- Combine left and right images side by side ---
    stereo_pair = np.concatenate([left_eye, right_eye], axis=1)
    stereo_image = Image.fromarray(stereo_pair)

    # Step 3: Create anaglyph
    anaglyph = np.stack([
        left_eye[:, :, 0],   # Red from left
        right_eye[:, :, 1],  # Green from right
        right_eye[:, :, 2]   # Blue from right
    ], axis=2)

    anaglyph_img = Image.fromarray(anaglyph.astype(np.uint8))
    left_img = Image.fromarray(left_eye)
    right_img = Image.fromarray(right_eye)

    return person_segmentation, stereo_image, anaglyph_img

# Gradio Interface
demo = gr.Interface(
    fn=generate_3d_outputs,
    inputs=[
        gr.Image(label="Person Image"),
        gr.Image(label="Optional Background Image"),
        gr.Slider(minimum=0, maximum=10, step=1, value=10, label="interaxial distance"),
        gr.Slider(minimum=10, maximum=200, step=10, value=100, label="Person Size %"),

    ],
    outputs=[
        gr.Image(label="segmentation mask"),
        gr.Image(label="Stereo_pair"),
        gr.Image(label="3D Anaglyph Image")
    ],
    title="3D Person Segmentation Viewer",
    description="Upload a person photo and optionally a background image. Outputs anaglyph and stereo views."
)

if __name__ == "__main__":
    demo.launch()
src/testing.py
ADDED
@@ -0,0 +1,4 @@
# create_anaglyph is defined in anaglyphGenerator.py, not in app.py
from anaglyphGenerator import create_anaglyph

# Provide paths to your test images
create_anaglyph("person.png", "bg.png", "test_anaglyph.png")
src/utils.py
ADDED
@@ -0,0 +1,96 @@
import torch
import numpy as np
from PIL import Image
import cv2
from transformers import AutoImageProcessor, SegformerForSemanticSegmentation
from imagehash import average_hash  # needs the ImageHash package, which requirements.txt does not list

def load_model():
    processor = AutoImageProcessor.from_pretrained("nvidia/segformer-b0-finetuned-ade-512-512")
    model = SegformerForSemanticSegmentation.from_pretrained("nvidia/segformer-b0-finetuned-ade-512-512")
    return processor, model

def segment_person(image: Image.Image, processor, model):
    inputs = processor(images=image, return_tensors="pt")
    with torch.no_grad():
        outputs = model(**inputs)

    # Upsample the low-resolution logits to the input size (PIL size is (W, H), so reverse it)
    logits = outputs.logits
    upsampled_logits = torch.nn.functional.interpolate(
        logits,
        size=image.size[::-1],
        mode="bilinear",
        align_corners=False,
    )
    pred_classes = upsampled_logits.argmax(dim=1)[0].cpu().numpy()
    mask = (pred_classes == 12).astype(np.uint8) * 255  # Class 12 = person

    # Clean mask: erode to trim the border, then blur to soften the edge
    kernel = np.ones((7, 7), np.uint8)
    eroded_mask = cv2.erode(mask, kernel, iterations=1)
    blurred_mask = cv2.GaussianBlur(eroded_mask, (3, 3), sigmaX=0, sigmaY=0)

    # Scale to [0, 1] and repeat across three channels so it multiplies with RGB arrays
    final_mask = blurred_mask.astype(np.float32) / 255.0
    final_mask_3ch = np.stack([final_mask]*3, axis=-1)

    return final_mask_3ch


def resize_image(image, size_percent):
    # Convert the input array to an RGB PIL image (drops any alpha channel)
    image = Image.fromarray(image).convert("RGB")
    width, height = image.size
    new_width = int(width * size_percent / 100)
    new_height = int(height * size_percent / 100)

    # Create a black canvas with the original dimensions
    resized_image = Image.new('RGB', (width, height), (0, 0, 0))

    # Resize original image
    scaled_content = image.resize((new_width, new_height))

    # Calculate position to paste resized content in center
    x = (width - new_width) // 2
    y = (height - new_height) // 2

    # Paste resized content onto the black canvas
    resized_image.paste(scaled_content, (x, y))

    return resized_image

# Check if two images are similar
def check_image_similarity(image1, image2):

    hash1 = average_hash(Image.fromarray(image1))
    hash2 = average_hash(Image.fromarray(image2))
    return hash1 - hash2 < 10


def split_stereo_image(image):
    """
    Splits an image into left and right halves for stereographic viewing.

    Args:
        image: PIL Image or numpy array

    Returns:
        tuple: (left_half, right_half) as numpy arrays, or (image, 99%-scaled PIL copy) if the halves differ
    """
    # Convert to numpy array if PIL Image
    if isinstance(image, Image.Image):
        image = np.array(image)

    # Get width and calculate split point
    width = image.shape[1]
    split_point = width // 2

    # Split into left and right halves
    left_half = image[:, :split_point]
    right_half = image[:, split_point:]

    # If the halves look alike, treat the input as a stereo image and return them
    if check_image_similarity(left_half, right_half):
        return left_half, right_half
    else:
        return image, resize_image(image, 99)
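`check_image_similarity` in `src/utils.py` relies on `imagehash.average_hash`, where subtracting two hashes yields the number of differing bits (the Hamming distance). The NumPy sketch below approximates that idea so the `< 10` threshold is easier to interpret. It is an illustration only, not the library's implementation: it assumes grayscale inputs whose sides are multiples of 8 and downsamples by block averaging, whereas the library resizes the image with Pillow. The names `average_hash_bits` and `hamming_distance` are made up for the example.

```python
import numpy as np


def average_hash_bits(gray: np.ndarray, hash_size: int = 8) -> np.ndarray:
    """Return a hash_size x hash_size boolean grid: True where a block is brighter than the mean."""
    h, w = gray.shape
    if h % hash_size or w % hash_size:
        raise ValueError("sides must be multiples of hash_size in this sketch")
    # Split the image into hash_size x hash_size blocks and average each block.
    blocks = gray.astype(np.float64).reshape(hash_size, h // hash_size, hash_size, w // hash_size)
    small = blocks.mean(axis=(1, 3))
    return small > small.mean()


def hamming_distance(bits_a: np.ndarray, bits_b: np.ndarray) -> int:
    """Count the positions where the two bit grids differ."""
    return int(np.count_nonzero(bits_a != bits_b))


# A horizontal gradient, a slightly brighter copy, and a mirrored copy.
gradient = np.tile(np.arange(64, dtype=np.float64) * 4, (64, 1))
brighter = gradient + 3
mirrored = gradient[:, ::-1]

same = hamming_distance(average_hash_bits(gradient), average_hash_bits(brighter))
different = hamming_distance(average_hash_bits(gradient), average_hash_bits(mirrored))
print(same, different)  # 0 64
```

With 64 bits per hash, a distance below 10 means the two halves agree on most of their coarse brightness pattern, which is the condition `split_stereo_image` uses to decide that the background is already a stereo pair.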