Spaces: axelhortua/Object-segmentation

Alex Hortua committed on Mar 23

Commit

ef7ef27

2 Parent(s): 9df842f 2ec985a

Merge branch 'master' into main

Files changed (7)

.gitignore +3 -0
README.MD +116 -0
requirements.txt +8 -0
src/anaglyphGenerator.py +39 -0
src/app.py +88 -0
src/testing.py +4 -0
src/utils.py +96 -0

.gitignore ADDED

	@@ -0,0 +1,3 @@

+.qodo
+/src/__pycache__
+/venv

README.MD ADDED

	@@ -0,0 +1,116 @@

+# 3D Person Segmentation and Anaglyph Generation
+title: Object Segmentation
+emoji: 👁
+colorFrom: gray
+colorTo: pink
+sdk: gradio
+sdk_version: 5.22.0
+app_file: src/app.py
+pinned: false
+## Lab Report
+### Introduction
+This project implements a 3D image processing pipeline that combines person segmentation with stereoscopic and anaglyph image generation. The main objectives were to:
+1. Segment people from images using a pretrained SegFormer model
+2. Generate stereoscopic 3D effects from 2D images
+3. Create red-cyan anaglyph images for 3D viewing
+4. Provide an interactive web interface for real-time processing
+### Methodology
+#### Tools and Technologies Used
+- **SegFormer (nvidia/segformer-b0-finetuned-ade-512-512)**: Transformer-based semantic segmentation model, fine-tuned on ADE20K
+- **PyTorch**: Deep learning framework for running the SegFormer model
+- **OpenCV**: Image processing operations and mask refinement
+- **Gradio**: Web interface development
+- **NumPy**: Efficient array operations for image manipulation
+- **Pillow (PIL)**: Image loading and basic transformations
+#### Implementation Steps
+1. **Person Segmentation**
+   - Used the SegFormer model fine-tuned on the ADE20K dataset, keeping pixels predicted as class 12 (person)
+   - Refined the mask with a 7x7 erosion followed by a 3x3 Gaussian blur
+   - Implemented mask scaling and centering for various input sizes
+2. **Stereoscopic Processing**
+   - Simulated depth by shifting the segmented person horizontally in opposite directions for the left and right views
+   - Implemented parallel view stereo pair generation
+   - Added configurable interaxial distance for 3D effect adjustment
+3. **Anaglyph Generation**
+   - Combined left and right eye views into red-cyan anaglyph
+   - Implemented color channel separation and recombination
+   - Added background image support with proper masking
+4. **User Interface**
+   - Developed interactive web interface using Gradio
+   - Added real-time parameter adjustment capabilities
+   - Implemented support for custom background images
+### Results
+The system produces three main outputs:
+1. Segmentation mask showing the isolated person
+2. Side-by-side stereo pair for parallel viewing
+3. Red-cyan anaglyph image for 3D glasses viewing
+Key Features:
+- Adjustable person size (10-200%)
+- Configurable interaxial distance (0-10 pixels)
+- Optional custom background support
+- Real-time processing and preview
+### Discussion
+#### Technical Challenges
+1. **Mask Alignment**: Ensuring proper alignment between segmentation masks and background images required careful consideration of image dimensions and aspect ratios.
+2. **Stereo Effect Quality**: Balancing the interaxial distance for comfortable viewing while maintaining the 3D effect.
+3. **Performance Optimization**: Efficient processing of large images while maintaining real-time interaction.
+#### Learning Outcomes
+- Deep understanding of stereoscopic image generation
+- Experience with state-of-the-art segmentation models
+- Practical knowledge of image processing techniques
+- Web interface development for ML applications
+### Conclusion
+This project successfully demonstrates the integration of modern AI-powered segmentation with classical stereoscopic image processing techniques. The system provides an accessible way to create 3D effects from regular 2D images.
+#### Future Work
+- Implementation of depth-aware 3D effect generation
+- Support for video processing
+- Additional 3D viewing formats (e.g. over-under)
+- Enhanced background replacement options
+- Mobile device optimization
+## Setup
+```bash
+pip install -r requirements.txt
+```
+## Usage
+```bash
+cd src
+python app.py
+```
+## Parameters
+- **Person Image**: Upload an image containing a person
+- **Background Image**: (Optional) Custom background image
+- **Interaxial Distance**: Adjust the 3D effect strength (0-10)
+- **Person Size**: Adjust the size of the person in the output (10-200%)
+## Output Types
+1. **Segmentation Mask**: Shows the isolated person
+2. **Stereo Pair**: Side-by-side stereo image for parallel viewing
+3. **Anaglyph**: Red-cyan 3D image viewable with anaglyph glasses
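
Reviewer note: the Stereoscopic Processing and Anaglyph Generation steps described in this README can be illustrated with a minimal NumPy-only sketch. It assumes the person layer and the background are already separated; `make_stereo_views`, `make_anaglyph`, and the synthetic arrays are hypothetical names used for illustration and are not part of this commit.

```python
import numpy as np

def make_stereo_views(person, background, shift_pixels):
    """Shift the person layer in opposite directions for each eye (hypothetical helper)."""
    # Left eye sees the person shifted right, right eye sees it shifted left.
    left = np.roll(person, shift=shift_pixels, axis=1).astype(np.int32)
    right = np.roll(person, shift=-shift_pixels, axis=1).astype(np.int32)
    left_eye = np.clip(left + background, 0, 255).astype(np.uint8)
    right_eye = np.clip(right + background, 0, 255).astype(np.uint8)
    return left_eye, right_eye

def make_anaglyph(left_eye, right_eye):
    """Red channel from the left view, green and blue from the right view."""
    return np.stack(
        [left_eye[:, :, 0], right_eye[:, :, 1], right_eye[:, :, 2]], axis=2
    )

# Synthetic 4x8 scene: one white "person" column on a black background.
person = np.zeros((4, 8, 3), dtype=np.uint8)
person[:, 4] = 255
background = np.zeros_like(person)

left_eye, right_eye = make_stereo_views(person, background, shift_pixels=1)
anaglyph = make_anaglyph(left_eye, right_eye)
```

In this toy scene the person column splits into a red fringe on one side and a cyan fringe on the other, which is the offset that red-cyan glasses fuse into depth.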

requirements.txt ADDED

	@@ -0,0 +1,8 @@

+transformers
+torch
+Pillow
+datasets
+opencv-python
+gradio
+numpy
+scikit-image
+ImageHash

src/anaglyphGenerator.py ADDED

	@@ -0,0 +1,39 @@

+import numpy as np
+from PIL import Image
+from utils import load_model, segment_person
+def create_anaglyph(person_img_path, background_img_path, output_path="output_anaglyph.png"):
+    image = Image.open(person_img_path).convert("RGB")
+    background = Image.open(background_img_path).convert("RGB").resize(image.size)
+    processor, model = load_model()
+    mask = segment_person(image, processor, model)
+    image_np = np.array(image)
+    background_np = np.array(background)
+    # mask is an H x W x 3 float array in [0, 1], so these are soft blends
+    person_only = image_np * mask
+    background_only = background_np * (1 - mask)
+    # Stereoscopic shift: move the person in opposite directions for each eye.
+    # Note: np.roll wraps pixels around the image edge.
+    shift_pixels = 10
+    person_left = np.roll(person_only, shift=-shift_pixels, axis=1)
+    person_right = np.roll(person_only, shift=shift_pixels, axis=1)
+    left_eye = np.clip(person_left + background_only, 0, 255).astype(np.uint8)
+    right_eye = np.clip(person_right + background_only, 0, 255).astype(np.uint8)
+    # Merge into red-cyan anaglyph
+    anaglyph = np.stack([
+        left_eye[:, :, 0],
+        right_eye[:, :, 1],
+        right_eye[:, :, 2]
+    ], axis=2)
+    anaglyph_img = Image.fromarray(anaglyph.astype(np.uint8))
+    anaglyph_img.save(output_path)
+    print(f"✅ Anaglyph image saved to: {output_path}")
+if __name__ == "__main__":
+    create_anaglyph("person.png", "bg.png")
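
Reviewer note: `np.roll` is circular, so pixels pushed past one image edge re-enter at the opposite edge. If that is undesirable for subjects near the border, one option is a zero-filled shift. The sketch below is an assumption about how that could look; `shift_no_wrap` is a hypothetical helper and is not part of this commit.

```python
import numpy as np

def shift_no_wrap(layer, shift_pixels):
    """Shift an H x W x C array horizontally, filling vacated columns with zeros."""
    shifted = np.zeros_like(layer)
    if shift_pixels > 0:
        # Move content to the right; the leftmost columns stay zero.
        shifted[:, shift_pixels:] = layer[:, :-shift_pixels]
    elif shift_pixels < 0:
        # Move content to the left; the rightmost columns stay zero.
        shifted[:, :shift_pixels] = layer[:, -shift_pixels:]
    else:
        shifted[:] = layer
    return shifted

# One-row example with pixel values 1..5.
row = np.arange(1, 6, dtype=np.uint8).reshape(1, 5, 1)
rolled = np.roll(row, shift=2, axis=1)   # circular shift
shifted = shift_no_wrap(row, 2)          # zero-filled shift
```

With `np.roll` the values 4 and 5 reappear at the left edge, while the zero-filled variant leaves those columns empty so the background can show through.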

src/app.py ADDED

	@@ -0,0 +1,88 @@

+import gradio as gr
+import numpy as np
+from PIL import Image
+from utils import load_model, segment_person, resize_image, split_stereo_image
+# Load model and processor once
+processor, model = load_model()
+# Default background (solid color)
+default_bg = Image.new("RGB", (512, 512), color=(95, 147, 89))
+def generate_3d_outputs(person_img, background_img=None, shift_pixels=10, person_size=100):
+    # Sliders may deliver floats; np.roll needs an integer shift
+    shift_pixels = int(shift_pixels)
+    # Scale the person within a canvas of the original size
+    image = resize_image(person_img, person_size)
+    background_img = background_img if background_img is not None else default_bg
+    # Split the background into left and right halves if it is a stereo image.
+    # split_stereo_image accepts PIL images (default_bg) and NumPy arrays (uploads).
+    leftBackground, rightBackground = split_stereo_image(background_img)
+    # Resize the person image to match the background dimensions
+    image = image.resize((leftBackground.shape[1], leftBackground.shape[0]))
+    # Step 1: Segment person
+    mask = segment_person(image, processor, model)
+    image_np = np.array(image)
+    leftBackground_np = np.array(leftBackground)
+    rightBackground_np = np.array(rightBackground)
+    person_only = image_np * mask
+    leftBackground_only = leftBackground_np * (1 - mask)
+    rightBackground_only = rightBackground_np * (1 - mask)
+    # Step 2: Create stereo pair
+    person_left = np.roll(person_only, shift=-shift_pixels, axis=1)
+    person_right = np.roll(person_only, shift=shift_pixels, axis=1)
+    # The left eye gets the right-shifted person and vice versa (crossed disparity),
+    # which places the person in front of the background plane.
+    left_eye = np.clip(person_right + leftBackground_only, 0, 255).astype(np.uint8)
+    right_eye = np.clip(person_left + rightBackground_only, 0, 255).astype(np.uint8)
+    person_segmentation = np.clip(person_only, 0, 255).astype(np.uint8)
+    # --- Combine left and right images side by side ---
+    stereo_pair = np.concatenate([left_eye, right_eye], axis=1)
+    stereo_image = Image.fromarray(stereo_pair)
+    # Step 3: Create anaglyph
+    anaglyph = np.stack([
+        left_eye[:, :, 0],  # Red from left
+        right_eye[:, :, 1],  # Green from right
+        right_eye[:, :, 2]   # Blue from right
+    ], axis=2)
+    anaglyph_img = Image.fromarray(anaglyph.astype(np.uint8))
+    return person_segmentation, stereo_image, anaglyph_img
+# Gradio Interface
+demo = gr.Interface(
+    fn=generate_3d_outputs,
+    inputs=[
+        gr.Image(label="Person Image"),
+        gr.Image(label="Optional Background Image"),
+        gr.Slider(minimum=0, maximum=10, step=1, value=10, label="Interaxial Distance (pixels)"),
+        gr.Slider(minimum=10, maximum=200, step=10, value=100, label="Person Size %"),
+    ],
+    outputs=[
+        gr.Image(label="Segmentation Mask"),
+        gr.Image(label="Stereo Pair"),
+        gr.Image(label="3D Anaglyph Image")
+    ],
+    title="3D Person Segmentation Viewer",
+    description="Upload a person photo and optionally a background image. Outputs anaglyph and stereo views."
+)
+if __name__ == "__main__":
+    demo.launch()
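
Reviewer note: in `generate_3d_outputs` the person layer is shifted, but the background is masked with the unshifted mask, so the hole cut into the background no longer lines up with the person. One possible alternative, sketched below on synthetic data, shifts the mask together with the person before compositing. `composite_eye` is a hypothetical helper; the commit does not do this.

```python
import numpy as np

def composite_eye(image, background, mask, shift_pixels):
    """Alpha-composite a horizontally shifted person over the background.

    image, background: H x W x 3 uint8 arrays.
    mask: H x W x 3 float array in [0, 1] (1 = person).
    """
    # Shift the person and its mask by the same amount so they stay aligned.
    shifted_person = np.roll(image * mask, shift=shift_pixels, axis=1)
    shifted_mask = np.roll(mask, shift=shift_pixels, axis=1)
    eye = shifted_person + background * (1.0 - shifted_mask)
    return np.clip(eye, 0, 255).astype(np.uint8)

# Synthetic example: the person occupies column 2 of a 2x6 image.
image = np.full((2, 6, 3), 200, dtype=np.uint8)
background = np.full((2, 6, 3), 50, dtype=np.uint8)
mask = np.zeros((2, 6, 3), dtype=np.float32)
mask[:, 2] = 1.0

left_eye = composite_eye(image, background, mask, shift_pixels=1)
```

Here the person lands in column 3 and column 2 shows plain background instead of a dark gap.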

src/testing.py ADDED

	@@ -0,0 +1,4 @@

+from anaglyphGenerator import create_anaglyph
+# Provide paths to your test images
+create_anaglyph("person.png", "bg.png", "test_anaglyph.png")

src/utils.py ADDED

	@@ -0,0 +1,96 @@

+import torch
+import numpy as np
+from PIL import Image
+import cv2
+from transformers import AutoImageProcessor, SegformerForSemanticSegmentation
+from imagehash import average_hash
+def load_model():
+    processor = AutoImageProcessor.from_pretrained("nvidia/segformer-b0-finetuned-ade-512-512")
+    model = SegformerForSemanticSegmentation.from_pretrained("nvidia/segformer-b0-finetuned-ade-512-512")
+    return processor, model
+def segment_person(image: Image.Image, processor, model):
+    inputs = processor(images=image, return_tensors="pt")
+    with torch.no_grad():
+        outputs = model(**inputs)
+    logits = outputs.logits
+    upsampled_logits = torch.nn.functional.interpolate(
+        logits,
+        size=image.size[::-1],
+        mode="bilinear",
+        align_corners=False,
+    )
+    pred_classes = upsampled_logits.argmax(dim=1)[0].cpu().numpy()
+    mask = (pred_classes == 12).astype(np.uint8) * 255  # Class 12 = person
+    # Clean mask
+    kernel = np.ones((7, 7), np.uint8)
+    eroded_mask = cv2.erode(mask, kernel, iterations=1)
+    blurred_mask = cv2.GaussianBlur(eroded_mask, (3, 3), sigmaX=0, sigmaY=0)
+    final_mask = blurred_mask.astype(np.float32) / 255.0
+    final_mask_3ch = np.stack([final_mask]*3, axis=-1)
+    return final_mask_3ch
+def resize_image(image, size_percent):
+    # Convert the NumPy array to an RGB PIL image (drops any alpha channel)
+    image = Image.fromarray(image).convert("RGB")
+    width, height = image.size
+    new_width = int(width * size_percent / 100)
+    new_height = int(height * size_percent / 100)
+    # Create a black canvas with the original dimensions
+    resized_image = Image.new('RGB', (width, height), (0, 0, 0))
+    # Scale the original image
+    scaled_content = image.resize((new_width, new_height))
+    # Calculate the position that centers the scaled content
+    x = (width - new_width) // 2
+    y = (height - new_height) // 2
+    # Paste the scaled content; anything larger than the canvas is cropped
+    resized_image.paste(scaled_content, (x, y))
+    return resized_image
+# Check if two images are similar
+def check_image_similarity(image1, image2):
+    hash1 = average_hash(Image.fromarray(image1))
+    hash2 = average_hash(Image.fromarray(image2))
+    return hash1 - hash2 < 10
+def split_stereo_image(image):
+    """
+    Splits an image into left and right halves for stereographic viewing.
+    Args:
+        image: PIL Image or numpy array
+    Returns:
+        tuple: (left_half, right_half) as numpy arrays
+    """
+    # Convert to numpy array if PIL Image
+    if isinstance(image, Image.Image):
+        image = np.array(image)
+    # Get width and calculate split point
+    width = image.shape[1]
+    split_point = width // 2
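+    # Split into equal-width halves (the last column is dropped for odd widths)
+    left_half = image[:, :split_point]
+    right_half = image[:, split_point:split_point * 2]
+    # If a side-by-side stereo image was provided, return its two halves
+    if check_image_similarity(left_half, right_half):
+        return left_half, right_half
+    # Otherwise use the full image for the left view and a 99% scaled copy for the right
+    return image, np.array(resize_image(image, 99))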
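
Reviewer note: `check_image_similarity` treats the two halves as a stereo pair when their `imagehash.average_hash` values differ in fewer than 10 bits. To show the idea behind that test, here is a simplified NumPy-only stand-in. It downsamples with block means, which differs from the library's resizing, so its hash values will not match `imagehash` exactly; `simple_average_hash` and `hamming_distance` are hypothetical names.

```python
import numpy as np

def simple_average_hash(gray, hash_size=8):
    """Return a hash_size x hash_size boolean grid (block mean > overall mean).

    gray: 2-D array whose height and width are both at least hash_size.
    """
    h, w = gray.shape
    # Crop so both dimensions divide evenly into hash_size blocks.
    h_crop, w_crop = h - h % hash_size, w - w % hash_size
    blocks = gray[:h_crop, :w_crop].astype(np.float64).reshape(
        hash_size, h_crop // hash_size, hash_size, w_crop // hash_size
    )
    small = blocks.mean(axis=(1, 3))
    return small > small.mean()

def hamming_distance(hash_a, hash_b):
    """Number of differing bits between two boolean hash grids."""
    return int(np.count_nonzero(hash_a != hash_b))

# Synthetic side-by-side frame whose right half duplicates the left half.
rng = np.random.default_rng(0)
left = rng.integers(0, 256, size=(16, 16)).astype(np.uint8)
frame = np.concatenate([left, left], axis=1)
split = frame.shape[1] // 2
distance = hamming_distance(
    simple_average_hash(frame[:, :split]),
    simple_average_hash(frame[:, split:]),
)

# A horizontal ramp and its mirror image differ in every bit.
ramp = np.tile(np.arange(16), (16, 1))
mirror_distance = hamming_distance(
    simple_average_hash(ramp), simple_average_hash(ramp[:, ::-1])
)
```

Identical halves give a distance of zero, which is the case the threshold in `check_image_similarity` is meant to catch; the mirrored ramp shows the opposite extreme.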