instructions, such as: "Show bounding boxes of all green objects in this image". It also supports custom labels, such as "label the items with the allergens they can contain".

For more examples, see the following notebooks in the Gemini Cookbook: the 2D spatial understanding notebook and the Experimental 3D pointing notebook.

Segmentation

Starting with Gemini 2.5, models not only detect items but also segment them and provide their contour masks.

The model predicts a JSON list, where each item represents a segmentation mask. Each item has three parts:

* A bounding box ("box_2d") in the format [y0, x0, y1, x1], with coordinates normalized to the range 0 to 1000.
* A label ("label") that identifies the object.
* The segmentation mask inside the bounding box. This is a base64-encoded PNG containing a probability map with values from 0 to 255.

To use a mask, resize it to match the bounding box dimensions, then binarize it at your confidence threshold (127 for the midpoint).

Note: For better results, disable thinking by setting the thinking budget to 0. See the code sample below for an example.

Python

from google import genai
from google.genai import types
from PIL import Image, ImageDraw
import io
import base64
import json
import numpy as np
import os

client = genai.Client()

def parse_json(json_output: str):
    # Strip the markdown fencing that often wraps the model's JSON output
    lines = json_output.splitlines()
    for i, line in enumerate(lines):
        if line == "```json":
            json_output = "\n".join(lines[i + 1:])  # Remove everything before "```json"
            json_output = json_output.split("```")[0]  # Remove everything after the closing "```"
            break  # Exit the loop once "```json" is found
    return json_output

def extract_segmentation_masks(image_path: str, output_dir: str = "segmentation_outputs"):
    # Load the image and downscale it in place (aspect ratio preserved) so neither side exceeds 1024 px
    im = Image.open(image_path)
    im.thumbnail((1024, 1024), Image.Resampling.LANCZOS)

    prompt = """
    Give the segmentation masks for the wooden and glass items.
    Output a JSON list of segmentation masks