How to generate accurate segmentation masks using object detection and Meta SAM2
Learn how to generate segmentation masks with object detection and SAM2 models for advanced image processing tasks.
Segmentation masks identify objects at the pixel level, which makes them vital for precise object tracking and analysis. By pairing a fine-tuned Ultralytics YOLO11 detection model with the Segment Anything 2 (SAM2) model, you can turn bounding-box detections into detailed masks without tracing each object outline by hand.
Hardware and software setup for this project
- CPU: Intel® Core™ i5-10400 CPU @ 2.90GHz for efficient processing.
- GPU: NVIDIA RTX 3050 for real-time tasks.
- RAM and Storage: 64 GB RAM and 1 TB hard disk for seamless performance.
- Model: Fine-tuned YOLO11 model for object detection.
- Dataset: Custom annotated dataset for maximum accuracy.
How to generate segmentation masks
Step 1: Prepare the model
Train or fine-tune a custom YOLO11 model, or use one of the Ultralytics pretrained models for object detection tasks.
Step 2: Auto annotation with SAM2
Integrate the SAM2 model to convert bounding boxes into segmentation masks.
# Install the necessary library
# pip install ultralytics
from ultralytics.data.annotator import auto_annotate

# Automatically annotate images using YOLO and SAM2 models
auto_annotate(
    data="Path/to/images/directory",
    det_model="yolo11n.pt",
    sam_model="sam2_b.pt",
)
Step 3: Generate and save masks
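Run the script to save the segmentation masks as .txt files, one per image, in the images_auto_annotate_labels folder (by default the folder takes the name of your images directory plus the _auto_annotate_labels suffix). Each line in a file describes one object: a class ID followed by the normalized x y coordinates of its mask polygon.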
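To make the label format concrete, here is a minimal sketch of reading one label line back into pixel coordinates. It assumes each line holds a class ID followed by normalized x y pairs, as the visualization script in Step 4 expects. The function name parse_label_line and the sample line are hypothetical and are not part of the Ultralytics API.

```python
def parse_label_line(line, width, height):
    """Return (class_id, [(x_px, y_px), ...]) for one label line.

    Assumes the layout "class_id x1 y1 x2 y2 ..." with coordinates
    normalized to the 0-1 range.
    """
    fields = line.split()
    # A polygon needs a class ID plus at least three complete x y pairs
    if len(fields) < 7 or len(fields) % 2 == 0:
        raise ValueError("expected a class ID followed by at least three x y pairs")

    class_id = int(fields[0])
    coords = [float(value) for value in fields[1:]]

    # Scale normalized coordinates to pixels and truncate to whole pixels
    points = [
        (int(x * width), int(y * height))
        for x, y in zip(coords[0::2], coords[1::2])
    ]
    return class_id, points


# Hypothetical label line: class 0 with a three-point polygon
class_id, polygon = parse_label_line("0 0.25 0.5 0.75 0.5 0.5 1.0", width=640, height=480)
print(class_id, polygon)  # 0 [(160, 240), (480, 240), (320, 480)]
```

Rejecting malformed lines up front is a design choice for this sketch; a batch script might prefer to skip them and continue.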
Step 4: Visualize the results
Use the following script to overlay segmentation masks on images.
import os

import cv2
import numpy as np
from ultralytics.utils.plotting import colors

# Define folder paths
image_folder = "images_directory"  # Path to your images directory
mask_folder = "images_auto_annotate_labels"  # Annotation masks directory
output_folder = "output_directory"  # Path to save output images
os.makedirs(output_folder, exist_ok=True)

# Process each image
for image_file in os.listdir(image_folder):
    image_path = os.path.join(image_folder, image_file)
    mask_file = os.path.join(mask_folder, os.path.splitext(image_file)[0] + ".txt")

    img = cv2.imread(image_path)  # Load the image
    # Skip files that are not readable images or that have no mask file
    if img is None or not os.path.exists(mask_file):
        continue
    height, width = img.shape[:2]

    with open(mask_file, "r") as f:  # Read the mask file
        lines = f.readlines()

    for line in lines:
        data = line.strip().split()
        if len(data) < 7:  # Need a class ID plus at least three (x, y) points
            continue
        color = colors(int(data[0]), True)

        # Convert normalized points to absolute pixel coordinates
        points = np.array(
            [(float(data[i]) * width, float(data[i + 1]) * height) for i in range(1, len(data) - 1, 2)],
            dtype=np.int32,
        ).reshape((-1, 1, 2))

        # Blend a filled polygon into the image, then draw its outline
        overlay = img.copy()
        cv2.fillPoly(overlay, [points], color=color)
        alpha = 0.6
        cv2.addWeighted(overlay, alpha, img, 1 - alpha, 0, img)
        cv2.polylines(img, [points], isClosed=True, color=color, thickness=3)

    # Save the output
    output_path = os.path.join(output_folder, image_file)
    cv2.imwrite(output_path, img)
    print(f"Processed {image_file} and saved to {output_path}")

print("Processing complete.")
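That's it! After completing Step 4, each output image shows the segmented objects as filled, outlined masks colored by class. Because every line in a label file describes one object, counting the lines per class ID gives the total count for each segmented object in every frame.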
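The per-frame count can be tallied directly from the label files. The sketch below assumes one .txt file per image with one object per line and the class ID as the first field. The helper name count_objects is hypothetical, and the folder name is the one used earlier in this guide.

```python
from collections import Counter
from pathlib import Path


def count_objects(label_dir):
    """Map each label file name to a Counter of {class_id: object_count}."""
    counts = {}
    for label_file in sorted(Path(label_dir).glob("*.txt")):
        per_class = Counter()
        for line in label_file.read_text().splitlines():
            fields = line.split()
            if fields:  # Ignore blank lines
                per_class[int(fields[0])] += 1
        counts[label_file.name] = per_class
    return counts


if __name__ == "__main__":
    # Folder name taken from the steps above; adjust it to your own setup
    for name, per_class in count_objects("images_auto_annotate_labels").items():
        print(name, dict(per_class))
```

The counts are keyed by numeric class ID; mapping IDs to class names would require the names from your detection model, which this sketch does not load.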
Real-world applications
- Medical Imaging: Segment organs and anomalies in scans for diagnostics.
- Retail Analytics: Detect and segment customer activities or products.
- Robotics: Enable robots to identify objects in dynamic environments.
- Satellite Imagery: Analyze vegetation and urban areas for planning.
Explore more
Start building your object segmentation workflow today!🚀