Image Segmentation with SAM 2 - The Ultimate Guide 📷


Introduction

In this comprehensive guide, we will explore how to implement image segmentation using Segment Anything Model (SAM) version 2. This cutting-edge technology allows us to segment any object within an image without the need for extensive training data or complicated configurations. By the end of this tutorial, you’ll have a robust pipeline for performing precise and efficient image segmentation tasks.

Prerequisites

  • Python 3.10+
  • torch version >=2.0.1
  • segment-anything (the official segment_anything package from Meta's GitHub repository)
  • numpy
  • matplotlib
  • opencv-python


Install the required packages with:

pip install "torch>=2.0.1" numpy matplotlib opencv-python
pip install git+https://github.com/facebookresearch/segment-anything.git

Step 1: Project Setup

First, create a new directory for your project and initialize it as a Python package if necessary.

Inside this directory, set up the environment by creating the following files:

  • requirements.txt: List of required packages.
  • setup.py (if you plan to distribute your code).

Add the following content to your requirements.txt:

torch>=2.0.1
git+https://github.com/facebookresearch/segment-anything.git
numpy
matplotlib
opencv-python

Then, in your terminal or command prompt, navigate to your project directory and run:

pip install -r requirements.txt
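You will also need the model weights referenced later in the code. Assuming you use the default ViT-H variant, the checkpoint can be fetched from Meta's official release URL (the filename matches the one used in Step 2):

```shell
# Download the official SAM ViT-H checkpoint (~2.4 GB) from Meta's release.
curl -L -o sam_vit_h_4b8939.pth \
  https://dl.fbaipublicfiles.com/segment_anything/sam_vit_h_4b8939.pth
```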

Step 2: Core Implementation

The core of our implementation involves integrating SAM into a basic image segmentation pipeline. Below is a complete example:

import numpy as np
import torch
import matplotlib.pyplot as plt
import cv2
from segment_anything import SamPredictor, sam_model_registry

def show_mask(mask, ax, random_color=False):
    """Overlay a binary mask on the given axes as a translucent RGBA image."""
    if random_color:
        color = np.concatenate([np.random.random(3), np.array([0.6])], axis=0)
    else:
        color = np.array([251/255, 49/255, 47/255, 0.6])

    h, w = mask.shape[-2:]
    mask_image = mask.reshape(h, w, 1) * color.reshape(1, 1, -1)
    ax.imshow(mask_image)

def main_function():
    # Define the model type and checkpoint path
    model_type = "vit_h"
    checkpoint_path = "path/to/sam_vit_h_4b8939.pth"

    device = torch.device("cuda" if torch.cuda.is_available() else "cpu")

    # Build SAM from the checkpoint and wrap it in a predictor
    sam = sam_model_registry[model_type](checkpoint=checkpoint_path)
    sam.to(device)
    predictor = SamPredictor(sam)

    # Load an image and convert it to RGB (set_image expects RGB by default)
    image_path = 'path/to/image.jpg'
    image = cv2.cvtColor(cv2.imread(image_path), cv2.COLOR_BGR2RGB)

    # Compute the image embedding once; prompts can then be evaluated cheaply
    predictor.set_image(image)

    # A single foreground point prompt: (x, y) pixel coordinates plus one
    # label per point (1 = foreground, 0 = background). Labels are required
    # whenever point coordinates are supplied.
    point_coords = np.array([[500, 375]])  # Example coordinate for segmentation
    point_labels = np.array([1])

    masks, scores, logits = predictor.predict(
        point_coords=point_coords,
        point_labels=point_labels,
    )

    # Visualize the mask over the original image
    plt.figure(figsize=(10, 10))
    plt.imshow(image)
    show_mask(masks[0], plt.gca(), random_color=True)
    plt.axis("off")
    plt.show()

main_function()
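The overlay arithmetic in show_mask is plain NumPy broadcasting: an (h, w, 1) binary mask multiplied by a (1, 1, 4) RGBA color yields an (h, w, 4) image that is fully transparent wherever the mask is zero. A minimal standalone sketch:

```python
import numpy as np

# A tiny 2x2 binary mask and an RGBA color (the 4th channel is alpha).
mask = np.array([[1, 0],
                 [0, 1]], dtype=float)
color = np.array([1.0, 0.0, 0.0, 0.6])  # red at 60% opacity

h, w = mask.shape
# Broadcasting (h, w, 1) * (1, 1, 4) gives an (h, w, 4) RGBA overlay that
# is fully transparent wherever the mask is zero.
overlay = mask.reshape(h, w, 1) * color.reshape(1, 1, 4)

print(overlay.shape)  # (2, 2, 4)
```

Because the alpha channel is scaled along with the color, ax.imshow composites the colored region over whatever was drawn before it, which is why show_mask can simply be called after plt.imshow(image).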

Step 3: Configuration

Customize the model_type and checkpoint_path variables according to your SAM model version. Additionally, adjust the path to your input image and point coordinates for segmentation.

# Example customization
model_type = "vit_b"  # Use 'vit_h', 'vit_l', or 'vit_b' to match your checkpoint
checkpoint_path = "path/to/sam_vit_b_01ec64.pth"
image_path = 'your/path/to/image.jpg'
point_coords = np.array([[750, 325]])  # Adjust to a point on your target object
point_labels = np.array([1])           # 1 = foreground, 0 = background
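Prompts are not limited to a single point: predict accepts an (N, 2) array of (x, y) coordinates together with an (N,) array of labels, and background points (label 0) can be used to carve regions out of the mask. A sketch with placeholder coordinates (swap in values for your own image):

```python
import numpy as np

# Two prompts: one point on the target object (label 1) and one on a
# region that should be excluded from the mask (label 0).
point_coords = np.array([[750, 325],   # foreground: on the object
                         [100, 100]])  # background: exclude this region
point_labels = np.array([1, 0])

print(point_coords.shape, point_labels.shape)  # (2, 2) (2,)

# These arrays would then be passed as:
# masks, scores, logits = predictor.predict(
#     point_coords=point_coords, point_labels=point_labels)
```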

Step 4: Running the Code

To execute your segmentation script:

python main.py
# Expected output:
# A matplotlib window with the input image and overlaid mask.

Make sure to replace main.py with the actual filename of your Python script.

Step 5: Advanced Tips

For optimal performance, consider preprocessing images to standard sizes before feeding them into SAM. Additionally, experiment with different point prompts and masks for more accurate segmentation results. Lastly, integrate this model into web applications or larger pipelines by utilizing Flask or Django frameworks for serving image segmentation services.
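The resizing tip can be made concrete without any imaging library: SAM internally rescales inputs so the longest side is 1024 pixels, so downscaling much larger images to that scale beforehand saves memory without losing detail the model would use. A hypothetical helper (scaled_size is not part of SAM; the 1024 default mirrors SAM's internal resolution) that computes the target dimensions:

```python
def scaled_size(width: int, height: int, longest: int = 1024) -> tuple[int, int]:
    """Return (w, h) scaled so the longest side equals `longest`,
    preserving aspect ratio. Smaller images are left unchanged."""
    if max(width, height) <= longest:
        return width, height
    scale = longest / max(width, height)
    return round(width * scale), round(height * scale)

print(scaled_size(4032, 3024))  # (1024, 768)
print(scaled_size(800, 600))    # (800, 600) -- already small enough
```

The resulting dimensions can be fed to cv2.resize (or PIL's Image.resize) before calling predictor.set_image.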

Results

Upon completing the tutorial, you should be able to load an image, run the Segment Anything Model on it, and visualize the output mask overlaid onto the original image. The generated segmentations will help in various computer vision tasks such as object detection, image editing, or automated content recognition.

Going Further

  • Explore SAM’s GitHub repository for updates and further documentation: https://github.com/facebookresearch/segment-anything
  • Use SamAutomaticMaskGenerator class to automatically generate masks without manual points input.
  • Integrate SAM with other ML models in PyTorch [2] [1] for more complex computer vision tasks, such as multi-label image classification.
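As a sketch of the automatic-mask option above, SamAutomaticMaskGenerator wraps the same model and proposes masks over the whole image without any point prompts (paths below are placeholders; this needs a downloaded checkpoint to actually run):

```python
import cv2
from segment_anything import SamAutomaticMaskGenerator, sam_model_registry

# Build the model as in Step 2; the checkpoint path is a placeholder.
sam = sam_model_registry["vit_h"](checkpoint="path/to/sam_vit_h_4b8939.pth")
mask_generator = SamAutomaticMaskGenerator(sam)

# Load an RGB image, then generate masks for every detected region.
image = cv2.cvtColor(cv2.imread("path/to/image.jpg"), cv2.COLOR_BGR2RGB)
masks = mask_generator.generate(image)

# Each result is a dict with the binary 'segmentation' mask plus metadata
# such as 'area', 'bbox', and 'stability_score' useful for filtering.
print(len(masks), "masks found")
```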

Conclusion

In this guide, we’ve learned how to set up and use the Segment Anything Model (SAM) version 2 for precise image segmentation. The steps covered here provide a strong foundation for developers looking to leverage state-of-the-art models in their projects or applications.


📚 References & Sources

  1. PyTorch documentation. Accessed 2026-01-05.
  2. PyTorch documentation. Accessed 2026-01-05.

All links verified at time of publication.