Image Segmentation with SAM 2 - The Ultimate Guide

Introduction
In this comprehensive guide, we will explore how to implement image segmentation using Segment Anything Model (SAM) version 2. This cutting-edge technology allows us to segment any object within an image without the need for extensive training data or complicated configurations. By the end of this tutorial, you’ll have a robust pipeline for performing precise and efficient image segmentation tasks.
Prerequisites
- Python 3.10+
- torch >= 2.0.1
- segment-anything (the official SAM package, installed from GitHub)
- matplotlib
- opencv-python
Install the required packages with:
pip install torch==2.0.1 matplotlib opencv-python
pip install git+https://github.com/facebookresearch/segment-anything.git
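To confirm the installation, a quick sanity check from Python (a minimal test; the exact versions printed will depend on your environment):

import torch
import cv2

# Print library versions and whether a CUDA-capable GPU is visible
print("torch:", torch.__version__)
print("opencv:", cv2.__version__)
print("CUDA available:", torch.cuda.is_available())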
Step 1: Project Setup
First, create a new directory for your project and initialize it as a Python package if necessary.
Inside this directory, set up the environment by creating the following files:
- requirements.txt: list of required packages.
- setup.py (only if you plan to distribute your code; a minimal sketch appears at the end of this step).
Add the following content to your requirements.txt:
torch>=2.0.1
git+https://github.com/facebookresearch/segment-anything.git
matplotlib
opencv-python
Then, in your terminal or command prompt, navigate to your project directory and run:
pip install -r requirements.txt
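If you chose to add a setup.py, a minimal sketch might look like this (the package name and version below are placeholders, not part of this tutorial's pipeline):

from setuptools import setup, find_packages

# Minimal packaging stub; the name and version are placeholders
setup(
    name="sam-segmentation-demo",
    version="0.1.0",
    packages=find_packages(),
    install_requires=[
        "torch>=2.0.1",
        "matplotlib",
        "opencv-python",
        # segment-anything is installed separately from its GitHub repository
    ],
)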
Step 2: Core Implementation
The core of our implementation involves integrating SAM into a basic image segmentation pipeline. Below is a complete example:
import cv2
import numpy as np
import matplotlib.pyplot as plt
import torch
from segment_anything import SamPredictor, sam_model_registry


def show_mask(mask, ax, random_color=False):
    """Overlay a binary mask on the axes as a semi-transparent RGBA image."""
    if random_color:
        color = np.concatenate([np.random.random(3), np.array([0.6])], axis=0)
    else:
        color = np.array([251 / 255, 49 / 255, 47 / 255, 0.6])
    h, w = mask.shape[-2:]
    mask_image = mask.reshape(h, w, 1) * color.reshape(1, 1, -1)
    ax.imshow(mask_image)


def main_function():
    # Define the model type and checkpoint path
    model_type = "vit_h"
    checkpoint_path = "path/to/sam_vit_h_4b8939.pth"
    device = torch.device("cuda" if torch.cuda.is_available() else "cpu")

    # Build the SAM model from the registry and move it to the device
    sam = sam_model_registry[model_type](checkpoint=checkpoint_path)
    sam.to(device)
    predictor = SamPredictor(sam)

    # Load an image from file; SAM expects an RGB array, while OpenCV loads BGR
    image_path = "path/to/image.jpg"
    image = cv2.cvtColor(cv2.imread(image_path), cv2.COLOR_BGR2RGB)

    # Set the image in the predictor (this computes the image embedding once)
    predictor.set_image(image)

    # Prompt SAM with a single point; label 1 marks the point as foreground
    point_coords = np.array([[500, 375]])  # example (x, y) pixel coordinate
    point_labels = np.array([1])
    masks, scores, logits = predictor.predict(
        point_coords=point_coords,
        point_labels=point_labels,
        multimask_output=True,
    )

    # Visualize the highest-scoring mask over the original image
    plt.figure(figsize=(10, 10))
    plt.imshow(image)
    show_mask(masks[np.argmax(scores)], plt.gca(), random_color=True)
    plt.axis("off")
    plt.show()


if __name__ == "__main__":
    main_function()
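Point prompts are not the only option: SamPredictor also accepts a bounding-box prompt. A minimal variant, assuming the predictor and numpy import from the example above with an image already set (the box coordinates are illustrative):

# Prompt with a bounding box in XYXY pixel coordinates instead of a point
box = np.array([425, 600, 700, 875])  # illustrative box; adjust to your object
masks, scores, _ = predictor.predict(
    box=box,
    multimask_output=False,  # a tight box usually identifies the object unambiguously
)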
Step 3: Configuration
Customize the model_type and checkpoint_path variables according to your SAM model version. Additionally, adjust the path to your input image and point coordinates for segmentation.
# Example customization
model_type = "vit_b"  # Change to "vit_h" or another SAM variant to match your checkpoint
checkpoint_path = "path/to/sam_vit_b_01ec64.pth"
image_path = "your/path/to/image.jpg"
point_coords = np.array([[750, 325]])  # Adjust the (x, y) point to sit on your target object
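If you switch between checkpoints frequently, one convenient pattern (a hypothetical helper, not part of the SAM API) is to key the official checkpoint filenames by model type:

# Hypothetical convenience mapping; the filenames are the official SAM checkpoint names
SAM_CHECKPOINTS = {
    "vit_h": "sam_vit_h_4b8939.pth",
    "vit_l": "sam_vit_l_0b3195.pth",
    "vit_b": "sam_vit_b_01ec64.pth",
}

model_type = "vit_b"
checkpoint_path = f"path/to/{SAM_CHECKPOINTS[model_type]}"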
Step 4: Running the Code
To execute your segmentation script:
python main.py
# Expected output:
# A matplotlib window with the input image and overlaid mask.
Make sure to replace main.py with the actual filename of your Python script.
Step 5: Advanced Tips
For optimal performance, consider downscaling very large images to a standard size before feeding them into SAM; a sketch of this preprocessing step follows below. Additionally, experiment with different point prompts, point labels, and boxes to improve segmentation quality. Finally, you can integrate the model into web applications or larger pipelines by serving it behind a framework such as Flask or Django.
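The helper below is one way to do that (a hypothetical function using OpenCV; the 1024-pixel target matches the resolution SAM uses internally, so larger inputs mostly cost memory):

import cv2

def resize_longest_side(image, target=1024):
    """Downscale so the longest side is at most `target` pixels; never upscale."""
    h, w = image.shape[:2]
    scale = target / max(h, w)
    if scale >= 1.0:
        return image
    new_size = (int(w * scale), int(h * scale))  # cv2.resize expects (width, height)
    return cv2.resize(image, new_size, interpolation=cv2.INTER_AREA)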
Results
Upon completing the tutorial, you should be able to load an image, run the Segment Anything Model on it, and visualize the output mask overlaid onto the original image. The generated segmentations will help in various computer vision tasks such as object detection, image editing, or automated content recognition.
Going Further
- Explore SAM’s GitHub repository for updates and further documentation: https://github.com/facebookresearch/segment-anything
- Use the SamAutomaticMaskGenerator class to generate masks automatically, without manual point prompts (see the sketch below).
- Integrate SAM with other ML models in PyTorch [1] for more complex computer vision tasks, such as multi-label image classification.
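A minimal sketch of automatic mask generation, reusing the sam model and RGB image from Step 2 (default parameters; options such as points_per_side can be tuned for your data):

from segment_anything import SamAutomaticMaskGenerator

# Generate masks for the whole image without any manual prompts
mask_generator = SamAutomaticMaskGenerator(sam)
masks = mask_generator.generate(image)  # expects the same RGB array used with the predictor

# Each result is a dict with the mask plus metadata such as area and bbox
print(len(masks), "masks found")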
Conclusion
In this guide, we’ve learned how to set up and use the Segment Anything Model (SAM) version 2 for precise image segmentation. The steps covered here provide a strong foundation for developers looking to leverage state-of-the-art models in their projects or applications.
References & Sources
- [1] PyTorch documentation, https://pytorch.org/docs/stable/index.html. Accessed 2026-01-05.
All links verified at time of publication.