🖌️ Generate Images with Stable Diffusion XL on Mac M1/M2 (January 2026)

Introduction

In this comprehensive tutorial, we’ll guide you through setting up and using Stable Diffusion XL (SDXL), a powerful text-to-image diffusion model from Stability AI, on your Mac M1 or M2 device. By the end of this guide, you’ll have a local setup that can generate stunning images from textual descriptions. This is especially useful for creative professionals, researchers, or anyone interested in exploring the capabilities of diffusion models for generating visual content.

Prerequisites

Before we begin, ensure you have the following prerequisites installed:

  1. Python 3.10+: If you haven’t already, install Python using Homebrew:

    brew install python@3.10
    
  2. PyTorch 2.0+ with MPS backend (for Mac M1/M2): To leverage the power of your Mac’s GPU. The default macOS wheels ship with MPS support, so a plain install is all you need (the CUDA wheel indexes are for Linux/Windows GPUs and don’t apply here):

    pip install torch
    
  3. Diffusers 0.19+: The library that provides the Stable Diffusion XL pipeline (SDXL support landed in version 0.19). Transformers and Accelerate come along for the text encoders and efficient model loading:

    pip install "diffusers>=0.19" transformers accelerate
    
  4. Gradio 3.7: For creating a simple UI to interact with our model.

    pip install gradio==3.7
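
Before going further, it’s worth confirming that PyTorch can actually see the MPS device. Here is a minimal check (the helper name `mps_available` is our own, not part of any library):

```python
def mps_available() -> bool:
    """Return True if PyTorch is installed and the Apple MPS backend is usable."""
    try:
        import torch
    except ImportError:
        return False  # PyTorch is not installed yet
    mps = getattr(torch.backends, "mps", None)
    return bool(mps is not None and mps.is_available())

if __name__ == "__main__":
    print("MPS available:", mps_available())
```

If this prints False on an M1/M2 Mac, double-check that you installed the macOS (arm64) build of PyTorch rather than an x86_64 one running under Rosetta.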
    

Step 1: Project Setup

First, let’s set up a new directory for our project and navigate into it:

mkdir stable_diffusion_xl
cd stable_diffusion_xl

Next, create a requirements.txt file to keep track of the packages we’ll use (note that the Python interpreter version itself can’t be pinned from requirements.txt, so it isn’t listed):

torch>=2.0
diffusers>=0.19
transformers
accelerate
gradio==3.7

Now, install the required packages using pip:

pip install -r requirements.txt
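
To confirm the packages actually installed, you can query their versions with the standard library alone (a small sketch; the `installed_version` helper is our own naming):

```python
import importlib.metadata

def installed_version(package: str):
    """Return the installed version string of a package, or None if absent."""
    try:
        return importlib.metadata.version(package)
    except importlib.metadata.PackageNotFoundError:
        return None

if __name__ == "__main__":
    for pkg in ("torch", "diffusers", "transformers", "gradio"):
        print(f"{pkg}: {installed_version(pkg) or 'not installed'}")
</imports></test>```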

Step 2: Core Implementation

Create a new Python file named diffusion.py and add the following code:

import torch
from diffusers import StableDiffusionXLPipeline

class ImageGenerator:
    def __init__(self):
        self.model_id = "stabilityai/stable-diffusion-xl-base-1.0"
        self.pipe = StableDiffusionXLPipeline.from_pretrained(
            self.model_id, torch_dtype=torch.float16, variant="fp16", use_safetensors=True
        ).to("mps")
        # Slice attention computation to reduce peak memory on Apple Silicon
        self.pipe.enable_attention_slicing()

    def generate_image(self, prompt):
        # The pipeline handles tokenization internally; SDXL is trained at 1024x1024
        image = self.pipe(prompt, num_inference_steps=50, guidance_scale=7.5, width=1024, height=1024).images[0]
        return image

def main():
    generator = ImageGenerator()
    prompt = "Astronaut riding a horse on Mars"
    generated_image = generator.generate_image(prompt)
    generated_image.save("generated_image.png")

if __name__ == "__main__":
    main()

This script loads the Stable Diffusion XL base checkpoint from the Hugging Face Hub (the weights, several gigabytes, are downloaded on the first run and then cached), moves the pipeline to the MPS device, generates an image from the prompt using the generate_image method, and saves the result as “generated_image.png”.

Step 3: Configuration

No additional configuration is required for this project. The default parameters provided in diffusion.py should work well for most use cases.

Step 4: Running the Code

Run the script with:

python diffusion.py

After running, you’ll find a generated image named “generated_image.png” in your project directory. You can verify its correctness by opening it using any image viewer.
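
You can also verify the output programmatically without any imaging library by reading the PNG header directly; a PNG’s width and height live in its IHDR chunk. This stdlib-only helper (`png_size` is our own naming) does exactly that:

```python
import struct

def png_size(path: str) -> tuple:
    """Return (width, height) from a PNG file by parsing its IHDR chunk."""
    with open(path, "rb") as f:
        header = f.read(24)
    if header[:8] != b"\x89PNG\r\n\x1a\n":
        raise ValueError("not a PNG file")
    # Bytes 16-24 hold the big-endian width and height of the IHDR chunk
    return struct.unpack(">II", header[16:24])

# Usage: png_size("generated_image.png") returns e.g. (1024, 1024)
```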

If you encounter any errors related to GPU memory, try reducing the width and height passed to the pipeline (SDXL is trained at 1024×1024, so expect some quality loss at lower resolutions):

image = self.pipe(prompt, num_inference_steps=50, guidance_scale=7.5, width=768, height=768).images[0]

Step 5: Advanced Tips

  1. Fine-tuning the model: You can fine-tune Stable Diffusion XL on a custom dataset to adapt it to specific artistic styles or domains; the Diffusers library ships training scripts for this (for example, DreamBooth and LoRA fine-tuning).

  2. Using different models: Explore other open text-to-image checkpoints on the Hugging Face Hub, such as Stable Diffusion 1.5, Stable Diffusion 2.1, or community-trained SDXL variants, by replacing the model ID in ImageGenerator’s constructor. (Closed models like DALL-E 3, Imagen, and Midjourney are not available as downloadable checkpoints.)

Results

Upon completion of this tutorial, you’ll have successfully generated an image using Stable Diffusion XL on your Mac M1/M2 device. The generated image will be saved as “generated_image.png” and should depict the scene described in the prompt (“Astronaut riding a horse on Mars”).

Going Further

Here are some next steps to build upon this project:

  1. Create a Gradio UI: Use Gradio to create an easy-to-use interface for your image generation model: Gradio Quick Start.

  2. Experiment with different prompts and parameters: Try varying the prompt, number of inference steps, guidance scale, and image dimensions to see how they affect the generated images.

  3. Fine-tune the model on custom data: Follow the Diffusers documentation on fine-tuning Stable Diffusion for more advanced use cases.
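
As a sketch of the first item, wrapping a generation callable in Gradio takes only a few lines. The `build_demo` helper and its labels below are our own naming, not part of the Gradio API:

```python
def build_demo(generate_fn):
    """Wrap a prompt -> image callable in a minimal Gradio interface."""
    import gradio as gr  # imported lazily so this module loads without Gradio
    return gr.Interface(
        fn=generate_fn,
        inputs=gr.Textbox(label="Prompt", placeholder="Astronaut riding a horse on Mars"),
        outputs=gr.Image(label="Generated image"),
        title="Stable Diffusion XL",
    )

# Usage (assumes the ImageGenerator class from diffusion.py):
#   demo = build_demo(ImageGenerator().generate_image)
#   demo.launch()
```

Calling launch() starts a local web server, by default at http://127.0.0.1:7860.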

Conclusion

In this tutorial, we successfully set up and used Stable Diffusion XL to generate images locally on a Mac M1/M2 device. We covered installation, setup, implementation, configuration, running the code, and advanced tips to enhance your experience with text-to-image models. Happy image generating! 📸🚀