Overview

Docker containers package your ML application together with its dependencies so it runs consistently across development, CI, and production. This guide covers containerization best practices for ML workloads.

Basic Dockerfile

FROM python:3.11-slim

WORKDIR /app

# Install dependencies first (cached layer)
COPY requirements.txt .
RUN pip install --no-cache-dir -r requirements.txt

# Copy application code
COPY . .

EXPOSE 8000
CMD ["uvicorn", "main:app", "--host", "0.0.0.0", "--port", "8000"]
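
With this Dockerfile in the project root, building and running the container looks like the following (the image tag ml-api is chosen here for illustration, and the CMD assumes a FastAPI-style app object in main.py):

```shell
# Build the image from the Dockerfile in the current directory
docker build -t ml-api .

# Map container port 8000 to the host and run
docker run -p 8000:8000 ml-api
```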

Multi-Stage Build

Reduce the final image size by separating the build and runtime stages:

# Build stage
FROM python:3.11 AS builder

WORKDIR /app
COPY requirements.txt .
RUN pip install --user --no-cache-dir -r requirements.txt

# Runtime stage
FROM python:3.11-slim

WORKDIR /app
COPY --from=builder /root/.local /root/.local
COPY . .

ENV PATH=/root/.local/bin:$PATH
CMD ["python", "main.py"]
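
Note that pip install --user places packages under /root/.local, which is why the runtime stage copies that directory and prepends /root/.local/bin to PATH. To confirm the savings, build and compare sizes (the ml-app tag here is illustrative):

```shell
docker build -t ml-app:multi .
docker images ml-app --format "{{.Tag}}: {{.Size}}"
```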

GPU Support

FROM nvidia/cuda:12.1.1-runtime-ubuntu22.04

RUN apt-get update && apt-get install -y --no-install-recommends python3 python3-pip \
    && rm -rf /var/lib/apt/lists/*

WORKDIR /app
COPY requirements.txt .
RUN pip3 install --no-cache-dir -r requirements.txt

COPY . .
CMD ["python3", "train.py"]

Run with GPU (requires the NVIDIA Container Toolkit on the host):

docker run --gpus all my-ml-image
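
Before running your own image, you can verify that the container runtime sees the GPU at all; nvidia-smi is mounted into the container from the host driver by the NVIDIA runtime:

```shell
docker run --rm --gpus all nvidia/cuda:12.1.1-runtime-ubuntu22.04 nvidia-smi
```

If this fails, fix the host driver or NVIDIA Container Toolkit installation before debugging your Dockerfile.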

PyTorch Image

FROM pytorch/pytorch:2.1.0-cuda12.1-cudnn8-runtime

WORKDIR /app
COPY requirements.txt .
RUN pip install --no-cache-dir -r requirements.txt

COPY . .
CMD ["python", "inference.py"]
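
A quick sanity check that PyTorch inside the container can reach the GPU, with no project code needed:

```shell
docker run --rm --gpus all pytorch/pytorch:2.1.0-cuda12.1-cudnn8-runtime \
  python -c "import torch; print(torch.cuda.is_available())"
```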

Docker Compose

version: '3.8'

services:
  api:
    build: .
    ports:
      - "8000:8000"
    volumes:
      - ./models:/app/models
    environment:
      - MODEL_PATH=/app/models/model.pt
    deploy:
      resources:
        reservations:
          devices:
            - driver: nvidia
              count: 1
              capabilities: [gpu]

  redis:
    image: redis:alpine
    ports:
      - "6379:6379"
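
With the file above saved as docker-compose.yml, the stack is started and inspected with:

```shell
docker compose up --build -d   # build and start api + redis in the background
docker compose logs -f api     # follow the API container's logs
docker compose down            # stop and remove the stack
```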

.dockerignore

.git
.venv
__pycache__
*.pyc
.env
data/
logs/
*.pt
*.pth
checkpoints/
.pytest_cache

Best Practices

  1. Pin versions: python:3.11.6-slim not python:latest
  2. Use slim images: Reduce attack surface and size
  3. Layer caching: Put rarely-changing layers first
  4. Non-root user: Run as non-root for security
  5. Health checks: Add health check endpoints

Security

# Create non-root user
RUN useradd -m -u 1000 appuser
USER appuser

# Health check (curl must be present in the image; slim bases do not include it)
HEALTHCHECK --interval=30s --timeout=3s \
  CMD curl -f http://localhost:8000/health || exit 1
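
As a sketch of how these fragments fit into the basic Dockerfile from earlier (curl is installed explicitly because slim images do not ship it):

```dockerfile
FROM python:3.11-slim

# curl is needed by the HEALTHCHECK below
RUN apt-get update && apt-get install -y --no-install-recommends curl \
    && rm -rf /var/lib/apt/lists/*

WORKDIR /app
COPY requirements.txt .
RUN pip install --no-cache-dir -r requirements.txt
COPY . .

# Run as a non-root user
RUN useradd -m -u 1000 appuser
USER appuser

EXPOSE 8000
HEALTHCHECK --interval=30s --timeout=3s \
  CMD curl -f http://localhost:8000/health || exit 1
CMD ["uvicorn", "main:app", "--host", "0.0.0.0", "--port", "8000"]
```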

Image Size Comparison

Base Image            Size (approx.)
python:3.11           1.0 GB
python:3.11-slim      150 MB
python:3.11-alpine    50 MB
pytorch/pytorch       6.5 GB
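
You can check the actual sizes of images on your machine with:

```shell
docker images --format "{{.Repository}}:{{.Tag}}  {{.Size}}"
```

One caveat on alpine: its musl libc means many scientific Python packages (numpy, torch) lack prebuilt wheels for it, so slim is usually the safer small base for ML workloads despite being larger.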

Key Resources