
Deploying ML Models with FastAPI

Overview

FastAPI is a popular framework for serving ML models: it is fast, generates interactive API documentation automatically, and handles asynchronous requests efficiently.

Basic Setup

```bash
pip install fastapi uvicorn pydantic
```

```python
from fastapi import FastAPI
from pydantic import BaseModel
import pickle

app = FastAPI(title="ML Model API")

# Load the model once at startup
with open("model.pkl", "rb") as f:
    model = pickle.load(f)

class PredictionInput(BaseModel):
    features: list[float]

class PredictionOutput(BaseModel):
    prediction: float
    confidence: float

@app.post("/predict", response_model=PredictionOutput)
async def predict(input: PredictionInput):
    pred = model.predict([input.features])[0]
    proba = model.predict_proba([input.features])[0].max()
    return PredictionOutput(prediction=pred, confidence=proba)
```

Running the Server

```bash
uvicorn main:app --host 0.0.0.0 --port 8000 --reload
```

Health Check Endpoint

```python
@app.get("/health")
async def health():
    return {"status": "healthy", "model_loaded": model is not None}
```

Batch Predictions

```python
class BatchInput(BaseModel):
    instances: list[list[float]]

@app.post("/predict/batch")
async def predict_batch(input: BatchInput):
    predictions = model.predict(input.instances)
    return {"predictions": predictions.tolist()}
```

Docker Deployment

```dockerfile
FROM python:3.11-slim
WORKDIR /app
COPY requirements.txt .
RUN pip install --no-cache-dir -r requirements.txt
COPY . .
EXPOSE 8000
CMD ["uvicorn", "main:app", "--host", "0.0.0.0", "--port", "8000"]
```

```bash
docker build -t ml-api .
docker run -p 8000:8000 ml-api
```

Production Tips

- Use Gunicorn as the process manager: `gunicorn main:app -w 4 -k uvicorn.workers.UvicornWorker`
- Add rate limiting: use slowapi or nginx
- Monitor latency: add Prometheus metrics
- Cache predictions: use Redis for repeated inputs (see the sketch after this list)
- Validate inputs: Pydantic handles this automatically

Key Resources

- FastAPI Documentation
- Uvicorn
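The caching tip can be made concrete with a thin wrapper around the prediction handler. The sketch below reuses `app`, `model`, `PredictionInput`, and `PredictionOutput` from the snippets above; it assumes a local Redis instance and the `redis` Python client, and the `/predict/cached` route name and `CACHE_TTL` value are illustrative rather than part of the original post.

```python
# Sketch: cache identical prediction requests in Redis (assumes a local Redis
# server and the `redis` package; reuses names from the FastAPI snippet above).
import hashlib
import json

import redis

cache = redis.Redis(host="localhost", port=6379, db=0)
CACHE_TTL = 300  # seconds; purely illustrative, tune for your workload

@app.post("/predict/cached", response_model=PredictionOutput)
async def predict_cached(input: PredictionInput):
    # Key on a hash of the exact feature vector so repeated inputs hit the cache.
    key = "pred:" + hashlib.sha256(json.dumps(input.features).encode()).hexdigest()
    cached = cache.get(key)
    if cached is not None:
        return PredictionOutput(**json.loads(cached))
    pred = float(model.predict([input.features])[0])
    conf = float(model.predict_proba([input.features])[0].max())
    cache.setex(key, CACHE_TTL, json.dumps({"prediction": pred, "confidence": conf}))
    return PredictionOutput(prediction=pred, confidence=conf)
```

Because the key is a hash of the raw feature vector, only byte-identical requests hit the cache; if inputs vary slightly in their floating-point representation, rounding the features before hashing is a common adjustment.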

December 1, 2025 · 1 min · 199 words · BlogIA Team

Containerizing ML Applications with Docker

Overview

Docker containers ensure your ML application runs identically everywhere. This guide covers containerization best practices for ML workloads.

Basic Dockerfile

```dockerfile
FROM python:3.11-slim
WORKDIR /app

# Install dependencies first (cached layer)
COPY requirements.txt .
RUN pip install --no-cache-dir -r requirements.txt

# Copy application code
COPY . .

EXPOSE 8000
CMD ["uvicorn", "main:app", "--host", "0.0.0.0", "--port", "8000"]
```

Multi-Stage Build

Reduce image size by separating build and runtime stages:

```dockerfile
# Build stage
FROM python:3.11 AS builder
WORKDIR /app
COPY requirements.txt .
RUN pip install --user --no-cache-dir -r requirements.txt

# Runtime stage
FROM python:3.11-slim
WORKDIR /app
COPY --from=builder /root/.local /root/.local
COPY . .
ENV PATH=/root/.local/bin:$PATH
CMD ["python", "main.py"]
```

GPU Support

```dockerfile
FROM nvidia/cuda:12.1-runtime-ubuntu22.04
RUN apt-get update && apt-get install -y python3 python3-pip
WORKDIR /app
COPY requirements.txt .
RUN pip3 install --no-cache-dir -r requirements.txt
COPY . .
CMD ["python3", "train.py"]
```

Run with GPU: ...

December 1, 2025 · 2 min · 295 words · BlogIA Team