# Deploying ML Models with FastAPI

## Overview

FastAPI is a popular framework for serving ML models: it is fast, generates interactive API documentation automatically, and handles asynchronous requests efficiently.

## Basic Setup

Install the dependencies:

```bash
pip install fastapi uvicorn pydantic
```

Then define the application in `main.py`:

```python
from fastapi import FastAPI
from pydantic import BaseModel
import pickle

app = FastAPI(title="ML Model API")

# Load model at startup
with open("model.pkl", "rb") as f:
    model = pickle.load(f)

class PredictionInput(BaseModel):
    features: list[float]

class PredictionOutput(BaseModel):
    prediction: float
    confidence: float

@app.post("/predict", response_model=PredictionOutput)
async def predict(input: PredictionInput):
    # Wrap the single feature vector in a list: the model expects a 2D array.
    pred = model.predict([input.features])[0]
    proba = model.predict_proba([input.features])[0].max()
    return PredictionOutput(prediction=pred, confidence=proba)
```

Note that `model.predict` is a blocking call. If inference takes more than a few milliseconds, declare the endpoint with a plain `def` instead of `async def` so FastAPI runs it in a thread pool and keeps the event loop free.

## Running the Server

```bash
uvicorn main:app --host 0.0.0.0 --port 8000 --reload
```

The `--reload` flag is for development only; drop it in production.

## Health Check Endpoint

```python
@app.get("/health")
async def health():
    return {"status": "healthy", "model_loaded": model is not None}
```

## Batch Predictions

```python
class BatchInput(BaseModel):
    instances: list[list[float]]

@app.post("/predict/batch")
async def predict_batch(input: BatchInput):
    predictions = model.predict(input.instances)
    return {"predictions": predictions.tolist()}
```

## Docker Deployment

Create a `Dockerfile` (with `requirements.txt` listing fastapi, uvicorn, pydantic, and your model's dependencies):

```dockerfile
FROM python:3.11-slim

WORKDIR /app
COPY requirements.txt .
RUN pip install --no-cache-dir -r requirements.txt
COPY . .

EXPOSE 8000
CMD ["uvicorn", "main:app", "--host", "0.0.0.0", "--port", "8000"]
```

Build and run the image:

```bash
docker build -t ml-api .
docker run -p 8000:8000 ml-api
```

## Production Tips

- Use Gunicorn as a process manager: `gunicorn main:app -w 4 -k uvicorn.workers.UvicornWorker`
- Add rate limiting with slowapi or nginx (sketch below)
- Monitor latency by exporting Prometheus metrics (sketch below)
- Cache predictions in Redis for repeated inputs (sketch below)
- Validate inputs: Pydantic handles this automatically through the request models

## Key Resources

- [FastAPI Documentation](https://fastapi.tiangolo.com/)
- [Uvicorn](https://www.uvicorn.org/)
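## Calling the API

Once the server is running, any HTTP client can call `/predict`. Below is a minimal sketch using the `requests` library; the four-feature payload and the printed output are purely illustrative, and your model's expected input length will differ.

```python
import requests

# Hypothetical feature vector; replace with values your model actually expects.
payload = {"features": [5.1, 3.5, 1.4, 0.2]}

resp = requests.post("http://localhost:8000/predict", json=payload, timeout=5)
resp.raise_for_status()
print(resp.json())  # e.g. {"prediction": 0.0, "confidence": 0.97}
```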
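## Sketch: Caching Predictions with Redis

The Redis caching tip can be implemented in several ways. Here is one minimal sketch, added to the `main.py` above, assuming a local Redis instance, the `redis` Python client, and Pydantic v2 (for `model_dump_json`); the `/predict/cached` route name and the one-hour TTL are arbitrary choices for illustration.

```python
import hashlib
import json

import redis

# Assumed local Redis instance; point this at your own deployment.
cache = redis.Redis(host="localhost", port=6379, db=0)

@app.post("/predict/cached", response_model=PredictionOutput)
def predict_cached(payload: PredictionInput):
    # Key the cache on the exact feature vector.
    key = "pred:" + hashlib.sha256(json.dumps(payload.features).encode()).hexdigest()
    hit = cache.get(key)
    if hit is not None:
        return PredictionOutput(**json.loads(hit))

    pred = model.predict([payload.features])[0]
    proba = model.predict_proba([payload.features])[0].max()
    result = PredictionOutput(prediction=float(pred), confidence=float(proba))

    # Store the serialized result for an hour so repeated inputs skip inference.
    cache.set(key, result.model_dump_json(), ex=3600)
    return result
```

Hashing the serialized features keeps keys short and avoids putting raw input values into key names.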
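## Sketch: Prometheus Latency Metrics

One way to follow the latency tip is the `prometheus_client` package, which ships an ASGI app you can mount on the existing FastAPI app. The sketch below (also added to `main.py`) records `/predict` latency in a histogram and exposes it at `/metrics`; the metric name is an arbitrary example.

```python
import time

from fastapi import Request
from prometheus_client import Histogram, make_asgi_app

# Expose Prometheus metrics on the same app at /metrics.
app.mount("/metrics", make_asgi_app())

PREDICT_LATENCY = Histogram(
    "predict_latency_seconds",
    "Time spent handling /predict requests",
)

@app.middleware("http")
async def record_predict_latency(request: Request, call_next):
    # Time every request, but only record the prediction endpoint.
    start = time.perf_counter()
    response = await call_next(request)
    if request.url.path == "/predict":
        PREDICT_LATENCY.observe(time.perf_counter() - start)
    return response
```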
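## Sketch: Rate Limiting with slowapi

If you prefer to rate-limit in the application rather than in nginx, slowapi is one option. The sketch below limits a variant of the predict endpoint by client IP; the `10/minute` policy and the `/predict/limited` route are illustrative only, and slowapi expects a `request` argument in the endpoint signature.

```python
from fastapi import Request
from slowapi import Limiter, _rate_limit_exceeded_handler
from slowapi.errors import RateLimitExceeded
from slowapi.util import get_remote_address

# Identify clients by IP address and return HTTP 429 when they exceed the limit.
limiter = Limiter(key_func=get_remote_address)
app.state.limiter = limiter
app.add_exception_handler(RateLimitExceeded, _rate_limit_exceeded_handler)

@app.post("/predict/limited", response_model=PredictionOutput)
@limiter.limit("10/minute")
def predict_limited(request: Request, payload: PredictionInput):
    pred = model.predict([payload.features])[0]
    proba = model.predict_proba([payload.features])[0].max()
    return PredictionOutput(prediction=float(pred), confidence=float(proba))
```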