Overview
FastAPI is a popular choice for serving ML models: it is fast, generates interactive API documentation automatically, and handles concurrent requests efficiently thanks to its async support.
Basic Setup
```bash
pip install fastapi uvicorn pydantic
```
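The Dockerfile later in this post installs dependencies from a `requirements.txt`, so it helps to keep one alongside your code. A minimal version might look like this (scikit-learn is an assumption here, since the example model exposes `predict_proba`; swap in whatever framework your pickled model actually needs):

```text
fastapi
uvicorn[standard]
pydantic
scikit-learn
```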
Save the following as `main.py` (the filename matters, because the `uvicorn main:app` command below imports the app from it):

```python
from fastapi import FastAPI
from pydantic import BaseModel
import pickle

app = FastAPI(title="ML Model API")

# Load the model once at import time, before the server accepts requests
with open("model.pkl", "rb") as f:
    model = pickle.load(f)

class PredictionInput(BaseModel):
    features: list[float]

class PredictionOutput(BaseModel):
    prediction: float
    confidence: float

@app.post("/predict", response_model=PredictionOutput)
async def predict(input: PredictionInput):
    pred = model.predict([input.features])[0]
    proba = model.predict_proba([input.features])[0].max()
    # Cast NumPy scalars to plain floats so Pydantic can validate them
    return PredictionOutput(prediction=float(pred), confidence=float(proba))
```
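One subtlety worth knowing: `model.predict` is synchronous and CPU-bound, so calling it inside an `async def` endpoint blocks the event loop for the duration of inference. A minimal sketch of a drop-in alternative, using the threadpool helper FastAPI re-exports from Starlette:

```python
from fastapi.concurrency import run_in_threadpool

@app.post("/predict", response_model=PredictionOutput)
async def predict(input: PredictionInput):
    # Offload the blocking inference call to a worker thread so the
    # event loop stays free to handle other requests in the meantime
    pred = await run_in_threadpool(model.predict, [input.features])
    proba = await run_in_threadpool(model.predict_proba, [input.features])
    return PredictionOutput(prediction=float(pred[0]), confidence=float(proba[0].max()))
```

Declaring the endpoint with a plain `def` instead of `async def` has the same effect, since FastAPI runs sync endpoints in a threadpool automatically.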
Running the Server
```bash
uvicorn main:app --host 0.0.0.0 --port 8000 --reload
```

The `--reload` flag restarts the server whenever the code changes; use it in development only and drop it in production.
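With the server up, a quick smoke test from Python (this assumes a model trained on four numeric features, such as an Iris classifier; adjust the feature vector to match your model):

```python
import requests

resp = requests.post(
    "http://localhost:8000/predict",
    json={"features": [5.1, 3.5, 1.4, 0.2]},  # illustrative values only
)
resp.raise_for_status()
print(resp.json())  # e.g. {"prediction": 0.0, "confidence": 0.97}
```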
Health Check Endpoint
@app.get("/health")
async def health():
return {"status": "healthy", "model_loaded": model is not None}
Batch Predictions
```python
class BatchInput(BaseModel):
    instances: list[list[float]]

@app.post("/predict/batch")
async def predict_batch(input: BatchInput):
    predictions = model.predict(input.instances)
    return {"predictions": predictions.tolist()}
```
Docker Deployment
```dockerfile
FROM python:3.11-slim

WORKDIR /app

COPY requirements.txt .
RUN pip install --no-cache-dir -r requirements.txt

COPY . .

EXPOSE 8000
CMD ["uvicorn", "main:app", "--host", "0.0.0.0", "--port", "8000"]
```
```bash
docker build -t ml-api .
docker run -p 8000:8000 ml-api
```
Production Tips
- Use Gunicorn with Uvicorn workers: `gunicorn main:app -w 4 -k uvicorn.workers.UvicornWorker`
- Add rate limiting: use `slowapi` or nginx
- Monitor latency: add Prometheus metrics
- Cache predictions: use Redis for repeated inputs (see the sketch after this list)
- Validate inputs: Pydantic handles this automatically
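To make the caching tip concrete, here is a minimal sketch, assuming Pydantic v2, a Redis instance on localhost, and the `app`, `model`, and Pydantic models defined earlier; the key scheme and one-hour TTL are illustrative choices, not part of the original example:

```python
import hashlib
import json

import redis

cache = redis.Redis(host="localhost", port=6379)

@app.post("/predict", response_model=PredictionOutput)
async def predict(input: PredictionInput):
    # Derive a stable cache key from the exact feature values
    key = "pred:" + hashlib.sha256(json.dumps(input.features).encode()).hexdigest()
    cached = cache.get(key)
    if cached is not None:
        return PredictionOutput(**json.loads(cached))
    pred = model.predict([input.features])[0]
    proba = model.predict_proba([input.features])[0].max()
    result = PredictionOutput(prediction=float(pred), confidence=float(proba))
    cache.set(key, result.model_dump_json(), ex=3600)  # expire after an hour
    return result
```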