# MLOps with MLflow
## Overview

MLflow is a widely used open-source platform for managing the machine-learning lifecycle. It handles experiment tracking, model packaging, a model registry, and deployment.

## Installation

```bash
pip install mlflow
mlflow ui  # Start the tracking UI at http://localhost:5000
```

## Experiment Tracking

```python
import mlflow

mlflow.set_experiment("my-classification")

with mlflow.start_run():
    # Log parameters
    mlflow.log_param("learning_rate", 0.01)
    mlflow.log_param("epochs", 100)

    # Train model...

    # Log metrics
    mlflow.log_metric("accuracy", 0.95)
    mlflow.log_metric("f1_score", 0.93)

    # Log the fitted model
    mlflow.sklearn.log_model(model, "model")

    # Log artifacts
    mlflow.log_artifact("confusion_matrix.png")
```

## Autologging

```python
import mlflow.sklearn
from sklearn.ensemble import RandomForestClassifier

mlflow.sklearn.autolog()  # Params, metrics, and the model are logged automatically

model = RandomForestClassifier()
model.fit(X_train, y_train)  # X_train, y_train defined elsewhere
```

## Model Registry

```python
# Register a model logged by a finished run
mlflow.register_model("runs:/abc123/model", "ProductionClassifier")

# Load the version currently in the "Production" stage
model = mlflow.pyfunc.load_model("models:/ProductionClassifier/Production")
```

A scripted stage promotion is sketched in the appendix at the end of this page.

## Model Serving

```bash
# Serve model version 1 as a REST API
mlflow models serve -m "models:/ProductionClassifier/1" -p 5001

# Query the API; "inputs" must be 2-D, one row per sample
curl -X POST http://localhost:5001/invocations \
  -H "Content-Type: application/json" \
  -d '{"inputs": [[1.0, 2.0, 3.0, 4.0]]}'
```

## Project Structure

```
my_project/
├── MLproject     # Project definition
├── conda.yaml    # Environment
├── train.py      # Training script
└── data/
```

The `MLproject` file:

```yaml
name: my_project
conda_env: conda.yaml
entry_points:
  main:
    parameters:
      learning_rate: {type: float, default: 0.01}
    command: "python train.py --lr {learning_rate}"
```

Running a project with an overridden parameter is also sketched in the appendix.

## Best Practices

- **Use experiments**: group related runs.
- **Tag runs**: add metadata for filtering (see the appendix for a sketch).
- **Version data**: log dataset hashes.
- **Automate**: use autolog when possible.
- **Compare**: use the MLflow UI to compare runs.

## Key Resources

- [MLflow Documentation](https://mlflow.org/docs/latest/)
- [MLflow GitHub](https://github.com/mlflow/mlflow)
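## Appendix: Scripted Sketches

### Tagging Runs and Versioning Data

To make the "tag runs" and "version data" practices concrete, here is a minimal sketch using `mlflow.set_tags` and `mlflow.search_runs`. The experiment name, tag keys, and `data.csv` path are illustrative assumptions, not anything MLflow prescribes.

```python
import hashlib

import mlflow

mlflow.set_experiment("my-classification")

# Hash the training data so the exact dataset behind a run can be identified later
with open("data.csv", "rb") as f:  # hypothetical dataset path
    data_hash = hashlib.sha256(f.read()).hexdigest()

with mlflow.start_run():
    mlflow.set_tags({"team": "fraud-detection", "data_hash": data_hash})  # illustrative tag keys
    mlflow.log_param("learning_rate", 0.01)
    # ... train and log metrics as in the tracking example ...

# Filter runs by tag from code instead of the UI; returns a pandas DataFrame
runs = mlflow.search_runs(filter_string="tags.team = 'fraud-detection'")
print(runs[["run_id", "tags.data_hash"]])
```

Because the hash is stored as a tag rather than a parameter, it can be used in `filter_string` queries and shows up as a filterable column in the UI.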
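### Promoting a Model Version

Registry stage transitions can also be scripted with `MlflowClient`. This sketch assumes the `ProductionClassifier` model from the registry example already has a version 1; note that recent MLflow releases deprecate stages in favor of model-version aliases, so treat this as the classic workflow rather than the only one.

```python
from mlflow.tracking import MlflowClient

client = MlflowClient()

# Move version 1 of the registered model into the "Production" stage
client.transition_model_version_stage(
    name="ProductionClassifier",
    version="1",
    stage="Production",
    archive_existing_versions=True,  # archive whatever was in Production before
)
```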
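### Running a Project

Given the `MLproject` file above, the project can be launched with the `mlflow run` CLI; `-P` overrides a declared parameter, and the `main` entry point is used by default:

```bash
# Run the project's "main" entry point, overriding the default learning rate
mlflow run . -P learning_rate=0.1
```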