Overview
MLflow is a widely used open-source platform for managing the machine learning lifecycle. It covers experiment tracking, model packaging, a model registry, and deployment.
Installation
pip install mlflow
mlflow ui # Start tracking server at localhost:5000
Experiment Tracking
import mlflow

mlflow.set_experiment("my-classification")

with mlflow.start_run():
    # Log parameters
    mlflow.log_param("learning_rate", 0.01)
    mlflow.log_param("epochs", 100)

    # Train model...

    # Log metrics
    mlflow.log_metric("accuracy", 0.95)
    mlflow.log_metric("f1_score", 0.93)

    # Log model
    mlflow.sklearn.log_model(model, "model")

    # Log artifacts
    mlflow.log_artifact("confusion_matrix.png")
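Logged runs can also be pulled back into Python for analysis. A minimal sketch, assuming the experiment name from the example above and that pandas is installed (mlflow.search_runs returns a DataFrame):

import mlflow

# Fetch all runs from the experiment as a pandas DataFrame
runs = mlflow.search_runs(experiment_names=["my-classification"])

# Logged values appear as params.* and metrics.* columns
print(runs[["run_id", "params.learning_rate", "metrics.accuracy"]])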
Autologging
import mlflow.sklearn
from sklearn.ensemble import RandomForestClassifier

# Parameters, training metrics, and the fitted model are logged automatically
mlflow.sklearn.autolog()

model = RandomForestClassifier()
model.fit(X_train, y_train)  # X_train / y_train assumed prepared beforehand
Model Registry
# Register a model
mlflow.register_model("runs:/abc123/model", "ProductionClassifier")
# Load from registry
model = mlflow.pyfunc.load_model("models:/ProductionClassifier/Production")
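Loading models:/ProductionClassifier/Production only resolves once a version has been moved to that stage. A minimal sketch using the classic stage-based workflow (version 1 is a placeholder for whatever register_model returned; recent MLflow releases also support aliases instead of stages):

from mlflow.tracking import MlflowClient

client = MlflowClient()

# Promote version 1 of the registered model to the Production stage
client.transition_model_version_stage(
    name="ProductionClassifier",
    version=1,
    stage="Production",
)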
Model Serving
# Serve model as REST API
mlflow models serve -m "models:/ProductionClassifier/1" -p 5001
# Query the API
curl -X POST http://localhost:5001/invocations \
  -H "Content-Type: application/json" \
  -d '{"inputs": [[1.0, 2.0, 3.0, 4.0]]}'
Project Structure
my_project/
├── MLproject # Project definition
├── conda.yaml # Environment
├── train.py # Training script
└── data/
# MLproject
name: my_project
conda_env: conda.yaml

entry_points:
  main:
    parameters:
      learning_rate: {type: float, default: 0.01}
    command: "python train.py --lr {learning_rate}"
Best Practices
- Use experiments: Group related runs under a named experiment
- Tag runs: Add metadata for filtering and search
- Version data: Log dataset hashes with each run (tagging and hashing are sketched below)
- Automate: Use autologging where possible
- Compare: Use the MLflow UI to compare runs side by side
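A minimal sketch of the tagging and data-versioning practices above (data/train.csv and the tag keys are illustrative, not MLflow conventions):

import hashlib

import mlflow

mlflow.set_experiment("my-classification")

# Hash the training data so the exact dataset version can be identified later
with open("data/train.csv", "rb") as f:
    data_hash = hashlib.sha256(f.read()).hexdigest()

with mlflow.start_run():
    # Tags are free-form metadata, searchable in the UI and via search_runs
    mlflow.set_tags({
        "team": "fraud-detection",
        "data_sha256": data_hash,
    })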