Overview
ML projects pose version control challenges that ordinary software projects rarely face: large data and model files, experiment tracking, and model versioning. This guide covers Git practices for handling each of them.
.gitignore for ML
# Data (ignore the contents of data/, but keep DVC pointer files visible to Git)
data/*
!data/*.dvc
*.csv
*.parquet
*.json
!config.json
# Models
*.pt
*.pth
*.onnx
*.pkl
models/
# Checkpoints
checkpoints/
*.ckpt
# Logs
logs/
wandb/
mlruns/
# Environment
.venv/
__pycache__/
*.pyc
# Notebooks
.ipynb_checkpoints/
# IDE
.vscode/
.idea/
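The `*.json` / `!config.json` pair above relies on Git's rule that the last matching pattern decides whether a file is ignored. The sketch below imitates that rule in Python for bare filenames only. It is a simplified illustration, not Git's real matcher (it knows nothing about directories, anchoring, or `**`), and `is_ignored` is a hypothetical helper name.

```python
from fnmatch import fnmatchcase

# Subset of the patterns from the .gitignore above, in file order.
PATTERNS = ["*.csv", "*.parquet", "*.json", "!config.json"]


def is_ignored(filename: str, patterns: list[str]) -> bool:
    """Return True if a bare filename would be ignored (last match wins)."""
    ignored = False
    for pattern in patterns:
        negated = pattern.startswith("!")
        glob = pattern[1:] if negated else pattern
        if fnmatchcase(filename, glob):
            # A later pattern overrides any earlier decision.
            ignored = not negated
    return ignored


for name in ["metrics.json", "config.json", "train.py"]:
    status = "ignored" if is_ignored(name, PATTERNS) else "not ignored"
    print(f"{name}: {status}")
```

In a real repository, `git check-ignore -v <path>` reports which pattern actually matched a given path.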
Git LFS for Large Files
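# Set up Git LFS for your user account (the git-lfs package must already be installed)
git lfs install
# Track large files (remove matching patterns from .gitignore first, or Git will keep ignoring them)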
git lfs track "*.pt"
git lfs track "*.onnx"
git lfs track "data/*.parquet"
# Commit .gitattributes
git add .gitattributes
git commit -m "Configure Git LFS"
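Once a pattern is tracked, Git stores a small text pointer in place of the file, and the real content lives in LFS storage. The sketch below builds a pointer in the three-line layout used by Git LFS (spec version, SHA-256 object ID, size in bytes). `lfs_pointer` is a hypothetical helper written for illustration, not part of any Git LFS API.

```python
import hashlib


def lfs_pointer(content: bytes) -> str:
    """Return the text of a Git LFS style pointer for the given file content."""
    oid = hashlib.sha256(content).hexdigest()
    return (
        "version https://git-lfs.github.com/spec/v1\n"
        f"oid sha256:{oid}\n"
        f"size {len(content)}\n"
    )


# Stand-in bytes for a model file; a real checkpoint would be far larger.
fake_weights = b"\x00" * 4096
print(lfs_pointer(fake_weights))
```

The pointer stays a few lines long however large the tracked file is, which is why the repository itself remains small.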
DVC for Data Versioning
# Install DVC with S3 support (required for the s3:// remote used below)
pip install "dvc[s3]"
# Initialize
dvc init
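# Track data
dvc add data/training.csv
# Commit the small .dvc pointer file to Git in place of the data
git add data/training.csv.dvc
git commit -m "Track training data with DVC"
# Configure remote storage and push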
dvc remote add -d storage s3://my-bucket/dvc
dvc push
# Pull data
dvc pull
Branching Strategy
main
├── develop
│   ├── feature/new-model
│   ├── feature/data-pipeline
│   └── experiment/bert-large
└── release/v1.0
Commit Messages
# Format: type(scope): description
feat(model): add BERT classifier
fix(data): handle missing values in preprocessing
exp(training): test learning rate 1e-4
docs(readme): add installation instructions
refactor(pipeline): simplify data loading
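A convention like this is easier to keep when it is checked automatically. The following is a minimal sketch of such a check, assuming the allowed types are exactly the five used in the examples above and that scopes are lowercase. `ALLOWED_TYPES`, `COMMIT_RE`, and `is_valid_commit_message` are illustrative names, not part of Git.

```python
import re

# Types taken from the examples above; extend the tuple to suit your team.
ALLOWED_TYPES = ("feat", "fix", "exp", "docs", "refactor")

# type(scope): description
COMMIT_RE = re.compile(
    r"^(?P<type>" + "|".join(ALLOWED_TYPES) + r")"
    r"\((?P<scope>[a-z0-9_-]+)\): "
    r"(?P<description>\S.*)$"
)


def is_valid_commit_message(message: str) -> bool:
    """Check only the first line (the subject) of a commit message."""
    lines = message.splitlines()
    subject = lines[0] if lines else ""
    return COMMIT_RE.match(subject) is not None


for msg in ["feat(model): add BERT classifier", "added some stuff"]:
    print(f"{msg!r}: {'ok' if is_valid_commit_message(msg) else 'rejected'}")
```

One way to enforce it would be a `commit-msg` hook, which Git calls with the path of the message file as its first argument.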
Experiment Tracking with Git
# Create experiment branch
git checkout -b experiment/lr-sweep-001
# Run experiment
python train.py --lr 0.001
# Commit results
git add results/
git commit -m "exp(training): lr=0.001, acc=0.92"
# Tag successful experiments
git tag -a exp-lr001-acc92 -m "Best LR experiment"
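Typing hyperparameters and metrics into commit messages by hand invites mistakes, so the training script can generate the subject line itself. This is a sketch under the assumption that all values are numeric and that the message follows the `type(scope): description` format from the Commit Messages section. `experiment_commit_message` is a hypothetical helper.

```python
def experiment_commit_message(scope: str, params: dict, metrics: dict) -> str:
    """Build an 'exp(scope): key=value, ...' subject from numeric params and metrics."""
    # Parameters come first, then metrics; dicts preserve insertion order.
    merged = {**params, **metrics}
    parts = [f"{key}={value:g}" for key, value in merged.items()]
    return f"exp({scope}): " + ", ".join(parts)


message = experiment_commit_message("training", {"lr": 0.001}, {"acc": 0.92})
print(message)
```

The resulting string can be passed straight to `git commit -m` once the run finishes.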
Pre-commit Hooks
# .pre-commit-config.yaml
repos:
  - repo: https://github.com/astral-sh/ruff-pre-commit
    rev: v0.1.6
    hooks:
      - id: ruff
        args: [--fix]
      - id: ruff-format
  - repo: https://github.com/pre-commit/pre-commit-hooks
    rev: v4.5.0
    hooks:
      - id: check-yaml
      - id: end-of-file-fixer
      - id: trailing-whitespace
      - id: check-added-large-files
        args: ['--maxkb=1000']
# Install pre-commit and register the hooks in this repository
pip install pre-commit
pre-commit install
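The `check-added-large-files` hook is the one that guards against accidentally committing a model or dataset. The sketch below approximates the idea in plain Python: flag any file larger than a kilobyte limit. It is an assumption-laden illustration, not the hook's source; the real hook inspects only files being added, and its exact size rounding may differ. `find_large_files` is a hypothetical name.

```python
import os
import tempfile


def find_large_files(paths, max_kb=1000):
    """Return the paths whose size on disk exceeds max_kb kilobytes."""
    limit_bytes = max_kb * 1024
    return [p for p in paths if os.path.getsize(p) > limit_bytes]


# Demo with two temporary files: one small, one just over the limit.
with tempfile.TemporaryDirectory() as tmp:
    small = os.path.join(tmp, "config.yaml")
    large = os.path.join(tmp, "model.pt")
    with open(small, "wb") as f:
        f.write(b"lr: 0.001\n")
    with open(large, "wb") as f:
        f.write(b"\0" * (1000 * 1024 + 1))
    flagged = [os.path.basename(p) for p in find_large_files([small, large])]

print(flagged)
```

Running the configured hooks on the whole repository with `pre-commit run --all-files` is a quick way to see what they would catch before the first commit.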
Best Practices
- Never commit data: Use DVC or Git LFS
- Never commit secrets: Use environment variables
- Small commits: One logical change per commit
- Meaningful messages: Describe what and why
- Branch per experiment: Easy to compare and revert
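To illustrate the point about secrets, here is a minimal sketch of reading a credential from the environment and failing loudly when it is absent. `require_env` is a hypothetical helper and `EXAMPLE_API_KEY` is a placeholder variable name; substitute whatever your tracking or storage service expects.

```python
import os


def require_env(name: str) -> str:
    """Return the value of an environment variable or raise a clear error."""
    value = os.environ.get(name)
    if not value:
        raise RuntimeError(f"Missing required environment variable: {name}")
    return value


# EXAMPLE_API_KEY is a placeholder; set it in your shell or CI, never in the repo.
try:
    api_key = require_env("EXAMPLE_API_KEY")
    print("API key loaded from the environment")
except RuntimeError as err:
    print(err)
```

Failing at startup is deliberate: a missing key surfaces immediately instead of halfway through a long training run.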