Chapter 12: Model Deployment & MLOps Basics (2025–2026 must-have)
Model Deployment & MLOps Basics (2025–2026 must-have), explained like we’re wrapping up our long journey together in Airoli — it’s evening now (January 29, 2026, around 5:43 PM IST), the street lights are on outside, your laptop fan is humming, and we’re finally moving from notebooks to real-world impact. This is the chapter that turns your churn model (or any project) from a cool Jupyter experiment into something stakeholders can actually use 24/7.
In 2026 India (especially Mumbai/Navi Mumbai/Hyderabad fintech, e-commerce, telecom), companies expect juniors to know basics of deployment + monitoring — not just train models. Pure notebook work is entry-level; production thinking (drift, APIs, cost, reproducibility) gets you mid-level interviews and better pay. MLOps is no longer optional — it’s table stakes.
We’ll use our Telco Churn XGBoost/RF model from earlier chapters as the running example.
1. Saving & Loading Models (joblib, pickle)
Models are just Python objects — save them so you can load later (inference, deployment).
joblib — preferred for scikit-learn/XGBoost/LightGBM (faster, better with large NumPy arrays).
```python
import joblib

# After training (from Ch9)
best_model = xgb  # or rf, cat_model

# Save
joblib.dump(best_model, 'churn_xgboost_model.joblib')
print("Model saved!")

# Load later (e.g., in a production script); note joblib.load, not dump
loaded_model = joblib.load('churn_xgboost_model.joblib')
pred = loaded_model.predict_proba(new_data)[:, 1]
```
pickle — built-in, works everywhere, but slower/less secure for large models.
```python
import pickle

# Save
with open('churn_model.pkl', 'wb') as f:
    pickle.dump(best_model, f)

# Load
with open('churn_model.pkl', 'rb') as f:
    loaded = pickle.load(f)
```
2026 tip: Use joblib for the scikit-learn family, torch.save / safetensors for PyTorch/LLMs (safer, faster). Add a requirements.txt (pip freeze > requirements.txt) and a model card (README with metrics, usage).
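For the PyTorch route, here's a minimal sketch (assuming a trained torch.nn.Module named model; safetensors sidesteps pickle's arbitrary-code-execution risk):

```python
# Sketch only: `model` is assumed to be a trained torch.nn.Module
import torch
from safetensors.torch import save_model, load_model

save_model(model, "model.safetensors")  # weights only, no pickle
load_model(model, "model.safetensors")  # loads into the same architecture

# Plain PyTorch equivalent (uses pickle under the hood):
torch.save(model.state_dict(), "model.pt")
```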
2. Flask / FastAPI for Simple APIs
Serve predictions via HTTP API — frontend/mobile calls it.
Flask — lightweight, quick for MVPs (still used in 2026 for simple stuff).
```python
# app.py (Flask)
from flask import Flask, request, jsonify
import joblib
import pandas as pd

app = Flask(__name__)
model = joblib.load('churn_xgboost_model.joblib')

@app.route('/predict', methods=['POST'])
def predict():
    data = request.get_json()  # expect JSON like {"tenure": 12, "MonthlyCharges": 70.5, ...}
    df = pd.DataFrame([data])
    # Preprocess same as training (encode, scale — assume you have functions)
    prob = float(model.predict_proba(df)[0][1])  # cast NumPy float so jsonify can serialize it
    return jsonify({'churn_probability': prob,
                    'churn_risk': 'High' if prob > 0.5 else 'Low'})

if __name__ == '__main__':
    app.run(debug=True, port=5000)
```
Run: python app.py → test with Postman/cURL: POST http://localhost:5000/predict
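If you'd rather test from Python than Postman, a quick requests call works too (the payload keys here are placeholders; send every feature your preprocessing expects):

```python
import requests

# Placeholder payload: include all features the model was trained on
payload = {"tenure": 12, "MonthlyCharges": 70.5}
resp = requests.post("http://localhost:5000/predict", json=payload)
print(resp.json())  # e.g. {"churn_probability": 0.42, "churn_risk": "Low"}
```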
FastAPI (2026 winner for ML serving — async, auto Swagger docs, type hints, and often several times faster than Flask under concurrent load).
```python
# app.py (FastAPI)
from fastapi import FastAPI
from pydantic import BaseModel
import joblib
import pandas as pd

app = FastAPI(title="Telco Churn Prediction API")
model = joblib.load('churn_xgboost_model.joblib')

class Customer(BaseModel):
    tenure: int
    MonthlyCharges: float
    # Add other features...

@app.post("/predict")
def predict(customer: Customer):
    df = pd.DataFrame([customer.model_dump()])  # .dict() is deprecated in Pydantic v2
    # Preprocess...
    prob = model.predict_proba(df)[0][1]
    return {"churn_probability": float(prob),
            "risk": "High" if prob > 0.5 else "Low"}

# Run: uvicorn app:app --reload
```
Auto docs at http://localhost:8000/docs — huge win for teams. FastAPI vs Flask 2026: FastAPI for production/ML APIs (async, Pydantic validation, OpenAPI). Flask for quick prototypes or when you need Jinja templates.
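Both API snippets hand-wave the "Preprocess..." step. One clean way to close that gap is to bundle the transforms and model into a single sklearn Pipeline and serve that object instead. A sketch (column names are placeholders from the Telco data; X_train/y_train come from earlier chapters):

```python
from sklearn.pipeline import Pipeline
from sklearn.compose import ColumnTransformer
from sklearn.preprocessing import OneHotEncoder, StandardScaler
from xgboost import XGBClassifier
import joblib

# Placeholder column names; use your actual Telco feature lists
preprocess = ColumnTransformer([
    ("num", StandardScaler(), ["tenure", "MonthlyCharges"]),
    ("cat", OneHotEncoder(handle_unknown="ignore"), ["Contract"]),
])
pipe = Pipeline([("prep", preprocess), ("model", XGBClassifier())])
pipe.fit(X_train, y_train)  # X_train/y_train from earlier chapters
joblib.dump(pipe, "churn_pipeline.joblib")
# The API can now call pipe.predict_proba(df) on raw columns directly
```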
3. Streamlit / Gradio Demos
Quick interactive UIs — no frontend needed.
Streamlit (Python-only, super fast for DS demos).
```python
# app.py
import streamlit as st
import joblib
import pandas as pd

st.title("Telco Customer Churn Predictor 2026")
st.write("Enter customer details — get churn risk in seconds!")

model = joblib.load('churn_xgboost_model.joblib')

tenure = st.slider("Tenure (months)", 0, 72, 12)
monthly_charges = st.number_input("Monthly Charges (₹)", 20.0, 150.0, 70.5)
# Add more inputs...

if st.button("Predict Churn"):
    input_df = pd.DataFrame({
        'tenure': [tenure],
        'MonthlyCharges': [monthly_charges],
        # ... fill others
    })
    # Preprocess...
    prob = model.predict_proba(input_df)[0][1]
    st.success(f"Churn Probability: {prob:.2%}")
    if prob > 0.5:
        st.error("High Risk — Send retention offer!")
    else:
        st.info("Low Risk — Loyal customer")
```
Run: streamlit run app.py → opens a local URL (deploy to Streamlit Community Cloud for a shareable link).
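One gotcha: Streamlit reruns the whole script on every widget interaction, reloading the model each time. A small sketch using st.cache_resource (Streamlit's current caching API) avoids that:

```python
import streamlit as st
import joblib

@st.cache_resource  # load once, reuse across reruns
def get_model():
    return joblib.load('churn_xgboost_model.joblib')

model = get_model()
```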
Gradio — similar, great for Hugging Face Spaces (share ML demos publicly).
```python
import gradio as gr
import joblib
import pandas as pd

model = joblib.load('churn_xgboost_model.joblib')

def predict_churn(tenure, monthly):
    df = pd.DataFrame({'tenure': [tenure], 'MonthlyCharges': [monthly]})
    # Preprocess... (same logic as the API)
    prob = model.predict_proba(df)[0][1]
    return f"Churn prob: {prob:.2%}"

demo = gr.Interface(
    fn=predict_churn,
    inputs=[gr.Slider(0, 72, label="Tenure (months)"),
            gr.Number(label="Monthly Charges")],
    outputs="text",
    title="Churn Predictor"
)
demo.launch(share=True)  # public link
```
Choice: Streamlit for full apps/dashboards; Gradio for quick model demos (esp. NLP/CV).
4. Docker Basics
Docker = packages app + dependencies → runs identically everywhere (laptop → server → cloud).
Why for DS in 2026? Reproducibility (no “works on my machine”), easy scaling, cloud deployment.
Step-by-step basics:
- Install Docker Desktop (Windows/Mac) or Docker Engine (Linux).
- Create Dockerfile in project root:
```dockerfile
# Use official Python image
FROM python:3.11-slim

# Set work dir
WORKDIR /app

# Copy requirements + install
COPY requirements.txt .
RUN pip install --no-cache-dir -r requirements.txt

# Copy code + model
COPY . .

# Expose port (FastAPI default 8000)
EXPOSE 8000

# Run app
CMD ["uvicorn", "app:app", "--host", "0.0.0.0", "--port", "8000"]
```
- requirements.txt:
```text
fastapi
uvicorn
joblib
pandas
xgboost
# etc.
```
- Build image:
```bash
docker build -t churn-api:latest .
```
- Run container:
```bash
docker run -p 8000:8000 churn-api:latest
```
Access http://localhost:8000/docs
Pro tip: Use multi-stage builds for smaller images; add .dockerignore (ignore data/, notebooks/).
5. MLflow or Weights & Biases Intro
MLflow (open-source, free, Databricks-backed) — tracks experiments, logs models, registry, serving.
Quickstart:
```python
import mlflow
import mlflow.xgboost  # (use mlflow.sklearn for sklearn models)

mlflow.set_experiment("Telco_Churn_2026")

with mlflow.start_run():
    # Train model...
    mlflow.log_param("learning_rate", 0.05)
    mlflow.log_metric("roc_auc", 0.875)
    mlflow.log_artifact("feature_importance.png")
    mlflow.xgboost.log_model(xgb, "churn_model")
```
UI: mlflow ui → localhost:5000 — compare runs, register model.
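Loading a logged or registered model back is one call. A sketch (the run ID and registry name/version below are placeholders):

```python
import mlflow.xgboost
import mlflow.pyfunc

# From a specific run (copy the run ID from the MLflow UI)
model = mlflow.xgboost.load_model("runs:/<run_id>/churn_model")

# Or from the Model Registry after registering it (placeholder name/version)
model = mlflow.pyfunc.load_model("models:/TelcoChurn/1")
```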
Weights & Biases (W&B) — cloud-first, beautiful UI, collaboration, sweeps (hyperparam tuning).
```python
import wandb

wandb.init(project="telco-churn-2026", entity="your-username")
wandb.config.update({"learning_rate": 0.05, "n_estimators": 300})

# Train...
wandb.log({"roc_auc": 0.875, "loss": 0.32})
# wandb.watch(model)  # gradient tracking (PyTorch models only)
wandb.finish()
```
Free tier is generous; teams love it for sweeps + reports.
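A minimal sweep sketch (the train() body and logged metric are placeholders; wandb.sweep and wandb.agent are the real entry points):

```python
import wandb

sweep_config = {
    "method": "bayes",
    "metric": {"name": "roc_auc", "goal": "maximize"},
    "parameters": {
        "learning_rate": {"min": 0.01, "max": 0.3},
        "n_estimators": {"values": [100, 300, 500]},
    },
}

def train():
    run = wandb.init()
    lr = run.config.learning_rate  # use sweep-chosen params to train
    # ... train the model here ...
    wandb.log({"roc_auc": 0.87})  # placeholder metric

sweep_id = wandb.sweep(sweep_config, project="telco-churn-2026")
wandb.agent(sweep_id, function=train, count=10)
```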
Choice 2026: MLflow if open-source/self-hosted; W&B if you want zero-setup UI + collaboration.
6. Cloud Platforms Overview (AWS SageMaker, GCP Vertex AI, Azure ML)
AWS SageMaker (market leader, ~34% cloud share by some 2025–2026 estimates) — full MLOps suite: notebooks, training, endpoints, monitoring. Strong for custom algorithms, with Inferentia chips for cheap inference.
GCP Vertex AI (~22%) — intuitive UI, strong AutoML, TPUs for fast training, tight BigQuery integration. Great for data-heavy/NLP work.
Azure ML (~29%) — best if you're on the Microsoft stack (Teams, Power BI, Purview governance). Confidential computing; strong in regulated industries.
Quick comparison (2026 vibes):
| Aspect | AWS SageMaker | GCP Vertex AI | Azure ML |
|---|---|---|---|
| Market Share (2025) | ~34% | ~22% | ~29% |
| Best For | Scale, custom, AWS ecosystem | AutoML, data integration, UI | Microsoft shops, compliance |
| Training Speed | GPU/Trainium good | TPU fastest for some | Good, but less specialized |
| Cost | Savings Plans up to 64% off | Pay-per-use; data costs can climb | Complex pricing, but enterprise deals |
| Ease for Beginners | Steep (many services) | Most intuitive | Good notebooks |
| GenAI/LLM | Bedrock + SageMaker | Gemini + Vertex | OpenAI + Azure AI |
For you in Airoli 2026: Start with free tier (GCP Vertex or Azure) — easy notebooks. If job at AWS-heavy company → SageMaker. All have free credits.
Final Project Tip: Dockerize your FastAPI churn API → push to Docker Hub → deploy to Render/Fly.io (free tier) or cloud (SageMaker endpoint). Track with MLflow/W&B. Add Streamlit frontend. Boom — production portfolio piece!
That’s Chapter 12 — you’re now job-ready for deployment!
