
Week 3: CI/CD for ML

Learning Objectives

  • Learn the basics of CI/CD
  • Leverage the power of CI/CD tools for ML projects with CML
  • Integrate an ML model into the FastAPI framework
  • Build and test a Docker container running a web API service
  • Deploy the resulting Docker container to cloud

Steps

Introduction to GitHub Actions and CML

  • Introduction to GitHub Actions
  • Introduction to CML
Create .github/workflows/train-model.yaml
name: train-model
on:
  push:
    paths:
      - "data/**"
      - "src/**"
      - "params.yaml"
      - "dvc.*"
jobs:
  train-model:
    runs-on: ubuntu-latest
    environment: cloud
    permissions:
      contents: read
      id-token: write
    steps:
      - uses: actions/checkout@v3
        with:
          ref: ${{ github.event.pull_request.head.sha }}
      - uses: iterative/setup-cml@v1
      - uses: actions/setup-python@v2
        with:
          python-version: "3.10"
      - uses: actions/setup-node@v1
        with:
          node-version: '16'
      - name: SetupGitUser
        run: cml ci
        env:
          REPO_TOKEN: ${{ secrets.PERSONAL_ACCESS_TOKEN }}
      - name: TrainModel
        env:
          REPO_TOKEN: ${{ secrets.PERSONAL_ACCESS_TOKEN }}
          AWS_ACCESS_KEY_ID: ${{ secrets.AWS_ACCESS_KEY_ID }}
          AWS_SECRET_ACCESS_KEY: ${{ secrets.AWS_SECRET_ACCESS_KEY }}
        run: |
          pip install -r requirements.txt
          dvc pull
          dvc repro
          dvc push
          # Create CML report
          echo "## Metrics" >> report.md
          dvc metrics show --md >> report.md
          echo "## Feature Importances" >> report.md
          csv2md reports/feat_imp.csv >> report.md
          echo "## Confusion Matrix" >> report.md
          echo '![](reports/figures/cm.png)' >> report.md
          cml comment create report.md
  • Push the workflow file to git
  • Modify some model parameters (e.g. max_depth), rerun the pipeline (dvc exp run), and push the changes to the DVC remote and to git
  • Review GitHub Actions runs
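The report-building commands in the TrainModel step can also be mirrored in Python, which may help when debugging the report locally. The following is a minimal sketch, not part of the course code: `csv_to_markdown` and `build_report` are hypothetical helpers, the first one only approximates what csv2md does, and the sample CSV content is purely illustrative.

```python
import csv
import io


def csv_to_markdown(csv_text: str) -> str:
    """Render CSV text as a Markdown table (header, separator, data rows)."""
    rows = list(csv.reader(io.StringIO(csv_text)))
    if not rows:
        return ""
    header, *body = rows
    lines = [
        "| " + " | ".join(header) + " |",
        "| " + " | ".join("---" for _ in header) + " |",
    ]
    lines += ["| " + " | ".join(row) + " |" for row in body]
    return "\n".join(lines)


def build_report(metrics_md: str, feat_imp_csv: str, cm_path: str) -> str:
    """Assemble the same three sections the workflow appends to report.md."""
    sections = [
        "## Metrics",
        metrics_md,
        "## Feature Importances",
        csv_to_markdown(feat_imp_csv),
        "## Confusion Matrix",
        f"![]({cm_path})",
    ]
    return "\n".join(sections) + "\n"


# Illustrative inputs only: in the workflow the real values come from
# `dvc metrics show --md` and reports/feat_imp.csv.
sample_csv = "feature,importance\nAge,0.4\nBalance,0.2\n"
report = build_report(
    "(output of dvc metrics show --md)", sample_csv, "reports/figures/cm.png"
)
print(report)
```

The workflow itself still needs `cml comment create report.md` to post the report; this sketch only covers how the Markdown file is put together.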

Web App Development

Create the web application in src/app/main.py
import json
import sys
from pathlib import Path

import uvicorn

src_path = Path(__file__).parent.parent.resolve()
sys.path.append(str(src_path))

from typing import List

import pandas as pd
from fastapi import Body, FastAPI
from fastapi.middleware.cors import CORSMiddleware
from joblib import load
from pydantic import BaseModel

from utils.load_params import load_params

app = FastAPI()
# https://fastapi.tiangolo.com/tutorial/cors/#use-corsmiddleware
app.add_middleware(
    CORSMiddleware,
    allow_origins=["*"],
    allow_credentials=True,
    allow_methods=["*"],
    allow_headers=["*"],
)

params = load_params(params_path='params.yaml')
model_path = params.train.model_path
feat_cols = params.base.feat_cols
model = load(filename=model_path)

class Customer(BaseModel):
    CreditScore: int
    Age: int
    Tenure: int
    Balance: float
    NumOfProducts: int
    HasCrCard: int
    IsActiveMember: int
    EstimatedSalary: float

class Request(BaseModel):
    data: List[Customer]

@app.post("/predict")
async def predict(info: Request = Body(..., example={
    "data": [
        {
            "CreditScore": 619,
            "Age": 42,
            "Tenure": 2,
            "Balance": 0,
            "NumOfProducts": 1,
            "HasCrCard": 1,
            "IsActiveMember": 1,
            "EstimatedSalary": 101348.88
        },
        {
            "CreditScore": 699,
            "Age": 39,
            "Tenure": 21,
            "Balance": 0,
            "NumOfProducts": 2,
            "HasCrCard": 0,
            "IsActiveMember": 0,
            "EstimatedSalary": 93826.63
        }
    ]
})):
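    # Convert the validated request body into a list of plain dicts
    json_list = json.loads(info.json())
    data = json_list['data']
    input_data = pd.DataFrame(data)
    # For scikit-learn classifiers, column 0 of predict_proba is the
    # probability of the first class listed in model.classes_
    probs = model.predict_proba(input_data)[:,0]
    probs = probs.tolist()
    return probs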

if __name__ == "__main__":
    uvicorn.run(app, host="0.0.0.0", port=8080)
  • Test API

    • Run web app

      uvicorn src.app.main:app --host 0.0.0.0 --port 8080
      
    • In a separate shell session (while the app is still running), submit the following test POST request

      curl -X 'POST' \
        'http://0.0.0.0:8080/predict' \
        -H 'accept: application/json' \
        -H 'Content-Type: application/json' \
        -d '{
        "data": [
          {
            "CreditScore": 400,
            "Age": 50,
            "Tenure": 2,
            "Balance": 0,
            "NumOfProducts": 1,
            "HasCrCard": 1,
            "IsActiveMember": 1,
            "EstimatedSalary": 101348.88
          },
          {
            "CreditScore": 699,
            "Age": 39,
            "Tenure": 21,
            "Balance": 0,
            "NumOfProducts": 2,
            "HasCrCard": 0,
            "IsActiveMember": 0,
            "EstimatedSalary": 93826.63
          }
        ]
      }'
    

    The output of this command should be a list of two numbers, one per submitted customer, e.g. [0.6912380116959064,0.9133333333333333]
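The same test request can be issued from Python instead of curl. The sketch below uses only the standard library and assumes the app is reachable at http://localhost:8080; `build_predict_request`, `check_predictions`, and `predict` are hypothetical helper names, not part of the course code.

```python
import json
import urllib.request

# Assumption: the app from src/app/main.py is listening on this address.
PREDICT_URL = "http://localhost:8080/predict"


def build_predict_request(url: str, customers: list) -> urllib.request.Request:
    """Build the same POST request as the curl command, without sending it."""
    body = json.dumps({"data": customers}).encode("utf-8")
    return urllib.request.Request(
        url,
        data=body,
        headers={"accept": "application/json", "Content-Type": "application/json"},
        method="POST",
    )


def check_predictions(response_text: str, n_customers: int) -> list:
    """Parse the response and check that it holds one probability per customer."""
    probs = json.loads(response_text)
    if not isinstance(probs, list) or len(probs) != n_customers:
        raise ValueError(f"expected a list of {n_customers} numbers, got {probs!r}")
    for p in probs:
        is_number = isinstance(p, (int, float)) and not isinstance(p, bool)
        if not is_number or not 0.0 <= p <= 1.0:
            raise ValueError(f"not a probability: {p!r}")
    return probs


def predict(url: str, customers: list) -> list:
    """Send the request; this requires the web app to be running."""
    request = build_predict_request(url, customers)
    with urllib.request.urlopen(request, timeout=10) as response:
        return check_predictions(response.read().decode("utf-8"), len(customers))


# Same payload as in the curl example.
customers = [
    {"CreditScore": 400, "Age": 50, "Tenure": 2, "Balance": 0,
     "NumOfProducts": 1, "HasCrCard": 1, "IsActiveMember": 1,
     "EstimatedSalary": 101348.88},
    {"CreditScore": 699, "Age": 39, "Tenure": 21, "Balance": 0,
     "NumOfProducts": 2, "HasCrCard": 0, "IsActiveMember": 0,
     "EstimatedSalary": 93826.63},
]
request = build_predict_request(PREDICT_URL, customers)
```

With the app running, `print(predict(PREDICT_URL, customers))` should print a list of two probabilities. Nothing is sent until `predict` is called, so the payload can be inspected offline through `request.data`.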

Dockerize Web App

  • Create a Dockerfile for the web app

    FROM python:3.10
    ENV PYTHONPATH=/app
    WORKDIR /app
    COPY . .
    RUN pip install -U pip 
    RUN pip install -r requirements.txt
    EXPOSE 8080
    CMD uvicorn src.app.main:app --host 0.0.0.0 --port 8080
    
  • Exclude unnecessary directories from the image by creating a .dockerignore file

    .dvc
    data
    notebooks
    reports
    
  • Build and run the Docker container locally

    docker build . -t churn-api-image
    docker run -p 8080:8080 churn-api-image
    
  • Test the web app running in the Docker container by resubmitting the POST request from the Test API step

  • Push files to git
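When testing the container, the first request can fail simply because the server inside it has not finished starting. One way to handle this is to wait until the published port accepts TCP connections before sending the test request. The sketch below is only an illustration: `wait_for_port` is a hypothetical helper, and the host and port in the usage note assume the `docker run -p 8080:8080` mapping shown above.

```python
import socket
import time


def wait_for_port(host: str, port: int, timeout: float = 30.0,
                  interval: float = 0.5) -> bool:
    """Return True once a TCP connection to host:port succeeds, False on timeout."""
    deadline = time.monotonic() + timeout
    while time.monotonic() < deadline:
        try:
            # A successful connect means something is listening on the port.
            with socket.create_connection((host, port), timeout=interval):
                return True
        except OSError:
            # Connection refused or timed out: wait briefly and retry.
            time.sleep(interval)
    return False
```

For example, `wait_for_port("localhost", 8080)` can be called right after `docker run` and before the POST request. An open port only shows that the server is listening; it does not prove that the model loaded correctly, so the POST request is still the real test.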

Deploy to fly.io

  • Create an account on fly.io. You need to enter your credit card information, but you won’t be charged because this project will stay under the free-tier limits.

  • Install flyctl: https://fly.io/docs/hands-on/install-flyctl/

    curl -L https://fly.io/install.sh | sh
    
  • Authenticate

    flyctl auth login
    
  • Launch application (this will take some time)

    fly launch
    # accept all default values,
    # except answer Yes to creating a Postgres db and to "deploy now"
    
To verify that it worked, run the following command:
curl -X 'POST' \
  'https://<YOUR_APP_NAME>.fly.dev/predict' \
  -H 'accept: application/json' \
  -H 'Content-Type: application/json' \
  -d '{
  "data": [
    {
      "CreditScore": 400,
      "Age": 50,
      "Tenure": 2,
      "Balance": 0,
      "NumOfProducts": 1,
      "HasCrCard": 1,
      "IsActiveMember": 1,
      "EstimatedSalary": 101348.88
    },
    {
      "CreditScore": 699,
      "Age": 39,
      "Tenure": 21,
      "Balance": 0,
      "NumOfProducts": 2,
      "HasCrCard": 0,
      "IsActiveMember": 0,
      "EstimatedSalary": 93826.63
    }
  ]
}'
The fly launch command also generates a fly.toml file that looks similar to the one below
# fly.toml file generated for wispy-sun-5093 on 2022-09-04T23:05:06-04:00
app = "wispy-sun-5093"
kill_signal = "SIGINT"
kill_timeout = 5
processes = []

[env]

[experimental]
  allowed_public_ports = []
  auto_rollback = true

[[services]]
  http_checks = []
  internal_port = 8080
  processes = ["app"]
  protocol = "tcp"
  script_checks = []
  [services.concurrency]
    hard_limit = 25
    soft_limit = 20
    type = "connections"

  [[services.ports]]
    force_https = true
    handlers = ["http"]
    port = 80

  [[services.ports]]
    handlers = ["tls", "http"]
    port = 443

  [[services.tcp_checks]]
    grace_period = "1s"
    interval = "15s"
    restart_limit = 0
    timeout = "2s"
  • Push fly.toml to git

CI/CD: Automatic application deployment

Create .github/workflows/deploy-api.yaml
name: deploy
on:
  push:
    tags:
      - "deploy-v*"
jobs:
  Deploy:
    runs-on: ubuntu-latest
    environment: cloud
    permissions:
      contents: read
      id-token: write
    steps:
      - uses: actions/checkout@v3
      - uses: iterative/setup-dvc@v1
      - uses: actions/setup-python@v2
        with:
          python-version: "3.10"
      - name: PullModel
        env:
          REPO_TOKEN: ${{ secrets.PERSONAL_ACCESS_TOKEN }}
          AWS_ACCESS_KEY_ID: ${{ secrets.AWS_ACCESS_KEY_ID }}
          AWS_SECRET_ACCESS_KEY: ${{ secrets.AWS_SECRET_ACCESS_KEY }}
        run: |
          pip install "dvc[s3]"
          dvc pull models/clf-model.joblib
      - uses: superfly/flyctl-actions/setup-flyctl@master
      - name: DeployApp
        env:
          FLY_API_TOKEN: ${{ secrets.FLY_API_TOKEN }}
        run: flyctl deploy --remote-only
  • Push the workflow file to git

  • Create and push a new git version tag

    git tag deploy-v0.0.1
    git push origin deploy-v0.0.1
    
  • Review GitHub Actions runs

  • Verify new deployment
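The deploy workflow only runs for pushed tags that match the "deploy-v*" filter, so a tag such as v0.0.1 would not trigger a deployment. The sketch below approximates that matching rule for patterns that use only the `*` wildcard, which in GitHub's filter syntax matches zero or more characters except `/`. `tag_matches` is a hypothetical helper for illustration, not part of GitHub Actions or git, and it does not handle the other filter wildcards.

```python
import re


def tag_matches(pattern: str, tag: str) -> bool:
    """Approximate a GitHub Actions tag filter that uses only the `*` wildcard.

    `*` is translated to "zero or more characters except `/`"; other
    wildcards (`**`, `?`, `+`, `[]`, `!`) are not handled by this sketch.
    """
    regex = re.escape(pattern).replace(r"\*", "[^/]*")
    return re.fullmatch(regex, tag) is not None


# Check a few candidate tag names against the filter from deploy-api.yaml.
for tag in ["deploy-v0.0.1", "deploy-v1.2.0", "v0.0.1", "deploy-0.0.1"]:
    print(tag, tag_matches("deploy-v*", tag))
```

Only the first two tags in the loop match, which is why the tag in the step above is named deploy-v0.0.1.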