Slides 🖼️
Learning Objectives
- Learn the basics of CI/CD
- Leverage the power of CI/CD tools for ML projects with CML
- Integrate an ML model into the FastAPI framework
- Build and test a Docker container running a web API service
- Deploy the resulting Docker container to cloud
Steps
Introduction to GitHub Actions and CML
- Introduction to GitHub Actions
- Introduction to CML
CI/CD: Automatic reporting for model-related changes
- Add PERSONAL_ACCESS_TOKEN, AWS_ACCESS_KEY_ID and AWS_SECRET_ACCESS_KEY to GH secrets: https://docs.github.com/en/actions/security-guides/encrypted-secrets
  - For AWS credentials, see https://docs.aws.amazon.com/sdk-for-javascript/v2/developer-guide/getting-your-credentials.html
  - For PERSONAL_ACCESS_TOKEN:
    - Generate a new personal access token under GitHub developer settings
      - in the “Note” field, type PERSONAL_ACCESS_TOKEN
      - select repo scope
      - click “Generate token” and copy it
    - In your GitHub repository and/or organization, navigate to Settings -> Secrets -> New repository/organization secret
      - in the “Name” field, type PERSONAL_ACCESS_TOKEN
      - in the “Value” field, paste the token
      - click Add secret
Create .github/workflows/train-model.yaml
name: train-model
on:
  push:
    paths:
      - "data/**"
      - "src/**"
      - "params.yaml"
      - "dvc.*"
jobs:
  train-model:
    runs-on: ubuntu-latest
    environment: cloud
    permissions:
      contents: read
      id-token: write
    steps:
      - uses: actions/checkout@v3
        with:
          ref: ${{ github.event.pull_request.head.sha }}
      - uses: iterative/setup-cml@v1
      - uses: actions/setup-python@v2
        with:
          python-version: "3.10"
      - uses: actions/setup-node@v1
        with:
          node-version: '16'
      - name: SetupGitUser
        run: cml ci
        env:
          REPO_TOKEN: ${{ secrets.PERSONAL_ACCESS_TOKEN }}
      - name: TrainModel
        env:
          REPO_TOKEN: ${{ secrets.PERSONAL_ACCESS_TOKEN }}
          AWS_ACCESS_KEY_ID: ${{ secrets.AWS_ACCESS_KEY_ID }}
          AWS_SECRET_ACCESS_KEY: ${{ secrets.AWS_SECRET_ACCESS_KEY }}
        run: |
          pip install -r requirements.txt
          dvc pull
          dvc exp run
          dvc push
          # Create CML report
          echo "## Metrics" >> report.md
          dvc metrics show --md >> report.md
          echo "## Feature Importances" >> report.md
          csv2md reports/feat_imp.csv >> report.md
          echo "## Confusion Matrix" >> report.md
          echo '![](reports/figures/cm.png)' >> report.md
          cml comment create report.md
- Push workflow file with git
- Modify some model parameters (e.g. max_depth), rerun the pipeline (dvc exp run) and push changes to DVC remote and git
- Review GitHub Actions runs
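The report-building commands in the TrainModel step can be mirrored in Python, which may help when previewing the report locally before pushing. This is a minimal sketch and not part of the workshop code: csv_to_markdown and build_report are hypothetical helper names, and the exact table layout produced by csv2md may differ from what is generated here.

```python
import csv
import io


def csv_to_markdown(csv_text: str) -> str:
    """Render CSV text as a pipe-delimited Markdown table.

    Hypothetical stand-in for the csv2md CLI; the real tool may pad
    or align columns differently.
    """
    rows = list(csv.reader(io.StringIO(csv_text)))
    if not rows:
        return ""
    header, body = rows[0], rows[1:]
    lines = [
        "| " + " | ".join(header) + " |",
        "| " + " | ".join("---" for _ in header) + " |",
    ]
    lines += ["| " + " | ".join(row) + " |" for row in body]
    return "\n".join(lines)


def build_report(metrics_md: str, feat_imp_csv: str, cm_path: str) -> str:
    """Assemble the same three sections the workflow appends to report.md."""
    sections = [
        "## Metrics",
        metrics_md.strip(),
        "## Feature Importances",
        csv_to_markdown(feat_imp_csv),
        "## Confusion Matrix",
        f"![]({cm_path})",
    ]
    return "\n".join(sections) + "\n"
```

Here metrics_md is assumed to be the text printed by dvc metrics show --md, and feat_imp_csv the contents of reports/feat_imp.csv; writing the returned string to report.md would give a file of the same overall shape as the one the workflow posts with cml comment create.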
Web App Development
Create web application src/app/main.py
import json
import sys
from pathlib import Path
from typing import List

import uvicorn

# Make the src/ directory importable so that utils.load_params resolves
src_path = Path(__file__).parent.parent.resolve()
sys.path.append(str(src_path))

import pandas as pd
from fastapi import Body, FastAPI
from fastapi.middleware.cors import CORSMiddleware
from joblib import load
from pydantic import BaseModel

from utils.load_params import load_params

app = FastAPI()

# https://fastapi.tiangolo.com/tutorial/cors/#use-corsmiddleware
app.add_middleware(
    CORSMiddleware,
    allow_origins=["*"],
    allow_credentials=True,
    allow_methods=["*"],
    allow_headers=["*"],
)

# Load the trained model from the path configured in params.yaml
params = load_params(params_path='params.yaml')
model_path = params.train.model_path
feat_cols = params.base.feat_cols
model = load(filename=model_path)


# Features of a single customer
class Customer(BaseModel):
    CreditScore: int
    Age: int
    Tenure: int
    Balance: float
    NumOfProducts: int
    HasCrCard: int
    IsActiveMember: int
    EstimatedSalary: float


# Request body: a batch of customers to score
class Request(BaseModel):
    data: List[Customer]


@app.post("/predict")
async def predict(info: Request = Body(..., example={
    "data": [
        {
            "CreditScore": 619,
            "Age": 42,
            "Tenure": 2,
            "Balance": 0,
            "NumOfProducts": 1,
            "HasCrCard": 1,
            "IsActiveMember": 1,
            "EstimatedSalary": 101348.88
        },
        {
            "CreditScore": 699,
            "Age": 39,
            "Tenure": 21,
            "Balance": 0,
            "NumOfProducts": 2,
            "HasCrCard": 0,
            "IsActiveMember": 0,
            "EstimatedSalary": 93826.63
        }
    ]
})):
    json_list = json.loads(info.json())
    data = json_list['data']
    input_data = pd.DataFrame(data)
    # Column 0 of predict_proba: one probability per customer
    probs = model.predict_proba(input_data)[:,0]
    probs = probs.tolist()
    return probs


if __name__ == "__main__":
    uvicorn.run(app, host="0.0.0.0", port=8000)
Test API
Run web app
uvicorn src.app.main:app --host 0.0.0.0 --port 8080
In a separate shell session (while the app is still running) submit the following test POST request
curl -X 'POST' \
  'http://0.0.0.0:8080/predict' \
  -H 'accept: application/json' \
  -H 'Content-Type: application/json' \
  -d '{
  "data": [
    {
      "CreditScore": 400,
      "Age": 50,
      "Tenure": 2,
      "Balance": 0,
      "NumOfProducts": 1,
      "HasCrCard": 1,
      "IsActiveMember": 1,
      "EstimatedSalary": 101348.88
    },
    {
      "CreditScore": 699,
      "Age": 39,
      "Tenure": 21,
      "Balance": 0,
      "NumOfProducts": 2,
      "HasCrCard": 0,
      "IsActiveMember": 0,
      "EstimatedSalary": 93826.63
    }
  ]
}'
The output of this command should be a list of two numbers e.g.:
[0.6912380116959064,0.9133333333333333]
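The same test request can be sent from Python instead of curl. The following is a minimal sketch using only the standard library; build_predict_request, predict and EXAMPLE_CUSTOMERS are hypothetical names introduced for illustration, and the base URL is assumed to point at the app started in the previous step.

```python
import json
import urllib.request
from typing import Dict, List, Union

Customer = Dict[str, Union[int, float]]

# Same two customers as in the curl test request
EXAMPLE_CUSTOMERS: List[Customer] = [
    {"CreditScore": 400, "Age": 50, "Tenure": 2, "Balance": 0,
     "NumOfProducts": 1, "HasCrCard": 1, "IsActiveMember": 1,
     "EstimatedSalary": 101348.88},
    {"CreditScore": 699, "Age": 39, "Tenure": 21, "Balance": 0,
     "NumOfProducts": 2, "HasCrCard": 0, "IsActiveMember": 0,
     "EstimatedSalary": 93826.63},
]


def build_predict_request(base_url: str, customers: List[Customer]) -> urllib.request.Request:
    """Build (but do not send) a JSON POST request for the /predict endpoint."""
    payload = json.dumps({"data": customers}).encode("utf-8")
    return urllib.request.Request(
        url=f"{base_url.rstrip('/')}/predict",
        data=payload,
        headers={"Content-Type": "application/json", "Accept": "application/json"},
        method="POST",
    )


def predict(base_url: str, customers: List[Customer], timeout_s: float = 10.0) -> List[float]:
    """Send the request and decode the JSON list returned by the API."""
    request = build_predict_request(base_url, customers)
    with urllib.request.urlopen(request, timeout=timeout_s) as response:
        return json.loads(response.read().decode("utf-8"))
```

With the app running locally, calling predict("http://localhost:8080", EXAMPLE_CUSTOMERS) should return a list with one number per customer.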
Dockerize Web App
Put web app into Dockerfile
FROM python:3.10
ENV PYTHONPATH=/app
WORKDIR /app
COPY . .
RUN pip install -U pip
RUN pip install -r requirements.txt
EXPOSE 8080
CMD uvicorn src.app.main:app --host 0.0.0.0 --port 8080
Ignore copying unnecessary directories by creating a .dockerignore file
.dvc
data
notebooks
reports
Build and run docker container locally
docker build . -t churn-api-image
docker run -p 8080:8080 churn-api-image
Test web app running in docker container
Push files to git
Deploy to fly.io
Create an account on fly.io. You need to enter your credit card information, but you won’t be charged because this project will stay under the free-tier limits.
Install flyctl: https://fly.io/docs/hands-on/install-flyctl/
curl -L https://fly.io/install.sh | sh
Authenticate
flyctl auth login
Launch application (this will take some time)
fly launch
# accept all default values
# except Yes to creating Postgres db and "deploy now"
To verify that it worked, run the following command
curl -X 'POST' \
'https://<YOUR_APP_NAME>.fly.dev/predict' \
-H 'accept: application/json' \
-H 'Content-Type: application/json' \
-d '{
"data": [
{
"CreditScore": 400,
"Age": 50,
"Tenure": 2,
"Balance": 0,
"NumOfProducts": 1,
"HasCrCard": 1,
"IsActiveMember": 1,
"EstimatedSalary": 101348.88
},
{
"CreditScore": 699,
"Age": 39,
"Tenure": 21,
"Balance": 0,
"NumOfProducts": 2,
"HasCrCard": 0,
"IsActiveMember": 0,
"EstimatedSalary": 93826.63
}
]
}'
The fly launch command will also generate a fly.toml file that will look similar to the one below
# fly.toml file generated for wispy-sun-5093 on 2022-09-04T23:05:06-04:00
app = "wispy-sun-5093"
kill_signal = "SIGINT"
kill_timeout = 5
processes = []
[env]
[experimental]
allowed_public_ports = []
auto_rollback = true
[[services]]
http_checks = []
internal_port = 8080
processes = ["app"]
protocol = "tcp"
script_checks = []
[services.concurrency]
hard_limit = 25
soft_limit = 20
type = "connections"
[[services.ports]]
force_https = true
handlers = ["http"]
port = 80
[[services.ports]]
handlers = ["tls", "http"]
port = 443
[[services.tcp_checks]]
grace_period = "1s"
interval = "15s"
restart_limit = 0
timeout = "2s"
- Push fly.toml to git
CI/CD: Automatic application deployment
- Add FLY_API_TOKEN to GH secrets: https://fly.io/docs/app-guides/continuous-deployment-with-github-actions/
Create .github/workflows/deploy-api.yaml
name: deploy
on:
  push:
    tags:
      - "deploy-v*"
jobs:
  Deploy:
    runs-on: ubuntu-latest
    environment: cloud
    permissions:
      contents: read
      id-token: write
    steps:
      - uses: actions/checkout@v3
      - uses: iterative/setup-dvc@v1
      - uses: actions/setup-python@v2
        with:
          python-version: "3.10"
      - name: PullModel
        env:
          REPO_TOKEN: ${{ secrets.PERSONAL_ACCESS_TOKEN }}
          AWS_ACCESS_KEY_ID: ${{ secrets.AWS_ACCESS_KEY_ID }}
          AWS_SECRET_ACCESS_KEY: ${{ secrets.AWS_SECRET_ACCESS_KEY }}
        run: |
          pip install dvc[s3]
          dvc pull models/clf-model.joblib
      - uses: superfly/flyctl-actions/setup-flyctl@master
      - name: DeployApp
        env:
          FLY_API_TOKEN: ${{ secrets.FLY_API_TOKEN }}
        run: flyctl deploy --remote-only
Push workflow file to git
Create and push a new git tag version
git tag deploy-v0.0.1
git push origin deploy-v0.0.1
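Only tags matching the workflow's deploy-v* filter trigger a deployment, so a mistyped tag name silently does nothing. A rough local check can be sketched with Python's fnmatch module. Note the assumptions: triggers_deploy is a hypothetical helper, fnmatch only approximates GitHub's filter syntax, and the extra "/" check reflects the documented rule that * in a GitHub filter pattern does not match a slash. The sketch is meant for slash-free patterns such as the one used here.

```python
from fnmatch import fnmatchcase


def triggers_deploy(tag: str, pattern: str = "deploy-v*") -> bool:
    """Approximate whether a tag name matches a slash-free tag filter pattern."""
    # In GitHub filter patterns "*" does not match "/", whereas fnmatch's "*"
    # does, so tags containing a slash are rejected up front.
    if "/" in tag:
        return False
    return fnmatchcase(tag, pattern)
```

For example, triggers_deploy("deploy-v0.0.1") is true, while triggers_deploy("v0.0.1") is false.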
Review GitHub Actions runs
Verify new deployment
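A fresh deployment may take a short while to start answering requests, so verification can be scripted as a retry loop. This is a minimal sketch under stated assumptions: is_valid_prediction and wait_until_healthy are hypothetical helpers, the check assumes the API returns a JSON list with one probability per submitted customer, and fetch is any zero-argument callable that sends the test request to the deployed /predict endpoint and returns the decoded response.

```python
import time
from typing import Callable


def is_valid_prediction(body: object, n_customers: int) -> bool:
    """True if body is a list of n_customers numbers, each within [0, 1]."""
    return (
        isinstance(body, list)
        and len(body) == n_customers
        and all(
            isinstance(p, (int, float)) and not isinstance(p, bool) and 0.0 <= p <= 1.0
            for p in body
        )
    )


def wait_until_healthy(
    fetch: Callable[[], object],
    n_customers: int,
    attempts: int = 5,
    delay_s: float = 2.0,
) -> bool:
    """Call fetch up to `attempts` times until it returns a valid prediction."""
    for attempt in range(attempts):
        try:
            if is_valid_prediction(fetch(), n_customers):
                return True
        except OSError:
            # Connection errors and timeouts (urllib.error.URLError is an
            # OSError subclass) are treated as "not ready yet".
            pass
        if attempt < attempts - 1:
            time.sleep(delay_s)
    return False
```

Passing the request function as a parameter keeps the retry logic independent of the HTTP client, so the loop can be exercised with a stub before pointing it at the real application.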