Slides 🖼️ Week 3: CI/CD for ML
Learning Objectives

- Learn the basics of CI/CD
- Leverage the power of CI/CD tools for ML projects with CML
- Integrate an ML model into the FastAPI framework
- Build and test a Docker container running a web API service
- Deploy the resulting Docker container to the cloud

Steps

Introduction to GitHub Actions and CML

- Introduction to GitHub Actions
- Introduction to CML

CI/CD: Automatic reporting for model-related changes

- Add PERSONAL_ACCESS_TOKEN, AWS_ACCESS_KEY_ID and AWS_SECRET_ACCESS_KEY to GitHub secrets: https://docs.github.com/en/actions/security-guides/encrypted-secrets
  - For AWS credentials, see https://docs.aws.amazon.com/sdk-for-javascript/v2/developer-guide/getting-your-credentials.html
  - For PERSONAL_ACCESS_TOKEN:
    - Generate a new personal access token under GitHub developer settings
    - In the “Note” field, type PERSONAL_ACCESS_TOKEN
    - Select the repo scope
    - Click “Generate token” and copy the token
    - In your GitHub repository and/or organization, navigate to Settings -> Secrets -> New repository/organization secret
    - In the “Name” field, type PERSONAL_ACCESS_TOKEN
    - In the “Value” field, paste the token
    - Click “Add secret”
- Create .github/workflows/train-model.yaml

      name: train-model

      # Run only when data, code, parameters, or DVC files change.
      on:
        push:
          paths:
            - "data/**"
            - "src/**"
            - "params.yaml"
            - "dvc.*"

      jobs:
        train-model:
          runs-on: ubuntu-latest
          environment: cloud
          permissions:
            contents: read
            id-token: write
          steps:
            - uses: actions/checkout@v3
              with:
                # Empty on push events, so checkout falls back to the pushed commit.
                ref: ${{ github.event.pull_request.head.sha }}
            - uses: iterative/setup-cml@v1
            - uses: actions/setup-python@v2
              with:
                python-version: "3.10"
            - uses: actions/setup-node@v1
              with:
                node-version: '16'
            - name: SetupGitUser
              run: cml ci
              env:
                REPO_TOKEN: ${{ secrets.PERSONAL_ACCESS_TOKEN }}
            - name: TrainModel
              env:
                REPO_TOKEN: ${{ secrets.PERSONAL_ACCESS_TOKEN }}
                AWS_ACCESS_KEY_ID: ${{ secrets.AWS_ACCESS_KEY_ID }}
                AWS_SECRET_ACCESS_KEY: ${{ secrets.AWS_SECRET_ACCESS_KEY }}
              run: |
                pip install -r requirements.txt
                dvc pull
                dvc exp run
                dvc push

                # Create CML report
                echo "## Metrics" >> report.md
                dvc metrics show --md >> report.md
                echo "## Feature Importances" >> report.md
                csv2md reports/feat_imp.csv >> report.md
                echo "## Confusion Matrix" >> report.md
                echo '![](reports/figures/cm.png)' >> report.md
                cml comment create report.md

- Push the workflow file with git
- Modify some model parameters (e.g. max_depth), rerun the pipeline (dvc exp run), and push the changes to the DVC remote and to git
- Review the GitHub Actions runs

Web App Development

- Create the web application src/app/main.py

      import json
      import sys
      from pathlib import Path

      import uvicorn

      # Make src/ importable so that `utils` resolves when this file is run directly.
      src_path = Path(__file__).parent.parent.resolve()
      sys.path.append(str(src_path))

      from typing import List

      import pandas as pd
      from fastapi import Body, FastAPI
      from fastapi.middleware.cors import CORSMiddleware
      from joblib import load
      from pydantic import BaseModel
      from utils.load_params import load_params

      app = FastAPI()

      # https://fastapi.tiangolo.com/tutorial/cors/#use-corsmiddleware
      app.add_middleware(
          CORSMiddleware,
          allow_origins=["*"],
          allow_credentials=True,
          allow_methods=["*"],
          allow_headers=["*"],
      )

      # Load the trained model from the path configured in params.yaml.
      params = load_params(params_path='params.yaml')
      model_path = params.train.model_path
      feat_cols = params.base.feat_cols
      model = load(filename=model_path)


      class Customer(BaseModel):
          CreditScore: int
          Age: int
          Tenure: int
          Balance: float
          NumOfProducts: int
          HasCrCard: int
          IsActiveMember: int
          EstimatedSalary: float


      class Request(BaseModel):
          data: List[Customer]


      @app.post("/predict")
      async def predict(info: Request = Body(..., example={
          "data": [
              {
                  "CreditScore": 619,
                  "Age": 42,
                  "Tenure": 2,
                  "Balance": 0,
                  "NumOfProducts": 1,
                  "HasCrCard": 1,
                  "IsActiveMember": 1,
                  "EstimatedSalary": 101348.88
              },
              {
                  "CreditScore": 699,
                  "Age": 39,
                  "Tenure": 21,
                  "Balance": 0,
                  "NumOfProducts": 2,
                  "HasCrCard": 0,
                  "IsActiveMember": 0,
                  "EstimatedSalary": 93826.63
              }
          ]
      })):
          # Convert the validated request body into a DataFrame, one row per customer.
          json_list = json.loads(info.json())
          data = json_list['data']
          input_data = pd.DataFrame(data)
          # Column 0 of predict_proba is the probability of the first class in model.classes_.
          probs = model.predict_proba(input_data)[:, 0]
          probs = probs.tolist()
          return probs


      if __name__ == "__main__":
          uvicorn.run(app, host="0.0.0.0", port=8000)

- Test API
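One way to exercise the /predict endpoint is a small standard-library client, sketched below. It assumes the app from src/app/main.py is already running and reachable at http://localhost:8000 (the host name is an assumption; the port comes from the uvicorn.run call). The names API_URL, SAMPLE_CUSTOMER, build_request and predict_remote are hypothetical helpers introduced only for this illustration; the sample record is copied from the example payload in main.py.

```python
import json
import urllib.request
from typing import Any, Dict, List

# Assumption: the FastAPI app is served locally on port 8000.
API_URL = "http://localhost:8000/predict"

# Sample record copied from the example payload in src/app/main.py.
SAMPLE_CUSTOMER: Dict[str, Any] = {
    "CreditScore": 619,
    "Age": 42,
    "Tenure": 2,
    "Balance": 0,
    "NumOfProducts": 1,
    "HasCrCard": 1,
    "IsActiveMember": 1,
    "EstimatedSalary": 101348.88,
}


def build_request(
    customers: List[Dict[str, Any]], url: str = API_URL
) -> urllib.request.Request:
    """Wrap customer records in the {"data": [...]} body that /predict expects."""
    body = json.dumps({"data": customers}).encode("utf-8")
    return urllib.request.Request(
        url,
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def predict_remote(
    customers: List[Dict[str, Any]], url: str = API_URL, timeout: float = 10.0
) -> List[float]:
    """POST the records to the running API and return the decoded JSON response."""
    request = build_request(customers, url)
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return json.loads(response.read().decode("utf-8"))
```

With the server running, calling `predict_remote([SAMPLE_CUSTOMER])` should return a list with one probability per submitted customer. The interactive documentation that FastAPI generates at /docs offers the same check from a browser.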
...