<?xml version="1.0" encoding="utf-8" standalone="yes"?><rss version="2.0" xmlns:atom="http://www.w3.org/2005/Atom"><channel><title>Home on Alex Kim's blog</title><link>https://alex000kim.com/</link><description>Recent content in Home on Alex Kim's blog</description><generator>Hugo -- gohugo.io</generator><language>en-us</language><lastBuildDate>Tue, 31 Mar 2026 00:00:00 +0000</lastBuildDate><atom:link href="https://alex000kim.com/index.xml" rel="self" type="application/rss+xml"/><item><title>The Claude Code Source Leak: fake tools, frustration regexes, undercover mode, and more</title><link>https://alex000kim.com/posts/2026-03-31-claude-code-source-leak/</link><pubDate>Tue, 31 Mar 2026 00:00:00 +0000</pubDate><guid>https://alex000kim.com/posts/2026-03-31-claude-code-source-leak/</guid><description>&lt;hr>
&lt;p>&lt;strong>Update:&lt;/strong> see HN discussions about this post: &lt;a href="https://news.ycombinator.com/item?id=47586778">https://news.ycombinator.com/item?id=47586778&lt;/a>&lt;/p>
&lt;hr>
&lt;p>I use Claude Code daily, so when &lt;a href="https://x.com/Fried_rice/status/2038894956459290963">Chaofan Shou&lt;/a> noticed earlier today that Anthropic had shipped a &lt;code>.map&lt;/code> file alongside their Claude Code npm package, one containing the full, readable source code of the CLI tool, I immediately wanted to look inside. The package has since been pulled, but not before the code was widely mirrored, &lt;a href="https://github.com/alex000kim/claude-code">including by me&lt;/a>, and picked apart on &lt;a href="https://news.ycombinator.com/item?id=47584540">Hacker News&lt;/a>.&lt;/p>
&lt;p>This is Anthropic&amp;rsquo;s second accidental exposure in a week (the model spec leak was just days ago), and some people on Twitter are starting to wonder if someone inside is doing this on purpose. Probably not, but it&amp;rsquo;s a bad look either way. The timing is hard to ignore: just ten days ago, Anthropic &lt;a href="https://github.com/anomalyco/opencode/pull/18186">sent legal threats to OpenCode&lt;/a>, forcing them to remove built-in Claude authentication because third-party tools were using Claude Code&amp;rsquo;s internal APIs to access Opus at subscription rates instead of pay-per-token pricing. That &lt;a href="https://news.ycombinator.com/item?id=47444748">whole saga&lt;/a> makes some of the findings below more pointed.&lt;/p></description></item><item><title>How to Set Up Work and Personal Git Profiles</title><link>https://alex000kim.com/posts/2025-07-25-git-profiles/</link><pubDate>Fri, 25 Jul 2025 00:00:00 +0000</pubDate><guid>https://alex000kim.com/posts/2025-07-25-git-profiles/</guid><description>&lt;p>Tired of accidentally committing to work repos with your personal email? Here&amp;rsquo;s how to automatically use the right Git profile based on where your repositories live.&lt;/p>
&lt;h2 id="1-generate-ssh-keys">1. Generate SSH Keys&lt;/h2>
&lt;p>Create separate keys for each account:&lt;/p>
&lt;div class="highlight">&lt;pre tabindex="0" style="color:#f8f8f2;background-color:#272822;-moz-tab-size:4;-o-tab-size:4;tab-size:4;">&lt;code class="language-bash" data-lang="bash">&lt;span style="display:flex;">&lt;span>&lt;span style="color:#75715e"># Work key&lt;/span>
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span>ssh-keygen -t ed25519 -C &lt;span style="color:#e6db74">&amp;#34;work@company.com&amp;#34;&lt;/span> -f ~/.ssh/id_ed25519_work
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span>
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span>&lt;span style="color:#75715e"># Personal key&lt;/span>
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span>ssh-keygen -t ed25519 -C &lt;span style="color:#e6db74">&amp;#34;personal@gmail.com&amp;#34;&lt;/span> -f ~/.ssh/id_ed25519_personal
&lt;/span>&lt;/span>&lt;/code>&lt;/pre>&lt;/div>&lt;h2 id="2-configure-ssh-hosts">2. Configure SSH Hosts&lt;/h2>
&lt;p>Edit &lt;code>~/.ssh/config&lt;/code>:&lt;/p>
&lt;pre tabindex="0">&lt;code># Work GitHub
Host github-work
HostName github.com
User git
IdentityFile ~/.ssh/id_ed25519_work
IdentitiesOnly yes
# Personal GitHub (default)
Host github.com
HostName github.com
User git
IdentityFile ~/.ssh/id_ed25519_personal
IdentitiesOnly yes
&lt;/code>&lt;/pre>&lt;h2 id="3-set-up-git-profiles">3. Set Up Git Profiles&lt;/h2>
&lt;p>&lt;strong>Global config&lt;/strong> (&lt;code>~/.gitconfig&lt;/code>):&lt;/p></description></item><item><title>US Tariffs, DeepSeek and OpenAI</title><link>https://alex000kim.com/posts/2025-02-02-us-tariffs-deepseek-o3/</link><pubDate>Sun, 02 Feb 2025 00:00:00 +0000</pubDate><guid>https://alex000kim.com/posts/2025-02-02-us-tariffs-deepseek-o3/</guid><description>&lt;p>This week the new Trump administration announced new tariffs on key trading partners, Canada and Mexico.
Both countries vowed to retaliate, and tensions are high:&lt;/p>
&lt;p>&lt;a href="https://www.bbc.com/news/articles/cn4z23kndlyo">&amp;ldquo;Canada imposes 25% tariffs in trade war with US&amp;rdquo;&lt;/a>&lt;/p>
&lt;p>Living in Canada, I am obviously curious about the impact this will have on both sides of the border.
I am a self-described economics nerd: I never studied economics formally, but I read and listen to economics content whenever I have spare time.
Another topic that&amp;rsquo;s been on my mind recently is the release of the DeepSeek R1 model and the release of the OpenAI O3 model that followed.
Both are SOTA (as of Feb 2025) LLMs that were trained to &amp;ldquo;reason&amp;rdquo; before providing an answer, via chain-of-thought reinforcement learning:&lt;/p></description></item><item><title>Orchestrating LLM Fine-tuning on Kubernetes with SkyPilot and MLflow: A Complete Guide</title><link>https://alex000kim.com/posts/2025-01-11-llm-fine-tune-skypilot-mlflow/</link><pubDate>Sat, 11 Jan 2025 00:00:00 +0000</pubDate><guid>https://alex000kim.com/posts/2025-01-11-llm-fine-tune-skypilot-mlflow/</guid><description>&lt;p>Training and fine-tuning Large Language Models (LLMs) requires significant computational resources and careful experiment tracking. While many focus on the modeling aspects, efficiently managing compute resources and experiment tracking is equally important for successful ML projects. This guide demonstrates how to use &lt;a href="https://github.com/skypilot-org/skypilot">SkyPilot&lt;/a> and &lt;a href="https://github.com/mlflow/mlflow">MLflow&lt;/a>, two open-source tools, to orchestrate LLM fine-tuning jobs effectively.&lt;/p>
&lt;h2 id="an-open-source-stack-for-llm-fine-tuning">An open-source stack for LLM fine-tuning&lt;/h2>
&lt;p>Modern LLM fine-tuning workflows involve multiple moving parts:&lt;/p>
&lt;ul>
&lt;li>Resource orchestration across different cloud providers&lt;/li>
&lt;li>Environment setup and dependency management&lt;/li>
&lt;li>Experiment tracking and monitoring&lt;/li>
&lt;li>Distributed training coordination&lt;/li>
&lt;li>System metrics collection&lt;/li>
&lt;/ul>
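&lt;p>These moving parts are typically captured in a single SkyPilot task file; a minimal sketch (the GPU type and script name are placeholders, not taken from the original post):&lt;/p>

```yaml
# task.yaml -- launched with `sky launch task.yaml`
resources:
  accelerators: A100:1        # placeholder GPU request
setup: |
  pip install -r requirements.txt
run: |
  python finetune.py          # placeholder training script
```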
&lt;p>Using &lt;a href="https://github.com/skypilot-org/skypilot/">SkyPilot&lt;/a> for resource orchestration and &lt;a href="https://github.com/mlflow/mlflow">MLflow&lt;/a> for experiment tracking provides an easy-to-use and fully open-source stack for managing these complexities.&lt;/p></description></item><item><title>Kubernetes Mental Model</title><link>https://alex000kim.com/posts/2025-01-04-k8s-mental-model/</link><pubDate>Sat, 04 Jan 2025 00:00:00 +0000</pubDate><guid>https://alex000kim.com/posts/2025-01-04-k8s-mental-model/</guid><description>&lt;p>I am preparing for my CKAD (Certified Kubernetes Application Developer) exam.
Below is the mental model that helps me make sense of Kubernetes concepts. I hope it helps you too.&lt;/p>
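&lt;p>As a concrete example of the declarative model discussed below, here is a minimal Deployment manifest asking Kubernetes to keep three replicas running (the image name is a placeholder):&lt;/p>

```yaml
apiVersion: apps/v1
kind: Deployment
metadata:
  name: my-service
spec:
  replicas: 3                 # desired state: three running copies
  selector:
    matchLabels:
      app: my-service
  template:
    metadata:
      labels:
        app: my-service
    spec:
      containers:
        - name: my-service
          image: example/my-service:1.0   # placeholder image
```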
&lt;h2 id="the-big-picture-kubernetes-as-an-orchestrator">The Big Picture: Kubernetes as an Orchestrator&lt;/h2>
&lt;p>&lt;strong>What is Kubernetes?&lt;/strong>&lt;br>
Kubernetes is an automation system for deploying and managing containerized applications at scale. Rather than manually handling each container, you define your desired state—like “I want three replicas of my service running.” Kubernetes ensures this state remains true even if servers fail or traffic surges.&lt;/p></description></item><item><title>Intro to SLURM for ML Practitioners</title><link>https://alex000kim.com/posts/2024-11-24-intro-to-slurm-for-ml/</link><pubDate>Sun, 24 Nov 2024 00:00:00 +0000</pubDate><guid>https://alex000kim.com/posts/2024-11-24-intro-to-slurm-for-ml/</guid><description>&lt;p>&lt;a href="https://slurm.schedmd.com/documentation.html">&lt;strong>SLURM&lt;/strong>&lt;/a> (Simple Linux Utility for Resource Management) is an open-source workload manager designed to schedule and manage jobs on large clusters. In the world of LLMs, SLURM has seen a resurgence in popularity due to the increased demand for training large models and scaling them to multiple nodes.&lt;/p>
&lt;p>This guide will introduce the fundamental concepts of SLURM, common commands and script structures, and show advanced scenarios like distributed multi-node training. I&amp;rsquo;ll also share some useful tips and tricks.&lt;/p></description></item><item><title>Experiments with OpenAI's Function Calling</title><link>https://alex000kim.com/posts/2024-05-05-function-calling-experiments/</link><pubDate>Sun, 05 May 2024 00:00:00 +0000</pubDate><guid>https://alex000kim.com/posts/2024-05-05-function-calling-experiments/</guid><description>&lt;h2 id="intro">Intro&lt;/h2>
&lt;p>This notebook (also on &lt;a href="https://github.com/alex000kim/openai-function-calling-demo-sqlite/blob/main/FunctionCalling.ipynb">github&lt;/a>) demonstrates how to use &lt;a href="https://platform.openai.com/docs/guides/function-calling">Function Calling&lt;/a> functionality with the OpenAI API.&lt;/p>
&lt;p>In this demo, we&amp;rsquo;ll use the Northwind database to convert natural language queries into SQL:&lt;/p>
&lt;pre tabindex="0">&lt;code>&amp;#34;What is the total revenue for each product in the database?&amp;#34; -&amp;gt;
-&amp;gt; &amp;#34;SELECT ... FROM ...&amp;#34; -&amp;gt; DataFrame
&lt;/code>&lt;/pre>&lt;p>There will be two function calling examples:&lt;/p>
&lt;ol>
&lt;li>A simple one-step function call: we&amp;rsquo;ll put the database schema into the system prompt and then use function calling to convert a natural language query into SQL.&lt;/li>
&lt;li>A two-step function call that first gets the schema of the database and then converts a natural language query into SQL.&lt;/li>
&lt;/ol>
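&lt;p>In both cases, the function the model can &amp;ldquo;call&amp;rdquo; is described to the API as a JSON schema. A sketch of what such a tool definition might look like (names and descriptions are illustrative, not taken from the notebook):&lt;/p>

```python
# Illustrative tool definition for OpenAI function calling: the model is told
# it may call `run_sql` with a single string argument containing the query.
sql_tool = {
    'type': 'function',
    'function': {
        'name': 'run_sql',
        'description': 'Execute a SQL query against the Northwind SQLite database',
        'parameters': {
            'type': 'object',
            'properties': {
                'query': {
                    'type': 'string',
                    'description': 'A valid SQLite SELECT statement',
                },
            },
            'required': ['query'],
        },
    },
}
```

&lt;p>A dict like this is what gets passed via the &lt;code>tools&lt;/code> parameter of a chat completion request.&lt;/p>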
&lt;p>At the end, we&amp;rsquo;ll compare the two approaches and do a quick-and-dirty evaluation of the results using a hand-curated list of questions and their expected SQL queries in &lt;a href="eval_questions.csv">&lt;code>eval_questions.csv&lt;/code>&lt;/a>.&lt;/p></description></item><item><title>Fine-Tuning Large Language Models with a Production-Grade Pipeline</title><link>https://alex000kim.com/posts/2023-09-08-finetune-llm-pipeline-dvc-skypilot/</link><pubDate>Fri, 08 Sep 2023 00:00:00 +0000</pubDate><guid>https://alex000kim.com/posts/2023-09-08-finetune-llm-pipeline-dvc-skypilot/</guid><description>&lt;h2 id="introduction---solving-cloud-resources-and-reproducibility-for-llms">Introduction - Solving cloud resources and reproducibility for LLMs&lt;/h2>
&lt;p>A few weeks ago, I wrote a
&lt;a href="https://alex000kim.com/tech/2023-08-10-ml-experiments-in-cloud-skypilot-dvc/">post&lt;/a>
about the challenges of training large ML models, in particular:&lt;/p>
&lt;ol>
&lt;li>the need for more computing power and the complexity of managing cloud
resources;&lt;/li>
&lt;li>the difficulty of keeping track of ML experiments and reproducing results.&lt;/li>
&lt;/ol>
&lt;p>There I proposed a solution to these problems by using
&lt;a href="https://skypilot.readthedocs.io/en/latest/">SkyPilot&lt;/a> and
&lt;a href="https://dvc.org/">DVC&lt;/a> to manage cloud resources and track experiments,
respectively.&lt;/p></description></item><item><title>Trying to Understand Something Difficult? Minimize the Number of Attempts!</title><link>https://alex000kim.com/posts/2023-08-16-minimize-the-number-of-attempts/</link><pubDate>Wed, 16 Aug 2023 00:00:00 +0000</pubDate><guid>https://alex000kim.com/posts/2023-08-16-minimize-the-number-of-attempts/</guid><description>&lt;p>When you&amp;rsquo;re trying to wrap your head around challenging new ideas, your natural
instinct may be to chip away at it in several attempts. Read a bit, take a
break, come back later - like attacking a mountain ascent in short sprints. This
may feel like you&amp;rsquo;re making progress, but you&amp;rsquo;re actually doing yourself a
disservice. Every time you re-engage with the difficult material, you&amp;rsquo;re forcing
your mind to re-enter a state of intense flow. This refocusing requires mental
energy - energy that depletes your limited cognitive resources. Each time you
dip back into flow, you have to:&lt;/p></description></item><item><title>ML experiments in the cloud with SkyPilot and DVC</title><link>https://alex000kim.com/posts/2023-08-10-ml-experiments-in-cloud-skypilot-dvc/</link><pubDate>Thu, 10 Aug 2023 00:00:00 +0000</pubDate><guid>https://alex000kim.com/posts/2023-08-10-ml-experiments-in-cloud-skypilot-dvc/</guid><description>&lt;h2 id="introduction">Introduction&lt;/h2>
&lt;p>One of the things that makes machine learning hard is that you have to run a lot
of experiments. You have to try different models, different data sets, different
hyperparameters, different features. And each experiment can take a long time to
run, especially if you’re working on deep learning problems. You can’t just run
them on your laptop or desktop. You need more computing power, and you need it
fast.&lt;/p></description></item><item><title>Why Sales Engineers Exist</title><link>https://alex000kim.com/posts/2023-08-02-why-sales-engineers-exist/</link><pubDate>Wed, 02 Aug 2023 00:00:00 +0000</pubDate><guid>https://alex000kim.com/posts/2023-08-02-why-sales-engineers-exist/</guid><description>&lt;p>Paul Graham once &lt;a href="http://www.paulgraham.com/growth.html">wrote&lt;/a> that &amp;ldquo;a startup
is a company designed to grow fast.&amp;rdquo; For a startup to grow fast, especially a
B2B startup selling SaaS products to other companies, it needs sales engineers.&lt;/p>
&lt;p>Sales engineers are the technical experts who work closely with sales teams.
They exist because selling enterprise software requires deep technical knowledge
that salespeople typically don&amp;rsquo;t have. The salesperson establishes rapport with
the prospect and understands their business needs. But when it comes time to
demo the product and answer nitty-gritty technical questions, that&amp;rsquo;s where the
sales engineer comes in.&lt;/p></description></item><item><title>Don’t know what to do next? Teach!</title><link>https://alex000kim.com/posts/2023-07-31-teach/</link><pubDate>Mon, 31 Jul 2023 00:00:00 +0000</pubDate><guid>https://alex000kim.com/posts/2023-07-31-teach/</guid><description>&lt;p>We (esp. those in the tech industry) have an instinctual aversion to pedagogy.
&amp;ldquo;Those who can&amp;rsquo;t do, teach,&amp;rdquo; goes the old saying. But the truth is the opposite.
Teaching does not indicate an inability to do something. On the contrary,
teaching empowers and enables ability. It is the highest form of understanding.
When you teach something, you gain a deeper mastery over the subject matter than
you would as a passive student. As Leonardo da Vinci said:&lt;/p></description></item><item><title>About</title><link>https://alex000kim.com/about/</link><pubDate>Mon, 01 Jan 0001 00:00:00 +0000</pubDate><guid>https://alex000kim.com/about/</guid><description>&lt;img src="profile-pic.png" width="250"/>
&lt;p>Dad | ML Engineer | Technical Instructor | Consultant | Community Builder&lt;/p>
&lt;hr>
&lt;p>Hi! I’m Alex.&lt;/p>
&lt;p>I am an ML engineer by trade and a physicist by degree. I’ve lived in different parts of the world and am currently based in Montreal, 🇨🇦. I speak English, Russian, French and Spanish.
I am also a husband and a dad of two 👧🏻👦🏻.&lt;/p>
&lt;p>Most of my recent work sits at the intersection of MLOps and generative AI - building distributed training infrastructure for large models and deploying LLM-powered systems in production.&lt;/p></description></item><item><title>Week 1: Kick-starting an ML project</title><link>https://alex000kim.com/other/oreilly-mlops/week1/</link><pubDate>Mon, 01 Jan 0001 00:00:00 +0000</pubDate><guid>https://alex000kim.com/other/oreilly-mlops/week1/</guid><description>&lt;h2 id="slides-">Slides 🖼️&lt;/h2>
&lt;ul>
&lt;li>&lt;a href="week1.pdf">Week 1: ML project lifecycle and MLOps best practices&lt;/a>&lt;/li>
&lt;/ul>
&lt;h2 id="learning-objectives">Learning objectives&lt;/h2>
&lt;ul>
&lt;li>Understand the core philosophy behind MLOps ideas&lt;/li>
&lt;li>Apply best practices for establishing ML project structure and dependency management&lt;/li>
&lt;li>Manage project dependencies with pip and virtualenv&lt;/li>
&lt;li>Version datasets with DVC&lt;/li>
&lt;/ul>
&lt;h2 id="project-introduction">Project Introduction&lt;/h2>
&lt;h3 id="problem-description-and-dataset">&lt;strong>Problem Description and Dataset&lt;/strong>&lt;/h3>
&lt;p>This dataset contains 10,000 records, each corresponding to a customer of a bank. The target is &lt;code>Exited&lt;/code>, a binary variable indicating whether the customer decided to leave the bank. There are row and customer identifiers, four columns describing personal information about the customer (surname, location, gender, and age), and other columns with account-related information (such as credit score, current balance, and whether they are an active member, among others).&lt;/p></description></item><item><title>Week 2: ML Pipelines, Reproducibility and Experimentation</title><link>https://alex000kim.com/other/oreilly-mlops/week2/</link><pubDate>Mon, 01 Jan 0001 00:00:00 +0000</pubDate><guid>https://alex000kim.com/other/oreilly-mlops/week2/</guid><description>&lt;h2 id="slides-">Slides 🖼️&lt;/h2>
&lt;p>&lt;a href="week2.pdf">Week 2: ML Pipelines, Reproducibility and Experimentation&lt;/a>&lt;/p>
&lt;h2 id="learning-objectives">Learning objectives&lt;/h2>
&lt;ul>
&lt;li>Refactor a Jupyter notebook into a reproducible ML pipeline&lt;/li>
&lt;li>Version artifacts of an ML pipeline in a remote storage&lt;/li>
&lt;li>Iterate over a large number of ML experiments in a disciplined way&lt;/li>
&lt;/ul>
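&lt;p>The pipeline stages below read &lt;code>params.yaml&lt;/code> through a small &lt;code>load_params&lt;/code> helper imported as &lt;code>utils.load_params&lt;/code>. A sketch of what such a helper could look like (this is an assumption about the course&amp;rsquo;s actual implementation; it uses attribute-style access, as the later code does, and requires PyYAML):&lt;/p>

```python
# Hypothetical sketch of utils/load_params.py: parse params.yaml into an
# object supporting attribute access, e.g. params.train.model_path.
from types import SimpleNamespace

import yaml  # PyYAML


def load_params(params_path):
    def to_ns(value):
        # Recursively turn dicts into namespaces; lists and scalars pass through.
        if isinstance(value, dict):
            return SimpleNamespace(**{k: to_ns(v) for k, v in value.items()})
        return value

    with open(params_path) as f:
        return to_ns(yaml.safe_load(f))
```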
&lt;h3 id="steps">Steps&lt;/h3>
&lt;h3 id="refactor-jupyter-notebook-in-a-dvc-pipeline">Refactor Jupyter notebook in a DVC pipeline&lt;/h3>
&lt;ul>
&lt;li>
&lt;p>Docs: &lt;a href="https://dvc.org/doc/start/data-pipelines">https://dvc.org/doc/start/data-pipelines&lt;/a>&lt;/p>
&lt;/li>
&lt;li>
&lt;p>Create the following file so that pipeline stages can read parameter values from it&lt;/p>
&lt;details>
&lt;summary>&lt;code>params.yaml&lt;/code>&lt;/summary>
&lt;p>
&lt;pre>&lt;code>base:
  project: bank_customer_churn
  raw_data_dir: data/raw
  countries:
    - France
    - Spain
  feat_cols:
    - CreditScore
    - Age
    - Tenure
    - Balance
    - NumOfProducts
    - HasCrCard
    - IsActiveMember
    - EstimatedSalary
  targ_col: Exited
  random_state: 42
data_split:
  test_size: 0.25
  processed_data_dir: data/processed
train:
  model_type: randomforest
  model_dir: models
  model_path: models/clf-model.joblib
  params:
    n_estimators: 200
    max_depth: 20
&lt;/code>&lt;/pre>
&lt;/p></description></item><item><title>Week 3: CI/CD for ML and ML-based Web API</title><link>https://alex000kim.com/other/oreilly-mlops/week3/</link><pubDate>Mon, 01 Jan 0001 00:00:00 +0000</pubDate><guid>https://alex000kim.com/other/oreilly-mlops/week3/</guid><description>&lt;h2 id="slides-">Slides 🖼️&lt;/h2>
&lt;p>&lt;a href="week3.pdf">Week 3: CI/CD for ML&lt;/a>&lt;/p>
&lt;h2 id="learning-objectives">Learning Objectives&lt;/h2>
&lt;ul>
&lt;li>Learn the basics of CI/CD&lt;/li>
&lt;li>Leverage the power of CI/CD tools for ML projects with CML&lt;/li>
&lt;li>Integrate an ML model into the FastAPI framework&lt;/li>
&lt;li>Build and test a Docker container running a web API service&lt;/li>
&lt;li>Deploy the resulting Docker container to the cloud&lt;/li>
&lt;/ul>
&lt;h2 id="steps">Steps&lt;/h2>
&lt;h3 id="introduction-to-github-actions-and-cml">Introduction to GitHub Actions and CML&lt;/h3>
&lt;ul>
&lt;li>Introduction to GitHub Actions&lt;/li>
&lt;li>Introduction to CML&lt;/li>
&lt;/ul>
&lt;h3 id="cicd-automatic-reporting-for-model-related-changes">CI/CD: Automatic reporting for model-related changes&lt;/h3>
&lt;ul>
&lt;li>Add &lt;code>PERSONAL_ACCESS_TOKEN&lt;/code> , &lt;code>AWS_ACCESS_KEY_ID&lt;/code> and &lt;code>AWS_SECRET_ACCESS_KEY&lt;/code> to GH secrets: &lt;a href="https://docs.github.com/en/actions/security-guides/encrypted-secrets">https://docs.github.com/en/actions/security-guides/encrypted-secrets&lt;/a>
&lt;ul>
&lt;li>For AWS credentials, see &lt;a href="https://docs.aws.amazon.com/sdk-for-javascript/v2/developer-guide/getting-your-credentials.html">https://docs.aws.amazon.com/sdk-for-javascript/v2/developer-guide/getting-your-credentials.html&lt;/a>&lt;/li>
&lt;li>For &lt;code>PERSONAL_ACCESS_TOKEN&lt;/code>:
&lt;ul>
&lt;li>&lt;a href="https://github.com/settings/tokens/new">Generate a new personal access token&lt;/a> under GitHub developer settings&lt;/li>
&lt;li>in the &amp;ldquo;Note&amp;rdquo; field, type &lt;code>PERSONAL_ACCESS_TOKEN&lt;/code>&lt;/li>
&lt;li>select repo scope&lt;/li>
&lt;li>click &amp;ldquo;Generate token&amp;rdquo; and copy it&lt;/li>
&lt;li>In your GitHub repository and/or organization, navigate to &lt;strong>Settings&lt;/strong> -&amp;gt; &lt;strong>Secrets&lt;/strong> -&amp;gt; &lt;strong>New repository/organization secret&lt;/strong>&lt;/li>
&lt;li>in the &amp;ldquo;Name&amp;rdquo; field, type &lt;code>PERSONAL_ACCESS_TOKEN&lt;/code>&lt;/li>
&lt;li>in the &amp;ldquo;Value&amp;rdquo; field, paste the token&lt;/li>
&lt;li>click &lt;strong>Add secret&lt;/strong>&lt;/li>
&lt;/ul>
&lt;/li>
&lt;/ul>
&lt;/li>
&lt;/ul>
&lt;details>
&lt;summary>Create &lt;code>.github/workflows/train-model.yaml&lt;/code>&lt;/summary>
&lt;pre>&lt;code>name: train-model
on:
  push:
    paths:
      - &amp;quot;data/**&amp;quot;
      - &amp;quot;src/**&amp;quot;
      - &amp;quot;params.yaml&amp;quot;
      - &amp;quot;dvc.*&amp;quot;
jobs:
  train-model:
    runs-on: ubuntu-latest
    environment: cloud
    permissions:
      contents: read
      id-token: write
    steps:
      - uses: actions/checkout@v3
        with:
          ref: ${{ github.event.pull_request.head.sha }}
      - uses: iterative/setup-cml@v1
      - uses: actions/setup-python@v2
        with:
          python-version: &amp;quot;3.10&amp;quot;
      - uses: actions/setup-node@v1
        with:
          node-version: '16'
      - name: SetupGitUser
        run: cml ci
        env:
          REPO_TOKEN: ${{ secrets.PERSONAL_ACCESS_TOKEN }}
      - name: TrainModel
        env:
          REPO_TOKEN: ${{ secrets.PERSONAL_ACCESS_TOKEN }}
          AWS_ACCESS_KEY_ID: ${{ secrets.AWS_ACCESS_KEY_ID }}
          AWS_SECRET_ACCESS_KEY: ${{ secrets.AWS_SECRET_ACCESS_KEY }}
        run: |
          pip install -r requirements.txt
          dvc pull
          dvc exp run
          dvc push
          # Create CML report
          echo &amp;quot;## Metrics&amp;quot; &amp;gt;&amp;gt; report.md
          dvc metrics show --md &amp;gt;&amp;gt; report.md
          echo &amp;quot;## Feature Importances&amp;quot; &amp;gt;&amp;gt; report.md
          csv2md reports/feat_imp.csv &amp;gt;&amp;gt; report.md
          echo &amp;quot;## Confusion Matrix&amp;quot; &amp;gt;&amp;gt; report.md
          echo '![](reports/figures/cm.png)' &amp;gt;&amp;gt; report.md
          cml comment create report.md
&lt;/code>&lt;/pre>
&lt;/details>
&lt;ul>
&lt;li>Push workflow file with git&lt;/li>
&lt;li>Modify some model parameters (e.g. &lt;code>max_depth&lt;/code>), rerun the pipeline (&lt;code>dvc exp run&lt;/code>) and push changes to DVC remote and git&lt;/li>
&lt;li>Review GitHub Actions runs&lt;/li>
&lt;/ul>
&lt;h3 id="web-app-development">Web App Development&lt;/h3>
&lt;details>
&lt;summary>Create web application &lt;code>src/app/main.py&lt;/code>&lt;/summary>
&lt;pre>&lt;code>import json
import sys
from pathlib import Path

import uvicorn

src_path = Path(__file__).parent.parent.resolve()
sys.path.append(str(src_path))

from typing import List

import pandas as pd
from fastapi import Body, FastAPI
from fastapi.middleware.cors import CORSMiddleware
from joblib import load
from pydantic import BaseModel

from utils.load_params import load_params

app = FastAPI()

# https://fastapi.tiangolo.com/tutorial/cors/#use-corsmiddleware
app.add_middleware(
    CORSMiddleware,
    allow_origins=[&amp;quot;*&amp;quot;],
    allow_credentials=True,
    allow_methods=[&amp;quot;*&amp;quot;],
    allow_headers=[&amp;quot;*&amp;quot;],
)

params = load_params(params_path='params.yaml')
model_path = params.train.model_path
feat_cols = params.base.feat_cols
model = load(filename=model_path)


class Customer(BaseModel):
    CreditScore: int
    Age: int
    Tenure: int
    Balance: float
    NumOfProducts: int
    HasCrCard: int
    IsActiveMember: int
    EstimatedSalary: float


class Request(BaseModel):
    data: List[Customer]


@app.post(&amp;quot;/predict&amp;quot;)
async def predict(info: Request = Body(..., example={
    &amp;quot;data&amp;quot;: [
        {
            &amp;quot;CreditScore&amp;quot;: 619,
            &amp;quot;Age&amp;quot;: 42,
            &amp;quot;Tenure&amp;quot;: 2,
            &amp;quot;Balance&amp;quot;: 0,
            &amp;quot;NumOfProducts&amp;quot;: 1,
            &amp;quot;HasCrCard&amp;quot;: 1,
            &amp;quot;IsActiveMember&amp;quot;: 1,
            &amp;quot;EstimatedSalary&amp;quot;: 101348.88
        },
        {
            &amp;quot;CreditScore&amp;quot;: 699,
            &amp;quot;Age&amp;quot;: 39,
            &amp;quot;Tenure&amp;quot;: 21,
            &amp;quot;Balance&amp;quot;: 0,
            &amp;quot;NumOfProducts&amp;quot;: 2,
            &amp;quot;HasCrCard&amp;quot;: 0,
            &amp;quot;IsActiveMember&amp;quot;: 0,
            &amp;quot;EstimatedSalary&amp;quot;: 93826.63
        }
    ]
})):
    json_list = json.loads(info.json())
    data = json_list['data']
    input_data = pd.DataFrame(data)
    # predict_proba returns [P(class 0), P(class 1)] per row;
    # column 0 is the probability that the customer stays (Exited=0)
    probs = model.predict_proba(input_data)[:, 0]
    probs = probs.tolist()
    return probs


if __name__ == &amp;quot;__main__&amp;quot;:
    uvicorn.run(app, host=&amp;quot;0.0.0.0&amp;quot;, port=8000)
&lt;/code>&lt;/pre>
&lt;/details>
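&lt;p>One way to exercise the &lt;code>/predict&lt;/code> endpoint above is a short client script (a sketch using only the standard library; it assumes the app is running locally on port 8000):&lt;/p>

```python
# Minimal client for the /predict endpoint defined above.
import json
from urllib.request import Request, urlopen

payload = {'data': [{'CreditScore': 619, 'Age': 42, 'Tenure': 2,
                     'Balance': 0.0, 'NumOfProducts': 1, 'HasCrCard': 1,
                     'IsActiveMember': 1, 'EstimatedSalary': 101348.88}]}


def post_predict(payload, url='http://localhost:8000/predict'):
    # POST the JSON payload and decode the returned list of probabilities.
    req = Request(url, data=json.dumps(payload).encode(),
                  headers={'Content-Type': 'application/json'})
    with urlopen(req) as resp:
        return json.loads(resp.read())
```

&lt;p>&lt;code>post_predict(payload)&lt;/code> returns one probability per customer in the request.&lt;/p>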
&lt;ul>
&lt;li>
&lt;p>Test API&lt;/p></description></item><item><title>Week 4: Monitoring for ML Projects</title><link>https://alex000kim.com/other/oreilly-mlops/week4/</link><pubDate>Mon, 01 Jan 0001 00:00:00 +0000</pubDate><guid>https://alex000kim.com/other/oreilly-mlops/week4/</guid><description>&lt;h2 id="slides-">Slides 🖼️&lt;/h2>
&lt;p>&lt;a href="week4.pdf">Week 4: Data Drift Monitoring for ML Projects&lt;/a>&lt;/p>
&lt;h2 id="learning-objectives">Learning Objectives&lt;/h2>
&lt;ul>
&lt;li>Distinguish between application monitoring and ML monitoring&lt;/li>
&lt;li>Use Alibi Detect framework to detect data drift&lt;/li>
&lt;/ul>
&lt;h2 id="steps">Steps&lt;/h2>
&lt;h3 id="introduction-to-data-drift-monitoring">Introduction to Data Drift Monitoring&lt;/h3>
&lt;ul>
&lt;li>
&lt;p>What’s data drift and why do we need to monitor for it?&lt;/p>
&lt;/li>
&lt;li>
&lt;p>Intro to Alibi Detect&lt;/p>
&lt;/li>
&lt;li>
&lt;p>Add &lt;code>Churn_Modelling_Germany.csv&lt;/code> to &lt;code>data/more_data/&lt;/code>&lt;/p>
&lt;p>&lt;a href="Churn_Modelling_Germany.csv">Churn_Modelling_Germany.csv&lt;/a>&lt;/p>
&lt;/li>
&lt;li>
&lt;p>Add &lt;code>/more_data&lt;/code> entry to &lt;code>data/.gitignore&lt;/code>&lt;/p>
&lt;/li>
&lt;li>
&lt;p>Create and explore &lt;code>notebooks/DriftDetection.ipynb&lt;/code>&lt;/p>
&lt;p>&lt;a href="DriftDetection.ipynb">DriftDetection.ipynb&lt;/a>&lt;/p>
&lt;/li>
&lt;/ul>
&lt;h3 id="incorporate-drift-detection-into-the-dvc-pipeline">Incorporate drift detection into the DVC pipeline&lt;/h3>
&lt;details>
&lt;summary>Create &lt;code>src/stages/drift_detector.py&lt;/code>&lt;/summary>
&lt;pre>&lt;code>import sys
from pathlib import Path

src_path = Path(__file__).parent.parent.resolve()
sys.path.append(str(src_path))

import argparse

import pandas as pd
from alibi_detect.cd import TabularDrift
from alibi_detect.saving import save_detector
from joblib import load

from utils.load_params import load_params


def train_drift_detector(params):
    processed_data_dir = Path(params.data_split.processed_data_dir)
    model_dir = Path(params.train.model_dir)
    model_path = Path(params.train.model_path)
    model = load(model_path)
    X_test = pd.read_pickle(processed_data_dir/'X_test.pkl')
    X_train = pd.read_pickle(processed_data_dir/'X_train.pkl')
    X = pd.concat([X_test, X_train])
    feat_names = X.columns.tolist()
    preprocessor = model[:-1]
    categories_per_feature = {i: None for i, k in enumerate(feat_names) if k.startswith('cat__')}
    cd = TabularDrift(X,
                      p_val=.05,
                      preprocess_fn=preprocessor.transform,
                      categories_per_feature=categories_per_feature)
    detector_path = model_dir/'drift_detector'
    save_detector(cd, detector_path)


if __name__ == '__main__':
    args_parser = argparse.ArgumentParser()
    args_parser.add_argument('--config', dest='config', required=True)
    args = args_parser.parse_args()
    params = load_params(params_path=args.config)
    train_drift_detector(params)
&lt;/code>&lt;/pre>
&lt;/details>
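&lt;p>Under the hood, &lt;code>TabularDrift&lt;/code> runs a two-sample test per feature (Kolmogorov&amp;ndash;Smirnov for numerical columns). The core idea in plain Python, as a simplified illustration rather than Alibi Detect&amp;rsquo;s actual implementation:&lt;/p>

```python
# Simplified univariate drift check: the two-sample Kolmogorov-Smirnov
# statistic is the largest gap between two empirical CDFs.
def ecdf(sample, x):
    # Fraction of sample values not exceeding x.
    return 1 - sum(v > x for v in sample) / len(sample)


def ks_statistic(ref, new):
    # Evaluate the ECDF gap at every observed value and take the maximum.
    points = sorted(set(ref) | set(new))
    return max(abs(ecdf(ref, x) - ecdf(new, x)) for x in points)


reference = [0.10, 0.20, 0.25, 0.30, 0.40]   # training-time feature values
drifted = [0.60, 0.70, 0.75, 0.80, 0.90]     # new, shifted feature values
```

&lt;p>Here &lt;code>ks_statistic(reference, drifted)&lt;/code> is 1.0 (the samples don&amp;rsquo;t overlap), while &lt;code>ks_statistic(reference, reference)&lt;/code> is 0.0; Alibi Detect turns such statistics into p-values and corrects for testing many features at once.&lt;/p>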
&lt;ul>
&lt;li>
&lt;p>Add &lt;code>train_drift_detector&lt;/code> stage to &lt;code>dvc.yaml&lt;/code>&lt;/p></description></item><item><title>Work With Me</title><link>https://alex000kim.com/work_with_me/</link><pubDate>Mon, 01 Jan 0001 00:00:00 +0000</pubDate><guid>https://alex000kim.com/work_with_me/</guid><description>&lt;p>I am available for consulting and training engagements.&lt;/p>
&lt;p>Organizations I&amp;rsquo;ve worked with include:
&lt;a href="https://www.mckinsey.com/">McKinsey &amp;amp; Company&lt;/a>,
&lt;a href="https://www.concordiabootcamps.ca/">Concordia Bootcamps&lt;/a>,
&lt;a href="https://www.splunk.com/">Splunk&lt;/a>,
&lt;a href="https://www.bloomtech.com/">BloomTech&lt;/a>,
&lt;a href="https://tripleten.com/">TripleTen&lt;/a>,
&lt;a href="https://www.oreilly.com/">O&amp;rsquo;Reilly Media&lt;/a>,
&lt;a href="https://iterative.ai/">Iterative.ai&lt;/a>,
&lt;a href="https://platacard.mx/en">Platacard.mx&lt;/a>,
&lt;a href="https://www.nebius.com/">Nebius&lt;/a>,
&lt;a href="https://www.grainger.com/">Grainger&lt;/a> and others.&lt;/p>
&lt;div style="display: flex; flex-wrap: wrap; align-items: center; gap: 16px; margin: 24px 0;">
&lt;img src="logos/mckinsey.png" alt="McKinsey &amp; Company" style="height: 60px; background: white; padding: 12px 14px; border-radius: 8px;"/>
&lt;img src="logos/splunk_logo.png" alt="Splunk" style="height: 60px; background: white; padding: 12px 14px; border-radius: 8px;"/>
&lt;img src="logos/grainger.png" alt="Grainger" style="height: 60px; background: white; padding: 12px 14px; border-radius: 8px;"/>
&lt;img src="logos/bloomtech.png" alt="BloomTech" style="height: 60px; background: white; padding: 12px 14px; border-radius: 8px;"/>
&lt;img src="logos/plata_logo.png" alt="Platacard" style="height: 60px; background: white; padding: 12px 14px; border-radius: 8px;"/>
&lt;img src="logos/tripleten.png" alt="TripleTen" style="height: 60px; background: white; padding: 12px 14px; border-radius: 8px;"/>
&lt;img src="logos/nebius.png" alt="Nebius" style="height: 60px; background: white; padding: 12px 14px; border-radius: 8px;"/>
&lt;img src="logos/concordia_logo.png" alt="Concordia University" style="height: 60px; background: white; padding: 12px 14px; border-radius: 8px;"/>
&lt;/div>
&lt;p>&lt;b>Client testimonials&lt;/b>&lt;/p>
&lt;ul>
&lt;li>
&lt;p>&lt;a href="https://www.linkedin.com/in/anton-tarasenko-230a6877/">Anton Tarasenko&lt;/a>, Co-founder and CBPO of &lt;a href="https://platacard.mx/en">Platacard.mx&lt;/a>:&lt;/p></description></item></channel></rss>