The Claude Code Source Leak: fake tools, frustration regexes, undercover mode, and more

Update: see HN discussions about this post: https://news.ycombinator.com/item?id=47586778 I use Claude Code daily, so when Chaofan Shou noticed earlier today that Anthropic had shipped a .map file alongside their Claude Code npm package, one containing the full, readable source code of the CLI tool, I immediately wanted to look inside. The package has since been pulled, but not before the code was widely mirrored (including by me) and picked apart on Hacker News. This is Anthropic’s second accidental exposure in a week (the model spec leak was just days ago), and some people on Twitter are starting to wonder if someone inside is doing this on purpose. Probably not, but it’s a bad look either way. The timing is hard to ignore: just ten days ago, Anthropic sent legal threats to OpenCode, forcing them to remove built-in Claude authentication because third-party tools were using Claude Code’s internal APIs to access Opus at subscription rates instead of pay-per-token pricing. That whole saga makes some of the findings below more pointed. ...

March 31, 2026 · 8 min · Alex Kim

US Tariffs, DeepSeek and OpenAI

This week the new Trump administration announced new tariffs on key trading partners, Canada and Mexico. Both countries vowed to retaliate, and tensions are high: “Canada imposes 25% tariffs in trade war with US” Living in Canada, I am obviously curious about the impact this will all have on both sides of the border. I am a self-described economics nerd: having never studied economics, I try to read and listen about it when I have spare time. One other topic that’s been on my mind recently is the recent release of the DeepSeek R1 model and the release of the OpenAI O3 model that followed. Both are SOTA (as of Feb 2025) LLM models that were trained to “reason” before providing an answer through chain-of-thought Reinforcement Learning: ...

February 2, 2025 · 2 min · Alex Kim

Orchestrating LLM Fine-tuning on Kubernetes with SkyPilot and MLflow: A Complete Guide

Training and fine-tuning Large Language Models (LLMs) requires significant computational resources and careful experiment tracking. While many focus on the modeling aspects, efficiently managing compute resources and experiment tracking is equally important for successful ML projects. This guide demonstrates how to use SkyPilot and MLflow, two open-source tools, to orchestrate LLM fine-tuning jobs effectively.

An open-source stack for LLM fine-tuning

Modern LLM fine-tuning workflows involve multiple moving parts:

- Resource orchestration across different cloud providers
- Environment setup and dependency management
- Experiment tracking and monitoring
- Distributed training coordination
- System metrics collection

Using SkyPilot for resource orchestration and MLflow for experiment tracking provides an easy-to-use and fully open-source stack for managing these complexities. ...
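As a rough sketch of how these pieces fit together, a minimal SkyPilot task file might look like the following. The file name, accelerator choice, tracking-server URL, and training script are illustrative assumptions, not details from the post:

```yaml
# finetune.yaml — minimal SkyPilot task sketch (names and values are illustrative)
resources:
  accelerators: A100:1          # assumed GPU type; use whatever your clouds offer

workdir: .                      # sync the local project directory to the remote VM

setup: |
  pip install -r requirements.txt

envs:
  MLFLOW_TRACKING_URI: http://mlflow.example.com:5000  # assumed MLflow server

run: |
  python finetune.py            # assumed training script that logs to MLflow
```

Launching it with `sky launch finetune.yaml` lets SkyPilot handle provisioning across clouds, while the training script reports parameters and metrics to MLflow via the tracking URI.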

January 11, 2025 · 5 min · Alex Kim
Fine-Tuning Large Language Models with a Production-Grade Pipeline

Introduction - Solving cloud resources and reproducibility for LLMs A few weeks ago, I wrote a post about the challenges of training large ML models, in particular: the need for more computing power and the complexity of managing cloud resources; the difficulty of keeping track of ML experiments and reproducing results. There I proposed a solution to these problems by using SkyPilot and DVC to manage cloud resources and track experiments, respectively. ...
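On the reproducibility side, a DVC pipeline stage ties code, data, parameters, and outputs together so an experiment can be rerun exactly. A minimal `dvc.yaml` sketch (stage, file, and parameter names here are hypothetical, not taken from the post) could be:

```yaml
# dvc.yaml — sketch of a tracked training stage (names are illustrative)
stages:
  train:
    cmd: python train.py --config params.yaml
    deps:
      - train.py
      - data/processed        # versioned dataset directory
    params:
      - train.learning_rate
      - train.epochs
    outs:
      - models/model.pt       # DVC caches and versions this artifact
    metrics:
      - metrics.json:
          cache: false        # keep metrics in git for easy comparison
```

Running `dvc repro` then re-executes the stage only when a dependency or parameter changes, which is what makes results reproducible across machines.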

September 8, 2023 · 11 min · Alex Kim