
Why I built 🤦 slop — My coding agent

Every Jedi is required to build their own lightsaber. It is a rite of passage. Every woodworker makes their own jigs. In the same sense, I believe every AI engineer should build their own coding agent. Right now, Claude Code lets you connect to different models, but a single update could take that away. The VC-driven AI subsidies are going to end soon, and so will the openness of these harnesses. This is why I decided to build my own personal coding agent. Here are the choices I made for 🤦 slop.

The agent loop

A coding agent is a small loop. You assemble the context and the available tools, send them to the model, and get back either text or a request to call a tool. You run the tool, append the result, and send everything again. The loop ends when the model stops asking for tools or you hit a stop condition you set yourself.
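To make the shape of that loop concrete, here is a minimal sketch in Rust. It is not slop's actual code: `ModelReply`, `Message`, `Tool`, and `run_agent` are names I made up for illustration, the model is just a closure, and the only stop condition is a hypothetical `max_turns` cap.

```rust
use std::collections::HashMap;

/// What the model returns each turn: plain text or a request to call a tool.
#[derive(Debug, Clone, PartialEq)]
pub enum ModelReply {
    Text(String),
    ToolCall { name: String, input: String },
}

/// One entry in the running transcript that is resent every turn.
#[derive(Debug, Clone, PartialEq)]
pub enum Message {
    User(String),
    Assistant(String),
    ToolResult { name: String, output: String },
}

/// Hypothetical tool signature: string in, string out.
pub type Tool = Box<dyn Fn(&str) -> String>;

/// Runs the loop until the model answers with text or `max_turns` is reached.
/// Returns the final answer (if any) together with the full transcript.
pub fn run_agent(
    mut model: impl FnMut(&[Message]) -> ModelReply,
    tools: &HashMap<String, Tool>,
    task: &str,
    max_turns: usize,
) -> (Option<String>, Vec<Message>) {
    let mut history = vec![Message::User(task.to_string())];
    for _ in 0..max_turns {
        match model(history.as_slice()) {
            // The model stopped asking for tools: the loop ends.
            ModelReply::Text(text) => {
                history.push(Message::Assistant(text.clone()));
                return (Some(text), history);
            }
            // Run the tool, append the result, and go around again.
            ModelReply::ToolCall { name, input } => {
                let output = match tools.get(&name) {
                    Some(tool) => tool(&input),
                    None => format!("error: unknown tool `{name}`"),
                };
                history.push(Message::ToolResult { name, output });
            }
        }
    }
    // Our own stop condition fired before the model finished.
    (None, history)
}
```

An unknown tool is reported back to the model as a tool result instead of aborting the run, which is one of several reasonable choices here.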

The interesting decisions live around that loop rather than inside it. How do you decide a task is actually finished instead of the model just pausing? Do you allow parallel tool calls or force them one at a time? How many times do you retry a failed call before giving up, and do you hand the error back to the model or handle it yourself? I put a provider abstraction in front of the inference API early so that swapping a model is a config change and not a rewrite. I use the genai crate to normalize across LLM providers. I have made some contributions to it along the way, so I understand the architecture and I am comfortable maintaining it. The conversations are stored in a PostgreSQL database. slop itself speaks the Agent Client Protocol (ACP).
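The provider abstraction is the easiest of these to show. The sketch below is an assumption-heavy toy: the `Provider` trait, the two fake providers, and `provider_from_config` are all hypothetical, and none of this is the genai crate's API. It only illustrates the idea that the rest of the agent talks to a trait object chosen from configuration.

```rust
/// Hypothetical provider interface; this is NOT the genai crate's API.
pub trait Provider {
    fn name(&self) -> &str;
    fn complete(&self, prompt: &str) -> Result<String, String>;
}

/// Two stand-in providers so the example runs without any network access.
pub struct EchoProvider;
pub struct ReverseProvider;

impl Provider for EchoProvider {
    fn name(&self) -> &str {
        "echo"
    }
    fn complete(&self, prompt: &str) -> Result<String, String> {
        Ok(prompt.to_string())
    }
}

impl Provider for ReverseProvider {
    fn name(&self) -> &str {
        "reverse"
    }
    fn complete(&self, prompt: &str) -> Result<String, String> {
        Ok(prompt.chars().rev().collect())
    }
}

/// Picks a provider from a `key = value` config string (format is assumed).
pub fn provider_from_config(config: &str) -> Result<Box<dyn Provider>, String> {
    let value = config
        .lines()
        .filter_map(|line| line.split_once('='))
        .find(|(key, _)| key.trim() == "provider")
        .map(|(_, value)| value.trim())
        .ok_or_else(|| "missing `provider` key".to_string())?;
    match value {
        "echo" => Ok(Box::new(EchoProvider)),
        "reverse" => Ok(Box::new(ReverseProvider)),
        other => Err(format!("unknown provider `{other}`")),
    }
}
```

With this shape, changing `provider = echo` to `provider = reverse` swaps the backend without touching the loop.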

Context management

Every turn you pay to resend the system prompt, the tool definitions, the running conversation, and every file and tool output you have pulled in so far. That pile grows quickly, and a full window is both expensive and worse at the actual task. So the real work of context management is deciding what to keep and what to drop.

The usual levers are pruning old tool outputs once they stop being relevant, summarizing earlier turns into a compact form, and pulling files in on demand instead of holding everything resident. This is also where the multi-staged prompt pipelines in slop earn their keep, since you can shape what each stage sees instead of carrying one ever-growing transcript. For long sessions I have opted for a more manual handoff. I keep a close watch on how much of the context window is in use, and I make it easy to summarize the actions taken and the tasks still to be done into a handoff.md, then start a clean session from that file. I prefer this to automatic compaction because I have complete visibility into the plan and the remaining tasks, and I can prompt the handoff accordingly.
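The first lever, pruning old tool outputs, can be sketched in a few lines. This is a minimal illustration rather than what slop does: `Entry`, `prune_tool_outputs`, the `keep_recent` parameter, and the placeholder text are all assumptions of mine.

```rust
/// One entry in the transcript. Only tool outputs are candidates for pruning.
#[derive(Debug, Clone, PartialEq)]
pub enum Entry {
    User(String),
    Assistant(String),
    ToolOutput(String),
}

/// Placeholder left behind so the model still sees that a tool ran.
const PRUNED: &str = "[tool output pruned]";

/// Replaces all but the `keep_recent` newest tool outputs with a placeholder.
/// Returns how many entries were pruned by this call.
pub fn prune_tool_outputs(history: &mut [Entry], keep_recent: usize) -> usize {
    let mut seen = 0;
    let mut pruned = 0;
    // Walk newest to oldest so the most recent outputs are the ones kept.
    for entry in history.iter_mut().rev() {
        if let Entry::ToolOutput(body) = entry {
            seen += 1;
            if seen > keep_recent && body.as_str() != PRUNED {
                *body = PRUNED.to_string();
                pruned += 1;
            }
        }
    }
    pruned
}
```

Keeping a placeholder instead of deleting the entry preserves the order of events in the transcript, at the cost of a few tokens per pruned call.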

Tools and their granularity

The granularity question is whether you give the model a few powerful tools or many narrow ones. One end of the spectrum is a single shell tool that can run anything. Pi does this. The other end is a set of specific tools like read file, edit file, and search. A shell tool is flexible and the model already knows how to use it, but it is hard to sandbox cleanly and hard to show in a readable way. Narrow tools are easier to permission and easier to present, at the cost of you writing and maintaining more of them. More granular tools also cost more to run, because every extra tool call is another round trip that resends the entire history along with the tool result.

Most agents end up somewhere in the middle, and where you land tends to follow from your sandboxing and permission model rather than from taste. slop has basic tools for reading, and for writing I use the udiff format, which lets a single edit write to multiple files. slop leans heavily towards scripting with either shell or Python when running in a sandbox. More on that below. I do not have a formal verification suite (because $$$), but my empirical observations indicate that this could cut costs by up to 50%.
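To show why one udiff can cover several files, here is a small sketch that lists the files a unified diff touches by looking for `---` / `+++` header pairs. `files_in_udiff` is a hypothetical helper, not slop's parser, and it assumes the common `a/` and `b/` path prefixes. A real applier also has to handle hunks, and a removed line followed by an added line could in principle mimic a header pair.

```rust
/// Lists the target files of a unified diff, in order of first appearance.
/// Assumes git-style headers: `--- a/path` followed by `+++ b/path`.
pub fn files_in_udiff(diff: &str) -> Vec<String> {
    let lines: Vec<&str> = diff.lines().collect();
    let mut files: Vec<String> = Vec::new();
    // A file header is a `--- ` line immediately followed by a `+++ ` line.
    for pair in lines.windows(2) {
        let old = pair[0].strip_prefix("--- ");
        let new = pair[1].strip_prefix("+++ ");
        if let (Some(_), Some(new)) = (old, new) {
            // Drop an optional tab-separated timestamp, then the `b/` prefix.
            let path = new.split('\t').next().unwrap_or(new).trim();
            let path = path.strip_prefix("b/").unwrap_or(path);
            // `/dev/null` as the target means the file is being deleted.
            if path != "/dev/null" && !files.iter().any(|f| f.as_str() == path) {
                files.push(path.to_string());
            }
        }
    }
    files
}
```

A helper like this is also handy for the permission prompt, since it tells you up front which files a single edit is about to change.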

Modes

Modes are basically custom system prompts, but they give a coding agent one very important ability: they separate thinking from doing. In plan mode the agent reads, explores, and produces a plan, but it cannot write files or run commands that change anything. This is where I added support for sub-agents. If you are exploring the UI and the backend parts of a story, the planning agent can kick off two delegates, each of which sends a report back to the main loop. This both shortens the message history and improves performance. In code mode the agent executes the plan. The split keeps the agent from charging ahead on a bad assumption, and it gives you a natural checkpoint to read the plan before any of it touches your machine. I also give it access to an LSP to view definitions, read symbols in a file, and so on without having to load the entire file into context.
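The delegate idea is easier to see in code. In this sketch each delegate runs on its own thread with a private transcript, and only a short report comes back to the planner. Everything here is hypothetical (`Report`, `explore`, `plan_with_delegates`), and the "exploration" is a stand-in for real tool calls.

```rust
use std::thread;

/// What a delegate hands back: only this reaches the main loop's history.
#[derive(Debug, Clone, PartialEq)]
pub struct Report {
    pub area: String,
    pub summary: String,
}

/// Hypothetical delegate. Its transcript stays local and is dropped on return.
pub fn explore(area: String) -> Report {
    // Stand-in for real tool calls made while exploring one part of the code.
    let transcript = vec![
        format!("list files under {area}"),
        format!("read entry point of {area}"),
    ];
    Report {
        summary: format!("{} steps explored in {}", transcript.len(), area),
        area,
    }
}

/// Starts one delegate per area in parallel and collects reports in order.
pub fn plan_with_delegates(areas: &[&str]) -> Vec<Report> {
    let handles: Vec<thread::JoinHandle<Report>> = areas
        .iter()
        .map(|a| {
            let area = a.to_string();
            thread::spawn(move || explore(area))
        })
        .collect();
    handles
        .into_iter()
        .map(|h| h.join().expect("delegate panicked"))
        .collect()
}
```

Calling `plan_with_delegates(&["ui", "backend"])` yields two reports, while the steps each delegate took never enter the planner's message history.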

Presenting tool calls

A coding agent that runs ten commands and edits five files is useless if you cannot follow what it did. File edits read best as diffs rather than as full rewrites. Long command output wants to be collapsed by default and expandable when you care. A clear running status of what the agent is doing right now is worth more than it sounds, because most of the time you are deciding whether to let it keep going or stop it.
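Collapsing long output is simple enough to sketch. `collapse_output` and its `max_lines` parameter are names I picked for illustration, and a real TUI would keep the full text around so the block can be expanded.

```rust
/// Shows at most `max_lines` lines and summarizes how many were hidden.
pub fn collapse_output(output: &str, max_lines: usize) -> String {
    let lines: Vec<&str> = output.lines().collect();
    // Short output is shown as-is.
    if lines.len() <= max_lines {
        return lines.join("\n");
    }
    let hidden = lines.len() - max_lines;
    let mut shown = lines[..max_lines].join("\n");
    shown.push_str(&format!("\n... {hidden} more lines (expand to view)"));
    shown
}
```

Showing the head of the output is an arbitrary choice in this sketch; for build and test logs the tail is often the more useful part to keep visible.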

This is the part people underrate. The CLI or TUI is just the surface, but a legible surface is what makes you trust the engine underneath.

Sandboxing

This is the part I have spent the most time on. An agent runs commands and writes code you have not read yet, so it needs somewhere to do that without putting your machine at risk. The options run from no isolation at all, through containers, up to full virtual machines, and each step buys stronger boundaries for more overhead.

slop runs workloads in a microVM through libkrun, which gives close to VM-level isolation without the weight of a full VM, and I recently added support for spinning up Docker containers as a lighter alternative when that is enough. The direction I am heading now is a dynamic permission policy tied to the sandbox, so what the agent is allowed to do depends on where it is running rather than on a fixed global setting. I bias the model towards writing Python or shell scripts when running inside a sandbox.
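A sandbox-dependent policy could look roughly like the sketch below. The `Sandbox`, `Action`, and `Decision` enums and every rule in `decide` are illustrative assumptions, not slop's real policy. The point is only that the same action gets a different answer depending on where it runs.

```rust
/// Where the agent's commands execute.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum Sandbox {
    Host,
    Container,
    MicroVm,
}

/// What the agent is asking to do.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum Action {
    ReadFile,
    WriteFile,
    RunScript,
    NetworkAccess,
}

/// The policy's answer.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum Decision {
    Allow,
    AskUser,
    Deny,
}

/// Example rules only: stronger isolation earns the agent more freedom.
pub fn decide(sandbox: Sandbox, action: Action) -> Decision {
    match (sandbox, action) {
        // Reading is allowed everywhere in this sketch.
        (_, Action::ReadFile) => Decision::Allow,
        // On the bare host, anything that changes state needs a human.
        (Sandbox::Host, Action::WriteFile) | (Sandbox::Host, Action::RunScript) => {
            Decision::AskUser
        }
        (Sandbox::Host, Action::NetworkAccess) => Decision::Deny,
        // Containers share the host kernel, so network access still asks.
        (Sandbox::Container, Action::NetworkAccess) => Decision::AskUser,
        // Everything else inside a container or microVM is allowed.
        (Sandbox::Container, _) | (Sandbox::MicroVm, _) => Decision::Allow,
    }
}
```

Because the policy is a pure function of `(sandbox, action)`, it is easy to test and easy to print as a table for review.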

Headless and remote modes

Not every run should happen in front of a person at a terminal. You want to fire the agent from a script, run it in CI, or push it onto a bigger remote machine and let it work. Supporting that cleanly means structured input and output instead of interactive prompts, and a way to grant permissions ahead of time so the agent is not stuck waiting on an answer no one is there to give.
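Granting permissions ahead of time might be modeled like this. `RunConfig`, `Outcome`, and the permission names are hypothetical. The behavior worth noticing is that a headless run rejects an ungranted permission immediately instead of waiting on a prompt nobody will answer.

```rust
use std::collections::HashSet;

/// Result of a permission check.
#[derive(Debug, Clone, PartialEq, Eq)]
pub enum Outcome {
    Granted,
    AskUser,
    Rejected(String),
}

/// Hypothetical per-run settings supplied by a script or CI job.
pub struct RunConfig {
    pub interactive: bool,
    pub pre_granted: HashSet<String>,
}

impl RunConfig {
    /// Builds a non-interactive run with a fixed set of grants.
    pub fn headless(grants: &[&str]) -> Self {
        RunConfig {
            interactive: false,
            pre_granted: grants.iter().map(|g| g.to_string()).collect(),
        }
    }

    /// Resolves a permission without ever blocking a headless run.
    pub fn resolve(&self, permission: &str) -> Outcome {
        if self.pre_granted.contains(permission) {
            Outcome::Granted
        } else if self.interactive {
            // Someone is at the terminal, so a prompt is acceptable.
            Outcome::AskUser
        } else {
            // No one is there to answer: fail fast with a clear reason.
            Outcome::Rejected(format!("`{permission}` was not granted ahead of time"))
        }
    }
}
```

The rejection message can be returned to the model as a tool error, so the agent can try another route or stop cleanly.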

It also opens the door to running several agents at once on separate sandboxes, which is far more appealing once the sandbox story is already solid. slop is architected specifically to support this. When I launch it from the CLI it trampolines a server process. But I can also explicitly launch the server process and run a session entirely using ACP. slop also divorces the hands from the brain. Because I support sandboxes, the agent can read or write files, run commands, and so on, on a machine that is separate from the agent process.

FinOps for inference APIs

Tokens cost money, and an agent is very good at spending them. Long contexts, retries, and loops that run a little too long all add up, and without measurement you only find out at the end of the month. So I track cost per run and set limits that stop a run before it gets silly. I also route simple steps to a cheaper model instead of paying top rates for work that does not need it.
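Per-run cost tracking with a hard limit fits in a small struct. In this sketch the prices, the `Budget` and `Price` types, and the model names returned by `pick_model` are all made up, and the routing rule is deliberately trivial.

```rust
/// Hypothetical price list entry, in USD per million tokens.
#[derive(Debug, Clone, Copy, PartialEq)]
pub struct Price {
    pub input_per_mtok: f64,
    pub output_per_mtok: f64,
}

/// Tracks spend for one run against a fixed limit.
#[derive(Debug)]
pub struct Budget {
    limit_usd: f64,
    spent_usd: f64,
}

impl Budget {
    pub fn new(limit_usd: f64) -> Self {
        Budget { limit_usd, spent_usd: 0.0 }
    }

    /// Records one model call. Returns the running total, or an error once
    /// the total exceeds the limit so the caller can stop the run.
    pub fn record(
        &mut self,
        price: Price,
        input_tokens: u64,
        output_tokens: u64,
    ) -> Result<f64, String> {
        let cost = (input_tokens as f64 * price.input_per_mtok
            + output_tokens as f64 * price.output_per_mtok)
            / 1_000_000.0;
        self.spent_usd += cost;
        if self.spent_usd > self.limit_usd {
            Err(format!(
                "budget exceeded: spent ${:.2} of ${:.2}",
                self.spent_usd, self.limit_usd
            ))
        } else {
            Ok(self.spent_usd)
        }
    }
}

/// Toy routing rule: simple steps go to the cheaper (hypothetical) model.
pub fn pick_model(step_is_simple: bool) -> &'static str {
    if step_is_simple { "cheap-model" } else { "frontier-model" }
}
```

Note that this version only detects the overrun after the call that caused it has been paid for; a stricter variant would estimate the cost first and refuse to send the request.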

This loops back to why I built 🤦 slop in the first place. The current prices are propped up by subsidies that will not last, so the agent that understands and controls its own spend is the one that keeps making sense when the cheap era ends. Coding agents like Claude Code currently price the tokens used through their harness at a deep discount to the listed API prices. Organizations are in for a pretty bad sticker shock very soon.

Coding agents also form the backbone of many other agentic experiences and integrations. It is no fluke that Claude Code and Codex each got a nice UI on top, called Cowork and the Codex app respectively. Once you understand the mechanics, you will see that the CLI/TUI is just an interface, and what matters for these coding agents is the engine that beats inside it.

Go ahead and build your own coding agent. May the source be with you.