What I learned from writing a basic coding agent
Earlier this year I started looking at the Gleam programming language. Its functional, strongly typed nature and small feature set made it seem like a fun language. I wanted to build something in it but didn't have the time or a particular project in mind. Then a few months ago I saw You Should Write An Agent and that idea stuck with me too. In the last week or so, while I've had time off work, I sat down and started playing with those two ideas.
I've used Claude Code some at work in the last few months, enough to have a good idea of what it's capable of (and not capable of). While I didn't have any desire to build something as full-fledged as Claude Code, I was curious how far I could get with a basic loop plus some simple tools. As the Fly.io article says, the core concept is very straightforward: you need a loop that accepts text input and sends that text (plus the conversation history) to an LLM API. Then you need at least a few basic tools.
I chose to use OpenRouter's API for my toy project, mostly so that I could change models without changing the API request/response parsing. I initially intended to keep even that abstracted but I've let various modules get a bit more coupled than I wanted as I iterated. Additionally, I implemented this as a CLI program (though not a full TUI like Claude Code is). I had thoughts about keeping the IO + agent parts separate so that a Web UI or something could be built on top but again leaned towards practical immediate progress for my side project rather than spending time on architecture.
After a bit of work across just a few days, I had a really basic tool that could find and read files and print the LLM API's responses to my terminal. Then I started adding file editing support as well. I'm not going to go over every piece of the tool in this post; instead I want to talk about the design choices and engineering tradeoffs you hit once you get past the very basic steps.
Security
The way tools work in these LLM APIs is that you provide a description of them in the request, and the API responds with which tool (if any) is being requested, along with the arguments for that tool call. It's up to you to then implement the tool. In essence, they're function calls.
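To make that concrete, here's roughly what serialising one tool description can look like. This is a sketch assuming the OpenAI-style "tools" schema (which OpenRouter forwards to most models); the field names and nesting come from that schema rather than from my tool, so check the provider's docs.

```gleam
import gleam/json

// A sketch of describing a read_file tool to the API, assuming the
// OpenAI-style "tools" schema. Field names follow that schema.
pub fn read_file_tool() -> json.Json {
  json.object([
    #("type", json.string("function")),
    #(
      "function",
      json.object([
        #("name", json.string("read_file")),
        #(
          "description",
          json.string("Read a file at a path relative to the working directory"),
        ),
        #(
          "parameters",
          json.object([
            #("type", json.string("object")),
            #(
              "properties",
              json.object([
                #(
                  "path",
                  json.object([
                    #("type", json.string("string")),
                    #("description", json.string("Relative path to the file")),
                  ]),
                ),
              ]),
            ),
            #("required", json.preprocessed_array([json.string("path")])),
          ]),
        ),
      ]),
    ),
  ])
}
```

The response then names the tool being called and provides its arguments as JSON, which the agent decodes and dispatches to its own implementation.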
An obvious first concern is not wanting the agent to touch files on your computer that you haven't authorized. I tackled this in a few ways. For read operations, I wanted to restrict it to files and directories in or under the working directory where you started the tool. I make sure paths don't contain any .. to avoid upward directory traversal, and for absolute paths I check that the path is prefixed with the current working directory.
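A rough sketch of that check, with the caveat that a real implementation would want to normalise the path first (the cwd value here comes from whatever working-directory lookup the tool uses):

```gleam
import gleam/string

// Reject anything containing "..", and require absolute paths to sit under
// the working directory. Relative paths without ".." are resolved against
// the working directory anyway, so they pass.
pub fn path_allowed(path: String, cwd: String) -> Bool {
  case string.contains(path, "..") {
    True -> False
    False ->
      case string.starts_with(path, "/") {
        True -> string.starts_with(path, cwd)
        False -> True
      }
  }
}
```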
For write operations, I did the same thing to prevent destructive operations against files I cared about, and also added explicit confirmation checks. My tool prints the file(s) being edited and the contents being written or removed, and you must enter y/n in the terminal before any edits are made. At the end of the day, I'm still relying on human judgement.
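The confirmation gate itself is simple. Here's a sketch; `read_line` is a stand-in for whatever input function the tool uses (it's passed in so the example stays self-contained), and anything other than "y" counts as a no:

```gleam
import gleam/io
import gleam/string

// Print what's about to change, then require an explicit "y" before writing.
pub fn confirm_edit(
  path: String,
  change: String,
  read_line: fn(String) -> String,
) -> Bool {
  io.println("Editing " <> path)
  io.println(change)
  case string.lowercase(string.trim(read_line("Apply this edit? [y/n] "))) {
    "y" -> True
    _ -> False
  }
}
```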
Beyond that, you could implement further protections like running the program in a chroot or container to further isolate your system and sensitive data from the tool but I'm not going to go further into that here.
File editing
Having used Claude Code with some success, one thing I hadn't considered before was that getting an LLM to accurately provide file edits is actually a tricky problem. You have the usual problem of determining whether the given output (e.g. some code) is correct/safe/accurate, but that's not what I'm talking about here -- there's plenty of cases where LLMs can produce correct-enough code. Instead, I'm talking about the actual file editing: how do you get the text (code) an LLM generates into a text file? Claude Code mostly doesn't seem to have major problems with this, so I hadn't considered that it could be tricky. There are multiple obvious ideas for how to implement such a tool: allow writing a full file, implement search and replace, or apply diffs to files. There are more variations on these too.
Each approach has its own drawbacks. Having to write the full file any time the LLM wants to change a few lines seems likely to introduce unwanted edits as the files it edits get larger. Not to mention that larger files also blow up the context size, which affects both the quality of the response and the cost of each API call (providers charge per token).
Search and replace seems like it could work ok for small edits but may not be that useful for writing new files (or new sections in existing files). I haven't gone down that road yet so I could be wrong.
Applying patch files (diffs) seemed like a great idea to me: it would let me see the changes before approving, remove the need to rewrite whole files, and enable both line-level edits and adding new sections to files. However, as it turns out, this is also tricky because it depends on the LLM producing a valid patch file that can be applied. At least in the way I've implemented it, I'm relying on the patch command and the unified diff format it uses. I've seen enough issues with this approach to be able to say that the naive default implementation of a patch tool is not sufficient. I've had to do things like ask the LLM to make multiple smaller patches, re-reading the file in between so it gets the patch format and surrounding line context right, and even then it doesn't reliably succeed. I'm sure there's more that could be done here.
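For reference, a unified diff that patch will accept looks something like this (the file and change here are made up). The hunk header's line numbers and counts, and the unchanged context lines, all have to match the file on disk, and that's exactly the part the model tends to get subtly wrong:

```diff
--- a/src/agent.gleam
+++ b/src/agent.gleam
@@ -12,3 +12,3 @@
 fn handle_reply(reply: String) {
-  io.println(reply)
+  io.println("agent: " <> reply)
 }
```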
As I was working on file editing and encountering some of these tradeoffs I found a nice post exploring how some more widely used agent tools approach this. This is one area where it seems we don't have a single correct answer.
Context management
Again, the Fly.io post touches on this too, but once you get going with tools you start having to worry about filling up the context pretty quickly (or building more sophisticated tools to mitigate it). Reading full files (and writing full files) fills up the context quickly. Doubly so if you ask the LLM to re-read the file each time to deal with unreliable patch file generation.
Given that, there's a bunch of stuff that I haven't done that more widespread tools do to help with this: compacting context, ensuring tools only keep X lines/tokens in the context window, using "subagents", etc. BTW: given that an agent's context is just the series of messages you pass to the API call, a subagent is nothing fancy -- just a temporary, clean context containing only a subset of those messages, so you avoid bloating the main agent's context.
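As a sketch of that idea, where `Message` and `call_llm` stand in for whatever message type and API-call function the agent already has (nothing here is a real library API):

```gleam
// Run a one-off task against a fresh context and return only its final
// reply, which is all that gets folded back into the main conversation.
pub type Message {
  Message(role: String, content: String)
}

pub fn run_subagent(
  task: String,
  call_llm: fn(List(Message)) -> Message,
) -> Message {
  // Fresh context: a system prompt plus the task, none of the main history.
  let messages = [
    Message(role: "system", content: "You handle one focused sub-task."),
    Message(role: "user", content: task),
  ]
  call_llm(messages)
}
```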
Managing non-deterministic LLMs
Again, not surprising, but something I've run into in various situations is the LLM getting "stuck". This has manifested in a few ways. On tool use, I've found it can be helpful to provide more direct instructions. My initial patch edit tool returned something like "user cancelled patch application" if you rejected the proposed patch, but in some cases that put the LLM into a loop where it would continually propose other/different patches. I found that adding "Ask if they want to do something else." to the context message on a tool call rejection was helpful.
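The tool result sent back after the confirmation prompt ended up looking roughly like this (wording paraphrased):

```gleam
// Build the tool-call result after the y/n prompt. The rejection text
// includes the explicit redirect that stopped the repeated-patch loop.
fn patch_tool_result(approved: Bool) -> String {
  case approved {
    True -> "Patch applied successfully."
    False ->
      "The user cancelled the patch application. "
      <> "Ask if they want to do something else."
  }
}
```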
Sometimes the models I've tried get confused by the context. An example: I ask for a plan on how to implement something, and after a few tool calls it outputs a bunch of text and a question like "What should we do next?" If I respond with "Go ahead and implement", I've seen the tool respond with a second (and occasionally third) message that's just a restated plan instead of anything new or different. I haven't yet figured out what causes this but I suspect it's just an artifact of how the models work.
Gleam
Turns out that Gleam is fun to build with! It has a nice developer experience: the gleam CLI is a nice build and dependency management tool, and it has a built-in language server for things like code completion and formatting. And Gleam has enough of an ecosystem that packages exist for common needs like CLI argument parsing, JSON serialization, and HTTP clients.
It is a small ecosystem, so sometimes there are gaps in packages or (probably) some abandoned/unmaintained ones, but overall this wasn't a big issue for me in this project so far. And when there are gaps, you can always pull in code from the ecosystems Gleam runs on: Gleam compiles to Erlang or JavaScript (I used the Erlang target, which is the default), so you can use Erlang libraries. One such case I encountered was needing to read the current working directory. There wasn't any easy way to do so (that I found) in Gleam's common libraries, so I went through Gleam's external function interface to call Erlang, whose standard library file module does have this ability.
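For example, here's roughly what that FFI call can look like. Erlang's file:get_cwd/0 returns {ok, Dir} (with Dir as a charlist) or {error, Reason}, which happens to line up with how Gleam represents Result on the Erlang target; the charlist helpers come from the gleam_erlang package:

```gleam
import gleam/dynamic.{type Dynamic}
import gleam/erlang/charlist.{type Charlist}

// file:get_cwd/0's {ok, Dir} / {error, Reason} tuples map directly onto
// Gleam's Result type on the Erlang target.
@external(erlang, "file", "get_cwd")
fn do_get_cwd() -> Result(Charlist, Dynamic)

pub fn current_working_directory() -> Result(String, Nil) {
  case do_get_cwd() {
    Ok(dir) -> Ok(charlist.to_string(dir))
    Error(_) -> Error(Nil)
  }
}
```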
Lastly, more as an aside about LLM capabilities than about Gleam itself, I started testing my little tool on its own codebase. One thing I found interesting is that the small Gleam ecosystem didn't seem to have much effect on an LLM's ability to write it -- I had a fairly good experience having my tool generate code suggestions (I was using Claude Sonnet 4.5 for the most part). It seems modern models don't need huge ecosystems to be useful for coding.
Overall my experience building in Gleam was very pleasant and I'd be pretty happy using it to build other small projects.
Wrapping up
So, it turns out there are a bunch of potentially interesting challenges in building such a tool, even if some of them come from working around unreliable, non-deterministic output. There are also other areas that, if I were to continue building this tool, seem like they have additional interesting design space. Some things that have come to mind so far:
- Switching models based on the tools / conversation context: I understand Claude Code does this (a small model plus a larger model), but with a tool maintained separately from an LLM provider you could switch across providers based on whatever heuristics you like.
- Moving beyond the CLI for the agent. The CLI interface is nice: you can easily start the tool in whatever directory you're working in. One thing I've been thinking about is providing an additional web interface: a locally run web server could offer things a terminal UI (TUI) may not, and could also be accessed remotely (e.g. continuing a session from your phone while VPNed into your development machine).
- Streaming vs. batch API requests, parallel tool calls, and other things "full-featured" LLM coding tools do.