The proliferation of software coding tools that leverage so-called Artificial Intelligence via Large Language Models (LLMs) is really quite astounding. In the span of a couple of years, an entire industry has emerged devoted to generating code and whole software applications from the barest of human-language prompts. “ChatGPT, make me a sandwich” is the new sudo. I’ve read a few good think pieces on this topic amidst a sea of hot takes, but one thing I have not seen is a reflection on the LLM code-generator in the context of the sociology of the prototype.

The prototype is a well-known engineering idea: make a thing quickly and cheaply to demonstrate a proof of concept, and then throw it away. The point is to prove that our experience of a problem is real and that it can be addressed in this way. Having proven that to our satisfaction, we can then think about building the actual solution. A prototype is what comes out of a hack-a-thon, or a weekend coding binge, or even a deliberate phase of discovery and solicitation of feedback during product development.

Anyone who has been around organizations that depend on software tools knows this truth: your stop-gap, one-off, we-only-intended-this-to-last-a-week software has a high likelihood of sticking around much longer than you planned, where it suffers bit rot, becomes brittle, and breaks at the least opportune time.

The problem with prototypes is sociological: they can work too well. People can quickly become attached to a prototype that actually works and solves a problem. Suddenly your Friday afternoon experiment is a vital piece of your organization’s infrastructure. The people who authorize the paychecks can be hard to convince that we need to stop and write this code as a proper piece of software (tests, documentation, modularity, CI/CD, etc.) rather than continue to depend on what was supposed to be a throwaway idea. This is one reason why designers wisely insist on making prototypes ugly during the discovery phase of a project: folks are less likely to attach to something unattractive, even if it solves a problem, or at least they are easier to convince later that the prototype was always intended to be thrown away and replaced.

LLM code-generators can crank out prototypes at an alarming rate, and they can look very pretty: tens of thousands of lines of code from just a few sentences of prompting. Suddenly the Friday afternoon experiment can start to resemble a real application, and the temptation to insert that experiment into production can be too strong to resist.

[Image: Captain Kirk, forlorn with Tribbles. A still life. Tribbles, per Wikipedia.]

Pretty prototypes put me in mind of that classic Star Trek episode, “The Trouble with Tribbles.” The tribbles are cute, furry animals that proliferate at disturbing rates and quickly overwhelm your capacity to contain them. Their secret weapon? They make you feel really good, even as they consume more and more of your resources.

Writing as learning

What makes LLM-generated code deceptive is that it is largely write-once. A ten-thousand-line software program could evolve over months and years if written by humans; an AI can churn it out in minutes. Sure, it works and it looks good. But when you want to change it, or fix it, you are now dependent on the AI to re-generate it.

Why is this? Because humans learn by doing. Reading code is vital and important, but to really understand it, to learn a codebase intimately to the point that you know where best to nudge it or apply patches to achieve a desired change, you must spend time writing it. Changing AI-generated code is like climbing a sheer wall with no handholds for the first time. You have no muscle memory to fall back on.

As I learned in high school, the best way to learn a topic is to take notes as you read or listen. You build muscle through repetitive actions and activate your neurons and memory-making cells in a way that reading someone else’s notes simply cannot replace.

People sometimes comment on how fast I can write code. My trick? I don’t type or think any faster than anyone else. I simply spend a lot of time getting to know the codebase. Usually I do this by writing tests and fixing things that break. The first few months on the job, as I get to know an existing codebase, I tend to start re-writing it. Not a wholesale replacement. Just small things. All code needs care and weeding, just like a garden. All the good gardeners I know are constantly puttering around, trimming, weeding, moving things around to find a better balance of sunshine, shade, and water. It’s the same thing with code. And as I do that, I internalize the structure of the code: which files contain which logic, and how all the pieces fit together.

The single biggest expense for any business is people’s time. Hiring a new software engineer and training them up to become a fully effective contributor to your team can take several months. In my experience, it takes me about six months, at minimum, to become familiar enough with a large existing codebase to make reasonably intelligent choices about how to change and maintain it. That’s a pretty expensive investment by a business, paying me for six months to learn about something by writing tests and re-writing what already exists. But in the end, after that learning curve is climbed, I can be very efficient and quick at generating fixes, new features, etc. The organization’s investment pays off. Like Daniel LaRusso, I need to spend hours sanding the floor, painting the fence, waxing on, waxing off. And then I’m ready for the tournament.

But if I spend that time telling an AI how to generate code, I don’t really learn anything about it. I do not internalize any of it. I sand no floors, wax no cars. An LLM-generated application is immediately a legacy app, and I’m on my first day of the job. So while it seems like a really cheap way to generate working code, we need to be ruthless about treating it like a proper prototype and throwing it away after it has proved the concept.