I read your comment as a joke, but in case it was a defense, or is taken as a defense by others, let me punch up your writing for you:
"[Person who is financially incentivized to make unverifiable claims about the utility of the tool they helped build] said [tool] [did an unverified and unverifiable thing] last month"
Which could mean that code was refactored and then built on top of. Or it could just mean that Claude had to correct itself multiple times over those 459 commits.
Does correcting your mistakes from yesterday’s ChatGPT binge episode count as progress…maybe?
If it doesn't revert the corrections, maybe it is progress?
I can easily imagine constant churn in the code because it switches between five different implementations when run five times, going back to the first one on the sixth run and repeating the process.
I gotta ask, though, why exactly is that much code needed for what CC does?
How many lines of code are they allowed to use for it, and why have we put you in charge of deciding how much code they're allowed to use? There's probably a bit more to it than just:

1. Get a feature request / bug
2. Understand the problem
3. Think on a solution
4. Deliver the solution
5. Test
6. Submit to code review, including sufficient explanation
> How many lines of code are they allowed to use for it, and why have we put you in charge of deciding how much code they're allowed to use?
That's an awfully presumptuous tone to take :-)
I'm not deciding "This is how many lines they are allowed"; I'm trying to get an idea of exactly what functionality CC provides that requires that sort of volume.
I mean, it's a high-level language being used, it's pulling in a lot of dependencies, etc. It literally is glue code.
Bearing in mind that it appears to be (at this point anyway) purely vibe-coded, I am wondering just how much of the code is dead weight - generated by the LLM and never removed.
The premise of the steps you've listed is flawed in two ways.
The first is that this is more what agentic-assisted dev looks like:
1. Get a feature request / bug
2. Enrich the request / bug description with additional details
3. Send AI agents to handle request
4a. In some situations, manually QA results, possibly return to 2.
4b. Otherwise, agents will babysit the code through merge.
The second is that the above steps are performed in parallel across X worktrees. So, the stats are based on the above steps proceeding a handful of times per hour--in some cases completely unassisted.
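To make that fan-out concrete, here's a rough sketch of one way to do it, assuming Claude Code's headless `claude -p` invocation; the branch names and task prompts are made up for illustration:

```python
# Rough sketch: fan one agent out per task across git worktrees.
# Assumes Claude Code's headless "claude -p <prompt>" CLI; branch
# names and task descriptions here are invented for illustration.
import subprocess
from concurrent.futures import ThreadPoolExecutor

tasks = {
    "fix-login-bug": "Fix the null-pointer crash on login.",
    "add-csv-export": "Add CSV export to the reports page.",
}

def run_in_worktree(branch: str, prompt: str) -> None:
    path = f"../wt-{branch}"
    # One isolated checkout per task so the agents don't step on each other.
    subprocess.run(["git", "worktree", "add", "-b", branch, path], check=True)
    subprocess.run(["claude", "-p", prompt], cwd=path, check=True)

with ThreadPoolExecutor() as pool:
    for branch, prompt in tasks.items():
        pool.submit(run_in_worktree, branch, prompt)
```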
---
With enough automation, the engineer is only dealing with steps 2 and 4a. You get notified when you are needed, so your attention can focus on finding the next todo or enriching a current todo as per step 2.
---
Babysitting the code through merge means it handles review comments and CI failures automatically.
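In practice the babysitting can be a dumb polling loop. A sketch, assuming GitHub's `gh` CLI and the same headless `claude -p` call; the prompt wording and the exit-code handling are simplifications:

```python
# Sketch of "babysitting through merge": poll CI, hand failures back
# to the agent, repeat. Assumes GitHub's `gh` CLI and Claude Code's
# headless `claude -p`; the prompt text is invented for illustration.
import subprocess
import time

def babysit(pr: str, interval: int = 60) -> None:
    while True:
        # `gh pr checks` exits non-zero while checks are failing/pending
        # (simplified here: we treat any non-zero status as "needs work").
        checks = subprocess.run(["gh", "pr", "checks", pr],
                                capture_output=True, text=True)
        if checks.returncode == 0:
            subprocess.run(["gh", "pr", "merge", pr, "--auto", "--squash"])
            return
        # Feed the failing check output back to the agent to fix and push.
        subprocess.run(["claude", "-p",
                        f"CI is failing on PR {pr}:\n{checks.stdout}\n"
                        "Fix the failures, commit, and push."])
        time.sleep(interval)
```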
---
I find that communication/consensus with stakeholders and retooling take the most time.
One can think of a lot of obvious improvements to an MVP product that don't require much in the way of "get a feature request/bug - understand the problem - think on a solution".
You know the features you'd like to have in advance, or you can see the changes you want to make as you build.
And a lot of the "deliver the solution - test - submit to code review, including sufficient explanation" can be handled by AI.
I'd love to see Claude Code remove more lines than it added TBH.
There's a ton of cruft in code that humans are less inclined to remove because it just works, but imagine having an LLM do the cleanup work instead of the generation work.
Is it possible for humans to review that amount of code?
My understanding of the current state of AI in software engineering is that humans are allowed (and encouraged) to use LLMs to write code. BUT the person opening a PR must read and understand that code. And the code must be read and reviewed by other humans before being approved.
I could easily generate that amount of code and make it write and pass tests. But I don't think I could have it reviewed by the rest of my team - while I am also taking part in reviewing code written by other people on my team at that pace.
Perhaps they just aren't having humans review the code? Then it seems feasible to me. But it would go against all of the rules that I have personally encountered at my companies and that peers have told me they have at theirs.
> Gas Town is also expensive as hell. You won’t like Gas Town if you ever have to think, even for a moment, about where money comes from. I had to get my second Claude Code account, finally; they don’t let you siphon unlimited dollars from a single account, so you need multiple emails and siphons, it’s all very silly. My calculations show that now that Gas Town has finally achieved liftoff, I will need a third Claude Code account by the end of next week. It is a cash guzzler.
Read that as "speed of lines of code", which is very VERY very different from "speed of delivery."
Lines of code never correlated with quality or even progress. Now they do even less.
I've been working a lot more with coding agents, but my convictions around the core principles of software development have not changed. Just the iteration speed of certain parts of the process.
> It’s also 100% vibe coded. I’ve never seen the code, and I never care to, which might give you pause. ‘Course, I’ve never looked at Beads either, and it’s 225k lines of Go code that tens of thousands of people are using every day. I just created it in October. If that makes you uncomfortable, get out now.
You're counting wheel revolutions, not miles travelled. Not an accurate proxy measurement unless you can verify the wheels are on the road for the entire duration.
My theory is that even if the models are frozen here, we'll still spend a decade building out all the tooling, connections, skills, etc., and getting it into each industry. There's so much _around_ the models that we're still working on too.
Agree completely. It's already been like this for 1-2 years even. Things are finally starting to get baked in, but it's still early. For example, AI summaries of product reviews, Gemini YouTube video summaries, etc.
It's hard to quantify what sort of value those examples generate (YouTube and Amazon were already massively popular). Personally I find them very useful, but the value is still hard to pin down. It's not exactly automating a whole class of jobs, although there are several YouTube transcription services that this may make obsolete.
> Shows how much more work there is still to be done in this space.
This is why I roll my eyes every time I read doomer content that mentions an AI bubble followed by an AI winter. Even if (and objectively there's 0 chance of this happening anytime soon) everyone stops developing models tomorrow, we'll still have 5+ years of finding out how to extract every bit of value from the current models.
The idea that this technology isn't useful is as ignorant as thinking that there is no "AI" bubble.
Of course there is a bubble. We can see it whenever these companies tell us this tech is going to cure diseases, end world hunger, and bring global prosperity; whenever they tell us it's "thinking", can "learn skills", or is "intelligent", for that matter. Companies will absolutely devalue and the market will crash when the public stops buying the snake oil they're being sold.
But at the same time, a probabilistic pattern recognition and generation model can indeed be very useful in many industries. Many of our problems can be approached by framing them in terms of statistics, and throwing data and compute at them.
So now that we've established that, and we're reaching diminishing returns of scaling up, the only logical path forward is to do some classical engineering work, which has been neglected for the past 5+ years. This is why we're seeing the bulk of gains from things like MCP and, now, "agents".
> This is why we're seeing the bulk of gains from things like MCP and, now, "agents".
This is objectively not true. The models have improved a ton (with data from "tools" and "agentic loops", but it's still the models that become more capable).
Check out [1], a 100 LoC "LLM in a loop with just terminal access"; it now scores above last year's heavily harnessed SotA.
> Gemini 3 Pro reaches 74% on SWE-bench verified with mini-swe-agent!
I don't understand. You're highlighting a project that implements an "agent" as a counterargument to my claim that the bulk of improvements are from "agents"?
Sure, the models themselves have improved, but not by the margins of a couple of years ago. E.g. the jump from GPT-3 to GPT-4 was far greater than the jump from GPT-4 to GPT-5. Currently we're seeing moderate improvements between releases, with "agents" taking center stage. Only corporations like Google are still able to squeeze value out of hyperscale, while everyone else is more focused on engineering.
They're pointing out that the "agent" is just 100 lines of code with a single tool. That means the model itself has improved, since such a bare bones agent is little more than invoking the model in a loop.
That doesn't make sense, considering that the idea of an "agentic workflow" is essentially to invoke the model in a loop. It could probably be done in much less than 100 lines.
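A minimal sketch of such a loop, assuming an OpenAI-style chat API; the model name, prompt, and "DONE" convention are illustrative, not mini-swe-agent's actual code:

```python
# Minimal "LLM in a loop with terminal access" sketch.
# Hypothetical illustration only: assumes an OpenAI-style chat API
# and a single shell "tool"; everything here is simplified.
import subprocess
from openai import OpenAI

client = OpenAI()

def run_agent(task: str, max_steps: int = 20) -> None:
    messages = [
        {"role": "system", "content":
         "Solve the task by replying with exactly one shell command "
         "per turn. Reply DONE when finished."},
        {"role": "user", "content": task},
    ]
    for _ in range(max_steps):
        reply = client.chat.completions.create(
            model="gpt-4o", messages=messages
        ).choices[0].message.content.strip()
        if reply == "DONE":
            break
        # Execute the model's command and feed the output back as context.
        result = subprocess.run(reply, shell=True, capture_output=True,
                                text=True, timeout=120)
        messages.append({"role": "assistant", "content": reply})
        messages.append({"role": "user",
                         "content": result.stdout + result.stderr})
```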
This doesn't refute the fact that this simple idea can be very useful. Especially since the utility doesn't come from invoking the model in a loop, but from integrating it with external tools and APIs, all of which requires much more code.
We've known for a long time that feeding the model with high quality contextual data can improve its performance. This is essentially what "reasoning" is. So it's no surprise that doing that repeatedly from external and accurate sources would do the same thing.
In order to back up GP's claim, they should compare models from a few years ago with modern non-reasoning models in a non-agentic workflow. Which, again, I'm not saying they haven't improved, but that the improvements have been much more marginal than before. It's surprising how many discussions derail because the person chose to argue against a point that wasn't being made.
The original point was that the previous SotA was a "heavily harnessed" agent, which I took to mean it had more tools at its disposal and perhaps some code to manage context and so on. The fact that the model can do it now in just 100 LoC and a terminal tool implied the model itself has improved. It's gotten better at standard terminal commands at least, and possibly bigger context window or more effectively using the data in its context window.
Those are improvements to the model, albeit in service of agentic workflows. I consider that distinct from improvements to agents themselves which are things like MCP, context management, etc.
Useful technology can still create a bubble. The internet is useful but the dotcom bubble still occurred. There’s expectations around how much the invested capital will see a return and growing opportunity cost if it doesn’t, and that’s what creates concerns about a bubble. If a bubble bursts, the capital will go elsewhere, and then you’ll have an “AI winter” once again
This and the Meta post seem too crazy to be true... It feels uncomfortable that these are the daily lives of some tech workers compared to my relatively 'relaxed' (but lower-paying) tech job.