
I find LLM generated code ends up pushing review/maintenance burden onto others. It "looks" right at first glance, and passes superficial tests, so it's easy to get merged. But then as you build on top of it, you realize the foundations are hastily put together, so a lot of it needs to be rewritten. Fine for throwaway or exploratory work, but heaven help you if you're working in a project where people use LLMs to "fix" bugs generated by previous LLM generated code.

So yes, it does increase "velocity" for person A, who can get away with using it. But then the decrease in velocity for person B, who has to build on top of that code, is never properly tracked. It's like a game of hot potato: if you want to game the metrics, you'd better be the one working on greenfield code (although I suppose maintenance work has never been looked at favorably in performance reviews; but now the cycle of code rot is accelerated).



I'm working on a website and created a custom menu. Nothing fancy. AI got it done after some tries and I was happy, as web development is not my area of expertise. After some time I realized the menu results in scrolling when it shouldn't, and I wanted to make the parent container expand instead. This was impossible, as the AI had produced a rather unusual implementation even for such a limited use case. Best part: my task is now impossible to solve with AI, as it doesn't really get its own code. I resorted to actually just looking into CSS and the docs and realized there is a MUCH simpler way to solve all of my issues.

Turns out sometimes the next guy who has to do maintenance is oneself.


> Turns out sometimes the next guy who has to do maintenance is oneself.

Over the years I've been well-served by putting lots of comments into tickets like "here's the SQL query I used to check for X" or "an easy local repro of this bug is to disable Y", etc.

It may not always be useful to others... but Future Me tends to be glad of it when a similar issue pops up months later.


In the same boat. I learnt to leave breadcrumbs for the future quite a long time ago, and it's paid off many, many times.

After it becomes second nature, it's really relaxing to know I have left all the context I could muster around: comments in tickets, comments in the code referencing a decision, well-written commit messages for anything a little non-trivial. I learnt that peppering all the "whys" around is just being a good citizen in the codebase, even if only for Future Me.


Agree completely. While the "what" is completely redundant most of the time, a couple of "why"s can be of immense help later, to oneself and others.


> it doesn’t really get its own code

It doesn’t really get its own anything, as it is unable to "get". It's just a probabilistic machine spitting out the next token


"Getting things" is a matter of performance, not about the underlying hardware. If I'm an idiot who knows nothing about programming, but every time I slam the keyboard we get good programs, then how useful is it to discuss whether I am in fact empty-headed?


But the discussion here is that it does not output good programs at all.


So we might discuss their performance along a gradient and think about their year-over-year improvement. Current industry performance is of such magnitude that it has persuaded the world to adopt ChatGPT workflows as much as it has. Adjacent to code, one might look to Terry Tao and how he relates to ML workflows in math.


I guess in your arbitrary hypothetical it wouldn't be useful


> It's just a probabilistic machine spitting out the next token

As are you and I. Did you have a deeper point to make?


Hey, I think everyone understands how they work by now and the pedantry isn't helpful.


It's a tale worth repeating, because only a minuscule percentage of people know (or pretend to know) how it works. Our view might be a bit skewed here on Hacker News, but normal people believe LLMs are thinking machines.


Then if it can't really reason about its own creation, how do you expect it to be correct in what it does, if it's simply regurgitating code it parsed online?


actually I'm not sure everyone does


This is pretty much how permanent staff often have to work with consultants/contractors or job-hoppers in some sectors.

Shiny new stuff quickly produced, manager smiles and pays, contractor disappears, heaven help the poor staffers who have to maintain it.

It's not new, just in a new form.


I love this analogy of consultants parachuting into a project, doing the bare minimum, and passing the baton to the next person to maintain the mess.

Leadership is buying into the hype and busy turning everyone into overzealous slopmongers. Companies are pushing an “AI Fluency” framework where people are encouraged (read: forced) to use LLMs and agentic coding in every aspect of development.


Don't ignore the difference in scale though. Something happening some of the time isn't the same as happening most of the time.


Yeah, LLMs are easier to keep available ;)


This misalignment of incentives is why we have shitty software in everyday life.


What's new though is that now you can do it to your future self!


In my experience, AI generated code is much higher quality than code written by external service companies. For example it will look at your code base and follow the style and conventions.


Style and conventions are very superficial properties of code. The more relevant property is how many bugs are lurking below the surface.


Style conventions have a real impact on how effectively bugs are found.


The actual design of the solution has a way bigger impact on the number of bugs to be found in the first place.


this just means the bugs it creates are better camouflaged


A while back someone made a post or comment about how they managed to vibe-code a huge PR (1,000+ lines) for an open source project. They said they didn't have time to read through the code, but instead used tests to ensure the code was doing the right thing. Then it came out that there was a very lengthy review period where the maintainers had gone through the PR and helped fix the (rather significant) issues with it. So while the author "didn't have time" to review their own work, the burden was shifted onto the maintainers.


This has been described a lot as “workslop”, work that superficially looks great but pushes the real burden on the receiver of the work rather than the producer.


> So yes, it does increase "velocity" for person A, who can get away with using it. But then the decrease in velocity for person B, who has to build on top of that code, is never properly tracked.

Offhand anecdote, 1990s

That reminds me of when the German corporation my mother worked for moved more and more production to China at the end of the last century. All the failures that the still-existing German factory had to handle by repairing them ended up in its accounts. From the top bosses' point of view, just looking at the accounting data, the China production looked clean.

Of course, unsurprisingly (with enough effort), they made it work over the years, fulfilling the prophecy. Good for China.

How you account for things shifts the narrative and then reality follows the investments made based on that.


One of the things about AI generally is it doesn't "save" work - it pushes work from the one who generates the work to the person who has to evaluate it.


That sounds more like an organizational problem. If you are an employee that doesn't care about maintainability of code, e.g. a freelancer working on a project you will never touch again after your contract is over, your incentive has always been to write crappy code as quickly as possible. Previously that took the form of copying cheap templates, copying and pasting code from StackOverflow as-is without adjustments, not caring about style, using tools to autogenerate bindings, and so on. I remember, a long time ago, taking over a web project that a freelancer had worked on; when I opened it, I saw one large file of mixed Python and HTML. He had literally just copied and pasted whole HTML pages into the render statements in the server code.

The same is true for many people submitting PRs to OSS. They don't care about making real contributions, they just want to put something on their resume.

AI is probably making it more common, but it really isn't a new issue, and is not directly related to LLMs.


> freelancer working on a project you will never touch again after your contract is over, your incentive has always been to write crappy code as quickly as possible

I don't agree with this at all. As a freelancer your incentive is to extend the contract or be remembered as the best contractor when the client needs help again. You should be the expert who improves the codebase and development practices, someone the internal staff can learn from.


>If you are an employee that doesn't care about maintainability of code, e.g. a freelancer working on a project you will never touch again after your contract is over, your incentive has always been to write crappy code as quickly as possible.

I never did this when I was a freelancer.


Yes, this is it. The idea that LLMs somehow write deceptive code that magically looks right but isn't is just silly. Why would that be the case? If someone finds they are good at writing code (hard to define, of course, but take a "measure" like long-term maintainability, for example) but fails to catch bad code in review, that is just an issue with their skill. Reviewing code can be trained just as writing code can be. A good first step might be to ask oneself: "how would I have approached this?"


I've decided to fight it the same way I fight tactical tornadoes - by leaving those people negative reviews at mid-year review.

(I also find the people who simply paste LLM output to you in chat are the much bigger evil)


Have you tried... talking to them, instead of permanently hurting their chances of staying employed in a shit economy?

It's great for your principles - perfect job security, sitting up on your thrones casting judgement on entry-level staffers who are forced to use LLM code to make a fast impact. Maybe try teaching your juniors how to do it the right way, rather than passive-aggressively impacting someone's safety net. Shame on all of you assholes.


My guy, obviously I tell the person they're pulling a dick move first.

We're not talking some entry level staffers here, it's senior engineer FTEs who are employed at a tech company who are doing this.


I'm sort of reminded of the South Park movie.

They kept getting an NC-17 from the MPAA and kept resubmitting it (six times), until just before release, when the MPAA relented, gave it an R, and the movie was released as-is.

https://en.wikipedia.org/wiki/South_Park:_Bigger,_Longer_%26...


They didn’t just keep resubmitting it. The first four times, changes were made (mostly around language and Saddam Hussein/the Devil). The final time, they felt the requested changes were arbitrary (it was just more language censoring), so an exec told the board to hurry up as they had a release around the corner, and it was just silently rubber-stamped.


Aren’t junior engineers the same way? Give them an assignment, and what they turn in looks good because they made sure it worked. But then it frequently has to be rewritten?


We can yell at junior engineers and make them fix the problems they've created. You can't yell at an AI, and it can't understand that it's created any problems.


>You can't yell at an AI

You certainly can. Not productively, but you can.


I don't care and I'm sick of these arguments.

Yes, you're 100% right, but A is always responsible for his output, and if the output's crap then he should either step up or refresh his CV.

If any of my colleagues (or I) tried to use such a card, the road to unemployment would be a quick one.


> I find LLM generated code ends up pushing review/maintenance burden onto others. It "looks" right at first glance, and passes superficial tests, so it's easy to get merged. But then as you build on top of it, you realize the foundations are hastily put together, so a lot of it needs to be rewritten.

This describes most projects I've been on where there wasn't a thorough RFC process?

Where I'm seeing the sweet spot right now:

1. Have a detailed RFC

2. Ticket out the work

3. Feed the RFC and ticket to the LLM via MCP

4. Have your refactoring and design patterns textbooks close at hand so you can steer the LLM properly when things start to get messy. "DRY this out" or "implement this using X pattern" tend to be highly effective prompts

5. Use agents or other LLMs to review code for RFC compliance, test coverage, etc. (this isn't as effective as I'd like it to be right now, skill issue probably)

6. When there are bugs, force the LLM to do TDD - say "We're observing a bug in production, here are the reproduction steps, write a failing test that covers this code path." Obviously check that the test is a real test and not slop. Then, prompt the LLM to fix the issue. (A sketch of what such a failing test might look like follows below.)
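For step 6, here is a rough sketch of what that failing regression test could look like, written before the LLM is allowed to touch the fix. All names here are hypothetical (apply_discount and the myshop.pricing module are made up); the point is that the test encodes the reproduction steps from the bug report rather than the LLM's own idea of correctness:

    # Hypothetical regression test written from the bug report, before any fix.
    # `apply_discount` and the `myshop.pricing` module are made-up names.
    from decimal import Decimal

    from myshop.pricing import apply_discount

    def test_full_coupon_never_produces_negative_total():
        # Reproduction steps: a 100% coupon on a 19.99 item was observed
        # to yield a negative total in production.
        total = apply_discount(price=Decimal("19.99"), percent=100)
        assert total == Decimal("0.00")

Only once this fails for the same reason as the production bug do I prompt the LLM for the fix.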


I unfortunately was handed a Claude-coded React Native app that I've been beating into shape over the last couple of months.

Today I realized that there was a fundamental error in the architecture, and now I have to port several thousand lines of TypeScript into C++. I really hate it here.


I'd say it's a change of paradigm, and it might even be faster if you do test-driven development... Imagine writing your tests manually, getting LLM code, iterating until the tests pass, done.

Of course, the golden rules are: 1. write the tests yourself, don't let the LLM write them for you, and 2. don't paste this code directly into the LLM prompt and let it generate code for you.

In the end it boils down to specification: the prompt captures the loosely-defined specification of what you want, LLM spouts something already very similar to what you want, tweak it, test it, off you go.

With test-driven development this process can be made simpler, and changes in other parts of the code are also checked.
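As a rough illustration of that flow (all names are made up): the hand-written tests act as the spec, and the LLM's job is only to produce an implementation that makes them pass.

    # Hand-written spec-as-tests (golden rule 1: the LLM does not write these).
    # `slugify` and the `myslug` module are hypothetical; the LLM is asked
    # to implement the behaviour these tests describe.
    from myslug import slugify

    def test_lowercases_and_hyphenates():
        assert slugify("Hello, World!") == "hello-world"

    def test_collapses_repeated_whitespace():
        assert slugify("  too   many spaces ") == "too-many-spaces"

Whatever the LLM produces, these tests decide whether it's accepted; tweaking the generated code until they pass stays a mostly mechanical step.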


It's yet another example of "don't be the last one holding the bag."




