The amount of hallucination I get when trying to write code is amazing. I mean, it can grasp the core concepts of the language and create structure/algorithms, but it often makes up objects/values when I ask questions. Example:
It suggested TextLayoutResult.size, which is an Int value. I asked if it holds width and height, and it claimed it has size.height and size.width, which it does not. I am now writing production code and also evaluating the LLMs that our management thinks will save us a shitload of time.
We will get there someday, but the push from management is not compatible with the current state of the LLMs.
(I use Claude 3.5 Sonnet now, as it is also built into some of the "AI IDEs".)
You're not alone. In my experience, senior executives are enamoured with the possibility of halving headcount. The engineers reporting honestly about the limitations of connecting it to core systems (or using it to generate complex code running on core systems) are at risk of being perceived as blocking progress. So everyone keeps quiet, tries to find a quick and safe use case for the tech to present to management, and makes sure they aren't involved in whichever project will be the big one to fail spectacularly and bring it all crashing down.
What irks me is how LLMs won't just say "no, it won't work" or "it's beyond my capabilities" and instead just give you "solutions" that are wrong.
Codeium, for example, will absolutely bend over backwards to provide you with solutions to requests that can't be satisfied, producing more and more garbage with every attempt. I don't think I've ever seen it just say no.
ChatGPT is marginally better and will sometimes tell you straight up that an algorithm can't be rewritten the way you suggest, because of ... But sometimes it too will produce garbage in its attempts to do something impossible that you asked it to do.
Two notes: I've never had one say no for code-related stuff, but I have had them disagree that something exists all the time. In fact, I just had one deny that the Subaru Brat exists, twice.
Secondly, if an LLM is giving you the runaround, it does not have a solution for the prompt you gave it, and you need another prompt, another model, or another approach to using the model (in cases of vendor lock-in, like OpenAI).
>What irks me is how LLMs won't just say "no, it won't work" or "it's beyond my capabilities" and instead just give you "solutions" that are wrong.
This is one of the clearest ways to demonstrate that an LLM doesn't "know" anything, and isn't "intelligence." Until an LLM can determine whether its own output is based on something or completely made up, it's not intelligent. I find them downright infuriating to use because of this property.
That's an easily solvable problem for programming. Today, ChatGPT has an embedded Python runtime that it can use to verify its own code, and I have seen it try different techniques when the code doesn't give the expected answer. The one time I can remember is with generating a regex.
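For what it's worth, the loop behind that is simple enough to sketch in a few lines of Python. The test cases and candidate patterns below are made up for illustration; in the real flow, each new candidate would come from another round trip to the model with the failure fed back as context.

    import re

    # Hypothetical test cases the user cares about: (input, should_match).
    test_cases = [
        ("2024-01-31", True),
        ("31/01/2024", False),
        ("2024-1-3", False),
    ]

    def passes_all(pattern: str) -> bool:
        """Return True only if the pattern matches exactly the inputs it should."""
        try:
            compiled = re.compile(pattern)
        except re.error:
            return False  # the model produced an invalid regex
        return all(bool(compiled.fullmatch(text)) == expected
                   for text, expected in test_cases)

    # Stand-ins for successive model attempts; in the real flow each one
    # would come from another call to the LLM with the failure fed back.
    candidate_patterns = [
        r"\d+-\d+-\d+",        # too loose: also accepts 2024-1-3
        r"\d{4}-\d{2}-\d{2}",  # passes every check above
    ]

    for attempt, pattern in enumerate(candidate_patterns, start=1):
        if passes_all(pattern):
            print(f"attempt {attempt}: accepted {pattern!r}")
            break
        print(f"attempt {attempt}: rejected {pattern!r}")

The point is just that a failing check gives you something concrete to hand back to the model, instead of trusting its first answer.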
I don't see any reason that an IDE, especially with a statically typed language, can't have an AI integrated that at least will never hallucinate classes/functions that don't exist.
Modern IDEs can already give you real-time errors across large solutions for code that won't compile.
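As a rough sketch of what that integration could look like (the helper name, the snippet, and the feedback step are all hypothetical, and it assumes a JDK with javac on the PATH), a tool wrapped around the model could type-check each generated snippet before showing it to anyone:

    import subprocess
    import tempfile
    from pathlib import Path

    def compiles_cleanly(java_source: str, class_name: str = "Generated") -> bool:
        """Write a generated snippet to a temp dir and let javac type-check it.

        A hallucinated class or method shows up as a 'cannot find symbol'
        error, so the snippet is rejected before a human ever sees it.
        Assumes a JDK (javac) is available on the PATH.
        """
        with tempfile.TemporaryDirectory() as tmp:
            src = Path(tmp) / f"{class_name}.java"
            src.write_text(java_source)
            result = subprocess.run(["javac", str(src)], capture_output=True, text=True)
            if result.returncode != 0:
                # In a real integration, result.stderr would be fed back to the
                # model as context for its next attempt.
                print(result.stderr)
            return result.returncode == 0

    # Example: generated code calling a String method that does not exist.
    snippet = """
    public class Generated {
        public static void main(String[] args) {
            String s = "hello";
            System.out.println(s.lengthInCharacters());  // hallucinated method
        }
    }
    """
    print(compiles_cleanly(snippet))  # False: javac reports 'cannot find symbol'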
Yeah, but it would have to reason about the thing it just hallucinated, or it would have to be hard-prompted somehow. There will be more tooling and code built around LLMs to make them behave like a human than people can imagine. People are trying to solve everything with LLMs alone, but LLMs have zero agency.
This is a good representation of my experience as well.
At the end of the day, this is because it isn't "writing code" in the sense that you or I do. It is a fancy regurgitation engine that will output bits of stuff it's seen before that seem related to your question. LLMs are incredibly good at this, but that is also why you can never trust their output.
Yes, I told Windsurf to copy some code to another folder. And what did it do? It "regenerated" the files in the right folders, but the content was different. Great chaos agent :D