I’m starting to think there’s an LLM equivalent to the old saying about how everything the media writes is accurate except on the topics you’re an expert in. All LLM output looks to be good quality except when it’s on a topic you’re an expert in.

People who have no background in writing or editing think LLMs will revolutionize those fields. Actual writers and editors take one look at LLM output and can see it’s basically valueless because the time taken to fix it would be equivalent to the time taken to write it in the first place.

Similarly, people who are poor programmers or have only a surface-level understanding of a topic (especially management types who are trying to appear technical) look at LLM output and think it’s ready to ship, but good programmers recognize that the output is broken in so many ways, large and small, that it’s not worth the time it would take to fix compared to just writing it from scratch.



LLMs are not worthless for programming. You just cannot expect one to ship a full program for you, but for generating functions with limited scope I've found them very useful; how to make use of an unfamiliar but common library, for example. But of course you have to check and test.

And for text, I know people who use it successfully (professionally) to generate summaries from some data. They still have to proofread, but it saves them time, so it is valuable.


I've been using it for code review. I just paste some of my code in and ask the AI to critique it and suggest ideas and improvements. Makes for a less lonely coding experience. Wish I could point it at my git repositories and have it review entire projects.

I've had mixed experiences with getting it to generate new code. It produced good node.js command-line application code. It didn't do so well at writing a program that creates a 16-bit PCM audio file. I asked it to explain the WAV file format, and things like the lengths of structures got so confusing I had to research the stuff myself to figure out the truth.
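
For reference, here's roughly where I ended up. This is a minimal sketch using Python's standard wave module (the sample rate and tone are arbitrary choices of mine, not anything the AI produced):

    import math
    import struct
    import wave

    SAMPLE_RATE = 44100  # frames per second; common but arbitrary choice
    DURATION = 1.0       # seconds of audio to generate
    FREQ = 440.0         # Hz; a plain sine tone, just for illustration

    # The wave module writes the RIFF/fmt/data chunk headers (and their
    # lengths) for you, which is exactly the part that got confusing.
    with wave.open("tone.wav", "wb") as f:
        f.setnchannels(1)   # mono
        f.setsampwidth(2)   # 2 bytes per sample = 16-bit PCM
        f.setframerate(SAMPLE_RATE)
        frames = bytearray()
        for i in range(int(SAMPLE_RATE * DURATION)):
            sample = int(32767 * math.sin(2 * math.pi * FREQ * i / SAMPLE_RATE))
            frames += struct.pack("<h", sample)  # little-endian signed 16-bit
        f.writeframes(frames)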


This mirrors my experience. Very helpful writing node.js application code, but it struggles to walk through simple operations in assembly. My hunch is that the tokenization process really hurts its ability to keep the 1s and 0s straight.

It's been hit or miss with Rust. It's super helpful in deciphering compilation errors, decent with "core Rust", and less helpful with third-party libraries like the cursive TUI crate.

Which comes as no surprise, really, as there's certainly less training data on the cursive crate than on, say, expressjs.

Also, FWIW, I have actually pointed it at entire git repos with the WebPilot plugin within ChatGPT, and it could explain what the repo did, but getting it to actually incorporate the source files as it wrote new code didn't work quite so well (I pointed it to https://github.com/kean/Get and it would frequently fall back to writing native Swift code for HTTP requests instead of using the library).


> LLMs are not worthless for programming.

They can be worse than worthless. They can sabotage your work if you let them, making you spend even more time fixing it afterwards.

For example, I've used GPT-4 as a sort of Google on steroids, with prompts like "do subnets in gcloud span AZs" and "in gcloud Secret Manager, can you access secrets across regions". I very quickly learned to ask "is it true?" after every answer and to never rely on a given answer too much (verify it quickly; don't let misinformation get you too far down the wrong route). So is it useful? Yes. But can it lead you down the wrong path? It very well can. The less experience you have in the field, the more easily that will happen.

> You just cannot expect one to ship a full program for you, but for generating functions with limited scope I've found them very useful

Entire functions? Wow. I found it useful for generating skeletons that I then have to fill in by hand or tweak. I don't think I ever got anything out of GPT-4 that was useful as-is (except maybe short snippets three lines long).

However, I found it extremely useful for parsing emails received from people and writing nice-sounding replies. For that it is really good (in English).


"They can be worse than worthless."

But the same is true when you blindly follow some Stack Overflow answer.

And yes, I always have to tweak the output, and I use it only rarely. But when I have, it was faster than googling and parsing the results.


Nobody ever made a code editor plugin that reads random SO answers and automatically pastes them over your code.

The amount of fighting I've needed to do against MS development tools mangling my code recently is absurd. (Also, who the fuck decided that autocomplete on space and enter was a reasonable thing? Was that person high?)


>"I found it useful for generating skeletons I then have to fill by hand or tweak".

Even this can be a big time saver that increases productivity.

Just like others have said, it isn't going to write a Pynchon novel, but it does do a great job on the other 99% of general writing that gets done.

Same for computers: the average programmer isn't inventing a new Dijkstra's algorithm every day; they are really just connecting things together and cranking out the equivalent of generic boilerplate.


> They can be worse than worthless. They can sabotage your work if you let them, making you spend even more time fixing it afterwards.

I basically gave up on LLMs because I was spending more time figuring out what they did wrong than actually getting value.

People without programming skill are still impressed by them. But they have yet to learn or deliver anything of value, even with the help of chatbots.


I have twenty years of programming experience and LLMs give me a significant productivity boost for programming on a daily basis: https://simonwillison.net/2023/Sep/29/llms-podcast/#code-int...


I have met my share of folks with decades of experience that was not quality experience. The most hilarious range from those who open tar.gz files in Notepad wondering where the code is, to those who work on the web but don't know what XSRF is. Experience, however long, doesn't count if it's of the not-so-great type. Not saying this is the case here.

LLMs do produce impressive code. Even if they were indeed just procedural generators, it would still be impressive. The code has structure and appears useful.

But the issue is that you can tell it makes no sense; there is no thought process behind it. It fits into no greater picture.

Even if you add more context, it still has no purpose.

People who find this useful are the same type who copy Stack Overflow code that they don't understand. It kinda works, when it does, but again it doesn't fit into the bigger picture.

Code isn't about spelling out instructions; an AI can do that. Code is about what goes where, in a way where the what changes as often as the where. It's the bigger picture. So yes, it can help and replace those who spell out instructions, but it will be hard to replace those who are required to deliver more.


"But the issue is that you can tell it makes no sense, there is no thought process behind it. It fits in no greater picture."

Completely agree with you. That's my job. The LLM is effectively my typing assistant.


Sorry, I may have gotten something wrong by skimming over your link. Is this the "significant project" you have been assisted on by LLMs?

https://github.com/simonw/sqlite-history


That's one of about a dozen at this point - but yeah, that's the one that I used LLMs to research the initial triggers and schema design for.

Here's the transcript (it pre-dates the ChatGPT share feature): https://gist.github.com/simonw/1aa4050f3f7d92b048ae414a40cdd...

I wrote more about it here: https://simonwillison.net/2023/Apr/15/sqlite-history/

Here's another one I built using AppleScript: https://github.com/dogsheep/apple-notes-to-sqlite - I wrote about that here: https://til.simonwillison.net/gpt3/chatgpt-applescript


While it is impressive that an AI can generate all this, the code is anything but significant. Using triggers for history is one sure way to bring a scalable system down fast, and that's one of the first lessons a junior will learn.
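
For anyone unfamiliar, the pattern in question looks roughly like this. This is a made-up sketch using Python's sqlite3 module (the table and column names are invented, not the actual repo's schema):

    import sqlite3

    conn = sqlite3.connect(":memory:")
    conn.executescript("""
        CREATE TABLE docs (id INTEGER PRIMARY KEY, body TEXT);
        CREATE TABLE docs_history (
            id INTEGER,
            body TEXT,
            changed_at TEXT DEFAULT CURRENT_TIMESTAMP
        );

        -- Every UPDATE now also pays for an INSERT into the history table.
        CREATE TRIGGER docs_audit AFTER UPDATE ON docs
        BEGIN
            INSERT INTO docs_history (id, body) VALUES (old.id, old.body);
        END;
    """)
    conn.execute("INSERT INTO docs (body) VALUES ('v1')")
    conn.execute("UPDATE docs SET body = 'v2' WHERE id = 1")
    print(conn.execute("SELECT * FROM docs_history").fetchall())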


Are you sure that holds with SQLite? My benchmarks so far have shown it to add a pretty inconsequential overhead.
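
If you want to check for yourself, something along these lines is the kind of measurement I mean; a quick illustrative sketch (in-memory database, one big transaction, made-up table names), not my actual benchmark:

    import sqlite3
    import time

    def bench(with_trigger):
        conn = sqlite3.connect(":memory:")
        conn.execute("CREATE TABLE t (id INTEGER PRIMARY KEY, v TEXT)")
        if with_trigger:
            conn.executescript("""
                CREATE TABLE t_history (id INTEGER, v TEXT);
                CREATE TRIGGER t_audit AFTER UPDATE ON t
                BEGIN
                    INSERT INTO t_history VALUES (old.id, old.v);
                END;
            """)
        conn.execute("INSERT INTO t (v) VALUES ('x')")
        start = time.perf_counter()
        with conn:  # one transaction, so we time the trigger, not fsyncs
            for i in range(100_000):
                conn.execute("UPDATE t SET v = ? WHERE id = 1", (str(i),))
        return time.perf_counter() - start

    print(f"without trigger: {bench(False):.2f}s")
    print(f"with trigger:    {bench(True):.2f}s")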

Also: not every system has to be a scalable system. That's another lesson junior engineers (should) learn.


I honestly don’t understand how people can say LLMs are useless for coding. Have you tried ChatGPT 4, or are you basing this take on the obsolete 3.5? I’m a professional programmer and I think LLMs are extremely useful.


I've used GPT-4. It's not helpful in any domain in which I'm already proficient. If I'm having to use a new language or platform for whatever reason, it's mildly quicker than alt-tabbing to Stack Overflow, but probably not worth the subscription.

For graphics tasks GenAI is absurdly helpful for me. I can code but I can’t draw. Getting icons and logos without having to pay a designer is great.


Programmers don't think that, though, or at least not all the time.

You could say similar things about Stack Overflow, and yet we use it.


Stack Overflow responses are well known to be misranked. I've heard a rule of thumb that the actually correct answer is typically around #3.


And #1 is usually broken or wrong due to its (typically) old age. The longer it has had to accumulate upvotes, the less relevant it becomes.


For any managers reading: ChatGPT and Stack Overflow are not the same kind of thing.


Indeed they're not. And GPT-4 tends to outperform SO in my experience.


Yep. ChatGPT is like having a junior engineer confidently asking to merge broken garbage into your codebase all the time. It adds negative value for anyone who knows what they're doing.


But with one crucial difference: it's a junior programmer that can make changes based on your feedback in a few seconds, not a few hours. And it never gets tired or frustrated.

I find treating it like an intern is amazingly productive: https://simonwillison.net/2023/Sep/29/llms-podcast/#code-int...


Hahaha. A friend of mine has a problem with a contractor at his workplace who tries to PR in shell scripts written with Copilot. My friend spends an hour explaining why a script generated in 5 minutes is horrifically awful and will likely take down the company. He's legitimately angry about it.


It seems like the only ways to delegate programming tasks are to write tests for your subordinate's code, to review it tediously yourself, or to just trust the hell out of them.


> I’m starting to think there’s an LLM equivalent to the old saying about how everything the media writes is accurate except on the topics you’re an expert in.

This is true for media articles, but for LLMs I feel like it's the opposite: people who aren't specialists don't fully appreciate how great they are at those tasks.


Everyone you described shares something in common:

They aren't good at using language models.


Nor are 99.9% of humanity. I think that's the point.


Gell-Mann Amnesia!


Gell-Bott Amnesia.


[flagged]


https://news.ycombinator.com/newsguidelines.html

Be kind. Don't be snarky. Converse curiously; don't cross-examine. Edit out swipes.

When disagreeing, please reply to the argument instead of calling names. "That is idiotic; 1 + 1 is 2, not 3" can be shortened to "1 + 1 is 2, not 3."


Why the anger?



