
I had a look at the YouTube video -- I feel that an obvious question with regard to the "common sense" tests is: what was GPT-4 trained on? Was it partly trained on reams of questions used to test AI systems, for example? How do you know it is "demonstrating" anything out-of-sample, especially if it is constantly being improved?

I've been learning some exotic programming languages recently, and my anecdotal experience is that asking ChatGPT to code in array programming or logic languages results in code which is highly non-idiomatic for those paradigms. Why is that? It mostly writes the code as if it were all just a funny syntax for Javascript or Python. I'm surprised at that if it really understood J or APL, for example.
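
To show what I mean (a rough sketch -- I'm using Python with NumPy as a stand-in for an actual array language, and the function names are made up for illustration): given a moving-average task, what I get back reads like the first version below, where an array programmer would write something much closer to the second.

    import numpy as np

    # The style I tend to get back: element-at-a-time loops,
    # i.e. Python logic wearing a different syntax
    def moving_average_loop(xs, n):
        out = []
        for i in range(len(xs) - n + 1):
            out.append(sum(xs[i:i + n]) / n)
        return out

    # The array-oriented way: one whole-array expression, no explicit loop
    def moving_average_array(xs, n):
        return np.convolve(xs, np.ones(n) / n, mode="valid")

    print(moving_average_loop([1, 2, 3, 4, 5], 3))   # [2.0, 3.0, 4.0]
    print(moving_average_array([1, 2, 3, 4, 5], 3))  # [2. 3. 4.]

Both give the right answer, but only the second reflects how the paradigm is meant to be used.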

I am presuming that behind the scenes there are demonstrations of capabilities much greater than GPT-4's which are being used to illustrate the dangers of AI, because whilst I'm massively impressed by what's happening, it is difficult to convince myself of a "qualitative" difference.



> ChatGPT to code in array programming or logic languages results in code which is highly non-idiomatic for those paradigms. Why is that?

Reason #1 is that those languages are unreadable line noise to humans too. Fundamentally, almost all of the code written in array languages is made purposefully obtuse: single-letter identifiers, few or no comments, dense code with minimal structure, and so on.

Reason #2 is that there are very few examples of these languages on the web, and even more importantly: vanishingly few examples with inline comments and/or explanations. This isn't just because they're rare -- see reason #1 above.

Reason #3 is that LLMs can only write left-to-right. They can't edit or backtrack. Array-based languages are designed to be iterated on, rapidly modified, and even "code golfed" to a high degree.[1]

I've noticed that LLMs struggle with things my coworkers also struggle with: the "line noise" languages like grep, sed, and awk. Like humans, LLMs do well with verbose languages like SQL.

PS: I just tested GPT-4 to see if it can parse a short piece of K code that came up in a thread[2] on HN, and it failed pretty miserably. It came close, but on each run it came up with a different explanation of what the code does, and none of them matched the explanations in that thread. Conversely, it had no problems with the Rust code. And, err... it found a bug in one of my Rust snippets. Outsmarted by an AI!

[1] You can have an LLM generate code, and then ask it to make it shorter and more idiomatic. Just like a human touching up hastily written messy code, the LLM can fix its own mistakes!

[2] https://news.ycombinator.com/item?id=27220613


It's true for logic programming languages too (e.g. Prolog, Picat, Mercury), so I do not think it's to do with "line noise" languages per se, nor a lack of examples (in the case of Prolog). It'll write the code, but it treats the language like Python with funny syntax: not idiomatic. You can ask it to make it more concise or idiomatic, but it just can't.
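
Prolog doesn't paste well into a comment, so here's the same mindset gap sketched in Python instead (the names and the task are mine, purely illustrative): the first function is the translated-from-an-imperative-language style I get back, the second states the search space and the constraint and lets enumeration do the work, which is much closer to how you'd actually think in a logic language.

    from itertools import combinations_with_replacement

    # The "Python with funny syntax" mindset: hand-rolled nested loops,
    # explicit accumulator, manual control flow
    def triples_loop(total, hi):
        out = []
        for x in range(1, hi + 1):
            for y in range(x, hi + 1):
                for z in range(y, hi + 1):
                    if x + y + z == total:
                        out.append((x, y, z))
        return out

    # Closer to the logic-programming mindset: declare the space and the
    # constraint, let the enumeration find the answers
    def triples_decl(total, hi):
        return [t for t in combinations_with_replacement(range(1, hi + 1), 3)
                if sum(t) == total]

    assert triples_loop(12, 10) == triples_decl(12, 10)

Ask for the idiomatic version and you mostly get the first shape back, just reformatted.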


I've heard of one of those three languages, and I can program in over 20. That gives you an idea of how rare they must be!



