I am whiny about being anti-AI, but no doubt LLMs are well-suited for English<->programming translation tasks, which, along with a well-tuned bag of tricks, makes them a useful tool.

But there are two issues the tech community has not spent nearly enough effort discussing:

1) As a big fan of Idris, I am worried that these tools will strongly disincentivize language development: why design an elegant language if an LLM can write the boilerplate faster than you can write a cleaner implementation?

2) I still don't think these tools are even slightly ethical. In 2022 I kicked the tires on ChatGPT-3.5 for F# codegen, and got some truly terrible results. I copy-pasted some lines into GitHub search and found the specific repositories that ChatGPT was obviously plagiarizing from, and with 15 seconds of prompt "engineering" I got it to spit out ~200 lines verbatim from my personal F# linear algebra library - the only changes were stripping out the comments and updating some syntax to F# 4.7. Pure plagiarism. It is especially frustrating that GPT is more likely to plagiarize that library precisely because there aren't very many similar repos on GitHub.

Obviously the plagiarism problem can be fixed. (And it seemingly has been... for F#. Not sure about Idris!) However, it really seems like that sort of RLHF fine-tuning is about covering OpenAI's tracks, not "teaching" the AI how to "generalize." In particular, I refuse to use the tool because now, instead of reliably getting it to plagiarize from F# developers, I have no clue whatsoever whether it's stealing or whether it managed to truly autoregress its way into an ethical solution. So instead of rolling the dice on being a graceless scumbag, I'll just take my time writing out my code by hand.

And it was striking that GPT-3.5 had read and memorized more F# than Don Syme has seen in his entire life, yet in response to simple questions it was a mindless plagiarist. It's a stark illustration of why the legal argument that ANN learning = human learning is vacuous, and why OpenAI should lose most of the copyright lawsuits it's facing.