> Took 20-30 follow-up messages, telling it to add things, remove things, fix things, make things compatible with the versions of software it was running etc.
So, in other words, you basically spent just as much time and effort as if you had done it yourself?
I understand your point, and you are right. Had I been familiar with the image I was working with, the version differences in configuration, etc., it probably would have taken me the same amount of time. But I look at this from a more zoomed-out perspective: this is just the beginning. The point is it's capable, and it will improve.
Will it improve, though? I’m not a GPT hater or denier, but how do you even predict that it hasn’t already hit a wall? They can increase the parameter count 100x again, but correctness is not some knob they can just dial up to 10. What if the training dataset simply doesn’t have enough info for the correct answer to carry more weight than all the “con” noise? What if an answer requires a kind of reasoning inaccessible to LLMs?
Stories in this thread can just as well be boiled down to “I fed it corrections for a while, and this last time it didn’t f..k up and finally included everything in the answer”. What makes you think it won’t just keep doing exactly that, only a bit better or quicker?
Edit: Another, probably highly related, question: can it answer “I don’t know this / I’m not sure about these parts”? I’ve never seen that in chat logs.
> What if the training dataset simply doesn’t have enough info for the correct answer to carry more weight than all the “con” noise?
Indeed. I wonder what happens as available training data shifts from purely human-generated (now) to largely AI-generated (soon).
Is this an informational analogue of the “gray goo” doomsday that an uncontrolled self-replicating nanodevice could cause?
> can it answer “I don’t know this”
Such a fabulous question. This statement likely appears infrequently in the training data.
> can it answer “I don’t know this”
Afaik this is one of the newer ways of training ML models; I've been looking into using it myself for a few things.
A lot of models were trained to provide some quantifiable output 100% of the time, even if that output was wrong. E.g. an image-recognition model saying "82.45% certain that is a dog", whereas it makes _all_ the difference for it to be able to say "82.45% certain that is a dog and 95.69% certain I don't know what that is", to indicate that the image has many features of a dog, but not enough for it to be more certain that it is a dog than that it isn't. It's the negative-test problem, I guess; we devs often forget to do it too.
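A minimal sketch of that idea, purely illustrative (the threshold, labels and scores below are made up, and a real system would need proper calibration or out-of-distribution detection rather than a bare softmax cutoff; this just shows the shape of a classifier that can abstain):

```python
import numpy as np

def classify_with_abstain(logits, labels, threshold=0.85):
    # Softmax over the raw scores (numerically stable form).
    exp = np.exp(logits - np.max(logits))
    probs = exp / exp.sum()
    best = int(np.argmax(probs))
    # If even the best class isn't confident enough, abstain explicitly
    # instead of committing to a wrong-but-confident-looking answer.
    if probs[best] < threshold:
        return "I don't know", float(probs[best])
    return labels[best], float(probs[best])

# A "dog" score that leads, but not decisively -> the model abstains.
print(classify_with_abstain(np.array([2.1, 1.9, 0.3]), ["dog", "wolf", "cat"]))
```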
In a way I wonder if that's how some of the systems in our brains work as well; i.e. we evolved certain structures to perform certain tasks, but when those structures fail to determine an action, the "I don't know" from that system can kick back into another. Things like the fear response: the brain tries to identify a dark shadow and can't, so it kicks back to the evolutionary defence mechanism of being scared/cautious and feeling fear, because that has saved the skins of our forebears.
Isn't that what the thumbs up/down are for? Some kind of annotation that can be used to improve future iterations of training? They've got millions of people feeding in potentially billions of queries, and probably tons of feedback - would that not result in an improvement over time?
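To make that concrete, here's one rough sketch of how thumbs up/down could be folded back into training: pair the upvoted and downvoted answers to the same prompt and fit a tiny reward model with a pairwise (Bradley-Terry-style) loss, which is roughly what RLHF pipelines are described as doing. The feedback log, the stand-in featuriser and the linear model below are all made-up illustrations, not anything OpenAI has published:

```python
import numpy as np

# Hypothetical feedback log: (prompt, answer the user thumbed up, answer thumbed down).
feedback = [
    ("expose port 80 in a Dockerfile", "Add an EXPOSE 80 instruction.", "Set PORT=80 in ENV."),
]

def embed(text, dim=8):
    # Stand-in featuriser; a real pipeline would use the LLM's own representations.
    rng = np.random.default_rng(abs(hash(text)) % 2**32)
    return rng.standard_normal(dim)

w = np.zeros(8)   # weights of a toy linear "reward model"
lr = 0.1
for _ in range(200):
    for prompt, good, bad in feedback:
        x_good, x_bad = embed(prompt + good), embed(prompt + bad)
        # Pairwise logistic (Bradley-Terry) loss: push the upvoted answer's
        # score above the downvoted one's.
        p = 1.0 / (1.0 + np.exp(-(w @ x_good - w @ x_bad)))
        w -= lr * (p - 1.0) * (x_good - x_bad)

# The learned scorer could then rank candidate answers, or act as the reward
# signal in RLHF-style fine-tuning of a later model iteration.
```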
Assuming the existing corpus was already consistent with what experts hold to be true (afaik they used all available books and common-knowledge resources), why would any number of additional corrective statements make a difference for a retrained model? It’s not as if our written knowledge was wrong all along and we tolerated it until mid-2022.
I don’t really understand how it works, how its iterations differ, or what the roadmap is. But what I’ve managed to learn (or rather, feel) about LLMs isn’t very consistent with such linear predictions.
Well, maybe it will use downvotes as anti-prompts? Existing sources must have had votes too, but probably only for a subset. Maybe the current iteration didn’t rank by vote at all, so the next one will really shine? Guess we’ll see soon.
So far this has been my experience developing with it: lightning-fast but inaccurate results, then just as much time spent getting it to work as I would've spent writing it myself.
Lmao, the difference being, of course, one tab open vs. 20 different tabs for various parts of the Dockerfile docs, SO answers providing detail that the docs lack, etc.
Yeah, we can all write this stuff by hand, but it's incredibly exciting. When it first came out I was asking it to write snippets of JS, add things, remove things, write unit tests and then update them when the overall code changed, and it maintained several different "threads" of conversation, all related to a single exercise, just fine. Sure, it's not perfect, but it's kind of cool having a super-junior dev who happens to have instant access to most of the documentation around at the time in its head.
The arc of software development is toward people relying on vaguer specifications to tell systems what they want, then recognising whether what they see is what they wanted via fast feedback loops, rather than slowly recalling the syntax needed to state it precisely.
It could be the same time spent, but not the same amount of cognitive effort.