I am still greatly disappointed by the insistence on end-to-end black-box models.
In the end, as impressive as these results are, they are fundamentally trending in the wrong direction. All the benefits and certainty (e.g. security, correctness, and reproducibility) provided by theorem provers and model-driven systems are thrown out the window in favour of fast but potentially wrong or insecure results.
The worst part of this development is the psychological aspect - humans have a tendency to rely on machine-generated results and view them as superior. The disconnect between working code and correct or secure code is only going to widen with this approach.
A glaring example is found in the blog post: the image manipulation example (7.) contains an error that the author failed to even recognise or mention. Instead of turning the uploaded image into a mosaic as intended, the generated code simply creates a fixed-size black-and-white checkerboard pattern. This is clearly neither a mosaic nor image manipulation.
It is a very impressive tech demo, but generating actual software that can be trusted and rigorously checked against requirements will end up using a formal description (e.g. a programming language, theorems, or modelling akin to UML) anyway.
>in favour of fast but potentially wrong or insecure results.
Formal methods don't eliminate wrong or insecure results. Formal methods tell you that a program matches the specification when certain conditions hold, e.g. no bit flips occur, the computer does not crash, the kernel doesn't kill your process, allocations succeed, and your program can make progress (the kernel can decide to never schedule it). You can have bugs in writing the specification, where the specification does not match your intention. Even your intended design may have vulnerabilities in it. And if the code is not generated from the specification, the code might not match the specification.
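To make the specification-bug point concrete, here is a tiny made-up Lean sketch (my own example, not from the article): a "sort" specification that only demands a sorted output and forgets to demand a permutation of the input, so an obviously wrong function satisfies it.

```lean
-- Hypothetical spec: the output must be sorted. Nothing here requires the
-- output to contain the input's elements.
def Sorted : List Nat → Prop
  | [] => True
  | [_] => True
  | a :: b :: rest => a ≤ b ∧ Sorted (b :: rest)

-- "Sorts" by throwing the input away entirely.
def badSort (_ : List Nat) : List Nat := []

-- The proof goes through: the specification is met, the intent is not.
theorem badSort_meets_spec (xs : List Nat) : Sorted (badSort xs) := by
  simp [badSort, Sorted]
```

Any prover will happily certify badSort here; the proof is sound, the specification is the bug.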
Using formal methods slows you down compared to things like testing, which can get us most of the way there in less time. Systems can be designed to be robust such that if a machine fails, the system keeps running. If an end-to-end black-box model can get you most of the way there with a sufficiently low number of bugs, it may be worth using. Time is a limited resource, and being 100% correct is not necessarily better than being 99% correct and having extra features.
>The disconnect between working code and correct or secure code is only going to widen with this approach.
Not really. People are not going to just start ignoring bugs when they run into them because the software they are using happened to be machine-generated.
>Instead of turning the uploaded image into a mosaic as intended, the generated code simply creates a fixed-size black-and-white checkerboard pattern
It worked fine for an image I just tried. Just as the prompt asked, it "convert[ed] the image to a 32x32 mosaic." There was no checkerboard, but it may be worth noting that it converted transparent pixels to black.
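For context, a minimal Pillow sketch of how such a filter is typically built (my own reconstruction, not the demo's actual code): downscale to 32x32 to average each tile, upscale with nearest-neighbour so the tiles show as flat blocks, and flatten the alpha channel onto black, which would explain the transparent-pixels-turn-black behaviour.

```python
# Hypothetical reconstruction of a "32x32 mosaic" filter, not the demo's code.
from PIL import Image

def mosaic_32x32(in_path: str, out_path: str, display_size: int = 512) -> None:
    img = Image.open(in_path).convert("RGBA")
    # Average the image down to 32x32 "tiles"...
    small = img.resize((32, 32), Image.Resampling.BILINEAR)
    # ...then blow it back up without smoothing so each tile is a flat block.
    big = small.resize((display_size, display_size), Image.Resampling.NEAREST)
    # Flattening transparency onto a black background would explain why
    # transparent pixels come out black.
    background = Image.new("RGBA", big.size, (0, 0, 0, 255))
    Image.alpha_composite(background, big).convert("RGB").save(out_path)

mosaic_32x32("input.png", "mosaic.png")
```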
> Formal methods don't eliminate wrong or insecure results.
Yes, they do. That's the entire purpose of a proof. Of course formal methods cannot prevent or even detect wrong specifications, but generated code is no different in that regard.
> Using formal methods slows you down compared to things like testing, which can get us most of the way there in less time.
But that's the point - formal methods are slow if people have to apply them. Automated theorem provers exist and can work on generated ASTs, so why not add the step and create a hybrid system that verifies the generated result?
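As a toy sketch of such a hybrid loop (the "generated" snippet and the spec here are invented for illustration; a real system would extract the formula from the generated AST automatically), one could hand the logic to an SMT solver like Z3 and ask for a counterexample:

```python
# Toy hybrid check: model the "generated" code in Z3 and ask the solver
# whether any input violates the specification. Invented example, not a
# real pipeline.
from z3 import If, Int, Not, Solver, unsat

x = Int("x")
generated_abs = If(x >= 0, x, -x)   # logic of the machine-generated code
spec = generated_abs >= 0           # the property the spec demands

solver = Solver()
solver.add(Not(spec))               # search for a counterexample
if solver.check() == unsat:
    print("verified: no input violates the spec")
else:
    print("counterexample:", solver.model())
```

unsat means no input violates the property; anything else yields a concrete counterexample that could be fed back into regeneration.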
>> Instead of turning the uploaded image into a mosaic as intended, the generated code simply creates a fixed-size black-and-white checkerboard pattern
> It worked fine for an image I just tried. Just as the prompt asked, it "convert[ed] the image to a 32x32 mosaic." There was no checkerboard, but it may be worth noting that it converted transparent pixels to black.
The code may have been changed - it works for me now, too, whereas before it definitely didn't.
>Of course formal methods cannot prevent or even detect wrong specifications
A wrong specification can give you a wrong or insecure result. That was my point. Formal methods aren't a silver bullet, and your system still needs to be robust to failures.
>so why not add the step and create a hybrid system that verifies the generated result?
Because the time spent writing a specification is time wasted if the generated code turns out to have no issues.
People are going to generate black-box code, launch it as an MVP, and then hire engineers to iron out bugs or even do whole rewrites if the product gets traction. It's all going to just fit back into the same standard model.
There's no way to un-black-box it. There are just too many parameters. Call it a black box or a lack of model explainability; it's effectively the same thing.