
Almost all deployed ML systems work like this.

I.e. for classification you can judge "certainty" by the softmax outputs of the classifier, and in the less certain cases refuse to classify and send the item to a human.

And also do random sampling of outputs by humans to verify accuracy over time.
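To make that concrete, here is a minimal sketch of the routing logic under some assumptions: the classifier exposes raw logits, and the threshold, audit rate, and names like route are illustrative, not taken from any particular system.

    import numpy as np

    # Illustrative values -- in practice the threshold and audit rate would be
    # tuned against the cost of human review and the target accuracy.
    CONFIDENCE_THRESHOLD = 0.90
    AUDIT_SAMPLE_RATE = 0.01   # fraction of auto-classified items spot-checked

    def softmax(logits: np.ndarray) -> np.ndarray:
        """Turn raw classifier logits into a probability distribution."""
        exps = np.exp(logits - logits.max())
        return exps / exps.sum()

    def route(logits: np.ndarray, rng: np.random.Generator) -> str:
        """Auto-classify, escalate to a human, or accept-and-audit."""
        confidence = softmax(logits).max()
        if confidence < CONFIDENCE_THRESHOLD:
            return "human_review"           # too uncertain: refuse to classify
        if rng.random() < AUDIT_SAMPLE_RATE:
            return "auto_with_human_audit"  # accepted, but sampled for QA
        return "auto"

    rng = np.random.default_rng(0)
    print(route(np.array([4.0, 0.1, -1.0]), rng))  # confident   -> "auto"
    print(route(np.array([0.3, 0.2, 0.1]), rng))   # low margin  -> "human_review"

The spot-check stream is what gives you the accuracy-over-time signal; the threshold then becomes a dial trading human workload against error rate.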

It's just that humans are really expensive and slow, so the human-review loop can be hard to maintain.

But if humans have to review everything anyway (as the EU's AI Act requires for many applications), then you don't really gain much - even though the humans would likely just do a cursory rubber-stamp review, as anyone who has seen pull request reviews can attest.



I have the same experience, but I am still 5 to 10 times more productive using Claude. I'll have it write a class, have it write tests for the class, and give it the output of the tests, from which it usually figures out problems like "oops, those methods don't exist." Along the way I am guiding it on the approach and architecture. Sometimes it does get stuck and needs very specific intervention; you need to be a senior engineer to do this well. Importantly, since it now has the context loaded, I can have it write nicely formatted documentation and add bells and whistles like a pretty CLI with minimal effort. In the end I usually get what I want, with better tests, docs, and polish than I would have the patience to write myself, in a fraction of the time, especially with Cursor, which makes the iteration process so much faster.



