It hasn’t really worked so far. Pretty much exactly what you’ve described. I don’t even really work on that team, but “a judge LLM” low-key triggered me just because of how much I’ve been hearing it over the last couple of months.

I think the reason for the recent pivot is to “keep the human in the loop” more. The current thinking is that they removed the human too much and were getting bad results. So now they just want to make the interaction faster and let the human be more involved, the way we developers use Claude Code or Copilot: checking every interaction and nudging it toward the right/desired answer.
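To make the contrast concrete, here’s a minimal sketch of the two interaction patterns. All names here are hypothetical stand-ins (the `llm` and `review` callables are stubs, not any real API); it just shows the shape of “model proposes, human accepts or nudges” versus a single one-shot call.

```python
def one_shot(llm, task):
    # Old approach: one call, no human checkpoint along the way.
    return llm(task, history=[])

def human_in_the_loop(llm, review, task, max_turns=5):
    # New approach: the model suggests, the human either accepts
    # or sends a nudge that feeds into the next turn.
    history = []
    suggestion = None
    for _ in range(max_turns):
        suggestion = llm(task, history)
        verdict = review(suggestion)           # human checkpoint
        if verdict == "accept":
            return suggestion
        history.append((suggestion, verdict))  # nudge becomes context
    return suggestion
```

With stub callables, a model that only gets it right after one nudge fails under `one_shot` but converges under `human_in_the_loop`, which is roughly the behavior difference the pivot is betting on.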

I got the sense that management isn’t taking it well, though. Just this Friday they gave a demo of the new POC where the LLM just suggests things, frequently asks for permission and where to go next, and expects the user to interact with it a lot more than the one-shot approach before (which I do think is likely to yield better results, tbh), but the main reaction was “this seems like a massive step backward.”
