I think you're seriously underestimating the importance of the RL steps on LLM performance.

Also, how do you think the most successful RL models have worked? AlphaGo and AlphaZero both use neural networks to represent their policy and value functions, and those networks are the central mechanism of both systems.