
That's a fair point. I suspect that to someone outside the field, their touting of major breakthroughs while trying to conceal that their first model was a distillation may cause skepticism about the quality of their research. From what I've gathered, though, their research has actually contributed meaningfully to our understanding of optimal model scaling and faster training.

