
^ Er, I misspoke: each expert is at most 0.9B parameters, and there are 128 experts. 5.1B is the number of active parameters (4 experts plus some other parameters).
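
For anyone who wants to sanity-check the arithmetic, here is a rough sketch using only the figures above. The function and constant names are made up for illustration, and since 0.9B is an upper bound per expert, the results are bounds rather than exact counts: 4 active experts account for at most about 3.6B, which leaves at least about 1.5B of the 5.1B for the "other" parameters.

```python
# Back-of-the-envelope check of the figures in the comment above.
# All values are in billions of parameters. The per-expert size is an
# upper bound ("at most 0.9B"), so the derived numbers are bounds too.
# Names here are hypothetical, chosen only for this illustration.

EXPERT_PARAMS_MAX_B = 0.9   # at most 0.9B parameters per expert
NUM_EXPERTS = 128           # experts in total
ACTIVE_EXPERTS = 4          # experts counted toward the active parameters
ACTIVE_PARAMS_B = 5.1       # active parameters in total


def max_active_expert_params_b(per_expert_b: float, active: int) -> float:
    """Upper bound on parameters contributed by the active experts."""
    return per_expert_b * active


def min_other_active_params_b(active_total_b: float, per_expert_b: float, active: int) -> float:
    """Lower bound on the active parameters that are not in experts."""
    return active_total_b - max_active_expert_params_b(per_expert_b, active)


def max_total_expert_params_b(per_expert_b: float, num_experts: int) -> float:
    """Upper bound on parameters across all experts combined."""
    return per_expert_b * num_experts


if __name__ == "__main__":
    print("active experts, at most (B):",
          round(max_active_expert_params_b(EXPERT_PARAMS_MAX_B, ACTIVE_EXPERTS), 2))
    print("other active params, at least (B):",
          round(min_other_active_params_b(ACTIVE_PARAMS_B, EXPERT_PARAMS_MAX_B, ACTIVE_EXPERTS), 2))
    print("all experts, at most (B):",
          round(max_total_expert_params_b(EXPERT_PARAMS_MAX_B, NUM_EXPERTS), 2))
```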

