
Batching can lead to variance with things like batchnorm, but most transformers use layer norm, which avoids this problem.
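
A minimal PyTorch sketch (not from the thread, just an illustration of the point) showing that batchnorm in training mode makes an example's output depend on what else is in the batch, while layer norm normalizes each example on its own:

    import torch
    import torch.nn as nn

    torch.manual_seed(0)
    x = torch.randn(4, 8)       # batch of 4 examples, 8 features
    other = torch.randn(3, 8)   # different examples to batch alongside x[0]

    # BatchNorm in training mode normalizes with statistics computed across
    # the batch, so the output for x[0] depends on its batch-mates.
    bn = nn.BatchNorm1d(8).train()
    a = bn(x)[0]
    b = bn(torch.cat([x[:1], other]))[0]
    print(torch.allclose(a, b))  # False: batch composition changes the result

    # LayerNorm normalizes over the feature dimension of each example alone,
    # so the result for x[0] is the same regardless of batch composition.
    ln = nn.LayerNorm(8)
    a = ln(x)[0]
    b = ln(torch.cat([x[:1], other]))[0]
    print(torch.allclose(a, b))  # True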


Batchnorm only couples examples within a batch during training; at inference it uses fixed running statistics, so batching has no effect.
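
A small sketch of that distinction (again just an assumed illustration, not code from the thread): after switching a BatchNorm module to eval mode, each example is normalized with the stored running mean/variance, so its output no longer depends on the rest of the batch.

    import torch
    import torch.nn as nn

    torch.manual_seed(0)
    x = torch.randn(4, 8)
    other = torch.randn(3, 8)

    bn = nn.BatchNorm1d(8)
    bn(x)  # one training-mode pass to populate the running statistics

    # Eval (inference) mode uses the fixed running statistics instead of
    # batch statistics, so batching introduces no cross-example variance.
    bn.eval()
    a = bn(x)[0]
    b = bn(torch.cat([x[:1], other]))[0]
    print(torch.allclose(a, b))  # True: x[0]'s output is independent of the batch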



