
I'm able to run a 22B-parameter GPT-Neo model on my 24 GB 3090, and can fit a 30B-parameter OPT model when combining my 3090 with a 12 GB 3080.
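Those numbers can be sanity-checked with back-of-the-envelope VRAM arithmetic. This sketch assumes the weights are quantised to 8 bits (1 byte) per parameter, which the comment doesn't actually confirm, and it ignores activation and KV-cache overhead:

```python
# Rough VRAM needed for model weights alone.
# Assumption (not stated in the thread): int8 quantisation, 1 byte/param.
def weight_vram_gb(n_params_billion: float, bytes_per_param: int = 1) -> float:
    """Approximate GiB required just for the weights."""
    return n_params_billion * 1e9 * bytes_per_param / 1024**3

print(round(weight_vram_gb(22), 1))     # 22B in int8: ~20.5 GiB, fits on a 24 GB 3090
print(round(weight_vram_gb(30), 1))     # 30B in int8: ~27.9 GiB, needs 3090 + 3080 (36 GB)
print(round(weight_vram_gb(30, 2), 1))  # same 30B in fp16: ~55.9 GiB, wouldn't fit
```

The fp16 line is why quantisation matters here: halving bytes per parameter roughly halves the weight footprint.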


Could you point to any resources online about how to do this? E.g., is this using 8-bit quantisation?



