
Ideally in that scenario you'd have a model that unified vision, language, and an understanding of 'doing things' and manipulating objects. So it wouldn't just be an LLM; it would be a language-vision-doing-things model. There's no reason why we can't build one.


Come to think of it, that's kind of what Tesla is building.



