
And? Even if I believed this to be a limitation, I could bolt an adapter to an LLM to make it input and output non-text data.

That's how a lot of bleeding-edge multimodal models work already. They can take and emit images, sound, actions and more.


