
Are these models built atop models that already understand natural language?

If the commands all follow the same syntax, it's easy to imagine how you can generate a good training set.

But how do they come to grasp natural language well enough to perform tasks worded in unexpected ways, tasks that would be easy to parse if they truly understood natural language?



"But how to they fully grasp natural language to be able to perform tasks worded unexpectedly, which would be easy to parse, if they understood natural language?"

A Large Language Model. Pardon me for spelling the acronym out in full, but it's called that for a reason.

I think a lot of the whiz-bang applications of LLMs have drowned this out, but LLMs are effectively the solution to the long-standing problem of natural-language understanding, and that alone would be enough to make them a ground-breaking technology. Taking English text and translating it with very high fidelity into the vector space these models operate in is amazing and, I think, somewhat underappreciated.
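
As a rough sketch of what that text-to-vector step looks like in practice (the checkpoint name and the mean-pooling step are my own illustrative choices, not anything from the thread):

    # Minimal sketch: map a sentence into an embedding vector with a
    # pretrained language model. The checkpoint is an illustrative choice.
    import torch
    from transformers import AutoTokenizer, AutoModel

    name = "sentence-transformers/all-MiniLM-L6-v2"
    tokenizer = AutoTokenizer.from_pretrained(name)
    model = AutoModel.from_pretrained(name)

    text = "Crop the photo and brighten the sky."
    inputs = tokenizer(text, return_tensors="pt")
    with torch.no_grad():
        hidden = model(**inputs).last_hidden_state  # (1, seq_len, dim)

    # Mean-pool the per-token states into one sentence vector
    # (one common convention; others use a [CLS] token instead).
    embedding = hidden.mean(dim=1)  # (1, dim)
    print(embedding.shape)

Any instruction phrased in ordinary English, however unexpectedly worded, lands somewhere sensible in that vector space, which is why downstream models don't need the commands to follow a fixed syntax.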


Yes, the newer image and video editing models have an LLM bolted onto them. The rich embeddings from the LLM are fed into a diffusion transformer (DiT) alongside a tokenized version of the input image. These two streams “tell” the model what to do.
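
Here's a toy sketch of that two-stream setup, with image patch tokens cross-attending to the LLM's text embeddings inside a transformer block. Dimensions, layer structure, and names are invented for illustration and don't correspond to any particular model:

    # Toy sketch of a DiT-style block conditioned on LLM text embeddings.
    # All sizes are made up; real models stack many such blocks and also
    # condition on the diffusion timestep.
    import torch
    import torch.nn as nn

    class DiTBlock(nn.Module):
        def __init__(self, dim=256, heads=4):
            super().__init__()
            self.self_attn = nn.MultiheadAttention(dim, heads, batch_first=True)
            self.cross_attn = nn.MultiheadAttention(dim, heads, batch_first=True)
            self.mlp = nn.Sequential(
                nn.Linear(dim, 4 * dim), nn.GELU(), nn.Linear(4 * dim, dim)
            )
            self.norm1 = nn.LayerNorm(dim)
            self.norm2 = nn.LayerNorm(dim)
            self.norm3 = nn.LayerNorm(dim)

        def forward(self, img_tokens, text_emb):
            # Image tokens first attend to each other...
            h = self.norm1(img_tokens)
            x = img_tokens + self.self_attn(h, h, h)[0]
            # ...then attend to the LLM's embeddings: the conditioning
            # stream that "tells" the model what edit to perform.
            h = self.norm2(x)
            x = x + self.cross_attn(h, text_emb, text_emb)[0]
            return x + self.mlp(self.norm3(x))

    # Fake inputs: 64 image patch tokens, 16 text-embedding tokens.
    img_tokens = torch.randn(1, 64, 256)
    text_emb = torch.randn(1, 16, 256)
    print(DiTBlock()(img_tokens, text_emb).shape)  # torch.Size([1, 64, 256])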



