
Yes, but you can use an LLM to label data and then train a BERT model, which costs a small fraction of the time and money to run compared with the original LLM.
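A minimal sketch of that workflow, assuming the openai client plus Hugging Face transformers/datasets; the model names, prompt, and label set are illustrative placeholders, not anything from the thread:

```python
# Sketch: label raw text with an LLM once, then fine-tune a cheap BERT-style
# classifier that serves all future traffic.
from openai import OpenAI
from datasets import Dataset
from transformers import (AutoTokenizer, AutoModelForSequenceClassification,
                          TrainingArguments, Trainer)

LABELS = ["negative", "positive"]   # placeholder label set
client = OpenAI()

def label_with_llm(text: str) -> int:
    # One (relatively expensive) LLM call per training example.
    resp = client.chat.completions.create(
        model="gpt-4o-mini",   # placeholder: any capable labeling model
        messages=[{"role": "user",
                   "content": "Classify the sentiment as 'negative' or 'positive'. "
                              f"Reply with one word.\n\nText: {text}"}],
    )
    word = resp.choices[0].message.content.strip().lower()
    return LABELS.index(word) if word in LABELS else 0

raw_texts = ["great product", "terrible support"]   # your unlabeled corpus
dataset = Dataset.from_dict({"text": raw_texts,
                             "label": [label_with_llm(t) for t in raw_texts]})

tok = AutoTokenizer.from_pretrained("distilbert-base-uncased")
dataset = dataset.map(lambda x: tok(x["text"], truncation=True,
                                    padding="max_length", max_length=64),
                      batched=True)

model = AutoModelForSequenceClassification.from_pretrained(
    "distilbert-base-uncased", num_labels=len(LABELS))
trainer = Trainer(model=model,
                  args=TrainingArguments(output_dir="bert-student",
                                         num_train_epochs=3),
                  train_dataset=dataset)
trainer.train()   # the small student model now handles inference cheaply
```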


Shhh, don’t tell everybody the secret. ;-)


Lol, isn't everyone doing it? That's how I bootstrapped my BERT fine-tunes.


I would say everybody smart is doing that, but a lot of the dumb money in AI right now is just wrappers on the GPT API. That makes for a flashy demo with no underlying substance or expertise.


Is the encoder-style architecture better for classification tasks at a given compute budget than a causal LM?

Is this because the final representation in BERT-style models is more globally focused, rather than being optimized for next-token prediction?


They are 100% better for classification at a given compute budget. For token classification, for example, they can account for information both before and after a token and use that bidirectional context to classify it.
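A rough illustration of that point, assuming Hugging Face transformers and a public BERT-style NER checkpoint as a placeholder; a causal LM tagging the first word would only see the prefix, while the encoder attends to the whole sentence:

```python
# Sketch: token classification (e.g. NER) with a bidirectional encoder.
# Each token's representation is conditioned on context to its left AND right.
import torch
from transformers import AutoTokenizer, AutoModelForTokenClassification

ckpt = "dslim/bert-base-NER"   # placeholder: any BERT-style token-classification model
tok = AutoTokenizer.from_pretrained(ckpt)
model = AutoModelForTokenClassification.from_pretrained(ckpt)

text = "Sydney moved to Paris last year."
inputs = tok(text, return_tensors="pt")
with torch.no_grad():
    logits = model(**inputs).logits          # shape: [1, seq_len, num_labels]

pred_ids = logits.argmax(-1)[0]
for token, pred in zip(tok.convert_ids_to_tokens(inputs["input_ids"][0]), pred_ids):
    print(token, model.config.id2label[pred.item()])
# "Sydney" can be tagged as a person rather than a city partly because the
# encoder also sees the words that follow it ("moved to Paris"),
# which a left-to-right causal LM would not have at that position.
```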



