
Yes, but you can use an LLM to label data and then train a BERT model, which costs a small fraction of the time and money to run compared with the original LLM.
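A minimal sketch of that workflow, assuming the openai client plus Hugging Face transformers/datasets; the model names, prompt, and label set are illustrative placeholders, not anything from the thread:

```python
# Sketch: label raw text with an LLM once, then fine-tune a cheap BERT-style
# classifier that serves all future traffic.
from openai import OpenAI
from datasets import Dataset
from transformers import (AutoTokenizer, AutoModelForSequenceClassification,
                          TrainingArguments, Trainer)

LABELS = ["negative", "positive"]   # placeholder label set
client = OpenAI()

def label_with_llm(text: str) -> int:
    # One (relatively expensive) LLM call per training example.
    resp = client.chat.completions.create(
        model="gpt-4o-mini",   # placeholder: any capable labeling model
        messages=[{"role": "user",
                   "content": "Classify the sentiment as 'negative' or 'positive'. "
                              f"Reply with one word.\n\nText: {text}"}],
    )
    word = resp.choices[0].message.content.strip().lower()
    return LABELS.index(word) if word in LABELS else 0

raw_texts = ["great product", "terrible support"]   # your unlabeled corpus
dataset = Dataset.from_dict({"text": raw_texts,
                             "label": [label_with_llm(t) for t in raw_texts]})

tok = AutoTokenizer.from_pretrained("distilbert-base-uncased")
dataset = dataset.map(lambda x: tok(x["text"], truncation=True,
                                    padding="max_length", max_length=64),
                      batched=True)

model = AutoModelForSequenceClassification.from_pretrained(
    "distilbert-base-uncased", num_labels=len(LABELS))
trainer = Trainer(model=model,
                  args=TrainingArguments(output_dir="bert-student",
                                         num_train_epochs=3),
                  train_dataset=dataset)
trainer.train()   # the small student model now handles inference cheaply
```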


Shhh, don’t tell everybody the secret. ;-)


Lol, isn't everyone doing it? That's how I bootstrapped my BERT fine-tunes.


I would say everybody smart is doing that, but a lot of the dumb money in AI right now is just wrappers on the GPT API. That makes for a flashy demo with no underlying substance or expertise.


Is the encoder-style architecture better for classification tasks at a given compute budget than a causal LM?

Is this because the final representation in BERT-style models is more globally focused, rather than being optimized for next-token prediction?


They are 100% better for classification at a given compute budget. For token classification, for example, they can account for information both before and after a token and use that bidirectional context to classify it.
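A rough illustration of that point, assuming Hugging Face transformers and a public BERT-style NER checkpoint as a placeholder; a causal LM tagging the first word would only see the prefix, while the encoder attends to the whole sentence:

```python
# Sketch: token classification (e.g. NER) with a bidirectional encoder.
# Each token's representation is conditioned on context to its left AND right.
import torch
from transformers import AutoTokenizer, AutoModelForTokenClassification

ckpt = "dslim/bert-base-NER"   # placeholder: any BERT-style token-classification model
tok = AutoTokenizer.from_pretrained(ckpt)
model = AutoModelForTokenClassification.from_pretrained(ckpt)

text = "Sydney moved to Paris last year."
inputs = tok(text, return_tensors="pt")
with torch.no_grad():
    logits = model(**inputs).logits          # shape: [1, seq_len, num_labels]

pred_ids = logits.argmax(-1)[0]
for token, pred in zip(tok.convert_ids_to_tokens(inputs["input_ids"][0]), pred_ids):
    print(token, model.config.id2label[pred.item()])
# "Sydney" can be tagged as a person rather than a city partly because the
# encoder also sees the words that follow it ("moved to Paris"),
# which a left-to-right causal LM would not have at that position.
```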



