We often ignore the importance of using good baseline systems and jump to the latest shiny thing.
As I see it, you need a model you can train quickly so you can do tuning, model selection, and all that.
I have a BERT + SVM + Logistic Regression (for calibration) pipeline that can train 20 models for automatic model selection and calibration in about 3 minutes. I feel like I understand its behavior really well.
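For concreteness, a sketch of that kind of pipeline, assuming sentence-transformers for a frozen BERT-style encoder and scikit-learn for the SVM sweep and sigmoid (logistic) calibration; the model name, the hyperparameter grid, and load_texts_and_labels() are illustrative placeholders rather than the exact setup described:

    # Sketch: frozen BERT-style embeddings -> linear SVM sweep -> logistic calibration.
    # load_texts_and_labels() is a hypothetical data loader.
    import numpy as np
    from sentence_transformers import SentenceTransformer
    from sklearn.calibration import CalibratedClassifierCV
    from sklearn.model_selection import cross_val_score
    from sklearn.svm import LinearSVC

    texts, labels = load_texts_and_labels()
    encoder = SentenceTransformer("all-MiniLM-L6-v2")  # frozen representation layer
    X = encoder.encode(texts, batch_size=64)
    y = np.asarray(labels)

    # "Train 20 models" here means sweeping the SVM's C and keeping the best by CV score.
    candidates = [LinearSVC(C=c) for c in np.logspace(-3, 2, 20)]
    scores = [cross_val_score(m, X, y, cv=5).mean() for m in candidates]
    best = candidates[int(np.argmax(scores))]

    # Calibrate the winning SVM with a sigmoid (a logistic regression on its scores).
    model = CalibratedClassifierCV(best, method="sigmoid", cv=5)
    model.fit(X, y)
    probabilities = model.predict_proba(X[:5])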
I've tried fine-tuning BERT for the same task: the shortest model builds take 30 minutes, the training curves make no sense (back in the day I used to be able to train networks with early stopping and get a good one every time), and if I look at arXiv papers it is rare for anyone to have a disciplined model selection process at all; mostly people use a recipe that sorta-kinda seemed to work in some other paper. People scoff at you if you ask the engineering-oriented question "What training procedure can I use to get a good model consistently?"
That's the thing that blows my mind. Even if NNs are some percentage better, the training and deployment headaches are not worth it unless you have a billion users, where a 0.1% lift equates to millions of dollars.
It is pleasantly surprising to see how close your pipeline is to mine: essentially a good representation layer, usually based on BERT (like MiniLM or MPNet), followed by a calibrated linear SVM. Sometimes I replace the SVM with LightGBM if I have non-language features.
If I am building a set of models for a domain, I might fine-tune the representation layer. On a per-model basis I typically just train the SVM and calibrate it. For the amount of time this whole pipeline takes (not counting the occasions when I fine-tune), it works amazingly well.
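Roughly, the non-language-features variant has this shape (a sketch assuming sentence-transformers and the lightgbm package; load_domain_data() and the hyperparameters are placeholders, not an exact recipe):

    # Sketch: frozen text embeddings concatenated with tabular (non-language)
    # features, fed to LightGBM. load_domain_data() is a hypothetical loader.
    import numpy as np
    import lightgbm as lgb
    from sentence_transformers import SentenceTransformer

    texts, tabular_features, labels = load_domain_data()
    encoder = SentenceTransformer("all-mpnet-base-v2")  # or a domain fine-tuned encoder
    text_embeddings = encoder.encode(texts)

    X = np.hstack([text_embeddings, np.asarray(tabular_features)])
    clf = lgb.LGBMClassifier(n_estimators=300, learning_rate=0.05)
    clf.fit(X, labels)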
I spent a week learning enough ML to design a recommender system that worked well with my company's use case. I knew enough linear algebra to determine that collaborative filtering, with some specifically chosen dimensionality reduction and text vectorization algorithms and a strategy for scaling the models across multiple databases, would work well for us. The solution was tailored specifically to the type of data we were working with.
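For concreteness, a sketch of that kind of design, assuming scikit-learn and SciPy; the ratings matrix, item_descriptions, component count, and blend weights are illustrative placeholders rather than the actual proposal:

    # Sketch: item-item collaborative filtering on a dimensionality-reduced
    # user-item matrix, blended with TF-IDF text similarity.
    # `ratings` (users x items) and `item_descriptions` are hypothetical inputs.
    import numpy as np
    from scipy.sparse import csr_matrix
    from sklearn.decomposition import TruncatedSVD
    from sklearn.feature_extraction.text import TfidfVectorizer
    from sklearn.metrics.pairwise import cosine_similarity

    interactions = csr_matrix(ratings)
    item_factors = TruncatedSVD(n_components=64).fit_transform(interactions.T)

    tfidf = TfidfVectorizer(max_features=20000)
    item_text = tfidf.fit_transform(item_descriptions)

    # Blend behavioral and textual similarity for item-item recommendations.
    similarity = 0.7 * cosine_similarity(item_factors) + 0.3 * cosine_similarity(item_text)

    def recommend(item_index, k=10):
        return np.argsort(-similarity[item_index])[1:k + 1]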
When I presented the proposal, nobody had read it and the meeting immediately turned to the VP of Engineering and the CEO discussing neural networks and some other ML system they had read about on HN the day before. When I tried to bring collaborative filtering up again, the VP said "I don't know what that is", so obviously he hadn't read the doc I had been assigned to write over the previous week.
I had a similar experience a few years back when participating in ML competitions [1,2] for detecting and typing phrases in text. I submitted an approach based on Named Entity Recognition using a Conditional Random Field (CRF), which is robust and well known in the community, and my solution beat most of the tuned deep learning solutions by quite a large margin [1].
I think a lot of folks underestimate the complexity of using some of these models (DL, LLMs) and just throw them at the problem, or don't compare them properly against well-established baselines.
[1] https://scholar.google.com/citations?view_op=view_citation&h...
[2] https://scholar.google.com/citations?view_op=view_citation&h...
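A minimal sketch of the kind of CRF tagger described above, assuming sklearn-crfsuite and a toy BIO-tagged example; the feature template and data are illustrative, not the actual competition submission:

    # Sketch: CRF sequence tagger for NER with hand-crafted token features.
    import sklearn_crfsuite

    def word_features(sent, i):
        """Simple features for token i of a tokenized sentence."""
        word = sent[i]
        feats = {
            "word.lower": word.lower(),
            "word.istitle": word.istitle(),
            "word.isdigit": word.isdigit(),
            "suffix3": word[-3:],
            "BOS": i == 0,
            "EOS": i == len(sent) - 1,
        }
        if i > 0:
            feats["prev.lower"] = sent[i - 1].lower()
        if i < len(sent) - 1:
            feats["next.lower"] = sent[i + 1].lower()
        return feats

    def featurize(sent):
        return [word_features(sent, i) for i in range(len(sent))]

    # Toy training data in BIO format (placeholder for real competition data).
    train_sents = [["Barack", "Obama", "visited", "Paris", "."]]
    train_labels = [["B-PER", "I-PER", "O", "B-LOC", "O"]]

    crf = sklearn_crfsuite.CRF(
        algorithm="lbfgs", c1=0.1, c2=0.1,
        max_iterations=100, all_possible_transitions=True,
    )
    crf.fit([featurize(s) for s in train_sents], train_labels)
    print(crf.predict([featurize(["Angela", "Merkel", "flew", "to", "Berlin"])]))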