
What the bake-off taught us: classical ML is not dead, it is just under-attended

May 31, 2026 · 3 min read

There is a quiet meme in the AI community that "classical machine learning is solved" and serious people should be working on LLMs and agents. I do not buy it. Most production prediction problems still run on logistic regression, random forests, and gradient boosting. Those teams are not lacking ambition - they are doing the math.

To illustrate, we ran a head-to-head bake-off on a real tabular dataset: predicting which trial users would convert to paid. Twelve classical algorithms, same train/test split, same preprocessing pipeline. Here is what we found.

The setup

The data: 50,000 trial users with 28 features (usage frequency, support tickets, feature adoption, demographic signals). The target: did they convert within 30 days. Positive rate: 14%.
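With a 14% positive rate, a useful reference point is the model that always predicts "no conversion". The sketch below works that out on a made-up 100-row sample (only the 14% rate comes from the data above; the helper functions are illustrative, not the evaluation code we ran). It shows why accuracy and ROC AUC answer different questions on imbalanced data.

```python
# Reference point for a 14% positive rate: the "always predict no conversion" model.
# The 100-row sample is illustrative; only the 14% rate comes from the post.

def accuracy(y_true, y_pred):
    """Fraction of labels predicted exactly."""
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)

def roc_auc(y_true, scores):
    """Chance that a random positive outscores a random negative (ties count half)."""
    pos = [s for t, s in zip(y_true, scores) if t == 1]
    neg = [s for t, s in zip(y_true, scores) if t == 0]
    wins = sum((p > n) + 0.5 * (p == n) for p in pos for n in neg)
    return wins / (len(pos) * len(neg))

y_true = [1] * 14 + [0] * 86          # 14% positives
always_no = [0] * len(y_true)         # hard predictions: nobody converts
constant_score = [0.0] * len(y_true)  # the same score for every user

baseline_accuracy = accuracy(y_true, always_no)
baseline_auc = roc_auc(y_true, constant_score)
print(baseline_accuracy, baseline_auc)
```

The do-nothing model is right 86% of the time but has an AUC of 0.5, because it cannot rank a converter above a non-converter. Accuracy figures on this kind of data are best read against that floor, with ROC AUC alongside.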

The algorithms: logistic regression, ridge, lasso, decision tree (default settings), random forest, extra trees, gradient boosting, AdaBoost, SVM with RBF kernel, k-nearest neighbours, Gaussian Naive Bayes, and a second decision tree whose depth we chose by cross-validation.
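A harness for this kind of comparison can be very small. The sketch below assumes every model exposes fit and predict in the scikit-learn style; MajorityClass, bake_off, and the toy split are hypothetical stand-ins, not the models, code, or data from our run.

```python
import time

class MajorityClass:
    """Toy stand-in model: always predicts the most common training label."""
    def fit(self, X, y):
        self.label_ = max(set(y), key=list(y).count)
        return self

    def predict(self, X):
        return [self.label_ for _ in X]

def bake_off(models, X_train, y_train, X_test, y_test):
    """Fit every model on the same split; record test accuracy and fit time."""
    results = {}
    for name, model in models.items():
        start = time.perf_counter()
        model.fit(X_train, y_train)
        fit_seconds = time.perf_counter() - start
        preds = model.predict(X_test)
        acc = sum(p == t for p, t in zip(preds, y_test)) / len(y_test)
        results[name] = {"accuracy": acc, "fit_seconds": fit_seconds}
    return results

# Tiny made-up split, only to show the call shape.
X_train, y_train = [[0], [1], [2], [3]], [0, 0, 0, 1]
X_test, y_test = [[4], [5]], [0, 1]
results = bake_off({"majority": MajorityClass()}, X_train, y_train, X_test, y_test)
print(results)
```

Passing the same split to every model is the point: any difference in the results table then comes from the algorithm, not from the data it happened to see.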

The headline result

Gradient boosting won on accuracy (0.873) and ROC AUC (0.91). Random forest came in close second. But here is the surprise: logistic regression scored 0.852 on accuracy and 0.89 on ROC AUC. It trained in 80 milliseconds. It produced coefficients you could put in a Slack message and have the product team understand.

The gap between the best ensemble and the simplest linear model was 2 percentage points. The gap between either of them and the next decision (when do we hand a salesperson a customer) was 20 percentage points. We spent four months tuning the ensemble. We could have spent three weeks shipping the linear model and four months talking to the sales team.

What the underperformers told us

SVM took 14 minutes to train and scored 0.847. The kernel computation drowned us. KNN took 6 milliseconds to train (it stores the dataset) and 2.4 seconds per prediction. Naive Bayes scored 0.812 because the independence assumption is laughably wrong for this data.
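The KNN cost profile follows directly from how the algorithm works. A minimal sketch, with a hypothetical TinyKNN class and made-up points (not the implementation we benchmarked):

```python
import math

class TinyKNN:
    """Brute-force k-nearest neighbours for illustration."""
    def __init__(self, k=3):
        self.k = k

    def fit(self, X, y):
        # "Training" is just keeping a reference to the data.
        self.X_, self.y_ = X, y
        return self

    def predict_one(self, x):
        # Every prediction measures the distance to every stored row.
        nearest = sorted(
            (math.dist(x, row), label) for row, label in zip(self.X_, self.y_)
        )[: self.k]
        votes = [label for _, label in nearest]
        return max(set(votes), key=votes.count)

X = [[0, 0], [0, 1], [1, 0], [5, 5], [5, 6], [6, 5]]
y = [0, 0, 0, 1, 1, 1]
knn = TinyKNN(k=3).fit(X, y)
near_origin = knn.predict_one([0.2, 0.1])
near_far_cluster = knn.predict_one([5.5, 5.5])
```

All of the work sits in predict_one, so the bill grows with the number of stored rows on every single prediction. Production libraries use index structures to soften this, but the basic trade of cheap training for expensive inference remains.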

Each algorithm encodes assumptions. SVM assumes a meaningful margin separates classes in some kernel space. KNN assumes that similar feature vectors mean similar labels. Naive Bayes assumes features are conditionally independent given the label. When the assumptions match your data, the algorithm sings. When they do not, you are paying for the wrong tool.
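To make the Naive Bayes assumption concrete, here is a small worked example. The 14% prior is our positive rate; the likelihood ratio of 3 is a hypothetical number chosen for illustration.

```python
def posterior(prior, likelihood_ratios):
    """Naive Bayes posterior: multiply prior odds by each feature's likelihood ratio."""
    odds = prior / (1 - prior)
    for lr in likelihood_ratios:
        odds *= lr
    return odds / (1 + odds)

prior = 0.14  # positive rate from the dataset
lr = 3.0      # hypothetical: this feature value is 3x as likely among converters

one_signal = posterior(prior, [lr])
# A second feature that is a near-copy of the first adds no new information,
# but Naive Bayes multiplies its ratio in anyway.
double_counted = posterior(prior, [lr, lr])
print(round(one_signal, 2), round(double_counted, 2))
```

One signal moves the estimate from 0.14 to about 0.33. Counting the same signal twice pushes it to about 0.59. Usage features that overlap heavily, such as session counts and feature adoption, get double-counted in exactly this way.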

The interpretability dimension

We graded each model on interpretability from 1 (black box) to 5 (every coefficient inspectable). Gradient boosting: 3. Random forest: 3. Logistic regression: 5.

For our use case - explaining to a customer success rep why a trial user is high-priority - 5 wins. The rep does not need "the model says so." They need "this account has 4 power users, they hit the API 200 times last week, they opened a support ticket but it got resolved fast." Logistic regression coefficients give you that directly. With a random forest you have to compute SHAP values as a separate step and then translate them.
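Here is roughly what "directly" means. In a logistic regression, each feature's contribution to the log-odds is its coefficient times its value, so a prediction breaks down into a short list. The coefficients, feature names, and account below are hypothetical, not taken from our fitted model.

```python
import math

# Hypothetical coefficients (log-odds per unit) and one account's features.
intercept = -3.0
coefficients = {
    "power_users": 0.40,
    "api_calls_last_week_hundreds": 0.30,
    "open_tickets": -0.50,
}
account = {"power_users": 4, "api_calls_last_week_hundreds": 2, "open_tickets": 1}

def explain(intercept, coefficients, features):
    """Return the predicted probability and each feature's log-odds contribution."""
    contributions = {name: coefficients[name] * features[name] for name in coefficients}
    log_odds = intercept + sum(contributions.values())
    probability = 1 / (1 + math.exp(-log_odds))
    return probability, contributions

probability, contributions = explain(intercept, coefficients, account)
# Largest contributions first: this is the part a rep can read.
for name, value in sorted(contributions.items(), key=lambda kv: -abs(kv[1])):
    print(f"{name}: {value:+.2f} log-odds")
print(f"conversion probability: {probability:.2f}")
```

The sorted list is more or less the Slack message: which signals pushed this account up, which pushed it down, and by how much.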

When ensembles do earn their keep

Two cases where ensembles clearly outperform:

  • Non-linear interactions. When the relationship between a feature and the target is curved, or the effect of one feature depends on another, a linear model misses it unless you engineer the extra terms by hand. Tree ensembles capture it implicitly.
  • Mixed feature types. Tree ensembles do not care about feature scaling, and some implementations handle categorical features natively. Linear models need explicit one-hot encoding, scaling, and feature engineering.
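The interaction case is easiest to see on the classic XOR pattern, where the label depends on two features jointly. The sketch below is a toy construction, not our data: it grid-searches linear decision rules and compares the best one with a hand-written depth-2 tree.

```python
from itertools import product

# XOR-style interaction: the label is 1 only when exactly one feature is "on".
points = [((0, 0), 0), ((0, 1), 1), ((1, 0), 1), ((1, 1), 0)]

def linear_accuracy(w1, w2, b):
    """Accuracy of the rule: predict 1 when w1*x1 + w2*x2 + b > 0."""
    hits = sum(int(w1 * x1 + w2 * x2 + b > 0) == y for (x1, x2), y in points)
    return hits / len(points)

def tree_predict(x1, x2):
    """A depth-2 decision tree: split on x1, then on x2 within each branch."""
    if x1 == 0:
        return 1 if x2 == 1 else 0
    return 0 if x2 == 1 else 1

grid = [i / 2 for i in range(-6, 7)]  # weights and bias from -3 to 3 in steps of 0.5
best_linear = max(linear_accuracy(w1, w2, b) for w1, w2, b in product(grid, repeat=3))
tree_accuracy = sum(tree_predict(x1, x2) == y for (x1, x2), y in points) / len(points)
print(best_linear, tree_accuracy)
```

No linear rule gets more than three of the four points right, while two splits get all four. A linear model can recover the pattern only if someone adds the x1*x2 product as a feature, which is the hand-engineering that tree ensembles spare you.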

For our trial-conversion data, neither was decisive. The features were mostly continuous, the relationships mostly monotonic. Linear was enough.

What this means in practice

The default playbook in 2026 is: start with logistic regression (or linear regression if the target is numeric). Look at the coefficients. Talk to your product team about what they imply. If the residuals show clear non-linearity, move to random forest. If you need every last point of accuracy and you can afford the training cost, use gradient boosting. Reserve SVM for genuinely small high-dimensional datasets (a few thousand rows, hundreds of features). Reserve KNN for cases where similar-means-similar is the structure you care about.

LLMs and agents are extraordinary tools for extraordinary problems. Classical ML is still the right tool for most ordinary ones. The teams who recognise this ship faster, debug easier, and explain their predictions to non-technical stakeholders without diagrams.

We are not done with classical ML. We are barely paying attention to it.
