CatBoost is an open-source machine learning library implementing Gradient Boosted Decision Trees (GBDT); its name comes from the terms “Category” and “Boosting.” It was developed by Yandex (often called the “Russian Google”) and released in 2017.
Key attributes of CatBoost:
- built-in ranking objective functions
- native preprocessing of categorical features
- built-in model analysis tools (e.g. feature importance and SHAP values)
- fast prediction time
- 30-60x faster, as reported by companies using it in production
- on GPUs, reportedly 50-100x faster than XGBoost
- performs well with default parameters, and tuning can improve performance significantly
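- borrowing ideas from online learning, CatBoost uses Ordered Target Statistics: it randomly permutes the training data, treats that order as if it were sequential in time, and encodes each example's categorical values using only the examples that come before it
- by treating the permutation as artificial time 🕰️ (the same idea underlies Ordered Boosting), CatBoost reduces prediction shift, a bias caused by target leakage that is inherent in traditional gradient boosting implementations such as XGBoost and LightGBM
- reportedly 8x faster inference than XGBoost
- builds symmetric (oblivious) trees 🌲, in which every node at the same depth uses the same split; this acts as a regulariser and makes inference especially fast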
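The ordered target statistics idea above can be sketched in plain Python. This is a simplified sketch, assuming a single random permutation, a numeric (e.g. binary) target, and the smoothed encoding from the CatBoost paper: (sum of earlier matching targets + a * prior) / (count of earlier matching rows + a). The function names and the prior weight `a` are illustrative, not CatBoost's API; the real library combines several permutations and more encoding variants. The "greedy" version is included only to show the leakage that ordering avoids.

```python
import random
from collections import defaultdict


def greedy_target_stats(categories, targets, prior, a=1.0):
    """Naive TS (illustrative): every row's encoding uses ALL rows,
    including its own target, so the label leaks into the feature."""
    sums = defaultdict(float)
    counts = defaultdict(int)
    for c, y in zip(categories, targets):
        sums[c] += y
        counts[c] += 1
    return [(sums[c] + a * prior) / (counts[c] + a) for c in categories]


def ordered_target_stats(categories, targets, prior, a=1.0, seed=0):
    """Ordered TS sketch: shuffle once to create an 'artificial time',
    then encode each row using only the rows that precede it."""
    order = list(range(len(categories)))
    random.Random(seed).shuffle(order)
    sums = defaultdict(float)
    counts = defaultdict(int)
    encoded = [0.0] * len(categories)
    for i in order:
        c = categories[i]
        # Encode from history only; the current target is added afterwards.
        encoded[i] = (sums[c] + a * prior) / (counts[c] + a)
        sums[c] += targets[i]
        counts[c] += 1
    return encoded


cats = ["a", "b", "a", "c", "b", "a"]
ys = [1, 0, 1, 1, 0, 0]
prior = sum(ys) / len(ys)  # 0.5
print(greedy_target_stats(cats, ys, prior)[3])   # 0.75: "c" is encoded from its own label
print(ordered_target_stats(cats, ys, prior)[3])  # 0.5: no earlier "c", so only the prior
```

The category "c" appears once: the greedy encoding moves it towards its own label, which is exactly the target leakage behind prediction shift, while the ordered encoding can only fall back to the prior.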
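Why symmetric trees speed up inference can be shown with a small sketch. By default (`grow_policy='SymmetricTree'`) CatBoost grows oblivious trees, where all nodes at one level share a single split, so a depth-d tree is just d (feature, threshold) pairs plus 2^d leaf values, and prediction reduces to computing a d-bit leaf index. The `ObliviousTree` class and `predict_ensemble` helper below are hypothetical illustrations, assuming raw float features and a "strictly greater than threshold goes right" rule; CatBoost's real model applier binarizes features first and is heavily optimized.

```python
from dataclasses import dataclass
from typing import List, Sequence, Tuple


@dataclass
class ObliviousTree:
    # One (feature_index, threshold) pair per level, shared by every node at that level.
    splits: List[Tuple[int, float]]
    # 2 ** depth leaf values, addressed by the bit pattern of split outcomes.
    leaf_values: List[float]

    def __post_init__(self) -> None:
        if len(self.leaf_values) != 2 ** len(self.splits):
            raise ValueError("need exactly 2 ** depth leaf values")

    def leaf_index(self, x: Sequence[float]) -> int:
        index = 0
        for level, (feature, threshold) in enumerate(self.splits):
            # Bit `level` is set when the sample goes right at that level.
            if x[feature] > threshold:
                index |= 1 << level
        return index

    def predict(self, x: Sequence[float]) -> float:
        return self.leaf_values[self.leaf_index(x)]


def predict_ensemble(trees: List[ObliviousTree], x: Sequence[float], bias: float = 0.0) -> float:
    # A boosted ensemble's raw score is the bias plus the sum of tree outputs.
    return bias + sum(tree.predict(x) for tree in trees)


tree = ObliviousTree(splits=[(0, 0.5), (1, 10.0)], leaf_values=[0.1, 0.2, 0.3, 0.4])
print(tree.leaf_index([0.7, 3.0]))  # 1: right at level 0, left at level 1
print(tree.predict([0.2, 12.0]))    # 0.3: leaf index 2
```

Because every level asks the same question for all samples, the comparisons can be evaluated for a whole batch at once and combined with bit operations, with no per-node branching.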
References
- The Gradient Boosters V: CatBoost – Deep & Shallow
- XGBoost? CatBoost? LightGBM? | Plank
- When to Choose CatBoost Over XGBoost or LightGBM [Practical Guide]
- Is CatBoost faster than LightGBM and XGBoost?
- ICR - Identifying Age-Related Conditions | Kaggle
- Tabular Data: Deep Learning is Not All You Need
- When Do Neural Nets Outperform Boosted Trees on Tabular Data?
Resources
CatBoost
- CatBoost: unbiased boosting with categorical features
- CatBoost: A Deeper Dive | Kaggle
- catboost_simple.py · optuna/optuna-examples
- CatBoost - open-source gradient boosting library
- CatBoost GitHub repo
GBDT