Polars apply alternative: when/then/otherwise for 80x speedups
Eight worked examples that replace pandas-style row-wise .apply() calls in Polars (renamed .map_elements() in Polars 0.19) with native when/then/otherwise expressions, plus benchmarks showing the real speedup on a 50M-row dataset.
Marcus is an analytics engineer with 9 years in the dbt and warehouse-modeling trenches. He spent three years at dbt Labs as a senior solutions architect helping enterprise customers (a large US bank, two telecom carriers) untangle 4000-model projects, and before that ran the analytics platform at HelloFresh's North America org where he rebuilt the supply-chain mart on Snowflake + dbt. His writing focuses on dbt project structure at scale, incremental model patterns that actually survive backfills, and the unglamorous work of column-level lineage and contract testing. He is a regular contributor to the dbt-utils package and co-maintains a small open-source linter for SQL style. Marcus lives in Berlin, holds a master's in statistics from UNC Chapel Hill, and roasts his own coffee badly.