Scikit-Learn Just Got Its Biggest Upgrade in Years — Here's What Changed
If you've been working in Python machine learning for any length of time, you know scikit-learn. It's the library that quietly powers the majority of production ML systems out there — everything from churn prediction to fraud detection to recommendation engines. While PyTorch and TensorFlow tend to grab all the headlines, scikit-learn has been doing the unglamorous work of keeping real-world pipelines running.
So when version 1.8 dropped on December 10, 2025, people noticed. And for good reason.
The headline feature? Native GPU support. But honestly, that's just the start. Scikit-learn 1.8 also brings support for free-threaded Python 3.14 (yes, the no-GIL future is actually happening), temperature scaling for better multiclass probability calibration, a new ClassicalMDS algorithm, some serious performance boosts to linear models and decision trees, and a nicer experience in Jupyter notebooks.
Let's walk through all of it with hands-on code examples, benchmarks, and practical advice you can actually use. Whether you're training models in notebooks or running pipelines in production, there's something here for you.
GPU Acceleration via the Array API Standard
Let's start with the big one. For years, the most common complaint about scikit-learn was pretty simple: it only ran on CPU. Want GPU acceleration? You had to jump ship to RAPIDS cuML, PyTorch, or TensorFlow — even for basic stuff like scaling features or fitting a ridge regression.
Scikit-learn 1.8 changes that.
How It Actually Works
Rather than bolting CUDA dependencies directly into the library (which would have been a maintenance nightmare), the team went with a much cleaner approach: the Python Array API standard. It's a cross-library specification that defines a common interface for array operations, and it's supported by NumPy, CuPy, PyTorch, JAX, and others.
In practice, what this means is pretty straightforward — you pass CuPy arrays or PyTorch GPU tensors directly to scikit-learn estimators, and computation happens on whatever device those arrays live on. GPU, CPU, even Apple's MPS accelerator. No data conversion needed, no extra dependencies in scikit-learn itself.
# First, set the environment variable before imports
# export SCIPY_ARRAY_API=1
import sklearn
from sklearn.preprocessing import StandardScaler
from sklearn.linear_model import RidgeClassifierCV
from sklearn.pipeline import make_pipeline
import numpy as np
# Generate some training data
from sklearn.datasets import make_classification
X_np, y_np = make_classification(
    n_samples=50_000,
    n_features=200,
    n_informative=50,
    n_classes=5,
    random_state=42
)
Using CuPy for NVIDIA GPU Acceleration
If you have an NVIDIA GPU, CuPy is the most direct path to GPU-accelerated scikit-learn. It's essentially a drop-in replacement for NumPy that runs on CUDA hardware.
import cupy
# Move data to GPU — this is the only change you need
X_gpu = cupy.asarray(X_np)
y_gpu = cupy.asarray(y_np)
# Enable Array API dispatch and use scikit-learn as normal
with sklearn.config_context(array_api_dispatch=True):
    scaler = StandardScaler()
    X_scaled = scaler.fit_transform(X_gpu)
    # The output stays on GPU
    print(type(X_scaled))   # <class 'cupy.ndarray'>
    print(X_scaled.device)  # <CUDA Device 0>
    # Fit a classifier — all computation happens on GPU
    clf = RidgeClassifierCV(alphas=[0.01, 0.1, 1.0, 10.0])
    clf.fit(X_scaled, y_gpu)
    predictions = clf.predict(X_scaled)
    print(type(predictions))  # <class 'cupy.ndarray'>
Here's the thing I really like about this approach: you don't tell scikit-learn to use the GPU. It detects the array type and dispatches automatically. Outputs stay on the same device as inputs, so you can chain operations in a pipeline without constant CPU-GPU transfers bouncing your data around.
Using PyTorch Tensors on GPU
Already in a PyTorch ecosystem? (A lot of us are.) You can feed PyTorch tensors directly to scikit-learn:
import torch
# Create tensors on GPU
X_torch = torch.tensor(X_np, device="cuda", dtype=torch.float32)
y_torch = torch.tensor(y_np, device="cuda", dtype=torch.float32)
with sklearn.config_context(array_api_dispatch=True):
    scaler = StandardScaler()
    X_scaled = scaler.fit_transform(X_torch)

print(type(X_scaled))        # <class 'torch.Tensor'>
print(X_scaled.device.type)  # 'cuda'
This is particularly powerful for hybrid workflows — use scikit-learn for preprocessing and traditional ML, then hand off to PyTorch for deep learning, all without ever moving data off the GPU.
Which Estimators Support GPU?
Not everything works with GPU arrays yet. The support is growing with each release, but here's what's available as of 1.8:
Preprocessing:
StandardScaler, MinMaxScaler, MaxAbsScaler, Normalizer, PolynomialFeatures, Binarizer, KernelCenterer, LabelEncoder, LabelBinarizer
Linear Models:
Ridge, RidgeCV, RidgeClassifier, RidgeClassifierCV (with solver="svd")
Dimensionality Reduction:
PCA (with specific SVD solvers), LinearDiscriminantAnalysis (with solver="svd")
Other:
GaussianNB, GaussianMixture, CalibratedClassifierCV (with method="temperature")
Meta-estimators (if the base estimator supports it):
GridSearchCV, RandomizedSearchCV, HalvingGridSearchCV, HalvingRandomSearchCV
Metrics: Over 30 metrics including accuracy_score, precision_score, recall_score, f1_score, r2_score, mean_squared_error, cosine_similarity, euclidean_distances, and many more.
A Complete GPU Pipeline Example
Here's a more realistic example — a preprocessing + training pipeline that runs entirely on GPU:
import cupy
import sklearn
from sklearn.datasets import make_classification
from sklearn.preprocessing import StandardScaler, PolynomialFeatures
from sklearn.linear_model import RidgeClassifierCV
from sklearn.calibration import CalibratedClassifierCV
from sklearn.pipeline import make_pipeline
from sklearn.model_selection import cross_validate
import numpy as np
# Generate dataset
X_np, y_np = make_classification(
    n_samples=100_000,
    n_features=50,
    n_informative=25,
    n_classes=3,
    random_state=42
)
# Move to GPU
X_gpu = cupy.asarray(X_np.astype(np.float32))
y_gpu = cupy.asarray(y_np)
# Build pipeline — same API as always
pipeline = make_pipeline(
    StandardScaler(),
    CalibratedClassifierCV(
        RidgeClassifierCV(alphas=[0.01, 0.1, 1.0, 10.0]),
        method="temperature"
    )
)
# Train and evaluate on GPU
with sklearn.config_context(array_api_dispatch=True):
    cv_results = cross_validate(
        pipeline, X_gpu, y_gpu,
        cv=5,
        scoring="accuracy",
        return_train_score=True
    )

print(f"Test accuracy: {cv_results['test_score'].mean():.4f}")
print(f"Train accuracy: {cv_results['train_score'].mean():.4f}")
How Much Faster Is GPU, Really?
It depends. (I know, everyone's favorite answer.)
For the pipeline above with 100K samples and 50 features, you can expect roughly a 10x speedup on a modern GPU like an A100 or RTX 4090 compared to a single CPU core. For operations dominated by linear algebra — PCA on high-dimensional data, for instance — the improvement can hit 20-50x.
But there are caveats. GPU acceleration shines when:
- Your dataset has many samples (tens of thousands or more)
- The computation is dominated by matrix operations
- You're running the pipeline multiple times (cross-validation, hyperparameter search, etc.)
It won't help much for small datasets, tree-based models (whose training is inherently sequential), or operations dominated by Python overhead. RandomForest, GradientBoosting, and DecisionTree are implemented as Cython code built around branchy, sequential split-finding, which simply doesn't map onto the bulk array operations the Array API can dispatch.
Input/Output Type Rules
Quick note on how scikit-learn handles array types — it follows a simple convention: everything follows X. When you fit with GPU arrays:
- All other inputs (y, sample_weight) are automatically converted to match X's array library and device
- Fitted attributes (like scaler.mean_) stay on the same device as the training data
- Prediction methods expect inputs from the same library and device
Need to bring results back to NumPy for visualization or saving? Convert explicitly:
# CuPy to NumPy
predictions_np = cupy.asnumpy(predictions)
# PyTorch to NumPy
predictions_np = predictions.cpu().numpy()
# Or use the helper utility (note: a private API, subject to change between releases)
from sklearn.utils._array_api import _estimator_with_converted_arrays
clf_np = _estimator_with_converted_arrays(clf, lambda arr: cupy.asnumpy(arr))
Free-Threaded Python Support: The No-GIL Future Is Here
The second major feature — and arguably the one with the most long-term impact — is support for free-threaded CPython 3.14. Python without the GIL. If that doesn't excite you at least a little, you might not have spent enough late nights debugging multiprocessing issues.
Wait, What Is Free-Threaded Python?
For decades, Python's Global Interpreter Lock (GIL) has been the bottleneck for CPU-bound parallel work. The GIL ensures only one thread executes Python bytecode at a time, which means even on a 64-core machine, a multi-threaded Python program is effectively single-core for CPU-bound tasks. The workaround has always been multiprocessing — spawning separate processes — which adds overhead for inter-process communication and memory duplication.
With PEP 703 and PEP 779, the GIL can now be disabled. As of Python 3.14, free-threaded CPython is officially supported (though not yet the default build).
What This Means for Scikit-Learn Users
Scikit-learn uses joblib for parallelism (that's the n_jobs parameter you see everywhere). Historically, joblib relied on process-based parallelism to work around the GIL. With free-threaded Python, it can use threads instead, which is fundamentally more efficient:
- No memory duplication: Threads share the same address space — no copying large datasets between workers
- No serialization overhead: No pickling and unpickling objects between threads
- Lower startup cost: Spawning a thread is orders of magnitude faster than forking a process
from sklearn.model_selection import GridSearchCV
from sklearn.ensemble import RandomForestClassifier
import joblib
# Define a parameter grid
param_grid = {
    "n_estimators": [100, 200, 500],
    "max_depth": [5, 10, 20, None],
    "min_samples_split": [2, 5, 10],
}
clf = RandomForestClassifier(random_state=42)
grid_search = GridSearchCV(clf, param_grid=param_grid, cv=5, n_jobs=4)
# On free-threaded Python 3.14, use threading backend
# No memory duplication, no serialization — just shared-memory parallelism
with joblib.parallel_config(backend="threading"):
    grid_search.fit(X, y)
print(f"Best score: {grid_search.best_score_:.4f}")
print(f"Best params: {grid_search.best_params_}")
How to Install Free-Threaded Python
Want to try it yourself? Here's how:
# Using pyenv (recommended)
pyenv install 3.14.0t   # pyenv marks free-threaded builds with a "t" suffix
pyenv shell 3.14.0t
# Verify GIL status
python -c "import sys; print(sys._is_gil_enabled())"
# False — GIL is disabled
# Install scikit-learn with free-threaded wheels
pip install scikit-learn
Scikit-learn ships prebuilt wheels for free-threaded Python on all supported platforms, so no compiling from source needed.
Performance: What to Expect
Early benchmarks show up to 4x speedup for CPU-bound parallel workloads compared to the GIL-enabled build. There's a trade-off though: single-threaded performance takes a roughly 5-10% hit due to additional synchronization mechanisms.
For scikit-learn specifically, the biggest wins come from parallelized operations — cross-validation, grid search, ensemble fitting, anything with n_jobs > 1. If your typical workflow is fitting a single model on a single core, you won't see dramatic differences yet.
Temperature Scaling: Better Probability Calibration for Multiclass Problems
Probability calibration is one of those topics that doesn't get nearly enough attention in applied ML. Here's the problem: many classifiers output probabilities that aren't well-calibrated. A model that says "90% chance of class A" might actually only be right 70% of the time. That matters a lot in medical diagnosis, risk scoring, and any system where you need to trust the predicted probabilities.
Scikit-learn 1.8 introduces temperature scaling as a new calibration method in CalibratedClassifierCV, specifically designed for multiclass problems.
Why Temperature Scaling Matters
The existing calibration methods — Platt scaling and isotonic regression — work one-vs-rest for multiclass. They fit separate calibration models for each class, which can lead to inconsistencies (calibrated probabilities that don't sum to 1) and requires more calibration data.
Temperature scaling is different. It learns a single scalar parameter (the "temperature") that softens or sharpens the logits. This is elegant because:
- It has only one free parameter regardless of the number of classes
- Calibrated probabilities are guaranteed to sum to 1
- It requires less calibration data than one-vs-rest approaches
- It preserves prediction rankings (same accuracy, better calibration)
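The mechanics are simple enough to sketch directly. Below is a minimal NumPy/SciPy illustration of the idea, not scikit-learn's implementation (the softmax and fit_temperature helpers are my own): we simulate a classifier whose logits are three times too sharp, then recover the temperature by minimizing the negative log-likelihood.

```python
import numpy as np
from scipy.optimize import minimize_scalar

def softmax(z):
    z = z - z.max(axis=1, keepdims=True)  # subtract row max for numerical stability
    e = np.exp(z)
    return e / e.sum(axis=1, keepdims=True)

def fit_temperature(logits, y):
    """Learn the single scalar T > 0 minimizing the NLL of softmax(logits / T)."""
    def nll(t):
        p = softmax(logits / t)
        return -np.log(p[np.arange(len(y)), y] + 1e-12).mean()
    return minimize_scalar(nll, bounds=(0.05, 10.0), method="bounded").x

# Simulate overconfidence: labels are drawn from softmax(z), but the model
# reports logits 3 * z, so its probabilities are far too sharp
rng = np.random.default_rng(0)
z = rng.normal(size=(4000, 3))
y = np.array([rng.choice(3, p=p) for p in softmax(z)])
logits = 3.0 * z

T = fit_temperature(logits, y)
probs = softmax(logits / T)
print(f"learned temperature: {T:.2f}")  # should land near 3
```

One scalar undoes the miscalibration, the rows still sum to 1, and the argmax (hence accuracy) is untouched, which is exactly the property list above.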
from sklearn.calibration import CalibratedClassifierCV
from sklearn.datasets import make_classification
from sklearn.naive_bayes import GaussianNB
from sklearn.model_selection import train_test_split
import numpy as np
# Generate a multiclass dataset
X, y = make_classification(
    n_samples=10_000,
    n_features=20,
    n_informative=15,
    n_classes=5,
    n_clusters_per_class=1,
    random_state=42
)
X_train, X_cal, y_train, y_cal = train_test_split(
    X, y, test_size=0.3, random_state=42
)
# Train a base classifier
base_clf = GaussianNB()
base_clf.fit(X_train, y_train)
# Calibrate with temperature scaling
temp_clf = CalibratedClassifierCV(
    base_clf,
    method="temperature",
    ensemble=False
)
temp_clf.fit(X_cal, y_cal)
# Compare calibrated vs uncalibrated probabilities
proba_uncal = base_clf.predict_proba(X_cal[:5])
proba_cal = temp_clf.predict_proba(X_cal[:5])
print("Uncalibrated probabilities:")
print(proba_uncal.round(3))
print("\nTemperature-scaled probabilities:")
print(proba_cal.round(3))
When Should You Use Temperature Scaling?
It works best when you have a multiclass problem (3+ classes), your base classifier already has reasonable accuracy, you have limited calibration data (since it only learns one parameter), and you need probabilities that sum exactly to 1.
For binary classification, Platt scaling and isotonic regression are still solid choices. For multiclass with abundant calibration data, isotonic regression may achieve better calibration — but at the cost of more parameters.
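To make that comparison concrete you need a calibration metric. A common choice is expected calibration error: bin predictions by confidence and average the gap between confidence and accuracy. Here's a minimal sketch (the equal-width 10-bin layout is one convention among several, and the function name is my own, not a scikit-learn API):

```python
import numpy as np

def expected_calibration_error(y_true, proba, n_bins=10):
    """Average |confidence - accuracy| over equal-width confidence bins."""
    conf = proba.max(axis=1)
    pred = proba.argmax(axis=1)
    correct = (pred == y_true).astype(float)
    edges = np.linspace(0.0, 1.0, n_bins + 1)
    ece = 0.0
    for lo, hi in zip(edges[:-1], edges[1:]):
        in_bin = (conf > lo) & (conf <= hi)
        if in_bin.any():
            # weight each bin's gap by the fraction of samples it holds
            ece += in_bin.mean() * abs(conf[in_bin].mean() - correct[in_bin].mean())
    return ece

# A model that claims 90% confidence but is right only half the time
# should score |0.9 - 0.5| = 0.4
proba = np.tile([0.1, 0.9], (100, 1))
y_true = np.array([1] * 50 + [0] * 50)
print(expected_calibration_error(y_true, proba))  # ≈ 0.4
```

Computing this before and after CalibratedClassifierCV is a quick way to check whether temperature scaling actually helped on your data.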
ClassicalMDS: A New Manifold Learning Algorithm
Scikit-learn 1.8 adds ClassicalMDS to the sklearn.manifold module. If you've worked in bioinformatics or ecology, you might know this as Principal Coordinates Analysis (PCoA) or Torgerson's scaling.
What Is Classical MDS, Exactly?
Classical MDS finds a low-dimensional representation of data that preserves pairwise distances from the original high-dimensional space. Unlike the iterative MDS already in scikit-learn (which uses stress minimization via gradient descent), Classical MDS has a closed-form solution via eigendecomposition. That means it's deterministic, faster, and doesn't get stuck in local minima.
Fun fact: when applied to Euclidean distances, Classical MDS produces results equivalent to PCA. But it can also handle non-Euclidean distance matrices, which makes it more versatile in certain applications.
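The closed-form recipe is short enough to write out: square the distances, double-center them to recover a Gram matrix, eigendecompose, and keep the top components. This is a from-scratch sketch of the algorithm (the classical_mds helper here is my own, not the scikit-learn estimator):

```python
import numpy as np

def classical_mds(D, n_components=2):
    """Classical MDS (PCoA): double-center the squared distances, then eigendecompose."""
    n = D.shape[0]
    J = np.eye(n) - np.ones((n, n)) / n      # centering matrix
    B = -0.5 * J @ (D ** 2) @ J              # Gram matrix implied by the distances
    eigvals, eigvecs = np.linalg.eigh(B)
    top = np.argsort(eigvals)[::-1][:n_components]  # largest eigenvalues first
    scale = np.sqrt(np.clip(eigvals[top], 0.0, None))
    return eigvecs[:, top] * scale

# Sanity check: Euclidean distances between 2-D points are reproduced exactly,
# so the embedding recovers the configuration up to rotation/reflection
rng = np.random.default_rng(42)
X = rng.normal(size=(50, 2))
D = np.linalg.norm(X[:, None, :] - X[None, :, :], axis=-1)
X_emb = classical_mds(D, n_components=2)
D_emb = np.linalg.norm(X_emb[:, None, :] - X_emb[None, :, :], axis=-1)
print(np.allclose(D, D_emb))  # True
```

One eigendecomposition, no iterations, no random restarts. That's the determinism advantage over stress-based MDS.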
from sklearn.manifold import ClassicalMDS
from sklearn.datasets import make_s_curve
import matplotlib.pyplot as plt
# Generate an S-curve dataset (3D -> 2D)
X, color = make_s_curve(n_samples=2000, random_state=42)
# Apply Classical MDS
cmds = ClassicalMDS(n_components=2)
X_embedded = cmds.fit_transform(X)
# Visualize the result
fig, axes = plt.subplots(1, 2, figsize=(14, 5))
# Original 3D data (projected to 2D for visualization)
axes[0].scatter(X[:, 0], X[:, 2], c=color, cmap="viridis", s=5, alpha=0.7)
axes[0].set_title("Original S-Curve (X vs Z)")
# Classical MDS embedding
axes[1].scatter(X_embedded[:, 0], X_embedded[:, 1], c=color, cmap="viridis", s=5, alpha=0.7)
axes[1].set_title("Classical MDS Embedding")
plt.tight_layout()
plt.savefig("classical_mds_example.png", dpi=150)
plt.show()
ClassicalMDS vs MDS vs PCA: When to Use What
Quick decision guide:
- Use PCA when you have the raw feature matrix and want linear dimensionality reduction
- Use ClassicalMDS when you have a precomputed distance matrix or want a deterministic, non-iterative solution
- Use MDS (iterative) when you need non-metric distance functions or more flexibility
ClassicalMDS is especially handy in bioinformatics, psychology, ecology — basically any field where you start with pairwise dissimilarities rather than raw feature vectors.
Massive Speedup for L1-Penalized Linear Models
This one's an under-the-hood improvement that can save you real time. Scikit-learn 1.8 introduces gap safe screening rules for coordinate descent solvers, and the effect on L1-penalized models is dramatic.
Which Models Benefit?
- Lasso and LassoCV
- ElasticNet and ElasticNetCV
- MultiTaskLasso and MultiTaskLassoCV
- MultiTaskElasticNet and MultiTaskElasticNetCV
How Screening Rules Work
The idea is actually quite elegant. Before solving the full optimization problem, the algorithm identifies features whose coefficients are guaranteed to be zero at the optimal solution. Those features get excluded from the computation entirely, reducing the effective dimensionality.
The stronger the L1 penalty (higher alpha), the more features get screened out, and the bigger the speedup. For high-dimensional datasets with many irrelevant features — think genomics, text analysis, signal processing — the improvement is substantial.
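The "guaranteed zero" part comes from the Lasso optimality conditions. With scikit-learn's objective (1/(2n))·||y − Xw||² + α·||w||₁, a feature can only be active if the absolute correlation between its column and the residual reaches n·α; gap safe rules use a dual certificate to prove, before solving, that many features fall strictly below that bar. This sketch doesn't implement the screening itself, it just verifies the underlying condition on a fitted model:

```python
import numpy as np
from sklearn.datasets import make_regression
from sklearn.linear_model import Lasso

X, y = make_regression(n_samples=500, n_features=2_000,
                       n_informative=20, noise=0.1, random_state=0)

alpha = 5.0
lasso = Lasso(alpha=alpha, max_iter=50_000)
lasso.fit(X, y)

# Optimality requires |x_j . residual| / n <= alpha, with equality only for
# active features; inactive columns sit below the bar, which is exactly what
# screening rules detect ahead of time so those columns are never touched
residual = y - lasso.predict(X)
corr = np.abs(X.T @ residual) / len(y)
inactive = lasso.coef_ == 0

print(f"zero coefficients: {inactive.sum()} / {X.shape[1]}")
print(bool(np.all(corr[inactive] <= alpha * 1.05)))  # small slack for solver tolerance
```

The further below α a feature's correlation sits, the earlier it can be discarded, which is why sparser solutions (higher alpha) see the biggest speedups.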
from sklearn.datasets import make_regression
from sklearn.linear_model import LassoCV, ElasticNetCV
import time
# High-dimensional dataset with sparse signal
X, y = make_regression(
    n_samples=5_000,
    n_features=10_000,
    n_informative=100,  # Only 100 features actually matter
    noise=0.1,
    random_state=42
)
# LassoCV with screening rules (scikit-learn 1.8)
start = time.time()
lasso = LassoCV(cv=5, random_state=42)
lasso.fit(X, y)
elapsed = time.time() - start
print(f"LassoCV fit time: {elapsed:.1f} seconds")
print(f"Non-zero coefficients: {(lasso.coef_ != 0).sum()}")
print(f"Best alpha: {lasso.alpha_:.6f}")
# ElasticNetCV benefits similarly
start = time.time()
enet = ElasticNetCV(cv=5, l1_ratio=0.9, random_state=42)
enet.fit(X, y)
elapsed = time.time() - start
print(f"\nElasticNetCV fit time: {elapsed:.1f} seconds")
print(f"Non-zero coefficients: {(enet.coef_ != 0).sum()}")
In previous versions, fitting ElasticNetCV on 10,000 features could take over a minute. With screening rules in 1.8, the same operation often finishes in under 20 seconds. That's a 3-5x improvement depending on solution sparsity, and you get it for free — no code changes required.
DecisionTreeRegressor: From O(n²) to O(n log n)
If you've ever used DecisionTreeRegressor with criterion="absolute_error", you probably know the pain. It was slow. Like, painfully slow on anything beyond a toy dataset. The culprit was an O(n²) split-finding algorithm — doubling your samples made fitting four times slower.
Scikit-learn 1.8 replaces it with an O(n log n) algorithm. Night and day difference.
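The enabling fact is that the absolute-error-optimal prediction for a node is the median of its targets, so split evaluation reduces to maintaining medians incrementally over sorted data instead of rescoring every candidate from scratch (I'm describing the general O(n log n) strategy here, not scikit-learn's exact internals). The median fact itself is easy to verify by brute force:

```python
import numpy as np

rng = np.random.default_rng(0)
y = rng.normal(size=1001)

def mae(c):
    # mean absolute error of predicting the constant c for every sample
    return np.abs(y - c).mean()

# Brute-force the best constant prediction over a fine grid, then
# compare against the median
candidates = np.linspace(y.min(), y.max(), 5_000)
best = candidates[np.argmin([mae(c) for c in candidates])]
med = float(np.median(y))

print(abs(best - med) < 1e-2)         # True: the grid minimum sits at the median
print(mae(med) <= mae(best) + 1e-12)  # True: the median is never beaten
```

Sorting once and sliding the median across candidate splits is O(n log n) total, versus recomputing an O(n) error for each of n split positions, which is where the old quadratic cost came from.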
from sklearn.tree import DecisionTreeRegressor
from sklearn.datasets import make_regression
import time
X, y = make_regression(n_samples=100_000, n_features=10, random_state=42)
# Fit with absolute_error criterion
start = time.time()
tree = DecisionTreeRegressor(criterion="absolute_error", max_depth=10)
tree.fit(X, y)
elapsed = time.time() - start
print(f"Fit time with absolute_error: {elapsed:.2f} seconds")
# scikit-learn 1.8: ~0.2 seconds
# scikit-learn 1.7: 20+ seconds for the same fit on 100K samples
This improvement is especially valuable for ensemble methods. Building a RandomForestRegressor or GradientBoostingRegressor with absolute error? The speedup compounds across hundreds of trees.
Better Jupyter Notebook Experience
This one's subtle but genuinely useful. When you display a pipeline or estimator in a Jupyter notebook, each hyperparameter now shows:
- Clickable links to the relevant online documentation
- Tooltips with the parameter's docstring on hover
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.linear_model import LogisticRegression
from sklearn.decomposition import PCA
# Display a pipeline in Jupyter — each parameter is now hyperlinked
pipeline = make_pipeline(
    StandardScaler(),
    PCA(n_components=50),
    LogisticRegression(C=10, max_iter=1000, random_state=42)
)
# Simply display the pipeline object in a notebook cell
pipeline
# Each parameter (C=10, max_iter=1000, etc.) now links to documentation
# and shows a description tooltip on hover
It might seem like a small thing, but when you're reviewing someone else's pipeline (or revisiting your own from six months ago), having instant access to parameter docs without tab-switching is genuinely nice.
Migration Guide: Upgrading to Scikit-Learn 1.8
Good news: backward compatibility is strong. But there are a few things to watch for before upgrading production systems.
Step 1: Check Your Python Version
Scikit-learn 1.8 requires Python 3.10 or later. Still on 3.9? Time to upgrade.
# Check your current versions
python --version
pip show scikit-learn
# Upgrade scikit-learn
pip install --upgrade scikit-learn
# Verify the installation
python -c "import sklearn; print(sklearn.__version__)"
# 1.8.0
Step 2: Review Deprecation Warnings
A few things that were deprecated in earlier versions are now removed or changed:
- base.clone() — the safe parameter has been removed
- force_all_finite has been renamed to ensure_all_finite in validation functions
- Some unused keyword arguments in metric functions now raise errors instead of being silently ignored
# Before (deprecated)
from sklearn.base import clone
cloned = clone(estimator, safe=True) # 'safe' parameter removed
# After
cloned = clone(estimator) # Just remove the parameter
# Before (deprecated)
from sklearn.utils.validation import check_array
check_array(X, force_all_finite=True)
# After
check_array(X, ensure_all_finite=True)
Step 3: Enable GPU Support (Optional)
If you want to try GPU acceleration:
# 1. Install a GPU array library
pip install cupy-cuda12x # For NVIDIA GPUs with CUDA 12.x
# or
pip install torch # For PyTorch GPU support
# 2. Set the environment variable for SciPy Array API support
export SCIPY_ARRAY_API=1
# 3. In your code, enable array API dispatch
import sklearn
sklearn.set_config(array_api_dispatch=True) # Global setting
# or use context manager for specific code blocks
Step 4: Test Your Existing Code
Before deploying, run your test suite and check for deprecation warnings:
# Run tests with all warnings visible
python -W all -m pytest tests/
# or
python -W all your_ml_pipeline.py 2>&1 | grep -i "deprecat"
Practical Recommendations: Choosing the Right Optimization
With all these new options, here's a quick decision framework for speeding up your ML workflows:
For Data Preprocessing at Scale
Preprocessing large datasets (100K+ samples) with scaling, normalization, or polynomial features? GPU acceleration via CuPy or PyTorch is your best bet. Dramatic speedup, minimal code changes.
For Hyperparameter Tuning
Grid search and cross-validation are embarrassingly parallel. On a multi-core machine with free-threaded Python 3.14, the threading backend in joblib can cut memory usage and speed up searches significantly compared to process-based parallelism.
For High-Dimensional Sparse Models
Fitting Lasso, ElasticNet, or their multi-task variants on datasets with thousands of features? Just upgrade. The gap safe screening rules give you a substantial speedup with zero code changes.
For Production ML Pipelines
If you're running scikit-learn in production, take the cautious route: upgrade, run your test suite, fix deprecation warnings, and then selectively enable GPU acceleration for the bottleneck stages. Don't try to GPU-accelerate everything at once — profile first, optimize where it actually matters.
What's Coming Next
While there's no formal timeline for scikit-learn 2.0, the direction is clear. Array API support will keep expanding to cover more estimators — expect tree-based models and ensembles to gain GPU-aware implementations in future releases. Free-threaded Python support will mature as CPython 3.14 approaches stability and more of the ecosystem catches up.
The bigger picture? A scikit-learn that scales seamlessly from laptops to multi-GPU servers, using the same API that millions of data scientists already know. Version 1.8 brings that vision a lot closer to reality.
Wrapping Up
Scikit-learn 1.8 is a genuinely important release. GPU acceleration via the Array API means competitive performance for linear algebra-heavy workloads without adding heavy dependencies. Free-threaded Python support sets up the library for a more efficient parallelism model. Temperature scaling, ClassicalMDS, and the performance improvements to linear models and decision trees round things out nicely.
The upgrade path is straightforward, backward compatibility is solid, and the performance gains are real. If you're running scikit-learn anywhere — production or notebooks — this is an update worth making sooner rather than later.