Time Series Analysis and Forecasting in Python: From Pandas to Foundation Models

A practical guide to time series forecasting in Python — from pandas fundamentals and classical ARIMA models to deep learning with NeuralForecast and zero-shot predictions with Amazon Chronos-2 foundation models.

Introduction: Why Time Series Forecasting Deserves Its Own Playbook

If you've spent any time in data science, you know that time series data is everywhere — stock prices, sensor readings, web traffic, retail sales, energy consumption, weather patterns. A large share of enterprise data has a temporal component. Yet time series analysis remains one of the most misunderstood areas in the Python data ecosystem. People routinely apply standard machine learning techniques without accounting for temporal dependencies, and the results are predictably bad.

Here's the fundamental issue: time series data violates the core assumption that most machine learning algorithms depend on — that observations are independent and identically distributed (i.i.d.). In time series, the value at time t is typically correlated with values at t-1, t-2, and so on. Ignore that, and you'll get a model that looks great in cross-validation but falls apart the moment you deploy it.

I've seen this happen more times than I can count.

In this guide, we'll walk through the entire time series analysis and forecasting pipeline in Python — from basic manipulation in pandas, through classical statistical models, all the way to cutting-edge foundation models like Amazon Chronos-2. By the end, you'll have a clear framework for choosing the right approach for your specific problem and the practical code to actually implement it.

Time Series Fundamentals with Pandas

Before we get into forecasting models, let's make sure our data foundation is solid. Pandas remains the go-to library for time series manipulation in Python, and recent releases have only improved the tooling.

Setting Up Your Time Series Data

The single most important step when working with time series in pandas? Getting your datetime handling right from the start. Seriously — so many downstream headaches trace back to sloppy date parsing. Here's how to do it properly:

import pandas as pd
import numpy as np

# Load data and immediately parse dates
df = pd.read_csv("sales_data.csv", parse_dates=["date"])

# Always convert to datetime explicitly if needed
df["date"] = pd.to_datetime(df["date"], format="%Y-%m-%d")

# Set datetime as index — this unlocks time series functionality
df = df.set_index("date")
df = df.sort_index()

print(df.index)
# DatetimeIndex(['2023-01-01', '2023-01-02', ...], dtype='datetime64[ns]', name='date', freq=None)

Setting the datetime column as your index isn't just a convenience — it unlocks pandas' time series–specific indexing, resampling, and windowing operations. Without it, you're leaving performance and functionality on the table.
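One immediate payoff is partial-string indexing: with a DatetimeIndex in place, you can select whole months or arbitrary date ranges with plain strings. A quick illustration on toy data (the column name is just for the example):

```python
import pandas as pd
import numpy as np

# Toy daily series for illustration
idx = pd.date_range("2023-01-01", "2023-12-31", freq="D")
df = pd.DataFrame({"revenue": np.arange(len(idx), dtype=float)}, index=idx)

# Select all of February with a partial date string
feb = df.loc["2023-02"]
print(len(feb))          # 28 — every row in February 2023

# Slice an arbitrary range (endpoints are inclusive for label slices)
q1 = df.loc["2023-01-01":"2023-03-31"]
print(len(q1))           # 90 — Jan + Feb + Mar 2023
```

None of this works on a plain integer index — it's unlocked entirely by the datetime index.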

Resampling and Frequency Conversion

Real-world data rarely arrives at the frequency you need. The resample() method lets you change frequency while applying appropriate aggregations:

# Downsample daily data to monthly
monthly = df["revenue"].resample("ME").agg(["sum", "mean", "max"])
monthly.columns = ["total", "average", "peak"]

# Upsample monthly to daily with forward fill
daily = monthly.resample("D").ffill()

# Business day frequency — skips weekends
business_daily = df["revenue"].resample("B").mean()

Rolling Windows and Exponential Smoothing

Rolling statistics are your first line of defense for understanding trends and seasonality:

# Rolling mean and standard deviation
df["rolling_mean_7d"] = df["revenue"].rolling(window=7).mean()
df["rolling_std_7d"] = df["revenue"].rolling(window=7).std()

# Exponentially weighted moving average — more recent values matter more
df["ewma"] = df["revenue"].ewm(span=7, adjust=False).mean()

# Rolling with a time-based window (instead of count-based)
df["rolling_30d"] = df["revenue"].rolling("30D").mean()

The exponentially weighted moving average (EWMA) is particularly useful because it gives more weight to recent observations. That makes it way more responsive to recent changes than a simple moving average — which, honestly, tends to lag behind when things shift quickly.
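You can see that responsiveness directly on a toy step change — after a jump in level, the EWMA closes the gap faster than a same-width simple moving average (synthetic data, purely illustrative):

```python
import pandas as pd

# Flat series that jumps from 0 to 10 at position 20
s = pd.Series([0.0] * 20 + [10.0] * 10)

sma = s.rolling(window=7).mean()
ewma = s.ewm(span=7, adjust=False).mean()

# Three observations after the jump, compare how far each has moved toward 10
print(round(sma.iloc[22], 2))   # 4.29 — still dominated by the old zeros
print(round(ewma.iloc[22], 2))  # 5.78 — exponential weights favor the new level
```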

Decomposition: Trend, Seasonality, and Residuals

Before fitting any model, you should decompose your series to understand its components. This step is one that a lot of people skip, and it comes back to bite them later:

import matplotlib.pyplot as plt
from statsmodels.tsa.seasonal import seasonal_decompose

# Additive decomposition (use multiplicative for proportional seasonality)
result = seasonal_decompose(
    df["revenue"],
    model="additive",
    period=365  # annual seasonality for daily data
)

# Access individual components
trend = result.trend
seasonal = result.seasonal
residual = result.resid

# Plot all components
result.plot()
plt.tight_layout()
plt.show()

If the seasonal fluctuations grow proportionally with the level of the series, use model="multiplicative" instead. This distinction matters more than people realize — choosing the wrong decomposition type leads to misspecified models downstream.
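A quick way to check which regime you're in — an informal heuristic, not a formal test — is to detrend the series and compare the seasonal spread early versus late. If the spread grows with the level, lean multiplicative; if it stays flat, additive is fine. On synthetic data:

```python
import numpy as np

rng = np.random.default_rng(0)
t = np.arange(730)
level = 100 + 0.5 * t                      # upward trend over two years
season = np.sin(2 * np.pi * t / 365)

# Multiplicative: the seasonal swing scales with the level of the series
mult = level * (1 + 0.2 * season) + rng.normal(0, 1, t.size)
# Additive: the seasonal swing is a fixed absolute amount
addi = level + 20 * season + rng.normal(0, 1, t.size)

def seasonal_growth_ratio(y, t):
    """Detrend linearly, then compare seasonal spread in year 2 vs year 1."""
    resid = y - np.polyval(np.polyfit(t, y, 1), t)
    return resid[365:].std() / resid[:365].std()

print(round(seasonal_growth_ratio(mult, t), 2))  # well above 1 → multiplicative
print(round(seasonal_growth_ratio(addi, t), 2))  # close to 1 → additive
```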

Stationarity Testing: The Prerequisite Most People Skip

Most classical time series models require stationarity — meaning the statistical properties of the series (mean, variance, autocorrelation) don't change over time. In practice, almost no real-world series is stationary out of the box, so you'll need to transform it first.

The Augmented Dickey-Fuller Test

from statsmodels.tsa.stattools import adfuller

def test_stationarity(series, name="Series"):
    result = adfuller(series.dropna(), autolag="AIC")
    print(f"ADF Statistic for {name}: {result[0]:.4f}")
    print(f"p-value: {result[1]:.4f}")
    print("Critical Values:")
    for key, value in result[4].items():
        print(f"  {key}: {value:.4f}")

    if result[1] < 0.05:
        print(f"→ {name} IS stationary (reject null hypothesis)")
    else:
        print(f"→ {name} is NOT stationary (fail to reject null)")
    return result[1] < 0.05

# Test the raw series
is_stationary = test_stationarity(df["revenue"], "Revenue")

# If not stationary, difference and test again
if not is_stationary:
    df["revenue_diff"] = df["revenue"].diff()
    test_stationarity(df["revenue_diff"], "Revenue (1st difference)")

The rule of thumb: if the p-value is below 0.05, the series is stationary. If it's not, apply differencing — first-order differencing handles trends, and seasonal differencing handles seasonal non-stationarity. Pretty straightforward once you get the hang of it.
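As a sanity check on what differencing actually does (toy data, not the revenue series): first-order differencing turns a linear trend into a constant, and seasonal differencing at the right lag removes a repeating pattern entirely:

```python
import numpy as np
import pandas as pd

# Linear trend: the first difference is a constant
trend = pd.Series(np.arange(0, 100, 2.0))        # slope of 2 per step
print(trend.diff().dropna().unique())            # [2.]

# Pure weekly pattern: the lag-7 seasonal difference is exactly zero
pattern = [5.0, 3.0, 4.0, 6.0, 8.0, 9.0, 2.0]
weekly = pd.Series(pattern * 8)
print(weekly.diff(7).dropna().abs().max())       # 0.0
```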

ACF and PACF: Reading the Signatures

The autocorrelation function (ACF) and partial autocorrelation function (PACF) plots tell you about the memory structure of your series. Think of them as fingerprints that directly inform model selection:

from statsmodels.graphics.tsaplots import plot_acf, plot_pacf
import matplotlib.pyplot as plt

fig, axes = plt.subplots(1, 2, figsize=(14, 5))

plot_acf(df["revenue"].dropna(), lags=40, ax=axes[0])
axes[0].set_title("Autocorrelation Function (ACF)")

plot_pacf(df["revenue"].dropna(), lags=40, method="ywm", ax=axes[1])
axes[1].set_title("Partial Autocorrelation Function (PACF)")

plt.tight_layout()
plt.show()

Here's the cheat sheet (bookmark this one): if the PACF cuts off after lag p while the ACF decays gradually, you have an AR(p) process. If the ACF cuts off after lag q while the PACF decays, you have an MA(q) process. If both decay gradually, you need an ARMA model.
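A simulated AR(1) process makes the fingerprint concrete: its ACF decays geometrically, each lag roughly the previous one times φ, which is exactly the gradual-decay pattern in the cheat sheet. This sketch uses pandas' built-in `autocorr` instead of statsmodels, just to keep it self-contained:

```python
import numpy as np
import pandas as pd

rng = np.random.default_rng(42)

# Simulate AR(1): y_t = 0.8 * y_{t-1} + noise
phi, n = 0.8, 5000
y = np.zeros(n)
for t in range(1, n):
    y[t] = phi * y[t - 1] + rng.normal()
s = pd.Series(y)

# Sample autocorrelations at lags 1-3 — roughly phi, phi^2, phi^3
for lag in (1, 2, 3):
    print(f"lag {lag}: {s.autocorr(lag):.3f}")
```

The PACF of the same process would cut off sharply after lag 1 — together, that's the AR(1) signature.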

Classical Forecasting: ARIMA and Beyond with StatsForecast

ARIMA (AutoRegressive Integrated Moving Average) has been the workhorse of time series forecasting for decades, and for good reason — it's interpretable, well-understood, and surprisingly competitive on many datasets. But let's be honest: manually tuning ARIMA parameters is tedious. That's where Nixtla's StatsForecast library shines.

AutoARIMA with StatsForecast

StatsForecast provides blazing-fast implementations of classical models, optimized with numba for performance that can handle millions of series:

from statsforecast import StatsForecast
from statsforecast.models import (
    AutoARIMA,
    AutoETS,
    AutoCES,
    SeasonalNaive
)

# StatsForecast expects a specific DataFrame format:
# columns: unique_id, ds (timestamp), y (target)
train_df = df.reset_index().rename(columns={
    "date": "ds",
    "revenue": "y"
})
train_df["unique_id"] = "store_001"

# Define models to compare
models = [
    AutoARIMA(season_length=7),    # Weekly seasonality
    AutoETS(season_length=7),       # Exponential smoothing
    AutoCES(season_length=7),       # Complex exponential smoothing
    SeasonalNaive(season_length=7)  # Baseline
]

# Fit and forecast
sf = StatsForecast(
    models=models,
    freq="D",    # Daily frequency
    n_jobs=-1    # Use all CPU cores
)

sf.fit(train_df)
forecasts = sf.predict(h=30, level=[90])  # 30-day forecast with 90% CI

print(forecasts.head())

What makes StatsForecast special is speed. It can fit AutoARIMA on a single series in milliseconds, which means you can realistically run it across thousands of products, stores, or sensors without waiting hours. That's a game-changer for anyone working at scale.

Cross-Validation the Right Way

StatsForecast also includes temporal cross-validation built in, which avoids the cardinal sin of time series evaluation — training on future data:

# Temporal cross-validation with expanding window
cv_results = sf.cross_validation(
    df=train_df,
    h=7,           # 7-day forecast horizon
    step_size=7,   # Slide the window by 7 days each fold
    n_windows=5    # 5 evaluation windows
)

# Evaluate accuracy with utilsforecast's evaluate helper
from utilsforecast.evaluation import evaluate
from utilsforecast.losses import mae, mse

evaluation = evaluate(
    cv_results.drop(columns="cutoff"),
    metrics=[mae, mse],
    models=["AutoARIMA", "AutoETS", "AutoCES", "SeasonalNaive"],
)
print(evaluation)

Machine Learning Approaches with Scikit-Learn

Classical statistical models work well for univariate series with clear seasonal patterns. But what if you have exogenous features — weather data, economic indicators, marketing spend? That's where ML-based forecasting enters the picture.

Feature Engineering for Time Series

The key insight for ML-based time series forecasting is reduction — transforming the forecasting problem into a supervised learning problem using lagged features. Once you see it this way, the whole scikit-learn toolkit opens up:

def create_time_features(df, target_col, lags=None, window_sizes=None):
    """Create lag features and rolling statistics for ML-based forecasting."""
    features = df.copy()

    # Calendar features
    features["day_of_week"] = features.index.dayofweek
    features["month"] = features.index.month
    features["quarter"] = features.index.quarter
    features["is_weekend"] = (features.index.dayofweek >= 5).astype(int)
    features["day_of_year"] = features.index.dayofyear

    # Lag features
    if lags is None:
        lags = [1, 7, 14, 28]
    for lag in lags:
        features[f"lag_{lag}"] = features[target_col].shift(lag)

    # Rolling window features
    if window_sizes is None:
        window_sizes = [7, 14, 30]
    for window in window_sizes:
        features[f"rolling_mean_{window}"] = (
            features[target_col].shift(1).rolling(window).mean()
        )
        features[f"rolling_std_{window}"] = (
            features[target_col].shift(1).rolling(window).std()
        )

    # Drop rows with NaN from lagging
    features = features.dropna()
    return features

df_features = create_time_features(df, "revenue")

Notice the shift(1) before calculating rolling statistics — this prevents data leakage by ensuring we only use information available at prediction time. Miss this detail and you'll get unrealistically good backtest results (and a nasty surprise in production).
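It's worth verifying the no-leakage property directly. With shift(1) in place, the rolling feature at row t is built only from rows strictly before t (small synthetic check):

```python
import pandas as pd

s = pd.Series([10.0, 20.0, 30.0, 40.0, 50.0])

leaky = s.rolling(3).mean()          # window INCLUDES the current row
safe = s.shift(1).rolling(3).mean()  # window ends at the previous row

# At position 3, the safe feature uses only rows 0-2
print(leaky.iloc[3])  # 30.0 — includes the value at t itself
print(safe.iloc[3])   # 20.0 — mean of 10, 20, 30
```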

Proper Train-Test Splitting with TimeSeriesSplit

Never use standard k-fold cross-validation for time series. I cannot stress this enough. Scikit-learn's TimeSeriesSplit preserves temporal order:

from sklearn.model_selection import TimeSeriesSplit
from sklearn.ensemble import HistGradientBoostingRegressor
from sklearn.metrics import mean_absolute_error, mean_squared_error
import numpy as np

# Define features and target
feature_cols = [c for c in df_features.columns if c != "revenue"]
X = df_features[feature_cols]
y = df_features["revenue"]

# Time-aware cross-validation
tscv = TimeSeriesSplit(n_splits=5, gap=7)  # 7-day gap prevents leakage

scores = []
for fold, (train_idx, test_idx) in enumerate(tscv.split(X)):
    X_train, X_test = X.iloc[train_idx], X.iloc[test_idx]
    y_train, y_test = y.iloc[train_idx], y.iloc[test_idx]

    model = HistGradientBoostingRegressor(
        max_iter=500,
        learning_rate=0.05,
        max_depth=6,
        random_state=42
    )
    model.fit(X_train, y_train)
    preds = model.predict(X_test)

    mae = mean_absolute_error(y_test, preds)
    rmse = np.sqrt(mean_squared_error(y_test, preds))
    scores.append({"fold": fold, "mae": mae, "rmse": rmse})
    print(f"Fold {fold}: MAE={mae:.2f}, RMSE={rmse:.2f}")

print(f"\nMean MAE: {np.mean([s['mae'] for s in scores]):.2f}")

The gap parameter is crucial here — it creates a buffer between training and test sets, simulating the real-world scenario where you don't have data from immediately before your forecast window.
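To make the mechanics of the gap concrete, here's a minimal hand-rolled expanding-window splitter — purely illustrative, since TimeSeriesSplit does this for you. Each fold trains on everything up to a cutoff, skips `gap` rows, then tests on the next window:

```python
def expanding_splits(n, n_splits, test_size, gap):
    """Yield (train_idx, test_idx) pairs with a gap between them, oldest fold first."""
    for k in range(n_splits):
        test_end = n - (n_splits - 1 - k) * test_size
        test_start = test_end - test_size
        train_end = test_start - gap  # the gap drops the rows just before the test window
        yield list(range(train_end)), list(range(test_start, test_end))

for train_idx, test_idx in expanding_splits(n=30, n_splits=3, test_size=5, gap=2):
    print(f"train 0..{train_idx[-1]}  |  test {test_idx[0]}..{test_idx[-1]}")
```

Note how the training set grows each fold while the test windows march forward — and two rows are always excluded between them.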

Multi-Step Forecasting

Predicting more than one step ahead requires special handling. The two main approaches are recursive (feed predictions back as inputs) and direct (train a separate model for each horizon):

from sklearn.multioutput import RegressorChain

# Direct multi-step: one model per horizon
# Create targets for each horizon
horizons = [1, 7, 14, 28]
for h in horizons:
    df_features[f"target_h{h}"] = df_features["revenue"].shift(-h)

df_multi = df_features.dropna()

# Train separate models using RegressorChain to capture horizon dependencies
target_cols = [f"target_h{h}" for h in horizons]
X_multi = df_multi[feature_cols]
y_multi = df_multi[target_cols]

# Chronological split — never shuffle time series
split_idx = int(len(df_multi) * 0.8)

chain = RegressorChain(
    HistGradientBoostingRegressor(max_iter=300, random_state=42),
    order=list(range(len(horizons)))
)
chain.fit(X_multi.iloc[:split_idx], y_multi.iloc[:split_idx])
multi_preds = chain.predict(X_multi.iloc[split_idx:])

Deep Learning with NeuralForecast

When you have large datasets and complex patterns that classical models can't capture, deep learning becomes a viable option. Nixtla's NeuralForecast library provides a scikit-learn-like interface to over 30 state-of-the-art neural architectures. So, let's dive into the heavy hitters.

N-BEATS and N-HiTS: Neural Basis Expansion

from neuralforecast import NeuralForecast
from neuralforecast.models import NBEATS, NHITS, PatchTST
from neuralforecast.losses.pytorch import MAE, DistributionLoss

# Prepare data in Nixtla format
train_df = df.reset_index().rename(columns={"date": "ds", "revenue": "y"})
train_df["unique_id"] = "store_001"

# Define neural models
models = [
    NBEATS(
        input_size=30,           # Look back 30 time steps
        h=7,                     # Forecast 7 steps ahead
        max_steps=500,
        learning_rate=1e-3,
        scaler_type="robust",
        batch_size=32,
    ),
    NHITS(
        input_size=30,
        h=7,
        max_steps=500,
        learning_rate=1e-3,
        scaler_type="robust",
        n_pool_kernel_size=[2, 2, 1],  # Multi-rate signal sampling
    ),
    PatchTST(
        input_size=60,
        h=7,
        max_steps=500,
        learning_rate=1e-4,
        patch_len=7,             # Weekly patches
        scaler_type="robust",
    ),
]

# Train and predict — same interface as StatsForecast
nf = NeuralForecast(models=models, freq="D")
nf.fit(df=train_df)
neural_forecasts = nf.predict()

print(neural_forecasts.head())

N-BEATS (Neural Basis Expansion Analysis for Time Series) decomposes the forecast into interpretable basis functions for trend and seasonality. N-HiTS extends this with hierarchical interpolation, making it more efficient for long-horizon forecasting. And PatchTST? It applies the transformer architecture to time series by dividing the input into patches — similar to how Vision Transformers handle images. Pretty clever approach, honestly.

Temporal Cross-Validation for Neural Models

# NeuralForecast also supports temporal cross-validation
cv_neural = nf.cross_validation(
    df=train_df,
    n_windows=3,
    step_size=7,
)

# Compare models
from utilsforecast.evaluation import evaluate
from utilsforecast.losses import mae, rmse

evaluation = evaluate(
    cv_neural.drop(columns="cutoff"),
    metrics=[mae, rmse],
    models=["NBEATS", "NHITS", "PatchTST"],
)
print(evaluation)

Foundation Models: Zero-Shot Forecasting with Amazon Chronos-2

The newest paradigm shift in time series forecasting is foundation models — large pretrained models that can forecast any time series without task-specific training. Think of it as the GPT moment for time series.

Amazon's Chronos-2, released in October 2025, is the current state of the art. It's a 120M-parameter encoder-only model that supports univariate, multivariate, and covariate-informed forecasting out of the box. With hundreds of millions of downloads on Hugging Face, the Chronos family has already been battle-tested in production at scale.

Zero-Shot Forecasting with the Chronos Pipeline

import torch
import pandas as pd
import numpy as np
from chronos import ChronosPipeline

# Load a pretrained model — "chronos-t5-base" is the original T5-based
# Chronos checkpoint; the dedicated Chronos-2 pipeline appears in the next section
pipeline = ChronosPipeline.from_pretrained(
    "amazon/chronos-t5-base",
    device_map="cpu",        # Use "cuda" for GPU
    torch_dtype=torch.float32
)

# Prepare context — just your historical data as a tensor
context = torch.tensor(df["revenue"].values, dtype=torch.float32)

# Generate probabilistic forecasts
forecast = pipeline.predict(
    context,
    prediction_length=30,    # 30-day forecast
    num_samples=100          # 100 sample paths for uncertainty
)

# Extract quantiles — predict returns shape [n_series, num_samples, horizon]
samples = forecast[0].numpy()
median_forecast = np.median(samples, axis=0)
lower_bound = np.percentile(samples, 10, axis=0)
upper_bound = np.percentile(samples, 90, axis=0)

# Create forecast DataFrame
future_dates = pd.date_range(
    start=df.index[-1] + pd.Timedelta(days=1),
    periods=30,
    freq="D"
)
forecast_df = pd.DataFrame({
    "date": future_dates,
    "forecast": median_forecast,
    "lower_90": lower_bound,
    "upper_90": upper_bound
})

The remarkable thing about Chronos-2 is that you don't need to tell it anything about your data — no seasonality period, no differencing order, no feature engineering. It figures out the patterns from the raw numbers alone. On benchmarks, it achieves a win rate of over 90% against its predecessor Chronos-Bolt and consistently outperforms AutoARIMA baselines, often reducing MAE by 15–25%. Those are numbers that made me sit up and pay attention.

Chronos-2 with Covariates

One of the biggest upgrades in Chronos-2 is support for exogenous variables, which was a major limitation of the original Chronos:

from chronos import Chronos2Pipeline

# The full Chronos-2 pipeline supports covariates
pipeline = Chronos2Pipeline.from_pretrained(
    "amazon/chronos-2",
    device_map="cuda"
)

# Prepare a DataFrame with target + covariates
context_df = pd.DataFrame({
    "id": "store_001",
    "timestamp": df.index,
    "target": df["revenue"].values,
    "temperature": df["temperature"].values,
    "is_holiday": df["is_holiday"].values,
})

# Generate covariate-informed predictions
pred_df = pipeline.predict_df(
    context_df,
    prediction_length=14,
    quantile_levels=[0.1, 0.5, 0.9],
    id_column="id",
    timestamp_column="timestamp",
    target="target",
)

When to Use What: A Decision Framework

With so many options available, choosing the right approach can feel overwhelming. Here's a practical decision framework I've refined over the years, based on data characteristics and real-world constraints:

Use Classical Statistical Models (ARIMA, ETS) When:

  • You have a single or small number of univariate series
  • Interpretability is critical (you need to explain the model to stakeholders)
  • The series has clear trend and seasonal patterns
  • You have limited compute resources
  • You need a quick, reliable baseline

Use ML-Based Approaches (Scikit-Learn) When:

  • You have rich exogenous features that influence the target
  • The relationship between features and target is non-linear
  • You already have a feature engineering pipeline in place
  • You need point predictions rather than probabilistic forecasts
  • Your team is already comfortable with scikit-learn

Use Deep Learning (NeuralForecast) When:

  • You have large training datasets (thousands of observations per series)
  • You have multiple related series that can share learned patterns
  • The patterns are complex and non-linear
  • You need probabilistic forecasts and uncertainty quantification
  • You have GPU compute available

Use Foundation Models (Chronos-2) When:

  • You want strong performance with zero or minimal training
  • You have limited historical data (cold-start scenarios)
  • You need to forecast across many diverse series quickly
  • You want a strong baseline before investing in custom models
  • Speed of development matters more than squeezing out the last percentage of accuracy

Putting It All Together: A Model Comparison Pipeline

In practice, you should always compare multiple approaches on your specific data. Here's a pipeline that runs classical, ML, and neural models side by side:

import pandas as pd
import numpy as np
from statsforecast import StatsForecast
from statsforecast.models import AutoARIMA, AutoETS, SeasonalNaive
from sklearn.ensemble import HistGradientBoostingRegressor
from sklearn.metrics import mean_absolute_error

# ========================================
# 1. Prepare data
# ========================================
df = pd.read_csv("sales.csv", parse_dates=["date"], index_col="date")
cutoff = df.index[-30]  # Hold out last 30 days
train = df[df.index < cutoff]
test = df[df.index >= cutoff]

# ========================================
# 2. Statistical models via StatsForecast
# ========================================
sf_df = train.reset_index().rename(columns={"date": "ds", "revenue": "y"})
sf_df["unique_id"] = "1"

sf = StatsForecast(
    models=[
        AutoARIMA(season_length=7),
        AutoETS(season_length=7),
        SeasonalNaive(season_length=7),
    ],
    freq="D",
)
sf.fit(sf_df)
sf_preds = sf.predict(h=30)

# ========================================
# 3. ML model via scikit-learn
# ========================================
df_feat = create_time_features(df, "revenue")
train_feat = df_feat[df_feat.index < cutoff]
test_feat = df_feat[df_feat.index >= cutoff]

feature_cols = [c for c in df_feat.columns if c != "revenue"]
model = HistGradientBoostingRegressor(max_iter=500, random_state=42)
model.fit(train_feat[feature_cols], train_feat["revenue"])
ml_preds = model.predict(test_feat[feature_cols])

# ========================================
# 4. Compare results
# ========================================
results = {
    "SeasonalNaive": mean_absolute_error(
        test["revenue"], sf_preds["SeasonalNaive"].values
    ),
    "AutoARIMA": mean_absolute_error(
        test["revenue"], sf_preds["AutoARIMA"].values
    ),
    "AutoETS": mean_absolute_error(
        test["revenue"], sf_preds["AutoETS"].values
    ),
    "HGBR": mean_absolute_error(
        test["revenue"].iloc[:len(ml_preds)], ml_preds
    ),
}

for name, mae_val in sorted(results.items(), key=lambda x: x[1]):
    print(f"{name:<20} MAE: {mae_val:.2f}")

Always include a naive baseline (like SeasonalNaive) — if your sophisticated model can't beat a simple seasonal repeat, something is fundamentally wrong with your pipeline. Don't skip this step.
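If you want to see just how cheap that baseline is, seasonal naive is a few lines of NumPy — repeat the last observed season across the horizon (toy weekly example):

```python
import numpy as np
import pandas as pd

def seasonal_naive(series, season_length, horizon):
    """Forecast by tiling the last full season across the horizon."""
    last_season = series.iloc[-season_length:].to_numpy()
    reps = int(np.ceil(horizon / season_length))
    return np.tile(last_season, reps)[:horizon]

history = pd.Series([10.0, 12.0, 9.0, 11.0, 14.0, 20.0, 18.0] * 4)
preds = seasonal_naive(history, season_length=7, horizon=10)
print(preds)  # the last week of history, repeated and truncated to 10 values
```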

Common Pitfalls and How to Avoid Them

After years of working with time series data, these are the mistakes I see over and over again:

1. Data Leakage in Feature Engineering

This is the most insidious bug in time series ML. If you calculate rolling averages or other aggregate features using data from the future, your model will look amazing in backtesting and fail spectacularly in production. Always shift(1) before computing any rolling statistic, and use the gap parameter in TimeSeriesSplit.

2. Ignoring Non-Stationarity

Fitting ARIMA to a non-stationary series without differencing will produce nonsensical confidence intervals and forecasts that diverge wildly. Always run the ADF test first. If your series has a unit root, difference it until it's stationary.

3. Overfitting to Seasonal Patterns

If you train a model on three years of data with annual seasonality, it's only seen the Christmas spike three times. Three times! That's not enough to reliably learn the pattern. Consider whether your training data contains enough full cycles of the dominant seasonality.

4. Using MAPE with Near-Zero Values

Mean Absolute Percentage Error (MAPE) explodes when actual values are near zero, which is common in intermittent demand series. Use scaled errors like MASE (Mean Absolute Scaled Error) or WAPE (Weighted Absolute Percentage Error) instead. Your future self will thank you.
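Both metrics are only a few lines of NumPy if you want them without a library dependency. This sketch uses the in-sample lag-`season_length` naive as MASE's scaling denominator, following the standard definition:

```python
import numpy as np

def wape(actual, forecast):
    """Weighted absolute percentage error: sum of |errors| / sum of |actuals|."""
    actual, forecast = np.asarray(actual), np.asarray(forecast)
    return np.abs(actual - forecast).sum() / np.abs(actual).sum()

def mase(actual, forecast, train_hist, season_length=1):
    """MAE scaled by the in-sample seasonal-naive MAE on the training data."""
    actual, forecast, train_hist = map(np.asarray, (actual, forecast, train_hist))
    naive_mae = np.abs(train_hist[season_length:] - train_hist[:-season_length]).mean()
    return np.abs(actual - forecast).mean() / naive_mae

train_hist = [10.0, 12.0, 11.0, 13.0, 12.0, 14.0]
actual = [13.0, 15.0]
forecast = [14.0, 14.0]

print(round(float(wape(actual, forecast)), 4))             # 0.0714  (2 / 28)
print(round(float(mase(actual, forecast, train_hist)), 4)) # 0.625   (1.0 / 1.6)
```

A MASE below 1 means you're beating the naive forecast; above 1 means you're not — which makes it far easier to interpret than MAPE on intermittent series.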

5. Treating Multiple Series Independently

If you have 1,000 retail stores, don't train 1,000 separate models. Global models (one model trained on all series) often outperform local models because they can learn cross-series patterns. Both NeuralForecast and StatsForecast support this natively through the unique_id column.
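Getting wide per-store data into that unique_id long format is a single melt (the store column names here are illustrative):

```python
import pandas as pd

# Wide format: one column per store
wide = pd.DataFrame({
    "ds": pd.date_range("2024-01-01", periods=3, freq="D"),
    "store_001": [100.0, 110.0, 105.0],
    "store_002": [80.0, 85.0, 90.0],
})

# Long format expected by StatsForecast / NeuralForecast: unique_id, ds, y
long_df = wide.melt(id_vars="ds", var_name="unique_id", value_name="y")
long_df = long_df.sort_values(["unique_id", "ds"]).reset_index(drop=True)
print(long_df)
```

One global model fit on this frame sees every store's history at once, which is exactly where the cross-series learning comes from.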

6. Not Accounting for Calendar Effects

Holidays, weekends, paydays, and special events create systematic deviations that most models won't capture unless you engineer the features explicitly. Create binary indicator variables for known events and pass them as exogenous regressors.
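A minimal version of those indicator features, using a hand-specified event list — real projects often pull dates from the `holidays` package, but that dependency is an assumption, so a plain list keeps this self-contained:

```python
import pandas as pd

# Hypothetical event calendar — in practice, source these from a holiday library
HOLIDAYS = pd.to_datetime(["2023-12-25", "2023-12-26", "2024-01-01"])

idx = pd.date_range("2023-12-23", "2024-01-02", freq="D")
features = pd.DataFrame(index=idx)
features["is_holiday"] = idx.isin(HOLIDAYS).astype(int)

# Lead/lag indicators often help too — demand shifts before and after the day itself
features["pre_holiday"] = features["is_holiday"].shift(-1, fill_value=0)
features["post_holiday"] = features["is_holiday"].shift(1, fill_value=0)

print(features.loc["2023-12-24":"2023-12-27"])
```

Pass these columns as exogenous regressors (X_df in StatsForecast, extra feature columns in the scikit-learn pipeline) and the model can attribute the spike to the event rather than to noise.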

Performance Optimization Tips

When scaling time series forecasting to production, performance matters. Here are some practical tips that have saved me hours of compute time:

# Use Polars for faster data preprocessing
import polars as pl

df_pl = pl.read_csv("large_sales.csv", try_parse_dates=True)

# Create lag features efficiently with Polars
df_pl = df_pl.with_columns([
    pl.col("revenue").shift(i).over("store_id").alias(f"lag_{i}")
    for i in [1, 7, 14, 28]
]).with_columns([
    pl.col("revenue").shift(1).rolling_mean(w).over("store_id").alias(f"rmean_{w}")
    for w in [7, 14, 30]
])

# Convert back to pandas for modeling
df_pd = df_pl.to_pandas()

For StatsForecast with many series, leverage the built-in parallelism with n_jobs=-1 and consider using the Ray backend for distributed computation across multiple machines. The speedup is substantial — we're talking orders of magnitude for large-scale forecasting workloads.

Visualization: Communicating Your Forecasts

A forecast without uncertainty bounds is just a guess. Always visualize confidence intervals:

import matplotlib.pyplot as plt
import matplotlib.dates as mdates

fig, ax = plt.subplots(figsize=(14, 6))

# Historical data
ax.plot(train.index[-90:], train["revenue"].iloc[-90:],
        color="#2c3e50", label="Historical", linewidth=1.5)

# Forecast
ax.plot(forecast_df["date"], forecast_df["forecast"],
        color="#e74c3c", label="Forecast", linewidth=2)

# Confidence interval
ax.fill_between(
    forecast_df["date"],
    forecast_df["lower_90"],
    forecast_df["upper_90"],
    alpha=0.2, color="#e74c3c", label="90% CI"
)

# Vertical line at forecast start
ax.axvline(x=train.index[-1], color="gray",
           linestyle="--", alpha=0.5, label="Forecast start")

ax.set_title("Revenue Forecast with 90% Confidence Interval", fontsize=14)
ax.set_xlabel("Date")
ax.set_ylabel("Revenue ($)")
ax.legend(loc="upper left")
ax.xaxis.set_major_formatter(mdates.DateFormatter("%b %Y"))
plt.tight_layout()
plt.show()

Conclusion: Start Simple, Iterate Fast

Time series forecasting in Python has never been more accessible or more powerful. The ecosystem now spans everything from battle-tested statistical methods to cutting-edge foundation models, all with consistent, Pythonic interfaces.

Here's the workflow I'd recommend for any new forecasting project:

  1. Explore your data with pandas — decompose, test for stationarity, and look at ACF/PACF plots
  2. Establish a baseline with SeasonalNaive and AutoARIMA via StatsForecast
  3. Try a zero-shot foundation model like Chronos-2 to see how it compares without any training
  4. Engineer features and try ML models if you have rich exogenous data
  5. Move to deep learning only if you have the data volume and the simpler approaches leave significant room for improvement
  6. Always validate temporally — use TimeSeriesSplit or StatsForecast's built-in cross-validation, never standard k-fold

The field is moving fast — foundation models are getting better every quarter, and the Nixtla ecosystem continues to unify the tooling. But the fundamentals haven't changed: understand your data, validate honestly, and start with the simplest model that works. Everything else is optimization.
