ARIMAX Time Series Forecasting Tool

Forecast marketing metrics using ARIMAX (AutoRegressive Integrated Moving Average with eXogenous variables). Upload time series data, select external predictors like ad spend or seasonality indicators, and generate forecasts with confidence intervals.

👨‍🏫 Professor Mode: Guided Learning Experience

New to ARIMAX? Enable Professor Mode for step-by-step guidance through time series forecasting with external predictors!

TEST OVERVIEW & EQUATIONS

ARIMAX extends the classic ARIMA model by incorporating external (exogenous) predictor variables. This is especially useful in marketing where outcomes like sales or conversions are influenced by controllable inputs such as advertising spend, promotions, or seasonal campaigns.

ARIMAX Model: $$ (1 - \phi_1 B - \cdots - \phi_p B^p)(1 - B)^d Y_t = (1 + \theta_1 B + \cdots + \theta_q B^q)\varepsilon_t + \sum_{j=1}^{k} \beta_j X_{j,t} $$

where \(Y_t\) is the outcome at time \(t\), \(B\) is the backshift operator, \(d\) is the differencing order, \(\phi\) are AR coefficients, \(\theta\) are MA coefficients, and \(\beta_j\) are coefficients for exogenous predictors \(X_j\).
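
To make the notation concrete, here is a minimal sketch that expands the equation for one hypothetical case, ARIMAX(1,1,1) with a single predictor, and computes a one-step-ahead value by setting the unknown future shock to zero. The function name and every number are illustrative assumptions, not output from this tool.

```python
def arimax_111_one_step(y_prev, y_prev2, eps_prev, x_now, phi, theta, beta):
    """One-step-ahead value for ARIMAX(1,1,1) with one exogenous predictor.

    Expanding (1 - phi*B)(1 - B) Y_t = (1 + theta*B) eps_t + beta * X_t
    and setting the unknown shock eps_t to 0 gives:
        Y_t = Y_{t-1} + phi * (Y_{t-1} - Y_{t-2}) + theta * eps_{t-1} + beta * X_t
    """
    return y_prev + phi * (y_prev - y_prev2) + theta * eps_prev + beta * x_now


# Hypothetical inputs: last two outcomes, last shock, and the planned predictor value.
forecast = arimax_111_one_step(
    y_prev=120.0, y_prev2=110.0, eps_prev=4.0, x_now=5.0,
    phi=0.5, theta=0.25, beta=2.0,
)
print(forecast)  # 120 + 0.5*10 + 0.25*4 + 2*5 = 136.0
```

Each term maps to one piece of the model: the last level, the AR carry-over of the last change, the MA correction for the last shock, and the exogenous contribution.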

Key Concepts
  • AR (p): Autoregressive terms capture how past values of the series influence current values.
  • I (d): Integration (differencing) removes trends to achieve stationarity.
  • MA (q): Moving average terms model the influence of past forecast errors.
  • X (exogenous): External predictors like ad spend, promotions, or economic indicators.
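
As a rough illustration of the I (d) step, the sketch below differences two made-up series. The helper name `difference` is an assumption for this example; the tool applies differencing internally when you set d.

```python
def difference(series, d=1):
    """Apply first differencing d times (levels become period-to-period changes)."""
    for _ in range(d):
        series = [later - earlier for earlier, later in zip(series, series[1:])]
    return series


linear_trend = [100, 110, 120, 130, 140]   # steady growth of 10 per period
quadratic_trend = [1, 4, 9, 16, 25]        # growth that itself grows

once = difference(linear_trend, d=1)       # [10, 10, 10, 10]
twice = difference(quadratic_trend, d=2)   # [2, 2, 2]
print(once, twice)
```

A linear trend is flat after one difference, while a curved trend needs two. Each pass also shortens the series by one observation.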
When to Use ARIMAX

Use ARIMAX when you have a time series outcome (sales, traffic, conversions) that you believe is influenced by external factors you can measure and potentially control. Common marketing applications include:

  • Forecasting sales with advertising spend as an exogenous variable
  • Modeling website traffic with marketing campaign indicators
  • Predicting revenue with economic indicators or seasonal dummies

MARKETING SCENARIOS

Use presets to auto-load realistic marketing time series data. The download button exposes the exact dataset used so you can tweak it in Excel before re-uploading.

INPUTS & SETTINGS

Upload Time Series Data

Upload a CSV with at least a date/time column and a numeric outcome column. Optional columns for exogenous predictors (ad spend, promotions, etc.) can be included.

Drag & drop time series file

CSV with columns: date, outcome, [optional exogenous variables]

No file uploaded.

Column Selection

Select which columns represent the time period, outcome (Y), and any exogenous predictors (X).

Date/Time Format Help

Supported date formats:

  • YYYY-MM-DD (e.g., 2024-01-15)
  • YYYY-MM (e.g., 2024-01)
  • MM/DD/YYYY (e.g., 01/15/2024)
  • DD/MM/YYYY (e.g., 15/01/2024)

Not supported:

  • Spelled-out months (e.g., "January 2024")
  • Relative dates (e.g., "Last week")
  • Inconsistent formats within the same column
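
If you want to pre-check a date column before uploading, something along these lines may help. It is only a sketch: the tool's actual parser is not shown here, and the function name and the order in which formats are tried are assumptions.

```python
from datetime import datetime

# strptime patterns for the four supported formats listed above.
SUPPORTED_FORMATS = ["%Y-%m-%d", "%Y-%m", "%m/%d/%Y", "%d/%m/%Y"]


def detect_column_format(values):
    """Return the first format that parses EVERY value in the column, else None."""
    for fmt in SUPPORTED_FORMATS:
        try:
            for value in values:
                datetime.strptime(value, fmt)
        except ValueError:
            continue  # at least one value failed; try the next format
        return fmt
    return None


print(detect_column_format(["2024-01-15", "2024-01-22"]))   # %Y-%m-%d
print(detect_column_format(["15/01/2024", "22/01/2024"]))   # %d/%m/%Y
print(detect_column_format(["2024-01-15", "01/22/2024"]))   # None (mixed formats)
print(detect_column_format(["January 2024"]))               # None (spelled-out month)
```

Note that a value like 01/02/2024 is valid as both MM/DD/YYYY and DD/MM/YYYY. This sketch resolves the tie by list order, which is one more reason to prefer YYYY-MM-DD or sequential labels.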
💡 Surefire Method (Recommended)

If you're having trouble with date parsing, use simple sequential labels instead:

  • 1, 2, 3, 4, ... (numeric sequence)
  • t1, t2, t3, t4, ... (labeled sequence)
  • Period1, Period2, Period3, ...
  • Week1, Week2, Week3, ... or Month1, Month2, ...

The model only needs to know the order of your observations — actual dates are just for labeling the output. Sequential numbers work perfectly!

Upload data to select exogenous predictors.

Model Specification

Set the ARIMA order (p, d, q). Use the diagnostics panel to help choose appropriate values.

Enable this if your data shows repeating patterns at regular intervals (e.g., weekly cycles, monthly patterns).

6 periods

Update Forecast re-generates predictions using a new forecast horizon or confidence level without re-fitting the model.

Analysis Settings

Used for coefficient p-values and significance stars.

VISUAL OUTPUT

Time Series with Forecasts

Upload data and fit the model to see the time series plot with forecasts and confidence intervals.

Interpretation Aid

The solid blue line shows historical values, while the red dashed line shows forecasts. The shaded area represents the confidence interval — wider bands indicate more uncertainty in future predictions. Forecasts depend on the assumed values for exogenous predictors above.

🔄 SARIMAX forecasts: If you enabled seasonality, the forecast should show the expected seasonal pattern (ups and downs) based on where you are in the cycle. If the forecast looks "flat" despite obvious historical patterns, check: (1) Is the pattern truly seasonal (fixed calendar cycles) or event-driven? (2) Is your seasonal period (s) correct? (3) Do you have enough data (at least 2 full cycles)?

Residuals Diagnostics

Residuals Over Time

Residuals should appear randomly scattered around zero with no obvious patterns.

ACF of Residuals

PACF of Residuals

Interpretation Aid: Understanding Residual Diagnostics
📊 Residuals Over Time

Good model fit produces residuals that look like white noise: randomly distributed around zero with constant variance. Look for:

  • No trends: Residuals shouldn't drift up or down over time
  • Constant spread: The "band" of residuals should be roughly the same width throughout
  • No patterns: Cycles or repeating structures suggest missed seasonality
📈 ACF & PACF Charts

ACF (Autocorrelation): Measures correlation with past values at each lag.

PACF (Partial Autocorrelation): Direct correlation at each lag, removing intermediate effects.

  • Bars outside red lines: Statistically significant autocorrelation (potential problem)
  • All bars inside red lines: Residuals are consistent with white noise (good!)
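
The sketch below shows one common way such a chart is built: compute the sample autocorrelation at each lag and compare it with approximate 95% bounds of ±1.96/√n. Whether the tool's red lines use exactly this bound is an assumption, and the residuals are a made-up alternating pattern chosen so the result is easy to verify by hand.

```python
import math


def acf(series, max_lag):
    """Sample autocorrelation at lags 1..max_lag."""
    n = len(series)
    mean = sum(series) / n
    dev = [x - mean for x in series]
    denom = sum(d * d for d in dev)
    return [
        sum(dev[t] * dev[t - k] for t in range(k, n)) / denom
        for k in range(1, max_lag + 1)
    ]


residuals = [1.0, -1.0] * 10                 # made-up residuals with obvious structure
bound = 1.96 / math.sqrt(len(residuals))     # assumed position of the red lines
r = acf(residuals, max_lag=3)
flagged = [lag for lag, value in enumerate(r, start=1) if abs(value) > bound]

print([round(v, 2) for v in r])  # [-0.95, 0.9, -0.85]
print(flagged)                   # [1, 2, 3]
```

Every lag is far outside the bounds here, which is what leftover structure in the residuals looks like.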
⚠️ Problem Patterns & Solutions
Pattern | Meaning | Try This
Significant ACF spikes at lags 1, 2, 3... | MA terms needed | Increase q (MA order)
Significant PACF spikes at lags 1, 2, 3... | AR terms needed | Increase p (AR order)
Slow decay in ACF | Series not stationary | Increase d (differencing)
Spikes at seasonal lags (12, 24, 52...) | Seasonal pattern not captured | Enable "Include Seasonality" and set s to the lag with spikes

SUMMARY STATISTICS

Descriptive Statistics

Outcome Variable

Statistic Value
Provide data to see summary statistics.

Exogenous Predictors

Variable | Mean | Std. Dev. | Min | Max
Provide data to see predictor statistics.

MODEL RESULTS

Model specification:
AIC:
BIC:
RMSE:
MAE:
Interpretation Aid

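AIC/BIC: Lower values indicate better model fit relative to complexity. Use these to compare different (p,q) or (P,Q,s) specifications that share the same differencing orders (d and D); differencing changes the data the likelihood is computed on, so AIC/BIC values are not comparable across different d or D. Try fitting ARIMA vs SARIMAX and compare!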

RMSE: Root Mean Square Error, the square root of the average squared prediction error, in the same units as the outcome. Large errors weigh more heavily.

MAE: Mean Absolute Error, the average absolute deviation; less sensitive to outliers than RMSE.
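
A small worked example, using made-up actual and fitted values, shows how the two error metrics react to one larger miss. The function names are just for this sketch.

```python
import math


def rmse(actual, fitted):
    """Square root of the mean squared error."""
    return math.sqrt(sum((a - f) ** 2 for a, f in zip(actual, fitted)) / len(actual))


def mae(actual, fitted):
    """Mean of the absolute errors."""
    return sum(abs(a - f) for a, f in zip(actual, fitted)) / len(actual)


actual = [10, 12, 14, 16]
fitted = [11, 11, 15, 13]   # errors: -1, 1, -1, 3 (one larger miss)

print(mae(actual, fitted))             # 1.5
print(round(rmse(actual, fitted), 3))  # 1.732
```

The single error of 3 pulls RMSE above MAE because squaring gives large errors extra weight.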

Tip: If adding seasonality increases AIC/BIC, the seasonal pattern may not be strong enough to justify the extra complexity.
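
For reference, the standard definitions are AIC = 2k - 2 ln L and BIC = k ln(n) - 2 ln L, where k is the number of estimated parameters, n the number of observations, and ln L the log-likelihood. The sketch below applies them to two hypothetical fits; the log-likelihoods and parameter counts are invented for illustration only.

```python
import math


def aic(log_likelihood, k):
    """Akaike information criterion."""
    return 2 * k - 2 * log_likelihood


def bic(log_likelihood, k, n):
    """Bayesian information criterion."""
    return k * math.log(n) - 2 * log_likelihood


n_obs = 104  # hypothetical: two years of weekly data

# Hypothetical fits: the seasonal model gains little likelihood for 2 extra parameters.
plain = {"loglik": -250.0, "k": 4}
seasonal = {"loglik": -249.5, "k": 6}

print(aic(plain["loglik"], plain["k"]))        # 508.0
print(aic(seasonal["loglik"], seasonal["k"]))  # 511.0
print(bic(seasonal["loglik"], seasonal["k"], n_obs) > bic(plain["loglik"], plain["k"], n_obs))  # True
```

In this invented case the seasonal terms improve the likelihood by only 0.5, which does not cover the penalty for two extra parameters, so both criteria favor the simpler model.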

APA-Style Report

Fit the model to see the APA-style statistical report.

Managerial Interpretation

Business-focused interpretation will appear here after fitting the model.

Coefficient Estimates

Parameter | Estimate | Std. Error | p-value | 95% CI
Fit the model to view coefficient estimates.
Interpretation Aid: Understanding Coefficients
📊 Types of Coefficients
AR Coefficients (ar.L1, ar.L2, ...)

What they mean: How much yesterday's (or earlier) values influence today's value, after accounting for the trend.

  • Positive (e.g., 0.7): Strong persistence — if sales were high last period, they'll likely be high this period
  • Negative (e.g., -0.3): Mean reversion — high values tend to be followed by lower values
  • Close to 0: Past values don't strongly predict current values
  • Close to 1: Random walk behavior (today ≈ yesterday + noise)

Example: ar.L1 = 0.65 means "about 65% of last period's deviation from the mean carries over to this period."
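
To see what that carry-over implies over several periods, this sketch traces how a deviation from the mean fades under a pure AR(1) with no new shocks. The starting deviation of 100 is a made-up number and the function name is an assumption.

```python
def ar1_carryover(deviation, phi, periods):
    """Expected remaining deviation from the mean after 1..periods steps of an AR(1)."""
    return [deviation * phi ** h for h in range(1, periods + 1)]


path = ar1_carryover(deviation=100.0, phi=0.65, periods=3)
print([round(v, 2) for v in path])  # [65.0, 42.25, 27.46]
```

With ar.L1 = 0.65, roughly a quarter of the original deviation is still present three periods later. A coefficient closer to 1 would fade much more slowly.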

MA Coefficients (ma.L1, ma.L2, ...)

What they mean: How much past forecast errors affect current values. These capture short-term adjustments.

  • Positive (e.g., 0.5): If we under-predicted last period, we adjust upward this period
  • Negative (e.g., -0.4): If we under-predicted last period, we adjust downward this period. This is common after differencing, where a negative MA term partially undoes the previous shock

Example: ma.L1 = 0.45 means "if our model was off by $100 last period, add about $45 to this period's prediction."
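
The same arithmetic as a tiny sketch, assuming an MA(1) term only and defining the error as actual minus predicted. The sales figures are invented.

```python
def ma1_adjustment(actual_last, predicted_last, theta):
    """Amount an MA(1) term adds to this period's prediction."""
    last_error = actual_last - predicted_last   # positive means we under-predicted
    return theta * last_error


up = ma1_adjustment(actual_last=1100.0, predicted_last=1000.0, theta=0.45)
down = ma1_adjustment(actual_last=1100.0, predicted_last=1000.0, theta=-0.4)
print(round(up, 2), round(down, 2))  # 45.0 -40.0
```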

Exogenous Coefficients (your predictor names)

What they mean: The direct effect of each external variable on your outcome, holding time-series dynamics constant.

  • Interpreted like standard regression coefficients
  • A one-unit increase in the predictor changes the outcome by the coefficient value

Example: ad_spend = 2.3 means "each additional $1 spent on advertising is associated with $2.30 more in sales, after accounting for trends and seasonality."

Sigma² (sigma2)

What it means: The estimated variance of the random error term. Larger values = more unexplained variability.

🔄 Seasonal Coefficients (ar.S.L*, ma.S.L*): SARIMAX Only

What they mean: These appear when you enable seasonality. They capture how values from the same season last cycle influence the current value.

  • ar.S.L52: How this week's value relates to the same week last year (for s=52)
  • ma.S.L12: Adjustment based on forecast errors from the same month last year (for s=12)

Example: ar.S.L52 = 0.8 means "80% of the deviation we saw in the same week last year carries over to this week."

If seasonal coefficients are non-significant, the seasonal pattern may be weak or you may have chosen the wrong period (s).

📈 Statistical Significance
  • p-value < 0.05: Coefficient is statistically significant (highlighted in green)
  • p-value ≥ 0.05: Cannot rule out that the true effect is zero
  • Confidence Interval: If the 95% interval doesn't include 0, the coefficient is significant at the 0.05 level
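
The p-value and the confidence interval are two views of the same calculation. The sketch below assumes a normal (z) approximation, which is a common choice for this kind of model but is not confirmed for this tool; the estimate and standard error are made up.

```python
from statistics import NormalDist


def wald_summary(estimate, std_error, confidence=0.95):
    """z statistic, two-sided p-value, and confidence interval (normal approximation)."""
    z = estimate / std_error
    p_value = 2 * (1 - NormalDist().cdf(abs(z)))
    z_crit = NormalDist().inv_cdf(1 - (1 - confidence) / 2)   # about 1.96 for 95%
    ci = (estimate - z_crit * std_error, estimate + z_crit * std_error)
    return z, p_value, ci


# Hypothetical row: ad_spend coefficient 2.3 with standard error 0.9.
z, p_value, (ci_low, ci_high) = wald_summary(estimate=2.3, std_error=0.9)
significant = p_value < 0.05
excludes_zero = ci_low > 0 or ci_high < 0

print(round(z, 2), round(p_value, 3))       # 2.56 0.011
print(round(ci_low, 2), round(ci_high, 2))  # 0.54 4.06
print(significant, excludes_zero)           # True True
```

The two checks always agree when the interval level matches the significance threshold (95% and 0.05 here).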

Non-significant AR or MA terms might indicate you've over-specified the model. Try reducing p or q (or P/Q for seasonal terms).

⚠️ Common Issues
  • Very large standard errors: Possible multicollinearity or insufficient data
  • AR coefficient at or near 1 in absolute value: The series may still be non-stationary; try increasing d (differencing)
  • All exogenous coefficients non-significant: External variables may not help; try a simpler ARIMA model
  • Seasonal coefficients non-significant: You may have the wrong seasonal period (s), or the pattern isn't truly seasonal
  • Model takes very long to fit: Large seasonal periods (s=52) with high P/D/Q can be slow; try reducing to (1,0,1)

Forecasts

Period | Forecast | Lower (95%) | Upper (95%)
Fit the model to view forecasts. Use the slider to select 0-10 forecast periods.

DIAGNOSTICS & ASSUMPTIONS

Stationarity & Model Diagnostics

Click "Check Stationarity" or fit the model to see diagnostic tests.

Augmented Dickey-Fuller Test (ADF)

Tests whether the series has a unit root (non-stationary).

What is the ADF Test?

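The ADF test checks if your time series is stationary (statistical properties don't change over time). The AR and MA terms assume a stationary series, and the differencing order d is how an ARIMA model gets there.

  • p-value < 0.05: ✅ The unit-root hypothesis is rejected, so the series can be treated as stationary. Good to proceed.
  • p-value ≥ 0.05: ⚠️ A unit root cannot be ruled out, so treat the series as non-stationary. Differencing (d > 0) is needed.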
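
For intuition about what the test measures, here is a simplified sketch of the basic (non-augmented) Dickey-Fuller regression: regress the change in the series on a constant and the previous level, then look at the t-statistic on the previous level. This is not the tool's implementation. It omits the lagged-difference terms that make the test "augmented", and instead of a p-value it compares against -2.86, the approximate large-sample 5% critical value for the constant-only case. The simulated series is an assumption for the example.

```python
import math
import random


def dickey_fuller_t(series):
    """t-statistic on rho in: (y_t - y_{t-1}) = alpha + rho * y_{t-1} + error."""
    x = series[:-1]                                    # lagged levels y_{t-1}
    y = [b - a for a, b in zip(series, series[1:])]    # first differences
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    rho = sxy / sxx                                    # OLS slope
    alpha = mean_y - rho * mean_x                      # OLS intercept
    ssr = sum((yi - alpha - rho * xi) ** 2 for xi, yi in zip(x, y))
    se_rho = math.sqrt(ssr / (n - 2) / sxx)            # standard error of the slope
    return rho / se_rho


# Simulated, strongly mean-reverting series (AR(1) with coefficient 0.2).
rng = random.Random(0)
stationary = [0.0]
for _ in range(300):
    stationary.append(0.2 * stationary[-1] + rng.gauss(0, 1))

t_stat = dickey_fuller_t(stationary)
looks_stationary = t_stat < -2.86    # more negative than the critical value
print(looks_stationary)              # True
```

A strongly negative t-statistic means the series is pulled back toward its mean, which is evidence against a unit root. For a random walk the statistic usually stays above the critical value.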

What makes a series non-stationary?

  • Trends (consistently going up or down)
  • Changing variance (volatility increases over time)
  • Seasonal patterns with changing amplitude

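The fix: Differencing (setting d=1 or d=2) removes trends. The model then works with changes rather than levels. Differencing does not stabilize a changing variance; a log transform of the outcome is the usual remedy for that.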

Ljung-Box Test (Residual Autocorrelation)

Tests whether residuals exhibit significant autocorrelation.

What is the Ljung-Box Test?

This test checks if there's leftover pattern in your residuals (the differences between actual and fitted values).

  • p-value > 0.05: ✅ No significant autocorrelation. Residuals look like random noise. Model is adequate.
  • p-value ≤ 0.05: ⚠️ Significant autocorrelation detected. The model is missing some structure.
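
The statistic behind this test is Q = n(n+2) Σ r_k² / (n-k), summed over the first h lags of the residual autocorrelation r_k, and compared with a chi-square distribution. The sketch below is a simplified illustration: it uses h degrees of freedom without subtracting fitted parameters, it relies on a closed form that only works for an even number of lags, and the residuals are made up.

```python
import math


def acf(series, max_lag):
    """Sample autocorrelation at lags 1..max_lag."""
    n = len(series)
    mean = sum(series) / n
    dev = [x - mean for x in series]
    denom = sum(d * d for d in dev)
    return [sum(dev[t] * dev[t - k] for t in range(k, n)) / denom
            for k in range(1, max_lag + 1)]


def ljung_box_q(residuals, lags):
    """Ljung-Box Q statistic over the first `lags` autocorrelations."""
    n = len(residuals)
    r = acf(residuals, lags)
    return n * (n + 2) * sum(r_k ** 2 / (n - k) for k, r_k in enumerate(r, start=1))


def chi2_sf_even_df(x, df):
    """P(chi-square with EVEN df exceeds x), using the closed form for even df."""
    if df % 2 != 0:
        raise ValueError("this closed form needs an even df")
    half = x / 2
    return math.exp(-half) * sum(half ** j / math.factorial(j) for j in range(df // 2))


residuals = [1.0, -1.0] * 10            # made-up residuals with obvious structure
q = ljung_box_q(residuals, lags=4)
p_value = chi2_sf_even_df(q, df=4)      # simplification: df = number of lags

print(round(q, 2))       # 77.0
print(p_value < 0.05)    # True: significant autocorrelation remains
```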

If you see significant autocorrelation:

  • Try increasing p (AR order) if PACF shows spikes
  • Try increasing q (MA order) if ACF shows spikes
  • Check for seasonal patterns you haven't accounted for
  • Consider adding more exogenous variables

Model Selection Guidance

Choosing p (AR order)

Look at the PACF plot of your series (after any differencing needed to make it stationary):

  • Count significant spikes (bars outside red lines)
  • The number of spikes before cutoff = suggested p
  • Typical values: 0, 1, or 2

PACF shows 2 significant spikes → try p=2

Choosing d (Differencing)

Based on the ADF test:

  • ADF p-value < 0.05 → d=0 (already stationary)
  • ADF p-value ≥ 0.05 → try d=1
  • Still non-stationary with d=1 → try d=2
  • Rarely need d > 2

Upward trend in data → likely need d=1

Choosing q (MA order)

Look at the ACF plot of your series (after any differencing needed to make it stationary):

  • Count significant spikes after lag 0
  • Spikes that cut off sharply suggest MA terms
  • Typical values: 0, 1, or 2

ACF shows 1 significant spike at lag 1 → try q=1

Common Starting Points
Model | When to Use
ARIMA(1,1,1) | Good default for trending data with some persistence
ARIMA(0,1,1) | Random walk with smoothing (simple exponential smoothing)
ARIMA(1,0,0) | Stationary data with persistence (AR(1) process)
ARIMA(0,1,0) | Pure random walk (tomorrow = today + noise)
ARIMA(2,1,2) | More complex dynamics; try if simpler models fail diagnostics
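
The link between ARIMA(0,1,1) and simple exponential smoothing can be checked directly. In this sketch (no constant term, made-up data, illustrative function names), the ARIMA(0,1,1) one-step forecast with MA coefficient theta matches exponential smoothing with weight alpha = 1 + theta.

```python
def ses_forecasts(series, alpha, first_forecast):
    """Simple exponential smoothing: next = alpha * actual + (1 - alpha) * previous forecast."""
    forecasts = [first_forecast]
    for y in series:
        forecasts.append(alpha * y + (1 - alpha) * forecasts[-1])
    return forecasts


def arima011_forecasts(series, theta, first_forecast):
    """ARIMA(0,1,1) without constant: next = actual + theta * (actual - previous forecast)."""
    forecasts = [first_forecast]
    for y in series:
        error = y - forecasts[-1]
        forecasts.append(y + theta * error)
    return forecasts


sales = [100.0, 104.0, 99.0, 107.0, 103.0]   # made-up data
theta = -0.6                                 # so alpha = 1 + theta = 0.4

a = ses_forecasts(sales, alpha=1 + theta, first_forecast=100.0)
b = arima011_forecasts(sales, theta=theta, first_forecast=100.0)
same = all(abs(x - y) < 1e-9 for x, y in zip(a, b))
print(same)  # True
```

Smoothing weights between 0 and 1 correspond to theta between -1 and 0, so a negative ma.L1 is the normal case for this model.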
💡 Model Comparison Tip

Try a few different (p, d, q) combinations and compare:

  • AIC/BIC: Lower is better (penalizes complexity)
  • Ljung-Box p-value: Should be > 0.05
  • RMSE: Lower means smaller prediction errors

The best model balances fit (low RMSE) with simplicity (few parameters, low AIC).
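
One way to apply this checklist is to drop any candidate that fails the Ljung-Box check and then pick the lowest AIC among the rest. The sketch below does that on invented results; none of the numbers come from a real fit, and all candidates share d=1 so their AIC values are comparable.

```python
# Hypothetical fit results for four candidate orders (all values are made up).
candidates = [
    {"order": (1, 1, 1), "aic": 512.4, "ljung_box_p": 0.41, "rmse": 8.9},
    {"order": (0, 1, 1), "aic": 510.1, "ljung_box_p": 0.22, "rmse": 9.1},
    {"order": (2, 1, 2), "aic": 514.8, "ljung_box_p": 0.63, "rmse": 8.7},
    {"order": (0, 1, 0), "aic": 509.0, "ljung_box_p": 0.01, "rmse": 10.4},
]


def pick_model(results, alpha=0.05):
    """Lowest-AIC candidate among those whose residuals pass the Ljung-Box check."""
    adequate = [r for r in results if r["ljung_box_p"] > alpha]
    return min(adequate, key=lambda r: r["aic"]) if adequate else None


best = pick_model(candidates)
print(best["order"])  # (0, 1, 1)
```

Here ARIMA(0,1,0) has the lowest AIC but fails the residual check, and ARIMA(2,1,2) has the lowest RMSE but pays for two extra parameters, so the rule settles on ARIMA(0,1,1).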