ARIMAX Time Series Forecasting Tool


Forecast marketing metrics using ARIMAX (AutoRegressive Integrated Moving Average with eXogenous variables). Upload time series data, select external predictors like ad spend or seasonality indicators, and generate forecasts with confidence intervals.

TEST OVERVIEW & EQUATIONS

ARIMAX extends the classic ARIMA model by incorporating external (exogenous) predictor variables. This is especially useful in marketing where outcomes like sales or conversions are influenced by controllable inputs such as advertising spend, promotions, or seasonal campaigns.

ARIMAX Model: $$ (1 - \phi_1 B - \cdots - \phi_p B^p)(1 - B)^d Y_t = (1 + \theta_1 B + \cdots + \theta_q B^q)\varepsilon_t + \sum_{j=1}^{k} \beta_j X_{j,t} $$

where $Y_t$ is the outcome at time $t$, $B$ is the backshift operator ($B Y_t = Y_{t-1}$), $d$ is the differencing order, $\phi_i$ are the AR coefficients, $\theta_i$ are the MA coefficients, $\varepsilon_t$ is a white-noise error term, and $\beta_j$ are the coefficients for the exogenous predictors $X_{j,t}$.
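To make the equation concrete, the sketch below covers the special case p = 1, d = 0, q = 1 with a single predictor. In that case the equation as written (which has no constant term) rearranges to $Y_t = \phi_1 Y_{t-1} + \beta_1 X_{1,t} + \varepsilon_t + \theta_1 \varepsilon_{t-1}$. The function names and numbers are hypothetical, and this is not the tool's estimation code. A real fit estimates $\phi$, $\theta$ and $\beta$ from the data; here they are simply given. To form a one-step prediction, the code replaces the unknown current shock $\varepsilon_t$ with its mean of zero and uses last period's residual for $\varepsilon_{t-1}$.

```python
def arimax_one_step(y_prev, resid_prev, x_now, phi, theta, beta):
    """One-step prediction for ARIMAX(1,0,1) with a single predictor.

    Uses Y_t = phi*Y_{t-1} + beta*X_t + eps_t + theta*eps_{t-1} and
    replaces the unknown current shock eps_t with its mean of zero.
    """
    return phi * y_prev + beta * x_now + theta * resid_prev


def one_step_predictions(y, x, phi, theta, beta):
    """Walk through the series, feeding each residual into the next step.

    The residual before the first prediction is assumed to be 0.
    """
    preds, resids = [], []
    last_resid = 0.0
    for t in range(1, len(y)):
        yhat = arimax_one_step(y[t - 1], last_resid, x[t], phi, theta, beta)
        last_resid = y[t] - yhat
        preds.append(yhat)
        resids.append(last_resid)
    return preds, resids


# Hypothetical numbers: outcome y and one predictor x (e.g., ad spend).
preds, resids = one_step_predictions(
    y=[10.0, 12.0, 11.0], x=[1.0, 2.0, 1.0], phi=0.5, theta=0.3, beta=2.0
)
# preds is approximately [9.0, 8.9]; resids is approximately [3.0, 2.1]
```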

Key Concepts

AR (p): Autoregressive terms capture how past values of the series influence current values.
I (d): Integration (differencing) removes trends to achieve stationarity.
MA (q): Moving average terms model the influence of past forecast errors.
X (exogenous): External predictors like ad spend, promotions, or economic indicators.

When to Use ARIMAX

Use ARIMAX when you have a time series outcome (sales, traffic, conversions) that you believe is influenced by external factors you can measure and potentially control. Common marketing applications include:

Forecasting sales with advertising spend as an exogenous variable
Modeling website traffic with marketing campaign indicators
Predicting revenue with economic indicators or seasonal dummies

MARKETING SCENARIOS

Load a marketing use case:

Use presets to auto-load realistic marketing time series data. The download button exposes the exact dataset used so you can tweak it in Excel before re-uploading.

INPUTS & SETTINGS

Upload Time Series Data

Upload a CSV with at least a date/time column and a numeric outcome column. Optional columns for exogenous predictors (ad spend, promotions, etc.) can be included.

CSV with columns: date, outcome, [optional exogenous variables]

Column Selection

Select which columns represent the time period, outcome (Y), and any exogenous predictors (X).

Time Period Column

Date/Time Format Help

Supported date formats:

YYYY-MM-DD (e.g., 2024-01-15)
YYYY-MM (e.g., 2024-01)
MM/DD/YYYY (e.g., 01/15/2024)
DD/MM/YYYY (e.g., 15/01/2024)

Not supported:

Spelled-out months (e.g., "January 2024")
Relative dates (e.g., "Last week")
Inconsistent formats within the same column
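The sketch below shows one way to check a date column against the supported formats using Python's `datetime.strptime`. The helper name is hypothetical, and this is not the tool's parser. A format is accepted only if every value in the column parses with it, which mirrors the "no inconsistent formats" rule. The sketch also shows why MM/DD/YYYY and DD/MM/YYYY can collide: a value such as 01/02/2024 is valid under both.

```python
from datetime import datetime

# Supported formats from the list above, as strptime patterns.
FORMATS = {
    "YYYY-MM-DD": "%Y-%m-%d",
    "YYYY-MM": "%Y-%m",
    "MM/DD/YYYY": "%m/%d/%Y",
    "DD/MM/YYYY": "%d/%m/%Y",
}


def matching_formats(values):
    """Return the names of the formats that parse every value in the column."""
    matches = []
    for name, pattern in FORMATS.items():
        try:
            for v in values:
                datetime.strptime(v.strip(), pattern)
        except ValueError:
            continue  # at least one value failed, so reject this format
        matches.append(name)
    return matches


print(matching_formats(["2024-01-15", "2024-02-15"]))  # ['YYYY-MM-DD']
print(matching_formats(["01/02/2024", "03/04/2024"]))  # ambiguous: both slash formats
print(matching_formats(["January 2024"]))              # []
```

When both slash formats match, the column cannot be resolved from the values alone. In that case the sequential labels recommended below avoid the problem entirely.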

💡 Surefire Method (Recommended)

If you're having trouble with date parsing, use simple sequential labels instead:

1, 2, 3, 4, ... (numeric sequence)
t1, t2, t3, t4, ... (labeled sequence)
Period1, Period2, Period3, ...
Week1, Week2, Week3, ... or Month1, Month2, ...

The model only needs to know the order of your observations, which should be evenly spaced in time (one row per day, week, month, and so on). Actual dates are just for labeling the output. Sequential numbers work perfectly!

Outcome Variable (Y)

Exogenous Predictors (X)

Upload data to select exogenous predictors.

Model Specification

Set the ARIMA order (p, d, q). Use the diagnostics panel to help choose appropriate values.

AR order (p)

Differencing (d)

MA order (q)

Include Seasonality (SARIMAX)

Enable this if your data shows repeating patterns at regular intervals (e.g., weekly cycles, monthly patterns).

Seasonal Parameters

Set seasonal order (P, D, Q) and the seasonal period s (number of time periods in one complete cycle).

Seasonal AR (P)

Seasonal Diff (D)

Seasonal MA (Q)

Seasonal Period (s)

📊 Platform Limitation (s ≤ 24):

Statistically, SARIMAX can handle any seasonal period: modeling weekly data with yearly cycles (s=52) is mathematically valid. However, computation time grows steeply as s increases. On this shared educational server, s=52 would take 10+ minutes and risks crashing the system.

Real-world solution: Aggregate your data! Weekly → Monthly reduces s from 52 to 12, cutting computation by ~95% with minimal information loss. This is standard practice in industry.
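The sketch below aggregates data to monthly totals using only the standard library. It assumes ISO `YYYY-MM-DD` dates and a flow metric such as sales or ad spend, which should be summed. Rates such as conversion rate would usually be averaged instead. The function name is hypothetical. For weekly rows, the sketch assigns each whole week to the month of its listed date, which is a simplification when a week straddles two months.

```python
from datetime import date


def to_monthly_totals(rows):
    """Sum (ISO date string, value) pairs into calendar-month totals.

    Returns ("YYYY-MM", total) pairs in chronological order.
    """
    totals = {}
    for day, value in rows:
        d = date.fromisoformat(day)
        key = f"{d.year:04d}-{d.month:02d}"
        totals[key] = totals.get(key, 0.0) + value
    # "YYYY-MM" strings sort chronologically.
    return sorted(totals.items())


daily = [("2024-01-30", 10.0), ("2024-01-31", 5.0), ("2024-02-01", 7.0)]
print(to_monthly_totals(daily))  # [('2024-01', 15.0), ('2024-02', 7.0)]
```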

How to Choose Seasonal Parameters

Seasonal Period (s): The number of time periods in one complete cycle.

Your Data	Cycle	s	Platform
Monthly data	Yearly pattern	12	✅ Supported
Quarterly data	Yearly pattern	4	✅ Supported
Daily data	Weekly pattern	7	✅ Supported
Hourly data	Daily pattern	24	✅ Supported
Weekly data	Yearly pattern	52	⚠️ Aggregate to monthly (s = 12)
Daily data	Yearly pattern	365	⚠️ Aggregate to monthly (s = 12)

Seasonal P, D, Q: Usually start with (1,0,1) and adjust based on results.

P (Seasonal AR): Try 1. Use 0 if ar.S coefficient is non-significant.
D (Seasonal Diff): Use 0 first (faster). Use 1 only if the seasonal pattern's amplitude changes over time.
Q (Seasonal MA): Try 1. Use 0 if ma.S coefficient is non-significant.

How to identify s from ACF: Look at the ACF chart after fitting a non-seasonal model. Significant spikes at regular intervals (e.g., lags 12, 24, 36...) reveal your seasonal period.
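The sketch below computes a sample ACF and flags lags that exceed the common approximate 95% white-noise bound of ±1.96/√n. Plotting libraries may draw wider (Bartlett) bands, so the tool's red lines can differ slightly. The function names and the toy series are hypothetical. The series repeats every 4 periods, so it shows strong positive spikes at lags 4 and 8, which point to s = 4. The negative spikes at lags 2 and 6 are half-cycle troughs, not the period.

```python
import math


def acf(y, max_lag):
    """Sample autocorrelations r_1..r_max_lag of a series."""
    n = len(y)
    mean = sum(y) / n
    dev = [v - mean for v in y]
    denom = sum(d * d for d in dev)
    return [
        sum(dev[t] * dev[t - k] for t in range(k, n)) / denom
        for k in range(1, max_lag + 1)
    ]


def significant_lags(y, max_lag):
    """Lags whose |ACF| exceeds the approximate 95% white-noise bound."""
    bound = 1.96 / math.sqrt(len(y))
    return [k for k, r in enumerate(acf(y, max_lag), start=1) if abs(r) > bound]


# Hypothetical toy series with a 4-period cycle (e.g., quarterly data).
y = [1.0, 2.0, 3.0, 4.0] * 12
print(significant_lags(y, 8))
```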

💡 Pro tip: Start with (P=1, D=0, Q=1). Seasonal differencing (D=1) dramatically increases computation time — only use it if your seasonal amplitude changes over time. In production environments with dedicated compute, you'd have more flexibility.

Forecast Periods (0-12)

Confidence Level

The "Update Forecast" button regenerates predictions with a new forecast horizon or confidence level without refitting the model.

Analysis Settings

Significance level for coefficients (α)

Used for coefficient p-values and significance stars.

VISUAL OUTPUT

Time Series with Forecasts

Upload data and fit the model to see the time series plot with forecasts and confidence intervals.

Interpretation Aid

The solid blue line shows historical values, while the red dashed line shows forecasts. The shaded area represents the confidence interval: wider bands indicate more uncertainty in future predictions. Forecasts depend on the future values you assume for the exogenous predictors.
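Why the bands widen can be seen in the simplest case, a random walk, ARIMA(0,1,0). Every h-step forecast equals the last observed value, and the forecast error variance is $h\sigma^2$, so the band grows with $\sqrt{h}$. The sketch below covers only that special case, and the function name and numbers are hypothetical. For other orders, the tool's intervals come from the fitted model. Intervals of this kind typically treat the assumed future predictor values as known, so they do not include uncertainty about those inputs.

```python
import math
from statistics import NormalDist


def random_walk_intervals(last_value, sigma2, horizon, level=0.95):
    """Point forecasts and intervals for an ARIMA(0,1,0) random walk.

    The h-step forecast equals the last observed value and its error
    variance is h * sigma2, so the band widens with sqrt(h).
    """
    z = NormalDist().inv_cdf(1 - (1 - level) / 2)
    out = []
    for h in range(1, horizon + 1):
        half_width = z * math.sqrt(h * sigma2)
        out.append((h, last_value, last_value - half_width, last_value + half_width))
    return out


for row in random_walk_intervals(last_value=100.0, sigma2=4.0, horizon=4):
    print(row)  # the half-width goes from about 3.92 at h=1 to about 7.84 at h=4
```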

🔄 SARIMAX forecasts: If you enabled seasonality, the forecast should show the expected seasonal pattern (ups and downs) based on where you are in the cycle. If the forecast looks "flat" despite obvious historical patterns, check: (1) Is the pattern truly seasonal (fixed calendar cycles) or event-driven? (2) Is your seasonal period (s) correct? (3) Do you have enough data (at least 2 full cycles)?

Residuals Diagnostics

Residuals Over Time

Residuals should appear randomly scattered around zero with no obvious patterns.

ACF of Residuals

PACF of Residuals

Interpretation Aid: Understanding Residual Diagnostics

📊 Residuals Over Time

Good model fit produces residuals that look like white noise: randomly distributed around zero with constant variance. Look for:

No trends: Residuals shouldn't drift up or down over time
Constant spread: The "band" of residuals should be roughly the same width throughout
No patterns: Cycles or repeating structures suggest missed seasonality

📈 ACF & PACF Charts

ACF (Autocorrelation): Measures correlation with past values at each lag.

PACF (Partial Autocorrelation): Direct correlation at each lag, removing intermediate effects.

Bars outside red lines: Statistically significant autocorrelation (potential problem)
All bars inside red lines: Residuals are white noise (good!)
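The sketch below shows how the PACF relates to the ACF using the Durbin-Levinson recursion. This is one standard way to turn sample autocorrelations into partial autocorrelations. Libraries may use a different estimator by default, so their values can differ slightly from this sketch. The lag-1 PACF always equals the lag-1 ACF. At lag 2, the recursion reduces to $(r_2 - r_1^2)/(1 - r_1^2)$, which is the lag-2 correlation left over after removing the lag-1 effect. The function names are hypothetical.

```python
def acf(y, max_lag):
    """Sample autocorrelations r_1..r_max_lag of a series."""
    n = len(y)
    mean = sum(y) / n
    dev = [v - mean for v in y]
    denom = sum(d * d for d in dev)
    return [sum(dev[t] * dev[t - k] for t in range(k, n)) / denom
            for k in range(1, max_lag + 1)]


def pacf(y, max_lag):
    """Partial autocorrelations via the Durbin-Levinson recursion."""
    r = [1.0] + acf(y, max_lag)  # r[k] is the autocorrelation at lag k
    result = []
    phi_prev = []                # phi_{k-1,1..k-1} from the previous step
    for k in range(1, max_lag + 1):
        num = r[k] - sum(phi_prev[j - 1] * r[k - j] for j in range(1, k))
        den = 1.0 - sum(phi_prev[j - 1] * r[j] for j in range(1, k))
        phi_kk = num / den
        # Update the lower-order coefficients, then append the new one.
        phi_new = [phi_prev[j - 1] - phi_kk * phi_prev[k - j - 1]
                   for j in range(1, k)]
        phi_new.append(phi_kk)
        phi_prev = phi_new
        result.append(phi_kk)
    return result
```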

⚠️ Problem Patterns & Solutions

Pattern	Meaning	Try This
Significant ACF spikes at lags 1, 2, 3...	MA terms needed	Increase q (MA order)
Significant PACF spikes at lags 1, 2, 3...	AR terms needed	Increase p (AR order)
Slow decay in ACF	Series not stationary	Increase d (differencing)
Spikes at seasonal lags (12, 24, 52...)	Seasonal pattern not captured	Enable "Include Seasonality" and set s to the lag with spikes

SUMMARY STATISTICS

Descriptive Statistics

Outcome Variable

Statistic	Value
Provide data to see summary statistics.

Exogenous Predictors

Variable	Mean	Std. Dev.	Min	Max
Provide data to see predictor statistics.

MODEL RESULTS

Model specification: –

AIC: –

BIC: –

RMSE: –

MAE: –

Interpretation Aid

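AIC/BIC: Lower values indicate better model fit relative to complexity. Use these to compare different (p,q) or (P,Q,s) choices fitted to the same data. Differencing changes the data the likelihood is computed on, so only compare AIC/BIC between models with the same d and D. Try fitting ARIMA vs SARIMAX (with D=0) and compare!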

RMSE: Root Mean Square Error — average magnitude of prediction errors in the same units as the outcome.

MAE: Mean Absolute Error — average absolute deviation, less sensitive to outliers than RMSE.
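The two error measures can be computed as in the sketch below, which uses hypothetical numbers. Three small misses and one large miss show the difference. MAE weighs every error linearly, while RMSE squares the errors first, so the single $10 miss pulls RMSE well above MAE.

```python
import math


def rmse(actual, predicted):
    """Root mean square error, in the outcome's units."""
    errors = [a - p for a, p in zip(actual, predicted)]
    return math.sqrt(sum(e * e for e in errors) / len(errors))


def mae(actual, predicted):
    """Mean absolute error, in the outcome's units."""
    errors = [a - p for a, p in zip(actual, predicted)]
    return sum(abs(e) for e in errors) / len(errors)


actual = [10.0, 10.0, 10.0, 10.0]
predicted = [9.0, 11.0, 9.0, 0.0]  # errors: 1, -1, 1, 10
print(mae(actual, predicted))   # 3.25
print(rmse(actual, predicted))  # about 5.07 (square root of 25.75)
```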

Tip: If adding seasonality increases AIC/BIC, the seasonal pattern may not be strong enough to justify the extra complexity.

APA-Style Report

Fit the model to see the APA-style statistical report.

Managerial Interpretation

Business-focused interpretation will appear here after fitting the model.

Coefficient Estimates

Parameter	Estimate	Std. Error	p-value	95% CI
Fit the model to view coefficient estimates.

Interpretation Aid: Understanding Coefficients

📊 Types of Coefficients

AR Coefficients (ar.L1, ar.L2, ...)

What they mean: How much the previous period's value (or earlier values) influences the current value, after accounting for the trend.

Positive (e.g., 0.7): Strong persistence — if sales were high last period, they'll likely be high this period
Negative (e.g., -0.3): Mean reversion — high values tend to be followed by lower values
Close to 0: Past values don't strongly predict current values
Close to 1: Random walk behavior — today ≈ yesterday + noise

Example: ar.L1 = 0.65 means "about 65% of last period's deviation from the mean carries over to this period."
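The "carries over" idea can be traced period by period. The sketch below assumes a stationary series (d = 0) and no new shocks after the starting deviation, and the function name is hypothetical. Each period keeps 65% of the previous deviation from the mean, so a deviation of 20 decays to about 13, then about 8.45, then about 5.49. With d = 1, the same logic applies to period-to-period changes rather than levels.

```python
def ar1_deviation_path(start_deviation, phi, periods):
    """Deviation from the mean over time when no new shocks arrive.

    Each period keeps the fraction phi of the previous deviation.
    """
    path = [start_deviation]
    for _ in range(periods):
        path.append(phi * path[-1])
    return path


print(ar1_deviation_path(20.0, 0.65, 3))  # roughly [20.0, 13.0, 8.45, 5.49]
```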

MA Coefficients (ma.L1, ma.L2, ...)

What they mean: How much past forecast errors affect current values. These capture short-term adjustments.

Positive (e.g., 0.5): If we under-predicted last period, we adjust upward this period
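Negative (e.g., -0.4): If we under-predicted last period, we adjust downward this period. This is common after differencing: an ARIMA(0,1,1) model that matches simple exponential smoothing has ma.L1 equal to the smoothing weight minus 1, which is negative.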

Example: ma.L1 = 0.45 means "if our model was off by $100 last period, add about $45 to this period's prediction."
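The sketch below shows the same adjustment in code. It uses the $(1 + \theta B)$ sign convention of the ARIMAX equation at the top, so the correction is $\theta$ times last period's error, where the error is actual minus predicted. The base prediction of $1,000 and the function name are hypothetical.

```python
def ma1_prediction(base_prediction, theta, last_error):
    """Add the MA(1) correction to a prediction.

    last_error = actual - predicted for the previous period, so a
    positive value means the model under-predicted.
    """
    return base_prediction + theta * last_error


# Last period the model under-predicted by $100.
print(ma1_prediction(1000.0, 0.45, 100.0))  # about 1045: adjust upward
print(ma1_prediction(1000.0, -0.4, 100.0))  # about 960: adjust downward
```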

Exogenous Coefficients (your predictor names)

What they mean: The direct effect of each external variable on your outcome, holding time-series dynamics constant.

Interpreted like standard regression coefficients
A one-unit increase in the predictor changes the outcome by the coefficient value

Example: ad_spend = 2.3 means "each additional $1 spent on advertising is associated with $2.30 more in sales, after accounting for trends and seasonality."

Sigma² (sigma2)

What it means: The estimated variance of the random error term. Larger values = more unexplained variability.

🔄 Seasonal Coefficients (ar.S.L, ma.S.L) — SARIMAX Only

What they mean: These appear when you enable seasonality. They capture how values from the same season last cycle influence the current value.

ar.S.L12: How this month's value relates to the same month last year (for s=12)
ma.S.L7: Adjustment based on the forecast error from the same weekday last week (for s=7, daily data)

Example: ar.S.L12 = 0.8 means "80% of the deviation we saw in the same month last year carries over to this month."

If seasonal coefficients are non-significant, the seasonal pattern may be weak or you may have chosen the wrong period (s).

📈 Statistical Significance

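p-value < α (the significance level set in Analysis Settings, e.g., 0.05): Coefficient is statistically significant (highlighted in green)
p-value ≥ α: Cannot rule out that the true effect is zero
Confidence Interval: If the interval at the matching level (e.g., 95% for α = 0.05) doesn't include 0, the coefficient is significant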
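The p-value and confidence-interval rules agree because both come from the same estimate and standard error. The sketch below assumes normal-theory z-tests, which is how many state-space ARIMA implementations report coefficient p-values. The tool's exact test is not stated here. The standard error of 0.8 paired with the ad_spend estimate of 2.3 is a made-up value, and the function name is hypothetical.

```python
from statistics import NormalDist


def coefficient_test(estimate, std_error, alpha=0.05):
    """Two-sided z-test and (1 - alpha) confidence interval for one coefficient."""
    z = estimate / std_error
    p_value = 2 * (1 - NormalDist().cdf(abs(z)))
    crit = NormalDist().inv_cdf(1 - alpha / 2)
    ci = (estimate - crit * std_error, estimate + crit * std_error)
    return z, p_value, ci


z, p, ci = coefficient_test(2.3, 0.8)
print(z, p, ci)  # z = 2.875, p is about 0.004, CI is about (0.73, 3.87): excludes 0
```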

Non-significant AR or MA terms might indicate you've over-specified the model. Try reducing p or q (or P/Q for seasonal terms).

⚠️ Common Issues

Very large standard errors: Possible multicollinearity or insufficient data
AR(1) coefficient near or above 1 in absolute value: The series may be non-stationary and the model unstable; try increasing d (differencing)
All exogenous coefficients non-significant: External variables may not help; try a simpler ARIMA model
Seasonal coefficients non-significant: You may have the wrong seasonal period (s), or the pattern isn't truly seasonal
Model takes very long to fit: Large seasonal periods (e.g., s=24) combined with high P/D/Q can be slow; try reducing to (P,D,Q) = (1,0,1)

Forecasts

Period	Forecast	Lower (95%)	Upper (95%)
Fit the model to view forecasts. Use the slider to select 0-12 forecast periods.

DIAGNOSTICS & ASSUMPTIONS

Stationarity & Model Diagnostics

Click "Check Stationarity" or fit the model to see diagnostic tests.

Augmented Dickey-Fuller Test (ADF)

Tests whether the series has a unit root (non-stationary).

What is the ADF Test?

The ADF test checks whether your time series is stationary (its statistical properties don't change over time). The ARMA part of the model assumes a stationary series after any differencing (d) has been applied.

p-value < 0.05: ✅ Series is stationary. Good to proceed.
p-value ≥ 0.05: ⚠️ A unit root cannot be ruled out, so treat the series as non-stationary. Differencing (d > 0) is likely needed.

What makes a series non-stationary?

Trends (consistently going up or down)
Changing variance (volatility increases over time)
Seasonal patterns with changing amplitude

The fix: Differencing (setting d=1 or d=2) removes trends. The model then works with changes rather than levels.
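The sketch below shows what differencing does, applying $(1 - B)^d$ from the ARIMAX equation one difference at a time. The function name is hypothetical. A straight-line trend becomes constant after one difference, and a quadratic (curving) trend needs two. Each difference shortens the series by one observation.

```python
def difference(y, d=1):
    """Apply (1 - B)^d by taking first differences d times."""
    for _ in range(d):
        y = [y[t] - y[t - 1] for t in range(1, len(y))]
    return y


linear = [5, 7, 9, 11, 13, 15]            # rises by 2 each period
quadratic = [t * t for t in range(6)]     # 0, 1, 4, 9, 16, 25
print(difference(linear))                 # [2, 2, 2, 2, 2]
print(difference(quadratic, d=1))         # [1, 3, 5, 7, 9]  (still trending)
print(difference(quadratic, d=2))         # [2, 2, 2, 2]
```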

Ljung-Box Test (Residual Autocorrelation)

Tests whether residuals exhibit significant autocorrelation.

What is the Ljung-Box Test?

This test checks if there's leftover pattern in your residuals (the differences between actual and fitted values).

p-value > 0.05: ✅ No significant autocorrelation. Residuals look like random noise. Model is adequate.
p-value ≤ 0.05: ⚠️ Significant autocorrelation detected. The model is missing some structure.

If you see significant autocorrelation:

Try increasing p (AR order) if PACF shows spikes
Try increasing q (MA order) if ACF shows spikes
Check for seasonal patterns you haven't accounted for
Consider adding more exogenous variables
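The Ljung-Box statistic is $Q = n(n+2)\sum_{k=1}^{h} r_k^2/(n-k)$, where $r_k$ is the residual autocorrelation at lag $k$. It is compared with a chi-square distribution. The self-contained sketch below computes $Q$ and its p-value using only the standard library. It evaluates the chi-square tail with the regularized incomplete gamma function, using a power series for small arguments and a continued fraction for large ones. The function names and the optional `fitted_params` degrees-of-freedom adjustment (subtracting the number of ARMA terms) are illustrative assumptions, and the tool's exact lag choice is not stated. In practice, a library routine such as statsmodels' `acorr_ljungbox` would be the usual choice.

```python
import math


def acf(x, max_lag):
    """Sample autocorrelations r_1..r_max_lag."""
    n = len(x)
    mean = sum(x) / n
    dev = [v - mean for v in x]
    denom = sum(d * d for d in dev)
    return [sum(dev[t] * dev[t - k] for t in range(k, n)) / denom
            for k in range(1, max_lag + 1)]


def chi2_sf(q, df):
    """P(chi-square(df) > q) via the regularized incomplete gamma function."""
    a, x = df / 2.0, q / 2.0
    if x <= 0:
        return 1.0
    log_prefactor = -x + a * math.log(x) - math.lgamma(a)
    if x < a + 1:
        # Power series for the lower tail P(a, x); return 1 - P.
        term = total = 1.0 / a
        n = 1
        while abs(term) > 1e-15 * total:
            term *= x / (a + n)
            total += term
            n += 1
        return 1.0 - math.exp(log_prefactor) * total
    # Continued fraction (modified Lentz) for the upper tail Q(a, x).
    tiny = 1e-300
    b = x + 1.0 - a
    c = 1.0 / tiny
    d = 1.0 / b
    h = d
    for i in range(1, 10000):
        an = -i * (i - a)
        b += 2.0
        d = an * d + b
        d = tiny if abs(d) < tiny else d
        c = b + an / c
        c = tiny if abs(c) < tiny else c
        d = 1.0 / d
        delta = d * c
        h *= delta
        if abs(delta - 1.0) < 1e-14:
            break
    return math.exp(log_prefactor) * h


def ljung_box(residuals, lags, fitted_params=0):
    """Ljung-Box Q statistic and p-value using lags 1..lags."""
    n = len(residuals)
    r = acf(residuals, lags)
    q = n * (n + 2) * sum(r[k - 1] ** 2 / (n - k) for k in range(1, lags + 1))
    return q, chi2_sf(q, lags - fitted_params)


# Residuals that flip sign every period are clearly not white noise.
q, p = ljung_box([1.0, -1.0] * 20, lags=10)
print(q, p)  # Q is about 362, p is essentially 0
```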

Model Selection Guidance

Choosing p (AR order)

Look at the PACF plot of your original series:

Count significant spikes (bars outside red lines)
The number of spikes before cutoff = suggested p
Typical values: 0, 1, or 2

PACF shows 2 significant spikes → try p=2

Choosing d (Differencing)

Based on the ADF test:

ADF p-value < 0.05 → d=0 (already stationary)
ADF p-value ≥ 0.05 → try d=1
Still non-stationary with d=1 → try d=2
Rarely need d > 2

Upward trend in data → likely need d=1

Choosing q (MA order)

Look at the ACF plot of your original series:

Count significant spikes after lag 0
Spikes that cut off sharply suggest MA terms
Typical values: 0, 1, or 2

ACF shows 1 significant spike at lag 1 → try q=1
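The sketch below turns the "count spikes before the cutoff" rule for p and q into code. Give it PACF values to suggest p, or ACF values to suggest q. It counts consecutive significant lags starting at lag 1 and stops at the first lag inside the approximate 95% bound of 1.96/√n. The function name and the correlation values are made up for illustration. A count above 2 is a hint to look more closely rather than a setting to use directly.

```python
import math


def cutoff_order(correlations, n_obs):
    """Count consecutive significant lags, starting at lag 1.

    correlations[0] is the value at lag 1. Counting stops at the first
    lag inside the approximate 95% bound 1.96 / sqrt(n_obs).
    """
    bound = 1.96 / math.sqrt(n_obs)
    order = 0
    for r in correlations:
        if abs(r) <= bound:
            break
        order += 1
    return order


# Hypothetical values for a series with 100 observations (bound is about 0.196).
print(cutoff_order([0.62, 0.31, 0.05, -0.04], 100))  # PACF -> suggests p = 2
print(cutoff_order([0.45, 0.08, 0.11], 100))         # ACF  -> suggests q = 1
```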

Common Starting Points

Model	When to Use
ARIMA(1,1,1)	Good default for trending data with some persistence
ARIMA(0,1,1)	Random walk with smoothing (simple exponential smoothing)
ARIMA(1,0,0)	Stationary data with persistence (AR(1) process)
ARIMA(0,1,0)	Pure random walk (tomorrow = today + noise)
ARIMA(2,1,2)	More complex dynamics; try if simpler models fail diagnostics

💡 Model Comparison Tip

Try a few different (p, d, q) combinations and compare:

AIC/BIC: Lower is better (penalizes complexity)
Ljung-Box p-value: Should be > 0.05
RMSE: Lower means more accurate predictions

The best model balances fit (low RMSE) with simplicity (few parameters, low AIC).
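The sketch below shows how AIC and BIC trade fit against complexity, using $\mathrm{AIC} = 2k - 2\ln L$ and $\mathrm{BIC} = k\ln n - 2\ln L$. The log-likelihoods and parameter counts are hypothetical. Here k counts the AR and MA coefficients plus the error variance, and both candidate models share d = 1, so their scores are comparable. The larger model fits slightly better (higher log-likelihood), but not by enough to pay for its two extra parameters, and BIC penalizes it even more heavily.

```python
import math


def aic(log_likelihood, k):
    """Akaike information criterion: 2k - 2 ln L."""
    return 2 * k - 2 * log_likelihood


def bic(log_likelihood, k, n):
    """Bayesian information criterion: k ln(n) - 2 ln L."""
    return k * math.log(n) - 2 * log_likelihood


# Hypothetical fitted results: (log-likelihood, number of estimated parameters).
candidates = {
    "ARIMA(1,1,1)": (-250.0, 3),
    "ARIMA(2,1,2)": (-248.5, 5),
}
n_obs = 100

best_aic = min(candidates, key=lambda m: aic(*candidates[m]))
best_bic = min(candidates, key=lambda m: bic(*candidates[m], n_obs))
print(best_aic, best_bic)  # ARIMA(1,1,1) ARIMA(1,1,1)
```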

👨‍🏫 Professor Mode: Guided Learning Experience


📊 Forecast Scenario: Set Exogenous Predictor Values
