
Auto Model Specification with ML Techniques for Time Series

How to automatically select the best trend, seasonal, and autoregressive representations for time series using the Python library, scalecast

Michael Keith

Published in TDS Archive

6 min read · Oct 4, 2022

Several methods exist to find the best model specification for time series, depending on the model being employed. For ARIMA models, a popular method is to monitor information criteria while searching through different AR, I, and MA orders. This has proven to be an effective technique, and popular libraries in R and Python offer auto-ARIMA models for users to experiment with. Similar methods can be used for other classical statistical time series models, such as Holt-Winters Exponential Smoothing and TBATS.
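To make the information-criterion search concrete, here is a minimal sketch that picks an autoregressive order by AIC. It is not how any particular auto-ARIMA library works: it only searches AR orders (no differencing or MA terms), fits each candidate by ordinary least squares with NumPy, and uses a Gaussian AIC. The helper names fit_ar_aic and select_ar_order, the simulated series, and the max_p value are all assumptions made for illustration.

```python
import numpy as np

def fit_ar_aic(y, p, max_p):
    """Fit an AR(p) model by least squares and return its Gaussian AIC.

    Every candidate order is scored on the same sample (the first max_p
    observations are dropped), so the AIC values are comparable.
    """
    n = len(y) - max_p
    target = y[max_p:]
    # Design matrix: an intercept plus lags 1..p of the series
    cols = [np.ones(n)] + [y[max_p - k : len(y) - k] for k in range(1, p + 1)]
    X = np.column_stack(cols)
    beta, *_ = np.linalg.lstsq(X, target, rcond=None)
    resid = target - X @ beta
    sigma2 = resid @ resid / n
    n_params = p + 2  # intercept, p lag coefficients, noise variance
    return n * np.log(sigma2) + 2 * n_params

def select_ar_order(y, max_p=6):
    """Return the AR order with the lowest AIC and the AIC of every candidate."""
    aics = {p: fit_ar_aic(y, p, max_p) for p in range(max_p + 1)}
    return min(aics, key=aics.get), aics

# Simulated AR(2) series, used only to exercise the search
rng = np.random.default_rng(0)
n_obs = 2000
y = np.zeros(n_obs)
for t in range(2, n_obs):
    y[t] = 0.6 * y[t - 1] - 0.3 * y[t - 2] + rng.normal()

best_p, aics = select_ar_order(y)
print("selected AR order:", best_p)
```

A full auto-ARIMA search would extend the same loop over differencing and MA orders and would normally fit each candidate by maximum likelihood rather than least squares.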

For machine learning models, specification can be more complicated. Other than complex deep learning models (such as N-BEATS, N-HiTS, and a few others), there aren’t many automated pure ML methods that consistently outperform the classical models (Makridakis et al., 2020).

The Python library scalecast offers a function called auto_Xvar_select() that can be used to automatically select the best trend, seasonality, and look-back representations (or lags) for any given series using models from scikit-learn.

pip install --upgrade scalecast

The function works by first searching for the ideal representation of the series’ trend, then of its seasonality, then of its look-back (lags), each separately. “Ideal” in this context means minimizing some out-of-sample error (or maximizing R²) with a selected model (multiple linear regression, or MLR, by default). After each of these has been found separately, the function searches for the ideal combination of the winning representations, with the option to consider irregular cycles and other regressors as the user sees fit.
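The two-stage idea described above can be sketched in plain NumPy. This is an illustration of the concept only, not scalecast's implementation: the candidate sets, the helper names (trend_features, fourier_features, lag_features, validation_rmse, auto_select), the hold-out size, and the simulated series are all assumptions. As a further simplification, lagged models are scored one step ahead using observed lag values instead of being forecast dynamically.

```python
from itertools import combinations
import numpy as np

def trend_features(n, degree):
    """Polynomial trend terms t, t^2, ... on a time index scaled to [0, 1)."""
    t = np.arange(n) / n
    return np.column_stack([t ** d for d in range(1, degree + 1)])

def fourier_features(n, period, harmonics):
    """Sine/cosine pairs for the given period and its first harmonics."""
    t = np.arange(n)
    cols = []
    for h in range(1, harmonics + 1):
        cols += [np.sin(2 * np.pi * h * t / period), np.cos(2 * np.pi * h * t / period)]
    return np.column_stack(cols)

def lag_features(y, n_lags):
    """Column k-1 holds y shifted back k steps; the leading rows stay NaN."""
    out = np.full((len(y), n_lags), np.nan)
    for k in range(1, n_lags + 1):
        out[k:, k - 1] = y[:-k]
    return out

def validation_rmse(y, blocks, n_val, burn_in):
    """Fit OLS on the training rows and score RMSE on the last n_val rows."""
    X = np.column_stack([np.ones(len(y))] + blocks)[burn_in:]
    target = y[burn_in:]
    beta, *_ = np.linalg.lstsq(X[:-n_val], target[:-n_val], rcond=None)
    return float(np.sqrt(np.mean((target[-n_val:] - X[-n_val:] @ beta) ** 2)))

def auto_select(y, n_val=24, max_lags=4, period=12):
    n, burn_in = len(y), max_lags  # drop rows where any lag is undefined
    candidates = {
        "trend": {f"poly{d}": trend_features(n, d) for d in (1, 2)},
        "seasonality": {f"fourier_h{h}": fourier_features(n, period, h) for h in (1, 2)},
        "lags": {f"lags{k}": lag_features(y, k) for k in range(1, max_lags + 1)},
    }
    # Stage 1: pick the best representation within each group, separately
    winners = {}
    for group, options in candidates.items():
        scores = {name: validation_rmse(y, [X], n_val, burn_in) for name, X in options.items()}
        best = min(scores, key=scores.get)
        winners[group] = (best, options[best])
    # Stage 2: try every combination of the winners, starting from intercept-only
    best_combo, best_score = (), validation_rmse(y, [], n_val, burn_in)
    for r in range(1, len(winners) + 1):
        for combo in combinations(winners, r):
            score = validation_rmse(y, [winners[g][1] for g in combo], n_val, burn_in)
            if score < best_score:
                best_combo, best_score = combo, score
    return {g: winners[g][0] for g in best_combo}, best_score

# Simulated monthly-style series with a linear trend and a 12-period cycle
rng = np.random.default_rng(1)
t = np.arange(240)
y = 0.05 * t + 3 * np.sin(2 * np.pi * t / 12) + rng.normal(size=240)

selected, best_score = auto_select(y)
print(selected, round(best_score, 3))
```

Searching each group on its own first keeps the number of fitted models small: the second stage only has to compare combinations of three winners instead of every possible mix of trend, seasonal, and lag candidates.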

Image by author — how scalecast automatically selects regressors for forecasting models using the auto_Xvar_select() function

When applied to the 100,000 series from the M4 competition, the function returns results whose accuracy varies with the series’ frequency. For the hourly frequency group, an OWA of under 0.6 using each of the KNN, LightGBM…
