Forecast a single time series with an ARIMA_PLUS univariate model
This tutorial teaches you how to use an ARIMA_PLUS univariate time series model to forecast the future value of a given column based on the historical values for that column.
This tutorial forecasts a single time series. Forecasted values are calculated once for each time point in the input data. In this tutorial, you complete the following tasks:

Retrieving the forecasted site traffic information from the model by using the ML.FORECAST function.
Retrieving components of the time series, such as seasonality and trend, by using the ML.EXPLAIN_FORECAST function.

You can inspect these time series components in order to explain the forecasted values.
Costs
This tutorial uses billable components of Google Cloud, including the following:
BigQuery
BigQuery ML
For more information about BigQuery costs, see the BigQuery pricing page.
Sign in to your Google Cloud account. If you're new to
Google Cloud, create an account to evaluate how our products perform in
real-world scenarios. New customers also get $300 in free credits to
run, test, and deploy workloads.
In the Google Cloud console, on the project selector page,
select or create a Google Cloud project.
Roles required to select or create a project
Select a project: Selecting a project doesn't require a specific
IAM role—you can select any project that you've been
granted a role on.
Create a project: To create a project, you need the Project Creator role
(roles/resourcemanager.projectCreator), which contains the
resourcemanager.projects.create permission. Learn how to grant roles.
BigQuery is automatically enabled in new projects.
To activate BigQuery in a pre-existing project, go to
Enable the BigQuery API.
Roles required to enable APIs
To enable APIs, you need the Service Usage Admin IAM role
(roles/serviceusage.serviceUsageAdmin), which contains the
serviceusage.services.enable permission. Learn how to grant roles.
Visualize the input data
Before creating the model, you can optionally visualize your input
time series data to get a sense of the distribution. You can do this by using Looker Studio.
Follow these steps to visualize the time series data:
SQL
In the following GoogleSQL query, the SELECT statement parses the date column from the input
table to the TIMESTAMP type and renames it to parsed_date, and uses
the SUM(...) clause and the GROUP BY date clause to create a daily totals.visits value.
In the Google Cloud console, go to the BigQuery page.
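The query described above can be sketched as follows. It reads the public ga_sessions_* tables referenced in this tutorial; paste it into the BigQuery query editor:

```sql
SELECT
  PARSE_TIMESTAMP("%Y%m%d", date) AS parsed_date,
  SUM(totals.visits) AS total_visits
FROM
  `bigquery-public-data.google_analytics_sample.ga_sessions_*`
GROUP BY date
```

You can then chart parsed_date against total_visits in Looker Studio by using the Explore with Looker Studio option on the query results.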
BigQuery DataFrames

```python
import bigframes.pandas as bpd

# Start by loading the historical data from BigQuery that you want to analyze and forecast.
# This reads the ga_sessions_* tables in the google_analytics_sample dataset.
# Read and visualize the time series you want to forecast.
df = bpd.read_gbq("bigquery-public-data.google_analytics_sample.ga_sessions_*")
parsed_date = bpd.to_datetime(df.date, format="%Y%m%d", utc=True)
parsed_date.name = "parsed_date"
visits = df["totals"].struct.field("visits")
visits.name = "total_visits"
total_visits = visits.groupby(parsed_date).sum()

# Expected output of total_visits.head():
# parsed_date
# 2016-08-01 00:00:00+00:00    1711
# 2016-08-02 00:00:00+00:00    2140
# 2016-08-03 00:00:00+00:00    2890
# 2016-08-04 00:00:00+00:00    3161
# 2016-08-05 00:00:00+00:00    2702
# Name: total_visits, dtype: Int64

total_visits.plot.line()
```
The result is similar to the following:
Create the time series model
Create a time series model to forecast total site visits as represented by the totals.visits column, and train it on the Google Analytics 360
data.
SQL
In the following query, the OPTIONS(model_type='ARIMA_PLUS', time_series_timestamp_col='date', ...) clause indicates that you are creating an ARIMA-based
time series model. The auto_arima option of the CREATE MODEL statement defaults to TRUE, so the auto.ARIMA algorithm automatically tunes the hyperparameters in the model. The algorithm
fits dozens of candidate models and chooses the best model, which is the model
with the lowest Akaike information criterion (AIC).
The data_frequency option of the CREATE MODEL statement defaults to AUTO_FREQUENCY, so the
training process automatically infers the data frequency of the input time
series. The decompose_time_series option of the CREATE MODEL statement defaults to TRUE, so that information about
the time series data is returned when you evaluate the model in the next step.
Follow these steps to create the model:
In the Google Cloud console, go to the BigQuery page.
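The CREATE MODEL query for this step can be sketched as follows, using the options described above. The ga_arima_model name is the one this tutorial refers to; the bqml_tutorial dataset name is an assumption, so substitute a dataset that exists in your project:

```sql
CREATE OR REPLACE MODEL `bqml_tutorial.ga_arima_model`
  OPTIONS (
    model_type = 'ARIMA_PLUS',
    time_series_timestamp_col = 'date',
    time_series_data_col = 'total_visits',
    auto_arima = TRUE,
    data_frequency = 'AUTO_FREQUENCY',
    decompose_time_series = TRUE
  ) AS
SELECT
  PARSE_TIMESTAMP("%Y%m%d", date) AS date,
  SUM(totals.visits) AS total_visits
FROM
  `bigquery-public-data.google_analytics_sample.ga_sessions_*`
GROUP BY date
```

The last three options are shown explicitly for clarity; they match the defaults, so omitting them gives the same model.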
The query takes about 4 seconds to complete, after which you can access the ga_arima_model model. Because the query uses a CREATE MODEL statement
to create a model, you don't see query results.
BigQuery DataFrames

```python
from bigframes.ml import forecasting
import bigframes.pandas as bpd

# Create a time series model to forecast total site visits:
# The auto_arima option defaults to True, so the auto.ARIMA algorithm automatically
# tunes the hyperparameters in the model.
# The data_frequency option defaults to 'auto_frequency' so the training
# process automatically infers the data frequency of the input time series.
# The decompose_time_series option defaults to True, so that information about
# the time series data is returned when you evaluate the model in the next step.
model = forecasting.ARIMAPlus()
model.auto_arima = True
model.data_frequency = "auto_frequency"
model.decompose_time_series = True

# Use the data loaded in the previous step to fit the model.
training_data = total_visits.to_frame().reset_index(drop=False)

X = training_data[["parsed_date"]]
y = training_data[["total_visits"]]

model.fit(X, y)
```
Evaluate the candidate models
SQL
Evaluate the time series models by using the ML.ARIMA_EVALUATE function. The ML.ARIMA_EVALUATE function shows you the evaluation metrics of
all the candidate models evaluated during the process of automatic
hyperparameter tuning.
Follow these steps to evaluate the model:
In the Google Cloud console, go to the BigQuery page.
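A sketch of the evaluation query, again assuming the model was created in a bqml_tutorial dataset as shown earlier:

```sql
SELECT *
FROM ML.ARIMA_EVALUATE(MODEL `bqml_tutorial.ga_arima_model`)
```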
BigQuery DataFrames

```python
# Evaluate the time series models by using the summary() function. The summary()
# function shows you the evaluation metrics of all the candidate models evaluated
# during the process of automatic hyperparameter tuning.
summary = model.summary(
    show_all_candidate_models=True,
)
print(summary.peek())

# Expected output:
#    non_seasonal_p  non_seasonal_d  non_seasonal_q  has_drift  log_likelihood          AIC      variance seasonal_periods  has_holiday_effect  has_spikes_and_dips  has_step_changes error_message
# 0               0               1               3       True    -2464.255656  4938.511313  42772.506055       ['WEEKLY']               False                False              True
# 1               2               1               0      False    -2473.141651  4952.283303  44942.416463       ['WEEKLY']               False                False              True
# 2               1               1               0      False    -2479.880885  4963.761770  46642.953433       ['WEEKLY']               False                False              True
# 3               0               1               1      False    -2470.632377  4945.264753  44319.379307       ['WEEKLY']               False                False              True
# 4               2               1               1       True    -2463.671247  4937.342493  42633.299513       ['WEEKLY']               False                False              True
```
The non_seasonal_p, non_seasonal_d, non_seasonal_q, and has_drift output columns define an ARIMA model in the training pipeline. The log_likelihood, AIC, and variance output columns are relevant to the ARIMA
model fitting process.
The auto.ARIMA algorithm uses the KPSS test to determine the best value
for non_seasonal_d, which in this case is 1. When non_seasonal_d is 1,
the auto.ARIMA algorithm trains 42 different candidate ARIMA models in parallel.
In this example, all 42 candidate models are valid, so the output contains 42
rows, one for each candidate ARIMA model; in cases where some of the models
aren't valid, they are excluded from the output. These candidate models are
returned in ascending order by AIC. The model in the first row has the lowest
AIC, and is considered the best model. The best model is saved as the final
model and is used when you call functions such as ML.FORECAST on the model.
The seasonal_periods column contains information about the seasonal pattern
identified in the time series data. It has nothing to do with the ARIMA
modeling, and therefore has the same value across all output rows. It reports a
weekly pattern, which agrees with the results you saw if you chose to
visualize the input data.
The has_holiday_effect, has_spikes_and_dips, and has_step_changes columns
are only populated when decompose_time_series=TRUE. These columns also reflect
information about the input time series data, and are not related to the ARIMA
modeling. These columns also have the same values across all output rows.
The error_message column shows any errors that occurred during the auto.ARIMA fitting process. One possible reason for errors is that the selected non_seasonal_p, non_seasonal_d, non_seasonal_q, and has_drift columns
are not able to stabilize the time series. To retrieve the error
message of all the candidate models, set the show_all_candidate_models option to TRUE when you create the model.
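You can inspect the model's coefficients by using the ML.ARIMA_COEFFICIENTS function. A sketch of such a query, assuming the model was created in a bqml_tutorial dataset as shown earlier:

```sql
SELECT *
FROM ML.ARIMA_COEFFICIENTS(MODEL `bqml_tutorial.ga_arima_model`)
```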
The ar_coefficients output column shows the model coefficients of the
autoregressive (AR) part of the ARIMA model. Similarly, the ma_coefficients output column shows the model coefficients of the moving-average (MA) part of
the ARIMA model. Both of these columns contain array values, whose lengths are
equal to non_seasonal_p and non_seasonal_q, respectively. You saw in the
output of the ML.ARIMA_EVALUATE function that the best model has a non_seasonal_p value of 2 and a non_seasonal_q value of 3. Therefore, in
the ML.ARIMA_COEFFICIENTS output, the ar_coefficients value is a 2-element
array and the ma_coefficients value is a 3-element array. The intercept_or_drift value is the constant term in the ARIMA model.
Use the model to forecast data
SQL
Forecast future time series values by using the ML.FORECAST function.
In the following GoogleSQL query, the STRUCT(30 AS horizon, 0.8 AS confidence_level) clause indicates that the
query forecasts 30 future time points, and generates a prediction interval
with an 80% confidence level.
Follow these steps to forecast data with the model:
In the Google Cloud console, go to the BigQuery page.
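A sketch of the forecast query, again assuming the model lives in a bqml_tutorial dataset:

```sql
SELECT *
FROM ML.FORECAST(MODEL `bqml_tutorial.ga_arima_model`,
                 STRUCT(30 AS horizon, 0.8 AS confidence_level))
```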
The output rows are in chronological order by the forecast_timestamp column value. In time series forecasting, the prediction
interval, as represented by the prediction_interval_lower_bound and prediction_interval_upper_bound column values, is as important as the forecast_value column value. The forecast_value value is the middle point
of the prediction interval. The prediction interval depends on the standard_error and confidence_level column values.
You can get explainability metrics in addition to forecast data by using the ML.EXPLAIN_FORECAST function. The ML.EXPLAIN_FORECAST function forecasts
future time series values and also returns all the separate components of the
time series.
Similar to the ML.FORECAST function, the STRUCT(30 AS horizon, 0.8 AS confidence_level) clause used in the ML.EXPLAIN_FORECAST function indicates that the query forecasts 30 future
time points and generates a prediction interval with 80% confidence.
Follow these steps to explain the model's results:
In the Google Cloud console, go to the BigQuery page.
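A sketch of the explainable forecast query, again assuming the model lives in a bqml_tutorial dataset:

```sql
SELECT *
FROM ML.EXPLAIN_FORECAST(MODEL `bqml_tutorial.ga_arima_model`,
                         STRUCT(30 AS horizon, 0.8 AS confidence_level))
```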
You can get explainability metrics in addition to forecast data by using the predict_explain function. The predict_explain function forecasts
future time series values and also returns all the separate components of the
time series.
Similar to the predict function, the horizon=30, confidence_level=0.8 arguments used in the predict_explain function indicate that the query forecasts 30 future
time points and generates a prediction interval with 80% confidence.
If you would like to visualize the results, you can use
Looker Studio as described in the Visualize the input data section to create a chart, using the following columns as metrics:
time_series_data
prediction_interval_lower_bound
prediction_interval_upper_bound
trend
seasonal_period_weekly
step_changes
Clean up
To avoid incurring charges to your Google Cloud account for the resources used in this
tutorial, either delete the project that contains the resources, or keep the project and
delete the individual resources.
You can delete the project you created.
Or you can keep the project and delete the dataset.
Delete your dataset
Deleting your project removes all datasets and all tables in the project. If you
prefer to reuse the project, you can delete the dataset you created in this
tutorial:
If necessary, open the BigQuery page in the
Google Cloud console.
Last updated 2026-03-19 UTC.