Anomaly detection overview

Anomaly detection is a data mining technique that you can use to identify data deviations in a given dataset. For example, if the return rate for a given product increases substantially from the baseline for that product, that might indicate a product defect or potential fraud. You can use anomaly detection to detect critical incidents, such as technical issues, or opportunities, such as changes in consumer behavior.
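
As a rough illustration of the return-rate example, the following sketch flags observations that sit far from a product's baseline. It is a minimal z-score check written for this overview, not the method used by any particular model; the function name `flag_deviations`, the sample rates, and the threshold of 3 standard deviations are all assumptions chosen for illustration.

```python
from statistics import mean, stdev


def flag_deviations(baseline, observed, z_threshold=3.0):
    """Return (index, value, z_score) for observed values far from the baseline.

    Hypothetical helper: a plain z-score test against the baseline's mean
    and sample standard deviation.
    """
    mu = mean(baseline)
    sigma = stdev(baseline)  # sample standard deviation of the baseline
    if sigma == 0:
        raise ValueError("baseline has no variation; z-scores are undefined")
    flagged = []
    for i, value in enumerate(observed):
        z = (value - mu) / sigma
        if abs(z) > z_threshold:
            flagged.append((i, value, z))
    return flagged


# Made-up weekly return rates (fraction of units returned) for one product.
baseline_rates = [0.020, 0.022, 0.019, 0.021, 0.020, 0.023, 0.018, 0.021]
new_rates = [0.021, 0.019, 0.058]

# Only the last value (0.058) is far enough from the baseline to be flagged.
anomalies = flag_deviations(baseline_rates, new_rates)
```

A fixed z-score threshold only works when the baseline is stable and roughly symmetric, which is one reason a trained model is often a better fit for real data.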

One challenge when you use anomaly detection is determining what counts as anomalous data. If you have labeled data that identifies anomalies, you can perform anomaly detection by using the ML.PREDICT function with a supported supervised machine learning model.

If you aren't certain what counts as anomalous data, or you don't have labeled data to train a model on, you can use unsupervised machine learning to perform anomaly detection. Use the ML.DETECT_ANOMALIES function with one of the following models to detect anomalies in training data or new serving data:

| Data type | Model types | What ML.DETECT_ANOMALIES does |
| --- | --- | --- |
| Time series | ARIMA_PLUS | Detect the anomalies in the time series. |
| Time series | ARIMA_PLUS_XREG | Detect the anomalies in the time series with external regressors. |
| Independent and identically distributed random variables (IID) | K-means | Detect anomalies based on the shortest distance among the normalized distances from the input data to each cluster centroid. For a definition of normalized distances, see the k-means model output for the ML.DETECT_ANOMALIES function. |
| Independent and identically distributed random variables (IID) | Autoencoder | Detect anomalies based on the reconstruction loss in terms of mean squared error. For more information, see ML.RECONSTRUCTION_LOSS. The ML.RECONSTRUCTION_LOSS function can retrieve all types of reconstruction loss. |
| Independent and identically distributed random variables (IID) | PCA | Detect anomalies based upon the reconstruction loss in terms of mean squared error. |
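
To make the two IID scoring ideas above more concrete, the following sketch computes a shortest normalized distance to cluster centroids and a mean squared reconstruction error. It is not the ML.DETECT_ANOMALIES implementation. In particular, normalizing by a cluster "radius" (the root mean square distance of a cluster's points to its centroid) is an assumption made for this example; the authoritative definition is in the k-means model output documentation. All function names, data points, and the cutoff of 2.0 are hypothetical.

```python
import math


def euclidean(a, b):
    """Euclidean distance between two equal-length points."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def cluster_radius(points, centroid):
    """RMS distance of a cluster's points to its centroid (assumed normalizer)."""
    return math.sqrt(sum(euclidean(p, centroid) ** 2 for p in points) / len(points))


def shortest_normalized_distance(point, centroids, radii):
    """Smallest distance-to-centroid after dividing by each cluster's radius."""
    return min(euclidean(point, c) / r for c, r in zip(centroids, radii))


def mean_squared_error(original, reconstructed):
    """Reconstruction loss: average squared difference per feature."""
    return sum((o - r) ** 2 for o, r in zip(original, reconstructed)) / len(original)


# Two made-up clusters with known centroids.
cluster_a = [(0, 0), (0, 2), (2, 0), (2, 2)]
cluster_b = [(10, 10), (10, 12), (12, 10), (12, 12)]
centroids = [(1, 1), (11, 11)]
radii = [cluster_radius(cluster_a, centroids[0]),
         cluster_radius(cluster_b, centroids[1])]

# A point inside cluster A scores low; a point between the clusters scores high.
near_score = shortest_normalized_distance((1, 2), centroids, radii)
far_score = shortest_normalized_distance((6, 6), centroids, radii)

CUTOFF = 2.0  # hypothetical fixed cutoff, chosen only for this example
is_anomaly = [score > CUTOFF for score in (near_score, far_score)]

# Reconstruction loss for one row, given a model's (made-up) reconstruction.
loss = mean_squared_error([1.0, 2.0, 3.0], [1.0, 2.5, 2.0])
```

In both cases a larger score means the row is less like the data the model was trained on. Choosing where to draw the line between normal and anomalous is a separate decision from computing the score.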

Recommended knowledge

By using the default settings in the CREATE MODEL statements and the inference functions, you can create and use an anomaly detection model even without much ML knowledge. However, having basic knowledge about ML development helps you optimize both your data and your model to deliver better results. We recommend using the following resources to develop familiarity with ML techniques and processes: