Overview
This page provides an overview of the AML AI processes and covers key concepts for customers to understand. It is targeted primarily at teams who will use AML AI to train, test, and deploy models.
AML AI enables banks to automatically train, test, and deploy models for detecting money laundering. The AML AI guides are split into five left-navigation sections that correspond to the following five steps.
- Configure an engine
- Train a model
- Evaluate a model's performance
- Analysis and testing for risk governance
- Pre-production and production use
The core AML AI technical operations to create, test, and deploy models are as follows. These support steps 2-4 in the preceding list.
- Create AML AI Dataset - creates a structured set of BigQuery input data tables for AML AI
- Engine Configuration - tunes an AML AI engine to an AML AI dataset, including hyperparameter tuning
- Model Training - trains an AML AI model using an Engine Configuration and a dataset
- Backtest - tests an AML AI model against historic data on a dataset and summarizes performance
- Register Parties - registers parties (customers of the bank who have banking products and send or receive transactions) so they can be scored in prediction
- Prediction - produces party scores and explainability for use in production
The Engine Configuration, Model Training, Backtest, and Prediction operations all require an AML AI dataset as input and return artifacts that are used in other operations. For example, Model Training returns a reference to a trained AML AI model, which can be used for backtesting or prediction. For technical details of the operations, see the REST Reference Overview.
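The dependencies between these operations can be sketched as a map from each operation to the operations whose artifacts it consumes. The operation names and the `RegisterParties` dependencies below are illustrative only, not API identifiers; see the REST Reference for the real resource types.

```python
# Hypothetical sketch of the AML AI operation dependency tree.
# Names are illustrative; they are not real API method names.
DEPENDENCIES = {
    "CreateDataset": [],
    "CreateEngineConfig": ["CreateDataset"],
    "TrainModel": ["CreateDataset", "CreateEngineConfig"],
    "Backtest": ["CreateDataset", "TrainModel"],
    "RegisterParties": [],
    "Predict": ["CreateDataset", "TrainModel", "RegisterParties"],
}

def run_order(ops):
    """Return an execution order that satisfies every dependency (topological sort)."""
    order, done = [], set()

    def visit(op):
        if op in done:
            return
        for dep in DEPENDENCIES[op]:
            visit(dep)
        done.add(op)
        order.append(op)

    for op in ops:
        visit(op)
    return order

# Asking for a prediction run pulls in everything it depends on, in a valid order.
print(run_order(["Predict"]))
```

Walking the tree this way makes it clear why, for example, a Backtest can only be launched once a trained model reference exists.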
Dependency tree for AML AI processes
Important considerations when using AML AI
This section is designed to give customers an introduction to the key concepts of AML AI and advise on some best practices. Topics here are covered in more detail in dedicated guides and links are provided for further reading.
Date consistency
AML AI uses different time periods for different operations. Care should be taken with the dates selected for each operation to ensure reliable results. In particular, to avoid bias in results, it is important that the months used for training an AML AI model don't overlap with the months used for backtesting.
Because an AML AI dataset contains many months of data, a single dataset can be used for multiple operations, subject to correct date selection. The following diagram illustrates a development cycle in which different time periods within a single dataset spanning 42 months are used for engine configuration (hyperparameter tuning), training, and backtesting. All of these operations use lookback windows, which provide context to the model and can safely overlap with data used by other operations.
For more information on AML AI datasets and time windows for different operations, see Understand data scope and duration.
To ensure that changes to your data over time are recorded correctly, see Data changes over time.
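One simple guard against the overlap described above is to verify, before launching a backtest, that its months are disjoint from the training months. The month ranges below are hypothetical, and lookback windows are deliberately excluded from the check, since those may overlap safely.

```python
from datetime import date

def month_range(start, end):
    """All calendar months from start to end inclusive, as (year, month) pairs."""
    months = []
    y, m = start.year, start.month
    while (y, m) <= (end.year, end.month):
        months.append((y, m))
        m += 1
        if m > 12:
            y, m = y + 1, 1
    return months

# Hypothetical windows for one development cycle.
train_months = month_range(date(2021, 1, 1), date(2021, 12, 1))
backtest_months = month_range(date(2022, 1, 1), date(2022, 3, 1))

overlap = set(train_months) & set(backtest_months)
assert not overlap, f"Training and backtest months overlap: {sorted(overlap)}"
```

Running a check like this before every Backtest request catches the most common source of biased performance results.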
Production batch frequency
In prediction, AML AI produces AML risk scores on a calendar-month basis. Customers commonly run AML AI as part of a monthly batch process and, wherever possible, should run predictions only on months with complete transaction data.
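For a monthly batch, the most recent month with complete transaction data is normally the previous calendar month. A small helper (hypothetical, not part of the API) can compute which month a given batch run should score:

```python
from datetime import date

def last_complete_month(today):
    """Return (year, month) of the most recent fully completed calendar month."""
    if today.month == 1:
        return today.year - 1, 12
    return today.year, today.month - 1

# Example: a batch run on 2024-03-05 would score February 2024.
print(last_complete_month(date(2024, 3, 5)))   # (2024, 2)
print(last_complete_month(date(2024, 1, 15)))  # (2023, 12)
```

In practice, the batch should also wait until all transaction feeds for that month have landed, not just until the calendar month has ended.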
Field consistency
As with any machine learning process, data should be as consistent as possible between training data and test data. If fields are not populated consistently, the differences can cause unreliable results. It's strongly recommended to take steps to ensure fields are populated consistently for each operation in a development cycle, especially if different datasets are used for each operation. For more information, see dataset consistency.
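One way to catch inconsistently populated fields, sketched here over plain Python rows rather than BigQuery tables, is to compare the populated fraction of each field between two datasets and flag large differences. The field name and threshold below are illustrative.

```python
def populated_fractions(rows, fields):
    """Fraction of rows with a non-null value, per field."""
    n = len(rows)
    return {f: sum(r.get(f) is not None for r in rows) / n for f in fields}

def population_drift(train_rows, test_rows, fields, threshold=0.1):
    """Return fields whose populated fraction differs by more than threshold."""
    train = populated_fractions(train_rows, fields)
    test = populated_fractions(test_rows, fields)
    return {f: (train[f], test[f]) for f in fields
            if abs(train[f] - test[f]) > threshold}

# Hypothetical rows: "occupation" is populated in training data but not test data.
train = [{"occupation": "engineer"}, {"occupation": "teacher"}]
test = [{"occupation": None}, {"occupation": None}]
print(population_drift(train, test, ["occupation"]))  # {'occupation': (1.0, 0.0)}
```

The same comparison can be expressed as a `COUNTIF(... IS NOT NULL)` query over the actual BigQuery input tables.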
Engine configs
Once an engine config has been created, it's not normally necessary to re-create it for every new dataset or in every development cycle. The hyperparameters chosen in an engine config for one dataset generally perform well on similar datasets.
Whereas the preceding diagram uses a single dataset for both the Model Training and Backtest operations, the following diagram illustrates iterative development cycles.
For more information, see when to tune or inherit.
Data lineage
Most model governance policies require tracking the lineage of data used across all ML operations: engine configuration, training, evaluation, and prediction. Customers are responsible for tracking this data lineage.
We recommend using a unique identifier in the names for all input data, AML AI resources, and output data to track lineage across stages. This helps to ensure strong linking between resources in a particular run. Customers can also label all AML AI resources to meet lineage requirements.
Additionally, we recommend using BigQuery snapshots in API requests to ensure accurate data lineage.
This configuration helps answer questions like "where did this engine configuration come from?" and "where did this model come from?", and helps when investigating and resolving incidents.
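The unique-identifier recommendation above can be sketched as a naming helper that stamps one run ID onto every resource and table name in a development cycle. The prefix, name pattern, and resource keys here are illustrative, not a required convention.

```python
import uuid

def lineage_names(prefix):
    """Generate consistent, traceable names for one development run.

    The naming scheme is illustrative; any unique identifier shared by all
    resources and tables in a run serves the same lineage purpose.
    """
    run_id = uuid.uuid4().hex[:8]
    return {
        "run_id": run_id,
        "dataset": f"{prefix}_dataset_{run_id}",
        "engine_config": f"{prefix}_config_{run_id}",
        "model": f"{prefix}_model_{run_id}",
        "backtest_results": f"{prefix}_backtest_{run_id}",
        "input_snapshot": f"{prefix}_party_snapshot_{run_id}",  # e.g. a BigQuery snapshot table
    }

names = lineage_names("amlai")
print(names["model"])  # e.g. amlai_model_3f9c1a2b
```

Given any one artifact name, the shared run ID makes it possible to find every other resource and input table from the same run.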
For details of how to create and manage AML AI resources, see the REST API pages.

