Create and manage AML AI datasets

This page shows you the steps to create and manage AML AI datasets. A dataset is used as an input for the engine configuration, training, backtest, and prediction pipelines. An AML AI dataset contains references to BigQuery tables matching the AML AI input data model in a Google Cloud project.

Prerequisites

To get the permissions that you need to create and manage datasets, ask your administrator to grant you the Financial Services Admin (financialservices.admin) IAM role on your project. For more information about granting roles, see Manage access to projects, folders, and organizations.

You might also be able to get the required permissions through custom roles or other predefined roles.

You also need an AML AI instance to hold the dataset. If you don't have one, create an instance first.

Some API methods return a long-running operation (LRO). These methods are asynchronous and return an Operation object; for details, see REST Reference. The operation might not be completed when the method returns a response. For these methods, send the request and then check for the result. In general, all POST, PUT, UPDATE, and DELETE operations are long-running.

Create a dataset

To create a dataset, send the create request and then check for the result of the LRO.

Send the request

To create a dataset, use the projects.locations.instances.datasets.create method.

Permissions required for this task

To perform this task, you must have been granted the following permissions:

Permissions

financialservices.v1datasets.create

Before using any of the request data, make the following replacements:

PROJECT_ID: your Google Cloud project ID listed in the IAM Settings
LOCATION: the location of the instance; use one of the supported regions:
- us-central1
- us-east1
- asia-south1
- europe-west1
- europe-west2
- europe-west4
- northamerica-northeast1
- southamerica-east1
- australia-southeast1
INSTANCE_ID: the user-defined identifier for the instance
DATASET_ID: a user-defined identifier for the AML AI dataset; use only lowercase letters, numbers, dashes, and underscores (for example, train_jan2018_apr2020)
BQ_INPUT_DATASET_NAME: the BigQuery input dataset name
PARTY_TABLE: the Party table in the BigQuery input dataset
ACCOUNT_PARTY_LINK_TABLE: the AccountPartyLink table in the BigQuery input dataset
TRANSACTION_TABLE: the Transaction table in the BigQuery input dataset
RISK_CASE_EVENT_TABLE: the RiskCaseEvent table in the BigQuery input dataset
PARTY_SUPPLEMENTARY_DATA: the PartySupplementaryData table in the BigQuery input dataset; this table is optional and can be removed from the request JSON
DATA_START_DATE: the start date and time of the data to use in the dataset; use RFC3339 UTC "Zulu" format (for example, 2014-10-02T15:01:23Z)
DATA_END_DATE: the end date and time of the data to use in the dataset; use RFC3339 UTC "Zulu" format (for example, 2014-10-02T15:01:23Z)
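
The format rules for DATASET_ID and the two timestamps can be checked locally before sending a request. The following is a minimal sketch, not part of the gcloud CLI or the AML AI API: the function names is_valid_dataset_id and is_zulu_timestamp are hypothetical, and the timestamp check deliberately accepts only the whole-second form shown in the examples above, so it rejects RFC3339 values with fractional seconds.

```shell
# Hypothetical helpers (assumptions, not part of any Google tooling).

# Succeeds if the ID is non-empty and uses only lowercase letters,
# numbers, dashes, and underscores. The characters are listed explicitly
# so that the check does not depend on locale-specific ranges.
is_valid_dataset_id() {
  case "$1" in
    '') return 1 ;;
    *[!abcdefghijklmnopqrstuvwxyz0123456789_-]*) return 1 ;;
    *) return 0 ;;
  esac
}

# Succeeds if the value has the shape YYYY-MM-DDThh:mm:ssZ.
# This checks the layout only; it does not verify that the date exists.
is_zulu_timestamp() {
  case "$1" in
    [0-9][0-9][0-9][0-9]-[0-9][0-9]-[0-9][0-9]T[0-9][0-9]:[0-9][0-9]:[0-9][0-9]Z) return 0 ;;
    *) return 1 ;;
  esac
}
```

For example, is_valid_dataset_id "train_jan2018_apr2020" succeeds, while is_valid_dataset_id "Train-2020" fails because of the uppercase letter.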

Request JSON body:

{
  "tableSpecs": {
    "party": "bq://PROJECT_ID.BQ_INPUT_DATASET_NAME.PARTY_TABLE",
    "account_party_link": "bq://PROJECT_ID.BQ_INPUT_DATASET_NAME.ACCOUNT_PARTY_LINK_TABLE",
    "transaction": "bq://PROJECT_ID.BQ_INPUT_DATASET_NAME.TRANSACTION_TABLE",
    "risk_case_event": "bq://PROJECT_ID.BQ_INPUT_DATASET_NAME.RISK_CASE_EVENT_TABLE",
    "party_supplementary_data": "bq://PROJECT_ID.BQ_INPUT_DATASET_NAME.PARTY_SUPPLEMENTARY_DATA"
  },
  "dateRange": {
    "startTime": "DATA_START_DATE",
    "endTime": "DATA_END_DATE"
  },
  "timeZone": {
    "id": "UTC"
  }
}
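
Because the PartySupplementaryData table is optional, it can help to generate the request body instead of editing it by hand. The following sketch shows one way to do that. The function name build_dataset_request and its argument order are hypothetical; the four required table names are read from shell variables named after the placeholders above. The function does not escape its inputs, so it assumes values that are safe to place inside a JSON string.

```shell
# Hypothetical helper (an assumption, not part of the gcloud CLI or the API).
# Prints a create-dataset request body to stdout.
# Usage: build_dataset_request PROJECT_ID BQ_INPUT_DATASET_NAME \
#          DATA_START_DATE DATA_END_DATE [PARTY_SUPPLEMENTARY_DATA]
# Reads PARTY_TABLE, ACCOUNT_PARTY_LINK_TABLE, TRANSACTION_TABLE, and
# RISK_CASE_EVENT_TABLE from the environment.
build_dataset_request() {
  bdr_prefix="bq://$1.$2"
  bdr_optional=""
  # Add the optional table entry only when a fifth argument is given.
  if [ -n "${5:-}" ]; then
    bdr_optional=",
    \"party_supplementary_data\": \"$bdr_prefix.$5\""
  fi
  cat << EOF
{
  "tableSpecs": {
    "party": "$bdr_prefix.$PARTY_TABLE",
    "account_party_link": "$bdr_prefix.$ACCOUNT_PARTY_LINK_TABLE",
    "transaction": "$bdr_prefix.$TRANSACTION_TABLE",
    "risk_case_event": "$bdr_prefix.$RISK_CASE_EVENT_TABLE"$bdr_optional
  },
  "dateRange": {
    "startTime": "$3",
    "endTime": "$4"
  },
  "timeZone": {
    "id": "UTC"
  }
}
EOF
}
```

With the four table variables set, running build_dataset_request PROJECT_ID BQ_INPUT_DATASET_NAME DATA_START_DATE DATA_END_DATE > request.json writes a body without the optional table; passing a fifth argument adds the party_supplementary_data entry.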

To send your request, choose one of these options:

curl

Note: The following command assumes that you have logged in to the gcloud CLI with your user account by running gcloud init or gcloud auth login, or by using Cloud Shell, which automatically logs you into the gcloud CLI. You can check the currently active account by running gcloud auth list.

Save the request body in a file named request.json. Run the following command in the terminal to create or overwrite this file in the current directory:

cat > request.json << 'EOF'
{
  "tableSpecs": {
    "party": "bq://PROJECT_ID.BQ_INPUT_DATASET_NAME.PARTY_TABLE",
    "account_party_link": "bq://PROJECT_ID.BQ_INPUT_DATASET_NAME.ACCOUNT_PARTY_LINK_TABLE",
    "transaction": "bq://PROJECT_ID.BQ_INPUT_DATASET_NAME.TRANSACTION_TABLE",
    "risk_case_event": "bq://PROJECT_ID.BQ_INPUT_DATASET_NAME.RISK_CASE_EVENT_TABLE",
    "party_supplementary_data": "bq://PROJECT_ID.BQ_INPUT_DATASET_NAME.PARTY_SUPPLEMENTARY_DATA"
  },
  "dateRange": {
    "startTime": "DATA_START_DATE",
    "endTime": "DATA_END_DATE"
  },
  "timeZone": {
    "id": "UTC"
  }
}
EOF

Then execute the following command to send your REST request:

curl -X POST \
-H "Authorization: Bearer $(gcloud auth print-access-token)" \
-H "Content-Type: application/json; charset=utf-8" \
-d @request.json \
"https://financialservices.googleapis.com/v1/projects/PROJECT_ID/locations/LOCATION/instances/INSTANCE_ID/datasets?dataset_id=DATASET_ID"

PowerShell

Note: The following command assumes that you have logged in to the gcloud CLI with your user account by running gcloud init or gcloud auth login. You can check the currently active account by running gcloud auth list.

Save the request body in a file named request.json. Run the following command in the terminal to create or overwrite this file in the current directory:

@'
{
  "tableSpecs": {
    "party": "bq://PROJECT_ID.BQ_INPUT_DATASET_NAME.PARTY_TABLE",
    "account_party_link": "bq://PROJECT_ID.BQ_INPUT_DATASET_NAME.ACCOUNT_PARTY_LINK_TABLE",
    "transaction": "bq://PROJECT_ID.BQ_INPUT_DATASET_NAME.TRANSACTION_TABLE",
    "risk_case_event": "bq://PROJECT_ID.BQ_INPUT_DATASET_NAME.RISK_CASE_EVENT_TABLE",
    "party_supplementary_data": "bq://PROJECT_ID.BQ_INPUT_DATASET_NAME.PARTY_SUPPLEMENTARY_DATA"
  },
  "dateRange": {
    "startTime": "DATA_START_DATE",
    "endTime": "DATA_END_DATE"
  },
  "timeZone": {
    "id": "UTC"
  }
}
'@  | Out-File -FilePath request.json -Encoding utf8

Then execute the following command to send your REST request:

$cred = gcloud auth print-access-token
$headers = @{ "Authorization" = "Bearer $cred" }

Invoke-WebRequest `
-Method POST `
-Headers $headers `
-ContentType: "application/json; charset=utf-8" `
-InFile request.json `
-Uri "https://financialservices.googleapis.com/v1/projects/PROJECT_ID/locations/LOCATION/instances/INSTANCE_ID/datasets?dataset_id=DATASET_ID" | Select-Object -Expand Content

You should receive a JSON response similar to the following:

{
  "name": "projects/PROJECT_ID/locations/LOCATION/operations/OPERATION_ID",
  "metadata": {
    "@type": "type.googleapis.com/google.cloud.financialservices.v1.OperationMetadata",
    "createTime": CREATE_TIME,
    "target": "projects/PROJECT_ID/locations/LOCATION/instances/INSTANCE_ID/datasets/DATASET_ID",
    "verb": "create",
    "requestedCancellation": false,
    "apiVersion": "v1"
  },
  "done": false
}

Copy the returned OPERATION_ID to use in the next section.
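
Instead of copying the value by hand, the OPERATION_ID can be taken from the "name" field of the response, where it is the last path segment. The following is a rough sketch with hypothetical function names. The first function is a plain text match that assumes the response is formatted one field per line; it is not a JSON parser.

```shell
# Hypothetical helpers (assumptions, not part of any Google tooling).

# Reads a create response on stdin and prints the value of the first
# "name" field it finds. Assumes the field appears on a single line.
operation_name_from_response() {
  sed -n 's/.*"name": *"\([^"]*\)".*/\1/p' | head -n 1
}

# Prints the last path segment of an operation name, which is the
# OPERATION_ID.
operation_id_from_name() {
  printf '%s\n' "${1##*/}"
}
```

For example, if the curl output is saved to a file named response.json (a hypothetical file name), operation_id_from_name "$(operation_name_from_response < response.json)" prints the ID to use in the next section.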

Check for the result

Use the projects.locations.operations.get method to check if the dataset has been created. If the response contains "done": false, repeat the command until the response contains "done": true. These operations can take a few minutes to several hours to complete.

Permissions required for this task

To perform this task, you must have been granted the following permissions:

Permissions

financialservices.operations.get
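
The repeat-until-done check can be scripted. The following is a minimal sketch under stated assumptions, not an official client: the function name wait_for_operation is hypothetical, the GET URL is assumed to be the API base URL used in the create request followed by the operation name from the create response, the 60-second default interval is arbitrary, and the "done" check is a text match rather than real JSON parsing.

```shell
# Hypothetical helper (an assumption, not part of the gcloud CLI).
# Polls an operation until the response reports "done": true, then
# prints the final response.
# Usage: wait_for_operation PROJECT_ID LOCATION OPERATION_ID [INTERVAL_SECONDS]
wait_for_operation() {
  wfo_url="https://financialservices.googleapis.com/v1/projects/$1/locations/$2/operations/$3"
  wfo_interval="${4:-60}"
  while :; do
    # --fail makes curl exit non-zero on HTTP errors, which stops the loop.
    wfo_response=$(curl -sS --fail \
      -H "Authorization: Bearer $(gcloud auth print-access-token)" \
      "$wfo_url") || return 1
    if printf '%s\n' "$wfo_response" | grep -Eq '"done":[[:space:]]*true'; then
      printf '%s\n' "$wfo_response"
      return 0
    fi
    sleep "$wfo_interval"
  done
}
```

A finished operation is not necessarily a successful one, so inspect the printed response before using the dataset. Because creation can take hours, a longer interval may be more appropriate than the default used here.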
