Manage Storage Insights dataset configurations

This page shows you how to manage Storage Insights dataset configurations to control the source, scope, and retention of your data. You'll learn how to view, list, update, and delete configurations, as well as how to view, query, and unlink your linked datasets.

Get the required roles

To get the permissions that you need to manage dataset configurations, ask your administrator to grant you the following IAM roles on your source projects:

For more information about granting roles, see Manage access to projects, folders, and organizations.

These predefined roles contain the permissions required to manage dataset configurations. The exact permissions that are required are listed in the following Required permissions section:

Required permissions

The following permissions are required to manage dataset configurations:

  • View and list dataset configurations:
    • storageinsights.datasetConfigs.get
    • storageinsights.datasetConfigs.list
    • storage.buckets.getObjectInsights
  • Update and delete dataset configurations:
    • storageinsights.datasetConfigs.update
    • storageinsights.datasetConfigs.delete
    • storage.buckets.getObjectInsights
  • Unlink a BigQuery dataset: storageinsights.datasetConfigs.unlinkDataset
  • Query BigQuery linked datasets: bigquery.jobs.create or bigquery.jobs.*

You might also be able to get these permissions with custom roles or other predefined roles.

View and query linked datasets

To view and query linked datasets, follow these steps:

  1. In the Google Cloud console, go to the Cloud Storage Storage Insights page.

    The page lists the dataset configurations in your project.

  2. Click the BigQuery linked dataset for the dataset configuration you want to view.

    The Google Cloud console displays the BigQuery linked dataset. For information about the metadata schema of the dataset, see Dataset schema of metadata.

  3. Query the tables and views in your linked dataset in the same way you would query any other BigQuery table.
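To run such a query from a script, a minimal sketch like the following could call the bq command-line tool with standard SQL. It assumes the linked dataset exposes a view named bucket_attributes_latest_snapshot_view; check Dataset schema of metadata for the views your dataset actually contains. The function names run and query_linked_dataset and the DRY_RUN switch are hypothetical helpers, not part of any Google tool.

```shell
#!/bin/sh
# Hypothetical helper: with DRY_RUN=1, print the command instead of running it.
run() {
  if [ "${DRY_RUN:-0}" = "1" ]; then
    printf '%s\n' "$*"
  else
    "$@"
  fi
}

# Preview rows from a linked dataset with standard SQL.
# $1: linked dataset as PROJECT.DATASET (for example, my_project.my_dataset)
# $2: optional view name. ASSUMPTION: the default view name below exists in
#     your linked dataset; confirm it against the dataset schema.
query_linked_dataset() {
  dataset="$1"
  view="${2:-bucket_attributes_latest_snapshot_view}"
  # Backticks quote the table reference, which is required when the
  # project ID contains hyphens.
  run bq query --use_legacy_sql=false \
    "SELECT * FROM \`$dataset.$view\` LIMIT 10"
}
```

For example, running query_linked_dataset my_project.my_dataset executes the query; setting DRY_RUN=1 first only prints the bq command.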

Unlink a dataset

To stop the dataset configuration from publishing to the BigQuery dataset, unlink the dataset. To unlink a dataset, complete the following steps:

Console

  1. In the Google Cloud console, go to the Cloud Storage Storage Insights page.

  2. Click the name of the dataset configuration that generated the dataset you want to unlink.

  3. In the BigQuery linked dataset section, click Unlink dataset.

Command line

  1. To unlink the dataset, run the gcloud storage insights dataset-configs delete-link command:

    gcloud storage insights dataset-configs delete-link DATASET_CONFIG_ID \
      --location=LOCATION

    Replace:

    • DATASET_CONFIG_ID with the name of the dataset configuration that generated the dataset you want to unlink.

    • LOCATION with the location of your dataset and dataset configuration. For example, us-central1.

    You can also specify a full dataset configuration path. For example:

    gcloud storage insights dataset-configs delete-link projects/DESTINATION_PROJECT_ID/locations/LOCATION/datasetConfigs/DATASET_CONFIG_ID

    Replace:

    • DESTINATION_PROJECT_ID with the ID of the project that contains the dataset configuration. For more information about project IDs, see Creating and managing projects.

    • DATASET_CONFIG_ID with the name of the dataset configuration that generated the dataset you want to unlink.

    • LOCATION with the location of your dataset and dataset configuration. For example, us-central1.
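If you unlink datasets from scripts, a small wrapper like the following sketch can assemble the full dataset configuration path shown above and pass it to delete-link. The helper names run, dataset_config_path, and unlink_dataset and the DRY_RUN switch are hypothetical.

```shell
#!/bin/sh
# Hypothetical helper: with DRY_RUN=1, print the command instead of running it.
run() {
  if [ "${DRY_RUN:-0}" = "1" ]; then
    printf '%s\n' "$*"
  else
    "$@"
  fi
}

# Build the full dataset configuration path.
# $1: DESTINATION_PROJECT_ID, $2: LOCATION, $3: DATASET_CONFIG_ID
dataset_config_path() {
  printf 'projects/%s/locations/%s/datasetConfigs/%s\n' "$1" "$2" "$3"
}

# Unlink the BigQuery dataset generated by a dataset configuration.
# Same arguments as dataset_config_path.
unlink_dataset() {
  run gcloud storage insights dataset-configs delete-link \
    "$(dataset_config_path "$1" "$2" "$3")"
}
```

Passing the full path means the command does not need a separate --location flag.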

JSON API

  1. Have the gcloud CLI installed and initialized, which lets you generate an access token for the Authorization header.

  2. Create a JSON file that contains the following information:

    {
      "name": "DATASET_NAME"
    }

    Replace:

    • DATASET_NAME with the name of the dataset you want to unlink. For example, my_project.my_dataset276daa7e_2991_4f4f_b9d4_e354b48426a2.

  3. Use cURL to call the JSON API with an unlinkDataset DatasetConfig request:

    curl --request POST --data-binary @JSON_FILE_NAME \
      "https://storageinsights.googleapis.com/v1/projects/PROJECT_ID/locations/LOCATION/datasetConfigs/DATASET_CONFIG_ID:unlinkDataset" \
      --header "Authorization: Bearer $(gcloud auth print-access-token --impersonate-service-account=SERVICE_ACCOUNT)" \
      --header "Accept: application/json" \
      --header "Content-Type: application/json"

    Replace:

    • JSON_FILE_NAME with the path to the JSON file you created in the previous step.

    • PROJECT_ID with the ID of the project that the dataset configuration belongs to.

    • LOCATION with the location of the dataset and dataset configuration. For example, us-central1.

    • DATASET_CONFIG_ID with the name of the dataset configuration that generated the dataset you want to unlink.

    • SERVICE_ACCOUNT with the service account. For example, test-service-account@test-project.iam.gserviceaccount.com.
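The JSON API steps above can be combined into one shell function. This is a sketch under assumptions: the helper names are hypothetical, a temporary file created with mktemp stands in for JSON_FILE_NAME, and the request body is written with printf, so no JSON escaping is done (dataset names such as my_project.my_dataset contain no quotes or backslashes).

```shell
#!/bin/sh
# Hypothetical helpers for the unlinkDataset JSON API call.

# Print the :unlinkDataset URL.
# $1: PROJECT_ID, $2: LOCATION, $3: DATASET_CONFIG_ID
unlink_url() {
  printf 'https://storageinsights.googleapis.com/v1/projects/%s/locations/%s/datasetConfigs/%s:unlinkDataset\n' \
    "$1" "$2" "$3"
}

# Write the request body to file $1 for dataset $2.
# printf does no JSON escaping; this assumes the name has no quotes or backslashes.
write_unlink_body() {
  printf '{"name": "%s"}\n' "$2" > "$1"
}

# Send the request. Requires curl and an initialized gcloud CLI.
# $1: PROJECT_ID, $2: LOCATION, $3: DATASET_CONFIG_ID,
# $4: DATASET_NAME, $5: SERVICE_ACCOUNT
unlink_dataset_rest() {
  body_file="$(mktemp)" || return 1
  write_unlink_body "$body_file" "$4"
  curl --request POST --data-binary "@$body_file" \
    "$(unlink_url "$1" "$2" "$3")" \
    --header "Authorization: Bearer $(gcloud auth print-access-token --impersonate-service-account="$5")" \
    --header "Accept: application/json" \
    --header "Content-Type: application/json"
  status=$?
  # Remove the temporary body file and pass curl's exit status through.
  rm -f "$body_file"
  return "$status"
}
```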

View a dataset configuration

To view a dataset configuration, complete the following steps:

Console

  1. In the Google Cloud console, go to the Cloud Storage Storage Insights page.

  2. Click the name of the dataset configuration you want to view.

    The dataset configuration details are displayed.

Command line

  1. To describe a dataset configuration, run the gcloud storage insights dataset-configs describe command:

    gcloud storage insights dataset-configs describe DATASET_CONFIG_ID \
      --location=LOCATION

    Replace:

    • DATASET_CONFIG_ID with the name of the dataset configuration.

    • LOCATION with the location of the dataset and dataset configuration.

    You can also specify a full dataset configuration path. For example:

    gcloud storage insights dataset-configs describe projects/DESTINATION_PROJECT_ID/locations/LOCATION/datasetConfigs/DATASET_CONFIG_ID

    Replace:

    • DESTINATION_PROJECT_ID with the ID of the project that contains the dataset configuration. For more information about project IDs, see Creating and managing projects.

    • DATASET_CONFIG_ID with the name of the dataset configuration that generated the dataset you want to view.

    • LOCATION with the location of your dataset and dataset configuration. For example, us-central1.
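When a script needs the configuration details in machine-readable form, gcloud's global --format=json flag can be added to the describe command. The sketch below accepts either a full path or an ID plus a location, mirroring the two forms above; the names run and describe_config and the DRY_RUN switch are hypothetical.

```shell
#!/bin/sh
# Hypothetical helper: with DRY_RUN=1, print the command instead of running it.
run() {
  if [ "${DRY_RUN:-0}" = "1" ]; then
    printf '%s\n' "$*"
  else
    "$@"
  fi
}

# Describe a dataset configuration as JSON.
# Either: describe_config projects/PROJECT/locations/LOCATION/datasetConfigs/ID
# or:     describe_config DATASET_CONFIG_ID LOCATION
describe_config() {
  case "$1" in
    projects/*)
      # A full path already encodes the location.
      run gcloud storage insights dataset-configs describe "$1" --format=json ;;
    *)
      run gcloud storage insights dataset-configs describe "$1" \
        --location="$2" --format=json ;;
  esac
}
```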

JSON API

  1. Have the gcloud CLI installed and initialized, which lets you generate an access token for the Authorization header.

  2. Use cURL to call the JSON API with a Get DatasetConfig request:

    curl -X GET \
      "https://storageinsights.googleapis.com/v1/projects/PROJECT_ID/locations/LOCATION/datasetConfigs/DATASET_CONFIG_ID" \
      --header "Authorization: Bearer $(gcloud auth print-access-token --impersonate-service-account=SERVICE_ACCOUNT)" \
      --header "Accept: application/json" \
      --header "Content-Type: application/json"

    Replace:

    • PROJECT_ID with the ID of the project that the dataset configuration belongs to.

    • LOCATION with the location of the dataset and dataset configuration. For example, us-central1.

    • DATASET_CONFIG_ID with the name of the dataset configuration.

    • SERVICE_ACCOUNT with the service account. For example, test-service-account@test-project.iam.gserviceaccount.com.
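The following sketch wraps the Get request in a shell function. The helper names run, access_token, and get_config_rest and the DRY_RUN switch are hypothetical; in dry-run mode the access token is replaced by the literal string ACCESS_TOKEN so that no credentials are needed. Because a GET request carries no body, the sketch omits the Content-Type header.

```shell
#!/bin/sh
# Hypothetical helper: with DRY_RUN=1, print the command instead of running it.
run() {
  if [ "${DRY_RUN:-0}" = "1" ]; then
    printf '%s\n' "$*"
  else
    "$@"
  fi
}

# Print an access token for service account $1; skip gcloud in dry-run mode.
access_token() {
  if [ "${DRY_RUN:-0}" = "1" ]; then
    printf 'ACCESS_TOKEN\n'
  else
    gcloud auth print-access-token --impersonate-service-account="$1"
  fi
}

# Fetch one dataset configuration.
# $1: PROJECT_ID, $2: LOCATION, $3: DATASET_CONFIG_ID, $4: SERVICE_ACCOUNT
get_config_rest() {
  run curl -X GET \
    "https://storageinsights.googleapis.com/v1/projects/$1/locations/$2/datasetConfigs/$3" \
    --header "Authorization: Bearer $(access_token "$4")" \
    --header "Accept: application/json"
}
```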

List dataset configurations

To list the dataset configurations in a project, complete the following steps:

Console

  1. In the Google Cloud console, go to the Cloud Storage Storage Insights page.

    The list of dataset configurations is displayed.

Command line

  1. To list dataset configurations in a project, run the gcloud storage insights dataset-configs list command:

    gcloud storage insights dataset-configs list --location=LOCATION

    Replace:

    • LOCATION with the location of the dataset and dataset configuration. For example, us-central1.

    You can use the following optional flags to specify the behavior of the listing call:

    • Use --page-size to specify the maximum number of results to return per page.

    • Use --filter=FILTER to filter results. For more information about how to use the --filter flag, run gcloud topic filters.

    • Use --sort-by=SORT_BY_VALUE to specify a comma-separated list of resource field key names to sort by. For example, --sort-by=name.
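As a sketch, a wrapper can append the optional --page-size and --sort-by flags only when values are supplied, so the plain listing call stays unchanged otherwise. The names run and list_configs and the DRY_RUN switch are hypothetical.

```shell
#!/bin/sh
# Hypothetical helper: with DRY_RUN=1, print the command instead of running it.
run() {
  if [ "${DRY_RUN:-0}" = "1" ]; then
    printf '%s\n' "$*"
  else
    "$@"
  fi
}

# List dataset configurations, adding optional flags only when given.
# $1: LOCATION, $2: optional page size, $3: optional sort key (for example, name)
list_configs() {
  location="$1"
  page_size="${2:-}"
  sort_by="${3:-}"
  # Rebuild the positional parameters as the command line to run.
  set -- gcloud storage insights dataset-configs list --location="$location"
  if [ -n "$page_size" ]; then
    set -- "$@" --page-size="$page_size"
  fi
  if [ -n "$sort_by" ]; then
    set -- "$@" --sort-by="$sort_by"
  fi
  run "$@"
}
```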

JSON API

  1. Have the gcloud CLI installed and initialized, which lets you generate an access token for the Authorization header.

  2. Use cURL to call the JSON API with a List DatasetConfigs request:

    curl -X GET \
      "https://storageinsights.googleapis.com/v1/projects/PROJECT_ID/locations/LOCATION/datasetConfigs" \
      --header "Authorization: Bearer $(gcloud auth print-access-token --impersonate-service-account=SERVICE_ACCOUNT)" \
      --header "Accept: application/json" \
      --header "Content-Type: application/json"

    Replace:

    • PROJECT_ID with the ID of the project whose dataset configurations you want to list.

    • LOCATION with the location of the dataset and dataset configuration. For example, us-central1.

    • SERVICE_ACCOUNT with the service account. For example, test-service-account@test-project.iam.gserviceaccount.com.
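To page through a long list with the JSON API, the request URL needs paging parameters. The sketch below assumes that the List method accepts the standard Google API pageSize and pageToken query parameters and returns a nextPageToken field; confirm this in the REST reference before relying on it. The function name list_url is hypothetical, and the page token is not URL-encoded.

```shell
#!/bin/sh
# Build the List DatasetConfigs URL.
# ASSUMPTION: the method accepts the standard pageSize and pageToken parameters.
# $1: PROJECT_ID, $2: LOCATION, $3: optional page size, $4: optional page token
list_url() {
  url="https://storageinsights.googleapis.com/v1/projects/$1/locations/$2/datasetConfigs"
  sep='?'
  if [ -n "${3:-}" ]; then
    url="${url}${sep}pageSize=$3"
    sep='&'
  fi
  if [ -n "${4:-}" ]; then
    url="${url}${sep}pageToken=$4"
  fi
  printf '%s\n' "$url"
}
```

To fetch the next page, pass the nextPageToken value from the previous response as the fourth argument and send the resulting URL with the same curl headers as above.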

Update a dataset configuration

To update a dataset configuration, complete the following steps:

Console

  1. In the Google Cloud console, go to the Cloud Storage Storage Insights page.

  2. Click the name of the dataset configuration you want to update.

  3. In the Dataset configuration tab, click Edit to update the fields.

Command line

  1. To update a dataset configuration, run the gcloud storage insights dataset-configs update command:

    gcloud storage insights dataset-configs update DATASET_CONFIG_ID \
      --location=LOCATION

    Replace:

    • DATASET_CONFIG_ID with the name of the dataset configuration.

    • LOCATION with the location of the dataset and dataset configuration.

    Use the following flags to update properties of the dataset configuration:

    • Use --skip-verification to skip checks and failures from the verification process, which includes checks for required IAM permissions. If used, some or all buckets might be excluded from the dataset.

    • Use --retention-period-days=DAYS to specify the rolling number of days of data to capture in the dataset snapshot. For example, 90.

    • Use --activity-data-retention-period-days=ACTIVITY_RETENTION_PERIOD_DAYS to specify the retention period for the activity data in the dataset. By default, activity data is included in the dataset and inherits the retention period of the dataset. To override the dataset retention period, specify the number of days to retain activity data. To exclude activity data, set ACTIVITY_RETENTION_PERIOD_DAYS to 0.

    • Use --description=DESCRIPTION to write a description for the dataset configuration.

    • Use --organization=ORGANIZATION_ID to specify the organization ID of the source project. If unspecified, it defaults to the source project's organization ID.
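For example, to change only the retention settings, a wrapper along the lines of this sketch passes --activity-data-retention-period-days only when a value is given (0 excludes activity data). The names run and update_retention and the DRY_RUN switch are hypothetical.

```shell
#!/bin/sh
# Hypothetical helper: with DRY_RUN=1, print the command instead of running it.
run() {
  if [ "${DRY_RUN:-0}" = "1" ]; then
    printf '%s\n' "$*"
  else
    "$@"
  fi
}

# Update retention settings of a dataset configuration.
# $1: DATASET_CONFIG_ID, $2: LOCATION, $3: retention days,
# $4: optional activity data retention days (0 excludes activity data)
update_retention() {
  config="$1"
  location="$2"
  days="$3"
  activity_days="${4:-}"
  set -- gcloud storage insights dataset-configs update "$config" \
    --location="$location" --retention-period-days="$days"
  # Add the activity flag only when the caller supplied a value.
  if [ -n "$activity_days" ]; then
    set -- "$@" --activity-data-retention-period-days="$activity_days"
  fi
  run "$@"
}
```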

JSON API

  1. Have the gcloud CLI installed and initialized, which lets you generate an access token for the Authorization header.

  2. Create a JSON file that contains the following optional information:

    {
      "organization_number": "ORGANIZATION_ID",
      "source_projects": {
        "project_numbers": ["PROJECT_NUMBERS"]
      },
      "retention_period_days": "RETENTION_PERIOD",
      "activityDataRetentionPeriodDays": "ACTIVITY_DATA_RETENTION_PERIOD_DAYS"
    }

    Replace:

    • ORGANIZATION_ID with the resource ID of the organization to which the source projects belong. If unspecified, it defaults to the source project's organization ID.

    • PROJECT_NUMBERS with the project numbers to include in the dataset. Specify one or more project numbers as a JSON list.

    • RETENTION_PERIOD with the rolling number of days of data to capture in the dataset snapshot. For example, 90.

    • ACTIVITY_DATA_RETENTION_PERIOD_DAYS with the number of days of activity data to capture in the dataset snapshot. By default, activity data is included in the dataset and inherits the retention period of the dataset. To override the dataset retention period, specify the number of days to retain activity data. To exclude activity data, set ACTIVITY_DATA_RETENTION_PERIOD_DAYS to 0.

  3. To update the dataset configuration, use cURL to call the JSON API with a Patch DatasetConfig request:

    curl -X PATCH --data-binary @JSON_FILE_NAME \
      "https://storageinsights.googleapis.com/v1/projects/PROJECT_ID/locations/LOCATION/datasetConfigs/DATASET_CONFIG_ID?updateMask=UPDATE_MASK" \
      --header "Authorization: Bearer $(gcloud auth print-access-token --impersonate-service-account=SERVICE_ACCOUNT)" \
      --header "Accept: application/json" \
      --header "Content-Type: application/json"

    Replace:

    • JSON_FILE_NAME with the path to the JSON file you created in the previous step.

    • PROJECT_ID with the ID of the project that the dataset configuration belongs to.

    • LOCATION with the location of the dataset and dataset configuration. For example, us-central1.

    • DATASET_CONFIG_ID with the name of the dataset configuration you want to update.

    • UPDATE_MASK with the comma-separated list of field names that this request updates. The fields use the FieldMask format and are part of the DatasetConfig resource.

    • SERVICE_ACCOUNT with the service account. For example, test-service-account@test-project.iam.gserviceaccount.com.
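Writing the body and the update mask by hand is error-prone when only some fields change, because the mask should name exactly the fields the body sets. The following sketch uses two hypothetical functions: update_body emits only the fields you pass, reusing the field names from the JSON body above and writing project numbers as a JSON list of strings, and patch_url joins field paths into the updateMask parameter. The field paths used in the example below are assumptions; confirm them against the DatasetConfig resource reference. No JSON escaping is done, so pass only numeric values.

```shell
#!/bin/sh
# Print a PATCH body containing only the fields that are passed.
# $1: retention days (may be empty), $2: activity retention days (may be empty),
# remaining args: project numbers for source_projects (optional).
update_body() {
  retention="$1"
  activity="$2"
  shift 2
  body=""
  if [ -n "$retention" ]; then
    body="\"retention_period_days\": $retention"
  fi
  if [ -n "$activity" ]; then
    body="${body:+$body, }\"activityDataRetentionPeriodDays\": $activity"
  fi
  if [ "$#" -gt 0 ]; then
    list=""
    for number in "$@"; do
      list="${list:+$list, }\"$number\""
    done
    body="${body:+$body, }\"source_projects\": {\"project_numbers\": [$list]}"
  fi
  printf '{%s}\n' "$body"
}

# Print the Patch URL with a comma-separated updateMask.
# $1: PROJECT_ID, $2: LOCATION, $3: DATASET_CONFIG_ID, remaining args: field paths.
patch_url() {
  base="https://storageinsights.googleapis.com/v1/projects/$1/locations/$2/datasetConfigs/$3"
  shift 3
  mask=""
  for field in "$@"; do
    mask="${mask:+$mask,}$field"
  done
  printf '%s?updateMask=%s\n' "$base" "$mask"
}
```

For example, update_body 90 '' > body.json writes a body that changes only the retention period, and patch_url PROJECT_ID LOCATION DATASET_CONFIG_ID retention_period_days prints a matching URL to use with the curl command above.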

Delete a dataset configuration

To delete a dataset configuration, complete the following steps:

Console

  1. In the Google Cloud console, go to the Cloud Storage Storage Insights page.

  2. Click the name of the dataset configuration you want to delete.

  3. Click Delete.

Command line

  1. To delete a dataset configuration, run the gcloud storage insights dataset-configs delete command:

    gcloud storage insights dataset-configs delete DATASET_CONFIG_ID \
      --location=LOCATION

    Replace:

    • DATASET_CONFIG_ID with the name of the dataset configuration you want to delete.

    • LOCATION with the location of the dataset and dataset configuration. For example, us-central1.

    Use the following optional flag when you delete a dataset configuration:

    • Use --auto-delete-link to unlink the dataset that was generated from the dataset configuration you want to delete. You must unlink a dataset before you can delete the dataset configuration that generated it.

    You can also specify a full dataset configuration path. For example:

    gcloud storage insights dataset-configs delete projects/DESTINATION_PROJECT_ID/locations/LOCATION/datasetConfigs/DATASET_CONFIG_ID
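Because a dataset must be unlinked before its configuration can be deleted, a scripted deletion would typically pass --auto-delete-link. A minimal sketch, with hypothetical function names run and delete_config and a hypothetical DRY_RUN switch:

```shell
#!/bin/sh
# Hypothetical helper: with DRY_RUN=1, print the command instead of running it.
run() {
  if [ "${DRY_RUN:-0}" = "1" ]; then
    printf '%s\n' "$*"
  else
    "$@"
  fi
}

# Delete a dataset configuration, unlinking its BigQuery dataset in the same call.
# $1: DATASET_CONFIG_ID, $2: LOCATION
delete_config() {
  run gcloud storage insights dataset-configs delete "$1" \
    --location="$2" --auto-delete-link
}
```

Running with DRY_RUN=1 first is a cheap way to check the exact command before deleting anything.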

JSON API

  1. Have the gcloud CLI installed and initialized, which lets you generate an access token for the Authorization header.

  2. Use cURL to call the JSON API with a Delete DatasetConfig request:

    curl -X DELETE \
      "https://storageinsights.googleapis.com/v1/projects/PROJECT_ID/locations/LOCATION/datasetConfigs/DATASET_CONFIG_ID" \
      --header "Authorization: Bearer $(gcloud auth print-access-token --impersonate-service-account=SERVICE_ACCOUNT)" \
      --header "Accept: application/json" \
      --header "Content-Type: application/json"

    Replace:

    • PROJECT_ID with the ID of the project that the dataset configuration belongs to.

    • LOCATION with the location of the dataset and dataset configuration. For example, us-central1.

    • DATASET_CONFIG_ID with the name of the dataset configuration you want to delete.

    • SERVICE_ACCOUNT with the service account. For example, test-service-account@test-project.iam.gserviceaccount.com.
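A sketch of the Delete request as a shell function. It adds curl's --fail, --silent, and --show-error options so that an HTTP error response, such as one returned when the configuration still has a linked dataset, makes the command exit with a non-zero status that a calling script can check. The helper names run, access_token, and delete_config_rest and the DRY_RUN switch are hypothetical; in dry-run mode the token is the literal string ACCESS_TOKEN.

```shell
#!/bin/sh
# Hypothetical helper: with DRY_RUN=1, print the command instead of running it.
run() {
  if [ "${DRY_RUN:-0}" = "1" ]; then
    printf '%s\n' "$*"
  else
    "$@"
  fi
}

# Print an access token for service account $1; skip gcloud in dry-run mode.
access_token() {
  if [ "${DRY_RUN:-0}" = "1" ]; then
    printf 'ACCESS_TOKEN\n'
  else
    gcloud auth print-access-token --impersonate-service-account="$1"
  fi
}

# Delete one dataset configuration; returns non-zero on HTTP errors.
# $1: PROJECT_ID, $2: LOCATION, $3: DATASET_CONFIG_ID, $4: SERVICE_ACCOUNT
delete_config_rest() {
  run curl --fail --silent --show-error -X DELETE \
    "https://storageinsights.googleapis.com/v1/projects/$1/locations/$2/datasetConfigs/$3" \
    --header "Authorization: Bearer $(access_token "$4")" \
    --header "Accept: application/json"
}
```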
