This page shows you how to manage Storage Insights dataset configurations to control the source, scope, and retention of your data. You'll learn how to view, list, update, and delete configurations, as well as how to view, query, and unlink your linked datasets.
Get the required roles
To get the permissions that you need to manage dataset configurations, ask your administrator to grant you the following IAM roles on your source projects:
- To list, update, delete, and view dataset configurations: Storage Insights Admin (roles/storageinsights.admin)
- To view and unlink datasets:
  - Storage Insights Analyst (roles/storageinsights.analyst)
  - BigQuery Admin (roles/bigquery.admin)
- To delete linked datasets: BigQuery Admin (roles/bigquery.admin)
- To view and query datasets in BigQuery:
  - Storage Insights Viewer (roles/storageinsights.viewer)
  - BigQuery Job User (roles/bigquery.jobUser)
  - BigQuery Data Viewer (roles/bigquery.dataViewer)
For more information about granting roles, see Manage access to projects, folders, and organizations.
These predefined roles contain the permissions required to manage dataset configurations. To see the exact permissions that are required, see the following Required permissions section:
Required permissions
The following permissions are required to manage dataset configurations:
- View and list dataset configurations:
  - storageinsights.datasetConfigs.get
  - storageinsights.datasetConfigs.list
  - storage.buckets.getObjectInsights
- Update and delete dataset configurations:
  - storageinsights.datasetConfigs.update
  - storageinsights.datasetConfigs.delete
  - storage.buckets.getObjectInsights
- Unlink a BigQuery dataset: storageinsights.datasetConfigs.unlinkDataset
- Query BigQuery linked datasets: bigquery.jobs.create or bigquery.jobs.*
You might also be able to get these permissions with custom roles or other predefined roles.
View and query linked datasets
To view and query linked datasets, follow these steps:
- In the Google Cloud console, go to the Cloud Storage Storage Insights page.
  Your project shows a list of created dataset configurations.
- Click the BigQuery linked dataset for the dataset configuration you want to view.
  The Google Cloud console displays the BigQuery linked dataset. For information about the dataset schema of metadata, see Dataset schema of metadata.
- You can query tables and views in your linked datasets in the same way you would query any other BigQuery table.
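As a minimal sketch of the last step, the following shell helper builds a row-count query for one table or view in a linked dataset. The function name build_count_query and the example project, dataset, and view names are hypothetical placeholders, not names taken from this page; substitute the names shown in your own linked dataset. The commented-out command assumes that the bq command-line tool is installed and authenticated.

```shell
#!/bin/sh
# Sketch: build a GoogleSQL row-count query for a table or view in a linked dataset.
# All names passed in are placeholders that you supply.
build_count_query() {
  # $1: project ID, $2: linked dataset ID, $3: table or view name
  printf 'SELECT COUNT(*) AS row_count FROM `%s.%s.%s`' "$1" "$2" "$3"
}

# Print the query for inspection (hypothetical names).
build_count_query "my-project" "my_linked_dataset" "my_view"
echo

# To run it with the bq tool (assumed to be installed and authenticated):
# bq query --use_legacy_sql=false "$(build_count_query my-project my_linked_dataset my_view)"
```

Printing the query before running it makes it easier to confirm that the project, dataset, and view names match what the console displays.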
Unlink a dataset
To stop the dataset configuration from publishing to the BigQuery dataset, unlink the dataset. To unlink a dataset, complete the following steps:
Console
- In the Google Cloud console, go to the Cloud Storage Storage Insights page.
- Click the name of the dataset configuration that generated the dataset you want to unlink.
- In the BigQuery linked dataset section, click Unlink dataset.
Command line
- To unlink the dataset, run the gcloud storage insights dataset-configs delete-link command:
    gcloud storage insights dataset-configs delete-link DATASET_CONFIG_ID --location=LOCATION
  Replace:
  - DATASET_CONFIG_ID with the name of the dataset configuration that generated the dataset you want to unlink.
  - LOCATION with the location of your dataset and dataset configuration. For example, us-central1.
  You can also specify a full dataset configuration path. For example:
    gcloud storage insights dataset-configs delete-link projects/DESTINATION_PROJECT_ID/locations/LOCATION/datasetConfigs/DATASET_CONFIG_ID
  Replace:
  - DESTINATION_PROJECT_ID with the ID of the project that contains the dataset configuration. For more information about project IDs, see Creating and managing projects.
  - DATASET_CONFIG_ID with the name of the dataset configuration that generated the dataset you want to unlink.
  - LOCATION with the location of your dataset and dataset configuration. For example, us-central1.
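The full-path form can be assembled from its three parts. The following is a small sketch, assuming a POSIX shell; the function name dataset_config_path and the example values are hypothetical and only illustrate the path layout described above.

```shell
#!/bin/sh
# Sketch: assemble the full resource path accepted by the dataset-configs commands.
dataset_config_path() {
  # $1: project ID, $2: location, $3: dataset configuration name
  printf 'projects/%s/locations/%s/datasetConfigs/%s' "$1" "$2" "$3"
}

# Example with hypothetical values.
path="$(dataset_config_path "my-project" "us-central1" "my_config")"
echo "$path"

# The path can then be passed to the delete-link command, for example:
# gcloud storage insights dataset-configs delete-link "$path"
```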
JSON API
- Have gcloud CLI installed and initialized, which lets you generate an access token for the Authorization header.
- Create a JSON file that contains the following information:
    { "name": "DATASET_NAME" }
  Replace:
  - DATASET_NAME with the name of the dataset you want to unlink. For example, my_project.my_dataset276daa7e_2991_4f4f_b9d4_e354b48426a2.
- Use cURL to call the JSON API with an unlinkDataset DatasetConfig request:
    curl --request POST --data-binary @JSON_FILE_NAME \
      "https://storageinsights.googleapis.com/v1/projects/PROJECT_ID/locations/LOCATION/datasetConfigs/DATASET_CONFIG_ID:unlinkDataset?" \
      --header "Authorization: Bearer $(gcloud auth print-access-token --impersonate-service-account=SERVICE_ACCOUNT)" \
      --header "Accept: application/json" \
      --header "Content-Type: application/json"
  Replace:
  - JSON_FILE_NAME with the path to the JSON file you created in the previous step.
  - PROJECT_ID with the ID of the project that the dataset configuration belongs to.
  - LOCATION with the location of the dataset and dataset configuration. For example, us-central1.
  - DATASET_CONFIG_ID with the name of the dataset configuration that generated the dataset you want to unlink.
  - SERVICE_ACCOUNT with the service account. For example, test-service-account@test-project.iam.gserviceaccount.com.
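The two inputs of the unlink request, the JSON body file and the request URL, can be prepared with small helpers. This is a sketch under stated assumptions: the function names write_unlink_body and unlink_url and all example values are hypothetical, and the body writer does not escape special characters, so it only suits dataset names without quotes or backslashes.

```shell
#!/bin/sh
# Sketch: prepare the two inputs of the unlinkDataset request.

# Write the JSON request body. $1: dataset name, $2: output file path.
write_unlink_body() {
  printf '{ "name": "%s" }\n' "$1" > "$2"
}

# Build the request URL. $1: project ID, $2: location, $3: dataset configuration name.
unlink_url() {
  printf 'https://storageinsights.googleapis.com/v1/projects/%s/locations/%s/datasetConfigs/%s:unlinkDataset' "$1" "$2" "$3"
}

# Example with hypothetical values.
body_file="$(mktemp)"
write_unlink_body "my_project.my_dataset" "$body_file"
cat "$body_file"
unlink_url "my-project" "us-central1" "my_config"
echo
rm -f "$body_file"
```

The file path and URL produced this way correspond to JSON_FILE_NAME and the quoted URL in the cURL call.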
View a dataset configuration
To view a dataset configuration, complete the following steps:
Console
- In the Google Cloud console, go to the Cloud Storage Storage Insights page.
- Click the name of the dataset configuration you want to view.
  The dataset configuration details are displayed.
Command line
- To describe a dataset configuration, run the gcloud storage insights dataset-configs describe command:
    gcloud storage insights dataset-configs describe DATASET_CONFIG_ID \
      --location=LOCATION
  Replace:
  - DATASET_CONFIG_ID with the name of the dataset configuration.
  - LOCATION with the location of the dataset and dataset configuration.
  You can also specify a full dataset configuration path. For example:
    gcloud storage insights dataset-configs describe projects/DESTINATION_PROJECT_ID/locations/LOCATION/datasetConfigs/DATASET_CONFIG_ID
  Replace:
  - DESTINATION_PROJECT_ID with the ID of the project that contains the dataset configuration. For more information about project IDs, see Creating and managing projects.
  - DATASET_CONFIG_ID with the name of the dataset configuration that generated the dataset you want to view.
  - LOCATION with the location of your dataset and dataset configuration. For example, us-central1.
JSON API
- Have gcloud CLI installed and initialized, which lets you generate an access token for the Authorization header.
- Use cURL to call the JSON API with a Get DatasetConfig request:
    curl -X GET \
      "https://storageinsights.googleapis.com/v1/projects/PROJECT_ID/locations/LOCATION/datasetConfigs/DATASET_CONFIG_ID" \
      --header "Authorization: Bearer $(gcloud auth print-access-token --impersonate-service-account=SERVICE_ACCOUNT)" \
      --header "Accept: application/json" \
      --header "Content-Type: application/json"
  Replace:
  - PROJECT_ID with the ID of the project that the dataset configuration belongs to.
  - LOCATION with the location of the dataset and dataset configuration. For example, us-central1.
  - DATASET_CONFIG_ID with the name of the dataset configuration.
  - SERVICE_ACCOUNT with the service account. For example, test-service-account@test-project..
List dataset configurations
To list the dataset configurations in a project, complete the following steps:
Console
- In the Google Cloud console, go to the Cloud Storage Storage Insights page.
  The list of dataset configurations is displayed.
Command line
- To list dataset configurations in a project, run the gcloud storage insights dataset-configs list command:
    gcloud storage insights dataset-configs list --location=LOCATION
  Replace:
  - LOCATION with the location of the dataset and dataset configuration. For example, us-central1.
  You can use the following optional flags to specify the behavior of the listing call:
  - Use --page-size to specify the maximum number of results to return per page.
  - Use --filter=FILTER to filter results. For more information on how to use the --filter flag, run gcloud topic filters and refer to the documentation.
  - Use --sort-by=SORT_BY_VALUE to specify a comma-separated list of resource field key names to sort by. For example, --sort-by=DATASET_CONFIG_ID.
JSON API
- Have gcloud CLI installed and initialized, which lets you generate an access token for the Authorization header.
- Use cURL to call the JSON API with a List DatasetConfigs request:
    curl -X GET \
      "https://storageinsights.googleapis.com/v1/projects/PROJECT_ID/locations/LOCATION/datasetConfigs" \
      --header "Authorization: Bearer $(gcloud auth print-access-token --impersonate-service-account=SERVICE_ACCOUNT)" \
      --header "Accept: application/json" \
      --header "Content-Type: application/json"
  Replace:
  - PROJECT_ID with the ID of the project that the dataset configurations belong to.
  - LOCATION with the location of the dataset and dataset configuration. For example, us-central1.
  - SERVICE_ACCOUNT with the service account. For example, test-service-account@test-project.iam.gserviceaccount.com.
Update a dataset configuration
To update a dataset configuration, complete the following steps:
Console
- In the Google Cloud console, go to the Cloud Storage Storage Insights page.
- Click the name of the dataset configuration you want to update.
- In the Dataset configuration tab, click Edit to update the fields.
Command line
- To update a dataset configuration, run the gcloud storage insights dataset-configs update command:
    gcloud storage insights dataset-configs update DATASET_CONFIG_ID \
      --location=LOCATION
  Replace:
  - DATASET_CONFIG_ID with the name of the dataset configuration.
  - LOCATION with the location of the dataset and dataset configuration.
  Use the following flags to update properties of the dataset configuration:
  - Use --skip-verification to skip checks and failures from the verification process, which includes checks for required IAM permissions. If used, some or all buckets might be excluded from the dataset.
  - Use --retention-period-days=DAYS to specify the moving number of days of data to capture in the dataset snapshot. For example, 90.
  - Use --activity-data-retention-period-days=ACTIVITY_RETENTION_PERIOD_DAYS to specify the retention period for the activity data in the dataset. By default, activity data is included in the dataset and inherits the retention period of the dataset. To override the dataset retention period, specify the number of days to retain activity data for. To exclude activity data, set ACTIVITY_RETENTION_PERIOD_DAYS to 0.
  - Use --description=DESCRIPTION to write a description for the dataset configuration.
  - Use --organization=ORGANIZATION_ID to specify the organization ID of the source project. If unspecified, defaults to the source project's organization ID.
JSON API
- Have gcloud CLI installed and initialized, which lets you generate an access token for the Authorization header.
- Create a JSON file that contains the following optional information:
    {
      "organization_number": "ORGANIZATION_ID",
      "source_projects": {
        "project_numbers": "PROJECT_NUMBERS"
      },
      "retention_period_days": "RETENTION_PERIOD",
      "activityDataRetentionPeriodDays": "ACTIVITY_DATA_RETENTION_PERIOD_DAYS"
    }
  Replace:
  - ORGANIZATION_ID with the resource ID of the organization to which the source projects belong. If unspecified, defaults to the source project's organization ID.
  - PROJECT_NUMBERS with the project numbers to include in the dataset. You can specify one or more projects in a list format.
  - RETENTION_PERIOD with the moving number of days of data to capture in the dataset snapshot. For example, 90.
  - ACTIVITY_DATA_RETENTION_PERIOD_DAYS with the number of days of activity data to capture in the dataset snapshot. By default, activity data is included in the dataset and inherits the retention period of the dataset. To override the dataset retention period, specify the number of days to retain activity data for. To exclude activity data, set ACTIVITY_DATA_RETENTION_PERIOD_DAYS to 0.
- To update the dataset configuration, use cURL to call the JSON API with a Patch DatasetConfig request:
    curl -X PATCH --data-binary @JSON_FILE_NAME \
      "https://storageinsights.googleapis.com/v1/projects/PROJECT_ID/locations/LOCATION/datasetConfigs/DATASET_CONFIG_ID?updateMask=UPDATE_MASK" \
      --header "Authorization: Bearer $(gcloud auth print-access-token --impersonate-service-account=SERVICE_ACCOUNT)" \
      --header "Accept: application/json" \
      --header "Content-Type: application/json"
  Replace:
  - JSON_FILE_NAME with the path to the JSON file you created in the previous step.
  - PROJECT_ID with the ID of the project that the dataset configuration belongs to.
  - LOCATION with the location of the dataset and dataset configuration. For example, us-central1.
  - DATASET_CONFIG_ID with the name of the dataset configuration you want to update.
  - UPDATE_MASK with the comma-separated list of field names that this request updates. The fields use the fieldMask format and are part of the DatasetConfig resource.
  - SERVICE_ACCOUNT with the service account. For example, test-service-account@test-project.iam.gserviceaccount.com.
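Because UPDATE_MASK is a comma-separated list, it can be built from the field names you intend to change. The following is a minimal sketch: the function name join_update_mask is hypothetical, and the field names in the example are taken from the JSON body above purely for illustration. Whether the API expects exactly these spellings in the mask is an assumption to check against the DatasetConfig resource reference.

```shell
#!/bin/sh
# Sketch: join field names into a comma-separated value for the updateMask query parameter.
join_update_mask() {
  mask=""
  for field in "$@"; do
    if [ -z "$mask" ]; then
      mask="$field"
    else
      mask="$mask,$field"
    fi
  done
  printf '%s' "$mask"
}

# Example: field names are illustrative and copied from the JSON body keys.
join_update_mask "retention_period_days" "organization_number"
echo
```

Listing only the fields you changed in the mask keeps the request from touching other properties of the dataset configuration.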
Delete a dataset configuration
To delete a dataset configuration, complete the following steps:
Console
- In the Google Cloud console, go to the Cloud Storage Storage Insights page.
- Click the name of the dataset configuration you want to delete.
- Click Delete.
Command line
- To delete a dataset configuration, run the gcloud storage insights dataset-configs delete command:
    gcloud storage insights dataset-configs delete DATASET_CONFIG_ID \
      --location=LOCATION
  Replace:
  - DATASET_CONFIG_ID with the name of the dataset configuration you want to delete.
  - LOCATION with the location of the dataset and dataset configuration. For example, us-central1.
  Use the following flags to delete a dataset configuration:
  - Use --auto-delete-link to unlink the dataset that was generated from the dataset configuration you want to delete. You must unlink a dataset before you can delete the dataset configuration that generated the dataset.
  You can also specify a full dataset configuration path. For example:
    gcloud storage insights dataset-configs delete projects/DESTINATION_PROJECT_ID/locations/LOCATION/datasetConfigs/DATASET_CONFIG_ID
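If you prefer not to use --auto-delete-link, the unlink-then-delete order can be scripted explicitly. The following is a sketch only: the helper names run and unlink_then_delete, the DRY_RUN variable, and the example values are hypothetical. With DRY_RUN set to 1 (the default in this sketch) the commands are printed rather than executed, so you can review them first.

```shell
#!/bin/sh
# Sketch: unlink a dataset, then delete its dataset configuration.
# With DRY_RUN=1 (the default here) the commands are printed instead of executed.
DRY_RUN="${DRY_RUN:-1}"

run() {
  if [ "$DRY_RUN" = "1" ]; then
    echo "$*"
  else
    "$@"
  fi
}

unlink_then_delete() {
  # $1: dataset configuration name, $2: location
  # The delete step runs only if the unlink step succeeds.
  run gcloud storage insights dataset-configs delete-link "$1" --location="$2" &&
    run gcloud storage insights dataset-configs delete "$1" --location="$2"
}

# Example with hypothetical values.
unlink_then_delete "my_config" "us-central1"
```

Chaining the two steps with && reflects the requirement that the dataset is unlinked before its configuration is deleted.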
JSON API
- Have gcloud CLI installed and initialized, which lets you generate an access token for the Authorization header.
- Use cURL to call the JSON API with a Delete DatasetConfig request:
    curl -X DELETE \
      "https://storageinsights.googleapis.com/v1/projects/PROJECT_ID/locations/LOCATION/datasetConfigs/DATASET_CONFIG_ID" \
      --header "Authorization: Bearer $(gcloud auth print-access-token --impersonate-service-account=SERVICE_ACCOUNT)" \
      --header "Accept: application/json" \
      --header "Content-Type: application/json"
  Replace:
  - PROJECT_ID with the ID of the project that the dataset configuration belongs to.
  - LOCATION with the location of the dataset and dataset configuration. For example, us-central1.
  - DATASET_CONFIG_ID with the name of the dataset configuration you want to delete.
  - SERVICE_ACCOUNT with the service account. For example, test-service-account@test-project.iam.gserviceaccount.com.

