This document includes the best practices and guidelines for Cloud Storage when running generative AI workloads that use Google Cloud. Use Cloud Storage with Vertex AI to store training data, model artifacts, and production data.
Consider the following use cases for Cloud Storage with Vertex AI:
- Store training data: Vertex AI lets you store your training datasets in Cloud Storage buckets. Using Cloud Storage offers several advantages:
- Cloud Storage can handle datasets of any size, allowing you to train models on massive amounts of data without storage limitations.
- You can set granular access controls and encryption on your Cloud Storage buckets to ensure that your sensitive training data is protected.
- Cloud Storage lets you track changes and revert to previous versions of your data, providing valuable audit trails and facilitating reproducible training experiments.
- Vertex AI seamlessly integrates with Cloud Storage, letting you access your training data within the platform.
- Store model artifacts: You can store trained model artifacts, including model files, hyperparameter configurations, and training logs, in Cloud Storage buckets. Using Cloud Storage lets you do the following:
- Keep all your model artifacts in Cloud Storage as a centralized repository to conveniently access and manage them.
- Track and manage different versions of your models, facilitating comparisons and rollbacks if needed.
- Grant teammates and collaborators access to specific Cloud Storage buckets to efficiently share models.
- Store production data: For models used in production, Cloud Storage
can store the data being fed to the model for prediction. For example, you can
use Cloud Storage to do the following:
- Store user data and interactions for real-time personalized recommendations.
- Keep images for on-demand processing and classification using your models.
- Maintain transaction data for real-time fraud identification using your models.
- Integrate with other services: Cloud Storage integrates seamlessly
with other Google Cloud services used in Vertex AI
workflows, such as the following:
- Dataflow to streamline data preprocessing and transformation pipelines.
- BigQuery to access large datasets for model training and inference.
- Cloud Run functions for actions based on model predictions or data changes in Cloud Storage buckets.
- Manage costs: Cloud Storage offers a pay-as-you-go pricing model, meaning you only pay for the storage you use. This provides cost efficiency, especially for large datasets.
- Enable high availability and durability: Cloud Storage ensures your data is highly available and protected against failures or outages, guaranteeing reliability and robust access to your ML assets.
- Enable multi-region support: Store your data in multiple Cloud Storage regions that are geographically closer to your users or applications, enhancing performance and reducing latency for data access and model predictions.
Required Cloud Storage controls
The following controls are strongly recommended when using Cloud Storage.
Block public access to Cloud Storage buckets
The storage.publicAccessPrevention boolean constraint prevents access to existing and future resources over the internet. It disables and blocks access control lists (ACLs) and Identity and Access Management (IAM) permissions that grant access to allUsers and allAuthenticatedUsers.
- Product: Organization Policy Service, Cloud Storage
- Path: constraints/storage.publicAccessPrevention
- Operator: ==
- Value: True
- Related NIST 800-53 controls: AC-3, AC-17, AC-20
- Related CRI Profile controls: PR.AC-3.1, PR.AC-3.2, PR.AC-4.1, PR.AC-4.2, PR.AC-4.3, PR.AC-6.1, PR.PT-3.1, PR.PT-4.1
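As a complement to the constraint, you can scan a bucket's IAM policy for the two public principals named above. The following is a minimal sketch, assuming the policy has already been fetched and is available as a dictionary with a `bindings` list of `role` and `members` entries. The function name and the example policy are hypothetical.

```python
# Principals that make a resource reachable by anyone.
PUBLIC_PRINCIPALS = {"allUsers", "allAuthenticatedUsers"}


def find_public_bindings(policy):
    """Return (role, member) pairs that grant access to a public principal."""
    findings = []
    for binding in policy.get("bindings", []):
        for member in binding.get("members", []):
            if member in PUBLIC_PRINCIPALS:
                findings.append((binding.get("role"), member))
    return findings


# Hypothetical policy, for illustration only.
example_policy = {
    "bindings": [
        {"role": "roles/storage.objectViewer", "members": ["allUsers"]},
        {"role": "roles/storage.admin", "members": ["user:admin@example.com"]},
    ]
}

print(find_public_bindings(example_policy))
```

An empty result means that no binding in the supplied policy names a public principal. It doesn't prove that the organization policy constraint is enforced.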
Use uniform bucket-level access
The storage.uniformBucketLevelAccess boolean constraint requires buckets to use uniform bucket-level access. With uniform bucket-level access, you grant access to your Cloud Storage resources by using only bucket-level Identity and Access Management (IAM) permissions.
- Product: Organization Policy Service, Cloud Storage
- Path: constraints/storage.uniformBucketLevelAccess
- Operator: ==
- Value: True
- Related NIST 800-53 controls: AC-3, AC-17, AC-20
- Related CRI Profile controls: PR.AC-3.1, PR.AC-3.2, PR.AC-4.1, PR.AC-4.2, PR.AC-4.3, PR.AC-6.1, PR.PT-3.1, PR.PT-4.1
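To spot existing buckets that don't yet use uniform bucket-level access, you can inspect each bucket's metadata. The following sketch assumes that the metadata is a dictionary shaped like the Cloud Storage JSON API bucket resource, where the setting is under `iamConfiguration.uniformBucketLevelAccess.enabled`. The function name and bucket names are hypothetical.

```python
def uses_uniform_access(bucket):
    """Return True if the bucket metadata reports uniform bucket-level access."""
    iam_config = bucket.get("iamConfiguration", {})
    uniform = iam_config.get("uniformBucketLevelAccess", {})
    return bool(uniform.get("enabled", False))


# Hypothetical bucket metadata, for illustration only.
example_buckets = [
    {
        "name": "training-data",
        "iamConfiguration": {"uniformBucketLevelAccess": {"enabled": True}},
    },
    {"name": "legacy-artifacts", "iamConfiguration": {}},
]

# Collect the names of buckets that still rely on ACLs.
non_compliant = [b["name"] for b in example_buckets if not uses_uniform_access(b)]
print(non_compliant)
```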
Protect HMAC keys for service accounts
An HMAC key is a long-lived type of credential that is associated with a service account or a user account in Cloud Storage. Use an HMAC key to create signatures that are included in requests to Cloud Storage. A signature proves a user or service account has authorized a request.
Unlike short-lived credentials (such as OAuth 2.0 tokens), HMAC keys don't expire automatically and remain valid until manually revoked. HMAC keys are high-risk credentials: if compromised, they provide persistent access to your resources. You must ensure appropriate mechanisms are in place to help protect them.
- Product: Cloud Storage
- Path: storage.projects.hmacKeys/id
- Operator: Exists
- Value: []
- Related NIST 800-53 controls: SC-12, SC-13
- Related CRI Profile controls: PR.DS-1.1, PR.DS-1.2, PR.DS-2.1, PR.DS-2.2, PR.DS-5.1
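Because HMAC keys don't expire, one possible protection mechanism is a periodic review that flags active keys older than a rotation threshold. The following sketch assumes that key metadata is available as dictionaries with `id`, `state`, and `timeCreated` (RFC 3339) fields. The 90-day threshold, the function name, and the sample keys are assumptions for illustration, not values from this document.

```python
from datetime import datetime, timedelta, timezone


def stale_hmac_keys(keys, now, max_age_days=90):
    """Return IDs of ACTIVE keys created more than max_age_days before now."""
    cutoff = now - timedelta(days=max_age_days)
    stale = []
    for key in keys:
        # Convert the trailing "Z" so that fromisoformat accepts the value.
        created = datetime.fromisoformat(key["timeCreated"].replace("Z", "+00:00"))
        if key.get("state") == "ACTIVE" and created < cutoff:
            stale.append(key["id"])
    return stale


# Hypothetical key metadata, for illustration only.
example_keys = [
    {"id": "example-project/KEY_A", "state": "ACTIVE", "timeCreated": "2024-01-01T00:00:00Z"},
    {"id": "example-project/KEY_B", "state": "ACTIVE", "timeCreated": "2024-05-20T00:00:00Z"},
    {"id": "example-project/KEY_C", "state": "INACTIVE", "timeCreated": "2023-01-01T00:00:00Z"},
]

review_time = datetime(2024, 6, 1, tzinfo=timezone.utc)
print(stale_hmac_keys(example_keys, review_time))
```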
Detect enumeration of Cloud Storage buckets by service accounts
Service accounts are non-human identities that are designed for applications, and their behavior is predictable and automated. Normally, service accounts don't need to list buckets, because the buckets that they use are already known to them. Therefore, if you detect a service account attempting to retrieve a list of all Cloud Storage buckets, investigate it immediately. Enumeration is often used as a reconnaissance technique by a malicious actor who has gained access to the service account.
- Product: Cloud Storage, Cloud Audit Logs
- Operator: ==
- Value: storage.bucket.list
- Related NIST 800-53 controls: AU-2, AU-3, AU-8, AU-9
- Related CRI Profile controls: DM.ED-7.1, DM.ED-7.2, DM.ED-7.3, DM.ED-7.4, PR.IP-1.4
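The detection logic can be sketched as a filter over exported audit log entries. This example assumes that each entry is a dictionary with `protoPayload.methodName` and `protoPayload.authenticationInfo.principalEmail` fields. Treating any principal whose email ends in `.gserviceaccount.com` as a service account is a heuristic, and the default method name is copied from the control above, so confirm the exact value against your own logs. The function name and sample entries are hypothetical.

```python
def service_account_bucket_listings(entries, method_name="storage.bucket.list"):
    """Return service account principals that called the bucket listing method."""
    hits = []
    for entry in entries:
        payload = entry.get("protoPayload", {})
        principal = payload.get("authenticationInfo", {}).get("principalEmail", "")
        is_service_account = principal.endswith(".gserviceaccount.com")
        if payload.get("methodName") == method_name and is_service_account:
            hits.append(principal)
    return hits


# Hypothetical audit log entries, for illustration only.
example_entries = [
    {"protoPayload": {
        "methodName": "storage.bucket.list",
        "authenticationInfo": {"principalEmail": "trainer@example-project.iam.gserviceaccount.com"},
    }},
    {"protoPayload": {
        "methodName": "storage.bucket.list",
        "authenticationInfo": {"principalEmail": "analyst@example.com"},
    }},
    {"protoPayload": {
        "methodName": "storage.objects.get",
        "authenticationInfo": {"principalEmail": "trainer@example-project.iam.gserviceaccount.com"},
    }},
]

print(service_account_bucket_listings(example_entries))
```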
Detect Identity and Access Management (IAM) policy modifications of Cloud Storage buckets by service accounts
Configure an alert that detects when a Cloud Storage bucket's IAM policy is modified to grant public access. This alert fires when the allUsers or allAuthenticatedUsers principals are added to a bucket's IAM policy. This alert is a critical, high-severity event because it can expose all data in the bucket. Investigate this alert immediately to confirm whether the change was authorized or is a sign of a misconfiguration or a malicious actor.
In the alert, set the data.protoPayload.serviceData.policyData.bindingDeltas.member JSON attribute to allUsers or allAuthenticatedUsers and the action to ADD.
- Product: Cloud Storage, Cloud Audit Logs
- Related NIST 800-53 controls: AU-2, AU-3, AU-8, AU-9
- Related CRI Profile controls: DM.ED-7.1, DM.ED-7.2, DM.ED-7.3, DM.ED-7.4, PR.IP-1.4
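The alert condition can be expressed as a small predicate over the binding deltas of a policy change. The following sketch assumes that the `bindingDeltas` entries have already been extracted from the log entry as dictionaries with `action`, `member`, and `role` keys. The function name and sample deltas are hypothetical.

```python
PUBLIC_MEMBERS = {"allUsers", "allAuthenticatedUsers"}


def public_access_grants(binding_deltas):
    """Return the deltas that ADD a public principal to a policy."""
    return [
        delta
        for delta in binding_deltas
        if delta.get("action") == "ADD" and delta.get("member") in PUBLIC_MEMBERS
    ]


# Hypothetical binding deltas, for illustration only.
example_deltas = [
    {"action": "ADD", "member": "allUsers", "role": "roles/storage.objectViewer"},
    {"action": "REMOVE", "member": "allAuthenticatedUsers", "role": "roles/storage.objectViewer"},
    {"action": "ADD", "member": "user:dev@example.com", "role": "roles/storage.objectAdmin"},
]

grants = public_access_grants(example_deltas)
print(grants)
```

Only the first delta matches: the second removes a public principal, and the third adds a named user.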
Recommended controls based on generative AI use case
Depending on your use cases around generative AI, we recommend that you use additional controls. These controls include data retention controls and other policy-driven controls that are based on your enterprise policies.
Ensure Cloud Storage bucket retention policy uses Bucket Lock
Depending on your regulatory requirements, ensure that each Cloud Storage bucket retention policy is locked. Set the retention period to a timeframe that meets your requirements.
- Product: Cloud Storage
- Path: storage.buckets/retentionPolicy.isLocked
- Operator: !=
- Value: True
- Related NIST 800-53 controls: SI-12
- Related CRI Profile controls: PR.IP-2.1, PR.IP-2.2, PR.IP-2.3
Set lifecycle rules for the SetStorageClass action
Apply a lifecycle rule that has the SetStorageClass action type to each Cloud Storage bucket.
- Product: Cloud Storage
- Path: storage.buckets/lifecycle.rule.action.type
- Operator: ==
- Value: SetStorageClass
- Related NIST 800-53 controls: SI-12
- Related CRI Profile controls: PR.IP-2.1, PR.IP-2.2, PR.IP-2.3
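To illustrate what this control inspects, the following sketch checks a bucket's lifecycle configuration for a rule with a given action type. It assumes bucket metadata shaped like the Cloud Storage JSON API bucket resource, where rules are listed under `lifecycle.rule`. The function name, the bucket name, and the 30-day NEARLINE rule are hypothetical.

```python
def has_lifecycle_action(bucket, action_type):
    """Return True if any lifecycle rule on the bucket uses action_type."""
    rules = bucket.get("lifecycle", {}).get("rule", [])
    return any(rule.get("action", {}).get("type") == action_type for rule in rules)


# Hypothetical bucket metadata, for illustration only.
example_bucket = {
    "name": "training-data",
    "lifecycle": {
        "rule": [
            {
                # Move objects to a colder storage class after 30 days.
                "action": {"type": "SetStorageClass", "storageClass": "NEARLINE"},
                "condition": {"age": 30},
            }
        ]
    },
}

print(has_lifecycle_action(example_bucket, "SetStorageClass"))
print(has_lifecycle_action(example_bucket, "Delete"))
```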
Set permitted regions for storage classes
- Product: Cloud Storage
- Path: storage.buckets/lifecycle.rule.action.storageClass
- Operator: nin
- Value: MULTI_REGIONAL, REGIONAL
- Related NIST 800-53 controls: SI-12
- Related CRI Profile controls: PR.IP-2.1, PR.IP-2.2, PR.IP-2.3
Enable lifecycle management for Cloud Storage buckets
Ensure that lifecycle management of Cloud Storage is enabled and configured. The lifecycle control contains the configuration for the storage lifecycle. Verify that the policies in this setting match your requirements.
- Product: Cloud Storage
- Path: storage.buckets/lifecycle
- Operator: Exists
- Value: []
- Related NIST 800-53 controls: SI-12
- Related CRI Profile controls: PR.IP-2.1, PR.IP-2.2, PR.IP-2.3
Enable lifecycle management rules for Cloud Storage buckets
Ensure that lifecycle management rules for Cloud Storage are enabled and configured. The rule control contains the configuration for the storage lifecycle. Verify that the policies in this setting match your requirements.
- Product: Cloud Storage
- Path: storage.buckets/lifecycle.rule
- Operator: Empty
- Value: []
- Related NIST 800-53 controls: SI-12
- Related CRI Profile controls: PR.IP-2.1, PR.IP-2.2, PR.IP-2.3
Review and evaluate temporary holds on active objects
Identify all objects where temporaryHold is set to TRUE and start an investigation and validation process. This evaluation is appropriate for the following use cases:
- Legal hold: To comply with legal requirements for storing data, you can use a temporary hold to prevent the deletion of sensitive data that might be relevant to ongoing investigations or litigation.
- Data loss prevention: To prevent accidental deletion of important data, you can use a temporary hold as a safety measure to protect critical business information.
- Content moderation: To review potentially sensitive or inappropriate content before it becomes publicly accessible, apply a temporary hold to content uploaded to Cloud Storage for further inspection and moderation decisions.
- Product: Cloud Storage
- Path: storage.objects/temporaryHold
- Operator: ==
- Value: TRUE
- Related NIST 800-53 controls: SI-12
- Related CRI Profile controls: PR.IP-2.1, PR.IP-2.2, PR.IP-2.3
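The first step of the review, identifying the held objects, can be sketched as a filter over object metadata. This example assumes that each object is a dictionary with `name` and `temporaryHold` fields, as in the Cloud Storage JSON API object resource. The function name and object names are hypothetical.

```python
def objects_on_temporary_hold(objects):
    """Return the names of objects whose temporaryHold flag is set."""
    return [obj["name"] for obj in objects if obj.get("temporaryHold") is True]


# Hypothetical object metadata, for illustration only.
example_objects = [
    {"name": "uploads/image-001.png", "temporaryHold": True},
    {"name": "uploads/image-002.png", "temporaryHold": False},
    {"name": "models/checkpoint.bin"},  # Flag absent: treated as not held.
]

held = objects_on_temporary_hold(example_objects)
print(held)
```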
Enforce retention policies on Cloud Storage buckets
Ensure that all the Cloud Storage buckets have a retention policy.
- Product: Cloud Storage
- Path: storage.buckets/retentionPolicy.retentionPeriod
- Operator: agesmaller
- Value: [90,"DAY","AFTER","yyyy-MM-dd'T'HH:mm:ss'Z'"]
- Related NIST 800-53 controls: SI-12
- Related CRI Profile controls: PR.IP-2.1, PR.IP-2.2, PR.IP-2.3
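A retention check along these lines can be sketched as follows. The example assumes bucket metadata shaped like the Cloud Storage JSON API bucket resource, where `retentionPolicy.retentionPeriod` is expressed in seconds and might be encoded as a string. The 90-day minimum mirrors the value in the control above. The function name and sample buckets are hypothetical.

```python
SECONDS_PER_DAY = 86400


def retention_too_short(bucket, min_days=90):
    """Return True if the bucket has no retention policy or one shorter than min_days."""
    policy = bucket.get("retentionPolicy")
    if not policy:
        return True
    # int() accepts both the string and the integer form of the period.
    return int(policy["retentionPeriod"]) < min_days * SECONDS_PER_DAY


# Hypothetical bucket metadata, for illustration only.
example_buckets = [
    {"name": "audit-logs", "retentionPolicy": {"retentionPeriod": "31536000", "isLocked": True}},
    {"name": "scratch", "retentionPolicy": {"retentionPeriod": "604800"}},
    {"name": "unmanaged"},
]

violations = [b["name"] for b in example_buckets if retention_too_short(b)]
print(violations)
```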
Enforce classification tags for Cloud Storage buckets
Data classification is a foundational component of any data governance and security program. Applying a classification label with values like public, internal, confidential, or restricted to each bucket is essential.
Confirm that google_storage_bucket.labels has an expression for classification and create a violation if it doesn't.
- Product: Cloud Storage
- Path: storage.buckets/labels.classification
- Operator: notexists
- Value: []
- Related NIST 800-53 controls: SI-12
- Related CRI Profile controls: PR.IP-2.1, PR.IP-2.2, PR.IP-2.3
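The label check can be sketched as a function that returns a violation message when the classification label is missing or has an unexpected value. It assumes bucket metadata with a `labels` dictionary and uses the four example values mentioned above as the allowed set. Your own classification scheme might differ. The function name and sample buckets are hypothetical.

```python
ALLOWED_CLASSIFICATIONS = {"public", "internal", "confidential", "restricted"}


def classification_violation(bucket):
    """Return a violation message, or None if the classification label is valid."""
    value = bucket.get("labels", {}).get("classification")
    if value is None:
        return "missing classification label"
    if value not in ALLOWED_CLASSIFICATIONS:
        return f"unexpected classification: {value}"
    return None


# Hypothetical bucket metadata, for illustration only.
print(classification_violation({"name": "hr-data", "labels": {"classification": "restricted"}}))
print(classification_violation({"name": "tmp", "labels": {"team": "ml"}}))
print(classification_violation({"name": "misc", "labels": {"classification": "secret"}}))
```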
Enforce log buckets for Cloud Storage buckets
Ensure that every Cloud Storage bucket includes a log bucket.
- Product: Cloud Storage
- Path: storage.buckets/logging.logBucket
- Operator: notexists
- Value: []
- Related NIST 800-53 controls: AU-2, AU-3, AU-8, AU-9
- Related CRI Profile controls: DM.ED-7.1, DM.ED-7.2, DM.ED-7.3, DM.ED-7.4, PR.IP-1.4
Configure deletion rules for Cloud Storage buckets
In Cloud Storage, storage.buckets/lifecycle.rule.action.type refers to the type of action to be taken on a specific object based on a lifecycle rule within a bucket. This configuration helps automate the management and lifecycle of your data stored in the cloud.
Configure the storage.buckets/lifecycle.rule.action.type to ensure that objects are permanently deleted from the bucket.
- Product: Cloud Storage
- Path: storage.buckets/lifecycle.rule.action.type
- Operator: ==
- Value: Delete
- Related NIST 800-53 controls: SI-12
- Related CRI Profile controls: PR.IP-2.1, PR.IP-2.2, PR.IP-2.3
Ensure isLive condition is False for deletion rules
For deletion rules, ensure that the isLive condition of the rule is set to false.
In Cloud Storage, storage.buckets/lifecycle.rule.condition.isLive is a boolean condition that is used in lifecycle rules to determine whether an object is considered live. This filter helps ensure that actions within a lifecycle rule are applied only to desired objects based on their live status.
Use cases:
- Archive historical versions: Archive only non-current versions of objects to save storage costs while keeping the latest version readily accessible.
- Clean up deleted objects: Automate permanent deletion of objects that have been deleted by users, freeing up space in the bucket.
- Protect live data: Ensure that actions like setting temporary holds are applied only to live objects, preventing accidental modification of archived or deleted versions.
- Product: Cloud Storage
- Path: storage.buckets/lifecycle.rule.condition.isLive
- Operator: ==
- Value: False
- Related NIST 800-53 controls: SI-12
- Related CRI Profile controls: PR.IP-2.1, PR.IP-2.2, PR.IP-2.3
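The following sketch finds deletion rules that don't restrict themselves to non-live objects. It assumes bucket metadata shaped like the Cloud Storage JSON API bucket resource, with rules under `lifecycle.rule` and the flag under `condition.isLive`. A Delete rule with no `isLive` condition is also reported, because such a rule isn't limited to non-live objects. The function name and sample rules are hypothetical.

```python
def delete_rules_not_limited_to_noncurrent(bucket):
    """Return Delete rules whose isLive condition is not explicitly False."""
    rules = bucket.get("lifecycle", {}).get("rule", [])
    return [
        rule
        for rule in rules
        if rule.get("action", {}).get("type") == "Delete"
        and rule.get("condition", {}).get("isLive") is not False
    ]


# Hypothetical bucket metadata, for illustration only.
example_bucket = {
    "lifecycle": {
        "rule": [
            {"action": {"type": "Delete"}, "condition": {"age": 365, "isLive": False}},
            {"action": {"type": "Delete"}, "condition": {"age": 30}},
            {"action": {"type": "SetStorageClass", "storageClass": "COLDLINE"},
             "condition": {"age": 90}},
        ]
    }
}

flagged = delete_rules_not_limited_to_noncurrent(example_bucket)
print(flagged)
```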
Enforce versioning for Cloud Storage buckets
Ensure that all Cloud Storage buckets have versioning enabled. Use cases include the following:
- Data protection and recovery: Protect against accidental data loss by retaining overwritten versions and enabling recovery of deleted or modified data.
- Compliance and auditing: Maintain a history of all object edits for regulatory compliance or internal auditing purposes.
- Version control: Track changes to files and datasets, enabling collaboration and rollback to previous versions if necessary.
- Product: Cloud Storage
- Path: storage.buckets/versioning.enabled
- Operator: !=
- Value: True
- Related NIST 800-53 controls: SI-12
- Related CRI Profile controls: PR.IP-2.1, PR.IP-2.2, PR.IP-2.3
Enforce owners for Cloud Storage buckets
Ensure that google_storage_bucket.labels has an expression for an owner.
- Product: Cloud Storage
- Path: storage.buckets/labels.owner
- Operator: notexists
- Value: []
- Related NIST 800-53 controls: SI-12
- Related CRI Profile controls: PR.IP-2.1, PR.IP-2.2, PR.IP-2.3
Enable logging of key Cloud Storage activities
Enable additional logging around particular storage objects based on their use case. For example, log access to sensitive data buckets so that you can trace who gained access and when. When enabling additional logging, consider the volume of logs that you might generate.
- Product: Cloud Storage
- Related NIST 800-53 controls: AU-2, AU-3, AU-8, AU-9
- Related CRI Profile controls: DM.ED-7.1, DM.ED-7.2, DM.ED-7.3, DM.ED-7.4, PR.IP-1.4

