Running business-critical workloads on Dataproc requires multiple parties to carry different responsibilities. This page lists the responsibilities of Google and of the customer; the lists are not exhaustive.
Dataproc: Google responsibilities
- Protecting the underlying infrastructure, including hardware, firmware, kernel, OS, storage, network, and more. This includes:
  - encrypting data at rest by default
  - providing additional customer-managed disk encryption
  - encrypting data in transit
  - using custom-designed hardware
  - laying private network cables
  - protecting data centers from physical access
  - protecting the bootloader and kernel against modification using Shielded Nodes
  - providing network protection with VPC Service Controls
  - following secure software development practices
- Releasing security patches for Dataproc images. This includes:
  - patches for the base operating systems included in Dataproc images (Ubuntu, Debian, and Rocky Linux)
  - patches and fixes available for the open source components included in Dataproc images
- Providing Google Cloud integrations for Connect, Identity and Access Management, Cloud Audit Logs, Cloud Key Management Service, Security Command Center, and others
- Restricting and logging Google administrative access to customer clusters for contractual support purposes with Access Transparency and Access Approval
- Recommending best practices for configuring Dataproc and the open source components included in Dataproc images
Dataproc: Customer responsibilities
- Maintaining your workloads, including your application code, custom images, data, IAM policy, and the clusters that you run
- Running clusters on up-to-date Dataproc images by using the latest subminor image version, promptly refreshing your custom images, and migrating to the most recent minor image version as soon as it is feasible. Image metadata includes a `previous-subminor` label, which is set to `true` if the cluster is not using the latest subminor image version. For information on how to view image metadata, see Important notes about versioning.
- Providing Google with environmental details when requested for troubleshooting purposes
- Following best practices for the configuration of Dataproc and other Google Cloud services, and for the configuration of open source components included in Dataproc images
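
The image-version responsibility above mentions a `previous-subminor` label in image metadata. As a minimal sketch of how that label could be checked once the labels have already been retrieved, the helper below takes a plain dictionary of label keys and values. The function name `is_outdated_subminor` is hypothetical, how the labels are fetched is out of scope here, and treating a missing label as "up to date" is an assumption rather than documented behavior.

```python
def is_outdated_subminor(labels: dict) -> bool:
    """Return True when the labels mark the image as an older subminor version.

    `labels` is assumed to be a mapping of image label keys to string values,
    already read from the image metadata by some other means.
    """
    # The label key and the "true" value come from the text above.
    # Assumption: a missing label, or any other value, means the image is current.
    value = str(labels.get("previous-subminor", ""))
    return value.strip().lower() == "true"


if __name__ == "__main__":
    # Hypothetical label sets used only for illustration.
    print(is_outdated_subminor({"previous-subminor": "true"}))
    print(is_outdated_subminor({}))
```

A check like this could run as part of a periodic review of running clusters, flagging those that need to be recreated on a newer subminor image version.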

