"Managed Service for Apache Spark" is the new name for the product formerly known as "Dataproc on Compute Engine" (cluster deployment) and "Google Cloud Serverless for Apache Spark" (serverless deployment).

MCP Tools Reference: dataproc.googleapis.com

Tool: `create_cluster`

Create a Dataproc cluster in a Google Cloud project

The following sample demonstrates how to use curl to invoke the create_cluster MCP tool.

Curl Request

curl --location 'https://dataproc.googleapis.com/mcp' \
  --header 'content-type: application/json' \
  --header 'accept: application/json, text/event-stream' \
  --data '{
    "method": "tools/call",
    "params": {
      "name": "create_cluster",
      "arguments": {
        // provide these details according to the MCP specification of the tool
      }
    },
    "jsonrpc": "2.0",
    "id": 1
  }'
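The same request can be assembled programmatically. The following is a minimal sketch using only the Python standard library; it builds the JSON-RPC envelope from the curl sample and prepares the request without sending it. The helper name `build_tools_call` and the argument values (`my-project`, `us-central1`, `example-cluster`) are illustrative assumptions. Authentication headers are not shown in the curl sample and are omitted here as well.

```python
import json
import urllib.request

MCP_ENDPOINT = "https://dataproc.googleapis.com/mcp"  # endpoint from the curl sample


def build_tools_call(tool_name, arguments, request_id=1):
    """Return an unsent urllib Request carrying a JSON-RPC tools/call payload."""
    payload = {
        "jsonrpc": "2.0",
        "id": request_id,
        "method": "tools/call",
        "params": {"name": tool_name, "arguments": arguments},
    }
    return urllib.request.Request(
        MCP_ENDPOINT,
        data=json.dumps(payload).encode("utf-8"),
        headers={
            "content-type": "application/json",
            "accept": "application/json, text/event-stream",
        },
        method="POST",
    )


# Hypothetical argument values, for illustration only.
request = build_tools_call(
    "create_cluster",
    {"projectId": "my-project", "region": "us-central1", "clusterName": "example-cluster"},
)
body = json.loads(request.data)
print(body["method"], body["params"]["name"])
```

Passing the returned object to `urllib.request.urlopen` would send it; that step is left out because it requires credentials.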


Input Schema

A request to create a Dataproc cluster.

CreateClusterRequest

JSON representation

{
  "projectId": string,
  "region": string,
  "clusterName": string,
  "masterConfig": { object (`InstanceGroupConfig`) },
  "workerConfig": { object (`InstanceGroupConfig`) },
  "secondaryWorkerConfig": { object (`InstanceGroupConfig`) },
  "imageVersion": string,
  "image": string,
  "zone": string,
  "labels": { string: string, ... },
  "properties": { string: string, ... },
  "bucket": string,
  "tempBucket": string,
  "enableComponentGateway": boolean,
  "serviceAccount": string,
  "network": string,
  "subnetwork": string,
  "optionalComponents": [ enum (`Component`) ],
  "tier": enum (`ClusterTier`),
  "initializationActions": [ { object (`NodeInitializationAction`) } ],
  "autoscalingPolicy": string,
  "deleteMaxIdle": string,
  "deleteMaxAge": string,
  "stopMaxIdle": string,
  "stopMaxAge": string,
  "tags": [ string ],
  "resourceManagerTags": { string: string, ... }
}


Fields

projectId

string

Required. The ID of the Google Cloud Platform project that the cluster belongs to.

region

string

Required. The Dataproc region in which to handle the request.

clusterName

string

Required. The cluster name. Cluster names within a project must be unique. Names of deleted clusters can be reused.

masterConfig

object ( InstanceGroupConfig )

Optional. Configuration for master instances.

workerConfig

object ( InstanceGroupConfig )

Optional. Configuration for worker instances.

secondaryWorkerConfig

object ( InstanceGroupConfig )

Optional. Configuration for secondary worker instances.

imageVersion

string

Optional. The version of software inside the cluster. It must be one of the supported Dataproc versions, such as "1.2" (including a subminor version, such as "1.2.29"), or the "preview" version. If unspecified, it defaults to the latest Debian version, for example "2.2-debian12".

image

string

Optional. The Compute Engine image resource used for cluster instances.

The URI can represent an image or image family.

Image examples:

https://www.googleapis.com/compute/v1/projects/[project_id]/global/images/[image-id]
projects/[project_id]/global/images/[image-id]
image-id

Image family examples. Dataproc will use the most recent image from the family:

https://www.googleapis.com/compute/v1/projects/[project_id]/global/images/family/[custom-image-family-name]
projects/[project_id]/global/images/family/[custom-image-family-name]

If the URI is unspecified, it will be inferred from SoftwareConfig.image_version or the system default.

zone

string

Optional. The Compute Engine zone where the cluster will be located. On a get request, zone will always be present.

A full URL, partial URI, or short name are valid. Examples:

https://www.googleapis.com/compute/v1/projects/[project_id]/zones/[zone]
projects/[project_id]/zones/[zone]
[zone]

labels

map (key: string, value: string)

Optional. The labels to associate with this cluster. Label keys must contain 1 to 63 characters, and must conform to RFC 1035. Label values may be empty, but, if present, must contain 1 to 63 characters, and must conform to RFC 1035. No more than 32 labels can be associated with a cluster.

An object containing a list of "key": value pairs. Example: { "name": "wrench", "mass": "1.3kg", "count": "3" }.
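The label rules can be checked on the client before the request is sent. The sketch below is one possible reading of those rules: it interprets "conform to RFC 1035" as the RFC 1035 label grammar (a letter first, then letters, digits, or hyphens, ending in a letter or digit). That interpretation and the function name `validate_labels` are assumptions; the service may apply stricter rules, such as requiring lowercase.

```python
import re

# RFC 1035 label grammar: a letter, then letters/digits/hyphens, ending in a letter or digit.
_RFC1035_LABEL = re.compile(r"[A-Za-z]([A-Za-z0-9-]*[A-Za-z0-9])?")
MAX_LABELS = 32
MAX_LENGTH = 63


def validate_labels(labels):
    """Return a list of problems found in a labels map (empty list if none)."""
    problems = []
    if len(labels) > MAX_LABELS:
        problems.append(f"{len(labels)} labels exceeds the limit of {MAX_LABELS}")
    for key, value in labels.items():
        if not (1 <= len(key) <= MAX_LENGTH and _RFC1035_LABEL.fullmatch(key)):
            problems.append(f"invalid key: {key!r}")
        # Values may be empty; non-empty values follow the same rules as keys.
        if value and not (len(value) <= MAX_LENGTH and _RFC1035_LABEL.fullmatch(value)):
            problems.append(f"invalid value for {key!r}: {value!r}")
    return problems


print(validate_labels({"env": "dev", "team": ""}))      # no problems
print(validate_labels({"1bad": "ok", "env": "-oops"}))  # one bad key, one bad value
```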

properties

map (key: string, value: string)

Optional. The properties to set on daemon config files.

Property keys are specified in prefix:property format, for example core:hadoop.tmp.dir. The following are supported prefixes and their mappings:

capacity-scheduler: capacity-scheduler.xml
core: core-site.xml
distcp: distcp-default.xml
hdfs: hdfs-site.xml
hive: hive-site.xml
mapred: mapred-site.xml
pig: pig.properties
spark: spark-defaults.conf
yarn: yarn-site.xml

For more information, see Cluster properties.

An object containing a list of "key": value pairs. Example: { "name": "wrench", "mass": "1.3kg", "count": "3" }.
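To illustrate the prefix:property format, the sketch below splits a property key and looks up the configuration file it targets. It covers only the prefixes listed in this reference; the function name `resolve_property` is hypothetical.

```python
# Prefix-to-file mappings listed in this reference.
PREFIX_TO_FILE = {
    "capacity-scheduler": "capacity-scheduler.xml",
    "core": "core-site.xml",
    "distcp": "distcp-default.xml",
    "hdfs": "hdfs-site.xml",
    "hive": "hive-site.xml",
    "mapred": "mapred-site.xml",
    "pig": "pig.properties",
    "spark": "spark-defaults.conf",
    "yarn": "yarn-site.xml",
}


def resolve_property(key):
    """Split a prefix:property key and return (config_file, property_name)."""
    prefix, sep, name = key.partition(":")
    if not sep or not name:
        raise ValueError(f"expected prefix:property, got {key!r}")
    if prefix not in PREFIX_TO_FILE:
        raise ValueError(f"prefix not in the listed mappings: {prefix!r}")
    return PREFIX_TO_FILE[prefix], name


print(resolve_property("core:hadoop.tmp.dir"))  # ('core-site.xml', 'hadoop.tmp.dir')
```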

bucket

string

Optional. A Cloud Storage bucket used to stage job dependencies, config files, and job driver console output. If you do not specify a staging bucket, Dataproc will determine a Cloud Storage location (US, ASIA, or EU) for your cluster's staging bucket according to the Compute Engine zone where your cluster is deployed, and then create and manage this project-level, per-location bucket (see Dataproc staging and temp buckets). This field requires a Cloud Storage bucket name, not a gs://... URI to a Cloud Storage bucket.
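Because the field expects a bare bucket name, a caller holding a gs:// URI needs to reduce it first. The helper below is a small sketch of that step; the name `bucket_name` and the example bucket are hypothetical, and silently dropping any object path is a design choice that may not suit every caller. The same treatment would apply to tempBucket.

```python
def bucket_name(value):
    """Return a bare bucket name, accepting either 'name' or 'gs://name[/path]'."""
    if value.startswith("gs://"):
        value = value[len("gs://"):]
    # Keep only the bucket segment; any object path after it is discarded.
    name = value.split("/", 1)[0]
    if not name:
        raise ValueError("empty bucket name")
    return name


print(bucket_name("gs://my-staging-bucket/jobs"))  # my-staging-bucket
print(bucket_name("my-staging-bucket"))            # my-staging-bucket
```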

tempBucket

string

Optional. A Cloud Storage bucket used to store ephemeral cluster and jobs data, such as Spark and MapReduce history files. If you do not specify a temp bucket, Dataproc will determine a Cloud Storage location (US, ASIA, or EU) for your cluster's temp bucket according to the Compute Engine zone where your cluster is deployed, and then create and manage this project-level, per-location bucket. The default bucket has a TTL of 90 days, but you can use any TTL (or none) if you specify a bucket (see Dataproc staging and temp buckets). This field requires a Cloud Storage bucket name, not a gs://... URI to a Cloud Storage bucket.

enableComponentGateway

boolean

Optional. If true, enable HTTP access to specific ports on the cluster from external sources. Defaults to false.

serviceAccount

string

Optional. The Dataproc service account (also see VM Data Plane identity) used by Dataproc cluster VM instances to access Google Cloud Platform services.

If not specified, the Compute Engine default service account is used.

network

string

Optional. The Compute Engine network to be used for machine communications. Cannot be specified with subnetwork_uri. If neither network_uri nor subnetwork_uri is specified, the "default" network of the project is used, if it exists. Cannot be a "Custom Subnet Network" (see Using Subnetworks for more information).

A full URL, partial URI, or short name are valid. Examples:

https://www.googleapis.com/compute/v1/projects/[project_id]/global/networks/default
projects/[project_id]/global/networks/default
default

subnetwork

string

Optional. The Compute Engine subnetwork to be used for machine communications. Cannot be specified with network_uri.

A full URL, partial URI, or short name are valid. Examples:

https://www.googleapis.com/compute/v1/projects/[project_id]/regions/[region]/subnetworks/sub0
projects/[project_id]/regions/[region]/subnetworks/sub0
sub0

optionalComponents[]

enum ( Component )

Optional. The set of components to activate on the cluster.

tier

enum ( ClusterTier )

Optional. The cluster tier.

initializationActions[]

object ( NodeInitializationAction )

Optional. Commands to execute on each node after config is completed. By default, executables are run on master and all worker nodes.

autoscalingPolicy

string

Optional. The autoscaling policy used by the cluster.

You can specify either the short name (e.g., my-policy) or the full resource name (e.g., projects/[project_id]/locations/[region]/autoscalingPolicies/[policy_id]).

deleteMaxIdle

string ( Duration format)

Optional. The duration to keep the cluster alive while idling (when no jobs are running). Passing this threshold will cause the cluster to be deleted. Minimum value is 5 minutes; maximum value is 14 days.

deleteMaxAge

string ( Duration format)

Optional. The lifetime duration of cluster. The cluster will be auto-deleted at the end of this period. Minimum value is 10 minutes; maximum value is 14 days.

stopMaxIdle

string ( Duration format)

Optional. The duration to keep the cluster alive while idling (when no jobs are running). Passing this threshold will cause the cluster to be stopped. Minimum value is 5 minutes; maximum value is 14 days.

stopMaxAge

string ( Duration format)

Optional. The lifetime duration of cluster. The cluster will be auto-stopped at the end of this period. Minimum value is 10 minutes; maximum value is 14 days.
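The four scheduling fields share the same shape and differ only in their minimum value, so one helper can check all of them. The sketch below assumes that a Duration-format string is written as a number of seconds followed by `s` (for example `1800s`), as in the protobuf JSON mapping; this page does not show a sample value, so treat that encoding, and the helper name `duration_arg`, as assumptions to confirm against the tool's schema.

```python
from datetime import timedelta

# (minimum, maximum) per field, taken from the field descriptions above.
BOUNDS = {
    "deleteMaxIdle": (timedelta(minutes=5), timedelta(days=14)),
    "stopMaxIdle": (timedelta(minutes=5), timedelta(days=14)),
    "deleteMaxAge": (timedelta(minutes=10), timedelta(days=14)),
    "stopMaxAge": (timedelta(minutes=10), timedelta(days=14)),
}


def duration_arg(field, value):
    """Check value against the field's bounds and format it as '<seconds>s'.

    The '<seconds>s' encoding is an assumption (protobuf JSON mapping).
    Fractional seconds are truncated.
    """
    low, high = BOUNDS[field]
    if not low <= value <= high:
        raise ValueError(f"{field} must be between {low} and {high}, got {value}")
    return f"{int(value.total_seconds())}s"


print(duration_arg("deleteMaxIdle", timedelta(minutes=30)))  # 1800s
```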

tags[]

string

Optional. The Compute Engine tags to add to all instances (see Tagging instances).

resourceManagerTags

map (key: string, value: string)

Optional. The Resource Manager tags associated with this cluster.

An object containing a list of "key": value pairs. Example: { "name": "wrench", "mass": "1.3kg", "count": "3" }.

InstanceGroupConfig

JSON representation

{
  "numInstances": integer,
  "machineType": string,
  "bootDiskSizeGb": integer,
  "bootDiskType": string,
  "preemptibility": enum (`Preemptibility`),
  "accelerators": [ { object (`AcceleratorConfig`) } ]
}


Fields

numInstances

integer

Optional. The number of VM instances in the instance group. For HA cluster master_config groups, must be set to 3. For standard cluster master_config groups, must be set to 1.

machineType

string

Optional. The Compute Engine machine type used for cluster instances.

A full URL, partial URI, or short name are valid. Examples:

https://www.googleapis.com/compute/v1/projects/[project_id]/zones/[zone]/machineTypes/n1-standard-2
projects/[project_id]/zones/[zone]/machineTypes/n1-standard-2
n1-standard-2

Auto Zone Exception: If you are using the Dataproc Auto Zone Placement feature, you must use the short name of the machine type resource, for example, n1-standard-2.
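When Auto Zone Placement requires the short name, a value held in one of the longer forms can be reduced to its last path segment. The sketch below shows this for the three accepted forms; `short_name`, `my-project`, and `us-central1-a` are hypothetical placeholders.

```python
def short_name(resource):
    """Return the last path segment of a full URL, partial URI, or short name."""
    return resource.rstrip("/").rsplit("/", 1)[-1]


for form in (
    "https://www.googleapis.com/compute/v1/projects/my-project/zones/us-central1-a/machineTypes/n1-standard-2",
    "projects/my-project/zones/us-central1-a/machineTypes/n1-standard-2",
    "n1-standard-2",
):
    print(short_name(form))  # n1-standard-2 in all three cases
```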

bootDiskSizeGb

integer

Optional. Size in GB of the boot disk (default is 500GB).

bootDiskType

string

Optional. Type of the boot disk (default is "pd-standard"). Valid values: "pd-balanced" (Persistent Disk Balanced Solid State Drive), "pd-ssd" (Persistent Disk Solid State Drive), or "pd-standard" (Persistent Disk Hard Disk Drive). See Disk types.

preemptibility

enum ( Preemptibility )

Optional. Specifies the preemptibility of the instance group.

The default value for master and worker groups is NON_PREEMPTIBLE. This default cannot be changed.

The default value for secondary instances is PREEMPTIBLE.

accelerators[]

object ( AcceleratorConfig )

Optional. The Compute Engine accelerator configuration for these instances.

AcceleratorConfig

JSON representation
{ "acceleratorTypeUri" : string , "acceleratorCount" : integer }

Fields

acceleratorTypeUri

string

Full URL, partial URI, or short name of the accelerator type resource to expose to this instance. See Compute Engine AcceleratorTypes.

Examples:

https://www.googleapis.com/compute/v1/projects/[project_id]/zones/[zone]/acceleratorTypes/nvidia-tesla-t4
projects/[project_id]/zones/[zone]/acceleratorTypes/nvidia-tesla-t4
nvidia-tesla-t4

Auto Zone Exception: If you are using the Dataproc Auto Zone Placement feature, you must use the short name of the accelerator type resource, for example, nvidia-tesla-t4.

acceleratorCount

integer

The number of the accelerator cards of this type exposed to this instance.

LabelsEntry

JSON representation
{ "key" : string , "value" : string }

Fields
`key`	`string`
`value`	`string`

PropertiesEntry

JSON representation
{ "key" : string , "value" : string }

Fields
`key`	`string`
`value`	`string`

NodeInitializationAction

JSON representation
{ "executableFile" : string , "executionTimeout" : string }

Fields

executableFile

string

Required. Cloud Storage URI of executable file.

executionTimeout

string ( Duration format)

Optional. Amount of time executable has to complete. Default is 10 minutes (see JSON representation of Duration).

Cluster creation fails with an explanatory error message (the name of the executable that caused the error and the exceeded timeout period) if the executable is not completed at end of the timeout period.

Duration

JSON representation
{ "seconds" : string , "nanos" : integer }

Fields

seconds

string ( int64 format)

Signed seconds of the span of time. Must be from -315,576,000,000 to +315,576,000,000 inclusive. Note: these bounds are computed from: 60 sec/min * 60 min/hr * 24 hr/day * 365.25 days/year * 10000 years

nanos

integer

Signed fractions of a second at nanosecond resolution of the span of time. Durations less than one second are represented with a 0 seconds field and a positive or negative nanos field. For durations of one second or more, a non-zero value for the nanos field must be of the same sign as the seconds field. Must be from -999,999,999 to +999,999,999 inclusive.
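The sign rule for `seconds` and `nanos` is easy to get wrong with floor division, so a short sketch may help. It splits a signed nanosecond count into the two fields by truncating toward zero, and recomputes the range bound from the formula in the `seconds` description. The function name `to_duration` is hypothetical.

```python
NANOS_PER_SECOND = 1_000_000_000
# 60 sec/min * 60 min/hr * 24 hr/day * 365.25 days/year * 10000 years
MAX_SECONDS = int(60 * 60 * 24 * 365.25 * 10000)


def to_duration(total_nanos):
    """Split a signed nanosecond count into a Duration-shaped dict.

    Truncates toward zero so that seconds and nanos never have opposite signs.
    """
    if abs(total_nanos) > MAX_SECONDS * NANOS_PER_SECOND:
        raise ValueError("duration out of range")
    sign = -1 if total_nanos < 0 else 1
    seconds, nanos = divmod(abs(total_nanos), NANOS_PER_SECOND)
    # seconds is int64 format, which the JSON representation carries as a string.
    return {"seconds": str(sign * seconds), "nanos": sign * nanos}


print(MAX_SECONDS)                  # 315576000000
print(to_duration(-1_500_000_000))  # {'seconds': '-1', 'nanos': -500000000}
print(to_duration(250_000_000))     # {'seconds': '0', 'nanos': 250000000}
```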

ResourceManagerTagsEntry

JSON representation
{ "key" : string , "value" : string }

Fields
`key`	`string`
`value`	`string`

Output Schema

This resource represents a long-running operation that is the result of a network API call.

Operation

JSON representation

{
  "name": string,
  "metadata": {
    "@type": string,
    field1: ...,
    ...
  },
  "done": boolean,

  // Union field `result` can be only one of the following:
  "error": { object (`Status`) },
  "response": {
    "@type": string,
    field1: ...,
    ...
  }
  // End of list of possible types for union field `result`.
}


Fields

name

string

The server-assigned name, which is only unique within the same service that originally returns it. If you use the default HTTP mapping, the name should be a resource name ending with operations/{unique_id}.

metadata

object

Service-specific metadata associated with the operation. It typically contains progress information and common metadata such as create time. Some services might not provide such metadata. Any method that returns a long-running operation should document the metadata type, if any.

An object containing fields of an arbitrary type. An additional field "@type" contains a URI identifying the type. Example: { "id": 1234, "@type": "types.example.com/standard/id" }.

done

boolean

If the value is false, it means the operation is still in progress. If true, the operation is completed, and either error or response is available.

Union field result. The operation result, which can be either an error or a valid response. If done == false, neither error nor response is set. If done == true, exactly one of error or response can be set. Some services might not provide the result. result can be only one of the following:

error

object ( Status )

The error result of the operation in case of failure or cancellation.

response

object

The normal, successful response of the operation. If the original method returns no data on success, such as Delete, the response is google.protobuf.Empty. If the original method is standard Get/Create/Update, the response should be the resource. For other methods, the response should have the type XxxResponse, where Xxx is the original method name. For example, if the original method name is TakeSnapshot(), the inferred response type is TakeSnapshotResponse.

An object containing fields of an arbitrary type. An additional field "@type" contains a URI identifying the type. Example: { "id": 1234, "@type": "types.example.com/standard/id" }.
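A client reading an Operation has to branch on `done` and then on which member of the `result` union is present. The sketch below shows that branching on a plain dictionary. Treating a missing `done` as false is an assumption, and the function name, operation names, and error message in the samples are hypothetical.

```python
def operation_outcome(operation):
    """Return (state, payload) for an Operation-shaped dict.

    state is one of 'pending', 'failed', 'succeeded', 'done-no-result'.
    """
    # Assumption: a missing "done" means the operation is still in progress.
    if not operation.get("done", False):
        return "pending", None
    if "error" in operation:
        return "failed", operation["error"]
    if "response" in operation:
        return "succeeded", operation["response"]
    # Some services might not provide the result.
    return "done-no-result", None


# Hypothetical Operation payloads, for illustration only.
samples = [
    {"name": "operations/example-1", "done": False},
    {"name": "operations/example-2", "done": True, "error": {"code": 6, "message": "already exists"}},
    {"name": "operations/example-3", "done": True, "response": {"@type": "types.example.com/standard/id"}},
]
for sample in samples:
    print(sample["name"], operation_outcome(sample)[0])
```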

Any

JSON representation
{ "typeUrl" : string , "value" : string }

Fields

typeUrl

string

Identifies the type of the serialized Protobuf message with a URI reference consisting of a prefix ending in a slash and the fully-qualified type name.

Example: type.googleapis.com/google.protobuf.StringValue

This string must contain at least one / character, and the content after the last / must be the fully-qualified name of the type in canonical form, without a leading dot. Do not write a scheme on these URI references so that clients do not attempt to contact them.

The prefix is arbitrary and Protobuf implementations are expected to simply strip off everything up to and including the last / to identify the type. type.googleapis.com/ is a common default prefix that some legacy implementations require. This prefix does not indicate the origin of the type, and URIs containing it are not expected to respond to any requests.

All type URL strings must be legal URI references with the additional restriction (for the text format) that the content of the reference must consist only of alphanumeric characters, percent-encoded escapes, and characters in the following set (not including the outer backticks): `/-.~_!$&()*+,;=`. Despite our allowing percent encodings, implementations should not unescape them to prevent confusion with existing parsers. For example, type.googleapis.com%2FFoo should be rejected.

In the original design of Any, the possibility of launching a type resolution service at these type URLs was considered but Protobuf never implemented one and considers contacting these URLs to be problematic and a potential security issue. Do not attempt to contact type URLs.

value

string ( bytes format)

Holds a Protobuf serialization of the type described by type_url.

A base64-encoded string.
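The type URL rules can be sketched as a small parser: validate the character set, strip everything up to and including the last `/` without unescaping percent encodings, and base64-decode the value. The function name `parse_any` is hypothetical, and the sample bytes are only an example payload.

```python
import base64
import re

# Alphanumerics, percent-encoded escapes, and the extra characters allowed in type URLs.
_TYPE_URL = re.compile(r"(?:[A-Za-z0-9/\-.~_!$&()*+,;=]|%[0-9A-Fa-f]{2})+")


def parse_any(any_message):
    """Return (fully_qualified_type_name, raw_bytes) from an Any-shaped dict."""
    type_url = any_message["typeUrl"]
    # Percent escapes are deliberately not unescaped, so "%2F" never counts as "/".
    if "/" not in type_url or not _TYPE_URL.fullmatch(type_url):
        raise ValueError(f"invalid type URL: {type_url!r}")
    # Everything up to and including the last "/" is an arbitrary prefix.
    type_name = type_url.rsplit("/", 1)[1]
    if not type_name or type_name.startswith("."):
        raise ValueError(f"invalid type name in {type_url!r}")
    return type_name, base64.b64decode(any_message["value"])


sample = {
    "typeUrl": "type.googleapis.com/google.protobuf.StringValue",
    "value": base64.b64encode(b"\n\x02hi").decode("ascii"),  # example payload bytes
}
print(parse_any(sample))
```

The parser never contacts the type URL; it only uses it to identify the type name.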

Status

JSON representation
{ "code" : integer , "message" : string , "details" : [ { "@type" : string , field1 : ... , ... } ] }

Fields

code

integer

The status code, which should be an enum value of google.rpc.Code.

message

string

A developer-facing error message, which should be in English. Any user-facing error message should be localized and sent in the google.rpc.Status.details field, or localized by the client.

details[]

object

A list of messages that carry the error details. There is a common set of message types for APIs to use.

An object containing fields of an arbitrary type. An additional field "@type" contains a URI identifying the type. Example: { "id": 1234, "@type": "types.example.com/standard/id" }.
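Because `code` arrives as a bare integer, it is often convenient to map it back to its google.rpc.Code name when logging. The sketch below does that with a local enum of the canonical values; the function name `describe_status` and the sample message are hypothetical.

```python
from enum import IntEnum


class Code(IntEnum):
    """Canonical google.rpc.Code values."""
    OK = 0
    CANCELLED = 1
    UNKNOWN = 2
    INVALID_ARGUMENT = 3
    DEADLINE_EXCEEDED = 4
    NOT_FOUND = 5
    ALREADY_EXISTS = 6
    PERMISSION_DENIED = 7
    RESOURCE_EXHAUSTED = 8
    FAILED_PRECONDITION = 9
    ABORTED = 10
    OUT_OF_RANGE = 11
    UNIMPLEMENTED = 12
    INTERNAL = 13
    UNAVAILABLE = 14
    DATA_LOSS = 15
    UNAUTHENTICATED = 16


def describe_status(status):
    """Format a Status-shaped dict as 'CODE_NAME: message'."""
    code = status.get("code", 0)
    try:
        name = Code(code).name
    except ValueError:
        # Codes outside the canonical set are reported rather than rejected.
        name = f"UNRECOGNIZED({code})"
    return f"{name}: {status.get('message', '')}"


# Hypothetical error payload, for illustration only.
print(describe_status({"code": 6, "message": "cluster already exists"}))
```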

Tool Annotations

Destructive Hint: ❌ | Idempotent Hint: ✅ | Read Only Hint: ❌ | Open World Hint: ❌
