Tool: create_cluster
Create a Dataproc cluster in a Google Cloud project
The following sample demonstrates how to use curl to invoke the create_cluster MCP tool.
| Curl Request |
|---|
curl --location 'https://dataproc.googleapis.com/mcp' --header 'content-type: application/json' --header 'accept: application/json, text/event-stream' --data '{ "method": "tools/call", "params": { "name": "create_cluster", "arguments": { // provide these details according to the MCP specification of the tool } }, "jsonrpc": "2.0", "id": 1 }' |
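As a rough illustration of the request body, the following Python sketch assembles the same JSON-RPC tools/call envelope that the curl sample sends. The helper name build_create_cluster_request and the argument values are hypothetical placeholders; only the endpoint, method, and tool name come from the sample above. The sketch builds and prints the body but does not send it.

```python
import json

# Endpoint taken from the curl sample above.
MCP_ENDPOINT = "https://dataproc.googleapis.com/mcp"


def build_create_cluster_request(arguments, request_id=1):
    """Wrap tool arguments in the JSON-RPC envelope used by the curl sample."""
    return {
        "jsonrpc": "2.0",
        "id": request_id,
        "method": "tools/call",
        "params": {"name": "create_cluster", "arguments": arguments},
    }


# Placeholder values (assumptions): replace with your own project, region, and name.
example_arguments = {
    "projectId": "my-project",
    "region": "us-central1",
    "clusterName": "example-cluster",
}

request_body = json.dumps(build_create_cluster_request(example_arguments), indent=2)
print(request_body)
```

The printed body can be passed to curl through --data, or posted to MCP_ENDPOINT with any HTTP client, using the headers shown in the sample.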
Input Schema
A request to create a Dataproc cluster.
CreateClusterRequest
| JSON representation |
|---|
{ "projectId" : string , "region" : string , "clusterName" : string , "masterConfig" : { object ( |
projectId
string
Required. The ID of the Google Cloud Platform project that the cluster belongs to.
region
string
Required. The Dataproc region in which to handle the request.
clusterName
string
Required. The cluster name. Cluster names within a project must be unique. Names of deleted clusters can be reused.
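Because projectId, region, and clusterName are the only fields marked Required, a client might check for them before sending a request. The following is a minimal sketch of such a check; the function name is hypothetical and the server remains the authority on validation.

```python
# The three fields marked "Required" in the input schema.
REQUIRED_FIELDS = ("projectId", "region", "clusterName")


def missing_required_fields(arguments):
    """Return the required field names that are absent or empty."""
    return [name for name in REQUIRED_FIELDS if not arguments.get(name)]


# Placeholder arguments (assumption) with clusterName left out on purpose.
print(missing_required_fields({"projectId": "my-project", "region": "us-central1"}))
```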
masterConfig
object (InstanceGroupConfig)
Optional. Configuration for master instances.
workerConfig
object (InstanceGroupConfig)
Optional. Configuration for worker instances.
secondaryWorkerConfig
object (InstanceGroupConfig)
Optional. Configuration for secondary worker instances.
imageVersion
string
Optional. The version of software inside the cluster. It must be one of the supported Dataproc Versions, such as "1.2" (including a subminor version, such as "1.2.29"), or the "preview" version. If unspecified, it defaults to the latest Debian version, for example "2.2-debian12".
image
string
Optional. The Compute Engine image resource used for cluster instances.
The URI can represent an image or image family.
Image examples:
- https://www.googleapis.com/compute/v1/projects/[project_id]/global/images/[image-id]
- projects/[project_id]/global/images/[image-id]
- image-id
Image family examples. Dataproc will use the most recent image from the family:
- https://www.googleapis.com/compute/v1/projects/[project_id]/global/images/family/[custom-image-family-name]
- projects/[project_id]/global/images/family/[custom-image-family-name]
If the URI is unspecified, it will be inferred from SoftwareConfig.image_version or the system default.
zone
string
Optional. The Compute Engine zone where the cluster will be located. On a get request, zone will always be present.
A full URL, partial URI, or short name are valid. Examples:
- https://www.googleapis.com/compute/v1/projects/[project_id]/zones/[zone]
- projects/[project_id]/zones/[zone]
- [zone]
labels
map (key: string, value: string)
Optional. The labels to associate with this cluster. Label keys must contain 1 to 63 characters, and must conform to RFC 1035. Label values may be empty, but, if present, must contain 1 to 63 characters, and must conform to RFC 1035. No more than 32 labels can be associated with a cluster.
An object containing a list of "key": value pairs. Example: { "name": "wrench", "mass": "1.3kg", "count": "3" }.
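The label constraints can be checked on the client side before a request is sent. The sketch below assumes the RFC 1035 label grammar (a leading letter, then letters, digits, or hyphens, ending in a letter or digit); the service may apply stricter or slightly different rules, and the function name is hypothetical.

```python
import re

# Assumed reading of the RFC 1035 label grammar: starts with a letter, ends with
# a letter or digit, and has only letters, digits, or hyphens in between.
RFC1035_LABEL = re.compile(r"[A-Za-z]([A-Za-z0-9-]*[A-Za-z0-9])?")
MAX_LABELS = 32
MAX_LENGTH = 63


def label_problems(labels):
    """Return a list of human-readable problems found in a labels map."""
    problems = []
    if len(labels) > MAX_LABELS:
        problems.append(f"{len(labels)} labels given; at most {MAX_LABELS} allowed")
    for key, value in labels.items():
        if not (1 <= len(key) <= MAX_LENGTH and RFC1035_LABEL.fullmatch(key)):
            problems.append(f"invalid label key: {key!r}")
        # Values may be empty; non-empty values follow the same rules as keys.
        if value and not (len(value) <= MAX_LENGTH and RFC1035_LABEL.fullmatch(value)):
            problems.append(f"invalid label value for {key!r}: {value!r}")
    return problems


print(label_problems({"env": "prod", "team": ""}))
```

An empty list means no problems were found under these assumed rules.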
properties
map (key: string, value: string)
Optional. The properties to set on daemon config files.
Property keys are specified in prefix:property format, for example core:hadoop.tmp.dir. The following are supported prefixes and their mappings:
- capacity-scheduler: capacity-scheduler.xml
- core: core-site.xml
- distcp: distcp-default.xml
- hdfs: hdfs-site.xml
- hive: hive-site.xml
- mapred: mapred-site.xml
- pig: pig.properties
- spark: spark-defaults.conf
- yarn: yarn-site.xml
For more information, see Cluster properties.
An object containing a list of "key": value pairs. Example: { "name": "wrench", "mass": "1.3kg", "count": "3" }.
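To make the prefix:property convention concrete, the following sketch splits a property key and looks up the config file for its prefix. The table contains only the prefixes listed above; the function name is hypothetical, and the service may accept prefixes that are not listed here.

```python
# Prefix-to-file mappings copied from the list above.
PREFIX_TO_FILE = {
    "capacity-scheduler": "capacity-scheduler.xml",
    "core": "core-site.xml",
    "distcp": "distcp-default.xml",
    "hdfs": "hdfs-site.xml",
    "hive": "hive-site.xml",
    "mapred": "mapred-site.xml",
    "pig": "pig.properties",
    "spark": "spark-defaults.conf",
    "yarn": "yarn-site.xml",
}


def resolve_property(key):
    """Split a prefix:property key into (config file, property name)."""
    prefix, separator, name = key.partition(":")
    if not separator or not name:
        raise ValueError(f"expected prefix:property format, got {key!r}")
    if prefix not in PREFIX_TO_FILE:
        raise ValueError(f"prefix {prefix!r} is not in the documented list")
    return PREFIX_TO_FILE[prefix], name


print(resolve_property("core:hadoop.tmp.dir"))
```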
bucket
string
Optional. A Cloud Storage bucket used to stage job dependencies, config files, and job driver console output. If you do not specify a staging bucket, Cloud Dataproc will determine a Cloud Storage location (US, ASIA, or EU) for your cluster's staging bucket according to the Compute Engine zone where your cluster is deployed, and then create and manage this project-level, per-location bucket (see Dataproc staging and temp buckets). This field requires a Cloud Storage bucket name, not a gs://... URI to a Cloud Storage bucket.
tempBucket
string
Optional. A Cloud Storage bucket used to store ephemeral cluster and jobs data, such as Spark and MapReduce history files. If you do not specify a temp bucket, Dataproc will determine a Cloud Storage location (US, ASIA, or EU) for your cluster's temp bucket according to the Compute Engine zone where your cluster is deployed, and then create and manage this project-level, per-location bucket. The default bucket has a TTL of 90 days, but you can use any TTL (or none) if you specify a bucket (see Dataproc staging and temp buckets). This field requires a Cloud Storage bucket name, not a gs://... URI to a Cloud Storage bucket.
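Since bucket and tempBucket expect a bare bucket name, a caller that holds a gs:// URI may want to reduce it to the name first. The helper below is a hypothetical convenience for that, not part of the tool.

```python
def to_bucket_name(value):
    """Return a bare bucket name, accepting either a name or a gs:// URI."""
    if value.startswith("gs://"):
        value = value[len("gs://"):]
    # Drop any object path that follows the bucket name.
    name = value.split("/", 1)[0]
    if not name:
        raise ValueError("no bucket name found")
    return name


# Placeholder bucket name (assumption).
print(to_bucket_name("gs://my-staging-bucket/jobs/"))
```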
enableComponentGateway
boolean
Optional. If true, enable HTTP access to specific ports on the cluster from external sources. Defaults to false.
serviceAccount
string
Optional. The Dataproc service account (also see VM Data Plane identity) used by Dataproc cluster VM instances to access Google Cloud Platform services.
If not specified, the Compute Engine default service account is used.
network
string
Optional. The Compute Engine network to be used for machine communications. Cannot be specified with subnetwork_uri. If neither network_uri nor subnetwork_uri is specified, the "default" network of the project is used, if it exists. Cannot be a "Custom Subnet Network" (see Using Subnetworks for more information).
A full URL, partial URI, or short name are valid. Examples:
- https://www.googleapis.com/compute/v1/projects/[project_id]/global/networks/default
- projects/[project_id]/global/networks/default
- default
subnetwork
string
Optional. The Compute Engine subnetwork to be used for machine communications. Cannot be specified with network_uri.
A full URL, partial URI, or short name are valid. Examples:
- https://www.googleapis.com/compute/v1/projects/[project_id]/regions/[region]/subnetworks/sub0
- projects/[project_id]/regions/[region]/subnetworks/sub0
- sub0
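The network and subnetwork descriptions exclude each other, which a client can check before sending a request. This sketch assumes that network_uri and subnetwork_uri in the descriptions correspond to the network and subnetwork arguments of this tool; the function name is hypothetical.

```python
def check_network_settings(arguments):
    """Raise if both network and subnetwork are set; otherwise return arguments."""
    if arguments.get("network") and arguments.get("subnetwork"):
        raise ValueError("specify either network or subnetwork, not both")
    return arguments


# With neither field set, the project's "default" network is used if it exists.
print(check_network_settings({"subnetwork": "sub0"}))
```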
optionalComponents[]
enum (Component)
Optional. The set of components to activate on the cluster.
tier
enum (ClusterTier)
Optional. The cluster tier.
initializationActions[]
object (NodeInitializationAction)
Optional. Commands to execute on each node after config is completed. By default, executables are run on master and all worker nodes.
autoscalingPolicy
string
Optional. The autoscaling policy used by the cluster.
You can specify either the short name (e.g., my-policy) or the full resource name (e.g., projects/[project_id]/locations/[region]/autoscalingPolicies/[policy_id]).
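The two accepted forms of autoscalingPolicy differ only by the resource path in front of the policy ID. The following hypothetical helper expands a short name into the full resource name pattern quoted above and leaves a full name untouched.

```python
def autoscaling_policy_resource_name(policy, project_id, region):
    """Expand a short policy name into the full resource name form."""
    if policy.startswith("projects/"):
        return policy  # already a full resource name
    return f"projects/{project_id}/locations/{region}/autoscalingPolicies/{policy}"


# Placeholder project and region (assumptions).
print(autoscaling_policy_resource_name("my-policy", "my-project", "us-central1"))
```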
deleteMaxIdle
string (Duration format)
Optional. The duration to keep the cluster alive while idling (when no jobs are running). Passing this threshold will cause the cluster to be deleted. Minimum value is 5 minutes; maximum value is 14 days.
deleteMaxAge
string (Duration format)
Optional. The lifetime duration of the cluster. The cluster will be auto-deleted at the end of this period. Minimum value is 10 minutes; maximum value is 14 days.
stopMaxIdle
string (Duration format)
Optional. The duration to keep the cluster alive while idling (when no jobs are running). Passing this threshold will cause the cluster to be stopped. Minimum value is 5 minutes; maximum value is 14 days.
stopMaxAge
string (Duration format)
Optional. The lifetime duration of the cluster. The cluster will be auto-stopped at the end of this period. Minimum value is 10 minutes; maximum value is 14 days.
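The four lifecycle fields share a 14-day maximum but have different minimums, so a small table-driven check can catch out-of-range values early. The sketch below assumes the value is encoded as whole seconds followed by "s" (for example "600s"), which is the usual protobuf JSON form for a Duration string; whether this tool expects that form or the seconds/nanos object described under Duration is not confirmed here. The function name is hypothetical.

```python
from datetime import timedelta

# (minimum, maximum) per field, as stated in the field descriptions.
LIMITS = {
    "deleteMaxIdle": (timedelta(minutes=5), timedelta(days=14)),
    "deleteMaxAge": (timedelta(minutes=10), timedelta(days=14)),
    "stopMaxIdle": (timedelta(minutes=5), timedelta(days=14)),
    "stopMaxAge": (timedelta(minutes=10), timedelta(days=14)),
}


def check_lifecycle_duration(field, duration):
    """Validate a duration against the field's bounds and encode it as a string."""
    low, high = LIMITS[field]
    if not low <= duration <= high:
        raise ValueError(f"{field} must be between {low} and {high}, got {duration}")
    # Assumed encoding: whole seconds followed by "s".
    return f"{int(duration.total_seconds())}s"


print(check_lifecycle_duration("deleteMaxIdle", timedelta(minutes=30)))
```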
tags[]
string
Optional. The Compute Engine tags to add to all instances (see Tagging instances).
resourceManagerTags
map (key: string, value: string)
Optional. The Resource Manager tags associated with this cluster.
An object containing a list of "key": value pairs. Example: { "name": "wrench", "mass": "1.3kg", "count": "3" }.
InstanceGroupConfig
| JSON representation |
|---|
{ "numInstances" : integer , "machineType" : string , "bootDiskSizeGb" : integer , "bootDiskType" : string , "preemptibility" : enum ( |
numInstances
integer
Optional. The number of VM instances in the instance group. For HA cluster master_config groups, must be set to 3. For standard cluster master_config groups, must be set to 1.
machineType
string
Optional. The Compute Engine machine type used for cluster instances.
A full URL, partial URI, or short name are valid. Examples:
- https://www.googleapis.com/compute/v1/projects/[project_id]/zones/[zone]/machineTypes/n1-standard-2
- projects/[project_id]/zones/[zone]/machineTypes/n1-standard-2
- n1-standard-2
Auto Zone Exception: If you are using the Dataproc Auto Zone Placement feature, you must use the short name of the machine type resource, for example, n1-standard-2.
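When Auto Zone Placement requires the short name, a full URL or partial URI can be reduced to its last path segment. The helper below is a hypothetical sketch of that reduction; the project and zone in the example are placeholders.

```python
def short_name(resource):
    """Return the last path segment of a full URL, partial URI, or short name."""
    return resource.rstrip("/").rsplit("/", 1)[-1]


# Placeholder project and zone (assumptions).
print(short_name("projects/my-project/zones/us-central1-a/machineTypes/n1-standard-2"))
```

The same reduction applies to the other fields that accept a full URL, partial URI, or short name.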
bootDiskSizeGb
integer
Optional. Size in GB of the boot disk (default is 500GB).
bootDiskType
string
Optional. Type of the boot disk (default is "pd-standard"). Valid values: "pd-balanced" (Persistent Disk Balanced Solid State Drive), "pd-ssd" (Persistent Disk Solid State Drive), or "pd-standard" (Persistent Disk Hard Disk Drive). See Disk types.
preemptibility
enum (Preemptibility)
Optional. Specifies the preemptibility of the instance group.
The default value for master and worker groups is NON_PREEMPTIBLE. This default cannot be changed.
The default value for secondary instances is PREEMPTIBLE.
accelerators[]
object (AcceleratorConfig)
Optional. The Compute Engine accelerator configuration for these instances.
AcceleratorConfig
| JSON representation |
|---|
{ "acceleratorTypeUri" : string , "acceleratorCount" : integer } |
acceleratorTypeUri
string
Full URL, partial URI, or short name of the accelerator type resource to expose to this instance. See Compute Engine AcceleratorTypes .
Examples:
- https://www.googleapis.com/compute/v1/projects/[project_id]/zones/[zone]/acceleratorTypes/nvidia-tesla-t4
- projects/[project_id]/zones/[zone]/acceleratorTypes/nvidia-tesla-t4
- nvidia-tesla-t4
Auto Zone Exception: If you are using the Dataproc Auto Zone Placement feature, you must use the short name of the accelerator type resource, for example, nvidia-tesla-t4.
acceleratorCount
integer
The number of the accelerator cards of this type exposed to this instance.
LabelsEntry
| JSON representation |
|---|
{ "key" : string , "value" : string } |
| Fields | |
|---|---|
| key | string |
| value | string |
PropertiesEntry
| JSON representation |
|---|
{ "key" : string , "value" : string } |
| Fields | |
|---|---|
| key | string |
| value | string |
NodeInitializationAction
| JSON representation |
|---|
{ "executableFile" : string , "executionTimeout" : string } |
| Fields | |
|---|---|
| executableFile | Required. Cloud Storage URI of executable file. |
| executionTimeout | Optional. Amount of time executable has to complete. Default is 10 minutes (see JSON representation of Duration). Cluster creation fails with an explanatory error message (the name of the executable that caused the error and the exceeded timeout period) if the executable has not completed by the end of the timeout period. |
Duration
| JSON representation |
|---|
{ "seconds" : string , "nanos" : integer } |
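| Fields | |
|---|---|
| seconds | Signed seconds of the span of time. Must be from -315,576,000,000 to +315,576,000,000 inclusive. Note: these bounds are computed from: 60 sec/min * 60 min/hr * 24 hr/day * 365.25 days/year * 10000 years |
| nanos | Signed fractions of a second at nanosecond resolution of the span of time. Durations less than one second are represented with a 0 seconds field and a positive or negative nanos field. For durations of one second or more, a non-zero value for the nanos field must be of the same sign as the seconds field. Must be from -999,999,999 to +999,999,999 inclusive. |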
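The bound on seconds follows directly from the arithmetic in the note, and the seconds/nanos split can be sketched in a few lines. The example below reproduces the bound and converts a signed nanosecond count into the two fields, keeping both with the same sign as in the protobuf Duration convention. The function name is hypothetical, and seconds is emitted as a string because the JSON representation above types it as a string.

```python
# 60 sec/min * 60 min/hr * 24 hr/day * 365.25 days/year * 10000 years
MAX_SECONDS = int(60 * 60 * 24 * 365.25 * 10000)
NANOS_PER_SECOND = 1_000_000_000


def to_duration(total_nanos):
    """Split a signed nanosecond count into seconds and nanos of the same sign."""
    sign = -1 if total_nanos < 0 else 1
    seconds, nanos = divmod(abs(total_nanos), NANOS_PER_SECOND)
    if seconds > MAX_SECONDS:
        raise ValueError("duration is outside the allowed range")
    return {"seconds": str(sign * seconds), "nanos": sign * nanos}


print(MAX_SECONDS)
print(to_duration(1_500_000_000))
print(to_duration(-250_000_000))
```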
ResourceManagerTagsEntry
| JSON representation |
|---|
{ "key" : string , "value" : string } |
| Fields | |
|---|---|
| key | string |
| value | string |
Output Schema
This resource represents a long-running operation that is the result of a network API call.
Operation
| JSON representation |
|---|
{ "name" : string , "metadata" : { "@type" : string , field1 : ... , ... } , "done" : boolean , // Union field |
name
string
The server-assigned name, which is only unique within the same service that originally returns it. If you use the default HTTP mapping, the name should be a resource name ending with operations/{unique_id}.
metadata
object
Service-specific metadata associated with the operation. It typically contains progress information and common metadata such as create time. Some services might not provide such metadata. Any method that returns a long-running operation should document the metadata type, if any.
An object containing fields of an arbitrary type. An additional field "@type" contains a URI identifying the type. Example: { "id": 1234, "@type": "types.example.com/standard/id" }.
done
boolean
If the value is false, it means the operation is still in progress. If true, the operation is completed, and either error or response is available.
Union field result. The operation result, which can be either an error or a valid response. If done == false, neither error nor response is set. If done == true, exactly one of error or response can be set. Some services might not provide the result. result can be only one of the following:
error
object (Status)
The error result of the operation in case of failure or cancellation.
response
object
The normal, successful response of the operation. If the original method returns no data on success, such as Delete, the response is google.protobuf.Empty. If the original method is standard Get/Create/Update, the response should be the resource. For other methods, the response should have the type XxxResponse, where Xxx is the original method name. For example, if the original method name is TakeSnapshot(), the inferred response type is TakeSnapshotResponse.
An object containing fields of an arbitrary type. An additional field "@type" contains a URI identifying the type. Example: { "id": 1234, "@type": "types.example.com/standard/id" }.
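The rules for done, error, and response suggest a simple way to interpret a returned Operation. The following sketch assumes the operation has already been parsed into a Python dict; the function name and the sample values are hypothetical.

```python
def summarize_operation(operation):
    """Describe the state of a long-running operation parsed from JSON."""
    if not operation.get("done", False):
        return "in progress"
    if "error" in operation:
        error = operation["error"]
        return f"failed: code={error.get('code')} message={error.get('message')}"
    if "response" in operation:
        return "succeeded"
    # Some services might not provide a result even when done is true.
    return "done (no result provided)"


# Sample operation with placeholder values (assumptions).
print(summarize_operation({"name": "operations/example-id", "done": False}))
```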
Any
| JSON representation |
|---|
{ "typeUrl" : string , "value" : string } |
| Fields | |
|---|---|
| typeUrl | Identifies the type of the serialized Protobuf message with a URI reference consisting of a prefix ending in a slash and the fully-qualified type name (example: type.googleapis.com/google.protobuf.StringValue). This string must contain at least one "/" character. The prefix is arbitrary, and Protobuf implementations are expected to simply strip off everything up to and including the last "/" to identify the type. All type URL strings must be legal URI references with the additional restriction (for the text format) that the content of the reference must consist only of alphanumeric characters, percent-encoded escapes, and a limited set of punctuation characters. |
| value | Holds a Protobuf serialization of the type described by type_url. A base64-encoded string. |
Status
| JSON representation |
|---|
{ "code" : integer , "message" : string , "details" : [ { "@type" : string , field1 : ... , ... } ] } |
| Fields | |
|---|---|
| code | The status code, which should be an enum value of google.rpc.Code. |
| message | A developer-facing error message, which should be in English. Any user-facing error message should be localized and sent in the google.rpc.Status.details field, or localized by the client. |
| details[] | A list of messages that carry the error details. There is a common set of message types for APIs to use. An object containing fields of an arbitrary type. An additional field "@type" contains a URI identifying the type. Example: { "id": 1234, "@type": "types.example.com/standard/id" }. |
Tool Annotations
Destructive Hint: ❌ | Idempotent Hint: ✅ | Read Only Hint: ❌ | Open World Hint: ❌

