Package google.cloud.dataproc.v1

Index

BatchController (interface)
SessionController (interface)
SessionTemplateController (interface)
AnalyzeOperationMetadata (message)
AnalyzeOperationMetadata.WorkloadType (enum)
AuthenticationConfig (message)
AuthenticationConfig.AuthenticationType (enum)
AutotuningConfig (message)
AutotuningConfig.Scenario (enum)
Batch (message)
Batch.State (enum)
Batch.StateHistory (message)
BatchOperationMetadata (message)
BatchOperationMetadata.BatchOperationType (enum)
CreateBatchRequest (message)
CreateSessionRequest (message)
CreateSessionTemplateRequest (message)
DeleteBatchRequest (message)
DeleteSessionRequest (message)
DeleteSessionTemplateRequest (message)
DiagnoseClusterResults (message)
EnvironmentConfig (message)
ExecutionConfig (message)
GetBatchRequest (message)
GetSessionRequest (message)
GetSessionTemplateRequest (message)
JupyterConfig (message)
JupyterConfig.Kernel (enum)
ListBatchesRequest (message)
ListBatchesResponse (message)
ListSessionTemplatesRequest (message)
ListSessionTemplatesResponse (message)
ListSessionsRequest (message)
ListSessionsResponse (message)
PeripheralsConfig (message)
PropertiesInfo (message)
PropertiesInfo.ValueInfo (message)
PyPiRepositoryConfig (message)
PySparkBatch (message)
RepositoryConfig (message)
RuntimeConfig (message)
RuntimeInfo (message)
Session (message)
Session.SessionStateHistory (message)
Session.State (enum)
SessionOperationMetadata (message)
SessionOperationMetadata.SessionOperationType (enum)
SessionTemplate (message)
SparkBatch (message)
SparkConnectConfig (message)
SparkHistoryServerConfig (message)
SparkRBatch (message)
SparkSqlBatch (message)
TerminateSessionRequest (message)
UpdateSessionTemplateRequest (message)
UsageMetrics (message)
UsageSnapshot (message)

BatchController

The BatchController provides methods to manage batch workloads.

CreateBatch

rpc CreateBatch( CreateBatchRequest ) returns ( Operation )

Creates a batch workload that executes asynchronously.

Authorization scopes

Requires the following OAuth scope:

https://www.googleapis.com/auth/cloud-platform

For more information, see the Authentication Overview .

DeleteBatch

rpc DeleteBatch( DeleteBatchRequest ) returns ( Empty )

Deletes the batch workload resource. If the batch is not in a CANCELLED , SUCCEEDED or FAILED State , the delete operation fails and the response returns FAILED_PRECONDITION .

Authorization scopes

Requires the following OAuth scope:

https://www.googleapis.com/auth/cloud-platform

For more information, see the Authentication Overview .

GetBatch

rpc GetBatch( GetBatchRequest ) returns ( Batch )

Gets the batch workload resource representation.

Authorization scopes

Requires the following OAuth scope:

https://www.googleapis.com/auth/cloud-platform

For more information, see the Authentication Overview .

ListBatches

rpc ListBatches( ListBatchesRequest ) returns ( ListBatchesResponse )

Lists batch workloads.

Authorization scopes

Requires the following OAuth scope:

https://www.googleapis.com/auth/cloud-platform

For more information, see the Authentication Overview .

SessionController

The SessionController provides methods to manage interactive sessions.

CreateSession

rpc CreateSession( CreateSessionRequest ) returns ( Operation )

Create an interactive session asynchronously.

Authorization scopes

Requires the following OAuth scope:

https://www.googleapis.com/auth/cloud-platform

For more information, see the Authentication Overview .

DeleteSession

rpc DeleteSession( DeleteSessionRequest ) returns ( Operation )

Deletes the interactive session resource. If the session is not in terminal state, it is terminated, and then deleted.

Authorization scopes

Requires the following OAuth scope:

https://www.googleapis.com/auth/cloud-platform

For more information, see the Authentication Overview .

GetSession

rpc GetSession( GetSessionRequest ) returns ( Session )

Gets the resource representation for an interactive session.

Authorization scopes

Requires the following OAuth scope:

https://www.googleapis.com/auth/cloud-platform

For more information, see the Authentication Overview .

ListSessions

rpc ListSessions( ListSessionsRequest ) returns ( ListSessionsResponse )

Lists interactive sessions.

Authorization scopes

Requires the following OAuth scope:

https://www.googleapis.com/auth/cloud-platform

For more information, see the Authentication Overview .

TerminateSession

rpc TerminateSession( TerminateSessionRequest ) returns ( Operation )

Terminates the interactive session.

Authorization scopes

Requires the following OAuth scope:

https://www.googleapis.com/auth/cloud-platform

For more information, see the Authentication Overview .

SessionTemplateController

The SessionTemplateController provides methods to manage session templates.

CreateSessionTemplate

rpc CreateSessionTemplate( CreateSessionTemplateRequest ) returns ( SessionTemplate )

Create a session template synchronously.

Authorization scopes

Requires the following OAuth scope:

https://www.googleapis.com/auth/cloud-platform

For more information, see the Authentication Overview .

DeleteSessionTemplate

rpc DeleteSessionTemplate( DeleteSessionTemplateRequest ) returns ( Empty )

Deletes a session template.

Authorization scopes

Requires the following OAuth scope:

https://www.googleapis.com/auth/cloud-platform

For more information, see the Authentication Overview .

GetSessionTemplate

rpc GetSessionTemplate( GetSessionTemplateRequest ) returns ( SessionTemplate )

Gets the resource representation for a session template.

Authorization scopes

Requires the following OAuth scope:

https://www.googleapis.com/auth/cloud-platform

For more information, see the Authentication Overview .

ListSessionTemplates

rpc ListSessionTemplates( ListSessionTemplatesRequest ) returns ( ListSessionTemplatesResponse )

Lists session templates.

Authorization scopes

Requires the following OAuth scope:

https://www.googleapis.com/auth/cloud-platform

For more information, see the Authentication Overview .

UpdateSessionTemplate

rpc UpdateSessionTemplate( UpdateSessionTemplateRequest ) returns ( SessionTemplate )

Updates the session template synchronously.

Authorization scopes

Requires the following OAuth scope:

https://www.googleapis.com/auth/cloud-platform

For more information, see the Authentication Overview .

AnalyzeOperationMetadata

Metadata describing the Analyze operation.

Fields
`analyzed_workload_name`	`string` Output only. name of the workload being analyzed.
`analyzed_workload_type`	`WorkloadType` Output only. Type of the workload being analyzed.
`analyzed_workload_uuid`	`string` Output only. unique identifier of the workload typically generated by control plane. E.g. batch uuid.
`create_time`	`Timestamp` Output only. The time when the operation was created.
`done_time`	`Timestamp` Output only. The time when the operation finished.
`description`	`string` Output only. Short description of the operation.
`labels`	`map<string, string>` Output only. Labels associated with the operation.
`warnings[]`	`string` Output only. Warnings encountered during operation execution.

WorkloadType

Workload type

Enums
`WORKLOAD_TYPE_UNSPECIFIED`	Undefined option
`BATCH`	Serverless batch job

AuthenticationConfig

Authentication configuration for a workload is used to set the default identity for the workload execution. The config specifies the type of identity (service account or user) that will be used by workloads to access resources on the project(s).

Fields

Fields
`user_workload_authentication_type`	`AuthenticationType` Optional. Authentication type for the user workload running in containers.

user_workload_authentication_type

AuthenticationType

Optional. Authentication type for the user workload running in containers.

AuthenticationType

Authentication types for workload execution.

Enums
`AUTHENTICATION_TYPE_UNSPECIFIED`	If AuthenticationType is unspecified then END_USER_CREDENTIALS is used for 3.0 and newer runtimes, and SERVICE_ACCOUNT is used for older runtimes.
`SERVICE_ACCOUNT`	Use service account credentials for authenticating to other services.
`END_USER_CREDENTIALS`	Use OAuth credentials associated with the workload creator/user for authenticating to other services.

AutotuningConfig

Autotuning configuration of the workload.

Fields

Fields
`scenarios[]`	`Scenario` Optional. Scenarios for which tunings are applied.

scenarios[]

Scenario

Optional. Scenarios for which tunings are applied.

Scenario

Scenario represents a specific goal that autotuning will attempt to achieve by modifying workloads.

Enums
`SCENARIO_UNSPECIFIED`	Default value.
`SCALING`	Scaling recommendations such as initialExecutors.
`BROADCAST_HASH_JOIN`	Adding hints for potential relation broadcasts.
`MEMORY`	Memory management for workloads.
`NONE`	No autotuning.
`AUTO`	Automatic selection of scenarios.

Batch

A representation of a batch workload in the service.

Fields

name

string

Output only. The resource name of the batch.

uuid

string

Output only. A batch UUID (Unique Universal Identifier). The service generates this value when it creates the batch.

create_time

Timestamp

Output only. The time when the batch was created.

runtime_info

RuntimeInfo

Output only. Runtime information about batch execution.

state

State

Output only. The state of the batch.

state_message

string

Output only. Batch state details, such as a failure description if the state is FAILED .

state_time

Timestamp

Output only. The time when the batch entered a current state.

creator

string

Output only. The email address of the user who created the batch.

labels

map<string, string>

Optional. The labels to associate with this batch. Label keysmust contain 1 to 63 characters, and must conform to RFC 1035 . Label valuesmay be empty, but, if present, must contain 1 to 63 characters, and must conform to RFC 1035 . No more than 32 labels can be associated with a batch.

runtime_config

RuntimeConfig

Optional. Runtime configuration for the batch execution.

environment_config

EnvironmentConfig

Optional. Environment configuration for the batch execution.

operation

string

Output only. The resource name of the operation associated with this batch.

state_history[]

StateHistory

Output only. Historical state information for the batch.

Union field batch_config . The application/framework-specific portion of the batch configuration. batch_config can be only one of the following:

pyspark_batch

PySparkBatch

Optional. PySpark batch config.

spark_batch

SparkBatch

Optional. Spark batch config.

spark_r_batch

SparkRBatch

Optional. SparkR batch config.

spark_sql_batch

SparkSqlBatch

Optional. SparkSql batch config.

State

The batch state.

Enums
`STATE_UNSPECIFIED`	The batch state is unknown.
`PENDING`	The batch is created before running.
`RUNNING`	The batch is running.
`CANCELLING`	The batch is cancelling.
`CANCELLED`	The batch cancellation was successful.
`SUCCEEDED`	The batch completed successfully.
`FAILED`	The batch is no longer running due to an error.

StateHistory

Historical state information.

Fields

Fields
`state`	`State` Output only. The state of the batch at this point in history.
`state_message`	`string` Output only. Details about the state at this point in history.
`state_start_time`	`Timestamp` Output only. The time when the batch entered the historical state.

state

State

Output only. The state of the batch at this point in history.

state_message

string

Output only. Details about the state at this point in history.

state_start_time

Timestamp

Output only. The time when the batch entered the historical state.

BatchOperationMetadata

Metadata describing the Batch operation.

Fields
`batch`	`string` Name of the batch for the operation.
`batch_uuid`	`string` Batch UUID for the operation.
`create_time`	`Timestamp` The time when the operation was created.
`done_time`	`Timestamp` The time when the operation finished.
`operation_type`	`BatchOperationType` The operation type.
`description`	`string` Short description of the operation.
`labels`	`map<string, string>` Labels associated with the operation.
`warnings[]`	`string` Warnings encountered during operation execution.

BatchOperationType

Operation type for Batch resources

Enums
`BATCH_OPERATION_TYPE_UNSPECIFIED`	Batch operation type is unknown.
`BATCH`	Batch operation type.

CreateBatchRequest

A request to create a batch workload.

Fields

parent

string

Required. The parent resource where this batch will be created.

Authorization requires the following IAM permission on the specified resource parent :

dataproc.batches.create

batch

Batch

Required. The batch to create.

batch_id

string

Optional. The ID to use for the batch, which will become the final component of the batch's resource name.

This value must be 4-63 characters. Valid characters are /[a-z][0-9]-/ .

request_id

string

Optional. A unique ID used to identify the request. If the service receives two CreateBatchRequest s with the same request_id, the second request is ignored and the Operation that corresponds to the first Batch created and stored in the backend is returned.

Recommendation: Set this value to a UUID .

The value must contain only letters (a-z, A-Z), numbers (0-9), underscores (_), and hyphens (-). The maximum length is 40 characters.

CreateSessionRequest

A request to create a session.

Fields

parent

string

Required. The parent resource where this session will be created.

Authorization requires the following IAM permission on the specified resource parent :

dataproc.sessions.create

session

Session

Required. The interactive session to create.

session_id

string

Required. The ID to use for the session, which becomes the final component of the session's resource name.

This value must be 4-63 characters. Valid characters are /[a-z][0-9]-/.

request_id

string

Optional. A unique ID used to identify the request. If the service receives two CreateSessionRequests s with the same ID, the second request is ignored, and the first Session is created and stored in the backend.

Recommendation: Set this value to a UUID .

The value must contain only letters (a-z, A-Z), numbers (0-9), underscores (_), and hyphens (-). The maximum length is 40 characters.

CreateSessionTemplateRequest

A request to create a session template.

Fields

parent

string

Required. The parent resource where this session template will be created.

Authorization requires the following IAM permission on the specified resource parent :

dataproc.sessionTemplates.create

session_template

SessionTemplate

Required. The session template to create.

DeleteBatchRequest

A request to delete a batch workload.

Fields

name

string

Required. The fully qualified name of the batch to retrieve in the format "projects/PROJECT_ID/locations/DATAPROC_REGION/batches/BATCH_ID"

Authorization requires the following IAM permission on the specified resource name :

dataproc.batches.delete

DeleteSessionRequest

A request to delete a session.

Fields

name

string

Required. The name of the session resource to delete.

Authorization requires the following IAM permission on the specified resource name :

dataproc.sessions.delete

request_id

string

Optional. A unique ID used to identify the request. If the service receives two DeleteSessionRequest s with the same ID, the second request is ignored.

Recommendation: Set this value to a UUID .

The value must contain only letters (a-z, A-Z), numbers (0-9), underscores (_), and hyphens (-). The maximum length is 40 characters.

DeleteSessionTemplateRequest

A request to delete a session template.

Fields

name

string

Required. The name of the session template resource to delete.

Authorization requires the following IAM permission on the specified resource name :

dataproc.sessionTemplates.delete

DiagnoseClusterResults

The location of diagnostic output.

Fields

Fields
`output_uri`	`string` Output only. The Cloud Storage URI of the diagnostic output. The output report is a plain text file with a summary of collected diagnostics.

output_uri

string

Output only. The Cloud Storage URI of the diagnostic output. The output report is a plain text file with a summary of collected diagnostics.

EnvironmentConfig

Environment configuration for a workload.

Fields

Fields
`execution_config`	`ExecutionConfig` Optional. Execution configuration for a workload.
`peripherals_config`	`PeripheralsConfig` Optional. Peripherals configuration that workload has access to.

execution_config

ExecutionConfig

Optional. Execution configuration for a workload.

peripherals_config

PeripheralsConfig

Optional. Peripherals configuration that workload has access to.

ExecutionConfig

Execution configuration for a workload.

Fields

service_account

string

Optional. Service account that used to execute workload.

network_tags[]

string

Optional. Tags used for network traffic control.

kms_key

string

Optional. The Cloud KMS key to use for encryption.

idle_ttl

Duration

Optional. Applies to sessions only. The duration to keep the session alive while it's idling. Exceeding this threshold causes the session to terminate. This field cannot be set on a batch workload. Minimum value is 10 minutes; maximum value is 14 days (see JSON representation of Duration ). Defaults to 1 hour if not set. If both ttl and idle_ttl are specified for an interactive session, the conditions are treated as OR conditions: the workload will be terminated when it has been idle for idle_ttl or when ttl has been exceeded, whichever occurs first.

ttl

Duration

Optional. The duration after which the workload will be terminated, specified as the JSON representation for Duration . When the workload exceeds this duration, it will be unconditionally terminated without waiting for ongoing work to finish. If ttl is not specified for a batch workload, the workload will be allowed to run until it exits naturally (or run forever without exiting). If ttl is not specified for an interactive session, it defaults to 24 hours. If ttl is not specified for a batch that uses 2.1+ runtime version, it defaults to 4 hours. Minimum value is 10 minutes; maximum value is 14 days. If both ttl and idle_ttl are specified (for an interactive session), the conditions are treated as OR conditions: the workload will be terminated when it has been idle for idle_ttl or when ttl has been exceeded, whichever occurs first.

staging_bucket

string

Optional. A Cloud Storage bucket used to stage workload dependencies, config files, and store workload output and other ephemeral data, such as Spark history files. If you do not specify a staging bucket, Cloud Dataproc will determine a Cloud Storage location according to the region where your workload is running, and then create and manage project-level, per-location staging and temporary buckets. This field requires a Cloud Storage bucket name, not a gs://... URI to a Cloud Storage bucket.

authentication_config

AuthenticationConfig

Optional. Authentication configuration used to set the default identity for the workload execution. The config specifies the type of identity (service account or user) that will be used by workloads to access resources on the project(s).

Union field network . Network configuration for workload execution. network can be only one of the following:

network_uri

string

Optional. Network URI to connect workload to.

subnetwork_uri

string

Optional. Subnetwork URI to connect workload to.

GetBatchRequest

A request to get the resource representation for a batch workload.

Fields

name

string

Required. The fully qualified name of the batch to retrieve in the format "projects/PROJECT_ID/locations/DATAPROC_REGION/batches/BATCH_ID"

Authorization requires the following IAM permission on the specified resource name :

dataproc.batches.get

GetSessionRequest

A request to get the resource representation for a session.

Fields

name

string

Required. The name of the session to retrieve.

Authorization requires the following IAM permission on the specified resource name :

dataproc.sessions.get

GetSessionTemplateRequest

A request to get the resource representation for a session template.

Fields

name

string

Required. The name of the session template to retrieve.

Authorization requires the following IAM permission on the specified resource name :

dataproc.sessionTemplates.get

JupyterConfig

Jupyter configuration for an interactive session.

Fields

Fields
`kernel`	`Kernel` Optional. Kernel
`display_name`	`string` Optional. Display name, shown in the Jupyter kernelspec card.

kernel

Kernel

Optional. Kernel

display_name

string

Optional. Display name, shown in the Jupyter kernelspec card.

Kernel

Jupyter kernel types.

Enums
`KERNEL_UNSPECIFIED`	The kernel is unknown.
`PYTHON`	Python kernel.
`SCALA`	Scala kernel.

ListBatchesRequest

A request to list batch workloads in a project.

Fields

parent

string

Required. The parent, which owns this collection of batches.

Authorization requires the following IAM permission on the specified resource parent :

dataproc.batches.list

page_size

int32

Optional. The maximum number of batches to return in each response. The service may return fewer than this value. The default page size is 20; the maximum page size is 1000.

page_token

string

Optional. A page token received from a previous ListBatches call. Provide this token to retrieve the subsequent page.

filter

string

Optional. A filter for the batches to return in the response.

A filter is a logical expression constraining the values of various fields in each batch resource. Filters are case sensitive, and may contain multiple clauses combined with logical operators (AND/OR). Supported fields are batch_id , batch_uuid , state , create_time , and labels .

e.g. state = RUNNING and create_time < "2023-01-01T00:00:00Z" filters for batches in state RUNNING that were created before 2023-01-01. state = RUNNING and labels.environment=production filters for batches in state in a RUNNING state that have a production environment label.

See https://google.aip.dev/assets/misc/ebnf-filtering.txt for a detailed description of the filter syntax and a list of supported comparisons.

order_by

string

Optional. Field(s) on which to sort the list of batches.

Currently the only supported sort orders are unspecified (empty) and create_time desc to sort by most recently created batches first.

See https://google.aip.dev/132#ordering for more details.

ListBatchesResponse

A list of batch workloads.

Fields

Fields
`batches[]`	`Batch` Output only. The batches from the specified collection.
`next_page_token`	`string` A token, which can be sent as `page_token` to retrieve the next page. If this field is omitted, there are no subsequent pages.
`unreachable[]`	`string` Output only. List of Batches that could not be included in the response. Attempting to get one of these resources may indicate why it was not included in the list response.

batches[]

Batch

Output only. The batches from the specified collection.

next_page_token

string

A token, which can be sent as page_token to retrieve the next page. If this field is omitted, there are no subsequent pages.

unreachable[]

string

Output only. List of Batches that could not be included in the response. Attempting to get one of these resources may indicate why it was not included in the list response.

ListSessionTemplatesRequest

A request to list session templates in a project.

Fields

parent

string

Required. The parent that owns this collection of session templates.

Authorization requires the following IAM permission on the specified resource parent :

dataproc.sessionTemplates.list

page_size

int32

Optional. The maximum number of sessions to return in each response. The service may return fewer than this value.

page_token

string

Optional. A page token received from a previous ListSessions call. Provide this token to retrieve the subsequent page.

filter

string

Optional. A filter for the session templates to return in the response. Filters are case sensitive and have the following syntax:

[field = value] AND [field [= value]] ...

ListSessionTemplatesResponse

A list of session templates.

Fields

Fields
`session_templates[]`	`SessionTemplate` Output only. Session template list
`next_page_token`	`string` A token, which can be sent as `page_token` to retrieve the next page. If this field is omitted, there are no subsequent pages.

session_templates[]

SessionTemplate

Output only. Session template list

next_page_token

string

A token, which can be sent as page_token to retrieve the next page. If this field is omitted, there are no subsequent pages.

ListSessionsRequest

A request to list sessions in a project.

Fields

parent

string

Required. The parent, which owns this collection of sessions.

Authorization requires the following IAM permission on the specified resource parent :

dataproc.sessions.list

page_size

int32

Optional. The maximum number of sessions to return in each response. The service may return fewer than this value.

page_token

string

Optional. A page token received from a previous ListSessions call. Provide this token to retrieve the subsequent page.

filter

string

Optional. A filter for the sessions to return in the response.

A filter is a logical expression constraining the values of various fields in each session resource. Filters are case sensitive, and may contain multiple clauses combined with logical operators (AND, OR). Supported fields are session_id , session_uuid , state , create_time , and labels .

Example: state = ACTIVE and create_time < "2023-01-01T00:00:00Z" is a filter for sessions in an ACTIVE state that were created before 2023-01-01. state = ACTIVE and labels.environment=production is a filter for sessions in an ACTIVE state that have a production environment label.

See https://google.aip.dev/assets/misc/ebnf-filtering.txt for a detailed description of the filter syntax and a list of supported comparators.

ListSessionsResponse

A list of interactive sessions.

Fields

Fields
`sessions[]`	`Session` Output only. The sessions from the specified collection.
`next_page_token`	`string` A token, which can be sent as `page_token` , to retrieve the next page. If this field is omitted, there are no subsequent pages.

sessions[]

Session

Output only. The sessions from the specified collection.

next_page_token

string

A token, which can be sent as page_token , to retrieve the next page. If this field is omitted, there are no subsequent pages.

PeripheralsConfig

Auxiliary services configuration for a workload.

Fields

metastore_service

string

Optional. Resource name of an existing Dataproc Metastore service.

Example:

projects/[project_id]/locations/[region]/services/[service_id]

spark_history_server_config

SparkHistoryServerConfig

Optional. The Spark History Server configuration for the workload.

PropertiesInfo

Properties of the workload organized by origin.

Fields

Fields
`autotuning_properties`	`map<string, ValueInfo >` Output only. Properties set by autotuning engine.

autotuning_properties

map<string, ValueInfo >

Output only. Properties set by autotuning engine.

ValueInfo

Annotatated property value.

Fields

Fields
`value`	`string` Property value.
`annotation`	`string` Annotation, comment or explanation why the property was set.
`overridden_value`	`string` Optional. Value which was replaced by the corresponding component.

value

string

Property value.

annotation

string

Annotation, comment or explanation why the property was set.

overridden_value

string

Optional. Value which was replaced by the corresponding component.

PyPiRepositoryConfig

Configuration for PyPi repository

Fields

Fields
`pypi_repository`	`string` Optional. The PyPi repository address. Note: This field is not available for batch workloads.

pypi_repository

string

Optional. The PyPi repository address. Note: This field is not available for batch workloads.

PySparkBatch

A configuration for running an Apache PySpark batch workload.

Fields
`main_python_file_uri`	`string` Required. The HCFS URI of the main Python file to use as the Spark driver. Must be a .py file.
`args[]`	`string` Optional. The arguments to pass to the driver. Do not include arguments that can be set as batch properties, such as `--conf` , since a collision can occur that causes an incorrect batch submission.
`python_file_uris[]`	`string` Optional. HCFS file URIs of Python files to pass to the PySpark framework. Supported file types: `.py` , `.egg` , and `.zip` .
`jar_file_uris[]`	`string` Optional. HCFS URIs of jar files to add to the classpath of the Spark driver and tasks.
`file_uris[]`	`string` Optional. HCFS URIs of files to be placed in the working directory of each executor.
`archive_uris[]`	`string` Optional. HCFS URIs of archives to be extracted into the working directory of each executor. Supported file types: `.jar` , `.tar` , `.tar.gz` , `.tgz` , and `.zip` .

RepositoryConfig

Configuration for dependency repositories

Fields

Fields
`pypi_repository_config`	`PyPiRepositoryConfig` Optional. Configuration for PyPi repository.

pypi_repository_config

PyPiRepositoryConfig

Optional. Configuration for PyPi repository.

RuntimeConfig

Runtime configuration for a workload.

Fields
`version`	`string` Optional. Version of the batch runtime.
`container_image`	`string` Optional. Optional custom container image for the job runtime environment. If not specified, a default container image will be used.
`properties`	`map<string, string>` Optional. A mapping of property names to values, which are used to configure workload execution.
`repository_config`	`RepositoryConfig` Optional. Dependency repository configuration.
`autotuning_config`	`AutotuningConfig` Optional. Autotuning configuration of the workload.
`cohort`	`string` Optional. Cohort identifier. Identifies families of the workloads that have the same shape, for example, daily ETL jobs.

RuntimeInfo

Runtime information about workload execution.

Fields
`endpoints`	`map<string, string>` Output only. Map of remote access endpoints (such as web interfaces and APIs) to their URIs.
`output_uri`	`string` Output only. A URI pointing to the location of the stdout and stderr of the workload.
`diagnostic_output_uri`	`string` Output only. A URI pointing to the location of the diagnostics tarball.
`approximate_usage`	`UsageMetrics` Output only. Approximate workload resource usage, calculated when the workload completes (see Dataproc Serverless pricing ). Note:This metric calculation may change in the future, for example, to capture cumulative workload resource consumption during workload execution (see the Dataproc Serverless release notes for announcements, changes, fixes and other Dataproc developments).
`current_usage`	`UsageSnapshot` Output only. Snapshot of current workload resource usage.
`properties_info`	`PropertiesInfo` Optional. Properties of the workload organized by origin.

Session

A representation of a session.

Fields

name

string

Identifier. The resource name of the session.

uuid

string

Output only. A session UUID (Unique Universal Identifier). The service generates this value when it creates the session.

create_time

Timestamp

Output only. The time when the session was created.

runtime_info

RuntimeInfo

Output only. Runtime information about session execution.

state

State

Output only. A state of the session.

state_message

string

Output only. Session state details, such as the failure description if the state is FAILED .

state_time

Timestamp

Output only. The time when the session entered the current state.

creator

string

Output only. The email address of the user who created the session.

labels

map<string, string>

Optional. The labels to associate with the session. Label keysmust contain 1 to 63 characters, and must conform to RFC 1035 . Label valuesmay be empty, but, if present, must contain 1 to 63 characters, and must conform to RFC 1035 . No more than 32 labels can be associated with a session.

runtime_config

RuntimeConfig

Optional. Runtime configuration for the session execution.

environment_config

EnvironmentConfig

Optional. Environment configuration for the session execution.

user

string

Optional. The email address of the user who owns the session.

state_history[]

SessionStateHistory

Output only. Historical state information for the session.

session_template

string

Optional. The session template used by the session.

Only resource names, including project ID and location, are valid.

Example: * https://www.googleapis.com/compute/v1/projects/[project_id]/locations/[dataproc_region]/sessionTemplates/[template_id] * projects/[project_id]/locations/[dataproc_region]/sessionTemplates/[template_id]

The template must be in the same project and Dataproc region as the session.

Union field session_config . The session configuration. session_config can be only one of the following:

jupyter_session

JupyterConfig

Optional. Jupyter session config.

spark_connect_session

SparkConnectConfig

Optional. Spark connect session config.

SessionStateHistory

Historical state information.

Fields

Fields
`state`	`State` Output only. The state of the session at this point in the session history.
`state_message`	`string` Output only. Details about the state at this point in the session history.
`state_start_time`	`Timestamp` Output only. The time when the session entered the historical state.

state

State

Output only. The state of the session at this point in the session history.

state_message

string

Output only. Details about the state at this point in the session history.

state_start_time

Timestamp

Output only. The time when the session entered the historical state.

State

The session state.

Enums
`STATE_UNSPECIFIED`	The session state is unknown.
`CREATING`	The session is created prior to running.
`ACTIVE`	The session is running.
`TERMINATING`	The session is terminating.
`TERMINATED`	The session is terminated successfully.
`FAILED`	The session is no longer running due to an error.

SessionOperationMetadata

Metadata describing the Session operation.

Fields
`session`	`string` Name of the session for the operation.
`session_uuid`	`string` Session UUID for the operation.
`create_time`	`Timestamp` The time when the operation was created.
`done_time`	`Timestamp` The time when the operation was finished.
`operation_type`	`SessionOperationType` The operation type.
`description`	`string` Short description of the operation.
`labels`	`map<string, string>` Labels associated with the operation.
`warnings[]`	`string` Warnings encountered during operation execution.

SessionOperationType

Operation type for Session resources

Enums
`SESSION_OPERATION_TYPE_UNSPECIFIED`	Session operation type is unknown.
`CREATE`	Create Session operation type.
`TERMINATE`	Terminate Session operation type.
`DELETE`	Delete Session operation type.

SessionTemplate

A representation of a session template.

Fields

name

string

Required. Identifier. The resource name of the session template.

description

string

Optional. Brief description of the template.

create_time

Timestamp

Output only. The time when the template was created.

creator

string

Output only. The email address of the user who created the template.

labels

map<string, string>

Optional. Labels to associate with sessions created using this template. Label keysmust contain 1 to 63 characters, and must conform to RFC 1035 . Label valuescan be empty, but, if present, must contain 1 to 63 characters and conform to RFC 1035 . No more than 32 labels can be associated with a session.

runtime_config

RuntimeConfig

Optional. Runtime configuration for session execution.

environment_config

EnvironmentConfig

Optional. Environment configuration for session execution.

update_time

Timestamp

Output only. The time the template was last updated.

uuid

string

Output only. A session template UUID (Unique Universal Identifier). The service generates this value when it creates the session template.

Union field session_config . The session configuration. session_config can be only one of the following:

jupyter_session

JupyterConfig

Optional. Jupyter session config.

spark_connect_session

SparkConnectConfig

Optional. Spark connect session config.

SparkBatch

A configuration for running an Apache Spark batch workload.

Fields

args[]

string

Optional. The arguments to pass to the driver. Do not include arguments that can be set as batch properties, such as --conf , since a collision can occur that causes an incorrect batch submission.

jar_file_uris[]

string

Optional. HCFS URIs of jar files to add to the classpath of the Spark driver and tasks.

file_uris[]

string

Optional. HCFS URIs of files to be placed in the working directory of each executor.

archive_uris[]

string

Optional. HCFS URIs of archives to be extracted into the working directory of each executor. Supported file types: .jar , .tar , .tar.gz , .tgz , and .zip .

Union field driver . The specification of the main method to call to drive the Spark workload. Specify either the jar file that contains the main class or the main class name. To pass both a main jar and a main class in that jar, add the jar to jar_file_uris , and then specify the main class name in main_class . driver can be only one of the following:

main_jar_file_uri

string

Optional. The HCFS URI of the jar file that contains the main class.

main_class

string

Optional. The name of the driver main class. The jar file that contains the class must be in the classpath or specified in jar_file_uris .

SparkConnectConfig

This type has no fields.

Spark connect configuration for an interactive session.

SparkHistoryServerConfig

Spark History Server configuration for the workload.

Fields

dataproc_cluster

string

Optional. Resource name of an existing Dataproc Cluster to act as a Spark History Server for the workload.

Example:

projects/[project_id]/regions/[region]/clusters/[cluster_name]

SparkRBatch

A configuration for running an Apache SparkR batch workload.

Fields
`main_r_file_uri`	`string` Required. The HCFS URI of the main R file to use as the driver. Must be a `.R` or `.r` file.
`args[]`	`string` Optional. The arguments to pass to the Spark driver. Do not include arguments that can be set as batch properties, such as `--conf` , since a collision can occur that causes an incorrect batch submission.
`file_uris[]`	`string` Optional. HCFS URIs of files to be placed in the working directory of each executor.
`archive_uris[]`	`string` Optional. HCFS URIs of archives to be extracted into the working directory of each executor. Supported file types: `.jar` , `.tar` , `.tar.gz` , `.tgz` , and `.zip` .

SparkSqlBatch

A configuration for running Apache Spark SQL queries as a batch workload.

Fields

Fields
`query_file_uri`	`string` Required. The HCFS URI of the script that contains Spark SQL queries to execute.
`query_variables`	`map<string, string>` Optional. Mapping of query variable names to values (equivalent to the Spark SQL command: `SET name="value";` ).
`jar_file_uris[]`	`string` Optional. HCFS URIs of jar files to be added to the Spark CLASSPATH.

query_file_uri

string

Required. The HCFS URI of the script that contains Spark SQL queries to execute.

query_variables

map<string, string>

Optional. Mapping of query variable names to values (equivalent to the Spark SQL command: SET name="value"; ).

jar_file_uris[]

string

Optional. HCFS URIs of jar files to be added to the Spark CLASSPATH.

TerminateSessionRequest

A request to terminate an interactive session.

Fields

name

string

Required. The name of the session resource to terminate.

Authorization requires the following IAM permission on the specified resource name :

dataproc.sessions.terminate

request_id

string

Optional. A unique ID used to identify the request. If the service receives two TerminateSessionRequest s with the same ID, the second request is ignored.

Recommendation: Set this value to a UUID .

The value must contain only letters (a-z, A-Z), numbers (0-9), underscores (_), and hyphens (-). The maximum length is 40 characters.

UpdateSessionTemplateRequest

A request to update a session template.

Fields

session_template

SessionTemplate

Required. The updated session template.

Authorization requires the following IAM permission on the specified resource sessionTemplate :

dataproc.sessionTemplates.update

UsageMetrics

Usage metrics represent approximate total resources consumed by a workload.

Fields
`milli_dcu_seconds`	`int64` Optional. DCU (Dataproc Compute Units) usage in ( `milliDCU` x `seconds` ) (see Dataproc Serverless pricing ).
`shuffle_storage_gb_seconds`	`int64` Optional. Shuffle storage usage in ( `GB` x `seconds` ) (see Dataproc Serverless pricing ).
`milli_accelerator_seconds`	`int64` Optional. [DEPRECATED] Accelerator usage in ( `milliAccelerator` x `seconds` ) (see Dataproc Serverless pricing ).
`accelerator_type`	`string` Optional. [DEPRECATED] Accelerator type being used, if any
`update_time`	`Timestamp` Optional. The timestamp of the usage metrics.

UsageSnapshot

The usage snapshot represents the resources consumed by a workload at a specified time.

Fields
`milli_dcu`	`int64` Optional. Milli (one-thousandth) Dataproc Compute Units (DCUs) (see Dataproc Serverless pricing ).
`shuffle_storage_gb`	`int64` Optional. Shuffle Storage in gigabytes (GB). (see Dataproc Serverless pricing )
`milli_dcu_premium`	`int64` Optional. Milli (one-thousandth) Dataproc Compute Units (DCUs) charged at premium tier (see Dataproc Serverless pricing ).
`shuffle_storage_gb_premium`	`int64` Optional. Shuffle Storage in gigabytes (GB) charged at premium tier. (see Dataproc Serverless pricing )
`milli_accelerator`	`int64` Optional. Milli (one-thousandth) accelerator. (see Dataproc Serverless pricing )
`accelerator_type`	`string` Optional. Accelerator type being used, if any
`snapshot_time`	`Timestamp` Optional. The timestamp of the usage snapshot.

Package google.cloud.dataproc.v1 Stay organized with collections Save and categorize content based on your preferences.

Index

BatchController

SessionController

SessionTemplateController

AnalyzeOperationMetadata

WorkloadType

AuthenticationConfig

AuthenticationType

AutotuningConfig

Scenario

Batch

State

StateHistory

BatchOperationMetadata

BatchOperationType

CreateBatchRequest

CreateSessionRequest

CreateSessionTemplateRequest

DeleteBatchRequest

DeleteSessionRequest

DeleteSessionTemplateRequest

DiagnoseClusterResults

EnvironmentConfig

ExecutionConfig

GetBatchRequest

GetSessionRequest

GetSessionTemplateRequest

JupyterConfig

Kernel

ListBatchesRequest

ListBatchesResponse

ListSessionTemplatesRequest

ListSessionTemplatesResponse

ListSessionsRequest

ListSessionsResponse

PeripheralsConfig

PropertiesInfo

ValueInfo

PyPiRepositoryConfig

PySparkBatch

RepositoryConfig

RuntimeConfig

RuntimeInfo

Session

SessionStateHistory

State

SessionOperationMetadata

SessionOperationType

SessionTemplate

SparkBatch

SparkConnectConfig

SparkHistoryServerConfig

SparkRBatch

SparkSqlBatch

TerminateSessionRequest

UpdateSessionTemplateRequest

UsageMetrics

UsageSnapshot

Package google.cloud.dataproc.v1