Service options are a type of pipeline option that let you specify additional job modes and configurations for a Dataflow job. To use a service option, set the Dataflow service options pipeline option.
Java
--dataflowServiceOptions=SERVICE_OPTION
Replace SERVICE_OPTION with the service option that you want to use.
Python
--dataflow_service_options=SERVICE_OPTION
Replace SERVICE_OPTION with the service option that you want to use.
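For example, a minimal sketch of setting a service option directly in Python code through PipelineOptions. The project, region, bucket, and service option values below are placeholders, not required values:

import apache_beam as beam
from apache_beam.options.pipeline_options import PipelineOptions

# Placeholder values; replace with your project, region, staging bucket,
# and the service option that you want to use.
options = PipelineOptions(
    flags=[],  # don't read options from the command line in this sketch
    runner="DataflowRunner",
    project="my-project",
    region="us-central1",
    temp_location="gs://my-bucket/temp",
    dataflow_service_options=["enable_google_cloud_profiler"],
)

# A trivial pipeline, included only to show where the options are applied.
with beam.Pipeline(options=options) as pipeline:
    _ = pipeline | beam.Create([1, 2, 3]) | beam.Map(lambda x: x * 2)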
Go
--dataflow_service_options=SERVICE_OPTION
Replace SERVICE_OPTION with the service option that you want to use.
gcloud
Use the gcloud dataflow jobs run command with the additional-experiments option. If you're using Flex Templates, use the gcloud dataflow flex-template run command.
--additional-experiments=SERVICE_OPTION
For example:
gcloud dataflow jobs run JOB_NAME \
  --additional-experiments=SERVICE_OPTION
Replace the following values:
- JOB_NAME: the name of your Dataflow job
- SERVICE_OPTION: the service option that you want to use
REST
Use the additionalExperiments field in the RuntimeEnvironment object. If you're using Flex Templates, use the additionalExperiments field in the FlexTemplateRuntimeEnvironment object.
{
  additionalExperiments: [
    "SERVICE_OPTION"
  ]
  ...
}
Replace SERVICE_OPTION with the service option that you want to use.
For more information, see Set Dataflow pipeline options.
Dataflow supports the following service options.
automatically_use_created_reservation
block_project_ssh_keys
disable_image_streaming
Disables image streaming for the job. For more information, see enable_image_streaming.
enable_confidential_compute
Enables Confidential VM with AMD Secure Encryption Virtualization (SEV) on Dataflow worker VMs. For more information, see Confidential Computing concepts. This service option is not compatible with Dataflow Prime or worker accelerators. You must specify a supported machine type. When this option is enabled, the job incurs additional flat per-vCPU and per-GB costs. For more information, see Dataflow pricing.
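For example, a sketch for a Python pipeline that also selects a worker machine type. The N2D machine type shown is an assumption: it is one of the machine series that supports AMD SEV, but confirm your machine type against the supported list:
--dataflow_service_options=enable_confidential_compute \
--worker_machine_type=n2d-standard-2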
enable_lineage
Enables data lineage for your Dataflow jobs. For more information, see Use data lineage in Dataflow.
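For example, to enable lineage for a Python pipeline:
--dataflow_service_options=enable_lineage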
enable_dynamic_thread_scaling
Enables dynamic thread scaling on Dataflow worker VMs. For more information, see Dynamic thread scaling.
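For example, to enable dynamic thread scaling for a Python pipeline:
--dataflow_service_options=enable_dynamic_thread_scaling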
enable_google_cloud_heap_sampling
enable_google_cloud_profiler
enable_image_streaming
Downloads container content as needed instead of downloading the full container content up front. This option improves startup time and autoscaling latency for pipelines that use custom containers, because workers can start processing data before the full container content is available.
You must have the Container File System API enabled to benefit from this option.
For more information, see Dataflow container image streaming.
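For example, a sketch of enabling the prerequisite API with gcloud and then setting the option for a Python pipeline. The service name containerfilesystem.googleapis.com is an assumption; confirm it in the Container File System API documentation:
gcloud services enable containerfilesystem.googleapis.com
--dataflow_service_options=enable_image_streaming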
enable_preflight_validation
Preflight pipeline validation is enabled by default. To disable preflight validation, set this option to false. For more information, see Pipeline validation.
enable_prime
Enables Dataflow Prime for the job. The default value is false. For more information, see Use Dataflow Prime.
enable_streaming_engine_resource_based_billing
graph_validate_only
map_task_backup_mode
Enables speculative execution for batch pipelines to mitigate the impact of slow-running or stuck tasks. When a task is identified as a straggler, a backup worker is initiated in parallel. The first task to finish is used, and the other is cancelled. This feature is not supported for streaming pipelines.
Set one of the following modes as a parameter to the flag:
- ON: A backup worker is created if the original task is estimated to take 20% longer to complete than a new task.
- CAUTIOUS: A backup worker is created if the original task is estimated to take 70% longer to complete than a new task.
For example:
--dataflow_service_options=map_task_backup_mode=ON
For more information, see Use speculative execution to avoid stragglers.
max_workflow_runtime_walltime_seconds
The maximum number of seconds that the job can run. If the job exceeds this limit, Dataflow cancels the job. This service option is supported for batch jobs only. Batch jobs can't run for more than 10 days; after 10 days, the job is cancelled.
Specify the number of seconds as a parameter to the flag. For example:
--dataflowServiceOptions=max_workflow_runtime_walltime_seconds=300
min_num_workers
num_pubsub_keys
parallel_replace_job_id
When performing an automated parallel pipeline update, identifies the job to replace by job ID. Use this option with parallel_replace_job_min_parallel_pipelines_duration. You must provide either this option or parallel_replace_job_name.
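For example, a sketch for a Python pipeline. This assumes that the --dataflow_service_options flag can be repeated to pass multiple service options, and JOB_ID is a placeholder for the ID of the job to replace:
--dataflow_service_options=parallel_replace_job_id=JOB_ID \
--dataflow_service_options=parallel_replace_job_min_parallel_pipelines_duration=10m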
parallel_replace_job_min_parallel_pipelines_duration
When performing an automated parallel pipeline update, specifies the minimum amount of time the two pipelines run in parallel. After this duration passes, the old job is sent a drain signal. The duration must be between 0 seconds (0s) and 31 days (744h). The duration must be formatted as a string that ends in s, m, or h. The default value is 60m.
Specify the duration as a parameter. For example, to specify 10 minutes, use the following syntax:
--dataflowServiceOptions=parallel_replace_job_min_parallel_pipelines_duration=10m
parallel_replace_job_name
When performing an automated parallel pipeline update, identifies the job to replace by job name. Use this option with parallel_replace_job_min_parallel_pipelines_duration. You must provide either this option or parallel_replace_job_id.
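Similarly, a sketch that identifies the job by name instead of by ID, again assuming the repeated-flag form; JOB_NAME is a placeholder:
--dataflow_service_options=parallel_replace_job_name=JOB_NAME \
--dataflow_service_options=parallel_replace_job_min_parallel_pipelines_duration=10m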
sdf_checkpoint_after_duration
The maximum duration each worker buffers splittable DoFn (SDF) outputs before checkpointing for further processing. Set this duration when you want low-latency processing on pipelines that have low throughput per worker, such as when reading change streams from Spanner.
The worker checkpoints when either the duration limit or the bytes limit is triggered, so you can use this service option with sdf_checkpoint_after_output_bytes or by itself.
This service option is supported for Streaming Engine jobs that use Runner v2.
Specify the duration as a parameter. For example, to change the default from 5 seconds to 500 milliseconds, use the following syntax:
--dataflowServiceOptions=sdf_checkpoint_after_duration=500ms
sdf_checkpoint_after_output_bytes
The maximum splittable DoFn (SDF) output bytes each worker produces and buffers before checkpointing for further processing. Set this value when you want low-latency processing on pipelines that have low throughput per worker, such as when reading change streams from Spanner.
The worker checkpoints when either the duration limit or the bytes limit is triggered, so you can use this service option with sdf_checkpoint_after_duration or by itself.
This service option is supported for Streaming Engine jobs that use Runner v2.
Specify the number of bytes as a parameter. For example, to change the default from 5 MiB to 512 KiB, use the following syntax:
--dataflowServiceOptions=sdf_checkpoint_after_output_bytes=524288
streaming_mode_at_least_once
streaming_enable_pubsub_direct_output
use_network_tags
Applies network tags to a Dataflow job. For more information, see Use network tags with Dataflow.
use_vm_tags
Applies secure tags to a Dataflow job. For more information, see Use secure tags with Dataflow.
worker_accelerator
Enables GPUs or TPUs for this job. If you use right fitting, don't use this service option.
GPUs
Specify the type and number of GPUs to attach to Dataflow workers as parameters to the flag. For a list of GPU types that are supported with Dataflow, see Dataflow support for GPUs. For example:
--dataflow_service_options "worker_accelerator=type: GPU_TYPE
;count: GPU_COUNT
;install-nvidia-driver"
If you're using NVIDIA Multi-Process Service (MPS), append the use_nvidia_mps parameter to the end of the list of parameters. For example:
"worker_accelerator=type:GPU_TYPE;count:GPU_COUNT;install-nvidia-driver;use_nvidia_mps"
For more information about using GPUs, see GPUs with Dataflow.
TPUs
Specify the type and topology of TPUs to attach to Dataflow workers as parameters to the flag. For a list of TPU types that are supported with Dataflow, see Supported TPU accelerators. For example:
--dataflow_service_options "worker_accelerator=type: TPU_TYPE
;topology: TPU_TOPOLOGY
"
For a TPU type tpu-v5-lite-podslice with a 1x1 topology, the flag looks like the following:
"worker_accelerator=type:tpu-v5-lite-podslice;topology:1x1"
worker_utilization_hint

