If you run multiple SDK processes on a shared Dataflow GPU, you can improve GPU efficiency and utilization by enabling the NVIDIA Multi-Process Service (MPS). MPS supports concurrent processing on a GPU by letting processes share CUDA contexts and scheduling resources. MPS can reduce context-switching costs, increase parallelism, and reduce the GPU memory needed to store per-process contexts.
Target workflows are Python pipelines that run on workers with more than one vCPU.
MPS is an NVIDIA technology that provides an alternative, binary-compatible implementation of the CUDA API. CUDA is NVIDIA's platform for general-purpose GPU computing. For more information, see the NVIDIA Multi-Process Service user guide.
Benefits
- Improves parallel processing and overall throughput for GPU pipelines, especially for workloads with low GPU resource usage.
- Improves GPU utilization, which might reduce your costs.
Support and limitations
- MPS is supported only on Dataflow workers that use a single GPU.
- The pipeline can't use pipeline options that restrict parallelism.
- Don't use the pipeline option --experiments=no_use_multiple_sdk_containers when using NVIDIA MPS. This option restricts Dataflow to a single Python process per VM, which prevents MPS from dividing GPU resources across processes. It also severely limits parallelism because of Python's Global Interpreter Lock (GIL), and it makes many other tuning parameters ineffective.
- Avoid exceeding the available GPU memory, especially for use cases that involve loading large machine learning models. Balance the number of vCPUs and SDK processes against the GPU memory that these processes need.
- MPS doesn't affect the concurrency of non-GPU operations.
- Dataflow Prime doesn't support MPS.
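The memory limitation above comes down to simple arithmetic: the memory used by every process that touches the GPU, plus some headroom, has to fit on the device. The following is a rough budgeting sketch, assuming that each SDK process loads its own copy of the model. The helper name and the 1 GiB default headroom are hypothetical choices, and the per-process figure is something you measure yourself. Nothing here queries the GPU.

```python
def max_gpu_processes(gpu_memory_gib: float, per_process_gib: float,
                      headroom_gib: float = 1.0) -> int:
    """Largest number of processes whose GPU memory fits on one GPU.

    Hypothetical helper: per_process_gib is the measured GPU memory of one
    process (model weights plus activations), and headroom_gib is an
    arbitrary safety margin, not a Dataflow or NVIDIA requirement.
    """
    if per_process_gib <= 0:
        raise ValueError("per_process_gib must be positive")
    usable = gpu_memory_gib - headroom_gib
    # Floor division: a partially fitting process would still run out of memory.
    return max(0, int(usable // per_process_gib))


# Usage: a 16 GiB GPU with a model that needs about 3 GiB per process
# leaves room for 5 processes, so a worker that starts more processes than
# that is likely to run out of GPU memory.
budget = max_gpu_processes(16, 3)
```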
Enable MPS
We recommend enabling MPS whenever you load more than one copy of your model onto a single GPU, for example by setting model_copies to a value greater than 1 on the model handler that you use with the RunInference transform. This lets multiple processes access the GPU simultaneously, which improves efficiency and utilization.
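As an illustration of the model_copies case, the following sketch builds a PyTorch model handler that loads several copies of a model onto the GPU. It assumes a recent Apache Beam release whose model handlers accept the model_copies argument, plus PyTorch on the worker. The helper name, the state-dict path, and MyModel in the usage comment are hypothetical placeholders. Beam is imported inside the function only so that the argument check can run without Beam installed.

```python
def make_gpu_model_handler(state_dict_path, model_class, model_params, copies=2):
    """Returns a PyTorch model handler that loads `copies` models onto the GPU.

    Hypothetical helper: assumes an Apache Beam version that supports the
    model_copies argument on PytorchModelHandlerTensor.
    """
    # Validate first so a bad value fails fast, before Beam is imported.
    if copies < 1:
        raise ValueError("copies must be at least 1")

    # Imported lazily so this helper can be defined without Beam installed.
    from apache_beam.ml.inference.pytorch_inference import PytorchModelHandlerTensor

    return PytorchModelHandlerTensor(
        state_dict_path=state_dict_path,
        model_class=model_class,
        model_params=model_params,
        device="GPU",
        # More than one copy per GPU is the case where MPS is recommended.
        model_copies=copies,
    )


# Usage inside a pipeline, where `tensors` is a PCollection of torch.Tensor
# (the path and MyModel are placeholders):
#   from apache_beam.ml.inference.base import RunInference
#   predictions = tensors | RunInference(
#       make_gpu_model_handler("gs://BUCKET/model.pt", MyModel, {}, copies=2))
```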
When you run a pipeline with GPUs, enable MPS by doing the following:
- In the pipeline option --dataflow_service_options, append use_nvidia_mps to the worker_accelerator parameter.
- Set count to 1 in the same worker_accelerator parameter.
- Don't use the pipeline option --experiments=no_use_multiple_sdk_containers.
The pipeline option --dataflow_service_options looks like the following:
--dataflow_service_options="worker_accelerator=type:GPU_TYPE;count:1;install-nvidia-driver;use_nvidia_mps"
Replace GPU_TYPE with the type of GPU that you want to use.
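If you launch the pipeline from Python rather than from a shell, you can assemble the same option in code. The following is a minimal sketch: the helper names are hypothetical, and nvidia-tesla-t4 in the usage is only an example GPU type. The helper also rejects the no_use_multiple_sdk_containers experiment, because MPS needs more than one SDK process per VM.

```python
FORBIDDEN_EXPERIMENT = "no_use_multiple_sdk_containers"


def mps_service_option(gpu_type: str) -> str:
    """Returns the worker_accelerator value with MPS enabled on one GPU."""
    return (
        f"worker_accelerator=type:{gpu_type};count:1;"
        "install-nvidia-driver;use_nvidia_mps"
    )


def build_mps_pipeline_args(gpu_type: str, extra_args=()) -> list:
    """Returns pipeline arguments, rejecting options incompatible with MPS."""
    args = [f"--dataflow_service_options={mps_service_option(gpu_type)}", *extra_args]
    for arg in args:
        # This experiment limits each VM to one SDK process, which defeats MPS.
        if FORBIDDEN_EXPERIMENT in arg:
            raise ValueError(f"{FORBIDDEN_EXPERIMENT} can't be used with MPS")
    return args


# Usage (example GPU type):
args = build_mps_pipeline_args("nvidia-tesla-t4", ["--runner=DataflowRunner"])
```

You could then pass the list to PipelineOptions, for example PipelineOptions(args). Because each list element is a separate argument, no shell quoting is involved; on a command line, the value must stay quoted because ; is a shell command separator.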
If you use TensorFlow and enable MPS, do the following:
- Enable dynamic memory allocation on the GPU. Use either of the following TensorFlow options:
  - Turn on memory growth by calling tf.config.experimental.set_memory_growth(gpu, True).
  - Set the environment variable TF_FORCE_GPU_ALLOW_GROWTH to true.
- Use logical devices with appropriate memory limits.
- For optimal performance, enforce the use of the GPU when possible by using soft device placement or manual placement.
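The TensorFlow settings above can be sketched as a few helper functions. This is a minimal illustration, not a prescribed setup: the function names and the 4096 MB limit are hypothetical, and the functions assume they run in each worker process before TensorFlow first initializes the GPU (for example, from a DoFn's setup method). Use either memory growth or a fixed memory limit on a given GPU, not both. TensorFlow is imported inside the functions so that the environment-variable option has no TensorFlow dependency.

```python
import os


def allow_growth_via_env() -> None:
    """Option 1: ask TensorFlow to allocate GPU memory on demand."""
    # Only takes effect if set before TensorFlow initializes the GPU.
    os.environ["TF_FORCE_GPU_ALLOW_GROWTH"] = "true"


def allow_growth_via_api() -> None:
    """Option 2: the same behavior through the TensorFlow API."""
    import tensorflow as tf

    for gpu in tf.config.list_physical_devices("GPU"):
        tf.config.experimental.set_memory_growth(gpu, True)


def cap_gpu_memory(limit_mb: int = 4096) -> None:
    """Logical device with a fixed memory limit for this process."""
    import tensorflow as tf

    for gpu in tf.config.list_physical_devices("GPU"):
        tf.config.set_logical_device_configuration(
            gpu, [tf.config.LogicalDeviceConfiguration(memory_limit=limit_mb)]
        )


def run_on_gpu(fn, *args):
    """Manual placement on the first GPU, with soft placement as a fallback."""
    import tensorflow as tf

    # With soft placement, ops that have no GPU kernel fall back to the CPU
    # instead of raising an error.
    tf.config.set_soft_device_placement(True)
    with tf.device("/GPU:0"):
        return fn(*args)
```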
What's next
- To review more best practices, see GPUs and worker parallelism.

