Improve performance on a shared GPU by using NVIDIA MPS

If you run multiple SDK processes on a shared Dataflow GPU, you can improve GPU efficiency and utilization by enabling the NVIDIA Multi-Process Service (MPS). MPS supports concurrent processing on a GPU by enabling processes to share CUDA contexts and scheduling resources. MPS can reduce context-switching costs, increase parallelism, and reduce storage requirements.

MPS is intended for Python pipelines that run on workers with more than one vCPU.

MPS is an NVIDIA technology that implements the CUDA API, an NVIDIA platform that supports general-purpose GPU computing. For more information, see the NVIDIA Multi-Process Service user guide.

Benefits

Improves parallel processing and overall throughput for GPU pipelines, especially for workloads with low GPU resource usage.
Improves GPU utilization, which might reduce your costs.

Support and limitations

MPS is supported only on Dataflow workers that use a single GPU.
The pipeline can't use pipeline options that restrict parallelism.
Don't use the pipeline option --experiments=no_use_multiple_sdk_containers when using NVIDIA MPS. This option restricts Dataflow to a single Python process per VM. This prevents MPS from effectively dividing GPU resources across processes. It also severely limits parallelism due to Python's Global Interpreter Lock (GIL) and makes many other tuning parameters ineffective.
Avoid exceeding the available GPU memory, especially for use cases that involve loading large machine learning models. Balance the number of vCPUs and SDK processes with the available GPU memory that these processes need.
MPS doesn't affect the concurrency of non-GPU operations.
Dataflow Prime doesn't support MPS.
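
The memory-balancing guidance above can be checked with a rough sizing calculation. The following is a minimal sketch, assuming that each SDK process loads its own copy of the model onto the GPU. The helper name max_gpu_processes and the figures used (a 16 GiB GPU, 3.5 GiB per process, 1 GiB of headroom) are illustrative assumptions, not values from Dataflow or NVIDIA. Measure the real per-process footprint of your own pipeline.

```python
def max_gpu_processes(gpu_memory_gib: float,
                      per_process_gib: float,
                      headroom_gib: float = 1.0) -> int:
    """Estimate how many processes fit in GPU memory.

    Hypothetical helper for illustration only. It assumes every
    process needs the same amount of GPU memory.
    """
    if per_process_gib <= 0:
        raise ValueError("per_process_gib must be positive")
    # Keep some memory free for CUDA contexts and temporary buffers.
    usable_gib = gpu_memory_gib - headroom_gib
    if usable_gib <= 0:
        return 0
    # Round down: a partially fitting process would run out of memory.
    return int(usable_gib // per_process_gib)


if __name__ == "__main__":
    # Illustrative numbers, not measurements.
    fits = max_gpu_processes(gpu_memory_gib=16.0, per_process_gib=3.5)
    print(f"Processes that fit on the GPU: {fits}")
```

If the estimate is lower than the number of SDK processes that you expect to run on the worker, consider a machine type with fewer vCPUs, a GPU with more memory, or a smaller model.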

Enable MPS

We recommend enabling MPS any time you are loading more than one copy of your model onto a single GPU (for example, by using model_copies > 1 in the RunInference transform). This allows multiple processes to access the GPU simultaneously, improving efficiency and utilization.

When you run a pipeline with GPUs, enable MPS by doing the following:

In the pipeline option --dataflow_service_options, append use_nvidia_mps to the worker_accelerator parameter.
Set the GPU count to 1.
Don't use the pipeline option --experiments=no_use_multiple_sdk_containers.

The pipeline option --dataflow_service_options looks like the following:

 --dataflow_service_options="worker_accelerator=type:GPU_TYPE;count:1;install-nvidia-driver;use_nvidia_mps"
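
When pipeline options are assembled in Python, the same value can be built programmatically. The following is a minimal sketch. The helper names mps_service_option and mps_pipeline_args are hypothetical, and nvidia-tesla-t4 is used only as an example value for GPU_TYPE.

```python
from typing import Iterable, List


def mps_service_option(gpu_type: str) -> str:
    """Build the worker_accelerator value with MPS enabled.

    Hypothetical helper. The count is fixed at 1 because MPS is
    supported only on workers that use a single GPU.
    """
    parts = [
        f"type:{gpu_type}",
        "count:1",
        "install-nvidia-driver",
        "use_nvidia_mps",
    ]
    return "worker_accelerator=" + ";".join(parts)


def mps_pipeline_args(gpu_type: str,
                      extra_args: Iterable[str] = ()) -> List[str]:
    """Return pipeline flags with the MPS service option appended."""
    args = list(extra_args)
    for arg in args:
        # This experiment limits the VM to one Python process,
        # which prevents MPS from sharing the GPU across processes.
        if "no_use_multiple_sdk_containers" in arg:
            raise ValueError(
                "Don't combine no_use_multiple_sdk_containers with MPS")
    args.append(f"--dataflow_service_options={mps_service_option(gpu_type)}")
    return args


if __name__ == "__main__":
    for flag in mps_pipeline_args("nvidia-tesla-t4",
                                  ["--runner=DataflowRunner"]):
        print(flag)
```

The resulting list of flags can be passed to the Apache Beam PipelineOptions class along with the other options that your pipeline requires.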

If you use TensorFlow and enable MPS, do the following:

Enable dynamic memory allocation on the GPU. Use either of the following TensorFlow options:
- Turn on memory growth by calling tf.config.experimental.set_memory_growth(gpu, True).
- Set the environment variable TF_FORCE_GPU_ALLOW_GROWTH to true.
Use logical devices with appropriate memory limits.
For optimal performance, enforce the use of the GPU when possible by using soft device placement or manual placement.
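
The TensorFlow settings above might be applied as in the following sketch. The function names are hypothetical, and the code assumes TensorFlow 2.x. It applies either memory growth or a logical device memory limit to each GPU, not both, because TensorFlow might reject a request to configure both on the same physical device. Run the configuration before any operation initializes the GPU, for example during worker setup.

```python
import os
from typing import Optional


def allow_gpu_growth_via_env() -> None:
    """Request dynamic GPU memory allocation through the environment.

    Set this before TensorFlow initializes the GPU. Setting it
    before importing TensorFlow is the safe choice.
    """
    os.environ["TF_FORCE_GPU_ALLOW_GROWTH"] = "true"


def configure_tensorflow_gpus(memory_limit_mb: Optional[int] = None) -> int:
    """Configure visible GPUs for sharing and return how many were found.

    Hypothetical helper. With no limit, turns on memory growth.
    With a limit, creates one logical device capped at that size.
    """
    import tensorflow as tf  # Imported lazily so the env var is set first.

    gpus = tf.config.list_physical_devices("GPU")
    for gpu in gpus:
        if memory_limit_mb is None:
            tf.config.experimental.set_memory_growth(gpu, True)
        else:
            tf.config.set_logical_device_configuration(
                gpu,
                [tf.config.LogicalDeviceConfiguration(
                    memory_limit=memory_limit_mb)],
            )
    # Fall back to another device when an op has no GPU kernel.
    tf.config.set_soft_device_placement(True)
    return len(gpus)
```

For example, call allow_gpu_growth_via_env() at the start of the process, or call configure_tensorflow_gpus(memory_limit_mb=4096) with a limit that suits your model. The 4096 figure is illustrative only.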

What's next

To review more best practices, see GPUs and worker parallelism.
