Use custom containers in Dataflow

You can customize the runtime environment of user code in Dataflow pipelines by supplying a custom container image. Custom containers are supported for pipelines that use Dataflow Runner v2.

When Dataflow starts up worker VMs, it uses Docker container images to launch containerized SDK processes on the workers. By default, a pipeline uses a prebuilt Apache Beam image. However, you can provide a custom container image for your Dataflow job. When you specify a custom container image, Dataflow launches workers that pull the specified image.
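
As a minimal sketch of how a job points its workers at a custom image, the Python helper below assembles the command-line flags for an Apache Beam Python pipeline. It assumes the Beam Python SDK's `--sdk_container_image` option and the `use_runner_v2` experiment; the project, region, bucket, and image URI are hypothetical placeholders, and the helper name `dataflow_custom_container_args` is illustrative only. Depending on your SDK version, Runner v2 may already be enabled by default.

```python
# Minimal sketch: assemble flags for running a Beam Python pipeline on Dataflow
# with a custom SDK container image. The project, region, bucket, and image URI
# used in the example below are hypothetical placeholders.
from typing import List


def dataflow_custom_container_args(
    project: str,
    region: str,
    temp_location: str,
    image_uri: str,
    force_runner_v2: bool = True,
) -> List[str]:
    """Return pipeline flags that point Dataflow workers at a custom image."""
    if not image_uri:
        raise ValueError("image_uri must be a non-empty container image reference")
    args = [
        "--runner=DataflowRunner",
        f"--project={project}",
        f"--region={region}",
        f"--temp_location={temp_location}",
        # Workers pull this image instead of the prebuilt Apache Beam image.
        f"--sdk_container_image={image_uri}",
    ]
    if force_runner_v2:
        # Custom containers require Runner v2. Newer SDK versions may enable it
        # by default, so this flag mainly matters for older SDK versions.
        args.append("--experiments=use_runner_v2")
    return args


if __name__ == "__main__":
    flags = dataflow_custom_container_args(
        project="my-project",
        region="us-central1",
        temp_location="gs://my-bucket/tmp",
        image_uri="us-central1-docker.pkg.dev/my-project/my-repo/beam-custom:1.0",
    )
    print(" ".join(flags))
```

In a real pipeline, a list like this could be passed to `PipelineOptions(flags)` from `apache_beam.options.pipeline_options`; keeping the flag assembly in one function makes it easy to swap the image URI per environment.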

You might use a custom container for the following reasons:

  • Preinstall pipeline dependencies to reduce worker start time.
  • Preinstall pipeline dependencies that are not available in public repositories.
  • Preinstall pipeline dependencies when access to public repositories is turned off, for example for security reasons.
  • Prestage large files to reduce worker start time.
  • Launch third-party software in the background.
  • Customize the execution environment.
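
Several of these reasons, such as preinstalling dependencies and prestaging large files, come down to what goes into the image. The sketch below renders a Dockerfile that extends a prebuilt Apache Beam Python SDK image. The `apache/beam_python{version}_sdk:{beam_version}` tag pattern follows Beam's published images, but the specific versions, package names, file paths, the `/opt/prestaged` directory, and the helper name `render_dockerfile` are hypothetical examples. The Beam and Python versions in the image should match the environment that launches the pipeline.

```python
# Minimal sketch: render a Dockerfile for a custom Beam Python SDK container.
# Versions, package names, file paths, and /opt/prestaged are hypothetical.
from typing import Iterable


def render_dockerfile(
    python_version: str,
    beam_version: str,
    pip_packages: Iterable[str] = (),
    prestaged_files: Iterable[str] = (),
) -> str:
    """Build Dockerfile text that extends the prebuilt Apache Beam SDK image."""
    # Match the Beam and Python versions used to launch the pipeline.
    lines = [f"FROM apache/beam_python{python_version}_sdk:{beam_version}"]
    packages = sorted(set(pip_packages))  # deterministic, de-duplicated order
    if packages:
        # Preinstall dependencies so workers do not download them at startup.
        lines.append("RUN pip install --no-cache-dir " + " ".join(packages))
    for path in prestaged_files:
        # Prestage large files into the image instead of fetching them at runtime.
        filename = path.rsplit("/", 1)[-1]
        lines.append(f"COPY {path} /opt/prestaged/{filename}")
    return "\n".join(lines) + "\n"


if __name__ == "__main__":
    print(render_dockerfile(
        python_version="3.11",
        beam_version="2.56.0",
        pip_packages=["pandas==2.2.2"],
        prestaged_files=["models/large_model.bin"],
    ))
```

Because the image extends the prebuilt Beam image with `FROM`, it inherits that image's entrypoint, so only the added layers need to be described.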

For more information about custom containers in Apache Beam, see the Apache Beam custom container guide. For examples of Python pipelines that use custom containers, see Dataflow custom containers.

Next steps
