Package and import transforms

Apache Beam YAML lets you package and reuse transforms through Beam YAML providers. A provider encapsulates transforms into a reusable unit that you can import into your Beam YAML pipelines. You can package YAML, Python, and Java Apache Beam transforms this way.

With the job builder, you can load providers from Cloud Storage to use them in your job.

Writing providers

Beam YAML providers are defined in YAML files. These files specify the implementation and configuration of the provided transforms. Each provider listing is a YAML list item with type and transforms keys. Java and Python providers also have a config key that specifies the transform implementation. YAML-defined provider implementations are expressed inline.

YAML providers

YAML providers define new YAML transforms as a map of names to transform definitions. For example, this provider defines a transform that squares a field from its input:

    - type: yaml
      transforms:
        SquareElement:
          body:
            type: chain
            transforms:
              - type: MapToFields
                config:
                  language: python
                  append: true
                  fields:
                    power: "element ** 2"

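To see how a provider like this plugs into a pipeline, here is a minimal sketch that lists the SquareElement provider inline under a top-level providers key and calls the transform by name. The Create input values and the LogForDebugging step are assumptions added for illustration; they are not part of the original example.

```yaml
# Sketch: a pipeline that declares the SquareElement provider inline
# and then uses the provided transform like any built-in one.
pipeline:
  type: chain
  transforms:
    # Illustrative input; scalar elements are expected to arrive as rows
    # with a single field named "element".
    - type: Create
      config:
        elements: [1, 2, 3]
    # Provided transform, referenced by the name given in the provider.
    - type: SquareElement
    # Illustrative final step that logs each row.
    - type: LogForDebugging

providers:
  - type: yaml
    transforms:
      SquareElement:
        body:
          type: chain
          transforms:
            - type: MapToFields
              config:
                language: python
                append: true
                fields:
                  power: "element ** 2"
```

Because append is true, each output row should keep its element field and gain a power field.
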
YAML providers can also declare transform parameters with a config_schema key in the transform definition and reference them in the body using Jinja2 templating:

    - type: yaml
      transforms:
        RaiseElementToPower:
          config_schema:
            properties:
              n: {type: integer}
          body:
            type: chain
            transforms:
              - type: MapToFields
                config:
                  language: python
                  append: true
                  fields:
                    power: "element ** {{n}}"

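As a usage sketch, a pipeline would pass a value for n through the transform's config block, the same way it configures a built-in transform. The value 3 is an arbitrary illustration.

```yaml
# Fragment of a pipeline's transforms list (not a complete pipeline).
- type: RaiseElementToPower
  config:
    n: 3   # substituted for {{n}}, so the expression becomes "element ** 3"
```
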
If a provided transform functions as a source, it must set requires_inputs: false:

    - type: yaml
      transforms:
        CreateTestElements:
          requires_inputs: false
          body: |
            type: Create
            config:
              elements: [1,2,3,4]

It is also possible to define composite transforms:

    - type: yaml
      transforms:
        ConsecutivePowers:
          config_schema:
            properties:
              end: {type: integer}
              n: {type: integer}
          requires_inputs: false
          body: |
            type: chain
            transforms:
              - type: Range
                config:
                  end: {{end}}
              - type: RaiseElementToPower
                config:
                  n: {{n}}

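Assuming ConsecutivePowers and the transforms it references (Range and RaiseElementToPower) are all available from loaded providers, a pipeline could start with it directly, since it requires no inputs. The parameter values and the LogForDebugging step below are illustrative assumptions.

```yaml
# Sketch: a composite, source-style provided transform used as the first step.
pipeline:
  type: chain
  transforms:
    - type: ConsecutivePowers
      config:
        end: 10   # forwarded to Range through {{end}}
        n: 2      # forwarded to RaiseElementToPower through {{n}}
    - type: LogForDebugging
```
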
Python providers

Python transforms can be provided using the following syntax:

    - type: pythonPackage
      config:
        packages:
          - pypi_package>=version
      transforms:
        MyCustomTransform: "pkg.module.PTransformClassOrCallable"

For an in-depth example, see the Python provider starter project on GitHub.

Java providers

Java transforms can be provided using the following syntax:

    - type: javaJar
      config:
        jar: gs://your-bucket/your-java-transform.jar
      transforms:
        MyCustomTransform: "urn:registered:in:transform"

For an in-depth example, see the Java provider starter project on GitHub.

Using providers in the job builder

Transforms defined in providers can be imported from Cloud Storage and used in the job builder. To use a provider in the job builder:

  1. Save a provider as a YAML file in Cloud Storage.

    Go to Cloud Storage

  2. Go to the Jobs page in the Google Cloud console.

    Go to Jobs

  3. Click Create job from builder.

  4. Locate the YAML Providers section. You might need to scroll.

  5. In the YAML provider path box, enter the Cloud Storage location of the provider file.

  6. Wait for the provider to load. If the provider is valid, the transforms defined in the provider appear in the Loaded transforms section.

  7. Locate your transform's name in the Loaded transforms section and click the button to insert the transform in your job.

  8. If your transform requires parameters, define them in the YAML transform configuration editor for your transform. Define parameters as a YAML object that maps parameter names to parameter values.

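For step 8, the configuration is a plain YAML mapping. As a sketch, a transform that declares end and n in its config_schema, like the ConsecutivePowers example earlier, could be configured as follows (the values are arbitrary):

```yaml
# Parameter names on the left, parameter values on the right.
end: 10
n: 2
```
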