Most streaming data pipelines require data transformations. Some users prefer transforming data after it reaches its destination in an extract, load, transform (ELT) pipeline, while others opt for transforming data before its ingestion in an extract, transform, and load (ETL) pipeline. Traditionally, this architecture required complex pipelines with tools like Dataflow or Apache Flink to perform data transformations.
Pub/Sub offers Single Message Transforms(SMTs) to simplify data transformations for streaming pipelines. SMTs enable lightweight modifications to message data and attributes directly within Pub/Sub. SMTs eliminate the need for additional data processing steps or separate data transformation products.
When an SMT is run, it takes the Pub/Sub message as input, including the message data and attributes. The output is a transformed Pub/Sub message, with modifications to the data or attributes. SMTs are integrated into the Pub/Sub API, so you can manage them as part of your topic or subscription configurations.
SMTs use cases
Consider designing an online store that wants to give customers personalized product recommendations as they browse the website. To do this, you can use Pub/Sub to collect real-time data about customer activity on the site. This includes data about the products viewed, the products added to the cart, and the ratings given to products.
However, this raw data often needs some adjustments before it can be used to generate recommendations. For example, the raw data might contain extraneous details which are irrelevant for your use case. Examples of such details are the customer's browser type or the time they visited the site. The data might also not be in the required format for the recommendation system. For example, timestamps might be in different formats, or product IDs might need to be converted to a different type.
You can use Pub/Sub SMTs to make data transformations such as the following:
-
Remove personally identifiable information (PII), such as full names and addresses, to protect customer privacy.
-
Retain only recommendation-relevant events, such as product views and purchases, and discard others, such as customer profile changes.
-
Ensure all timestamps, currency values, and product IDs adhere to a consistent format and type compatible with the recommendation system.
-
Generate new data fields from raw data, such as shopping cart total value or product page dwell time.
-
Add inferences from Vertex AI models to event data, such as classifications, predictions, sentiments, or embeddings.
In summary, SMTs enable a wide range of use cases, including the following:
-
Data masking and redaction: Protect sensitive data by masking or redacting fields like credit card numbers or PII, aiding compliance with data privacy regulations.
-
Data format conversion: Transform data between different formats to ensure compatibility with downstream systems.
-
Message filtering: Process only relevant messages by filtering out unwanted messages based on content or attributes. SMTs allow for more complex filtering conditions than Pub/Sub's built-in filters .
-
Simple data transformations: Perform basic data manipulation tasks, such as string manipulation, date formatting, or mathematical operations.
-
AI Inferences: Integrate AI models seamlessly into your Pub/Sub pipelines, by using the AI Inference SMT.
Types of SMTs
Pub/Sub supports the following SMTs:
- AI Inference : Gets inferences on Pub/Sub messages from a Vertex AI model.
- User Defined Functions : Calls a JavaScript user-defined function (UDF) to perform custom transformations on Pub/Sub messages.
Sample message flow for SMTs
The image shows an example Pub/Sub system with SMTs applied at both the
topic and subscription levels.
The following procedure shows how the messages flow in the Pub/Sub system:
-
The publisher applications Publisher 1and Publisher 2publish messages Aand Brespectively to the Pub/Sub topic.
-
The topic's SMTs transform messages Aand Binto messages A'and B', respectively.
-
If a schema is attached to the topic, the transformed messages A'and B'are validated against the schema. If, for example, A'does not match the schema, the publish of message Afails with an error.
-
The transformed messages A'and B'are written to Pub/Sub storage.
-
Pub/Sub delivers messages A'and B'to all attached subscriptions, which are Subscription 1and Subscription 2as shown in the image.
-
If Subscription 1has a filter configured, messages A'and B'are evaluated against the filter. Only messages matching the filter proceed to the next step. Other messages are automatically acknowledged by Pub/Sub.
-
If Subscription 2has a filter configured, messages A'and B'are evaluated against the filter. Only messages matching the filter proceed to the next step. Other messages are automatically acknowledged by Pub/Sub.
-
Subscription 1's SMTs transform messages A'and B'. A'becomes A''and B'becomes B''.
-
Subscription 2's SMTs transform messages A'and B'. A'remains as A'and B'is filtered out.
-
If Subscription 1is a push subscription with payload unwrapping enabled, messages A''and B''are unwrapped. If Subscription 2is a push subscription with payload unwrapping enabled, A'is unwrapped.
-
Subscriber 1receives message B'', Subscriber 2receives message A'', and Subscriber 3receives message A'.
-
Subscribers acknowledge the received messages.
-
Pub/Sub deletes the acknowledged messages from storage.
Limitations
-
Up to 5 SMTs can be enabled on a topic or subscription.
-
SMTs operate on a single Pub/Sub message. They can't aggregate multiple Pub/Sub messages.

