This document describes how you deploy a Cloud Logging log sink and a Dataflow pipeline to stream logs from Google Cloud to Datadog. It assumes that you're familiar with the reference architecture in Stream logs from Google Cloud to Datadog.
These instructions are intended for IT professionals who want to stream logs from Google Cloud to Datadog. Although it's not required, having experience with the following Google products is useful for deploying this architecture:
- Dataflow pipelines
- Pub/Sub
- Cloud Logging
- Identity and Access Management (IAM)
- Cloud Storage
You must have a Datadog account to complete this deployment. However, you don't need any familiarity with Datadog Log Management.
Architecture
The following diagram shows the architecture that's described in this document. This diagram demonstrates how log files that are generated by Google Cloud are ingested by Datadog and shown to Datadog users.
As shown in the preceding diagram, the following events occur:
- Cloud Logging collects log files from a Google Cloud project into a designated Cloud Logging log sink and then forwards them to a Pub/Sub topic.
- A Dataflow pipeline pulls the logs from the Pub/Sub topic, batches them, compresses them into a payload, and then delivers them to Datadog.
- If there's a delivery failure, a secondary Dataflow pipeline sends messages from a dead-letter topic back to the primary log-forwarding topic to be redelivered.
- The logs arrive in Datadog for further analysis and monitoring.
For more information, see the Architecture section of the reference architecture.
Objectives
- Create the secure networking infrastructure.
- Create the logging and Pub/Sub infrastructure.
- Create the credentials and storage infrastructure.
- Create the Dataflow infrastructure.
- Validate that Datadog Log Explorer received logs.
- Manage delivery errors.
Costs
In this document, you use the following billable components of Google Cloud: Cloud Logging, Pub/Sub, Dataflow, Compute Engine, Cloud Storage, Secret Manager, and Cloud NAT.
To generate a cost estimate based on your projected usage, use the pricing calculator.
You also use the following billable component of Datadog: Datadog Log Management.
Before you begin
- Sign in to your Google Cloud account. If you're new to Google Cloud, create an account to evaluate how our products perform in real-world scenarios. New customers also get $300 in free credits to run, test, and deploy workloads.
- In the Google Cloud console, on the project selector page, select or create a Google Cloud project.
- Verify that billing is enabled for your Google Cloud project.
- Enable the Cloud Monitoring, Secret Manager, Compute Engine, Pub/Sub, Logging, and Dataflow APIs.
IAM role requirements
Make sure that you have the following role or roles on the project:
- Compute > Compute Network Admin
- Compute > Compute Security Admin
- Dataflow > Dataflow Admin
- Dataflow > Dataflow Worker
- IAM > Project IAM Admin
- IAM > Service Account Admin
- IAM > Service Account User
- Logging > Logs Configuration Writer
- Logging > Logs Viewer
- Pub/Sub > Pub/Sub Admin
- Secret Manager > Secret Manager Admin
- Storage > Storage Admin
Check for the roles
- In the Google Cloud console, go to the IAM page.
- Select the project.
- In the Principal column, find all rows that identify you or a group that you're included in. To learn which groups you're included in, contact your administrator.
- For all rows that specify or include you, check the Role column to see whether the list of roles includes the required roles.
Grant the roles
- In the Google Cloud console, go to the IAM page.
- Select the project.
- Click Grant access.
- In the New principals field, enter your user identifier. This is typically the email address for a Google Account.
- In the Select a role list, select a role.
- To grant additional roles, click Add another role and add each additional role.
- Click Save.
Create network infrastructure
This section describes how to create your network infrastructure to support the deployment of a Cloud Logging log sink and a Dataflow pipeline to stream logs from Google Cloud to Datadog.
Create a Virtual Private Cloud (VPC) network and subnet
To host the Dataflow pipeline worker VMs, create a Virtual Private Cloud (VPC) network and subnet:
- In the Google Cloud console, go to the VPC networks page.
- Click Create VPC network.
- In the Name field, provide a name for the network.
- In the Subnets section, provide a name, region, and IP address range for the subnetwork. The size of the IP address range might vary based on your environment. A subnet mask of length /24 is sufficient for most use cases.
- In the Private Google Access section, select On.
- Click Done and then click Create.
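If you prefer to script this step, the following sketch shows roughly equivalent calls with the google-cloud-compute Python client. The project ID, region, and resource names (such as datadog-export-net, datadog-export-subnet, and 10.10.0.0/24) are placeholder assumptions; adjust them for your environment.

```python
# Sketch: create a custom-mode VPC network and a subnet with Private Google Access.
# Requires: pip install google-cloud-compute
from google.cloud import compute_v1

project_id = "my-project"  # placeholder
region = "us-central1"     # placeholder

network = compute_v1.Network(
    name="datadog-export-net",
    auto_create_subnetworks=False,  # custom-mode network
)
compute_v1.NetworksClient().insert(
    project=project_id, network_resource=network
).result()

subnet = compute_v1.Subnetwork(
    name="datadog-export-subnet",
    ip_cidr_range="10.10.0.0/24",   # a /24 is sufficient for most use cases
    network=f"projects/{project_id}/global/networks/datadog-export-net",
    private_ip_google_access=True,  # equivalent to selecting "On" in the console
)
compute_v1.SubnetworksClient().insert(
    project=project_id, region=region, subnetwork_resource=subnet
).result()
```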
Create a VPC firewall rule
To restrict traffic to the Dataflow VMs, create a VPC firewall rule:
- In the Google Cloud console, go to the Create a firewall rule page.
- In the Name field, provide a name for the rule.
- In the Description field, explain what the rule does.
- In the Network list, select the network for your Dataflow VMs.
- In the Priority field, specify the order in which this rule is applied. Set the Priority to 0. Rules with lower numbers get prioritized first. The default value for this field is 1000.
- In the Direction of traffic section, select Ingress.
- In the Action on match section, select Allow.
Create targets, source tags, protocols, and ports
- In the Google Cloud console, go to the Create a firewall rule page.
- Find the Targets list and select Specified target tags.
- In the Target tags field, enter dataflow.
- In the Source filter list, select Source tags.
- In the Source tags field, enter dataflow.
- In the Protocols and Ports section, complete the following tasks:
  - Select Specified protocols and ports.
  - Select the TCP checkbox.
  - In the Ports field, enter 12345-12346.
- Click Create.
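The same firewall rule can be created programmatically. This is a minimal sketch with the google-cloud-compute client; the rule name and network name are placeholders, and the tag dataflow and port range 12345-12346 come from the steps above.

```python
# Sketch: allow ingress shuffle traffic between Dataflow worker VMs (tag "dataflow").
# Requires: pip install google-cloud-compute
from google.cloud import compute_v1

project_id = "my-project"  # placeholder

firewall = compute_v1.Firewall(
    name="allow-dataflow-workers",  # placeholder name
    description="Allow traffic between Dataflow worker VMs",
    network=f"projects/{project_id}/global/networks/datadog-export-net",
    priority=0,
    direction="INGRESS",
    target_tags=["dataflow"],
    source_tags=["dataflow"],
    allowed=[compute_v1.Allowed(I_p_protocol="tcp", ports=["12345-12346"])],
)
compute_v1.FirewallsClient().insert(
    project=project_id, firewall_resource=firewall
).result()
```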
Create a Cloud NAT gateway
To help enable secure outbound connections between Google Cloud and Datadog, create a Cloud NAT gateway.
- In the Google Cloud console, go to the Cloud NAT page.
- On the Cloud NAT page, click Create Cloud NAT gateway.
- In the Gateway name field, provide a name for the gateway.
- In the NAT type section, select Public.
- In the Select Cloud Router section, in the Network list, select your network from the list of available networks.
- In the Region list, select the region that contains your Cloud Router.
- In the Cloud Router list, select or create a new router in the same network and region.
- In the Cloud NAT mapping section, in the Cloud NAT IP addresses list, select Automatic.
- Click Create.
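As a scripted alternative, the following sketch creates a Cloud Router that carries a Cloud NAT configuration with automatically allocated NAT IP addresses. Router and NAT names are placeholder assumptions.

```python
# Sketch: create a Cloud Router with a Cloud NAT config that maps all subnet
# ranges to automatically allocated external IP addresses.
# Requires: pip install google-cloud-compute
from google.cloud import compute_v1

project_id = "my-project"  # placeholder
region = "us-central1"     # placeholder

router = compute_v1.Router(
    name="datadog-export-router",  # placeholder
    network=f"projects/{project_id}/global/networks/datadog-export-net",
    nats=[
        compute_v1.RouterNat(
            name="datadog-export-nat",
            nat_ip_allocate_option="AUTO_ONLY",
            source_subnetwork_ip_ranges_to_nat="ALL_SUBNETWORKS_ALL_IP_RANGES",
        )
    ],
)
compute_v1.RoutersClient().insert(
    project=project_id, region=region, router_resource=router
).result()
```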
Create logging and Pub/Sub infrastructure
Create Pub/Sub topics and subscriptions to receive and forward your logs, and to handle any delivery failures.
- In the Google Cloud console, go to the Create a Pub/Sub topic page.
- In the Topic ID field, provide a name for the topic.
- Leave the Add a default subscription checkbox selected.
- Click Create.
- To handle any log messages that are rejected by the Datadog API, create an additional topic and default subscription by repeating the preceding steps.

The additional topic is used within the Datadog Dataflow template as part of the path configuration for the outputDeadletterTopic template parameter.
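The following sketch performs the same setup with the google-cloud-pubsub client. The topic names (datadog-export-topic and datadog-export-deadletter) and the <topic>-sub subscription naming convention are placeholder assumptions that the later sketches reuse.

```python
# Sketch: create the input topic and the dead-letter topic, each with a
# pull subscription, mirroring the console steps above.
# Requires: pip install google-cloud-pubsub
from google.cloud import pubsub_v1

project_id = "my-project"  # placeholder

publisher = pubsub_v1.PublisherClient()
subscriber = pubsub_v1.SubscriberClient()

for topic_id in ("datadog-export-topic", "datadog-export-deadletter"):  # placeholders
    topic_path = publisher.topic_path(project_id, topic_id)
    publisher.create_topic(request={"name": topic_path})

    # Reproduce the console's default-subscription naming convention.
    subscription_path = subscriber.subscription_path(project_id, f"{topic_id}-sub")
    subscriber.create_subscription(
        request={"name": subscription_path, "topic": topic_path}
    )
```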
Route the logs to Pub/Sub
This deployment describes how to create a project-level Cloud Logging log sink. However, you can also create an organization-level aggregated sink that combines logs from multiple projects by setting the includeChildren parameter on the organization-level sink.

To create the log sink and route logs to your Pub/Sub input topic, follow these steps:
- In the Google Cloud console, go to the Create logs routing sink page.
- In the Sink details section, in the Sink name field, enter a name.
- Optional: In the Sink description field, explain the purpose of the log sink.
- Click Next.
- In the Sink destination section, in the Select sink service list, select Cloud Pub/Sub topic.
- In the Select a Cloud Pub/Sub topic list, select the input topic that you just created.
- Click Next.
- Optional: In the Choose logs to include in sink section, in the Build inclusion filter field, specify which logs to include in the sink by entering your logging queries. For example, to include only 10% of the logs with a severity level of INFO, create an inclusion filter with severity=INFO AND sample(insertId, 0.1). For more information, see Logging query language.
- Click Next.
- Optional: In the Choose logs to filter out of sink (optional) section, create logging queries to specify which logs to exclude from the sink:
  - To build an exclusion filter, click Add exclusion.
  - In the Exclusion filter name field, enter a name.
  - In the Build an exclusion filter field, enter a filter expression that matches the log entries that you want to exclude. You can also use the sample function to select a portion of the log entries to exclude. To create the sink with your new exclusion filter turned off, click Disable after you enter the expression. You can update the sink later to enable the filter.
- Click Create sink.
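The same sink can be created with the google-cloud-logging client, as in the sketch below. The sink name and topic name are placeholders, the filter is the example inclusion filter from the steps above, and unique_writer_identity=True makes the sink report the writer identity that you grant access to in the next sections.

```python
# Sketch: create a project-level log sink that routes logs to the input topic.
# Requires: pip install google-cloud-logging
from google.cloud import logging

project_id = "my-project"          # placeholder
topic_id = "datadog-export-topic"  # placeholder

client = logging.Client(project=project_id)
sink = client.sink(
    "datadog-export-sink",  # placeholder sink name
    filter_="severity=INFO AND sample(insertId, 0.1)",  # example inclusion filter
    destination=f"pubsub.googleapis.com/projects/{project_id}/topics/{topic_id}",
)
sink.create(unique_writer_identity=True)

# This is the service account that needs the Pub/Sub Publisher role
# on the input topic (see the next sections).
print("Writer identity:", sink.writer_identity)
```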
Identify writer-identity values
- In the Google Cloud console, go to the Log Router page.
- In the Log Router Sinks section, find your log sink and then click More actions.
- Click View sink details.
- In the Writer identity row, next to serviceAccount, copy the service account ID. You use the copied service account ID value in the next section.
Add a principal value
- Go to the Pub/Sub Topics page.
- Select your input topic.
- Click Show info panel.
- On the info panel, in the Permissions tab, click Add principal.
- In the Add principals section, in the New principals field, paste the Writer identity service account ID that you copied in the previous section.
- In the Assign roles section, in the Select a role list, point to Pub/Sub and click Pub/Sub Publisher.
- Click Save.
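A minimal scripted equivalent of this grant, assuming the placeholder topic name used earlier and that you paste the writer identity value copied from the sink details:

```python
# Sketch: grant the sink's writer identity the Pub/Sub Publisher role on the
# input topic.
# Requires: pip install google-cloud-pubsub
from google.cloud import pubsub_v1

project_id = "my-project"               # placeholder
topic_id = "datadog-export-topic"       # placeholder
writer_identity = "serviceAccount:<SINK_WRITER_IDENTITY>"  # paste the copied value

publisher = pubsub_v1.PublisherClient()
topic_path = publisher.topic_path(project_id, topic_id)

policy = publisher.get_iam_policy(request={"resource": topic_path})
policy.bindings.add(role="roles/pubsub.publisher", members=[writer_identity])
publisher.set_iam_policy(request={"resource": topic_path, "policy": policy})
```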
Create credentials and storage infrastructure
To store your Datadog API key value, create a secret in Secret Manager. This API key is used by the Dataflow pipeline to forward logs to Datadog.
- In the Google Cloud console, go to the Create secret page.
- In the Name field, provide a name for your secret, for example, my_secret. A secret name can contain uppercase and lowercase letters, numerals, hyphens, and underscores. The maximum allowed length for a name is 255 characters.
- In the Secret value section, in the Secret value field, paste your Datadog API key value. You can find the Datadog API key value on the Datadog Organization Settings page.
- Click Create secret.
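The following sketch does the same with the Secret Manager client; the project ID is a placeholder and the secret name matches the my_secret example above.

```python
# Sketch: create a secret and store the Datadog API key as its first version.
# Requires: pip install google-cloud-secret-manager
from google.cloud import secretmanager

project_id = "my-project"              # placeholder
secret_id = "my_secret"                # matches the example name above
datadog_api_key = "<DATADOG_API_KEY>"  # paste your Datadog API key

client = secretmanager.SecretManagerServiceClient()
secret = client.create_secret(
    request={
        "parent": f"projects/{project_id}",
        "secret_id": secret_id,
        "secret": {"replication": {"automatic": {}}},
    }
)
client.add_secret_version(
    request={
        "parent": secret.name,
        "payload": {"data": datadog_api_key.encode("utf-8")},
    }
)
print("Secret resource name:", secret.name)
```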
Create storage infrastructure
To stage temporary files for the Dataflow pipeline, create a Cloud Storage bucket with Uniform bucket-level access enabled:
- In the Google Cloud console, go to the Create a bucket page.
- In the Get Started section, enter a globally unique, permanent name for the bucket.
- Click Continue.
- In the Choose where to store your data section, select Region, select a region for your bucket, and then click Continue.
- In the Choose a storage class for your data section, select Standard, and then click Continue.
- In the Choose how to control access to objects section, find the Access control section, select Uniform, and then click Continue.
- Optional: In the Choose how to protect object data section, configure additional security settings.
- Click Create. If prompted, leave the Enforce public access prevention on this bucket item selected.
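If you script this step, a sketch with the google-cloud-storage client follows. The bucket name and region are placeholder assumptions.

```python
# Sketch: create a regional Standard-class bucket with uniform bucket-level
# access and public access prevention enforced.
# Requires: pip install google-cloud-storage
from google.cloud import storage

project_id = "my-project"          # placeholder
bucket_name = "my-dataflow-stage"  # placeholder; must be globally unique

client = storage.Client(project=project_id)
bucket = client.bucket(bucket_name)
bucket.storage_class = "STANDARD"
bucket.iam_configuration.uniform_bucket_level_access_enabled = True
bucket.iam_configuration.public_access_prevention = "enforced"
client.create_bucket(bucket, location="us-central1")  # placeholder region
```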
Create Dataflow infrastructure
In this section, you create a custom Dataflow worker service account. This account should follow the principle of least privilege.
The default behavior for Dataflow pipeline workers is to use your project's Compute Engine default service account , which grants permissions to all resources in the project. If you are forwarding logs from a production environment, create a custom worker service account with only the necessary roles and permissions. Assign this service account to your Dataflow pipeline workers.
The following IAM roles are required for the Dataflow worker service account that you create in this section. The service account uses these IAM roles to interact with your Google Cloud resources and to forward your logs to Datadog through Dataflow.
- Dataflow Admin
- Dataflow Worker
- Pub/Sub Publisher
- Pub/Sub Subscriber
- Pub/Sub Viewer
- Secret Manager Secret Accessor
- Storage Object Admin
Create a Dataflow worker service account
- In the Google Cloud console, go to the Service Accounts page.
- In the Select a recent project section, select your project.
- On the Service Accounts page, click Create service account.
- In the Service account details section, in the Service account name field, enter a name.
- Click Create and continue.
- In the Grant this service account access to project section, add the following project-level roles to the service account:
  - Dataflow Admin
  - Dataflow Worker
- Click Done. The Service Accounts page appears.
- On the Service Accounts page, click your service account.
- In the Service account details section, copy the Email value. You use this value in the next sections to configure access to your Google Cloud resources so that the service account can interact with them.
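A scripted sketch of the same setup, using google-api-python-client against the IAM and Resource Manager APIs. The account ID, display name, and project ID are placeholders; the roles granted are the project-level Dataflow Admin (roles/dataflow.admin) and Dataflow Worker (roles/dataflow.worker) roles listed above.

```python
# Sketch: create the worker service account and grant it the project-level
# Dataflow Admin and Dataflow Worker roles.
# Requires: pip install google-api-python-client
from googleapiclient import discovery

project_id = "my-project"          # placeholder
account_id = "dataflow-worker-sa"  # placeholder

iam = discovery.build("iam", "v1")
sa = (
    iam.projects()
    .serviceAccounts()
    .create(
        name=f"projects/{project_id}",
        body={
            "accountId": account_id,
            "serviceAccount": {"displayName": "Dataflow worker service account"},
        },
    )
    .execute()
)
email = sa["email"]

crm = discovery.build("cloudresourcemanager", "v1")
policy = crm.projects().getIamPolicy(resource=project_id, body={}).execute()
for role in ("roles/dataflow.admin", "roles/dataflow.worker"):
    policy.setdefault("bindings", []).append(
        {"role": role, "members": [f"serviceAccount:{email}"]}
    )
crm.projects().setIamPolicy(resource=project_id, body={"policy": policy}).execute()
print("Worker service account:", email)
```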
Provide access to the Dataflow worker service account
To view and consume messages from the Pub/Sub input subscription, provide access to the Dataflow worker service account:
- In the Google Cloud console, go to the Pub/Sub Subscriptions page.
- Select the checkbox next to your input subscription.
- Click Show info panel.
- In the Permissions tab, click Add principal.
- In the Add principals section, in the New principals field, paste the email of the service account that you created earlier.
- In the Assign roles section, assign the following resource-level roles to the service account:
  - Pub/Sub Subscriber
  - Pub/Sub Viewer
- Click Save.
Handle failed messages
To handle failed messages, you configure the Dataflow worker service account to send any failed messages to a dead-letter topic. To send the messages back to the primary input topic after any issues are resolved, the service account needs to view and consume messages from the dead-letter subscription.
Grant access to the service account
- In the Google Cloud console, go to the Pub/Sub Topics page.
- Select the checkbox next to your input topic.
- Click Show info panel.
- In the Permissions tab, click Add principal.
- In the Add principals section, in the New principals field, paste the email of the service account that you created earlier.
- In the Assign roles section, assign the following resource-level role to the service account:
  - Pub/Sub Publisher
- Click Save.
Grant access to the dead-letter topic
- In the Google Cloud console, go to the Pub/Sub Topics page.
- Select the checkbox next to your dead-letter topic.
- Click Show info panel.
- In the Permissions tab, click Add principal.
- In the Add principals section, in the New principals field, paste the email of the service account that you created earlier.
- In the Assign roles section, assign the following resource-level role to the service account:
  - Pub/Sub Publisher
- Click Save.
Grant access to the dead-letter subscription
- In the Google Cloud console, go to the Pub/Sub Subscriptions page.
- Select the checkbox next to your dead-letter subscription.
- Click Show info panel.
- In the Permissions tab, click Add principal.
- In the Add principals section, in the New principals field, paste the email of the service account that you created earlier.
- In the Assign roles section, assign the following resource-level roles to the service account:
  - Pub/Sub Subscriber
  - Pub/Sub Viewer
- Click Save.
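The resource-level Pub/Sub grants from the preceding sections can also be applied in one pass, as in this sketch. The topic and subscription names follow the placeholder convention from the earlier sketches, and the service account email is a placeholder.

```python
# Sketch: grant the worker service account Subscriber and Viewer on the input
# and dead-letter subscriptions, and Publisher on the input and dead-letter topics.
# Requires: pip install google-cloud-pubsub
from google.cloud import pubsub_v1

project_id = "my-project"  # placeholder
worker_sa = "serviceAccount:dataflow-worker-sa@my-project.iam.gserviceaccount.com"  # placeholder

publisher = pubsub_v1.PublisherClient()
subscriber = pubsub_v1.SubscriberClient()

topic_roles = {
    publisher.topic_path(project_id, "datadog-export-topic"): ["roles/pubsub.publisher"],
    publisher.topic_path(project_id, "datadog-export-deadletter"): ["roles/pubsub.publisher"],
}
subscription_roles = {
    subscriber.subscription_path(project_id, "datadog-export-topic-sub"): [
        "roles/pubsub.subscriber", "roles/pubsub.viewer"
    ],
    subscriber.subscription_path(project_id, "datadog-export-deadletter-sub"): [
        "roles/pubsub.subscriber", "roles/pubsub.viewer"
    ],
}

def grant(client, resource, roles):
    """Add role bindings for the worker service account to one Pub/Sub resource."""
    policy = client.get_iam_policy(request={"resource": resource})
    for role in roles:
        policy.bindings.add(role=role, members=[worker_sa])
    client.set_iam_policy(request={"resource": resource, "policy": policy})

for resource, roles in topic_roles.items():
    grant(publisher, resource, roles)
for resource, roles in subscription_roles.items():
    grant(subscriber, resource, roles)
```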
Enable the Dataflow worker service account
So that the Dataflow pipeline can read the Datadog API key at runtime, grant the Dataflow worker service account access to the secret in Secret Manager.
- In the Google Cloud console, go to the Secret Manager page.
- Select the checkbox next to your secret.
- Click Show info panel.
- In the Permissions tab, click Add principal.
- In the Add principals section, in the New principals field, paste the email of the service account that you created earlier.
- In the Assign roles section, assign the following resource-level role to the service account:
  - Secret Manager Secret Accessor
- Click Save.
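A minimal sketch of the same grant with the Secret Manager client, assuming the placeholder secret and service account names used earlier:

```python
# Sketch: grant the worker service account the Secret Manager Secret Accessor
# role on the secret that holds the Datadog API key.
# Requires: pip install google-cloud-secret-manager
from google.cloud import secretmanager

project_id = "my-project"  # placeholder
secret_id = "my_secret"    # placeholder
worker_sa = "serviceAccount:dataflow-worker-sa@my-project.iam.gserviceaccount.com"  # placeholder

client = secretmanager.SecretManagerServiceClient()
name = client.secret_path(project_id, secret_id)

policy = client.get_iam_policy(request={"resource": name})
policy.bindings.add(role="roles/secretmanager.secretAccessor", members=[worker_sa])
client.set_iam_policy(request={"resource": name, "policy": policy})
```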
Stage files to the Cloud Storage bucket
Give the Dataflow worker service account access to read and write the Dataflow job's staging files to the Cloud Storage bucket:
- In the Google Cloud console, go to the Buckets page.
- Select the checkbox next to your bucket.
- Click Permissions.
- In the Add principals section, in the New principals field, paste the email of the service account that you created earlier.
- In the Assign roles section, assign the following role to the service account:
  - Storage Object Admin
- Click Save.
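The equivalent grant with the google-cloud-storage client, again using the placeholder bucket and service account names:

```python
# Sketch: grant the worker service account the Storage Object Admin role on
# the staging bucket.
# Requires: pip install google-cloud-storage
from google.cloud import storage

bucket_name = "my-dataflow-stage"  # placeholder
worker_sa = "serviceAccount:dataflow-worker-sa@my-project.iam.gserviceaccount.com"  # placeholder

client = storage.Client()
bucket = client.bucket(bucket_name)

policy = bucket.get_iam_policy(requested_policy_version=3)
policy.bindings.append(
    {"role": "roles/storage.objectAdmin", "members": {worker_sa}}
)
bucket.set_iam_policy(policy)
```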
Export logs with the Pub/Sub-to-Datadog pipeline
This section provides a baseline configuration for running the Pub/Sub to Datadog pipeline in a secure network with a custom Dataflow worker service account. If you expect to stream a high volume of logs, you can also configure the following parameters and features:
- batchCount: The number of messages in each batched request to Datadog (from 10 to 1,000 messages, with a default value of 100). To ensure a timely and consistent flow of logs, a batch is sent at least every two seconds.
- parallelism: The number of requests that are sent to Datadog in parallel, with a default value of 1 (no parallelism).
- Horizontal Autoscaling: Enabled by default for streaming jobs that use Streaming Engine. For more information, see Streaming autoscaling.
- User-defined functions: Optional JavaScript functions that you configure to act as extensions to the template (not enabled by default).
For the Dataflow job's URL parameter, ensure that you select the Datadog logs API URL that corresponds to your Datadog site:

Site | Logs API URL
---|---
US1 | https://http-intake.logs.datadoghq.com
US3 | https://http-intake.logs.us3.datadoghq.com
US5 | https://http-intake.logs.us5.datadoghq.com
EU | https://http-intake.logs.datadoghq.eu
AP1 | https://http-intake.logs.ap1.datadoghq.com
US1-FED | https://http-intake.logs.ddog-gov.com
Create your Dataflow job
- In the Google Cloud console, go to the Create job from template page.
- In the Job name field, enter a name for the job.
- From the Regional endpoint list, select a Dataflow endpoint.
- In the Dataflow template list, select Pub/Sub to Datadog. The Required Parameters section appears.
- Configure the Required Parameters section:
  - In the Pub/Sub input subscription list, select the input subscription.
  - In the Datadog Logs API URL field, enter the URL that corresponds to your Datadog site.
  - In the Output deadletter Pub/Sub topic list, select the topic that you created to receive message failures.
- Configure the Streaming Engine section:
  - In the Temporary location field, specify a path for temporary files in the storage bucket that you created for that purpose.
- Configure the Optional Parameters section:
  - In the Google Cloud Secret Manager ID field, enter the resource name of the secret that you configured with your Datadog API key value.
Configure your credentials, service account, and networking parameters
- In the Source of the API key passed field, select SECRET_MANAGER.
- In the Worker region list, select the region where you created your custom VPC and subnet.
- In the Service account email list, select the custom Dataflow worker service account that you created for that purpose.
- In the Worker IP Address Configuration list, select Private.
- In the Subnetwork field, specify the private subnetwork that you created for the Dataflow worker VMs. For more information, see Guidelines for specifying a subnetwork parameter for Shared VPC.
- Optional: Customize other settings.
- Click Run job. The Dataflow service allocates resources to run the pipeline.
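If you prefer to launch the job programmatically, the following sketch calls the Dataflow templates launch API with google-api-python-client. The template path and parameter names (inputSubscription, url, outputDeadletterTopic, apiKeySource, apiKeySecretId) are based on the published Pub/Sub to Datadog classic template, and every resource name is a placeholder; verify both against the template version in your region and the names that you actually created.

```python
# Sketch: launch the Pub/Sub to Datadog classic template through the REST API.
# Requires: pip install google-api-python-client
from googleapiclient import discovery

project_id = "my-project"  # placeholder
region = "us-central1"     # placeholder

dataflow = discovery.build("dataflow", "v1b3")
body = {
    "jobName": "pubsub-to-datadog",  # placeholder
    "parameters": {
        "inputSubscription": f"projects/{project_id}/subscriptions/datadog-export-topic-sub",
        "url": "https://http-intake.logs.datadoghq.com",  # use the URL for your Datadog site
        "outputDeadletterTopic": f"projects/{project_id}/topics/datadog-export-deadletter",
        "apiKeySource": "SECRET_MANAGER",
        "apiKeySecretId": f"projects/{project_id}/secrets/my_secret/versions/latest",
    },
    "environment": {
        "tempLocation": "gs://my-dataflow-stage/temp",
        "serviceAccountEmail": "dataflow-worker-sa@my-project.iam.gserviceaccount.com",
        "subnetwork": f"regions/{region}/subnetworks/datadog-export-subnet",
        "ipConfiguration": "WORKER_IP_PRIVATE",
    },
}
dataflow.projects().locations().templates().launch(
    projectId=project_id,
    location=region,
    gcsPath=f"gs://dataflow-templates-{region}/latest/Cloud_PubSub_to_Datadog",
    body=body,
).execute()
```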
Validate that Datadog Log Explorer received logs
Open the Datadog Log Explorer and ensure that the timeframe is expanded to encompass the timestamp of the logs. To validate that Datadog Log Explorer received logs from Google Cloud, search for logs with the gcp.dataflow.step source attribute, or any other log attribute:

Source:gcp.dataflow.step

The output displays the Google Cloud log messages that the pipeline forwarded to Datadog.
For more information, see Search logs in the Datadog documentation.
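As an alternative to the Log Explorer UI, you can query logs through the Datadog Logs Search API. The sketch below is an assumption-laden example: the api.datadoghq.com endpoint applies to the US1 site (use the API domain for your site), and the request requires both a Datadog API key and an application key.

```python
# Sketch: search recently ingested Google Cloud logs through the Datadog
# Logs Search API (v2).
# Requires: pip install requests
import requests

resp = requests.post(
    "https://api.datadoghq.com/api/v2/logs/events/search",  # US1 site endpoint
    headers={
        "DD-API-KEY": "<DATADOG_API_KEY>",
        "DD-APPLICATION-KEY": "<DATADOG_APP_KEY>",
        "Content-Type": "application/json",
    },
    json={
        "filter": {"query": "source:gcp.dataflow.step", "from": "now-15m", "to": "now"},
        "page": {"limit": 5},
    },
    timeout=30,
)
resp.raise_for_status()
for event in resp.json().get("data", []):
    print(event["attributes"].get("message"))
```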
Manage delivery errors
Log file delivery from the Dataflow pipeline that streams Google Cloud logs to Datadog can fail occasionally. Delivery errors can be caused by:
- 4xx errors from the Datadog logs endpoint (related to authentication or network issues).
- 5xx errors caused by server issues at the destination.
Manage 401 and 403 errors
If you encounter a 401 error or a 403 error, you must replace the primary log-forwarding job with a replacement job that has a valid API key value. You must then clear the messages generated by those errors from the dead-letter topic. To clear the error messages, follow the steps in the Troubleshoot failed messages section.
For more information about replacing the primary log-forwarding job with a replacement job, see Launch a replacement job .
Manage other 4xx errors
To resolve all other 4xx errors, follow the steps in the Troubleshoot failed messages section.
Manage 5xx errors
For 5xx errors, delivery is automatically retried with exponential backoff, for a maximum of 15 minutes. This automatic process might not resolve all errors. To clear any remaining 5xx errors, follow the steps in the Troubleshoot failed messages section.
Troubleshoot failed messages
When you see failed messages in the dead-letter topic, examine them. To resolve the errors, and to forward the messages from the dead-letter topic to the primary log-forwarding pipeline, complete all of the following subsections in order.
Review your dead-letter subscription
- In the Google Cloud console, go to the Pub/Sub Subscriptions page.
- Click the subscription ID of the dead-letter subscription that you created.
- Click the Messages tab.
- To view the messages, leave the Enable ack messages checkbox cleared and click Pull.
- Inspect the failed messages and resolve any issues.
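You can inspect dead-letter messages from a script as well. The sketch below pulls a few messages without acknowledging them, so they remain in the subscription for reprocessing; the subscription name is a placeholder.

```python
# Sketch: pull messages from the dead-letter subscription without acking them.
# Requires: pip install google-cloud-pubsub
from google.cloud import pubsub_v1

project_id = "my-project"                          # placeholder
subscription_id = "datadog-export-deadletter-sub"  # placeholder

subscriber = pubsub_v1.SubscriberClient()
subscription_path = subscriber.subscription_path(project_id, subscription_id)

response = subscriber.pull(
    request={"subscription": subscription_path, "max_messages": 10}
)
for received in response.received_messages:
    # Inspect the payload and attributes to diagnose why delivery failed.
    print(received.message.attributes)
    print(received.message.data.decode("utf-8"))
# No acknowledge() call: the messages stay available in the subscription.
```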
Reprocess dead-letter messages
To reprocess dead-letter messages, first create a Dataflow job and then configure parameters.
Create your Dataflow job
- In the Google Cloud console, go to the Create job from template page.
- Give the job a name and specify the regional endpoint.
Configure your messaging and storage parameters
- In the Create job from template page, in the Dataflow template list, select the Pub/Sub to Pub/Sub template.
- In the Source section, in the Pub/Sub input subscription list, select your dead-letter subscription.
- In the Target section, in the Output Pub/Sub topic list, select the primary input topic.
- In the Streaming Engine section, in the Temporary location field, specify a path and filename prefix for temporary files in the storage bucket that you created for that purpose. For example, gs://my-bucket/temp.
Configure your networking and service account parameters
- In the Create job from template page, find the Worker region list and select the region where you created your custom VPC and subnet.
- In the Service Account email list, select the custom Dataflow worker service account email address that you created for that purpose.
- In the Worker IP Address Configuration list, select Private.
- In the Subnetwork field, specify the private subnetwork that you created for the Dataflow worker VMs. For more information, see Guidelines for specifying a subnetwork parameter for Shared VPC.
- Optional: Customize other settings.
- Click Run job.
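The replay job can also be launched through the API. This sketch assumes the published Pub/Sub to Pub/Sub classic template (parameters inputSubscription and outputTopic) and reuses the placeholder resource names from the earlier sketches; verify the template path and names for your environment.

```python
# Sketch: launch the Pub/Sub to Pub/Sub classic template to move messages from
# the dead-letter subscription back to the primary input topic.
# Requires: pip install google-api-python-client
from googleapiclient import discovery

project_id = "my-project"  # placeholder
region = "us-central1"     # placeholder

dataflow = discovery.build("dataflow", "v1b3")
body = {
    "jobName": "deadletter-replay",  # placeholder
    "parameters": {
        "inputSubscription": f"projects/{project_id}/subscriptions/datadog-export-deadletter-sub",
        "outputTopic": f"projects/{project_id}/topics/datadog-export-topic",
    },
    "environment": {
        "tempLocation": "gs://my-dataflow-stage/temp",
        "serviceAccountEmail": "dataflow-worker-sa@my-project.iam.gserviceaccount.com",
        "subnetwork": f"regions/{region}/subnetworks/datadog-export-subnet",
        "ipConfiguration": "WORKER_IP_PRIVATE",
    },
}
dataflow.projects().locations().templates().launch(
    projectId=project_id,
    location=region,
    gcsPath=f"gs://dataflow-templates-{region}/latest/Cloud_PubSub_to_Cloud_PubSub",
    body=body,
).execute()
```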
Confirm the dead-letter subscription is empty
Confirming that the dead-letter subscription is empty helps ensure that you have forwarded all messages from that Pub/Sub subscription to the primary input topic.
- In the Google Cloud console, go to the Pub/Sub Subscriptions page.
- Click the subscription ID of the dead-letter subscription that you created.
- Click the Messages tab.
- Confirm that there are no unacknowledged messages by checking the Pub/Sub subscription metrics. For more information, see Monitor message backlog.
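The backlog can also be checked with the Cloud Monitoring API by reading the num_undelivered_messages metric for the subscription, as in the sketch below. The project ID and subscription ID are placeholders.

```python
# Sketch: read the unacknowledged-message count for the dead-letter subscription
# over the last five minutes.
# Requires: pip install google-cloud-monitoring
import time
from google.cloud import monitoring_v3

project_id = "my-project"  # placeholder

client = monitoring_v3.MetricServiceClient()
now = int(time.time())
interval = monitoring_v3.TimeInterval(
    {"end_time": {"seconds": now}, "start_time": {"seconds": now - 300}}
)
results = client.list_time_series(
    request={
        "name": f"projects/{project_id}",
        "filter": (
            'metric.type = "pubsub.googleapis.com/subscription/num_undelivered_messages" '
            'AND resource.labels.subscription_id = "datadog-export-deadletter-sub"'
        ),
        "interval": interval,
        "view": monitoring_v3.ListTimeSeriesRequest.TimeSeriesView.FULL,
    }
)
for series in results:
    for point in series.points:
        print(point.interval.end_time, point.value.int64_value)
```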
Drain the backup Dataflow job
After you have resolved the errors, and the messages in the dead-letter topic have returned to the log-forwarding pipeline, follow these steps to stop running the Pub/Sub to Pub/Sub template.
Draining the backup Dataflow job ensures that the Dataflow service finishes processing the buffered data while also blocking the ingestion of new data.
- In the Google Cloud console, go to the Dataflow jobs page.
- Select the job that you want to stop. The Stop Jobs window appears. To stop a job, the status of the job must be running.
- Select Drain.
- Click Stop job.
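A sketch of the same drain operation through the Dataflow REST API, assuming you know the job ID of the Pub/Sub to Pub/Sub job; project, region, and job ID are placeholders.

```python
# Sketch: drain the backup job by setting its requested state to JOB_STATE_DRAINED.
# Requires: pip install google-api-python-client
from googleapiclient import discovery

project_id = "my-project"  # placeholder
region = "us-central1"     # placeholder
job_id = "<JOB_ID>"        # the ID of the Pub/Sub to Pub/Sub job

dataflow = discovery.build("dataflow", "v1b3")
dataflow.projects().locations().jobs().update(
    projectId=project_id,
    location=region,
    jobId=job_id,
    body={"requestedState": "JOB_STATE_DRAINED"},
).execute()
```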
Clean up
If you don't plan to continue using the Google Cloud and Datadog resources deployed in this reference architecture, delete them to avoid incurring additional costs. There are no Datadog resources for you to delete.
Delete the project
- In the Google Cloud console, go to the Manage resources page.
- In the project list, select the project that you want to delete, and then click Delete.
- In the dialog, type the project ID, and then click Shut down to delete the project.
What's next
- To learn more about the benefits of the Pub/Sub to Datadog Dataflow template, read the Stream your Google Cloud logs to Datadog with Dataflow blog post.
- For more information about Cloud Logging, see Cloud Logging.
- To learn more about Datadog log management, see Best Practices for Log Management.
- For more information about Dataflow, see Dataflow.
- For more reference architectures, diagrams, and best practices, explore the Cloud Architecture Center.
Contributors
Authors:
- Ashraf Hanafy | Senior Software Engineer for Google Cloud Integrations, Datadog
- Daniel Trujillo | Engineering Manager, Google Cloud Integrations, Datadog
- Bryce Eadie | Technical Writer, Datadog
- Sriram Raman | Senior Product Manager, Google Cloud Integrations, Datadog
Other contributors:
- Maruti C | Global Partner Engineer
- Chirag Shankar | Data Engineer
- Kevin Winters | Key Enterprise Architect
- Leonid Yankulin | Developer Relations Engineer
- Mohamed Ali | Cloud Technical Solutions Developer