The stream might incur errors during runtime.
- Some errors, such as a bad password on the source database, are recoverable, meaning they can be fixed and the stream resumes automatically.
- Errors may impact a single object, such as an event containing unsupported data types. Other errors may impact several objects or the entire stream, such as when Datastream can't connect to the source database.
- Depending on the error, information is provided in the Streams or Stream details pages of the Datastream UI. You can also use Datastream's APIs to retrieve information about the error.
To troubleshoot an error, navigate to the stream to view the error, and follow the steps outlined in the error message.
This page contains information about configuration, connectivity, Oracle, and MySQL errors, along with steps for troubleshooting the errors.
Configuration and connectivity errors
This can happen for various reasons. To troubleshoot this error:
- Ensure that the source database is up and reachable.
- Navigate to the source connection profile from the Streams or Connection profiles pages.
- Verify that the connection profile connectivity information is correct.
- Verify that the username and password match.
- Verify that the username exists on the database and has the required privileges (a sketch of these checks for a MySQL source follows this list).
- Save any changes that you made on the Connection profiles page.
The stream resumes automatically.
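For the username and privilege checks above, the following is a minimal sketch, assuming a MySQL source and a hypothetical datastream_user account; adapt the statements to your source type and account names.

```sql
-- Confirm that the account exists and inspect its current grants (MySQL source assumed).
SELECT user, host FROM mysql.user WHERE user = 'datastream_user';
SHOW GRANTS FOR 'datastream_user'@'%';

-- Grant the replication-related privileges that Datastream typically needs on MySQL.
GRANT SELECT, REPLICATION SLAVE, REPLICATION CLIENT ON *.* TO 'datastream_user'@'%';
```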
- Make sure that you've completed all of the VPC peering prerequisites.
- After creating the private connectivity configuration, verify that the route which contains the internal IP address of the database appears in the Exported routes tab of the VPC Network Peering page. To do this, go to the VPC Network Peering page, and then search for the peering that was added (the name is `peering-[UUID]`). The route can be found in the Exported routes tab. If this route doesn't exist, then add it manually.
- Datastream doesn't check for overlapping with dynamic peering routes. Providing a subnet that overlaps with a dynamic route may lead to connectivity issues. Therefore, we don't recommend using a subnet that's part of a dynamic route.
- Make sure that the custom routes for the Datastream IP address ranges are advertised correctly. If the custom routes are missing, see Custom advertised routes .
- If you're still having issues connecting to the source database, then see Set up a reverse proxy .
- Make sure that you've completed all of the prerequisites .
- Make sure that your firewall rules allow the subnetwork of the provided network attachment to connect to the source database.
- Make sure that the custom routes for the Datastream IP address ranges are advertised correctly. If the custom routes are missing, see Custom advertised routes .
You selected the public IP allowlist or Forward SSH tunnel network connectivity methods for the connection profile that you're creating. However, the Block Public Connectivity Methods organization policy for Datastream is enabled. Therefore, you can't select public connectivity methods for your connection profile.
To resolve this issue, either select the private VPC peering or Private Service Connect interfaces network connectivity method or disable the organization policy.
To disable the organization policy:
- Go to the Organization policies page in the Google Cloud Console.
- Select the Datastream - Block Public Connectivity Methods organization policy.
- Click EDIT.
- In the Applies to section of the page, select Customize .
- In the Enforcement section, select Off.
- Click SAVE .
- Return to the Oracle connection profile that you're creating, and then click CREATE .
This can happen if your database has thousands of tables and schemas. Some of them might not be included in the list of objects to pull when configuring the source for the stream in the Google Cloud console. Instead of selecting Specific schemas and tables in the Select objects to include section, select Custom . In the Object matching criteria field, enter the schemas and tables that you want Datastream to pull.
Datastream might drop unsupported events during streaming. You can perform the following actions to address the issue:
- Manually trigger a backfill of the entire table. This works if the discarded events are `UPSERT` events only. If the dropped events include `DELETE` events, then you need to truncate the table in BigQuery before performing the backfill (see the sketch after this list). For information about how to perform a backfill, see Initiate backfill.
- Contact Google Support and ask them to perform a partial backfill. This is possible only if you can identify the dropped events with a SQL `WHERE` clause, and if none of the events are `DELETE` events.
- Disregard the issue if the number of discarded events is low or if the discarded events aren't significant.
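If you do need to truncate the table before backfilling, a minimal sketch of that step in BigQuery follows; the project, dataset, and table names are placeholders.

```sql
-- Clear the destination table before re-running the backfill so that rows
-- deleted at the source don't linger in BigQuery (placeholder names).
TRUNCATE TABLE `my-project.my_dataset.my_table`;
```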
Events can be discarded with the `UNSUPPORTED_LARGE_SCHEMA` reason code. You might want to exclude some of the columns or rename them. Alternatively, you can exclude the object with the large schema.
Oracle errors
Error | Troubleshooting steps |
---|---|
Supplemental logging is configured incorrectly on the source database. | An error fetching ongoing change data capture (CDC) data can happen if the supplemental logging configuration isn't correct on the source database. Verify that supplemental logging is configured correctly. Specifically, confirm that supplemental logging is turned on for the database tables that are being streamed from the source into the destination (a sketch of these checks follows this table). The stream resumes automatically. |
Unable to resume replication as log position is lost. | This error can occur when the replication process is paused for a long time, which causes the log position to be lost. Streams shouldn't be paused for periods of time that approach the log retention period. Recreate the stream. |
Log files are missing, either partially or entirely. | The log files might have been deleted. Oracle purges log files as soon as it can unless you specify a minimum retention period to keep them around. On the Oracle server, set how long the log files should be retained. For an RDS deployment, set the archived log retention hours using the rdsadmin utilities. |
The exclude list subsumes the include list. | The include list is contained completely within the exclude list, so the list of objects that Datastream pulls from the source is empty. Modify the object selection, and then try again. |
The logging mode for the Oracle database isn't set to ARCHIVELOG. | Change the logging mode to ARCHIVELOG, and then try again. |
Datastream returns an `ORA-00942: table or view does not exist` error message, but everything is configured properly. | This can be a result of caching on the Oracle server. Recreating the database user should fix the caching issue. |
Changes to an Oracle source aren't reflected in the destination when the stream is already running. | If you use LogMiner as your CDC method, Datastream reads from archived redo log files. In such a case, the changes you make to the source aren't reflected in the destination until the log is archived. To see the changes in the destination, change the CDC method to binary log reader, change the log archive policy, or manually force a log switch. For more information, see Work with Oracle database redo log files. |
Oracle CDC configuration validation failed. | You selected a CDC method for which your source database wasn't configured. Select a different method, or complete the configuration for your CDC method. For more details, see Configure a source Oracle database . |
There's an unexpected internal error. | For more details, contact Google Support. |
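For the supplemental logging and ARCHIVELOG rows in the table above, here is a hedged sketch of how you might verify the configuration on a self-managed Oracle server; the schema and table names are placeholders.

```sql
-- Check whether minimal and all-column supplemental logging are enabled.
SELECT supplemental_log_data_min, supplemental_log_data_all FROM v$database;

-- Turn on supplemental logging for a specific replicated table (placeholder names).
ALTER TABLE hr.employees ADD SUPPLEMENTAL LOG DATA (ALL) COLUMNS;

-- Confirm that the database runs in ARCHIVELOG mode.
SELECT log_mode FROM v$database;
```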
MySQL errors
This can happen for continuous MySQL streams if the binlog configuration is incorrect on the source database. To troubleshoot this error, perform the following actions:
- Verify that binlog is configured correctly.
- Confirm that the binary log format of the MySQL database is set to `ROW` (see the sketch after this list).
- Restart the stream.
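A minimal sketch of the binlog checks above, run directly on the MySQL source:

```sql
-- Verify that binary logging is enabled and that the format is ROW.
SHOW VARIABLES LIKE 'log_bin';
SHOW VARIABLES LIKE 'binlog_format';

-- Datastream also expects full row images, so binlog_row_image should be FULL.
SHOW VARIABLES LIKE 'binlog_row_image';
```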
This can happen when the source database doesn't adhere to the version support matrix . To troubleshoot this error, perform the following actions:
- Ensure that the source database adheres to the matrix.
- Recreate the stream with the updated source database.
For an Amazon RDS source, run `call mysql.rds_set_configuration('binlog retention hours', 168);` to keep the binlogs around for at least 7 days.
Increase Datastream's ability to read from the source database:
- Reduce the maximum binary log (binlog) file size for your source database. Reducing the size increases the number of binlog files (see the sketch after this list).
- If there are multiple binlog files, set the maximum number of concurrent CDC tasks for the stream accordingly. Having multiple binlog files lets Datastream read source database events concurrently, up to the number set in the `maxConcurrentCdcTasks` field.
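To reduce the maximum binlog file size (first item above), here is a hedged sketch, assuming a self-managed MySQL server where you can set global variables:

```sql
-- Inspect the current maximum binlog file size and the existing binlog files.
SHOW VARIABLES LIKE 'max_binlog_size';
SHOW BINARY LOGS;

-- Reduce the maximum binlog file size (here to 64 MB) so that more, smaller
-- files are produced and Datastream can read them concurrently.
SET GLOBAL max_binlog_size = 67108864;
```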
PostgreSQL errors
Verify that logical decoding is configured correctly. See Configure a source PostgreSQL database .
Verify that the replication slot is configured correctly and uses the `pgoutput` plugin. See Source PostgreSQL database for more information.
The stream fails with a BIGQUERY_TOO_MANY_PRIMARY_KEYS error. If the replica identity of a PostgreSQL table is set to FULL, then Datastream treats all the columns in this table as primary keys. If there are more than 16 columns in the table, then this violates the BigQuery CDC limitation and causes the error. To solve the issue, complete the following steps:
- Change the replica identity to DEFAULT: `ALTER TABLE TABLE_NAME REPLICA IDENTITY DEFAULT;`
- Remove the table from the stream's list of objects to include. For more information, see Modify configuration information about the source database .
- Delete the table from BigQuery. For more information, see Delete tables .
- In Datastream, add the table to the stream again by editing your source configuration.
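For the replication slot and replica identity items above, a minimal sketch of the checks on the PostgreSQL source; the table name is a placeholder.

```sql
-- Confirm that the replication slot used by the stream exists and uses pgoutput.
SELECT slot_name, plugin, active FROM pg_replication_slots;

-- Check the replica identity of a table: 'd' = DEFAULT, 'f' = FULL.
SELECT relname, relreplident FROM pg_class WHERE relname = 'my_table';

-- Switch the table back to the default replica identity if it's set to FULL.
ALTER TABLE my_table REPLICA IDENTITY DEFAULT;
```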
SQL Server errors
Error | Troubleshooting steps |
---|---|
CDC is disabled for the database DATABASE_NAME . | Change data capture (CDC) must be enabled for the database. Datastream needs direct read access to transaction logs in order to replicate real time changes to the source database, and to get complete log information. Enable CDC for the database and try again. For information about enabling CDC for a database, see Configure a source SQL Server database . |
Tables with CDC disabled. | Change data capture (CDC) must be enabled for all the tables included in the stream. Datastream needs direct read access to transaction logs in order to replicate real time changes to source tables, and to get complete log information. Enable CDC for the tables included in the stream and try again. For information about enabling CDC for source tables, see Configure a source SQL Server database . |
Missing permissions. | Datastream is missing the necessary permissions to read from the source. Grant the appropriate privileges to the user account that is used to connect to your database, and try again. |
The SQL Server EDITION_NAME edition is not supported. | Datastream doesn't support this SQL Server edition. For more information about supported editions of SQL Server, see Overview of SQL Server as a source . |
The SQL Server version VERSION_NAME of the Standard edition is not supported. | Datastream doesn't support this version of the SQL Server Standard edition. For more information about supported versions of SQL Server, see Overview of SQL Server as a source . |
The stream is unable to read the event in LSN " YOUR_LSN " because the transaction log seems to be truncated. | This issue can occur when transaction logs no longer exist on the source database. When you replicate data from a SQL Server source using the transaction logs CDC method, the logs might get truncated before Datastream reads them. When that happens, Datastream can't reliably replicate the source database to the destination. To resolve the issue, recover your stream or consider using the change tables CDC method instead. For more information about differences between the two methods, see Overview of SQL Server as a source . |
SQL Server CDC configuration failed. | The CDC method that you have selected doesn't comply with your database configuration. Change the CDC method and try again. |
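For the CDC rows in the table above, a minimal sketch of enabling CDC on a SQL Server source; the database, schema, and table names are placeholders.

```sql
-- Enable CDC at the database level (run in the source database).
USE my_database;
EXEC sys.sp_cdc_enable_db;

-- Enable CDC for each table that's included in the stream (placeholder names).
EXEC sys.sp_cdc_enable_table
    @source_schema = N'dbo',
    @source_name   = N'my_table',
    @role_name     = NULL;
```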
Salesforce errors
Error | Troubleshooting steps |
---|---|
Insufficient permissions. | The user of the connected app or the external client app that you configured to authenticate the connection between your Salesforce org and Datastream doesn't have sufficient permissions to access the data that you want to replicate. Make sure that you configured your Salesforce source correctly. For more information, see Configure a Salesforce source . |
Bulk API 2.0 disabled. | Bulk API 2.0 is enabled by default for Performance, Unlimited, Enterprise, and Developer editions. This error message indicates that the API is either disabled in your edition, or the credentials that you use don't have sufficient permissions. Make sure that the user profile that you use has the required API permissions. |
Limit exceeded. | You've exceeded the Salesforce API query limit. You see this message when you reach 90% of your API limit quota. In such a case, Datastream retries the operation at a later time. You might want to consider increasing your Salesforce API quota. |
Deleted limit exceeded. | When querying for deleted records, Salesforce limits the response to 600,000 record identifiers. The lowest querying granularity in Salesforce is one minute, and if you delete more than 600,000 records within a minute, Salesforce returns this error. |
Authentication error. | Datastream can't authenticate with Salesforce. You probably used the wrong credentials or domain name. |
MongoDB errors
Error | Troubleshooting steps |
---|---|
Authentication failed. | Check that the `authSource` for the Datastream user is `admin`. The Datastream user must be created in the `admin` database. This database is a privileged database that lets users run certain administrative commands. |
Database login failed. | Check your username and password and try again. Also, ensure that your user account has been created within the `admin` database. If the problem persists, it's possible that your user account was deleted or created incorrectly. |
Excluded objects list is invalid: `{exclude_list}`. | Specify your excluded objects in the following format: `DATABASE_NAME.COLLECTION_NAME.FIELD_NAME.NESTED_FIELD_NAME`, with optional wildcards. Valid examples: `db.*`, `database.collection.*`, `database.*.field.*`. |
We're missing the necessary permissions to read from the source. | Assign the `readAnyDatabase` role to your user and try again. |
MongoDB version VERSION_NAME is not supported. | Please use version 5.0 or higher. |
Datastream was unable to execute the `buildinfo` command to determine the MongoDB version. | Ensure that the user has the necessary permissions to execute the `buildinfo` command and try again. For more information about the required permissions, see Configure a MongoDB database. |
The connection to the MongoDB cluster timed out. | Ensure that you've provided the correct hostname and credentials, and try again. |
We are unable to read the necessary data due to insufficient permissions. | Assign the `readAnyDatabase` role to the account that is used to connect to your MongoDB cluster, and try again. |
The user account that is used to connect to your MongoDB cluster does not exist. | Please create the user account and try again. |
We were unable to connect using the provided information. | Please verify that the correct connection format (SRV or Standard) is used and all the necessary information (like replica set names for replica set connection string) is included. For more information, see Create a connection profile for a MongoDB database . |
We encountered a MongoDB exception. Source error message: `{source_error}`. | If the source error message is unclear, contact Google Support. |
BigQuery errors
BIGQUERY_UNSUPPORTED_PRIMARY_KEY_CHANGE, details: Failed to write to BigQuery due to an unsupported primary key change: adding primary keys to existing tables is not supported.
When you select the Append-only write mode for your stream, your data is appended in BigQuery as a stream of `INSERT`, `UPDATE-INSERT`, `UPDATE-DELETE`, and `DELETE` events, without any consolidation. This might cause duplicate rows to be written to BigQuery when you perform backfill, or when an issue occurs and the BigQuery writer retries the write operations. To address the issue, we recommend that you run a de-duplication query similar to the following on a regular basis:
SELECT * FROM (SELECT *, row_number() OVER (PARTITION BY datastream_metadata.uuid) AS num FROM TABLE_NAME) WHERE num=1
Verify that there's a primary key in your source table. BigQuery needs it to merge the changes into the destination table.
If there's no primary key, consider adding one in either the source or destination table. To add a primary key in your destination BigQuery table, follow these steps:
- Pause the stream.
- Truncate the table in BigQuery.
- Add the primary key to the table definition.
- Resume the stream.
- Trigger backfill for the table.
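For step 3 in the list above, a hedged sketch of adding a primary key to the destination table in BigQuery; the table and column names are placeholders.

```sql
-- Add an unenforced primary key to the destination table (placeholder names).
-- BigQuery primary keys are metadata-only, so NOT ENFORCED is required.
ALTER TABLE `my-project.my_dataset.my_table`
  ADD PRIMARY KEY (id) NOT ENFORCED;
```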
By default, Datastream doesn't support adding a primary key to a table that's already replicated to BigQuery without a primary key or removing a primary key from a table that's replicated to BigQuery with a primary key. You can, however, change the primary key definition for a source table replicated to BigQuery that already has a primary key:
- Check the total latency metric for the stream and wait at least as long as the current latency to ensure that any in-flight events are written to the destination. This allows all events with the original primary key to be successfully streamed.
- Pause the stream .
- Copy the CREATE TABLE data definition language (DDL) command for the table in BigQuery:
SELECT ddl FROM PROJECT_ID.DATASET_ID.INFORMATION_SCHEMA.TABLES WHERE table_name = 'TABLE_NAME';
Replace the following:
- PROJECT_ID : the identifier of your Google Cloud project.
- DATASET_ID : the identifier of the dataset in BigQuery.
- TABLE_NAME : the name of the table for which you want to copy the DDL command.
- Drop the table in BigQuery.
- Adjust the CREATE TABLE DDL command with the new primary keys. Include the partition and cluster keys, and the max_staleness OPTION:
CREATE TABLE `[PROJECT_ID].[DATASET_ID].[TABLE_NAME]` ( product_id INT64 NOT NULL, product_name STRING, datastream_metadata STRUCT<uuid STRING, source_timestamp INT64>, PRIMARY KEY (product_id) NOT ENFORCED ) CLUSTER BY product_id OPTIONS ( max_staleness = INTERVAL '0-0 0 0:MINS:0' YEAR TO SECOND );
Replace the following:
- PROJECT_ID : the identifier of your Google Cloud project.
- DATASET_ID : the identifier of the dataset in BigQuery.
- TABLE_NAME : the name of the table for which you copied the DDL command.
- MINS: the number of minutes that you want to set for the max_staleness option, for example 15.
- Run the adjusted query to recreate the table in BigQuery.
- Resume the stream .
- Initiate backfill for the table .
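If you later only need to adjust the max_staleness value, and not the primary keys, here is a hedged sketch of updating the option in place without recreating the table; the table name is a placeholder.

```sql
-- Update max_staleness on the existing table without dropping it (placeholder names).
ALTER TABLE `my-project.my_dataset.my_table`
  SET OPTIONS (max_staleness = INTERVAL '0-0 0 0:15:0' YEAR TO SECOND);
```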
What's next
- To learn how to look for potential issues with your stream, see Troubleshoot a stream .
- To learn how to configure your source database, see Sources .
- To learn how to configure your BigQuery or Cloud Storage destination, see Destinations .