A migration job might encounter errors during runtime.
- Some errors, such as a bad password on the source database, are recoverable: after you fix the issue, the migration job resumes automatically.
- Some errors are unrecoverable, such as errors in data replication: the migration job needs to be restarted from the beginning.
When an error occurs, the migration job status changes to Failed, and the substatus reflects the last status before the failure.
To troubleshoot an error, navigate to the failed migration job to view the error and follow the steps outlined in the error message.
To view more details about the error, navigate to Cloud Monitoring using the link on the migration job. The logs are filtered to the specific migration job.
The following are some examples of issues and how they can be resolved:
**Error:** `The destination instance contains existing data or user defined entities (for example databases, tables, or functions). You can only migrate to empty instances. Clear your destination instance and retry the migration job.`
**Things to try:** Clear the extra data from your destination instance, and then retry the migration job. For details, see Clear extra data from your existing destination instance.
During the initial sync process (full dump), avoid running DDL statements or programs that require ACCESS EXCLUSIVE locks, such as ALTER TABLE or DROP TABLE, on the tables being migrated. Otherwise, those statements wait until the initial sync finishes. For example, if a table is still in the initial sync process and an ALTER TABLE command is executed on the same table, the command doesn't run, and subsequent DDL and DML commands on that table are blocked until the initial sync finishes.
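To check whether a DDL statement is stuck waiting on a lock held by the initial sync, you can query `pg_stat_activity` on the source. This is a generic PostgreSQL diagnostic sketch (PostgreSQL 9.6 and later), not something the migration job itself requires:

```sql
-- Sessions currently waiting on a lock; a blocked ALTER TABLE or
-- DROP TABLE appears here with wait_event_type = 'Lock' until the
-- initial sync releases the table.
SELECT pid, state, wait_event_type, wait_event, query
FROM pg_stat_activity
WHERE wait_event_type = 'Lock';
```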
**Error:** `No pglogical extension installed on databases (X)`
**Things to try:** Make sure that the pglogical extension is installed on all databases being migrated.

**Error:** You receive a `Cannot connect to invalid database` error message, or the Storage usage migration job metric shows no progress after a long time while the migration job is performing the full database dump.
**The issue might be:** A deadlock issue in the pglogical extension. For more information, see the pglogical issue tracker in GitHub.
**Things to try:** See Error message: Cannot connect to invalid database.

**Error:** `Replication user 'x' doesn't have sufficient privileges.`
**Things to try:** Make sure that the replication user has the privileges required for migration. For more information, see Configure your source databases.

**Error:** `Unable to connect to source database server.`
**Things to try:** Verify the connectivity between Database Migration Service and the source database server, including credentials, network routes, and any allow lists.

**Error:** `The source database 'wal_level' configuration must be equal to 'logical'.`
**The issue might be:** The wal_level parameter for the source database is set to a value other than logical.
**Things to try:** Set wal_level to logical on the source database.

**Error:** `The source database 'max_replication_slots' configuration is not sufficient.`
**The issue might be:** The max_replication_slots parameter wasn't configured correctly.
**Things to try:** Increase max_replication_slots on the source database.

**Error:** `The source database 'max_wal_senders' configuration is not sufficient.`
**The issue might be:** The max_wal_senders parameter wasn't configured correctly.
**Things to try:** Increase max_wal_senders on the source database.

**Error:** `The source database 'max_worker_processes' configuration is not sufficient.`
**The issue might be:** The max_worker_processes parameter wasn't configured correctly.
**Things to try:** Increase max_worker_processes on the source database.

**Error:** `Cleanup may have failed on source due to error: generic::unknown: failed to connect to on-premises database.` or `Error promoting EM replica: finished drop replication with errors.`
**Things to try:** For each database, run commands as a user with the superuser privilege. For more information about which commands to run, see Clean up replication slots.
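The source-parameter errors above can usually be fixed together. The following is a hedged sketch using `ALTER SYSTEM` on a self-managed source instance; the numeric values are illustrative only, so size them for the number of databases you migrate. On managed sources such as Amazon RDS, set the equivalent values in the instance's parameter group instead.

```sql
-- Illustrative values only; all four parameters take effect
-- only after a server restart.
ALTER SYSTEM SET wal_level = 'logical';
ALTER SYSTEM SET max_replication_slots = 10;
ALTER SYSTEM SET max_wal_senders = 10;
ALTER SYSTEM SET max_worker_processes = 10;

-- Verify after the restart:
SHOW wal_level;
```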
**Error:** `x509 certificate signed by unknown authority.`
**The issue might be:** The source CA certificate provided to Database Migration Service contains only the root certificate. The source certificate requires both the root certificate and any intermediate certificates. For example, for Amazon Relational Database Service, using the rds-ca-2019-root.pem certificate might result in this issue.
**Things to try:** Create a combined source CA certificate that contains both the root certificate and all required intermediate certificates. For the Amazon Relational Database Service use case, use the rds-combined-ca-bundle.pem certificate instead of the rds-ca-2019-root.pem certificate.
**Error:** `ERROR: Out of shared memory HINT: You might need to increase max_locks_per_transaction.`
**Things to try:** Set the max_locks_per_transaction parameter to at least max_number_of_tables_per_database / (max_connections + max_prepared_transactions).

**Error:** `ERROR: no data left in message.` or `Cannot assign TransactionIds during recovery.`
**Things to try:**
- When creating the destination, set the data disk size so that it's close to the final size. The full dump phase uses an I/O write-intensive workload, and a larger disk size has better I/O performance. For more information, see Block storage performance.
- Choose a higher tier for the Cloud SQL destination to get the maximum available network and disk bandwidth.
- Tune the Cloud SQL destination's max_wal_size flag. Typically, 32 GB or 64 GB is a good value to set for this flag. Updating this flag doesn't require you to restart the server.
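As a worked instance of the max_locks_per_transaction formula above (the numbers are hypothetical):

```sql
-- With roughly 2000 tables per database, max_connections = 100, and
-- max_prepared_transactions = 0: 2000 / (100 + 0) = 20. The formula is
-- a lower bound, so keep the larger of this value and the current
-- setting (the PostgreSQL default is 64).
ALTER SYSTEM SET max_locks_per_transaction = 64;
-- This parameter takes effect only after a server restart.
```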
**Error:** `subscriber {subscriber_name} initialization failed during nonrecoverable step (d), please try the setup again`
**The issue might be:** The migration job failed during the full dump phase, and the job isn't recoverable. The source database instance was restarted or was in recovery mode, or the replication connections ended because of an insufficient value set for the wal_sender_timeout parameter.
To find the root cause of the problem:
- Go to the Logs Explorer page in the Google Cloud Console.
- From the resource list, select your Cloud SQL replica. A list of the most recent logs for the replica appears.
- From the log file names, select postgres.log.
- Set the log's severity level to all levels above Warning. The first error logs may show the root cause of the failure.
**Things to try:**
- Make sure that Database Migration Service can always connect to the source database instance during the full dump phase.
- Check that the wal_sender_timeout parameter on the source database instance is set to a large enough value (for example, 0, which disables the timeout).
- Restart the migration job, and then try again.
**Error:** `ERROR: unknown column name {column_name}`
**The issue might be:** A column was added to a replicated table on the primary node but not on the replica node. Only data manipulation language (DML) changes are replicated automatically during continuous migrations. Managing data definition language (DDL) changes so that the source and destination databases remain compatible is the responsibility of the user, and can be done in two ways:
- Stop writes to the source database and run the DDL commands on both the source and the destination. Before running the DDL commands on the destination, grant the cloudsqlexternalsync role to the Cloud SQL user applying the DDL changes.
- Use pglogical.replicate_ddl_command to run DDL commands on the source and destination at a consistent point. The user running the commands must have the same username on both the source and the destination, and should be the superuser or the owner of the artifact being migrated (for example, the table, sequence, view, or database).
For examples of using pglogical.replicate_ddl_command, see Continuous migration.
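A minimal sketch of the pglogical approach; the table and column names are hypothetical, and pglogical requires the DDL to schema-qualify object names:

```sql
-- Run on the source as a user that exists with the same name on both
-- sides and owns the table. pglogical executes the DDL locally and
-- queues it for execution on the subscriber.
SELECT pglogical.replicate_ddl_command(
  'ALTER TABLE public.my_table ADD COLUMN created_at timestamptz'
);
```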
**Error:** `ERROR: cannot truncate a table referenced in a foreign key constraint`
**The issue might be:** The user tried to truncate a table that has a foreign key constraint.
**Things to try:** Remove the foreign key constraint first, and then truncate the table.
**Error:** `ERROR: connection to other side has died`
**The issue might be:** The replication connection ended because of an insufficient value set for the wal_sender_timeout parameter. The error usually occurs during the replication phase, after the initial dump succeeds.
**Things to try:** Increase the wal_sender_timeout parameter value on the source database instance, or disable the timeout mechanism by setting its value to 0.
**Error:** `migration job test configuration has returned the following warnings: Some table(s) have limited support.`
**The issue might be:** The source has tables with limited support, for example, tables without primary keys.
**Things to try:** This is a warning message. You can proceed with the migration, but unsupported entities (for example, tables without primary keys) don't get migrated. For more information, review Configure your source databases.

If one or more databases fail to migrate, in the Errors column, click View errors and fix them. You can also remove the failed databases from the migration job. For more information about removing a failed database from a migration job, see Manage migration jobs.
Clear extra data from your existing destination instance
When you migrate to an existing destination instance, you receive the following error message: `The destination instance contains existing data or user defined entities (for example databases, tables, or functions). You can only migrate to empty instances. Clear your destination instance and retry the migration job.`
This issue can occur if your destination instance contains extra data. You can only migrate to existing instances that are empty. See Known limitations .
Things to try
Clear extra data from your destination instance and start the migration job again by performing the following steps:
- Stop the migration job.
- At this point, your destination Cloud SQL instance is in `read-only` mode. Promote the destination instance to gain write access.
- Connect to your destination Cloud SQL instance.
- Remove extra data from your destination instance databases. Your
destination can only contain system configuration data. Destination databases
can't contain user data (such as tables). There are different SQL statements
you can run on your databases to find non-system data, for example:
Example SQL statement to retrieve non-system databases:

```sql
SELECT datname FROM pg_catalog.pg_database
WHERE datname NOT IN ('cloudsqladmin', 'template1', 'template0', 'postgres');
```

Example SQL statements to retrieve non-system data in the postgres database:

The postgres database is a system database, but it can contain non-system data. Make sure you run these statements on the postgres database. If you use the psql client to connect to the destination instance, you can switch to another database without resetting your connection by using the `\connect {database_name_here}` command.

```sql
SELECT table_schema, table_name FROM information_schema.tables
WHERE table_schema != 'information_schema' AND table_schema NOT LIKE 'pg\_%';

SELECT routine_schema, routine_name FROM information_schema.routines
WHERE routine_schema != 'information_schema' AND routine_schema NOT LIKE 'pg\_%';

SELECT extname FROM pg_extension WHERE extname != 'plpgsql';
```
- Start the migration job.
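For example, if the queries in the previous step surface a leftover non-system database, you can drop it before restarting the job (the database name here is hypothetical):

```sql
-- Connect to a different database first; you can't drop the database
-- you're currently connected to.
DROP DATABASE leftover_app_db;
```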
Clean up replication slots
You see one of the following messages:
- `Cleanup may have failed on source due to error: generic::unknown: failed to connect to on-premises database.`
- `Error promoting EM replica: finished drop replication with errors.`
The issue might be
When you promote a Cloud SQL instance, if the source instance isn't reachable from the Cloud SQL instance (for example, the source instance isn't running, or you removed the Cloud SQL instance from the allow list of source instances), then the replication settings can't be cleaned up during the promotion of a migration job. You must clean up the replication slots manually.
Things to try
For each database, run the following commands as a user with the superuser privilege:

- Get the replication slot names from the error message, and then run the following command to drop the slots, one by one:

  `select pg_drop_replication_slot({slot_name});`

- If the replication slot names aren't available in the error message, then run the following command to find and drop the existing inactive replication slots:

  `select pg_drop_replication_slot(slot_name) from pg_replication_slots where slot_name like '%cloudsql%' and active = 'f';`

- If there are no Cloud SQL replicas using the source instance, then run the following command to clean up pglogical settings:

  `select pglogical.drop_node(node_name) from pglogical.node where node_name like 'cloudsql';`

- If the pglogical extension isn't needed anymore, then run the following command to uninstall the extension:

  `DROP EXTENSION IF EXISTS pglogical;`
Error message: Cannot connect to invalid database
When migrating to PostgreSQL version 15, after multiple consecutive connection retry attempts, one of the following symptoms occurs:
- You receive a `Cannot connect to invalid database` error message.
- The Storage usage migration job metric shows no progress after a long time while the migration job is performing the full database dump.
The issue might be
This problem is often attributed to a deadlock issue in the pglogical extension. For more information, see the pglogical issue tracker in GitHub.
Things to try
Perform the migration job again with a new destination instance
Try deleting the destination instance where you experienced the issue and re-creating your migration job. Follow these steps:
- Delete the destination instance where you experienced the problems. See Delete instances in Cloud SQL for PostgreSQL documentation.
- Delete the failed migration job. See Review a migration job .
- Re-create your migration job. See Create a migration job .
Migrate to an intermediate version
Consider migrating to an earlier PostgreSQL version, such as PostgreSQL 14. After a successful migration, you can try upgrading to the desired PostgreSQL 15 instance. See Upgrade the database major version by migrating data in the Cloud SQL for PostgreSQL documentation.
Manage users and roles
Migrate existing users
Currently, Database Migration Service doesn't support migrating existing users from a source instance into a destination Cloud SQL instance. You can manage this migration by creating the users in Cloud SQL manually.
About the cloudsqlexternalsync user
During the migration, all objects on the Cloud SQL replica are owned by the cloudsqlexternalsync user. After the data is migrated, you can transfer ownership of the objects to other users by completing the following steps:
- Run the `GRANT cloudsqlexternalsync to {USER}` command.
- On each database, run the `reassign owned by cloudsqlexternalsync to {USER};` command.
- To remove the cloudsqlexternalsync user, run the `drop role cloudsqlexternalsync` command.
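Put together, the steps above look like the following, with a hypothetical target user named myuser:

```sql
GRANT cloudsqlexternalsync TO myuser;               -- run once

-- Run on each database:
REASSIGN OWNED BY cloudsqlexternalsync TO myuser;

-- After ownership is reassigned on every database:
DROP ROLE cloudsqlexternalsync;
```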
Import data into a new Cloud SQL instance
If you first export data from a Cloud SQL instance that Database Migration Service migrated into Cloud Storage, and then import the data from Cloud Storage into a stand-alone Cloud SQL instance, the import might fail because the cloudsqlexternalsync user doesn't exist on the destination instance.
To mitigate the issue, either create the cloudsqlexternalsync user on the destination instance or remove the user from the migrated instance.