Topics in this page include:
- Backup and recovery
- Cancel import and export
- Clone
- Connect
- Create instances
- Export
- External primary
- External replica
- Flags
- High availability
- Import
- Logging
- Manage instances
- Private Service Connect
- Replication
Backup and recovery
Run the gcloud sql operations list command to list all operations for the given Cloud SQL instance.
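For example, a minimal sketch of listing the most recent operations; the instance and project names are placeholders:

```bash
# List the ten most recent operations for an instance.
gcloud sql operations list \
  --instance=INSTANCE_NAME \
  --project=PROJECT_ID \
  --limit=10
```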
Look in the logs and filter by text to find the user. You may need to use audit logs for private information. Relevant log files include:
- cloudsql.googleapis.com/mysql-general.log
- cloudsql.googleapis.com/mysql.err
- If Cloud Audit Logs is enabled and you have the required permissions to view them, cloudaudit.googleapis.com/activity may also be available.
After an instance is purged, no data recovery is possible. However, if the instance is restored, then its backups are also restored. For more information on recovering a deleted instance, see Recovery backups.
If you have done an export operation, create a new instance and then do an import operation to recreate the database. Exports are written to Cloud Storage and imports are read from there.
If you really need to cancel the operation, you can ask customer support to force restart the instance.
Create the database users before restoring the SQL dump.
To keep backups indefinitely, create an on-demand backup. On-demand backups aren't deleted in the same way as automated backups: they remain until they're deleted or until the instance they belong to is deleted. Because that type of backup isn't deleted automatically, it can affect billing.
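For example, a minimal sketch of creating an on-demand backup with gcloud; the instance name and description are placeholders:

```bash
# Create an on-demand backup that persists until you delete it or
# delete the instance it belongs to.
gcloud sql backups create \
  --instance=INSTANCE_NAME \
  --description="pre-maintenance snapshot"
```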
Cancel import and export
Issue | Troubleshooting |
---|---|
Error message: You can't cancel operation [operation-ID] because this operation isn't in progress. | You're trying to cancel an import or export operation that's completed, failed, or cancelled. If the operation is running, you can cancel it. |
Error message: You can't cancel operation [operation-ID] because Cloud SQL doesn't support the cancellation of an [operation-type] operation. | Cloud SQL doesn't support the cancellation of the operation because it has an operation type other than import or export. |
Error message: The [operation-type] operation isn't cancelled. Wait and retry in a few seconds. | Cloud SQL can't cancel the import or export operation at this time. Try again in a few seconds. If the problem persists, contact Google Cloud Support. |
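If you do need to cancel a running import or export, a minimal sketch using the Cloud SQL Admin API's operations.cancel method looks like the following; PROJECT_ID and OPERATION_ID are placeholders:

```bash
# Cancel an in-progress import or export operation via the Admin API.
# The operation must still be running for the call to succeed.
curl -X POST \
  -H "Authorization: Bearer $(gcloud auth print-access-token)" \
  "https://sqladmin.googleapis.com/v1/projects/PROJECT_ID/operations/OPERATION_ID/cancel"
```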
Clone
Issue | Troubleshooting |
---|---|
Cloning fails with constraints/sql.restrictAuthorizedNetworks error. | The cloning operation is blocked by the Authorized Networks configuration. Authorized Networks are configured for public IP addresses in the Connectivity section of the Google Cloud console, and cloning is not permitted due to security considerations. Remove all authorized networks entries from the Cloud SQL instance. |
Error message: Failed to create subnetwork. Couldn't find free blocks in allocated IP ranges. Please allocate new ranges for this service provider. Help Token: [help-token-id]. | You're trying to use the Google Cloud console to clone an instance with a private IP address, but you didn't specify the allocated IP range that you want to use and the source instance isn't created with the specified range. As a result, the cloned instance is created in a random range. Use gcloud to clone the instance and provide a value for the --allocated-ip-range-name flag. |
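A minimal sketch of the gcloud route, assuming your gcloud version supports the --allocated-ip-range-name flag on the clone command; all names are placeholders:

```bash
# Clone into an explicit allocated IP range so the clone doesn't land
# in a randomly chosen range.
gcloud sql instances clone SOURCE_INSTANCE_NAME DESTINATION_INSTANCE_NAME \
  --allocated-ip-range-name=RANGE_NAME
```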
Connect
Aborted connection. The issue might be:
- Networking instability.
- No response to TCP keep-alive commands (either the client or the server isn't responsive, possibly overloaded).
- The database engine connection lifetime was exceeded and the server ends the connection.
Applications must tolerate network failures and follow best practices such as connection pooling and retrying. Most connection poolers catch these errors where possible. Otherwise the application must either retry or fail gracefully.
For connection retry, we recommend the following methods:
- Exponential backoff. Increase the time interval between each retry, exponentially.
- Also add randomized backoff (jitter).
Combining these methods helps reduce throttling.
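A minimal retry sketch in shell, assuming the mysql client and a reachable instance; INSTANCE_IP, DB_USER, and DB_NAME are placeholders:

```bash
# Retry a connection with exponential backoff plus randomized jitter.
max_attempts=5
delay=1
for attempt in $(seq 1 "$max_attempts"); do
  if mysql --host="$INSTANCE_IP" --user="$DB_USER" --password \
      "$DB_NAME" -e "SELECT 1;"; then
    break                             # connected successfully
  fi
  jitter=$((RANDOM % delay + 1))      # randomized component
  sleep $((delay + jitter))
  delay=$((delay * 2))                # exponential component
done
```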
Certificate verify failed. The client certificates have expired or the path to the certificates isn't correct. Regenerate the certificates by recreating them.
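A minimal sketch of recreating a client certificate with gcloud; CERT_NAME and INSTANCE_NAME are placeholders, and client-key.pem receives the new private key:

```bash
# Delete the expired certificate, then create a replacement.
gcloud sql ssl client-certs delete CERT_NAME --instance=INSTANCE_NAME
gcloud sql ssl client-certs create CERT_NAME client-key.pem \
  --instance=INSTANCE_NAME
```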
Create instances
Failed to create subnetwork. Couldn't find free blocks in allocated IP ranges. Please allocate new ranges for this service provider. Possible causes:
- The size of the allocated IP range for the private service connection is smaller than /24.
- The size of the allocated IP range for the private service connection is too small for the number of Cloud SQL instances.
- The required size of the allocated IP range is larger if instances are created in multiple regions. See allocated range size.
To resolve this issue, you can either expand the existing allocated IP range or allocate an additional IP range to the private service connection. For more information, see Allocate an IP address range .
If you used the --allocated-ip-range-name flag while creating the Cloud SQL instance, you can only expand the specified IP range.
If you're allocating a new range, take care that the allocation doesn't overlap with any existing allocations.
After creating a new IP range, update the VPC peering with the following command:
gcloud services vpc-peerings update \
  --service=servicenetworking.googleapis.com \
  --ranges=OLD_RESERVED_RANGE_NAME,NEW_RESERVED_RANGE_NAME \
  --network=VPC_NETWORK \
  --project=PROJECT_ID \
  --force
If you're expanding an existing allocation, take care to increase only the allocation range and not decrease it. For example, if the original allocation was 10.0.10.0/24, then make the new allocation at least 10.0.10.0/23.
In general, if starting from a /24 allocation, decrementing the /mask by 1 for each condition (additional instance type group, additional region) is a good rule of thumb. For example, if trying to create both instance type groups on the same allocation, going from /24 to /23 is enough.
After expanding an existing IP range, update the VPC peering with the following command:
gcloud services vpc-peerings update \
  --service=servicenetworking.googleapis.com \
  --ranges=RESERVED_RANGE_NAME \
  --network=VPC_NETWORK \
  --project=PROJECT_ID
Failed to create subnetwork. Router status is temporarily unavailable. Please try again later. Help Token: [token-ID].
As the error suggests, wait and then try creating the Cloud SQL instance again.
Failed to create subnetwork. Required 'compute.projects.get' permission for PROJECT_ID.
The account performing the operation lacks the compute.projects.get permission on the project; grant the missing permission and retry.
Export
Issue | Troubleshooting |
---|---|
HTTP Error 409: Operation failed because another operation was already in progress. | There is already a pending operation for your instance. Only one operation is allowed at a time. Try your request after the current operation is complete. |
HTTP Error 403: The service account does not have the required permissions for the bucket. | Ensure that the bucket exists and the service account for the Cloud SQL instance (which is performing the export) has the Storage Object Creator role (roles/storage.objectCreator) to allow export to the bucket. See IAM roles for Cloud Storage. |
CSV export worked but SQL export failed. | CSV and SQL formats export differently. The SQL format exports the entire database, and likely takes longer to complete. The CSV format lets you define which elements of the database to include in the export. Use CSV exports to export only what you need. |
Export is taking too long. | Cloud SQL doesn't support concurrent synchronous operations. Use export offloading. At a high level, in export offloading, instead of issuing an export on the source instance, Cloud SQL spins up an offload instance to perform the export. Export offloading has several advantages, including increased performance on the source instance and the unblocking of administrative operations while the export is running. With export offloading, total latency can increase by the amount of time it takes to bring up the offload instance. Generally, for reasonably sized exports, latency isn't significant. However, if your export is small enough, then you may notice the increase in latency. |
You want exports to be automated. | Cloud SQL doesn't provide a way to automate exports. You could build your own automated export system using Google Cloud products such as Cloud Scheduler, Pub/Sub, and Cloud Functions, similar to this article on automating backups. |
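For example, a minimal sketch of an offloaded SQL export; the instance, bucket, and database names are placeholders:

```bash
# Export with --offload so a temporary offload instance does the work
# instead of the source instance.
gcloud sql export sql INSTANCE_NAME gs://BUCKET_NAME/sqldumpfile.gz \
  --database=DATABASE_NAME \
  --offload
```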
External primary
Lost connection to MySQL server during query when dumping table.
Make sure the external primary is available to connect. You can also modify the values of the net_read_timeout and net_write_timeout flags on the source instance to stop the error. For more information on the allowable values for these flags, see Configure database flags.
To learn more about using mysqldump flags for managed import migration, see Allowed and default initial sync flags.
Make sure the replication flags such as binlog-do-db, binlog-ignore-db, replicate-do-db, or replicate-ignore-db are not set in a conflicting way. Run the command SHOW MASTER STATUS on the primary instance to see the current settings.
- Check the replication metrics for your replica instance in the Cloud Monitoring section of the Google Cloud console.
- The errors from the MySQL IO thread or SQL thread can be found in Cloud Logging in the mysql.err log files.
- The error can also be found when connecting to the replica instance. Run the command SHOW SLAVE STATUS, and check for the following fields in the output:
  - Slave_IO_Running
  - Slave_SQL_Running
  - Last_IO_Error
  - Last_SQL_Error
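A minimal sketch of the replica-side check, assuming the mysql client can reach the replica; REPLICA_IP and DB_USER are placeholders:

```bash
# Show only the replication fields that matter for diagnosing errors.
mysql --host="$REPLICA_IP" --user="$DB_USER" --password \
  -e "SHOW SLAVE STATUS\G" \
  | grep -E 'Slave_IO_Running|Slave_SQL_Running|Last_IO_Error|Last_SQL_Error'
```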
mysqld check failed: data disk is full.
Increase the disk size of the replica instance. You can either manually increase the disk size or enable auto storage increase.
External replica
The slave is connecting ... master has purged binary logs containing GTIDs that the slave requires.
Create a new dump file using the correct flag settings, and configure the external replica using that file:
- Connect to your mysql client through a Compute Engine instance.
- Run mysqldump and use the --master-data=1 and --flush-privileges flags. Important: Do not include the --set-gtid-purged=OFF flag.
- Ensure that the dump file just created contains the SET @@GLOBAL.GTID_PURGED='...' line.
- Upload the dump file to a Cloud Storage bucket and configure the replica using the dump file.
Flags
Issue | Troubleshooting |
---|---|
After enabling a flag the instance loops between panicking and crashing. | Contact customer support to request flag removal followed by a hard drain. This forces the instance to restart on a different host with a fresh configuration without the undesired flag or setting. |
You see the error message Bad syntax for dict arg when trying to set a flag. | Complex parameter values, such as comma-separated lists, require special treatment when used with gcloud commands. |
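For example, a minimal sketch using gcloud's alternate delimiter syntax (see gcloud topic escaping) to pass a flag value that itself contains commas; the instance name and flag values are placeholders:

```bash
# ^:^ changes the list delimiter to ':', so the commas inside sql_mode
# stay part of the value instead of splitting the flag list.
gcloud sql instances patch INSTANCE_NAME \
  --database-flags=^:^sql_mode=STRICT_TRANS_TABLES,NO_ENGINE_SUBSTITUTION:general_log=on
```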
High availability
Issue | Troubleshooting |
---|---|
You can't find the metrics for a manual failover. | Only automatic failovers go into the metrics. |
Cloud SQL instance resources (CPU and RAM) are near 100% usage, causing the high availability instance to go down. | The instance machine size is too small for the load. Edit the instance to upgrade to a larger machine size to get more CPUs and memory. |
Import
HTTP Error 409: Operation failed because another operation was already in progress.
Close unused operations. Check the CPU and memory usage of your Cloud SQL instance to make sure there are plenty of resources available. The best way to ensure maximum resources for the import is to restart the instance before beginning the operation.
A restart:
- Closes all connections.
- Ends any tasks that may be consuming resources.
Create the database users before importing.
Things to try:
Add the following line at the start of the dump file:
SET FOREIGN_KEY_CHECKS=0;
Additionally, add this line at the end of the dump file:
SET FOREIGN_KEY_CHECKS=1;
These settings deactivate data integrity checks while the import operation is in progress, and reactivate them after the data is loaded. This doesn't affect the integrity of the data on the database, because the data was already validated during the creation of the dump file.
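A minimal sketch of wrapping an existing dump file with those two statements; dump.sql is a placeholder for your dump file:

```bash
# Prepend and append the integrity-check toggles around the original dump.
{
  echo "SET FOREIGN_KEY_CHECKS=0;"
  cat dump.sql
  echo "SET FOREIGN_KEY_CHECKS=1;"
} > dump-wrapped.sql
```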
Logging
Issue | Troubleshooting |
---|---|
Audit logs are not found. | Data-Access logs are only written if the operation is an authenticated user-driven API call that creates, modifies, or reads user-created data, or if the operation accesses configuration files or metadata of resources. |
Operations information is not found in logs. | You want to find more information about an operation. For example, a user was deleted but you can't find out who did it. The logs show the operation started but don't provide any more information. You must enable audit logging for detailed and personal identifying information (PII) like this to be logged. |
Logging is using a lot of disk space. | There are three kinds of log files that use disk space: redo logs, general logs, and binary logs. Connect to the database and run these commands for details on each type: SHOW VARIABLES LIKE 'innodb_log_file%'; SELECT ROUND(SUM(LENGTH(argument)/POW(1024,2)),2) AS GB FROM mysql.general_log; SHOW BINARY LOGS; |
Log files are hard to read. | You'd rather view the logs as JSON or text. You can use the gcloud logging read command along with Linux post-processing commands to download the logs; see the commands after this table. |

To download the logs as JSON:
gcloud logging read \
  "resource.type=cloudsql_database AND logName=projects/PROJECT_ID/logs/cloudsql.googleapis.com%2FLOG_NAME" \
  --format=json \
  --project=PROJECT_ID \
  --freshness="1d" \
  > downloaded-log.json

To download the logs as TEXT:
gcloud logging read \
  "resource.type=cloudsql_database AND logName=projects/PROJECT_ID/logs/cloudsql.googleapis.com%2FLOG_NAME" \
  --format=json \
  --project=PROJECT_ID \
  --freshness="1d" \
  --order=asc \
  | jq -rnc --stream 'fromstream(1|truncate_stream(inputs)) | .textPayload' \
  > downloaded-log.txt
Manage instances
A large general_log may have accumulated. You can reduce crash recovery time by preventing a large general_log from accumulating. If you have general_log on, truncate the table and only enable general_log for short periods of time. You can find out the size of the general logs by connecting to the database and running this query:
SELECT ROUND(SUM(LENGTH(argument)/POW(1024,2)),2) from mysql.general_log;
Things to try:
- You can check the storage occupied by binary logs using the following command in the MySQL command line interface: SHOW BINARY LOGS;
- Temporary tables may also be occupying a significant amount of storage space. To check the temporary space usage, use the following command: SELECT * FROM INFORMATION_SCHEMA.FILES WHERE TABLESPACE_NAME='innodb_temporary'\G
- The following command lets you check the redo log size: SHOW VARIABLES LIKE 'innodb_log_file%';
- You can check the size of general_log, if it is enabled, with this command: SELECT ROUND(SUM(LENGTH(argument)/POW(1024,2)),2) AS GB from mysql.general_log;
- If needed, you can truncate your log tables by using the API. For more information, see the instances.truncateLog reference page.
- Learn more about setting and configuring slow-query logs.
Connect to the database and execute this query:
SHOW PROCESSLIST;
The first item in the list may be the one holding the lock, which the subsequent items are waiting on.
The SHOW ENGINE INNODB STATUS query can also be helpful.
ibtmp1 is used for storing temporary data. This file is reset upon database restart. To find information about temporary file usage, connect to the database and execute the following query:
SELECT * FROM INFORMATION_SCHEMA.FILES WHERE TABLESPACE_NAME='innodb_temporary'\G
Connect to the database and execute the following query:
SELECT TABLE_SCHEMA, TABLE_NAME, sum(DATA_LENGTH+INDEX_LENGTH)/pow(1024,2)
FROM INFORMATION_SCHEMA.TABLES
WHERE TABLE_SCHEMA NOT IN ('PERFORMANCE_SCHEMA','INFORMATION_SCHEMA','SYS','MYSQL')
GROUP BY TABLE_SCHEMA, TABLE_NAME;
InnoDB: page_cleaner: 1000ms intended loop took 5215ms. The settings might not be optimal.
Shard the instance if possible. Using many smaller Cloud SQL instances is better than one large instance.
Restarting the instance deletes the temporary files but doesn't reduce the storage. Only customer support can reset the instance size.
Look in the logs around the time of the deletion and see if there's a rogue script running from a dashboard or another automated process.
ERROR: (gcloud.sql.instances.delete) HTTP Error 409: The instance or operation is not in an appropriate state to handle the request, or the instance may have an INSTANCE_RISKY_FLAG_CONFIG flag status. Some possible explanations include:
- Another operation is in progress. Cloud SQL operations do not run concurrently. Wait for the other operation to complete.
- The INSTANCE_RISKY_FLAG_CONFIG warning is triggered whenever at least one beta flag is being used. Remove the risky flag settings and restart the instance.
Unfortunately, you can't shrink the ibtmp1 file by any method other than restarting the service.
One mitigation option is to create the temporary table with ROW_FORMAT=COMPRESSED, so it is stored in file-per-table tablespaces in the temporary file directory. However, the downside is performance costs associated with creating and removing a file-per-table tablespace for each temporary table.
If your instance runs out of storage, and the automatic storage increase capability isn't enabled, your instance goes offline. To avoid this issue, you can edit the instance to enable automatic storage increase.
Keeping connections shorter than 60 seconds avoids most unclean shutdowns, including connections from the database command prompt. If you keep these connections open for hours or days, shutdowns can be unclean.
Find out which objects are dependent on the user, then drop or reassign those objects to a different user. This article discusses how to find the objects owned by the user.
Refer to the general performance tips in particular.
For slow database inserts, updates, or deletes, consider the following actions:
- If you enable the long_query_time flag, you can check the logs for slow queries. Go to the Logs Explorer page for your project and run a query like this:
  resource.type="cloudsql_database"
  resource.labels.database_id="INSTANCE-ID"
  log_name="projects/PROJECT-ID/logs/cloudsql.googleapis.com%2Fmysql-slow.log"
  You can download the logs in JSON or TEXT format for local processing.
- Check the locations of the writer and database; sending data a long distance introduces latency.
- Check the location of the reader and database; latency affects read performance even more than write performance.
To reduce latency, locate both the source and destination resources in the same region.
Out of memory, but the Google Cloud console or Cloud Monitoring charts seem to show there's still memory remaining.
There are other factors beside your workload that can impact memory usage, such as the number of active connections and internal overhead processes. These aren't always reflected in the monitoring charts.
Ensure that the instance has enough memory to account for your workload plus some additional overhead.
To preserve your data, export it to Cloud Storage before you delete an instance .
The Cloud SQL Admin role includes the permission to delete the instance. To prevent accidental deletion, grant this role only as needed.
You can't rename an existing instance directly, but you can accomplish the same goal by creating a new instance.
- You can clone the instance you want to rename and set a new name for the cloned instance. This allows you to create the new instance without having to import data manually. Just as when creating a new instance, the cloned instance has a new IP address.
- You can export data from your instance into a Cloud Storage bucket, create a new instance with the new name you want, and then import the data into the new instance.
In both cases, you can delete your old instance after the operation is done. We recommend the cloning route because it has no impact on performance and doesn't require you to redo any instance configuration settings such as flags, machine type, storage size, and memory.
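A minimal sketch of the cloning route; the old and new instance names are placeholders:

```bash
# The clone is created under the new name; delete the old instance once
# you've verified the clone.
gcloud sql instances clone OLD_INSTANCE_NAME NEW_INSTANCE_NAME
```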
Private Service Connect
- Check the endpoint's status.

gcloud

To check the status, use the gcloud compute forwarding-rules describe command:
gcloud compute forwarding-rules describe ENDPOINT_NAME \
  --project=PROJECT_ID \
  --region=REGION_NAME \
  | grep pscConnectionStatus

Make the following replacements:
- ENDPOINT_NAME: the name of the endpoint
- PROJECT_ID: the ID or project number of the Google Cloud project that contains the endpoint
- REGION_NAME: the region name for the endpoint
REST
Before using any of the request data, make the following replacements:
- PROJECT_ID : the ID or project number of the Google Cloud project that contains the Private Service Connect endpoint
- REGION_NAME : the name of the region
- ENDPOINT_NAME : the name of the endpoint
HTTP method and URL:
GET https://compute.googleapis.com/compute/v1/projects/PROJECT_ID/regions/REGION_NAME/forwardingRules/ENDPOINT_NAME
You should receive a JSON response similar to the following:
{
  "kind": "compute#forwardingRule",
  "id": "ENDPOINT_ID",
  "creationTimestamp": "2024-05-09T12:03:21.383-07:00",
  "name": "ENDPOINT_NAME",
  "region": "https://www.googleapis.com/compute/v1/projects/PROJECT_ID/regions/REGION_NAME",
  "IPAddress": "IP_ADDRESS",
  "target": "https://www.googleapis.com/compute/v1/projects/PROJECT_ID/regions/REGION_NAME/serviceAttachments/SERVICE_ATTACHMENT_NAME",
  "selfLink": "https://www.googleapis.com/compute/v1/projects/PROJECT_ID/regions/REGION_NAME/forwardingRules/ENDPOINT_NAME",
  "network": "https://www.googleapis.com/compute/v1/projects/PROJECT_ID/global/networks/default",
  "serviceDirectoryRegistrations": [
    {
      "namespace": "goog-psc-default"
    }
  ],
  "networkTier": "PREMIUM",
  "labelFingerprint": "LABEL_FINGERPRINT_ID",
  "fingerprint": "FINGERPRINT_ID",
  "pscConnectionId": "CONNECTION_ID",
  "pscConnectionStatus": "ACCEPTED",
  "allowPscGlobalAccess": true
}
- Verify that the status of the endpoint is ACCEPTED. If the status is PENDING, then the instance isn't allowing the Google Cloud project that contains the endpoint. Make sure that the network project in which the endpoint is created is allowed. For more information, see Edit an instance with Private Service Connect enabled.
Replication
First, check that the value of the max_connections flag is greater than or equal to the value on the primary.
If the max_connections flag is set appropriately, inspect the logs in Cloud Logging to find the actual error.
If the error is: set Service Networking service account as servicenetworking.serviceAgent role on consumer project, then disable and re-enable the Service Networking API. This action creates the service account necessary to continue with the process.
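A minimal sketch of cycling the API with gcloud; PROJECT_ID is a placeholder, and disabling may require --force if other services depend on it:

```bash
# Disable and re-enable the Service Networking API to recreate its
# service account.
gcloud services disable servicenetworking.googleapis.com --project=PROJECT_ID
gcloud services enable servicenetworking.googleapis.com --project=PROJECT_ID
```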
Restart the replica instance to reclaim the temporary memory space.
Edit the instance to enable automatic storage increase.
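For example, a minimal sketch of enabling automatic storage increase with gcloud; the instance name is a placeholder:

```bash
# Let Cloud SQL grow the disk automatically when it nears capacity.
gcloud sql instances patch INSTANCE_NAME --storage-auto-increase
```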
- Slow queries on the replica. Find and fix them.
- All tables must have a unique/primary key. Every update on a table without a unique/primary key causes full table scans on the replica.
- Queries like DELETE ... WHERE field < 50000000 cause replication lag with row-based replication since a huge number of updates are piled up on the replica.
Some possible solutions include:
- Configure parallel replication.
- Set the innodb_flush_log_at_trx_commit flag on the read replica to 2. See Tips for working with flags for more information about this flag.
- Edit the instance to increase the size of the replica.
- Reduce the load on the database.
- Send read traffic to the read replica.
- Index the tables.
- Identify and fix slow write queries.
- Recreate the replica.
To avoid a long transaction, some possible solutions include:
- Break the transaction into multiple small transactions.
- Chunk a single large write query into smaller batches.
- Try to separate long SELECT queries from a transaction mixed with DMLs.
On the primary instance that's displaying the error message, set the parallel replication flags in the following order (a gcloud sketch follows this list):
- Modify the binlog_transaction_dependency_tracking and transaction_write_set_extraction flags:
  - binlog_transaction_dependency_tracking=COMMIT_ORDER
  - transaction_write_set_extraction=OFF
- Add the slave_pending_jobs_size_max flag: slave_pending_jobs_size_max=33554432
- Modify the transaction_write_set_extraction flag: transaction_write_set_extraction=XXHASH64
- Modify the binlog_transaction_dependency_tracking flag: binlog_transaction_dependency_tracking=WRITESET
Recreate the replica after stopping all running queries.