This document explains how to use the host maintenance features that are available from the Cluster Director suite. It explains how to monitor, plan for, and perform scheduled maintenance on virtual machine (VM) instances. To manage maintenance on your reserved blocks of capacity, whether or not VMs are running on them, see Manage host events across reservations instead.
When you proactively manage upcoming host maintenance events on your VMs, you can minimize disruptions and maintain optimal performance.
Before you begin
Select the tab for how you plan to use the samples on this page:
Console
When you use the Google Cloud console to access Google Cloud services and APIs, you don't need to set up authentication.
gcloud
In the Google Cloud console, activate Cloud Shell.
At the bottom of the Google Cloud console, a Cloud Shell session starts and displays a command-line prompt. Cloud Shell is a shell environment with the Google Cloud CLI already installed and with values already set for your current project. It can take a few seconds for the session to initialize.
REST
To use the REST API samples on this page in a local development environment, you use the credentials you provide to the gcloud CLI.
Install the Google Cloud CLI. After installation, initialize the Google Cloud CLI by running the following command:
gcloud init
If you're using an external identity provider (IdP), you must first sign in to the gcloud CLI with your federated identity.
For more information, see Authenticate for using REST in the Google Cloud authentication documentation.
Required roles
To get the permissions that you need to manage host maintenance events across VMs, ask your administrator to grant you the following IAM roles:
- Compute Admin (roles/compute.admin) on the project
- For read-only access to System Event audit logs: Logs Viewer (roles/logging.viewer) on the project
For more information about granting roles, see Manage access to projects, folders, and organizations.
These predefined roles contain the permissions required to manage host maintenance events across VMs. To see the exact permissions that are required, expand the Required permissions section:
Required permissions
The following permissions are required to manage host maintenance events across VMs:
- To view the details of a VM: compute.instances.get on the project
You might also be able to get these permissions with custom roles or other predefined roles.
Overview
To optimize the maintenance of your VMs, complete the following steps:
- Understand host maintenance. Learn about the frequency and maintenance behavior of your VMs based on their machine series. This information helps you minimize disruptions to your workloads.
- Set up notification alerts. Create log-based alerts to receive notifications when maintenance for your VMs is scheduled, started, or completed. This approach helps you proactively plan your activities and avoid unexpected downtime.
- Manage maintenance across VMs. Check whether maintenance is scheduled for your VMs. If needed, you can manually start maintenance across your VMs. This process helps you increase the resilience of your workloads to host events, prevent downtime, and maximize the availability of your applications.
Understand host maintenance
During the lifecycle of a Compute Engine instance, the host machine that your instance runs on undergoes multiple host events. A host event can include the regular maintenance of Compute Engine infrastructure, or in rare cases, a host error. Compute Engine also applies some non-disruptive lightweight upgrades for the hypervisor and network in the background.
The following table describes the host maintenance features for accelerator-optimized machine types:
Machine type | Maintenance frequency | Behavior | Advanced notification | On-demand maintenance | Simulate maintenance |
---|---|---|---|---|---|
A4 | Minimum of 90 days | Terminates with Local SSD data persistence | 90 days | Yes | No |
A3 Ultra | Minimum of 90 days | Terminates with Local SSD data persistence | 90 days | Yes | No |
Compute Engine might perform maintenance more frequently.
Set up notification alerts for VMs
You can get notified about scheduled, started, or completed maintenance events for your VMs by creating log-based alerting policies.
To create an alert for the maintenance events of your VMs, complete the following procedure. Repeat this procedure for each alert that you want to create.
- In the Google Cloud console, go to the Logs Explorer page.
  If you use the search bar to find this page, then select the result whose subheading is Logging.
- Click the Show query toggle to the on position.
- In the Query pane, build one of the following queries. These queries filter log entries to identify specific maintenance events. If you want to use multiple queries, repeat this procedure to create a unique alert for each query.
  - To receive alerts when maintenance for a VM is scheduled:
    protoPayload.methodName="compute.instances.upcomingMaintenance"
    severity>=DEFAULT
    protoPayload.status.message =~ "scheduled"
  - To receive alerts when the maintenance window for a VM has opened:
    protoPayload.methodName="compute.instances.upcomingMaintenance"
    severity>=DEFAULT
    protoPayload.status.message =~ "ongoing"
  - To receive alerts when maintenance for a VM has started:
    protoPayload.methodName="compute.instances.terminateOnHostMaintenance"
    severity>=DEFAULT
  - To receive alerts when maintenance for a VM has completed:
    protoPayload.methodName="compute.instances.upcomingMaintenance"
    severity>=DEFAULT
    protoPayload.status.message =~ "completed"
- To validate the query, click Run query. If the query is valid, then the Query results pane displays log entries that match the query.
- In the Query results toolbar, click the Actions list, and then select Create log alert. The Create logs-based alert policy pane appears.
- In the Alert details section, do the following:
  - In the Alert Policy Name field, enter a name for the policy.
  - In the Policy severity level list, select Warning (or a higher severity).
  - Click Next.
- In the Choose logs to include in the alert section, click Next.
- In the Set notification frequency and autoclose duration section, specify the following:
  - In the Time between notifications list, select how often you want to be notified.
  - In the Incident autoclose duration list, select how long Cloud Logging waits before it stops sending notifications and automatically closes the incident.
  - Click Next.
- In the Who should be notified? section, specify a notification channel for Logging to send notifications to.
- Click Save.
To view examples of maintenance event notifications in the Logs Explorer, see Examples of maintenance notifications in the Compute Engine documentation.
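If you keep the alert queries in source control or generate them for several alerts, a small helper can assemble them. The following is a minimal Python sketch, not part of any Google Cloud library: the event labels (scheduled, window_open, started, completed) and the function name are hypothetical, and the method names are the System Event audit log method names for upcoming maintenance and terminate-on-maintenance events. Each comparison goes on its own line, which Logging combines with an implicit AND.

```python
# Audit log method names that the maintenance alert queries match on.
UPCOMING = "compute.instances.upcomingMaintenance"
TERMINATE = "compute.instances.terminateOnHostMaintenance"

# Hypothetical event label -> (methodName, optional status message pattern).
MAINTENANCE_EVENTS = {
    "scheduled": (UPCOMING, "scheduled"),
    "window_open": (UPCOMING, "ongoing"),
    "started": (TERMINATE, None),
    "completed": (UPCOMING, "completed"),
}


def build_maintenance_filter(event: str) -> str:
    """Return a Logging query with one comparison per line."""
    method_name, message = MAINTENANCE_EVENTS[event]
    lines = [
        f'protoPayload.methodName="{method_name}"',
        "severity>=DEFAULT",
    ]
    if message is not None:
        # Only the upcomingMaintenance entries are narrowed by status message.
        lines.append(f'protoPayload.status.message =~ "{message}"')
    return "\n".join(lines)
```

You can paste the returned string into the Query pane of the Logs Explorer as is.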
Manage maintenance across VMs
You can view and control maintenance for your VMs by doing one or more of the following:
- To check the state and scheduled time of upcoming maintenance for your VMs, view the maintenance state of VMs.
- To immediately start maintenance on your VMs, rather than waiting for their scheduled maintenance time, manually start maintenance on VMs.
View the maintenance state of VMs
You can view the state and scheduled time of upcoming maintenance for your VMs by checking the value of the upcomingMaintenance field in the instance's metadata. If a VM doesn't contain the upcomingMaintenance field, then no host maintenance event is scheduled for the VM. For more information about the fields in upcomingMaintenance, see Maintenance status definitions in the Compute Engine documentation.
You can view the maintenance state for multiple VMs simultaneously or for individual VMs. For multiple VMs, use the Google Cloud console or REST API. For individual VMs, select any of the following options:
Console
- In the Google Cloud console, go to the VM instances page.
- In the Maintenance status column, Compute Engine displays the maintenance state of your VMs. If you don't see this column in the VM instances table, then click Column display options, select the Maintenance status checkbox, and then click OK.
gcloud
To view the maintenance state of a VM, use the gcloud compute instances describe command with the --flatten=resourceStatus.upcomingMaintenance flag:
gcloud compute instances describe VM_NAME \
    --flatten=resourceStatus.upcomingMaintenance \
    --zone=ZONE
Replace the following:
- VM_NAME: the VM name.
- ZONE: the zone where the VM exists.
The output is similar to one of the following:
- If a host maintenance event is scheduled for your VM, then the output is similar to the following:
  ---
  canReschedule: true
  latestWindowStartTime: '2024-12-01T19:00:00Z'
  machineType: 'a4-highgpu-8g'
  maintenanceStatus: 'PENDING'
  type: 'SCHEDULED'
  windowEndTime: '2024-12-01T22:00:00Z'
  windowStartTime: '2024-12-01T19:00:00Z'
- If a host maintenance event isn't scheduled for your VM, then the output is similar to the following:
  ---
  null
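To run the same check from a script, one option is to ask gcloud for the whole instance resource as JSON and read the field in code. The following Python sketch assumes that the gcloud CLI is installed and authenticated; it uses --format=json instead of --flatten so that the output is a single JSON object, and both function names are hypothetical.

```python
import json
import subprocess
from typing import Optional


def extract_upcoming_maintenance(instance: dict) -> Optional[dict]:
    """Return resourceStatus.upcomingMaintenance, or None if it is absent."""
    return (instance.get("resourceStatus") or {}).get("upcomingMaintenance")


def describe_upcoming_maintenance(vm_name: str, zone: str) -> Optional[dict]:
    """Describe one VM with gcloud and return its upcoming maintenance, if any."""
    result = subprocess.run(
        ["gcloud", "compute", "instances", "describe", vm_name,
         f"--zone={zone}", "--format=json"],
        check=True,
        capture_output=True,
        text=True,
    )
    return extract_upcoming_maintenance(json.loads(result.stdout))
```

A return value of None corresponds to the null output shown above: no host maintenance event is scheduled for the VM.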
REST
To view the maintenance state of your VMs, make one of the following GET requests. When you make a request, you must include the fields query parameter to show only the name, machine type, and upcoming maintenance for a VM. You must also include the filter query parameter to list only VMs of a specific machine type.
- To view VMs across all zones, use the instances.aggregatedList method:
  GET https://compute.googleapis.com/compute/v1/projects/PROJECT_ID/aggregated/instances?fields=items.name,items.machineType,items.upcomingMaintenance&filter=machineType%3AMACHINE_TYPE
- To view VMs in a specific zone, use the instances.list method:
  GET https://compute.googleapis.com/compute/v1/projects/PROJECT_ID/zones/ZONE/instances?fields=items.name,items.machineType,items.upcomingMaintenance&filter=machineType%3AMACHINE_TYPE
Replace the following:
- PROJECT_ID: the ID of the project where you created VMs.
- ZONE: the zone where the VMs exist.
- MACHINE_TYPE: the machine type that you want to filter the VMs by.
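If you build these request URLs in code, it helps to let a library handle the percent-encoding of the filter. The following Python sketch uses only the standard library; the function name is hypothetical, and the fields value is copied from the requests above.

```python
from typing import Optional
from urllib.parse import urlencode

COMPUTE_API = "https://compute.googleapis.com/compute/v1"
FIELDS = "items.name,items.machineType,items.upcomingMaintenance"


def build_list_url(project_id: str, machine_type: str,
                   zone: Optional[str] = None) -> str:
    """Build the instances.list URL, or instances.aggregatedList if zone is None."""
    if zone is None:
        path = f"/projects/{project_id}/aggregated/instances"
    else:
        path = f"/projects/{project_id}/zones/{zone}/instances"
    # safe="," keeps the commas in the fields list readable; the colon in the
    # filter expression is still encoded as %3A.
    query = urlencode(
        {"fields": FIELDS, "filter": f"machineType:{machine_type}"},
        safe=",",
    )
    return f"{COMPUTE_API}{path}?{query}"
```

For example, build_list_url("example-project", "a3-ultragpu-8g", zone="europe-west1-b") produces the zonal request URL with the placeholders filled in.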
If a host maintenance event is scheduled for a VM, then the VM contains the upcomingMaintenance field:
{
  "items": [
    {
      "name": "vm-01",
      "machineType": "https://www.googleapis.com/compute/v1/projects/example-project/zones/europe-west1-b/machineTypes/a3-ultragpu-8g",
      "resourceStatus": {
        "upcomingMaintenance": {
          "canReschedule": true,
          "latestWindowStartTime": "2024-12-01T19:00:00Z",
          "machineType": "a3-ultragpu-8g",
          "maintenanceStatus": "PENDING",
          "type": "SCHEDULED",
          "windowEndTime": "2024-12-01T22:00:00Z",
          "windowStartTime": "2024-12-01T19:00:00Z"
        }
      }
    },
    ...
  ]
}
Optionally, to further narrow down a list of VMs, set the filter query parameter to a different filter expression.
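After you have a response, you typically want only the VMs that report upcoming maintenance. The following Python sketch assumes a response shaped like the instances.list example in this section, where items is a list of VM objects; it doesn't handle the per-zone grouping of an aggregated response, and the function name is hypothetical.

```python
from typing import List, Tuple


def vms_with_upcoming_maintenance(response: dict) -> List[Tuple[str, str, str]]:
    """Return (name, maintenanceStatus, windowStartTime) for VMs with maintenance."""
    found = []
    for item in response.get("items", []):
        status = item.get("resourceStatus") or {}
        maintenance = status.get("upcomingMaintenance")
        if maintenance:
            found.append((
                item.get("name"),
                maintenance.get("maintenanceStatus"),
                maintenance.get("windowStartTime"),
            ))
    return found
```

VMs that don't contain the upcomingMaintenance field are skipped, which matches the rule that a missing field means no host maintenance event is scheduled.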
Metadata server
To view the maintenance state of a VM, do the following:
- If you haven't already, then connect to your Linux or Windows VM.
- Query the metadata server as follows:
curl "http://metadata.google.internal/computeMetadata/v1/instance/upcoming-maintenance?alt=json" -H "Metadata-Flavor: Google"
If a host maintenance event is scheduled for your VM, then the output is similar to the following:
"Upcoming maintenance": {
  "can_reschedule": "true",
  "latest_window_start_time": "2024-12-01T19:00:01Z",
  "machineType": "a4-highgpu-8g",
  "maintenance_status": "PENDING",
  "type": "SCHEDULED",
  "window_end_time": "2024-12-01T21:00:01Z",
  "window_start_time": "2024-12-01T19:00:01Z"
}
If a host maintenance event isn't scheduled, then the output is similar to the following:
{ }
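A workload that runs on the VM can make the same metadata server request itself, for example to checkpoint before a maintenance window opens. The following Python sketch uses only the standard library and mirrors the curl command above. The function names are hypothetical, the response is returned as raw text because its exact shape isn't assumed here, and the timestamp helper assumes the second-precision UTC format shown in the sample output.

```python
import urllib.request
from datetime import datetime, timezone

METADATA_URL = (
    "http://metadata.google.internal/computeMetadata/v1/"
    "instance/upcoming-maintenance?alt=json"
)


def build_metadata_request() -> urllib.request.Request:
    """Build the request; the metadata server requires the Metadata-Flavor header."""
    return urllib.request.Request(METADATA_URL, headers={"Metadata-Flavor": "Google"})


def fetch_upcoming_maintenance(timeout: float = 5.0) -> str:
    """Return the raw response text. This only works from inside a VM."""
    with urllib.request.urlopen(build_metadata_request(), timeout=timeout) as resp:
        return resp.read().decode("utf-8")


def parse_window_time(value: str) -> datetime:
    """Parse a timestamp such as 2024-12-01T19:00:01Z into an aware UTC datetime."""
    parsed = datetime.strptime(value, "%Y-%m-%dT%H:%M:%SZ")
    return parsed.replace(tzinfo=timezone.utc)
```

Subtracting two parsed timestamps gives the length of the maintenance window as a timedelta.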
Manually start maintenance on VMs
You can manually start maintenance for your VMs instead of waiting for the scheduled time.
Depending on the maintenance state of a VM, the following occurs:
- In the Google Cloud console, the maintenance state shows as Ready to run - will run on DATE. In the gcloud CLI or REST API, Compute Engine sets the maintenanceStatus field to PENDING.
- In the Google Cloud console, the maintenance state shows as Running. In the gcloud CLI or REST API, Compute Engine sets the maintenanceStatus field to ONGOING.
- In the Google Cloud console, the maintenance state shows as Up-to-date. In the gcloud CLI or REST API, Compute Engine sets the maintenanceStatus field to COMPLETE.
You can manually start maintenance for multiple VMs simultaneously or for individual VMs. For multiple VMs, use the Google Cloud console or, for VMs located in the same zone, the gcloud CLI. For individual VMs, select any of the following options:
Console
- In the Google Cloud console, go to the VM instances page.
- Select the rows for the VMs where you want to start maintenance.
- Click Run maintenance.
- To confirm, click Run maintenance.
gcloud
To manually start maintenance for one or more VMs within the same zone, use the gcloud compute instances perform-maintenance command:
gcloud compute instances perform-maintenance VM_NAMES \
    --zone=ZONE
Replace the following:
- VM_NAMES: a list of VM names separated by spaces; for example, vm-01 vm-02 vm-03.
- ZONE: the zone where the VMs exist.
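To start maintenance from a script, you can assemble the same command as an argument list and pass it to a subprocess. The following Python sketch assumes that the gcloud CLI is installed and authenticated; both function names are hypothetical. Building the argument list separately makes it possible to inspect the command before it runs.

```python
import subprocess
from typing import List, Sequence


def build_perform_maintenance_command(vm_names: Sequence[str], zone: str) -> List[str]:
    """Return the gcloud argument list for VMs that are all in the same zone."""
    if not vm_names:
        raise ValueError("at least one VM name is required")
    return [
        "gcloud", "compute", "instances", "perform-maintenance",
        *vm_names,
        f"--zone={zone}",
    ]


def perform_maintenance(vm_names: Sequence[str], zone: str) -> None:
    """Run the command; raises CalledProcessError if gcloud exits with an error."""
    subprocess.run(build_perform_maintenance_command(vm_names, zone), check=True)
```

Because the command takes a single --zone flag, call the helper once per zone if your VMs are spread across several zones.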
REST
To manually start maintenance for a VM, make a POST request to the instances.performMaintenance method:
POST https://compute.googleapis.com/compute/v1/projects/PROJECT_ID/zones/ZONE/instances/VM_NAME/performMaintenance
Replace the following:
- PROJECT_ID: the ID of the project where you created the VM.
- ZONE: the zone where the VM exists.
- VM_NAME: the VM name.
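The POST request above can also be sent from code. The following Python sketch uses only the standard library and takes the access token from the gcloud CLI, as described in the Before you begin section; the function names are hypothetical. It sends an empty body with an explicit JSON content type, which is an assumption of this sketch rather than a documented requirement of the method.

```python
import subprocess
import urllib.request

COMPUTE_API = "https://compute.googleapis.com/compute/v1"


def gcloud_access_token() -> str:
    """Return an access token for the account that is active in the gcloud CLI."""
    result = subprocess.run(
        ["gcloud", "auth", "print-access-token"],
        check=True,
        capture_output=True,
        text=True,
    )
    return result.stdout.strip()


def build_perform_maintenance_request(
    project_id: str, zone: str, vm_name: str, access_token: str
) -> urllib.request.Request:
    """Build the POST request for instances.performMaintenance."""
    url = (
        f"{COMPUTE_API}/projects/{project_id}/zones/{zone}"
        f"/instances/{vm_name}/performMaintenance"
    )
    return urllib.request.Request(
        url,
        data=b"",  # empty request body
        method="POST",
        headers={
            "Authorization": f"Bearer {access_token}",
            "Content-Type": "application/json",
        },
    )
```

To send the request, pass the result to urllib.request.urlopen and read the response body.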