Manage incidents for log-based alerting policies

An incident is a record of when the condition of an alerting policy is met. Typically, when Cloud Monitoring receives a log entry that matches the condition of your alerting policy, it opens an incident and sends a notification. However, incidents aren't created under the following circumstances:
  • The policy is snoozed or disabled.
  • The maximum rate of notifications would exceed the limit of 1 notification every 5 minutes for each log-based alerting policy.
  • The daily total of notifications would exceed the limit of 20 notifications a day for each log-based alerting policy.
  • Another log entry causes the same condition to be met for an open incident. In this case, Monitoring only sends another notification for the same incident.
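
The two notification limits above can be illustrated with a small simulation. This is only a sketch of the stated limits, not how Monitoring implements them: the function name is hypothetical, the "day" is assumed to be a rolling 24-hour window, and the snooze and open-incident rules are ignored.

```python
from datetime import datetime, timedelta

MIN_INTERVAL = timedelta(minutes=5)  # at most 1 notification every 5 minutes
DAILY_LIMIT = 20                     # at most 20 notifications a day
DAY = timedelta(days=1)              # assumption: rolling 24-hour window


def allowed_notifications(match_times):
    """Return the match times that stay within both limits (hypothetical model)."""
    sent = []
    for t in sorted(match_times):
        too_soon = bool(sent) and t - sent[-1] < MIN_INTERVAL
        sent_last_day = sum(1 for s in sent if t - s < DAY)
        if not too_soon and sent_last_day < DAILY_LIMIT:
            sent.append(t)
    return sent


start = datetime(2024, 1, 1, 9, 0)
matches = [start + timedelta(minutes=m) for m in (0, 2, 5, 7)]
print([t.strftime("%H:%M") for t in allowed_notifications(matches)])
# prints ['09:00', '09:05']
```

In this model, the matches at 09:02 and 09:07 are dropped because they arrive less than 5 minutes after the previous notification.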

For each incident, Monitoring creates an Incident details page that lets you manage the incident, and that reports incident information that can help you troubleshoot the failure. For example, the Incident details page shows a list of log entries that match the query of the log-based alerting policy. You can also find links to related incidents.

This document describes how you can find your incidents. It also describes how you can use the Incident details page to manage incidents for log-based alerting policies, which evaluate log entry data stored in individual logs in Cloud Logging.

Before you begin

Ensure that you have the permissions that you need:

To get the permissions that you need to view and manage incidents by using the Google Cloud console, ask your administrator to grant you the following IAM roles on your project:

For more information about granting roles, see Manage access to projects, folders, and organizations.

You might also be able to get the required permissions through custom roles or other predefined roles.

For more information about Cloud Monitoring roles, see Control access with Identity and Access Management.

Find incidents

To see a list of incidents in your Google Cloud project, do the following:

  1. In the Google Cloud console, go to the Alerting page:

    Go to Alerting

    If you use the search bar to find this page, then select the result whose subheading is Monitoring.

    • The Summary pane lists the number of open incidents.
    • The Incidents pane displays the most recent open incidents. To list the most recent incidents in the table, including those that are closed, click Show closed incidents.
  2. To view the details of a specific incident, select the incident in the list.

    The Incident details page opens. For more information about the Incident details page, see the Investigate an incident section of this page.

Find older incidents

The Incidents pane on the Alerting page shows the most recent open incidents. To locate older incidents, do one of the following:

  • To page through the entries in the Incidents table, click Newer or Older.

  • To navigate to the Incidents page, click See all incidents. From the Incidents page, you can do all the following:

    • Show closed incidents: To list all incidents in the table, click Show closed incidents.
    • Filter incidents: For information about adding filters, see Filter incidents.
    • Acknowledge or close an incident, or snooze its alerting policy. To access these options, click More options in the incident's row, and make a selection from the menu. For more information, see Manage incidents.

Filter incidents

When you enter a value on the filter bar, only incidents that match the filter are listed in the Incidents table. If you add multiple filters, then an incident is displayed only if it satisfies all the filters.

To add a filter to the table of incidents, do the following:

  1. On the Incidents page, click Filter table and then select a filter property. Filter properties include all the following:

    • State of the incident
    • Name of the alerting policy
    • When the incident was opened or closed
  2. Select a value from the secondary menu or enter a value in the filter bar.
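
The way multiple filters combine can be sketched as a logical AND over the filter properties. The incident records and field names below are hypothetical and only illustrate the matching rule, not the console's implementation.

```python
def matches_all(incident, filters):
    """Return True only if the incident satisfies every filter (logical AND)."""
    return all(incident.get(key) == value for key, value in filters.items())


# Hypothetical incident records for illustration.
incidents = [
    {"state": "open", "policy": "error-logs"},
    {"state": "closed", "policy": "error-logs"},
    {"state": "open", "policy": "audit-logs"},
]

filters = {"state": "open", "policy": "error-logs"}
visible = [i for i in incidents if matches_all(i, filters)]
print(visible)
# prints [{'state': 'open', 'policy': 'error-logs'}]
```

With both filters applied, only the first record remains. An empty set of filters matches every incident.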

Investigate an incident

The Incident details page contains information that can help you identify the cause of an incident.

Explore log entries

Explore log entries to find patterns and recurring issues related to your investigation. The Logs pane shows log entries that match the query of your log-based alerting policy.

  • To view the log entries in the Logs Explorer, click View in Logs Explorer, and then select a scoping project.
  • To view the Logs Panel in the Metrics Explorer, click Explore Data.

View supplementary information

The Labels section shows the labels and values for the monitored resource included in the log entry that caused the incident. This information can help you identify the specific monitored resource that caused the incident. For more information, see Annotate incidents with labels.

The Documentation section shows the documentation template for notifications that you provided when creating the alerting policy. This information might include a description of what the alerting policy monitors and tips for mitigation. For more information, see Annotate notifications with user-defined documentation.

If you didn't configure documentation for your alerting policy, then the Documentation section shows "No documentation is configured."

To help you discover underlying issues across your application, you can explore incidents related to other alerting policy conditions.

The Related Incidents section shows a list of incidents that match one of the following:
  • The incident was created when a condition of the same alerting policy was met.
  • The incident shares a label with the incident shown on the Incident details page.
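
The two matching rules above can be expressed as a short predicate. This is a sketch under assumptions: the record layout is hypothetical, and "shares a label" is interpreted as having at least one identical label key and value, which the text doesn't state explicitly.

```python
def is_related(current, other):
    """Same alerting policy, or at least one shared label key/value pair."""
    same_policy = current["policy"] == other["policy"]
    # Assumption: a shared label means both the key and the value match.
    shared_label = bool(current["labels"].items() & other["labels"].items())
    return same_policy or shared_label


# Hypothetical incident records for illustration.
current = {"policy": "error-logs", "labels": {"instance_id": "vm-1"}}
same_policy = {"policy": "error-logs", "labels": {"instance_id": "vm-2"}}
same_label = {"policy": "audit-logs", "labels": {"instance_id": "vm-1"}}
unrelated = {"policy": "audit-logs", "labels": {"instance_id": "vm-3"}}

print([is_related(current, o) for o in (same_policy, same_label, unrelated)])
# prints [True, True, False]
```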

Manage incidents

Incidents are in one of the following states:

  •  Open: The condition of the log-based alerting policy was met, and the incident is still open. If the same condition is met again and there is already an incident open, then a new incident isn't opened.

  •  Acknowledged: The incident is open and has manually been marked as acknowledged. Typically, this status indicates that the incident is being investigated.

  •  Closed: You have manually closed the incident, or it was automatically closed after the auto-close period expired.
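
One way to reason about these states is as a small state machine. The sketch below is illustrative only: the enum and function names are hypothetical, and it assumes that a closed incident is final and that an acknowledged incident can't return to the open state, neither of which is stated in the list above.

```python
from enum import Enum


class IncidentState(Enum):
    OPEN = "open"
    ACKNOWLEDGED = "acknowledged"
    CLOSED = "closed"


# Assumed transitions: acknowledging and closing only move forward.
ALLOWED = {
    IncidentState.OPEN: {IncidentState.ACKNOWLEDGED, IncidentState.CLOSED},
    IncidentState.ACKNOWLEDGED: {IncidentState.CLOSED},
    IncidentState.CLOSED: set(),
}


def transition(current, target):
    """Return the new state, or raise ValueError if the change isn't allowed."""
    if target not in ALLOWED[current]:
        raise ValueError(f"cannot move from {current.value} to {target.value}")
    return target


state = transition(IncidentState.OPEN, IncidentState.ACKNOWLEDGED)
state = transition(state, IncidentState.CLOSED)
print(state.value)
# prints closed
```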

Acknowledge incidents

We recommend that you mark an incident as acknowledged when you begin investigating the cause of the incident.

To mark an incident as acknowledged, do the following:

  1. In the Incidents pane of the Alerting page, click See all incidents.
  2. On the Incidents page, find the incident that you want to acknowledge, and then do one of the following:

    • Click More options and then select Acknowledge.
    • Open the details page for the incident and then click Acknowledge incident.

Snooze an alerting policy

To prevent Monitoring from creating incidents and sending notifications during a specific time period, snooze the related alerting policy. When you snooze an alerting policy, incidents related to the alerting policy remain open but don't cause further notifications. The incidents close based on the alerting policy auto-close duration.

To create a snooze for an incident that you are viewing, do the following:

  1. On the Incident details page, click Snooze Policy.

  2. Select the snooze duration. After you select the snooze duration, the snooze begins immediately.

You can also snooze an alerting policy from the Incidents page by finding the incident that you want to snooze, clicking More options, and then selecting Snooze. You can snooze alerting policies during outages to prevent further notifications during the troubleshooting process.

Close incidents

You can let Monitoring close an incident for you, or you can close the incident manually.

Monitoring automatically closes an incident when the auto-close duration for the alerting policy expires. By default, the auto-close duration is 7 days. The minimum auto-close duration is 30 minutes.
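
The default and minimum durations can be checked with a short calculation. This is a sketch, not Monitoring's logic: the function and parameter names are hypothetical, and because the text doesn't say when the auto-close timer starts, the start time is passed in explicitly.

```python
from datetime import datetime, timedelta

DEFAULT_AUTO_CLOSE = timedelta(days=7)   # default auto-close duration
MIN_AUTO_CLOSE = timedelta(minutes=30)   # minimum auto-close duration


def auto_close_time(timer_started_at, auto_close=None):
    """Return when an incident would auto-close under the stated limits."""
    duration = DEFAULT_AUTO_CLOSE if auto_close is None else auto_close
    if duration < MIN_AUTO_CLOSE:
        raise ValueError("auto-close duration must be at least 30 minutes")
    return timer_started_at + duration


started = datetime(2024, 1, 1, 9, 0)
print(auto_close_time(started))
# prints 2024-01-08 09:00:00
print(auto_close_time(started, timedelta(minutes=30)))
# prints 2024-01-01 09:30:00
```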

To close an incident, do the following:

  1. In the Incidents pane of the Alerting page, click See all incidents.
  2. On the Incidents page, find the incident that you want to close, and then do one of the following:

    • Click View more and then select Close incident.
    • Open the Incident details page for that incident and then click Close incident.

If you see the message Unable to close incident, try again in a few minutes. You can't close a new incident immediately because the conditions that caused the incident are still considered active by the alerting system.

Data retention and limits

For information about limits and about the retention period of incidents, see Limits for alerting.

What's next