Conduct historical analysis with Cloud Logging


When a Pod fails or a service doesn't work as expected in Google Kubernetes Engine (GKE), understanding the sequence of events leading up to the issue is critical. Inspecting the current state isn't always enough to find the root cause, making historical log data invaluable.

Use this page to learn how to use Cloud Logging to investigate past failures (such as why a Pod failed to start or who deleted a critical Deployment) by querying and analyzing GKE logs.

This information is important for Platform admins and operators who need to perform root cause analysis on cluster-wide issues, audit changes, and understand system behavior trends. It's also essential for Application developers who need to debug application-specific errors, trace request paths, and understand how their code behaves in the GKE environment over time. For more information about the common roles and example tasks that we reference in Google Cloud content, see Common GKE user roles and tasks.

Understand key log types for troubleshooting

To help you troubleshoot, Cloud Logging automatically collects and aggregates several key log types from your GKE clusters, containerized apps, and other Google Cloud services; example queries for each type follow the list:

  • Node and runtime logs ( kubelet , containerd ): the logs from the underlying node services. Because the kubelet manages the lifecycle of all Pods on the node, its logs are essential for troubleshooting issues like container startups, Out of Memory (OOM) events, probe failures, and volume mount errors. These logs are also crucial for diagnosing node-level problems, such as a node that has a NotReady status.

    Because containerd manages the lifecycle of your containers, including pulling images, its logs are crucial for troubleshooting issues that happen before the kubelet can start the container. The containerd logs help you diagnose node-level problems in GKE, because they document the specific activities and potential errors of the container runtime.

  • App logs ( stdout , stderr ): the standard output and error streams from your containerized processes. These logs are essential for debugging app-specific issues like crashes, errors, or unexpected behavior.

  • Audit logs: these logs answer "who did what, where, and when?" for your cluster. They track administrative actions and API calls made to the Kubernetes API server, which is useful for diagnosing issues caused by configuration changes or unauthorized access.
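The following Logging query language snippets show one way to target each of these log types in Logs Explorer. The kubelet and container-runtime log names are assumptions based on common GKE node defaults and might differ in your environment, so verify them against the log entries in your project. Replace NODE_NAME, NAMESPACE_NAME, and CLUSTER_NAME with your own values.

Node and runtime logs from a specific node:

    resource.type="k8s_node"
    resource.labels.node_name="NODE_NAME"
    (log_id("kubelet") OR log_id("container-runtime"))

App logs from containers in a namespace:

    resource.type="k8s_container"
    resource.labels.namespace_name="NAMESPACE_NAME"
    (log_id("stdout") OR log_id("stderr"))

Admin Activity audit logs for a cluster:

    resource.type="k8s_cluster"
    resource.labels.cluster_name="CLUSTER_NAME"
    log_id("cloudaudit.googleapis.com/activity")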

Common troubleshooting scenarios

After you identify an issue, you can query these logs to find out what happened. To get you started, reviewing logs can help with issues like the following (an example audit log query follows the list):

  • If a node has a NotReady status, review its node logs. The kubelet and containerd logs often reveal the underlying cause, such as network problems or resource constraints.
  • If a new node fails to provision and join the cluster, review the node's serial port logs. These logs capture early boot and kubelet startup activity before the node's logging agents are fully active.
  • If a Pod failed to start in the past, review the app logs for that Pod to check for crashes. If the logs are empty or the Pod can't be scheduled, check the audit logs for relevant events or the node logs on the target node for clues about resource pressure or image pull errors.
  • If a critical Deployment was deleted and no one knows why, query the Admin Activity audit logs. These logs can help you identify which user or service account issued the delete API call, providing a clear starting point for your investigation.
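To investigate the deleted Deployment scenario, you could start with an Admin Activity audit log query similar to the following sketch in Logs Explorer. The methodName value is an assumption about how the Kubernetes API server records Deployment deletions in GKE audit logs; confirm the exact value by inspecting a few audit log entries in your project. Replace CLUSTER_NAME and DEPLOYMENT_NAME with your own values.

    resource.type="k8s_cluster"
    resource.labels.cluster_name="CLUSTER_NAME"
    log_id("cloudaudit.googleapis.com/activity")
    protoPayload.methodName="io.k8s.apps.v1.deployments.delete"
    protoPayload.resourceName=~"deployments/DEPLOYMENT_NAME"

In matching entries, the protoPayload.authenticationInfo.principalEmail field shows which user or service account issued the delete call.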

How to access logs

Use Logs Explorer to query, view, and analyze GKE logs in the Google Cloud console. Logs Explorer provides powerful filtering options that help you isolate your issue.

To access and use Logs Explorer, complete the following steps:

  1. In the Google Cloud console, go to the Logs Explorer page.

    Go to Logs Explorer

  2. In the query pane, enter a query. Use the Logging query language to write targeted queries. Here are some common filters to get you started:

    Filter type | Description | Example value
    --- | --- | ---
    resource.type | The type of Kubernetes resource. | k8s_cluster, k8s_node, k8s_pod, k8s_container
    log_id | The log stream from the resource. | stdout, stderr
    resource.labels.RESOURCE_TYPE.name | Filter for resources with a specific name. Replace RESOURCE_TYPE with the name of the resource that you want to query, for example, namespace or pod. | example-namespace-name, example-pod-name
    severity | The log severity level. | DEFAULT, INFO, WARNING, ERROR, CRITICAL
    jsonPayload.message=~ | A regular expression search for text within the log message. | scale.down.error.failed.to.delete.node.min.size.reached
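    You can also combine these filters. The following sketch searches a namespace's container logs for messages that match a regular expression; it assumes that your app writes structured JSON logs with a message field (for plain-text logs, filter on textPayload instead). Replace NAMESPACE_NAME with the namespace that you want to search.

      resource.type="k8s_container"
      resource.labels.namespace_name="NAMESPACE_NAME"
      jsonPayload.message=~"connection refused|timed out"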

    For example, to troubleshoot a specific Pod, you might want to isolate its error logs. To see only logs with an ERROR severity for that Pod, use the following query:

      resource.type="k8s_container"
      resource.labels.pod_name="POD_NAME"
      resource.labels.namespace_name="NAMESPACE_NAME"
      severity=ERROR

    Replace the following:

    • POD_NAME : the name of the Pod experiencing issues.
    • NAMESPACE_NAME : the namespace that the Pod is in. If you're not sure what the namespace is, review the Namespace column from the output of the kubectl get pods command.

    For more examples, see Kubernetes-related queries in the Google Cloud Observability documentation.

  3. Click Run query.

  4. To see the full log message, including the JSON payload, metadata, and timestamp, click the log entry.

For more information about GKE logs, see About GKE logs.

What's next
