Investigate a cluster's state with kubectl

Autopilot Standard

Diagnosing the root cause of Google Kubernetes Engine (GKE) issues often requires inspecting the live state, configuration, and events of your Kubernetes resources in detail. To move beyond surface-level symptoms, you need tools to directly query and interact with the cluster's control plane.

Use this page to learn essential kubectl commands for investigating the live state of your cluster. Learning these commands lets you gather detailed information directly from the Kubernetes control plane, helping you understand why a problem is occurring.

This information is important for Platform admins and operators who need to perform in-depth cluster health checks, manage resources, and troubleshoot infrastructure issues at a granular level. It's also essential for Application developers who need to debug application behavior, inspect Pod logs and events, and verify the exact state of their deployments within the Kubernetes environment. For more information about the common roles and example tasks that we reference in Google Cloud content, see Common GKE user roles and tasks.

Before you begin

Before you start, perform the following tasks:

Install kubectl.
Configure the kubectl command-line tool to communicate with your cluster:
```
gcloud container clusters get-credentials CLUSTER_NAME \
    --location=LOCATION
```
Replace the following:
- CLUSTER_NAME: the name of your cluster.
- LOCATION: the Compute Engine location of the control plane of your cluster. Provide a region for regional clusters, or a zone for zonal clusters.
Review your permissions. To see if you have the required permissions to run kubectl commands, use the kubectl auth can-i command. For example, to see if you have permission to run kubectl get nodes, run the kubectl auth can-i get nodes command.

If you have the required permissions, the command returns yes; otherwise, the command returns no.

If you lack permission to run a kubectl command, you might see an error message similar to the following:
```
Error from server (Forbidden): pods "POD_NAME" is forbidden: User
"USERNAME@DOMAIN.com" cannot list resource "pods" in API group "" in the
namespace "default"
```
If you don't have the required permissions, ask your cluster administrator to assign the necessary roles to you.
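The permission check described in this section can be repeated for several verbs at once. The following is a minimal sketch, not an official tool: the function name check_pod_access and the choice of verbs are assumptions made for illustration, and the only kubectl feature that it relies on is the kubectl auth can-i command.

```shell
# Hypothetical helper (not part of kubectl): print whether the current user
# may get, list, and delete Pods in the given namespace.
check_pod_access() {
  local namespace="${1:-default}"
  local verb answer
  for verb in get list delete; do
    # kubectl auth can-i prints "yes" or "no". It exits with a non-zero
    # status when the answer is "no", so the exit status is ignored here
    # and only the printed answer is used.
    answer="$(kubectl auth can-i "$verb" pods --namespace "$namespace" 2>/dev/null || true)"
    printf '%-8s %s\n' "$verb" "${answer:-unknown}"
  done
}

# Usage: check_pod_access NAMESPACE_NAME
```

If the function prints no or unknown for an action that you need, ask your cluster administrator about the roles assigned to you.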

Get an overview of what's running

The kubectl get command gives you an overview of what's happening in your cluster. Use the following commands to see the status of two of the most important cluster components, nodes and Pods:

To check if your nodes are healthy, view details about all nodes and their statuses:

    kubectl get nodes

The output is similar to the following:

    NAME                                        STATUS   ROLES    AGE     VERSION
    gke-cs-cluster-default-pool-8b8a777f-224a   Ready    <none>   4d23h   v1.32.3-gke.1785003
    gke-cs-cluster-default-pool-8b8a777f-egb2   Ready    <none>   4d22h   v1.32.3-gke.1785003
    gke-cs-cluster-default-pool-8b8a777f-p5bn   Ready    <none>   4d22h   v1.32.3-gke.1785003

Any status other than Ready requires additional investigation.
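In a cluster with many nodes, it can help to show only the nodes that need attention. The following sketch assumes the default column layout shown in the preceding node output, where STATUS is the second column. The function name not_ready_nodes is hypothetical.

```shell
# Hypothetical filter: read `kubectl get nodes` output on stdin and print the
# name and status of every node whose STATUS column is not exactly "Ready".
# The header row (line 1) is skipped.
not_ready_nodes() {
  awk 'NR > 1 && $2 != "Ready" { print $1, $2 }'
}

# Usage: kubectl get nodes | not_ready_nodes
```

Because the comparison is exact, a cordoned node that reports Ready,SchedulingDisabled is also listed. That behavior is a design choice of this sketch.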

To check if your Pods are healthy, view details about all Pods and their statuses:
```
kubectl get pods --all-namespaces
```
The output is similar to the following:
```
NAMESPACE     NAME         READY   STATUS    RESTARTS   AGE
kube-system   netd-6nbsq   3/3     Running   0          4d23h
kube-system   netd-g7tpl   3/3     Running   0          4d23h
```
Any status other than Running requires additional investigation, with the exception of Completed, which is normal for Pods that ran to completion (for example, Pods created by a Job). Here are some common statuses that you might see:
- Running: a healthy, running state.
- Pending: the Pod is waiting to be scheduled on a node.
- CrashLoopBackOff: the containers in the Pod are repeatedly crashing in a loop because the app starts, exits with an error, and is then restarted by Kubernetes.
- ImagePullBackOff: the Pod can't pull the container image.
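To focus on Pods whose status needs a closer look, you can filter the output of the kubectl get pods command. The following sketch assumes the column layout produced by the --all-namespaces flag, where STATUS is the fourth column, and it treats Completed as healthy because finished Jobs report that status. The function name unhealthy_pods is hypothetical.

```shell
# Hypothetical filter: read `kubectl get pods --all-namespaces` output on
# stdin. Keep the header row, and keep every Pod whose STATUS column is
# neither "Running" nor "Completed".
unhealthy_pods() {
  awk 'NR == 1 || ($4 != "Running" && $4 != "Completed")'
}

# Usage: kubectl get pods --all-namespaces | unhealthy_pods
```

This filter looks only at the STATUS column, so it doesn't flag a Running Pod whose READY column shows fewer ready containers than expected.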

The preceding commands are only two examples of how you can use the kubectl get command. You can also use the command to learn more about many types of Kubernetes resources. For a full list of the resources that you can explore, see kubectl get in the Kubernetes documentation.
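As an illustration of using kubectl get with other resource types, the following sketch lists a few common kinds of resources in one namespace. The function name namespace_overview and the selection of resource types are assumptions made for this example.

```shell
# Hypothetical helper: run `kubectl get` for several resource types in one
# namespace, printing a heading before each listing.
namespace_overview() {
  local namespace="${1:-default}"
  local kind
  for kind in deployments services pods; do
    echo "== ${kind} =="
    kubectl get "$kind" --namespace "$namespace"
  done
}

# Usage: namespace_overview NAMESPACE_NAME
```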

Learn more about specific resources

After you identify a problem, such as a Pod that doesn't have a status of Running, use the kubectl describe command to get more details.

For example, to describe a specific Pod, run the following command:

    kubectl describe pod POD_NAME -n NAMESPACE_NAME

Replace the following:

- POD_NAME: the name of the Pod experiencing issues.
- NAMESPACE_NAME: the namespace that the Pod is in. If you're not sure what the namespace is, review the NAMESPACE column in the output of the kubectl get pods --all-namespaces command.

The output of the kubectl describe command includes detailed information about your resource. Here are some of the most helpful sections to review when you troubleshoot a Pod:

- Status: the current status of the Pod.
- Conditions: the overall health and readiness of the Pod.
- Restart Count: how many times the containers in the Pod have restarted. A high count suggests that a container is crashing repeatedly.
- Events: a log of important things that have happened to this Pod, like being scheduled to a node, pulling its container image, and whether any errors occurred. The Events section is often where you can find direct clues to why a Pod is failing.
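Because the Events section is often the most useful part of the output, you might want to print it on its own. The following sketch assumes that the Events section is the last section of the kubectl describe pod output and that it begins with a line that starts with Events:. The function name pod_events is hypothetical.

```shell
# Hypothetical helper: describe a Pod and print only the output from the
# "Events:" heading to the end.
pod_events() {
  local pod="$1"
  local namespace="${2:-default}"
  kubectl describe pod "$pod" --namespace "$namespace" |
    awk '/^Events:/ { show = 1 } show'
}

# Usage: pod_events POD_NAME NAMESPACE_NAME
```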

As with the kubectl get command, you can use the kubectl describe command to learn more about many types of resources. For a full list of the resources that you can explore, see kubectl describe in the Kubernetes documentation.

What's next

Read Conduct historical analysis with Cloud Logging (the next page in this series).
See these concepts applied in the example troubleshooting scenario.
For advice about resolving specific problems, review GKE's troubleshooting guides.
If you can't find a solution to your problem in the documentation, see Get support for further help, including advice on the following topics:
- Opening a support case by contacting Cloud Customer Care.
- Getting support from the community by asking questions on Stack Overflow and using the google-kubernetes-engine tag to search for similar issues. You can also join the #kubernetes-engine Slack channel for more community support.
- Opening bugs or feature requests by using the public issue tracker.