Version 1.15. This version is no longer supported. For information about how to upgrade to version 1.16, see Upgrade clusters in the latest documentation. For more information about supported and unsupported versions, see the Version history page in the latest documentation.
This page shows you how to resolve issues with the Kubernetes API server
(kube-apiserver) for Google Distributed Cloud.
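If you need additional assistance, reach out to Cloud Customer Care.
You can also see Getting support for more information about support resources, including requirements for opening a support case, tools to help you troubleshoot (such as your environment configuration, logs, and metrics), and supported components.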
If the webhook requires more time to complete, you can configure a custom timeout value.
Webhook latency adds to API request latency, so webhooks should be evaluated as
quickly as possible.
If the webhook error blocks cluster availability, or the webhook is harmless
to remove and removing it mitigates the situation, check if it's possible to
temporarily set the failurePolicy to Ignore or remove the offending webhook.
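The two webhook settings mentioned above, failurePolicy and the timeout (the timeoutSeconds field in the Kubernetes admission webhook API), can be changed with a JSON patch. The following is a minimal sketch, not a prescribed procedure: the helper name build_webhook_patch, the webhook index 0, and the configuration name example-webhook are all assumptions for illustration, and the replace operations assume both fields are already present on the object.

```shell
# Hypothetical helper: build a JSON patch that sets failurePolicy and
# timeoutSeconds on the webhook at a given index in a webhook configuration.
# Arguments: index, failurePolicy (Ignore or Fail), timeoutSeconds (1-30).
build_webhook_patch() {
  printf '[{"op":"replace","path":"/webhooks/%s/failurePolicy","value":"%s"},{"op":"replace","path":"/webhooks/%s/timeoutSeconds","value":%s}]\n' \
    "$1" "$2" "$1" "$3"
}

# Example usage (not run here; "example-webhook" is a placeholder name):
#   kubectl get validatingwebhookconfigurations
#   kubectl patch validatingwebhookconfiguration example-webhook \
#     --type=json -p "$(build_webhook_patch 0 Ignore 15)"
```

Setting failurePolicy to Ignore lets requests proceed when the webhook can't be reached, so treat it as a temporary mitigation and restore the original value after the webhook is healthy again.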
API server dial failure or latency
This error might be seen in a few different ways:
External name resolution errors: An external client might return errors
that contain lookup in the message, such as:
dial tcp: lookup kubernetes.example.com on 127.0.0.1:53: no such host
This error doesn't apply to a client running within the cluster. The
Kubernetes Service IP is injected, so no resolution is required.
Network errors: The client might print a generic network error when trying
to dial the API server, like the following examples:
dial tcp 10.96.0.1:443: connect: no route to host
dial tcp 10.96.0.1:443: connect: connection refused
dial tcp 10.96.0.1:443: connect: i/o timeout
High latency connecting to API server: The connection to the API server might
be successful, but the requests time out on the client side. In this scenario,
the client usually prints error messages containing context deadline
exceeded.
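When you triage client logs, it can help to sort error messages into the three symptom groups described above. The following sketch shows one way to do that with shell pattern matching. The function name classify_dial_error and the category labels are hypothetical, and the patterns cover only the example messages shown on this page.

```shell
# Hypothetical helper: map a client error message to one of the symptom
# groups described above. The category labels are illustrative only.
classify_dial_error() {
  case "$1" in
    # Name resolution errors contain "lookup"; check this first, because a
    # failed lookup can also end in "i/o timeout".
    *"lookup "*) echo "name-resolution" ;;
    *"no route to host"*|*"connection refused"*|*"i/o timeout"*) echo "network" ;;
    *"context deadline exceeded"*) echo "latency" ;;
    *) echo "unknown" ;;
  esac
}

# Example usage:
#   classify_dial_error "dial tcp 10.96.0.1:443: connect: connection refused"
```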
If the connection to the API server fails completely, try the connection from
within the same environment where the client reports the error. You can use
Kubernetes ephemeral containers to inject a debugging container into the
existing namespaces, as follows:
From where the problematic client runs, use kubectl to perform a request
with high verbosity. For example, a GET request to /healthz usually
requires no authentication:
kubectl get -v999 --raw /healthz
If the request fails or kubectl is unavailable, you can obtain the URL from
the output and manually perform the request with curl. For example, if the
service host obtained from the previous output was https://192.0.2.1:36917/,
you can send a similar request as follows:
# Replace "--cacert /path/to/ca.pem" with "--insecure" if you are accessing
# a local cluster and you trust that the connection cannot be tampered with.
# The output is always "ok" and thus contains no sensitive information.
curl -v --cacert /path/to/ca.pem https://192.0.2.1:36917/healthz
The output from this command usually indicates the root cause of a failed
connection.
If the connection is successful but is slow or times out, the API server is
likely overloaded. To confirm, in the console look at the API
Server Request Rate and request latency metrics in Cloud Kubernetes > Anthos >
Cluster > K8s Control Plane.
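To tell a slow connection from a failed one, you can time the same /healthz request with curl. The following is a sketch under stated assumptions: the function names probe_healthz and latency_verdict, the 10-second limit, and the 1-second threshold in the usage example are illustrative choices, not documented values.

```shell
# Hypothetical helper: time a request to /healthz and print the HTTP status,
# connect time, and total time in seconds.
# Arguments: base URL of the API server, path to the CA certificate.
probe_healthz() {
  curl -s -o /dev/null --max-time 10 --cacert "$2" \
    -w '%{http_code} %{time_connect} %{time_total}\n' "${1%/}/healthz"
}

# Hypothetical helper: print "slow" when a measured time in seconds exceeds
# a threshold, otherwise print "ok". Arguments: measured time, threshold.
latency_verdict() {
  awk -v t="$1" -v limit="$2" \
    'BEGIN { if (t + 0 > limit + 0) print "slow"; else print "ok" }'
}

# Example usage (not run here):
#   set -- $(probe_healthz https://192.0.2.1:36917/ /path/to/ca.pem)
#   latency_verdict "$3" 1
#
# To run the probe next to the failing client, one option is an ephemeral
# debugging container (POD_NAME and CONTAINER_NAME are placeholders):
#   kubectl debug -it POD_NAME --image=curlimages/curl --target=CONTAINER_NAME -- sh
```

A connect time close to the total time suggests a network-level delay, while a short connect time followed by a long total time is more consistent with a busy API server.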
To resolve these connection failures or latency problems, review the following
remediation options:
If a network error occurs within the cluster, there might be a problem with the
Container Network Interface (CNI) plugin. This problem is usually transient
and resolves itself after a Pod recreation or reschedule.
If the network error is from outside the cluster, check if the client is
properly configured to access the cluster, or generate the client
configuration again. If the connection goes through a proxy or gateway, check
if another connection that goes through the same mechanism works.
If the API server is overloaded, it usually means that many clients access the
API server at the same time. A single client can't overload an API server due
to throttling and the Priority and Fairness feature. Review the workload for the following areas:
Workloads that operate at the Pod level. It's more common to mistakenly create
and forget Pods than higher-level resources.
Workloads that adjust the number of replicas based on an erroneous calculation.
A webhook that loops back the request to itself or amplifies the load by
creating more requests than it handles.
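As a starting point for the workload review above, counting Pods per namespace can surface forgotten Pods or a runaway replica count. The following sketch assumes the default column layout of kubectl get pods with --all-namespaces, where the first column is the namespace. The function name count_pods_by_namespace is hypothetical.

```shell
# Hypothetical helper: read "kubectl get pods --all-namespaces --no-headers"
# output on stdin and print the Pod count per namespace, largest first.
count_pods_by_namespace() {
  awk '{ count[$1]++ } END { for (ns in count) print count[ns], ns }' | sort -rn
}

# Example usage (not run here):
#   kubectl get pods --all-namespaces --no-headers | count_pods_by_namespace | head
#
# To list the webhooks that could loop back or amplify requests:
#   kubectl get validatingwebhookconfigurations,mutatingwebhookconfigurations
```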
What's next
If you need additional assistance, reach out to Cloud Customer Care.
You can also see Getting support for more information about support resources.