This page explains how to debug node issues on Google Distributed Cloud using a suite of preinstalled
debugging tools.
Overview
Each Google Distributed Cloud cluster you create is composed of several
nodes. Each node includes a distribution of CoreOS' toolbox, a shell
script that unpacks and runs a debugging container, debug-toolbox. debug-toolbox is a container image that includes several useful debugging tools.
If you encounter issues with a specific node, you can attempt debugging by
connecting to the affected node, running the toolbox script to unpack and run the debug-toolbox container, and then running the tools included in the container.
Tools included in debug-toolbox container
The debug-toolbox container runs a Debian base image that includes the
following packages:
bash
curl
dnsutils
hping3
iperf3
lsof
netcat
mtr
procps
strace
tcpdump
traceroute
util-linux
Since these tools are included in the container, they don't require an internet
connection. If you want to install additional debugging tools, you can use apt-get, which does require an internet connection.
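The distinction between preinstalled and additional tools can be sketched as a small shell helper for use inside the container. This is a minimal sketch under stated assumptions: ensure_tool is a hypothetical helper name (not part of toolbox), and it assumes the shell in the container has permission to run apt-get.

```shell
# Sketch: run inside the debug-toolbox container.
# ensure_tool is a hypothetical helper name, not part of toolbox.
ensure_tool() {
  tool="$1"        # binary to look for on PATH, e.g. "dig"
  pkg="${2:-$1}"   # Debian package that provides it, e.g. "dnsutils"
  if command -v "$tool" >/dev/null 2>&1; then
    echo "$tool is already available"
    return 0
  fi
  # Not preinstalled: this step needs an internet connection.
  apt-get update && apt-get install -y "$pkg"
}

# Usage (commented out so that sourcing this file has no side effects):
#   ensure_tool dig dnsutils   # dnsutils is preinstalled, no network needed
#   ensure_tool nmap           # example of an additional tool
```

Checking PATH first means the helper only reaches for the network when a tool is genuinely missing.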
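Using toolbox
1. SSH into the cluster node.
2. Run the toolbox command, sudo toolbox. This command starts a debug-toolbox container.
3. While inside the container, run one of the tools, for example tcpdump.
4. When you're finished, exit the container and close the SSH connection to the node.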
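As an illustration of running a tool inside the container, the following sketch wraps a bounded tcpdump capture of DNS traffic. The function name capture_dns, the default interface and packet count, and the TCPDUMP override variable are all assumptions made for this example; only the tcpdump flags themselves (-i, -nn, -c) are standard.

```shell
# Sketch: capture a limited number of DNS packets from inside debug-toolbox.
# capture_dns and the TCPDUMP override are hypothetical, for illustration only.
capture_dns() {
  iface="${1:-any}"   # interface to listen on; "any" captures on all interfaces
  count="${2:-20}"    # stop after this many packets so the capture terminates
  # -nn skips host and port name resolution; "port 53" filters DNS traffic.
  # Set TCPDUMP=echo to print the arguments instead of capturing (dry run).
  "${TCPDUMP:-tcpdump}" -i "$iface" -nn -c "$count" port 53
}

# Usage (commented out so that sourcing this file has no side effects):
#   capture_dns          # 20 packets on any interface
#   capture_dns eth0 5   # 5 packets on eth0
```

Bounding the capture with -c keeps the session short, which makes it easier to exit the container cleanly afterwards.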
Node Problem Detector
Beginning with Google Distributed Cloud version 1.4, Node Problem
Detector,
which is enabled for all the nodes in a cluster, helps you quickly detect
some common node problems. Node Problem Detector continuously checks for possible
problems and reports them as events and conditions on the node. If a node
misbehaves, you can check whether Node Problem Detector detected the problem by
running kubectl describe on the node and looking for the corresponding events
and conditions.
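Because kubectl describe prints many sections for a node, it can help to isolate the conditions. The following is a minimal sketch, assuming the usual layout in which Conditions: is an unindented section header followed by indented rows; node_conditions is a hypothetical helper name and NODE_NAME is a placeholder.

```shell
# Sketch: print only the Conditions section of "kubectl describe node" output.
# node_conditions is a hypothetical helper name; it reads the output on stdin.
node_conditions() {
  awk '
    /^Conditions:/  { show = 1; print; next }  # section header starts the block
    /^[^[:space:]]/ { show = 0 }               # next unindented key ends it
    show                                       # print indented lines in between
  '
}

# Usage (NODE_NAME is a placeholder for the affected node):
#   kubectl describe node NODE_NAME | node_conditions
```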
The Node Problem Detector monitors generate several conditions on the node. If the
reported condition is KubeletUnhealthy or ContainerRuntimeUnhealthy,
restarting the corresponding systemd service (kubelet or Docker) might
make the node healthy again.
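A manual restart for these two conditions might look like the following sketch. The helper name service_for_condition is hypothetical, and the systemd unit names kubelet and docker are assumptions based on the service names used on this page; confirm them on your node before restarting anything.

```shell
# Sketch: map a reported node condition to the systemd service to restart.
# service_for_condition is a hypothetical helper; unit names are assumed.
service_for_condition() {
  case "$1" in
    KubeletUnhealthy)          echo kubelet ;;
    ContainerRuntimeUnhealthy) echo docker ;;
    *) echo "no known service for condition: $1" >&2; return 1 ;;
  esac
}

# Usage on the affected node (commented out to avoid side effects):
#   svc="$(service_for_condition KubeletUnhealthy)" && sudo systemctl restart "$svc"
#   systemctl status "$svc"
```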
Beginning with Google Distributed Cloud version 1.5, automatic repair of the
kubelet and Docker systemd services is enabled in Node Problem Detector. If
Node Problem Detector detects a KubeletUnhealthy or ContainerRuntimeUnhealthy condition on the node, it tries to restart the
kubelet or Docker service automatically, provided that the time since the last restart
exceeds a certain threshold.
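The restart guard described here can be illustrated with a small sketch. This is not the Node Problem Detector implementation: should_restart, its arguments, and the threshold values in the usage lines are hypothetical, and the actual threshold is not stated on this page.

```shell
# Sketch of the rule "restart only if enough time has passed since the last
# restart". Hypothetical helper; not the Node Problem Detector implementation.
should_restart() {
  last_restart="$1"   # epoch seconds of the previous restart
  now="$2"            # current epoch seconds, e.g. from: date +%s
  threshold="$3"      # minimum seconds between restarts (illustrative value)
  # Succeeds only when the elapsed time is strictly above the threshold.
  [ $(( now - last_restart )) -gt "$threshold" ]
}

# Usage (values are illustrative):
#   should_restart 1000 1700 600 && echo "restart allowed"
#   should_restart 1000 1300 600 || echo "too soon, skip restart"
```

A guard like this keeps a persistently unhealthy service from being restarted in a tight loop.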
Last updated 2025-09-04 UTC.