This document shows you how to customize your Google Kubernetes Engine (GKE) node configuration by using a configuration file called a node system configuration.
A node system configuration is a configuration file that provides a way to
adjust a limited set of system settings. In your node pool, you can use a node
system configuration to specify custom settings for the kubelet
Kubernetes node agent and for sysctl
low-level Linux kernel
configurations.
This document details the available configurations for a node system configuration and how to apply them to your GKE Standard node pools. Note that because GKE Autopilot clusters have a more managed node environment, their direct node system configuration options are limited compared to GKE Standard node pools.
Why use node system configurations
Node system configurations offer the following benefits:
- Performance tuning: optimize network stack performance, memory management, CPU scheduling, or I/O behavior for demanding applications like AI training or serving, databases, high-traffic web servers, or latency-sensitive services.
- Security hardening: apply specific kernel-level security settings or restrict certain system behaviors to reduce the attack surface.
- Resource management: fine-tune how the kubelet manages PIDs, disk space, image garbage collection, or CPU and memory resources.
- Workload compatibility: help ensure that the node environment meets specific prerequisites for specialized software or older applications that require particular kernel settings.
Other options for customizing node configurations
You can also customize your node configuration by using other methods:
- Runtime configuration file: to customize the containerd container runtime on your GKE nodes, you can use a different file called a runtime configuration file. For more information, see Customize containerd configuration in GKE nodes.
- ComputeClass: you can specify node attributes in your GKE ComputeClass specification. You can use ComputeClasses in both GKE Autopilot mode and Standard mode in GKE version 1.32.1-gke.1729000 and later. For more information, see Customize the node system configuration.
- DaemonSets: you can also use DaemonSets to customize nodes. For more information, see Automatically bootstrapping GKE nodes with DaemonSets.
Node system configurations are not supported on Windows Server nodes.
Before you begin
Before you begin, make sure to do the following:
- Install command-line tools:
- If you use the gcloud CLI examples in this document, ensure that you install and configure the Google Cloud CLI.
- If you use the Terraform examples, ensure that you install and configure Terraform.
- Grant permissions: you need appropriate IAM permissions to create and update GKE clusters and node pools, such as container.clusterAdmin or a different role with equivalent permissions.
- Plan for potential workload disruption: custom node configurations are applied at the node pool level. Changes typically trigger a rolling update of the nodes in the pool, which involves re-creating the nodes. Plan for potential workload disruption and use Pod Disruption Budgets (PDBs) where appropriate.
- Back up and test all changes: always test configuration changes in a staging or development environment before you apply them to production. Incorrect settings can lead to node instability or workload failures.
- Review GKE default settings: GKE node images come with optimized default configurations. Only customize parameters if you have a specific need and understand the impact of your changes.
Use a node system configuration in GKE Standard mode
When you use a node system configuration, you use a YAML file that contains the
configuration parameters for the kubelet
and the Linux kernel. Although node
system configurations are also available in GKE
Autopilot mode, the steps in this document show you how to create and
use a configuration file for GKE Standard mode.
To use a node system configuration in GKE Standard mode, do the following:
- Create a configuration file. This file contains your kubelet and sysctl configurations.
- Add the configuration when you create a cluster, or when you create or update a node pool.
Create a configuration file
Write your node system configuration in YAML. The following example adds configurations for the kubelet and sysctl options:

```
kubeletConfig:
  cpuManagerPolicy: static
  allowedUnsafeSysctls:
  - 'kernel.shm*'
  - 'kernel.msg*'
  - 'kernel.sem'
  - 'fs.mqueue*'
  - 'net.*'
linuxConfig:
  sysctl:
    net.core.somaxconn: '2048'
    net.ipv4.tcp_rmem: '4096 87380 6291456'
```
In this example, the following applies:
- The cpuManagerPolicy: static field configures the kubelet to use the static CPU management policy.
- The net.core.somaxconn: '2048' field limits the socket listen() backlog to 2,048 pending connections.
- The net.ipv4.tcp_rmem: '4096 87380 6291456' field sets the minimum, default, and maximum value of the TCP socket receive buffer to 4,096 bytes, 87,380 bytes, and 6,291,456 bytes, respectively.
If you want to add configurations only for the kubelet
or sysctl
, include
only that section in your node system configuration. For example, to add a kubelet
configuration, create the following file:
```
kubeletConfig:
  cpuManagerPolicy: static
```
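Similarly, a configuration that sets only sysctl options omits the kubeletConfig section. For example, the following sketch reuses the net.core.somaxconn value from the earlier example:

```
linuxConfig:
  sysctl:
    net.core.somaxconn: '2048'
```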
For a complete list of the fields that you can add to your node system configuration, see the Kubelet configuration options and Sysctl configuration options sections.
Add the configuration to a Standard node pool
After you create the node system configuration, add the --system-config-from-file
flag by using the Google Cloud CLI. You can add this flag when you create a
cluster, or when you create or update a node pool. You can't add a node system
configuration by using the Google Cloud console.
Create a cluster with the node system configuration
You can add a node system configuration during cluster creation by using the gcloud CLI or Terraform. The following instructions apply the node system configuration to the default node pool:
gcloud CLI
```
gcloud container clusters create CLUSTER_NAME \
    --location=LOCATION \
    --system-config-from-file=SYSTEM_CONFIG_PATH
```
Replace the following:
- CLUSTER_NAME: the name for your cluster
- LOCATION: the compute zone or region of the cluster
- SYSTEM_CONFIG_PATH: the path to the file that contains your kubelet and sysctl configurations
After you apply a node system configuration, the default node pool of the cluster uses the settings that you defined.
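For example, assuming a configuration file named node-config.yaml in the current directory and a zonal cluster named example-cluster in us-central1-a (all hypothetical values), the command might look like the following:

```
gcloud container clusters create example-cluster \
    --location=us-central1-a \
    --system-config-from-file=node-config.yaml
```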
Terraform
To create a regional cluster with a customized node system configuration by using Terraform, refer to the following example:
For more information about using Terraform, see Terraform support for GKE.
Create a new node pool with the node system configuration
You can add a node system configuration when you use the gcloud CLI or Terraform to create a new node pool.
The following instructions apply the node system configuration to a new node pool:
gcloud CLI
```
gcloud container node-pools create POOL_NAME \
    --cluster=CLUSTER_NAME \
    --location=LOCATION \
    --system-config-from-file=SYSTEM_CONFIG_PATH
```
Replace the following:
- POOL_NAME: the name for your node pool
- CLUSTER_NAME: the name of the cluster that you want to add a node pool to
- LOCATION: the compute zone or region of the cluster
- SYSTEM_CONFIG_PATH: the path to the file that contains your kubelet and sysctl configurations
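For example, with the same hypothetical node-config.yaml file and cluster from the earlier example, a command that creates a node pool named high-perf-pool might look like the following:

```
gcloud container node-pools create high-perf-pool \
    --cluster=example-cluster \
    --location=us-central1-a \
    --system-config-from-file=node-config.yaml
```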
Terraform
To create a node pool with a customized node system configuration by using Terraform, refer to the following example:
For more information about using Terraform, see Terraform support for GKE.
Update the node system configuration of an existing node pool
You can update the node system configuration of an existing node pool by running the following command:
```
gcloud container node-pools update POOL_NAME \
    --cluster=CLUSTER_NAME \
    --location=LOCATION \
    --system-config-from-file=SYSTEM_CONFIG_PATH
```
Replace the following:
- POOL_NAME: the name of the node pool that you want to update
- CLUSTER_NAME: the name of the cluster that you want to update
- LOCATION: the compute zone or region of the cluster
- SYSTEM_CONFIG_PATH: the path to the file that contains your kubelet and sysctl configurations
This change requires re-creating the nodes, which can cause disruption to your running workloads. For more information about this specific change, find the corresponding row in the manual changes that re-create the nodes using a node upgrade strategy without respecting maintenance policies table.
For more information about node updates, see Planning for node update disruptions.
Edit a node system configuration
To edit a node system configuration, you can create a new node pool with the configuration that you want, or update the node system configuration of an existing node pool.
Edit by creating a node pool
To edit a node system configuration by creating a node pool, do the following:
- Create a configuration file with the configuration that you want.
- Add the configuration to a new node pool.
- Migrate your workloads to the new node pool.
- Delete the old node pool.
Edit by updating an existing node pool
To edit the node system configuration of an existing node pool, follow the instructions in the Update node pool tab for adding the configuration to a node pool. When you update a node system configuration, the new configuration overrides the node pool's existing system configuration, and the nodes must be re-created. If you omit any parameters during an update, those parameters are reset to their respective defaults.
If you want to reset the node system configuration back to the defaults, update
your configuration file with empty values for the kubelet
and sysctl
fields,
for example:
```
kubeletConfig: {}
linuxConfig:
  sysctl: {}
```
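You can then apply the reset file with the same update command shown earlier. For example, assuming the file is saved as reset-config.yaml and using the hypothetical cluster and node pool names from the earlier examples:

```
gcloud container node-pools update high-perf-pool \
    --cluster=example-cluster \
    --location=us-central1-a \
    --system-config-from-file=reset-config.yaml
```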
Delete a node system configuration
To remove a node system configuration, do the following steps:
- Create a node pool.
- Migrate your workloads to the new node pool.
- Delete the node pool that has the old node system configuration.
Configuration options for the kubelet
The tables in this section describe the kubelet
options that you can modify.
CPU management
The following table describes the CPU management options for the kubelet.

| kubelet config settings | Restrictions | Default setting | Description |
|---|---|---|---|
| cpuCFSQuota | Must be true or false. | true | This setting enforces the Pod's CPU limit. Setting this value to false means that the CPU limits for Pods are ignored. Ignoring CPU limits might be beneficial in certain scenarios where Pods are sensitive to CPU limits. The risk of disabling cpuCFSQuota is that a rogue Pod can consume more CPU resources than intended. |
| cpuCFSQuotaPeriod | Must be a duration of time. | "100ms" | This setting sets the CPU CFS quota period value, cpu.cfs_period_us, which specifies the period of how often a cgroup's access to CPU resources should be reallocated. This option lets you tune the CPU throttling behavior. |
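For example, using the setting names from the preceding table, a configuration that keeps CPU limit enforcement but lengthens the CFS period might look like the following sketch (the values are illustrative, not recommendations):

```
kubeletConfig:
  cpuCFSQuota: true
  cpuCFSQuotaPeriod: '200ms'
```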
Memory management and eviction
The following table describes the modifiable options for memory management and
eviction. This section also contains a separate table that describes the
modifiable options for the evictionSoft
flag.
| kubelet config settings | Restrictions | Default setting | Description |
|---|---|---|---|
| evictionSoft | Map of signal names. For value restrictions, see the following table. | none | This setting maps signal names to a quantity or percentage that defines soft eviction thresholds. A soft eviction threshold must have a grace period. The kubelet does not evict Pods until the grace period is exceeded. |
| evictionSoftGracePeriod | Map of signal names. For each signal name, the value must be a positive duration less than 5m. Valid time units are ns, us (or µs), ms, s, or m. | none | This setting maps signal names to durations that define grace periods for soft eviction thresholds. Each soft eviction threshold must have a corresponding grace period. |
| evictionMinimumReclaim | Map of signal names. For each signal name, the value must be a positive percentage less than 10%. | none | This setting maps signal names to percentages that define the minimum amount of a given resource that the kubelet reclaims when it performs a Pod eviction. |
| evictionMaxPodGracePeriodSeconds | Value must be an integer between 0 and 300. | 0 | This setting defines, in seconds, the maximum grace period for Pod termination during eviction. |
The following table shows the modifiable options for the evictionSoft
flag.
The same options also apply to the evictionSoftGracePeriod
and evictionMinimumReclaim
flags with different restrictions.
| kubelet config settings | Restrictions | Default setting | Description |
|---|---|---|---|
| memory.available | Value must be a quantity greater than 100Mi and less than 50% of the node's memory. | none | This setting represents the amount of memory available before soft eviction. Defines the amount of the memory.available signal in the kubelet. |
| nodefs.available | Value must be between 10% and 50%. | none | This setting represents the nodefs available before soft eviction. Defines the amount of the nodefs.available signal in the kubelet. |
| nodefs.inodesFree | Value must be between 5% and 50%. | none | This setting represents the nodefs inodes that are free before soft eviction. Defines the amount of the nodefs.inodesFree signal in the kubelet. |
| imagefs.available | Value must be between 15% and 50%. | none | This setting represents the imagefs available before soft eviction. Defines the amount of the imagefs.available signal in the kubelet. |
| imagefs.inodesFree | Value must be between 5% and 50%. | none | This setting represents the imagefs inodes that are free before soft eviction. Defines the amount of the imagefs.inodesFree signal in the kubelet. |
| pid.available | Value must be between 10% and 50%. | none | This setting represents the PIDs available before soft eviction. Defines the amount of the pid.available signal in the kubelet. |
| singleProcessOomKill | Value must be true or false. | true for cgroupv1 nodes, false for cgroupv2 nodes. | This setting sets whether the processes in the container are OOM killed individually or as a group. Available on GKE versions 1.32.4-gke.1132000, 1.33.0-gke.1748000 or later. |
PID management
The following table describes the modifiable options for PID management.
| kubelet config settings | Restrictions | Default setting | Description |
|---|---|---|---|
| podPidsLimit | Value must be between 1024 and 4194304. | none | This setting sets the maximum number of process IDs (PIDs) that each Pod can use. |
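For example, using the setting name from the preceding table, a configuration that caps each Pod at 4,096 PIDs might look like the following sketch (the value is illustrative):

```
kubeletConfig:
  podPidsLimit: 4096
```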
Logging
The following table describes the modifiable options for logging.
| kubelet config settings | Restrictions | Default setting | Description |
|---|---|---|---|
| containerLogMaxSize | Value must be a positive number and a unit suffix between 10Mi and 500Mi, inclusive. | 10Mi | This setting controls the containerLogMaxSize setting of the container log rotation policy, which lets you configure the maximum size for each log file. The default value is 10Mi. Valid units are Ki, Mi, and Gi. |
| containerLogMaxFiles | Value must be an integer between 2 and 10, inclusive. | 5 | This setting controls the containerLogMaxFiles setting of the container log file rotation policy, which lets you configure the maximum number of files allowed for each container. The default value is 5. The total log size (container_log_max_size * container_log_max_files) per container can't exceed 1% of the total storage of the node. |
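For example, a configuration that allows larger individual log files while keeping the default file count might look like the following sketch (the values are illustrative and remain subject to the 1% total-storage limit):

```
kubeletConfig:
  containerLogMaxSize: '50Mi'
  containerLogMaxFiles: 5
```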
Image garbage collection
The following table describes the modifiable options for image garbage collection.
| kubelet config settings | Restrictions | Default setting | Description |
|---|---|---|---|
| imageGcHighThresholdPercent | Value must be an integer between 10 and 85, inclusive, and higher than imageGcLowThresholdPercent. | 85 | This setting defines the percent of disk usage above which image garbage collection is run. It represents the highest disk usage to garbage collect to. The percentage is calculated by dividing this field's value by 100. |
| imageGcLowThresholdPercent | Value must be an integer between 10 and 85, inclusive, and lower than imageGcHighThresholdPercent. | 80 | This setting defines the percent of disk usage before which image garbage collection is never run. It represents the lowest disk usage to garbage collect to. The percentage is calculated by dividing this field's value by 100. |
| imageMinimumGcAge | Value must be a duration of time not greater than 2m. Valid time units are ns, us (or µs), ms, s, m, or h. | 2m | This setting defines the minimum age for an unused image before it is garbage collected. |
| imageMaximumGcAge | Value must be a duration of time. | 0s | This setting defines the maximum age an image can be unused before it is garbage collected. The default value of 0s disables this limit. Available on GKE versions 1.30.7-gke.1076000, 1.31.3-gke.1023000 or later. |
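For example, using the setting names from the preceding table, a configuration that triggers image garbage collection at lower disk usage than the defaults might look like the following sketch (the values are illustrative):

```
kubeletConfig:
  imageGcLowThresholdPercent: 60
  imageGcHighThresholdPercent: 70
  imageMinimumGcAge: '1m'
```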
Image pulling
The following table describes the modifiable options for image pulling.
| kubelet config settings | Restrictions | Default setting | Description |
|---|---|---|---|
| maxParallelImagePulls | Value must be an integer between 2 and 5, inclusive. | 2 or 3, based on the disk type. | This setting defines the maximum number of image pulls in parallel. The default value is determined by the boot disk type. |
Security and unsafe operations
The following table describes the modifiable options for configuring security and handling unsafe operations.
| kubelet config settings | Restrictions | Default setting | Description |
|---|---|---|---|
| allowedUnsafeSysctls | List of sysctl names or groups. The allowed sysctl groups are the following: kernel.shm*, kernel.msg*, kernel.sem, fs.mqueue.*, net.*. | none | This setting is a list of unsafe sysctl names or sysctl groups that can be set on the Pods. |
| insecureKubeletReadonlyPortEnabled | Must be true or false. | true | This setting enables or disables the kubelet read-only port 10255 on every new node pool in your cluster. If you configure this setting in this file, you can't use a GKE API client to change the setting at the cluster level. |

Resource Managers
Kubernetes offers a suite of Resource Managers. You can configure these Resource Managers to coordinate and optimize the alignment of node resources for Pods that are configured with specific requirements for CPUs, devices, and memory (hugepages) resources.
The following table describes the modifiable options for Resource Managers.
| kubelet config settings | Restrictions | Default setting | Description |
|---|---|---|---|
| cpuManagerPolicy | Value must be none or static. | none | This setting controls the kubelet CPU Manager policy. The default value is none, which is the default CPU affinity scheme, providing no affinity beyond what the OS scheduler does automatically. Setting this value to static allows Pods that are both in the Guaranteed QoS class and have integer CPU requests to be assigned exclusive CPUs. |
| memoryManager | Value must be None or Static. | None | This setting controls the kubelet Memory Manager policy. With the default value of None, Kubernetes acts the same as if the Memory Manager is not present. If you set this value to Static, the Memory Manager policy sends topology hints that depend on the type of Pod. For details, see Static policy. This setting is supported for clusters with the control plane running GKE version 1.32.3-gke.1785000 or later. |
| topologyManager | Value must be one of the supported settings for each of the respective fields. You can't set the topologyManager field when you use the Terraform instructions to add the configuration to a Standard node pool. | policy: none, scope: container | These settings control the kubelet Topology Manager configuration by using the policy and scope subfields. The Topology Manager coordinates the set of components responsible for performance optimizations related to CPU isolation, memory, and device locality. You can set the policy and scope settings independently of each other. For more information about these settings, see Topology manager scopes and policies. The following GKE resources support this setting: clusters with the control plane running GKE version 1.32.3-gke.1785000 or later (for clusters with the control plane and nodes running 1.33.0-gke.1712000 or later, the Topology Manager also receives information about GPU topology), and nodes with the following machine types: A2, A3, A4, C3, C4, C4A, G2, G4, M3, N4. |
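For example, using the setting names from the preceding table, a configuration that enables static CPU pinning and a non-default Topology Manager policy and scope might look like the following sketch (the policy and scope values are illustrative; see the Topology Manager documentation referenced above for the full set of supported values):

```
kubeletConfig:
  cpuManagerPolicy: static
  topologyManager:
    policy: best-effort
    scope: pod
```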
Sysctl configuration options
To tune the performance of your system, you can modify Linux kernel parameters. The tables in this section describe the various kernel parameters that you can configure.
Filesystem parameters (fs.*)
The following table describes the modifiable parameters for the Linux filesystem. These settings control the behavior of the Linux filesystem, such as file handle limits and event monitoring.

| Sysctl parameter | Restrictions | Description |
|---|---|---|
| fs.aio-max-nr | Must be between [65536, 4194304]. | This setting defines the maximum system-wide number of asynchronous I/O requests. |
| fs.file-max | Must be between [104857, 67108864]. | This setting defines the maximum number of file-handles that the Linux kernel can allocate. |
| fs.inotify.max_user_instances | Must be between [8192, 1048576]. | This setting defines the maximum number of inotify instances that a user can create. |
| fs.inotify.max_user_watches | Must be between [8192, 1048576]. | This setting defines the maximum number of inotify watches that a user can create. |
| fs.nr_open | Must be between [1048576, 2147483584]. | This setting defines the maximum number of file descriptors that can be opened by a process. |
Kernel parameters (kernel.*)
The following table describes the modifiable parameters for the Linux kernel. These settings configure core kernel functionalities, including shared memory allocation.

| Sysctl parameter | Description |
|---|---|
| kernel.shmmni | This setting defines the system-wide maximum number of shared memory segments. The default value is 4096. |
| kernel.perf_event_paranoid | This setting controls access to performance events by unprivileged users. This setting defaults to 2 in the kernel. |
| kernel.yama.ptrace_scope | This setting defines the scope and restrictions for the ptrace() system call, impacting process debugging and tracing. Supported values include the following: 0: classic ptrace permissions. 1: restricted ptrace, which is the default in many distributions; only child processes or processes with CAP_SYS_PTRACE. 2: admin-only ptrace; only processes with CAP_SYS_PTRACE. 3: no ptrace; ptrace calls are disallowed. |
| kernel.kptr_restrict | This setting controls whether kernel addresses are exposed through /proc and other interfaces. |
| kernel.dmesg_restrict | This setting restricts the ability of unprivileged users to use dmesg(8) to view messages from the kernel's log buffer. |
| kernel.sysrq | This setting controls the functions allowed to be invoked through the SysRq key. Possible values include the following: 0: disables sysrq completely. 1: enables all sysrq functions. >1: bitmask of allowed sysrq functions. For more information, see Linux Magic System Request Key Hacks. |
Network parameters (net.*)
The following table describes the modifiable parameters for networking. These settings tune the performance and behavior of the networking stack, from socket buffers to connection tracking.

| Sysctl parameter | Description |
|---|---|
| net.core.somaxconn | This setting defines the limit of the socket listen() backlog, which is known in userspace as SOMAXCONN. This setting defaults to 128. |
| net.ipv4.tcp_tw_reuse | This setting allows reuse of sockets in the TIME_WAIT state for new connections when it is safe from a protocol viewpoint. The default value is 0. |
| net.ipv4.tcp_mtu_probing | This setting controls TCP Packetization-Layer Path MTU Discovery. The supported values are the following: 0: disabled. 1: disabled by default, enabled when an ICMP black hole is detected. 2: always enabled; use the initial MSS of tcp_base_mss. |
| net.ipv6.conf.all.disable_ipv6 | This setting changes the conf/default/disable_ipv6 setting and also all per-interface disable_ipv6 settings to the same value. The default value is 0, which means that the setting is disabled. Available on GKE versions 1.32.0-gke.1448000 or later. |
| net.netfilter.nf_conntrack_buckets | This setting defines the size of the hash table. The recommended setting is the result of the following: nf_conntrack_max = nf_conntrack_buckets * 4. Available on GKE versions 1.32.0-gke.1448000 or later. |
| net.netfilter.nf_conntrack_tcp_timeout_close_wait | This setting defines the period, in seconds, for which TCP connections can remain in the CLOSE_WAIT state. The default value is 3600. Available on GKE versions 1.32.0-gke.1448000 or later. |
| net.netfilter.nf_conntrack_tcp_timeout_established | This setting defines the duration, in seconds, of dead connections before they are deleted automatically from the connection tracking table. Available on GKE versions 1.32.0-gke.1448000 or later. |
| net.netfilter.nf_conntrack_tcp_timeout_time_wait | This setting defines the period, in seconds, for which TCP connections can remain in the TIME_WAIT state. The default value is 120. Available on GKE versions 1.32.0-gke.1448000 or later. |
Virtual Memory parameters (vm.*)
The following table describes the modifiable parameters for the Virtual Memory subsystem. These settings manage the Virtual Memory subsystem, which controls how the kernel handles memory, swapping, and disk caching.

| Sysctl parameter | Description |
|---|---|
| vm.dirty_background_ratio | This setting defines the percentage of total memory at which the background kernel flusher threads start writeback. It is the percentage counterpart of the vm.dirty_background_bytes field and behaves like the vm.dirty_ratio field, but for background writeback. |
| vm.dirty_background_bytes | This setting defines the amount of dirty memory at which the background kernel flusher threads start writeback. Be aware that vm.dirty_background_bytes is the counterpart of vm.dirty_background_ratio. Only one of these settings can be specified. |
| vm.dirty_ratio | This setting defines the percentage of total memory at which a process that generates disk writes starts writeback itself. It is the percentage counterpart of the vm.dirty_bytes field. |
| vm.dirty_bytes | This setting defines the amount of dirty memory at which a process that generates disk writes starts writeback itself. The minimum value allowed for vm.dirty_bytes is two pages in bytes. Any value that's lower than this limit is ignored and the old configuration is retained. Be aware that vm.dirty_bytes is the counterpart of vm.dirty_ratio. Only one of these settings can be specified. |
| vm.overcommit_memory | This setting determines the kernel's strategy for handling memory overcommitment. The values are as follows: 0: reject large allocations. 1: always allow. 2: prevent commit beyond swap + ratio of RAM. |
| vm.overcommit_ratio | This setting defines the percentage of physical RAM that is considered when the vm.overcommit_memory field is set to 2. |
| vm.swappiness | This setting controls how aggressively the kernel swaps memory pages. The default value is 60. |
| vm.watermark_scale_factor | This setting controls the aggressiveness of kswapd memory reclaim. The default value is 10. |
| vm.min_free_kbytes | This setting defines the minimum amount of free memory, in kibibytes, that the kernel keeps reserved. The default value is 67584. |

For more information about the supported values for each sysctl flag, see the --system-config-from-file gcloud CLI documentation.
Different Linux namespaces
might have unique values for a given sysctl
flag, but others might be global
for the entire node. Updating sysctl
options by using a node system
configuration helps ensure that the sysctl
is applied globally on the node and
in each namespace, so that each Pod has identical sysctl
values in each Linux
namespace.
Linux cgroup mode configuration options
The container runtime and kubelet
use Linux kernel cgroups
for resource management, such
as limiting how much CPU or memory each container in a Pod can access. There are
two versions of the cgroup subsystem in the kernel: cgroupv1
and cgroupv2
.
Kubernetes support for cgroupv2
was introduced as alpha in Kubernetes version
1.18, beta in 1.22, and GA in 1.25. For more information, see the Kubernetes cgroups v2 documentation
.
Node system configuration lets you customize the cgroup configuration of your
node pools. You can use cgroupv1
or cgroupv2
. GKE uses cgroupv2
for new Standard node pools that run version 1.26 and later,
and cgroupv1
for node pools that run versions earlier than 1.26. For node
pools that were created with node auto-provisioning, the cgroup configuration
depends on the initial cluster version, not the node pool version. cgroupv1
is
not supported on Arm machines.
You can use node system configuration to change the setting for a node pool to
use cgroupv1
or cgroupv2
explicitly. Upgrading an existing node pool that
uses cgroupv1
to version 1.26 doesn't change the setting to cgroupv2
.
Existing node pools that run a version earlier than 1.26—and that don't
include a customized cgroup configuration—will continue to use cgroupv1
.
To change the setting, you must explicitly specify cgroupv2
for the existing
node pool.
For example, to configure your node pool to use cgroupv2
, use a node system
configuration file such as the following:
```
linuxConfig:
  cgroupMode: 'CGROUP_MODE_V2'
```
The supported cgroupMode
options are the following:
- CGROUP_MODE_V1: use cgroupv1 on the node pool.
- CGROUP_MODE_V2: use cgroupv2 on the node pool.
- CGROUP_MODE_UNSPECIFIED: use the default GKE cgroup configuration.
To use cgroupv2
, the following requirements and limitations apply:
- For a node pool that runs a version earlier than 1.26, you must use gcloud CLI version 408.0.0 or later. Alternatively, use gcloud beta with version 395.0.0 or later.
- Your cluster and node pools must run GKE version 1.24.2-gke.300 or later.
- You must use either the Container-Optimized OS with containerd or Ubuntu with containerd node image.
- If any of your workloads depend on reading the cgroup filesystem (/sys/fs/cgroup/...), ensure that they are compatible with the cgroupv2 API.
- If you use any monitoring or third-party tools, ensure that they are compatible with cgroupv2.
- If you use Java workloads (JDK), we recommend that you use versions which fully support cgroupv2, including JDK 8u372, JDK 11.0.16 or later, or JDK 15 or later.
Verify cgroup configuration
When you add a node system configuration, GKE must re-create the nodes to implement the changes. After you add the configuration to a node pool and the nodes are re-created, you can verify the new configuration.
You can verify the cgroup configuration for nodes in a node pool by using
gcloud CLI or the kubectl
command-line tool:
gcloud CLI
Check the cgroup configuration for a node pool:
```
gcloud container node-pools describe POOL_NAME \
    --format='value(Config.effectiveCgroupMode)'
```
Replace POOL_NAME
with the name of your node pool.
The potential output is one of the following:
- EFFECTIVE_CGROUP_MODE_V1: the nodes use cgroupv1
- EFFECTIVE_CGROUP_MODE_V2: the nodes use cgroupv2
The output shows only the new cgroup configuration after the nodes in the node pool are re-created. The output is empty for Windows server node pools, which don't support cgroup.
kubectl
To use kubectl
to verify the cgroup configuration for nodes in this node
pool, select a node and connect to it by using the following instructions:
- Create an interactive shell with any node in the node pool. In the command, replace mynode with the name of any node in the node pool.
- Identify the cgroup version on Linux nodes.
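As a generic Linux check (not specific to GKE), you can also inspect the filesystem type that is mounted at /sys/fs/cgroup from a shell on the node; cgroup2fs indicates cgroupv2 and tmpfs indicates cgroupv1:

```
stat -fc %T /sys/fs/cgroup/
```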
Linux hugepages configuration options
You can use a node system configuration file to pre-allocate hugepages. Kubernetes supports pre-allocated hugepages as a resource type, similar to CPU or memory.
To use hugepages, the following limitations and requirements apply:
- To ensure that the node is not fully occupied by hugepages, the overall size of the allocated hugepages can't exceed either of the following limits:
  - On machines with less than 30 GB of memory: 60% of the total memory. For example, on an e2-standard-2 machine with 8 GB of memory, you can't allocate more than 4.8 GB for hugepages.
  - On machines with more than 30 GB of memory: 80% of the total memory. For example, on c4a-standard-8 machines with 32 GB of memory, hugepages can't exceed 25.6 GB.
- 1 GB hugepages are only available on A3, C2D, C3, C3D, C4, C4A, C4D, CT5E, CT5LP, CT6E, H3, M2, M3, M4, or Z3 machine types.
The following table describes the modifiable settings for Linux hugepages.
| Config parameter | Restrictions | Default value | Description |
|---|---|---|---|
| hugepage_size2m | Integer count. Subject to the previously described memory allocation limits. | 0 | This setting pre-allocates a specific number of 2 MB hugepages. |
| hugepage_size1g | Integer count. Subject to both of the previously described memory and machine type limitations. | 0 | This setting pre-allocates a specific number of 1 GB hugepages. |
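For example, assuming that the hugepage settings nest under a hugepageConfig block in the linuxConfig section (an assumption about the file layout, not stated in the table above), a configuration might look like the following sketch. Omit the hugepage_size1g line on machine types that don't support 1 GB hugepages:

```
linuxConfig:
  hugepageConfig:
    hugepage_size2m: 1024
    hugepage_size1g: 2
```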
Transparent hugepages (THP)
You can use a node system configuration file to enable the Linux kernel's Transparent HugePage Support . With THP, the kernel automatically assigns hugepages to processes without manual pre-allocation.
The following table describes the modifiable parameters for THP.
| Config parameter | Supported values | Default value |
|---|---|---|
| transparentHugepageEnabled | TRANSPARENT_HUGEPAGE_ENABLED_ALWAYS: transparent hugepage is enabled system wide. TRANSPARENT_HUGEPAGE_ENABLED_MADVISE: transparent hugepage is enabled inside MADV_HUGEPAGE regions; this is the default kernel configuration. TRANSPARENT_HUGEPAGE_ENABLED_NEVER: transparent hugepage is disabled. TRANSPARENT_HUGEPAGE_ENABLED_UNSPECIFIED: the default value; GKE does not modify the kernel configuration. | UNSPECIFIED |
| transparentHugepageDefrag | TRANSPARENT_HUGEPAGE_DEFRAG_ALWAYS: an application requesting THP stalls on allocation failure and directly reclaims pages and compacts memory in an effort to allocate a THP immediately. TRANSPARENT_HUGEPAGE_DEFRAG_DEFER: an application wakes kswapd in the background to reclaim pages and wakes kcompactd to compact memory so that THP is available in the near future; it is the responsibility of khugepaged to then install the THP pages later. TRANSPARENT_HUGEPAGE_DEFRAG_DEFER_WITH_MADVISE: an application enters direct reclaim and compaction like usual, but only for regions that have used madvise(MADV_HUGEPAGE); all other regions wake kswapd in the background to reclaim pages and wake kcompactd to compact memory so that THP is available in the near future. TRANSPARENT_HUGEPAGE_DEFRAG_MADVISE: an application enters direct reclaim and compaction like usual, but only for regions that have used madvise(MADV_HUGEPAGE). TRANSPARENT_HUGEPAGE_DEFRAG_NEVER: an application never enters direct reclaim or compaction. TRANSPARENT_HUGEPAGE_DEFRAG_UNSPECIFIED: the default value; GKE does not modify the kernel configuration. | UNSPECIFIED |
THP is available on GKE version 1.33.2-gke.4655000 or later. It is also enabled by default on new TPU node pools that run GKE version 1.33.2-gke.4655000 or later. THP isn't enabled automatically when you upgrade existing node pools to a supported version.
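For example, assuming that these parameters belong in the linuxConfig section alongside the other kernel options (an assumption, because the table above doesn't show the nesting), a configuration that enables THP system wide might look like the following sketch:

```
linuxConfig:
  transparentHugepageEnabled: TRANSPARENT_HUGEPAGE_ENABLED_ALWAYS
  transparentHugepageDefrag: TRANSPARENT_HUGEPAGE_DEFRAG_DEFER
```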

