This document shows you how to customize your Google Kubernetes Engine (GKE) node configuration by using a configuration file called a node system configuration.
A node system configuration is a configuration file that provides a way to
adjust a limited set of system settings. In your node pool, you can use a node
system configuration to specify custom settings for the kubelet
Kubernetes node agent and for sysctl
low-level Linux kernel
configurations.
This document details the available configurations for a node system configuration and how to apply them to your GKE Standard node pools. Note that because GKE Autopilot clusters have a more managed node environment, their direct node system configuration options are limited compared to GKE Standard node pools.
Why use node system configurations
Node system configurations offer the following benefits:
- Performance tuning: optimize network stack performance, memory management, CPU scheduling, or I/O behavior for demanding applications like AI training or serving, databases, high-traffic web servers, or latency-sensitive services.
- Security hardening: apply specific kernel-level security settings or restrict certain system behaviors to reduce the attack surface.
- Resource management: fine-tune how the kubelet manages PIDs, disk space, image garbage collection, or CPU and memory resources.
- Workload compatibility: help ensure that the node environment meets specific prerequisites for specialized software or older applications that require particular kernel settings.
Other options for customizing node configurations
You can also customize your node configuration by using other methods:
- Runtime configuration file: to customize the containerd container runtime on your GKE nodes, you can use a different file called a runtime configuration file. For more information, see Customize containerd configuration in GKE nodes.
- ComputeClass: you can specify node attributes in your GKE ComputeClass specification. You can use ComputeClasses in both GKE Autopilot mode and Standard mode in GKE version 1.32.1-gke.1729000 and later. For more information, see Customize the node system configuration.
- DaemonSets: you can also use DaemonSets to customize nodes. For more information, see Automatically bootstrapping GKE nodes with DaemonSets.
Node system configurations are not supported on Windows Server nodes.
Before you begin
Before you begin, make sure to do the following:
- Install command-line tools:
- If you use the gcloud CLI examples in this document, ensure that you install and configure the Google Cloud CLI.
- If you use the Terraform examples, ensure that you install and configure Terraform.
- Grant permissions: you need appropriate IAM permissions to create and update GKE clusters and node pools, such as container.clusterAdmin or a different role with equivalent permissions.
- Plan for potential workload disruption: custom node configurations are applied at the node pool level. Changes typically trigger a rolling update of the nodes in the pool, which involves re-creating the nodes. Plan for potential workload disruption and use Pod Disruption Budgets (PDBs) where appropriate.
- Back up and test all changes: always test configuration changes in a staging or development environment before you apply them to production. Incorrect settings can lead to node instability or workload failures.
- Review GKE default settings: GKE node images come with optimized default configurations. Only customize parameters if you have a specific need and understand the impact of your changes.
Use a node system configuration in GKE Standard mode
When you use a node system configuration, you use a YAML file that contains the
configuration parameters for the kubelet
and the Linux kernel. Although node
system configurations are also available in GKE
Autopilot mode, the steps in this document show you how to create and
use a configuration file for GKE Standard mode.
To use a node system configuration in GKE Standard mode, do the following:
- Create a configuration file. This file contains your kubelet and sysctl configurations.
- Add the configuration when you create a cluster, or when you create or update a node pool.
Create a configuration file
Write your node system configuration in YAML. The following example adds configurations for the kubelet and sysctl options:

```
kubeletConfig:
  cpuManagerPolicy: static
  allowedUnsafeSysctls:
  - 'kernel.shm*'
  - 'kernel.msg*'
  - 'kernel.sem'
  - 'fs.mqueue*'
  - 'net.*'
linuxConfig:
  sysctl:
    net.core.somaxconn: '2048'
    net.ipv4.tcp_rmem: '4096 87380 6291456'
```
In this example, the following applies:
- The cpuManagerPolicy: static field configures the kubelet to use the static CPU management policy.
- The net.core.somaxconn: '2048' field limits the socket listen() backlog to 2,048 pending connections.
- The net.ipv4.tcp_rmem: '4096 87380 6291456' field sets the minimum, default, and maximum value of the TCP socket receive buffer to 4,096 bytes, 87,380 bytes, and 6,291,456 bytes, respectively.
If you want to add configurations only for the kubelet
or sysctl
, include
only that section in your node system configuration. For example, to add a kubelet
configuration, create the following file:
```
kubeletConfig:
  cpuManagerPolicy: static
```
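Similarly, a configuration that sets only sysctl options omits the kubeletConfig section. For example, the following sketch reuses the net.core.somaxconn value from the earlier example:

```
linuxConfig:
  sysctl:
    net.core.somaxconn: '2048'
```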
For a complete list of the fields that you can add to your node system configuration, see the Kubelet configuration options and Sysctl configuration options sections.
Add the configuration to a Standard node pool
After you create the node system configuration, add the --system-config-from-file
flag by using the Google Cloud CLI. You can add this flag when you create a
cluster, or when you create or update a node pool. You can't add a node system
configuration by using the Google Cloud console.
Create a cluster with the node system configuration
You can add a node system configuration during cluster creation by using the gcloud CLI or Terraform. The following instructions apply the node system configuration to the default node pool:
gcloud CLI
```
gcloud container clusters create CLUSTER_NAME \
    --location=LOCATION \
    --system-config-from-file=SYSTEM_CONFIG_PATH
```
Replace the following:
- CLUSTER_NAME: the name for your cluster
- LOCATION: the compute zone or region of the cluster
- SYSTEM_CONFIG_PATH: the path to the file that contains your kubelet and sysctl configurations
After you apply a node system configuration, the default node pool of the cluster uses the settings that you defined.
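For example, assuming a configuration file named node-config.yaml in the current directory and a zonal cluster named example-cluster in us-central1-a (all hypothetical values), the command might look like the following:

```
gcloud container clusters create example-cluster \
    --location=us-central1-a \
    --system-config-from-file=node-config.yaml
```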
Terraform
To create a regional cluster with a customized node system configuration by using Terraform, refer to the following example:
For more information about using Terraform, see Terraform support for GKE.
Create a new node pool with the node system configuration
You can add a node system configuration when you use the gcloud CLI or Terraform to create a new node pool.
The following instructions apply the node system configuration to a new node pool:
gcloud CLI
```
gcloud container node-pools create POOL_NAME \
    --cluster=CLUSTER_NAME \
    --location=LOCATION \
    --system-config-from-file=SYSTEM_CONFIG_PATH
```
Replace the following:
- POOL_NAME: the name for your node pool
- CLUSTER_NAME: the name of the cluster that you want to add a node pool to
- LOCATION: the compute zone or region of the cluster
- SYSTEM_CONFIG_PATH: the path to the file that contains your kubelet and sysctl configurations
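For example, with the same hypothetical node-config.yaml file and cluster from the earlier example, a command that creates a node pool named high-perf-pool might look like the following:

```
gcloud container node-pools create high-perf-pool \
    --cluster=example-cluster \
    --location=us-central1-a \
    --system-config-from-file=node-config.yaml
```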
Terraform
To create a node pool with a customized node system configuration by using Terraform, refer to the following example:
For more information about using Terraform, see Terraform support for GKE.
Update the node system configuration of an existing node pool
You can update the node system configuration of an existing node pool by running the following command:
```
gcloud container node-pools update POOL_NAME \
    --cluster=CLUSTER_NAME \
    --location=LOCATION \
    --system-config-from-file=SYSTEM_CONFIG_PATH
```
Replace the following:
- POOL_NAME: the name of the node pool that you want to update
- CLUSTER_NAME: the name of the cluster that you want to update
- LOCATION: the compute zone or region of the cluster
- SYSTEM_CONFIG_PATH: the path to the file that contains your kubelet and sysctl configurations
This change requires re-creating the nodes, which can cause disruption to your running workloads. For more information about this specific change, find the corresponding row in the manual changes that re-create the nodes using a node upgrade strategy without respecting maintenance policies table.
For more information about node updates, see Planning for node update disruptions.
Edit a node system configuration
To edit a node system configuration, you can create a new node pool with the configuration that you want, or update the node system configuration of an existing node pool.
Edit by creating a node pool
To edit a node system configuration by creating a node pool, do the following:
- Create a configuration file with the configuration that you want.
- Add the configuration to a new node pool.
- Migrate your workloads to the new node pool.
- Delete the old node pool.
Edit by updating an existing node pool
To edit the node system configuration of an existing node pool, follow the instructions in the Update node pool tab for adding the configuration to a node pool. When you update a node system configuration, the new configuration overrides the node pool's existing system configuration, and the nodes must be re-created. If you omit any parameters during an update, those parameters are reset to their respective defaults.
If you want to reset the node system configuration back to the defaults, update
your configuration file with empty values for the kubelet
and sysctl
fields,
for example:
```
kubeletConfig: {}
linuxConfig:
  sysctl: {}
```
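You can then apply the reset file with the same update command shown earlier. For example, assuming the file is saved as reset-config.yaml and using the hypothetical cluster and node pool names from the earlier examples:

```
gcloud container node-pools update high-perf-pool \
    --cluster=example-cluster \
    --location=us-central1-a \
    --system-config-from-file=reset-config.yaml
```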
Delete a node system configuration
To remove a node system configuration, do the following steps:
- Create a node pool.
- Migrate your workloads to the new node pool.
- Delete the node pool that has the old node system configuration.
Configuration options for the kubelet
The tables in this section describe the kubelet
options that you can modify.
CPU management
The following table describes the CPU management options for the kubelet.

| kubelet config settings | Restrictions | Default setting | Description |
|---|---|---|---|
| cpuCFSQuota | Must be true or false. | true | This setting enforces the Pod's CPU limit. Setting this value to false means that the CPU limits for Pods are ignored. Ignoring CPU limits might be beneficial in certain scenarios where Pods are sensitive to CPU limits. The risk of disabling cpuCFSQuota is that a rogue Pod can consume more CPU resources than intended. |
| cpuCFSQuotaPeriod | Must be a duration of time. | "100ms" | This setting sets the CPU CFS quota period value, cpu.cfs_period_us, which specifies the period of how often a cgroup's access to CPU resources should be reallocated. This option lets you tune the CPU throttling behavior. |
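For example, using the setting names from the preceding table, a configuration that keeps CPU limit enforcement but lengthens the CFS period might look like the following sketch (the values are illustrative, not recommendations):

```
kubeletConfig:
  cpuCFSQuota: true
  cpuCFSQuotaPeriod: '200ms'
```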
Memory management and eviction
The following table describes the modifiable options for memory management and
eviction. This section also contains a separate table that describes the
modifiable options for the evictionSoft
flag.
| kubelet config settings | Restrictions | Default setting | Description |
|---|---|---|---|
| evictionSoft | Map of signal names. For value restrictions, see the following table. | none | This setting maps signal names to a quantity or percentage that defines soft eviction thresholds. A soft eviction threshold must have a grace period. The kubelet does not evict Pods until the grace period is exceeded. |
| evictionSoftGracePeriod | Map of signal names. For each signal name, the value must be a positive duration less than 5m. Valid time units are ns, us (or µs), ms, s, or m. | none | This setting maps signal names to durations that define grace periods for soft eviction thresholds. Each soft eviction threshold must have a corresponding grace period. |
| evictionMinimumReclaim | Map of signal names. For each signal name, the value must be a positive percentage less than 10%. | none | This setting maps signal names to percentages that define the minimum amount of a given resource that the kubelet reclaims when it performs a Pod eviction. |
| evictionMaxPodGracePeriodSeconds | Value must be an integer between 0 and 300. | 0 | This setting defines, in seconds, the maximum grace period for Pod termination during eviction. |
The following table shows the modifiable options for the evictionSoft
flag.
The same options also apply to the evictionSoftGracePeriod
and evictionMinimumReclaim
flags with different restrictions.
| kubelet config settings | Restrictions | Default setting | Description |
|---|---|---|---|
| memory.available | Value must be a quantity greater than 100Mi and less than 50% of the node's memory. | none | This setting represents the amount of memory available before soft eviction. Defines the amount of the memory.available signal in the kubelet. |
| nodefs.available | Value must be between 10% and 50%. | none | This setting represents the nodefs available before soft eviction. Defines the amount of the nodefs.available signal in the kubelet. |
| nodefs.inodesFree | Value must be between 5% and 50%. | none | This setting represents the nodefs inodes that are free before soft eviction. Defines the amount of the nodefs.inodesFree signal in the kubelet. |
| imagefs.available | Value must be between 15% and 50%. | none | This setting represents the imagefs available before soft eviction. Defines the amount of the imagefs.available signal in the kubelet. |
| imagefs.inodesFree | Value must be between 5% and 50%. | none | This setting represents the imagefs inodes that are free before soft eviction. Defines the amount of the imagefs.inodesFree signal in the kubelet. |
| pid.available | Value must be between 10% and 50%. | none | This setting represents the PIDs available before soft eviction. Defines the amount of the pid.available signal in the kubelet. |
| singleProcessOomKill | Value must be true or false. | true for cgroupv1 nodes, false for cgroupv2 nodes. | This setting sets whether the processes in the container are OOM killed individually or as a group. Available on GKE versions 1.32.4-gke.1132000, 1.33.0-gke.1748000 or later. |
PID management
The following table describes the modifiable options for PID management.
| kubelet config settings | Restrictions | Default setting | Description |
|---|---|---|---|
| podPidsLimit | Value must be between 1024 and 4194304. | none | This setting sets the maximum number of process IDs (PIDs) that each Pod can use. |
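For example, using the setting name from the preceding table, a configuration that caps each Pod at 4,096 PIDs might look like the following sketch (the value is illustrative):

```
kubeletConfig:
  podPidsLimit: 4096
```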
Logging
The following table describes the modifiable options for logging.
| kubelet config settings | Restrictions | Default setting | Description |
|---|---|---|---|
| containerLogMaxSize | Value must be a positive number and a unit suffix between 10Mi and 500Mi, inclusive. | 10Mi | This setting controls the containerLogMaxSize setting of the container log rotation policy, which lets you configure the maximum size for each log file. The default value is 10Mi. Valid units are Ki, Mi, and Gi. |
| containerLogMaxFiles | Value must be an integer between 2 and 10, inclusive. | 5 | This setting controls the containerLogMaxFiles setting of the container log file rotation policy, which lets you configure the maximum number of files allowed for each container. The default value is 5. The total log size (container_log_max_size * container_log_max_files) per container can't exceed 1% of the total storage of the node. |
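For example, a configuration that allows larger individual log files while keeping the default file count might look like the following sketch (the values are illustrative and remain subject to the 1% total-storage limit):

```
kubeletConfig:
  containerLogMaxSize: '50Mi'
  containerLogMaxFiles: 5
```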
Image garbage collection
The following table describes the modifiable options for image garbage collection.
| kubelet config settings | Restrictions | Default setting | Description |
|---|---|---|---|
| imageGcHighThresholdPercent | Value must be an integer between 10 and 85, inclusive, and higher than imageGcLowThresholdPercent. | 85 | This setting defines the percent of disk usage above which image garbage collection is run. It represents the highest disk usage to garbage collect to. The percentage is calculated by dividing this field's value by 100. |
| imageGcLowThresholdPercent | Value must be an integer between 10 and 85, inclusive, and lower than imageGcHighThresholdPercent. | 80 | This setting defines the percent of disk usage before which image garbage collection is never run. It represents the lowest disk usage to garbage collect to. The percentage is calculated by dividing this field's value by 100. |
| imageMinimumGcAge | Value must be a duration of time not greater than 2m. Valid time units are ns, us (or µs), ms, s, m, or h. | 2m | This setting defines the minimum age for an unused image before it is garbage collected. |
| imageMaximumGcAge | Value must be a duration of time. | 0s | This setting defines the maximum age an image can be unused before it is garbage collected. The default value of 0s disables this limit. Available on GKE versions 1.30.7-gke.1076000, 1.31.3-gke.1023000 or later. |
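For example, using the setting names from the preceding table, a configuration that triggers image garbage collection at lower disk usage than the defaults might look like the following sketch (the values are illustrative):

```
kubeletConfig:
  imageGcLowThresholdPercent: 60
  imageGcHighThresholdPercent: 70
  imageMinimumGcAge: '1m'
```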
Image pulling
The following table describes the modifiable options for image pulling.
| kubelet config settings | Restrictions | Default setting | Description |
|---|---|---|---|
| maxParallelImagePulls | Value must be an integer between 2 and 5, inclusive. | 2 or 3, based on the disk type. | This setting defines the maximum number of image pulls in parallel. The default value is determined by the boot disk type. |
Security and unsafe operations
The following table describes the modifiable options for configuring security and handling unsafe operations.
| kubelet config settings | Restrictions | Default setting | Description |
|---|---|---|---|
| allowedUnsafeSysctls | List of sysctl names or groups. The allowed sysctl groups are the following: kernel.shm*, kernel.msg*, kernel.sem, fs.mqueue.*, net.*. | none | This setting is a list of unsafe sysctl names or sysctl groups that can be set on the Pods. |
| insecureKubeletReadonlyPortEnabled | Must be true or false. | true | This setting enables or disables the kubelet read-only port 10255 on every new node pool in your cluster. If you configure this setting in this file, you can't use a GKE API client to change the setting at the cluster level. |

Resource Managers
Kubernetes offers a suite of Resource Managers. You can configure these Resource Managers to coordinate and optimize the alignment of node resources for Pods that are configured with specific requirements for CPUs, devices, and memory (hugepages) resources.
The following table describes the modifiable options for Resource Managers.
| kubelet config settings | Restrictions | Default setting | Description |
|---|---|---|---|
| cpuManagerPolicy | Value must be none or static. | none | This setting controls the kubelet CPU Manager policy. The default value is none, which is the default CPU affinity scheme, providing no affinity beyond what the OS scheduler does automatically. Setting this value to static allows Pods that are both in the Guaranteed QoS class and have integer CPU requests to be assigned exclusive CPUs. |
| memoryManager | Value must be None or Static. | None | This setting controls the kubelet Memory Manager policy. With the default value of None, Kubernetes acts the same as if the Memory Manager is not present. If you set this value to Static, the Memory Manager policy sends topology hints that depend on the type of Pod. For details, see Static policy. This setting is supported for clusters with the control plane running GKE version 1.32.3-gke.1785000 or later. |
| topologyManager | Value must be one of the supported settings for each of the respective fields. You can't set the topologyManager field when you use the Terraform instructions to add the configuration to a Standard node pool. | policy: none, scope: container | These settings control the kubelet Topology Manager configuration by using the policy and scope subfields. The Topology Manager coordinates the set of components responsible for performance optimizations related to CPU isolation, memory, and device locality. You can set the policy and scope settings independently of each other. For more information about these settings, see Topology manager scopes and policies. The following GKE resources support this setting: clusters with the control plane running GKE version 1.32.3-gke.1785000 or later (for clusters with the control plane and nodes running 1.33.0-gke.1712000 or later, the Topology Manager also receives information about GPU topology), and nodes with the following machine types: A2, A3, A4, C3, C4, C4A, G2, G4, M3, N4. |
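For example, using the setting names from the preceding table, a configuration that enables static CPU pinning and a non-default Topology Manager policy and scope might look like the following sketch (the policy and scope values are illustrative; see the Topology Manager documentation referenced above for the full set of supported values):

```
kubeletConfig:
  cpuManagerPolicy: static
  topologyManager:
    policy: best-effort
    scope: pod
```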
Sysctl configuration options
To tune the performance of your system, you can modify Linux kernel parameters. The tables in this section describe the various kernel parameters that you can configure.
Filesystem parameters (fs.*)
The following table describes the modifiable parameters for the Linux filesystem. These settings control the behavior of the Linux filesystem, such as file handle limits and event monitoring.

| Sysctl parameter | Restrictions | Description |
|---|---|---|
| fs.aio-max-nr | Must be between [65536, 4194304]. | This setting defines the maximum system-wide number of asynchronous I/O requests. |
| fs.file-max | Must be between [104857, 67108864]. | This setting defines the maximum number of file-handles that the Linux kernel can allocate. |
| fs.inotify.max_user_instances | Must be between [8192, 1048576]. | This setting defines the maximum number of inotify instances that a user can create. |
| fs.inotify.max_user_watches | Must be between [8192, 1048576]. | This setting defines the maximum number of inotify watches that a user can create. |
| fs.nr_open | Must be between [1048576, 2147483584]. | This setting defines the maximum number of file descriptors that can be opened by a process. |
Kernel parameters (kernel.*)
The following table describes the modifiable parameters for the Linux kernel. These settings configure core kernel functionalities, including shared memory allocation.

| Sysctl parameter | Description |
|---|---|
| kernel.shmmni | This setting defines the system-wide maximum number of shared memory segments. The default value is 4096. |
| kernel.perf_event_paranoid | This setting controls access to performance events by unprivileged users. This setting defaults to 2 in the kernel. |
| kernel.yama.ptrace_scope | This setting defines the scope and restrictions for the ptrace() system call, impacting process debugging and tracing. Supported values include the following: 0: classic ptrace permissions. 1: restricted ptrace, which is the default in many distributions; only child processes or processes with CAP_SYS_PTRACE. 2: admin-only ptrace; only processes with CAP_SYS_PTRACE. 3: no ptrace; ptrace calls are disallowed. |
| kernel.kptr_restrict | This setting controls whether kernel addresses are exposed through /proc and other interfaces. |
| kernel.dmesg_restrict | This setting restricts the ability of unprivileged users to use dmesg(8) to view messages from the kernel's log buffer. |
| kernel.sysrq | This setting controls the functions allowed to be invoked through the SysRq key. Possible values include the following: 0: disables sysrq completely. 1: enables all sysrq functions. >1: bitmask of allowed sysrq functions. For more information, see Linux Magic System Request Key Hacks. |
Network parameters (net.*)
The following table describes the modifiable parameters for networking. These settings tune the performance and behavior of the networking stack, from socket buffers to connection tracking.

| Sysctl parameter | Description |
|---|---|
| net.core.somaxconn | This setting defines the limit of the socket listen() backlog, which is known in userspace as SOMAXCONN. This setting defaults to 128. |
| net.ipv4.tcp_tw_reuse | This setting allows reuse of sockets in the TIME_WAIT state for new connections when it is safe from a protocol viewpoint. The default value is 0. |
| net.ipv4.tcp_mtu_probing | This setting controls TCP Packetization-Layer Path MTU Discovery. The supported values are the following: 0: disabled. 1: disabled by default, enabled when an ICMP black hole is detected. 2: always enabled; use the initial MSS of tcp_base_mss. |
| net.ipv6.conf.all.disable_ipv6 | This setting changes the conf/default/disable_ipv6 setting and also all per-interface disable_ipv6 settings to the same value. The default value is 0, which means that the setting is disabled. Available on GKE versions 1.32.0-gke.1448000 or later. |
| net.netfilter.nf_conntrack_buckets | This setting defines the size of the hash table. The recommended setting is the result of the following: nf_conntrack_max = nf_conntrack_buckets * 4. Available on GKE versions 1.32.0-gke.1448000 or later. |
| net.netfilter.nf_conntrack_tcp_timeout_close_wait | This setting defines the period, in seconds, for which TCP connections can remain in the CLOSE_WAIT state. The default value is 3600. Available on GKE versions 1.32.0-gke.1448000 or later. |
| net.netfilter.nf_conntrack_tcp_timeout_established | This setting defines the duration, in seconds, of dead connections before they are deleted automatically from the connection tracking table. Available on GKE versions 1.32.0-gke.1448000 or later. |
| net.netfilter.nf_conntrack_tcp_timeout_time_wait | This setting defines the period, in seconds, for which TCP connections can remain in the TIME_WAIT state. The default value is 120. Available on GKE versions 1.32.0-gke.1448000 or later. |
Virtual Memory parameters (vm.*)
The following table describes the modifiable parameters for the Virtual Memory subsystem. These settings manage the Virtual Memory subsystem, which controls how the kernel handles memory, swapping, and disk caching.

| Sysctl parameter | Description |
|---|---|
| vm.dirty_background_ratio | This setting defines the percentage of total memory at which the background kernel flusher threads start writeback. It is the percentage counterpart of the vm.dirty_background_bytes field and behaves like the vm.dirty_ratio field, but for background writeback. |
| vm.dirty_background_bytes | This setting defines the amount of dirty memory at which the background kernel flusher threads start writeback. Be aware that vm.dirty_background_bytes is the counterpart of vm.dirty_background_ratio. Only one of these settings can be specified. |
| vm.dirty_ratio | This setting defines the percentage of total memory at which a process that generates disk writes starts writeback itself. It is the percentage counterpart of the vm.dirty_bytes field. |
| vm.dirty_bytes | This setting defines the amount of dirty memory at which a process that generates disk writes starts writeback itself. The minimum value allowed for vm.dirty_bytes is two pages in bytes. Any value that's lower than this limit is ignored and the old configuration is retained. Be aware that vm.dirty_bytes is the counterpart of vm.dirty_ratio. Only one of these settings can be specified. |
| vm.overcommit_memory | This setting determines the kernel's strategy for handling memory overcommitment. The values are as follows: 0: reject large allocations. 1: always allow. 2: prevent commit beyond swap + ratio of RAM. |
| vm.overcommit_ratio | This setting defines the percentage of physical RAM that is considered when the vm.overcommit_memory field is set to 2. |
| vm.swappiness | This setting controls how aggressively the kernel swaps memory pages. The default value is 60. |
| vm.watermark_scale_factor | This setting controls the aggressiveness of kswapd memory reclaim. The default value is 10. |
| vm.min_free_kbytes | This setting defines the minimum amount of free memory, in kibibytes, that the kernel keeps reserved. The default value is 67584. |

For more information about the supported values for each sysctl flag, see the --system-config-from-file gcloud CLI documentation.
Different Linux namespaces
might have unique values for a given sysctl
flag, but others might be global
for the entire node. Updating sysctl
options by using a node system
configuration helps ensure that the sysctl
is applied globally on the node and
in each namespace, so that each Pod has identical sysctl
values in each Linux
namespace.
Linux cgroup mode configuration options
The container runtime and kubelet
use Linux kernel cgroups
for resource management, such
as limiting how much CPU or memory each container in a Pod can access. There are
two versions of the cgroup subsystem in the kernel: cgroupv1
and cgroupv2
.
Kubernetes support for cgroupv2
was introduced as alpha in Kubernetes version
1.18, beta in 1.22, and GA in 1.25. For more information, see the Kubernetes cgroups v2 documentation
.
Node system configuration lets you customize the cgroup configuration of your
node pools. You can use cgroupv1
or cgroupv2
. GKE uses cgroupv2
for new Standard node pools that run version 1.26 and later,
and cgroupv1
for node pools that run versions earlier than 1.26. For node
pools that were created with node auto-provisioning, the cgroup configuration
depends on the initial cluster version, not the node pool version. cgroupv1
is
not supported on Arm machines.
You can use node system configuration to change the setting for a node pool to
use cgroupv1
or cgroupv2
explicitly. Upgrading an existing node pool that
uses cgroupv1
to version 1.26 doesn't change the setting to cgroupv2
.
Existing node pools that run a version earlier than 1.26—and that don't
include a customized cgroup configuration—will continue to use cgroupv1
.
To change the setting, you must explicitly specify cgroupv2
for the existing
node pool.
For example, to configure your node pool to use cgroupv2
, use a node system
configuration file such as the following:
```
linuxConfig:
  cgroupMode: 'CGROUP_MODE_V2'
```
The supported cgroupMode
options are the following:
- CGROUP_MODE_V1: use cgroupv1 on the node pool.
- CGROUP_MODE_V2: use cgroupv2 on the node pool.
- CGROUP_MODE_UNSPECIFIED: use the default GKE cgroup configuration.
To use cgroupv2
, the following requirements and limitations apply:
- For a node pool that runs a version earlier than 1.26, you must use gcloud CLI version 408.0.0 or later. Alternatively, use gcloud beta with version 395.0.0 or later.
- Your cluster and node pools must run GKE version 1.24.2-gke.300 or later.
- You must use either the Container-Optimized OS with containerd or Ubuntu with containerd node image.
- If any of your workloads depend on reading the cgroup filesystem (/sys/fs/cgroup/...), ensure that they are compatible with the cgroupv2 API.
- If you use any monitoring or third-party tools, ensure that they are compatible with cgroupv2.
- If you use Java workloads (JDK), we recommend that you use versions which fully support cgroupv2, including JDK 8u372, JDK 11.0.16 or later, or JDK 15 or later.
Verify cgroup configuration
When you add a node system configuration, GKE must re-create the nodes to implement the changes. After you add the configuration to a node pool and the nodes are re-created, you can verify the new configuration.
You can verify the cgroup configuration for nodes in a node pool by using
gcloud CLI or the kubectl
command-line tool:
gcloud CLI
Check the cgroup configuration for a node pool:
```
gcloud container node-pools describe POOL_NAME \
    --format='value(Config.effectiveCgroupMode)'
```
Replace POOL_NAME
with the name of your node pool.
The potential output is one of the following:
- EFFECTIVE_CGROUP_MODE_V1: the nodes use cgroupv1
- EFFECTIVE_CGROUP_MODE_V2: the nodes use cgroupv2
The output shows only the new cgroup configuration after the nodes in the node pool are re-created. The output is empty for Windows server node pools, which don't support cgroup.
kubectl
To use kubectl
to verify the cgroup configuration for nodes in this node
pool, select a node and connect to it by using the following instructions:
- Create an interactive shell with any node in the node pool. In the command, replace mynode with the name of any node in the node pool.
- Identify the cgroup version on Linux nodes.
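As a generic Linux check (not specific to GKE), you can also inspect the filesystem type that is mounted at /sys/fs/cgroup from a shell on the node; cgroup2fs indicates cgroupv2 and tmpfs indicates cgroupv1:

```
stat -fc %T /sys/fs/cgroup/
```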
Linux hugepages configuration options
You can use a node system configuration file to pre-allocate hugepages. Kubernetes supports pre-allocated hugepages as a resource type, similar to CPU or memory.
To use hugepages, the following limitations and requirements apply:
- To ensure that the node is not fully occupied by hugepages, the overall size of the allocated hugepages can't exceed either of the following limits:
  - On machines with less than 30 GB of memory: 60% of the total memory. For example, on an e2-standard-2 machine with 8 GB of memory, you can't allocate more than 4.8 GB for hugepages.
  - On machines with more than 30 GB of memory: 80% of the total memory. For example, on c4a-standard-8 machines with 32 GB of memory, hugepages can't exceed 25.6 GB.
- 1 GB hugepages are only available on A3, C2D, C3, C3D, C4, C4A, C4D, CT5E, CT5LP, CT6E, H3, M2, M3, M4, or Z3 machine types.
The following table describes the modifiable settings for Linux hugepages.
| Config parameter | Restrictions | Default value | Description |
|---|---|---|---|
| hugepage_size2m | Integer count. Subject to the previously described memory allocation limits. | 0 | This setting pre-allocates a specific number of 2 MB hugepages. |
| hugepage_size1g | Integer count. Subject to both of the previously described memory and machine type limitations. | 0 | This setting pre-allocates a specific number of 1 GB hugepages. |
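For example, assuming that the hugepage settings nest under a hugepageConfig block in the linuxConfig section (an assumption about the file layout, not stated in the table above), a configuration might look like the following sketch. Omit the hugepage_size1g line on machine types that don't support 1 GB hugepages:

```
linuxConfig:
  hugepageConfig:
    hugepage_size2m: 1024
    hugepage_size1g: 2
```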
Transparent hugepages (THP)
You can use a node system configuration file to enable the Linux kernel's Transparent HugePage Support . With THP, the kernel automatically assigns hugepages to processes without manual pre-allocation.
The following table describes the modifiable parameters for THP.
| Config parameter | Supported values | Default value |
|---|---|---|
| transparentHugepageEnabled | TRANSPARENT_HUGEPAGE_ENABLED_ALWAYS: transparent hugepage is enabled system wide. TRANSPARENT_HUGEPAGE_ENABLED_MADVISE: transparent hugepage is enabled inside MADV_HUGEPAGE regions; this is the default kernel configuration. TRANSPARENT_HUGEPAGE_ENABLED_NEVER: transparent hugepage is disabled. TRANSPARENT_HUGEPAGE_ENABLED_UNSPECIFIED: the default value; GKE does not modify the kernel configuration. | UNSPECIFIED |
| transparentHugepageDefrag | TRANSPARENT_HUGEPAGE_DEFRAG_ALWAYS: an application requesting THP stalls on allocation failure and directly reclaims pages and compacts memory in an effort to allocate a THP immediately. TRANSPARENT_HUGEPAGE_DEFRAG_DEFER: an application wakes kswapd in the background to reclaim pages and wakes kcompactd to compact memory so that THP is available in the near future; it is the responsibility of khugepaged to then install the THP pages later. TRANSPARENT_HUGEPAGE_DEFRAG_DEFER_WITH_MADVISE: an application enters direct reclaim and compaction like usual, but only for regions that have used madvise(MADV_HUGEPAGE); all other regions wake kswapd in the background to reclaim pages and wake kcompactd to compact memory so that THP is available in the near future. TRANSPARENT_HUGEPAGE_DEFRAG_MADVISE: an application enters direct reclaim and compaction like usual, but only for regions that have used madvise(MADV_HUGEPAGE). TRANSPARENT_HUGEPAGE_DEFRAG_NEVER: an application never enters direct reclaim or compaction. TRANSPARENT_HUGEPAGE_DEFRAG_UNSPECIFIED: the default value; GKE does not modify the kernel configuration. | UNSPECIFIED |
THP is available on GKE version 1.33.2-gke.4655000 or later. It is also enabled by default on new TPU node pools that run GKE version 1.33.2-gke.4655000 or later. THP isn't enabled automatically when you upgrade existing node pools to a supported version.
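For example, assuming that these parameters belong in the linuxConfig section alongside the other kernel options (an assumption, because the table above doesn't show the nesting), a configuration that enables THP system wide might look like the following sketch:

```
linuxConfig:
  transparentHugepageEnabled: TRANSPARENT_HUGEPAGE_ENABLED_ALWAYS
  transparentHugepageDefrag: TRANSPARENT_HUGEPAGE_DEFRAG_DEFER
```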

