Troubleshooting logging in GKE


This page helps you resolve issues with the Google Kubernetes Engine (GKE) logging pipeline itself, such as logs not appearing in Cloud Logging. For more information about how to use logs to troubleshoot your workloads and clusters, see Introduction to GKE troubleshooting.

Missing cluster logs in Cloud Logging

Verify logging is enabled in the project

  1. List enabled services:

     gcloud services list --enabled --filter="NAME=logging.googleapis.com"

    The following output indicates that logging is enabled for the project:

     NAME                    TITLE
     logging.googleapis.com  Cloud Logging API

    Optional: To determine who disabled the API and when, run the following query in the Logs Explorer:

     protoPayload.methodName="google.api.serviceusage.v1.ServiceUsage.DisableService"
     protoPayload.response.services="logging.googleapis.com"

  2. If logging is disabled, enable logging:

     gcloud services enable logging.googleapis.com

Verify logging is enabled on the cluster

  1. List the clusters:

     gcloud container clusters list \
         --project=PROJECT_ID \
         '--format=value(name,loggingConfig.componentConfig.enableComponents)' \
         --sort-by=name | column -t

    Replace the following:

    • PROJECT_ID: your Google Cloud project ID.

    The output is similar to the following:

     cluster-1              SYSTEM_COMPONENTS
     cluster-2              SYSTEM_COMPONENTS;WORKLOADS
     cluster-3

    If the value for your cluster is empty, logging is disabled. For example, cluster-3 in this output has logging disabled.

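  2. If the value for your cluster is empty, enable cluster logging:

     gcloud container clusters update CLUSTER_NAME \
         --logging=SYSTEM,WORKLOAD \
         --location=COMPUTE_LOCATION

    Replace the following:

    • CLUSTER_NAME: the name of your cluster.
    • COMPUTE_LOCATION: the Compute Engine location of your cluster.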
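
The output of the list command in step 1 can be interpreted with a small helper. This is a sketch under stated assumptions: the function name report_logging_status is hypothetical, and the input is assumed to be one cluster per line in the "name components" layout shown in the sample output.

```shell
#!/bin/bash
# Hypothetical helper (not part of the gcloud CLI): reads "NAME COMPONENTS"
# lines on stdin and prints the logging state of each cluster.
report_logging_status() {
  while read -r name components; do
    if [ -z "$name" ]; then
      continue                      # skip blank lines
    fi
    case "$components" in
      "")          echo "$name: logging disabled" ;;
      *WORKLOADS*) echo "$name: workload logs enabled ($components)" ;;
      *)           echo "$name: workload logs not enabled ($components)" ;;
    esac
  done
}
```

Piping the list command from step 1 into report_logging_status prints one line per cluster. Clusters reported as "logging disabled" are candidates for the update command in step 2.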

After verifying that you've enabled logging in your project and on the cluster, consider using Gemini Cloud Assist Investigations to gain additional insights into your logs and resolve issues. For more information about the different ways to initiate an investigation by using the Logs Explorer, see Troubleshoot issues with Gemini Cloud Assist Investigations in the Gemini documentation.

Verify nodes in the node pools have Cloud Logging access scope

One of the following scopes is required for nodes to write logs to Cloud Logging:

  • https://www.googleapis.com/auth/logging.write
  • https://www.googleapis.com/auth/cloud-platform
  • https://www.googleapis.com/auth/logging.admin

  1. Check the scopes configured on each node pool in the cluster:

     gcloud container node-pools list --cluster=CLUSTER_NAME \
         --format="table(name,config.oauthScopes)" \
         --location=COMPUTE_LOCATION

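    Replace the following:

    • CLUSTER_NAME: the name of your cluster.
    • COMPUTE_LOCATION: the Compute Engine location of your cluster.

  2. If a node pool is missing the required scope, create a new node pool with the correct logging scope:

     gcloud container node-pools create NODE_POOL_NAME \
         --cluster=CLUSTER_NAME \
         --location=COMPUTE_LOCATION \
         --scopes="gke-default"

    Replace the following:

    • NODE_POOL_NAME: the name of the new node pool.
    • CLUSTER_NAME: the name of your cluster.
    • COMPUTE_LOCATION: the Compute Engine location of your cluster.

  3. Migrate your workloads from the old node pool to the newly created node pool and monitor the progress.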
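
To decide whether a node pool needs to be replaced, the scope list from step 1 can be compared against the three accepted scopes. The following is a minimal sketch, not an official check: the function name has_logging_scope is hypothetical, and it assumes that the scope list is passed as a single string in whatever separator style the output uses.

```shell
#!/bin/bash
# Hypothetical helper (not part of the gcloud CLI): exits with status 0 if the
# scope list passed as $1 contains one of the scopes that allow nodes to write
# to Cloud Logging.
has_logging_scope() {
  # Replace every character that cannot occur in a scope URL with a space,
  # then compare each resulting token exactly so that look-alike scopes
  # (for example, a read-only variant) are not accepted.
  for scope in $(printf '%s\n' "$1" | sed 's|[^A-Za-z0-9:/._-]| |g'); do
    case "$scope" in
      https://www.googleapis.com/auth/logging.write|\
      https://www.googleapis.com/auth/cloud-platform|\
      https://www.googleapis.com/auth/logging.admin) return 0 ;;
    esac
  done
  return 1
}
```

Exact matching is a deliberate choice here: a substring test would also accept scopes whose names merely begin with one of the accepted scope names.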

Identify clusters with node service accounts that are missing critical permissions

To identify clusters with node service accounts that are missing critical permissions, use GKE recommendations of the NODE_SA_MISSING_PERMISSIONS recommender subtype:

  • Use the Google Cloud console. Go to the Kubernetes clusters page. In the Notifications column for specific clusters, check for the Grant critical permissions recommendation.
  • Use the gcloud CLI or Recommender API by specifying the NODE_SA_MISSING_PERMISSIONS recommender subtype.

    To query for this recommendation, run the following command:

     gcloud recommender recommendations list \
         --recommender=google.container.DiagnosisRecommender \
         --location LOCATION \
         --project PROJECT_ID \
         --format yaml \
         --filter="recommenderSubtype:NODE_SA_MISSING_PERMISSIONS"

To implement this recommendation, grant the roles/container.defaultNodeServiceAccount role to the node's service account.

You can run a script that searches node pools in your project's Standard and Autopilot clusters for any node service accounts that don't have the required permissions for GKE. This script uses the gcloud CLI and the jq utility. The script is as follows:

    #!/bin/bash

    # Set your project ID
    project_id=PROJECT_ID
    project_number=$(gcloud projects describe "$project_id" --format="value(projectNumber)")
    declare -a all_service_accounts
    declare -a sa_missing_permissions

    # Function to check if a service account has a specific permission
    # $1: project_id
    # $2: service_account
    # $3: permission
    service_account_has_permission() {
      local project_id="$1"
      local service_account="$2"
      local permission="$3"

      local roles=$(gcloud projects get-iam-policy "$project_id" \
              --flatten="bindings[].members" \
              --format="table[no-heading](bindings.role)" \
              --filter="bindings.members:\"$service_account\"")

      for role in $roles; do
        if role_has_permission "$role" "$permission"; then
          echo "Yes" # Has permission
          return
        fi
      done

      echo "No" # Does not have permission
    }

    # Function to check if a role has the specific permission
    # $1: role
    # $2: permission
    role_has_permission() {
      local role="$1"
      local permission="$2"
      gcloud iam roles describe "$role" --format="json" | \
      jq -r ".includedPermissions" | \
      grep -q "$permission"
    }

    # Function to add $1 into the service account array all_service_accounts
    # $1: service account
    add_service_account() {
      local service_account="$1"
      all_service_accounts+=( ${service_account} )
    }

    # Function to add service accounts into the global array all_service_accounts for a Standard GKE cluster
    # $1: project_id
    # $2: location
    # $3: cluster_name
    add_service_accounts_for_standard() {
      local project_id="$1"
      local cluster_location="$2"
      local cluster_name="$3"

      while read nodepool; do
        nodepool_name=$(echo "$nodepool" | awk '{print $1}')
        if [[ "$nodepool_name" == "" ]]; then
          # skip the empty line which is from running `gcloud container node-pools list` in GCP console
          continue
        fi
        while read nodepool_details; do
          service_account=$(echo "$nodepool_details" | awk '{print $1}')

          if [[ "$service_account" == "default" ]]; then
            service_account="${project_number}-compute@developer.gserviceaccount.com"
          fi
          if [[ -n "$service_account" ]]; then
            printf "%-60s| %-40s| %-40s| %-10s| %-20s\n" $service_account $project_id $cluster_name $cluster_location $nodepool_name
            add_service_account "${service_account}"
          else
            echo "cannot find service account for node pool $project_id\t$cluster_name\t$cluster_location\t$nodepool_details"
          fi
        done <<< "$(gcloud container node-pools describe "$nodepool_name" \
            --cluster "$cluster_name" --zone "$cluster_location" --project "$project_id" \
            --format="table[no-heading](config.serviceAccount)")"
      done <<< "$(gcloud container node-pools list \
          --cluster "$cluster_name" --zone "$cluster_location" --project "$project_id" \
          --format="table[no-heading](name)")"
    }

    # Function to add service accounts into the global array all_service_accounts for an Autopilot GKE cluster
    # Autopilot cluster only has one node service account.
    # $1: project_id
    # $2: location
    # $3: cluster_name
    add_service_account_for_autopilot(){
      local project_id="$1"
      local cluster_location="$2"
      local cluster_name="$3"

      while read service_account; do
        if [[ "$service_account" == "default" ]]; then
          service_account="${project_number}-compute@developer.gserviceaccount.com"
        fi
        if [[ -n "$service_account" ]]; then
          printf "%-60s| %-40s| %-40s| %-10s| %-20s\n" $service_account $project_id $cluster_name $cluster_location $nodepool_name
          add_service_account "${service_account}"
        else
          echo "cannot find service account" for cluster "$project_id\t$cluster_name\t$cluster_location\t"
        fi
      done <<< "$(gcloud container clusters describe "$cluster_name" \
          --location "$cluster_location" --project "$project_id" \
          --format="table[no-heading](autoscaling.autoprovisioningNodePoolDefaults.serviceAccount)")"
    }

    # Function to check whether the cluster is an Autopilot cluster or not
    # $1: project_id
    # $2: location
    # $3: cluster_name
    is_autopilot_cluster() {
      local project_id="$1"
      local cluster_location="$2"
      local cluster_name="$3"
      autopilot=$(gcloud container clusters describe "$cluster_name" \
          --location "$cluster_location" \
          --format="table[no-heading](autopilot.enabled)")
      echo "$autopilot"
    }

    echo "--- 1. List all service accounts in all GKE node pools"
    printf "%-60s| %-40s| %-40s| %-10s| %-20s\n" "service_account" "project_id" "cluster_name" "cluster_location" "nodepool_name"
    while read cluster; do
      cluster_name=$(echo "$cluster" | awk '{print $1}')
      cluster_location=$(echo "$cluster" | awk '{print $2}')
      # how to find a cluster is a Standard cluster or an Autopilot cluster
      autopilot=$(is_autopilot_cluster "$project_id" "$cluster_location" "$cluster_name")
      if [[ "$autopilot" == "True" ]]; then
        add_service_account_for_autopilot "$project_id" "$cluster_location" "$cluster_name"
      else
        add_service_accounts_for_standard "$project_id" "$cluster_location" "$cluster_name"
      fi
    done <<< "$(gcloud container clusters list --project "$project_id" --format="value(name,location)")"

    echo "--- 2. Check if service accounts have permissions"
    unique_service_accounts=($(echo "${all_service_accounts[@]}" | tr ' ' '\n' | sort -u | tr '\n' ' '))

    echo "Service accounts: ${unique_service_accounts[@]}"
    printf "%-60s| %-40s| %-40s| %-20s\n" "service_account" "has_logging_permission" "has_monitoring_permission" "has_performance_hpa_metric_write_permission"
    for sa in "${unique_service_accounts[@]}"; do
      logging_permission=$(service_account_has_permission "$project_id" "$sa" "logging.logEntries.create")
      time_series_create_permission=$(service_account_has_permission "$project_id" "$sa" "monitoring.timeSeries.create")
      metric_descriptors_create_permission=$(service_account_has_permission "$project_id" "$sa" "monitoring.metricDescriptors.create")
      if [[ "$time_series_create_permission" == "No" || "$metric_descriptors_create_permission" == "No" ]]; then
        monitoring_permission="No"
      else
        monitoring_permission="Yes"
      fi
      performance_hpa_metric_write_permission=$(service_account_has_permission "$project_id" "$sa" "autoscaling.sites.writeMetrics")
      printf "%-60s| %-40s| %-40s| %-20s\n" $sa $logging_permission $monitoring_permission $performance_hpa_metric_write_permission

      if [[ "$logging_permission" == "No" || "$monitoring_permission" == "No" || "$performance_hpa_metric_write_permission" == "No" ]]; then
        sa_missing_permissions+=( ${sa} )
      fi
    done

    echo "--- 3. List all service accounts that don't have the above permissions"
    if [[ "${#sa_missing_permissions[@]}" -gt 0 ]]; then
      printf "Grant roles/container.defaultNodeServiceAccount to the following service accounts: %s\n" "${sa_missing_permissions[@]}"
    else
      echo "All service accounts have the above permissions"
    fi

Identify node service accounts that are missing critical permissions in a cluster

GKE uses IAM service accounts that are attached to your nodes to run system tasks like logging and monitoring. At a minimum, these node service accounts must have the Kubernetes Engine Default Node Service Account (roles/container.defaultNodeServiceAccount) role on your project. By default, GKE uses the Compute Engine default service account, which is automatically created in your project, as the node service account.

If your organization enforces the iam.automaticIamGrantsForDefaultServiceAccounts organization policy constraint, the default Compute Engine service account in your project might not automatically get the required permissions for GKE.

  • To verify if logging permissions are missing, check for 401 errors in the logs of the logging agent in your cluster:

     [[ $(kubectl logs -l k8s-app=fluentbit-gke -n kube-system -c fluentbit-gke | grep -cw "Received 401") -gt 0 ]] && echo "true" || echo "false"

    If the output is true, then the system workload is experiencing 401 errors, which indicate a lack of permissions. If the output is false, skip the rest of these steps and try a different troubleshooting procedure. To identify all the missing critical permissions, check the script.

  1. Find the name of the service account that your nodes use:

    Console

    1. Go to the Kubernetes clusters page:

      Go to Kubernetes clusters

    2. In the cluster list, click the name of the cluster that you want to inspect.
    3. Depending on the cluster mode of operation, do one of the following:
      • For Autopilot mode clusters, in the Security section, find the Service account field.
      • For Standard mode clusters, do the following:
        1. Click the Nodes tab.
        2. In the Node pools table, click a node pool name. The Node pool details page opens.
        3. In the Security section, find the Service account field.

    If the value in the Service account field is default, your nodes use the Compute Engine default service account. If the value in this field is not default, your nodes use a custom service account. To grant the required role to a custom service account, see Use least privilege IAM service accounts.

    gcloud

    For Autopilot mode clusters, run the following command:

    gcloud container clusters describe CLUSTER_NAME \
        --location=LOCATION \
        --flatten=autoscaling.autoprovisioningNodePoolDefaults.serviceAccount

    For Standard mode clusters, run the following command:

    gcloud container clusters describe CLUSTER_NAME \
        --location=LOCATION \
        --format="table(nodePools.name,nodePools.config.serviceAccount)"

    If the output is default, your nodes use the Compute Engine default service account. If the output is not default, your nodes use a custom service account. To grant the required role to a custom service account, see Use least privilege IAM service accounts.

  2. To grant the roles/container.defaultNodeServiceAccount role to the Compute Engine default service account, complete the following steps:

    Console

    1. Go to the Welcome page:

      Go to Welcome

    2. In the Project number field, click Copy to clipboard.
    3. Go to the IAM page:

      Go to IAM

    4. Click Grant access.
    5. In the New principals field, specify the following value:
      PROJECT_NUMBER-compute@developer.gserviceaccount.com
      Replace PROJECT_NUMBER with the project number that you copied.
    6. In the Select a role menu, select the Kubernetes Engine Default Node Service Account role.
    7. Click Save.

    gcloud

    1. Find your Google Cloud project number:
      gcloud projects describe PROJECT_ID \
          --format="value(projectNumber)"

      Replace PROJECT_ID with your project ID.

      The output is similar to the following:

      12345678901
    2. Grant the roles/container.defaultNodeServiceAccount role to the Compute Engine default service account:
      gcloud projects add-iam-policy-binding PROJECT_ID \
          --member="serviceAccount:PROJECT_NUMBER-compute@developer.gserviceaccount.com" \
          --role="roles/container.defaultNodeServiceAccount"

      Replace PROJECT_NUMBER with the project number from the previous step.

  • To verify that node service accounts have the required permissions, run the script that identifies missing permissions.
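
The steps above rely on mapping the value default to the Compute Engine default service account email. The following sketch shows that mapping in isolation. The function name resolve_node_service_account is hypothetical, and the project number in the usage is the sample value from this page.

```shell
#!/bin/bash
# Hypothetical helper (not part of the gcloud CLI): prints the full email of a
# node service account.
# $1: value of the Service account field ("default" or a custom email)
# $2: project number
resolve_node_service_account() {
  if [ "$1" = "default" ]; then
    # "default" means the Compute Engine default service account.
    echo "${2}-compute@developer.gserviceaccount.com"
  else
    echo "$1"
  fi
}
```

For example, calling resolve_node_service_account default 12345678901 prints the principal that the grant command in step 2 expects after the serviceAccount: prefix.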

Verify that Cloud Logging write API quotas have not been reached

Confirm that you have not reached API write quotas for Cloud Logging.

  1. Go to the Quotas page in the Google Cloud console.

    Go to Quotas

  2. Filter the table by "Cloud Logging API".

  3. Confirm that you have not reached any of the quotas.

Debugging GKE logging issues with gcpdiag

If you are missing or getting incomplete logs from your GKE cluster, use the gcpdiag tool for troubleshooting.

gcpdiag is an open source tool. It is not an officially supported Google Cloud product. You can use the gcpdiag tool to help you identify and fix Google Cloud project issues. For more information, see the gcpdiag project on GitHub.

When logs from the GKE cluster are missing or incomplete, investigate potential causes by focusing on the following core configuration settings that are essential for proper logging functions:

  • Project-Level Logging: Ensures that the Google Cloud project housing the GKE cluster has the Cloud Logging API enabled.
  • Cluster-Level Logging: Verifies that logging is explicitly enabled within the configuration of the GKE cluster.
  • Node Pool Permissions: Confirms that the nodes within the cluster's node pools have the 'Cloud Logging Write' scope enabled, allowing them to send log data.
  • Service Account Permissions: Validates that the service account used by the node pools possesses the necessary IAM permissions to interact with Cloud Logging. Specifically, the 'roles/logging.logWriter' role is typically required.
  • Cloud Logging API Write Quotas: Verifies that Cloud Logging API Write quotas have not been exceeded within the specified timeframe.

Docker

You can run gcpdiag using a wrapper that starts gcpdiag in a Docker container. Docker or Podman must be installed.

  1. Copy and run the following command on your local workstation.
    curl https://gcpdiag.dev/gcpdiag.sh >gcpdiag && chmod +x gcpdiag
  2. Execute the gcpdiag command.
    ./gcpdiag runbook gke/logs \
        --parameter project_id=PROJECT_ID \
        --parameter name=GKE_NAME \
        --parameter location=LOCATION

View available parameters for this runbook.

Replace the following:

  • PROJECT_ID : The ID of the project containing the resource.
  • GKE_NAME : The name of the GKE cluster.
  • LOCATION : The zone or region of the GKE cluster.

Useful flags:

For a list and description of all gcpdiag tool flags, see the gcpdiag usage instructions.
