Automate data transfer from Cloud Storage to Parallelstore using GKE Volume Populator


GKE Volume Populator is available by invitation only. If you'd like to request access to GKE Volume Populator in your Google Cloud project, contact your sales representative.

This guide describes how you can preload large amounts of data from a Cloud Storage bucket to a Google Kubernetes Engine (GKE) Parallelstore volume during dynamic provisioning using GKE Volume Populator. For more information, see About GKE Volume Populator .

This guide is for storage specialists who create and allocate storage, and manage data security and access. To learn more about common roles and example tasks that we reference in Google Cloud content, see Common GKE Enterprise user roles and tasks .

Limitations

  • The GCPDataSource custom resource must be in the same namespace as your Kubernetes workload. Volumes with cross-namespace data sources are not supported.
  • GKE Volume Populator only supports Workload Identity Federation for GKE binding of IAM service accounts to a Kubernetes service account. Granting IAM permissions to the Kubernetes service account directly is not supported.

Before you begin

Before you start, make sure that you have performed the following tasks:

  • Enable the Parallelstore API and the Google Kubernetes Engine API.
  • If you want to use the Google Cloud CLI for this task, install and then initialize the gcloud CLI. If you previously installed the gcloud CLI, get the latest version by running gcloud components update .
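
If you use the gcloud CLI, you can enable both APIs from the command line. The following is a minimal sketch that assumes you have already authenticated and selected a project:

    gcloud services enable parallelstore.googleapis.com container.googleapis.com \
        --project=PROJECT_ID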

Requirements

To use GKE Volume Populator, your clusters must meet the following requirements:

  • Use GKE cluster version 1.31.1-gke.1729000 or later.
  • Have the Parallelstore CSI driver enabled. GKE enables the CSI driver for you by default on new and existing GKE Autopilot clusters. On new and existing Standard clusters, you need to enable the CSI driver , for example by using the command shown after this list.
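
On an existing Standard cluster, one way to enable the driver is with a cluster update command. This is a sketch; confirm the add-on name and flags against the current gcloud reference for your version:

    gcloud container clusters update CLUSTER_NAME \
        --update-addons=ParallelstoreCsiDriver=ENABLED \
        --location=CLUSTER_LOCATION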

Prepare your environment

This section covers the steps to create your GKE clusters and set up the necessary permissions to use GKE Volume Populator.

Set up your VPC network

You must specify the same Virtual Private Cloud (VPC) network when creating the Parallelstore instance and client Compute Engine VMs or GKE clusters. To enable VPC to privately connect to Google Cloud services without exposing traffic to the public internet, you need to do a one-time configuration of private services access (PSA), if you have not already done so.

To configure PSA, follow these steps:

  1. To set up network peering for your project, you need the Compute Network Admin (roles/compute.networkAdmin) IAM role.

    To grant the role, run the following command:

     gcloud projects add-iam-policy-binding PROJECT_ID \
         --member="user:EMAIL_ADDRESS" \
         --role=roles/compute.networkAdmin

    Replace EMAIL_ADDRESS with your email address.

  2. Enable service networking:

     gcloud services enable servicenetworking.googleapis.com
  3. Create a VPC network:

     gcloud compute networks create NETWORK_NAME \
         --subnet-mode=auto \
         --mtu=8896 \
         --project=PROJECT_ID

    Replace the following:

    • NETWORK_NAME : the name of the VPC network where you will create your Parallelstore instance.
    • PROJECT_ID : your Google Cloud project ID .
  4. Create an IP range.

    Private services access requires an IP address range (CIDR block) with a prefix length of at least /24 (256 addresses). Parallelstore reserves 64 addresses per instance, which means that you can re-use this IP range with other services or other Parallelstore instances if needed.

     gcloud compute addresses create IP_RANGE_NAME \
         --global \
         --purpose=VPC_PEERING \
         --prefix-length=24 \
         --description="Parallelstore VPC Peering" \
         --network=NETWORK_NAME \
         --project=PROJECT_ID

     Replace IP_RANGE_NAME with a name for the IP address range.

  5. Set an environment variable with the CIDR range associated with the range you created in the previous step:

     CIDR_RANGE=$(
       gcloud compute addresses describe IP_RANGE_NAME \
         --global \
         --format="value[separator=/](address, prefixLength)" \
         --project=PROJECT_ID \
     )
  6. Create a firewall rule to allow TCP traffic from the IP range you created:

     gcloud compute firewall-rules create FIREWALL_NAME \
         --allow=tcp \
         --network=NETWORK_NAME \
         --source-ranges=$CIDR_RANGE \
         --project=PROJECT_ID

    Replace FIREWALL_NAME with the name of the firewall rule to allow TCP traffic from the IP range that you created.

  7. Connect the peering:

     gcloud services vpc-peerings connect \
         --network=NETWORK_NAME \
         --ranges=IP_RANGE_NAME \
         --project=PROJECT_ID \
         --service=servicenetworking.googleapis.com

If you encounter issues while you set up the VPC network, check the Parallelstore troubleshooting guide .

Create your GKE cluster

We recommend that you use an Autopilot cluster for a fully managed Kubernetes experience. To choose the GKE mode of operation that's the best fit for your workload needs, see Choose a GKE mode of operation .

Autopilot

To create a GKE cluster using Autopilot, run the following command:

gcloud container clusters create-auto CLUSTER_NAME \
    --network=NETWORK_NAME \
    --cluster-version=CLUSTER_VERSION \
    --location=CLUSTER_LOCATION

GKE enables Workload Identity Federation for GKE and the Parallelstore CSI Driver by default in Autopilot clusters.

Replace the following values:

  • CLUSTER_NAME : the name of your cluster.
  • CLUSTER_VERSION : the GKE version number. You must specify 1.31.1-gke.1729000 or later.
  • NETWORK_NAME : the name of the VPC network you created for the Parallelstore instance. To learn more, see Configure a VPC network .
  • CLUSTER_LOCATION : the region where you want to create your cluster. For best performance, we recommend that you create the cluster in a supported Parallelstore location. If you create your cluster in a location that Parallelstore doesn't support, you must specify a custom topology that uses a supported Parallelstore location when you create the Parallelstore StorageClass; otherwise, provisioning fails.

Standard

Create a Standard cluster with the Parallelstore CSI Driver and Workload Identity Federation for GKE enabled by using the following command:

gcloud container clusters create CLUSTER_NAME \
    --addons=ParallelstoreCsiDriver \
    --cluster-version=CLUSTER_VERSION \
    --workload-pool=PROJECT_ID.svc.id.goog \
    --network=NETWORK_NAME \
    --location=CLUSTER_LOCATION

Replace the following values:

  • CLUSTER_NAME : the name of your cluster.
  • CLUSTER_VERSION : the GKE version number. You must specify 1.31.1-gke.1729000 or later.
  • PROJECT_ID : your Google Cloud project ID .
  • NETWORK_NAME : the name of the VPC network you created for the Parallelstore instance. To learn more, see Configure a VPC network .
  • CLUSTER_LOCATION : the region or zone where you want to create your cluster. For best performance, we recommend that you create the cluster in a supported Parallelstore location. If you create your cluster in a location that Parallelstore doesn't support, you must specify a custom topology that uses a supported Parallelstore location when you create the Parallelstore StorageClass; otherwise, provisioning fails.

Set up necessary permissions

To transfer data from a Cloud Storage bucket, you need to set up permissions for Workload Identity Federation for GKE .

  1. Create a Kubernetes namespace:

     kubectl create namespace NAMESPACE

    Replace NAMESPACE with the namespace that your workloads will run on.

  2. Create a Kubernetes service account .

     kubectl create serviceaccount KSA_NAME \
         --namespace=NAMESPACE

    Replace KSA_NAME with the name of the Kubernetes service account that your Pod uses to authenticate to Google Cloud APIs.

  3. Create an IAM service account. You can also use any existing IAM service account in any project in your organization:

     gcloud iam service-accounts create IAM_SA_NAME \
         --project=PROJECT_ID

    Replace the following:

    • IAM_SA_NAME : the name for your IAM service account.
    • PROJECT_ID : your Google Cloud project ID .
  4. Grant your IAM service account the role roles/storage.objectViewer so that it can access your Cloud Storage bucket:

     gcloud storage buckets add-iam-policy-binding gs://GCS_BUCKET \
         --member "serviceAccount:IAM_SA_NAME@PROJECT_ID.iam.gserviceaccount.com" \
         --role "roles/storage.objectViewer"

    Replace GCS_BUCKET with your Cloud Storage bucket name.

  5. Create the IAM allow policy that gives the Kubernetes service account access to impersonate the IAM service account:

     gcloud iam service-accounts add-iam-policy-binding IAM_SA_NAME@PROJECT_ID.iam.gserviceaccount.com \
         --role roles/iam.workloadIdentityUser \
         --member "serviceAccount:PROJECT_ID.svc.id.goog[NAMESPACE/KSA_NAME]"
  6. Annotate the Kubernetes service account so that GKE sees the link between the service accounts.

     kubectl annotate serviceaccount KSA_NAME \
         --namespace NAMESPACE \
         iam.gke.io/gcp-service-account=IAM_SA_NAME@PROJECT_ID.iam.gserviceaccount.com
  7. Create the Parallelstore service identity:

     gcloud beta services identity create \
         --service=parallelstore.googleapis.com \
         --project=PROJECT_ID
  8. To allow the Parallelstore service identity to impersonate the IAM service account, grant the roles/iam.serviceAccountTokenCreator role to the Parallelstore service identity. Set the PROJECT_NUMBER environment variable so you can use it in subsequent steps.

     export PROJECT_NUMBER=$(gcloud projects describe PROJECT_ID --format="value(projectNumber)")
     gcloud iam service-accounts add-iam-policy-binding "IAM_SA_NAME@PROJECT_ID.iam.gserviceaccount.com" \
         --member=serviceAccount:"service-${PROJECT_NUMBER?}@gcp-sa-parallelstore.iam.gserviceaccount.com" \
         --role=roles/iam.serviceAccountTokenCreator

    The PROJECT_NUMBER value is the automatically generated unique identifier for your project. To find this value, refer to Creating and managing projects .

  9. To allow the Parallelstore service identity access to all the resources that the IAM service account can access, grant the roles/iam.serviceAccountUser role to the Parallelstore service identity:

     gcloud iam service-accounts add-iam-policy-binding "IAM_SA_NAME@PROJECT_ID.iam.gserviceaccount.com" \
         --member=serviceAccount:"service-${PROJECT_NUMBER?}@gcp-sa-parallelstore.iam.gserviceaccount.com" \
         --role=roles/iam.serviceAccountUser
  10. To allow the GKE service identity to access all the resources that the IAM service account can access, grant the roles/iam.serviceAccountUser role to the GKE service identity. This step is not required if the GKE cluster and the IAM service account are in the same project.

     gcloud iam service-accounts add-iam-policy-binding "IAM_SA_NAME@PROJECT_ID.iam.gserviceaccount.com" \
         --member=serviceAccount:"service-${PROJECT_NUMBER?}@container-engine-robot.iam.gserviceaccount.com" \
         --role=roles/iam.serviceAccountUser

Create a Parallelstore volume with preloaded data

The following sections describe the typical process for creating a Parallelstore volume with data preloaded from a Cloud Storage bucket, using the GKE Volume Populator.

  1. Create a GCPDataSource resource .
  2. Create a Parallelstore StorageClass .
  3. Create a PersistentVolumeClaim to access the volume .
  4. (Optional) View the data transfer progress .
  5. Create a workload that consumes the volume .

Create a GCPDataSource resource

To use GKE Volume Populator, create a GCPDataSource custom resource. This resource defines the source storage properties to use for volume population.

  1. Save the following manifest in a file named gcpdatasource.yaml .

     apiVersion: datalayer.gke.io/v1
     kind: GCPDataSource
     metadata:
       name: GCP_DATA_SOURCE
       namespace: NAMESPACE
     spec:
       cloudStorage:
         serviceAccountName: KSA_NAME
         uri: gs://GCS_BUCKET/

    Replace the following values:

    • GCP_DATA_SOURCE : the name of the GCPDataSource CRD that holds a reference to your Cloud Storage bucket. See the GCPDataSource CRD reference for more details.
    • NAMESPACE : the namespace that your workloads will run on. The namespace value should be the same as your workload namespace.
    • KSA_NAME : the name of the Kubernetes service account that your Pod uses to authenticate to Google Cloud APIs. The cloudStorage.serviceAccountName value should be the Kubernetes service account you set up for Workload Identity Federation for GKE in the Set up necessary permissions step.
    • GCS_BUCKET : your Cloud Storage bucket name. Alternatively, you can also specify gs:// GCS_BUCKET / PATH_INSIDE_BUCKET / for the uri field.
  2. Create the GCPDataSource resource by running this command:

     kubectl apply -f gcpdatasource.yaml

Create a Parallelstore StorageClass

Create a StorageClass to direct the Parallelstore CSI driver to provision Parallelstore instances in the same region as your GKE cluster. This helps to ensure optimal I/O performance.

  1. Save the following manifest as parallelstore-class.yaml .

     apiVersion: storage.k8s.io/v1
     kind: StorageClass
     metadata:
       name: parallelstore-class
     provisioner: parallelstore.csi.storage.gke.io
     volumeBindingMode: Immediate
     reclaimPolicy: Delete
  2. Create the StorageClass by running this command:

     kubectl apply -f parallelstore-class.yaml

If you want to create a custom StorageClass with a specific topology, refer to the Parallelstore CSI guide .

Create a PersistentVolumeClaim to access the volume

The following manifest file shows an example of how to create a PersistentVolumeClaim in ReadWriteMany access mode that references the StorageClass you created earlier.

  1. Save the following manifest in a file named volume-populator-pvc.yaml :

     apiVersion: v1
     kind: PersistentVolumeClaim
     metadata:
       name: PVC_NAME
       namespace: NAMESPACE
     spec:
       accessModes:
       - ReadWriteMany
       storageClassName: parallelstore-class
       resources:
         requests:
           storage: 12Gi
       dataSourceRef:
         apiGroup: datalayer.gke.io
         kind: GCPDataSource
         name: GCP_DATA_SOURCE

    Replace the following values:

    • PVC_NAME : the name of the PersistentVolumeClaim where you want to transfer your data. The PersistentVolumeClaim must be backed by a Parallelstore instance.
    • NAMESPACE : the namespace where your workloads will run. The namespace value should be the same as your workload namespace.
    • GCP_DATA_SOURCE : the name of the GCPDataSource CRD that holds a reference to your Cloud Storage bucket. For more details, see the GCPDataSource CRD reference .
  2. Create the PersistentVolumeClaim by running the following command:

     kubectl apply -f volume-populator-pvc.yaml

GKE won't schedule the workload Pod until the PersistentVolumeClaim provisioning is complete. To check on your data transfer progress, see View the data transfer progress . If you encounter errors during provisioning, refer to Troubleshooting .
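
If you script this workflow, you can block until the claim is bound rather than polling manually. The following is a minimal sketch using standard kubectl; the timeout value is a placeholder that you should size to your expected transfer time:

    # Block until the PersistentVolumeClaim reports the Bound phase.
    kubectl wait --for=jsonpath='{.status.phase}'=Bound \
        pvc/PVC_NAME -n NAMESPACE --timeout=6h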

(Optional) View the data transfer progress

This section shows how you can track the progress of your data transfers from a Cloud Storage bucket to a Parallelstore volume. You can do this to monitor the status of your transfer and ensure that your data is copied successfully. You should also run this command if your PersistentVolumeClaim binding operation is taking too long.

  1. Verify the status of your PersistentVolumeClaim by running the following command:

     kubectl describe pvc PVC_NAME -n NAMESPACE
  2. Check the PersistentVolumeClaim events message to find the progress of the data transfer. GKE logs the messages about once per minute. The output is similar to the following:

     Reason                          Message
    ------                          -------
    PopulateOperationStartSuccess   Populate operation started
    PopulateOperationStartSuccess   Populate operation started
    Provisioning                    External provisioner is provisioning volume for claim "my-namespace/my-pvc"
    Provisioning                    Assuming an external populator will provision the volume
    ExternalProvisioning            Waiting for a volume to be created either by the external provisioner 'parallelstore.csi.storage.gke.io' or manually by the system administrator. If volume creation is delayed, please verify that the provisioner is running and correctly registered.
    PopulateOperationStartSuccess   Populate operation started
    PopulatorPVCCreationProgress    objects found 7, objects copied 7, objects skipped 0. bytes found 1000020010, bytes copied 1000020010, bytes skipped 0
    PopulateOperationFinished       Populate operation finished
    PopulatorFinished               Populator finished 
    

It can take some time for the populate operation to start, and the transfer duration depends on the amount of data being transferred. If you don't see any progress in the data transfer after several minutes, refer to the Troubleshooting section.
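
To stream these progress events instead of re-running kubectl describe, you can watch the events for your claim. This is a sketch using standard kubectl field selectors:

    # Stream events emitted for the PersistentVolumeClaim, including transfer progress messages.
    kubectl get events -n NAMESPACE --watch \
        --field-selector involvedObject.kind=PersistentVolumeClaim,involvedObject.name=PVC_NAME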

Create a workload that consumes the volume

This section shows an example of how to create a Pod that consumes the PersistentVolumeClaim resource you created earlier.

  1. Save the following YAML manifest for your Pod as pod.yaml .

     apiVersion: v1
     kind: Pod
     metadata:
       name: POD_NAME
       namespace: NAMESPACE
     spec:
       volumes:
       - name: parallelstore-volume
         persistentVolumeClaim:
           claimName: PVC_NAME
       containers:
       - image: nginx
         name: nginx
         volumeMounts:
         - name: parallelstore-volume
           mountPath: /mnt/data

    Replace the following values:

    • POD_NAME : the name of the Pod that runs your workload.
    • NAMESPACE : the namespace where your workloads will run. The namespace value should be the same as your workload namespace.
    • PVC_NAME : the name of the PersistentVolumeClaim where you want to transfer your data. The PersistentVolumeClaim must be backed by a Parallelstore instance.
  2. Run the following command to apply the manifest to the cluster:

     kubectl apply -f pod.yaml
  3. Check the status of your Pod and wait until its status is RUNNING . Your PersistentVolumeClaim should be bound before the workload can run.

     kubectl describe pod POD_NAME -n NAMESPACE
  4. Verify that the files were successfully transferred and can be accessed by your workload.

     kubectl exec -it POD_NAME -n NAMESPACE -c nginx -- /bin/sh

    Change to the /mnt/data directory and run ls :

     cd /mnt/data
     ls

    The output should list all the files that exist in your Cloud Storage bucket URI.
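
To cross-check the transfer against the source bucket, you can compare object counts. This is a rough sketch; the recursive bucket listing can include directory placeholders, so treat the counts as approximate:

    # Count objects in the source bucket (run from your workstation).
    gcloud storage ls --recursive gs://GCS_BUCKET/ | wc -l

    # Count files under the mount point inside the Pod.
    kubectl exec POD_NAME -n NAMESPACE -c nginx -- sh -c 'find /mnt/data -type f | wc -l'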

Delete a PersistentVolumeClaim during dynamic provisioning

If you need to delete your PersistentVolumeClaim while data is still being transferred during dynamic provisioning, you have two options: graceful deletion and forced deletion.

Graceful deletion requires less effort, but can be more time-consuming and doesn't account for user misconfiguration that prevents the data transfer from completing. Forced deletion is a faster alternative that gives you greater flexibility and control; this option is suitable when you need to quickly restart or correct misconfigurations.

Graceful deletion

Use this deletion option to ensure that the data transfer process is completed before GKE deletes the associated resources.

  1. Delete the workload Pod, if it exists, by running this command:

     kubectl delete pod POD_NAME -n NAMESPACE
  2. Find the name of the temporary PersistentVolumeClaim:

     PVC_UID=$(kubectl get pvc PVC_NAME -n NAMESPACE -o yaml | grep uid | awk '{print $2}')
     TEMP_PVC=prime-$PVC_UID
     echo $TEMP_PVC
  3. Find the name of the PersistentVolume:

     PV_NAME=$(kubectl describe pvc ${TEMP_PVC?} -n gke-managed-volumepopulator | grep "Volume:" | awk '{print $2}')
     echo ${PV_NAME?}

    If the output is empty, that means that the PersistentVolume has not been created yet.

  4. Delete your PersistentVolumeClaim by running this command.

     kubectl delete pvc PVC_NAME -n NAMESPACE

    Wait for data transfer to complete. GKE will eventually delete the PersistentVolumeClaim, PersistentVolume, and Parallelstore instance.

  5. Check that the temporary PersistentVolumeClaim, PersistentVolumeClaim, and PersistentVolume resources are deleted:

     kubectl get pvc,pv -A | grep -E "${TEMP_PVC?}|PVC_NAME|${PV_NAME?}"
  6. Check that the Parallelstore instance is deleted. The Parallelstore instance will share the same name as the PersistentVolume.

     gcloud beta parallelstore instances list \
         --project=PROJECT_ID \
         --location=- | grep ${PV_NAME?}

Forced deletion

Use this deletion option when you need to delete a PersistentVolumeClaim and its associated resources before the data transfer process is complete. You might need to use this option in situations where the data transfer is taking too long or has encountered errors, or if you need to reclaim resources quickly.

  1. Delete the workload Pod if it exists:

     kubectl delete pod POD_NAME -n NAMESPACE
  2. Update the PersistentVolume reclaim policy to Delete . This setting ensures that the PersistentVolume, along with the underlying storage, is automatically deleted when the associated PersistentVolumeClaim is deleted.

    Skip the following command if any of the following apply:

    • You don't want to delete the PersistentVolume or the underlying storage.
    • Your current reclaim policy is Retain and you want to keep the underlying storage. Clean up the PersistentVolume and storage instance manually as needed.
    • The following echo $PV_NAME command outputs an empty string, which means that the PersistentVolume has not been created yet.

       PV_NAME=$(kubectl describe pvc $TEMP_PVC -n gke-managed-volumepopulator | grep "Volume:" | awk '{print $2}')
       echo $PV_NAME
       kubectl patch pv $PV_NAME -p '{"spec":{"persistentVolumeReclaimPolicy":"Delete"}}'
  3. Find the name of the temporary PersistentVolumeClaim and set the environment variable for a later step:

     PVC_UID=$(kubectl get pvc PVC_NAME -n NAMESPACE -o yaml | grep uid | awk '{print $2}')
     TEMP_PVC=prime-$PVC_UID
     echo $TEMP_PVC
  4. Delete the PersistentVolumeClaim by running this command. A finalizer blocks the deletion, so the command hangs. Press Control+C , then move on to the next step.

     kubectl delete pvc PVC_NAME -n NAMESPACE
  5. Remove the datalayer.gke.io/populate-target-protection finalizer from your PersistentVolumeClaim. Perform this step only after you run the delete command; otherwise, gke-volume-populator adds the finalizer back to the PersistentVolumeClaim.

     kubectl get pvc PVC_NAME -n NAMESPACE -o=json | \
         jq '.metadata.finalizers = null' | kubectl apply -f -
  6. Delete the temporary PersistentVolumeClaim in the gke-managed-volumepopulator namespace.

     kubectl delete pvc $TEMP_PVC -n gke-managed-volumepopulator
  7. Check that the temporary PersistentVolumeClaim, PersistentVolumeClaim, and PersistentVolume resources are deleted:

     kubectl get pvc,pv -A | grep -E "${TEMP_PVC?}|PVC_NAME|${PV_NAME?}"
  8. Check that the Parallelstore instance is deleted. The Parallelstore instance will share the same name as the PersistentVolume.

     gcloud beta parallelstore instances list \
         --project=PROJECT_ID \
         --location=- | grep ${PV_NAME?}

Troubleshooting

This section shows you how to resolve issues related to GKE Volume Populator.

Before proceeding, run the following command to check for PersistentVolumeClaim event warnings:

kubectl describe pvc PVC_NAME -n NAMESPACE

Error: An internal error has occurred

If you encounter the following error, this indicates that a Parallelstore API internal error has occurred.

 Warning  PopulateOperationStartError  gkevolumepopulator-populator  Failed to start populate operation: populate data for PVC "xxx". Import data failed, error: rpc error: code = Internal desc = An internal error has occurred ("xxx") 

To resolve this issue, you'll need to follow these steps to gather data for Support:

  1. Run the following commands to get the name of the temporary PersistentVolumeClaim, replacing placeholders with the actual names:

     PVC_UID=$(kubectl get pvc PVC_NAME -n NAMESPACE -o yaml | grep uid | awk '{print $2}')
     TEMP_PVC=prime-${PVC_UID?}
     echo ${TEMP_PVC?}
  2. Run the following command to get the volume name:

     PV_NAME=$(kubectl describe pvc ${TEMP_PVC?} -n gke-managed-volumepopulator | grep "Volume:" | awk '{print $2}')
  3. Contact the support team with the error message, your project name, and the volume name.

Permission issues

If you encounter errors like the following during volume population, it indicates GKE encountered a permissions problem:

  • Cloud Storage bucket doesn't exist: PopulateOperationStartError with code = PermissionDenied
  • Missing permissions on the Cloud Storage bucket or service accounts: PopulateOperationFailed with "code: "xxx" message:"Verify if bucket "xxx" exists and grant access" .
  • Service account not found: PopulateOperationStartError with code = Unauthenticated .

To resolve these errors, double-check the following:

  • Cloud Storage bucket access: verify the bucket exists and the service account has the roles/storage.objectViewer permission .
  • Service accounts: confirm both the Kubernetes service account and the IAM service account exist and are correctly linked.
  • Parallelstore service account: ensure that the Parallelstore service account exists and has the necessary permissions ( roles/iam.serviceAccountTokenCreator and roles/iam.serviceAccountUser on the IAM account).

For detailed steps and verification commands, refer to Set up necessary permissions . If errors persist, contact support with the error message, your project name, and the Cloud Storage bucket name.
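
As a quick spot check of the bucket-level binding, you can print the bucket's IAM policy and look for your IAM service account. This is a sketch using the same gcloud storage command family used earlier in this guide:

    # List the bucket's IAM bindings and filter for the IAM service account.
    gcloud storage buckets get-iam-policy gs://GCS_BUCKET --format=json \
        | grep "IAM_SA_NAME@PROJECT_ID.iam.gserviceaccount.com"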

Invalid argument errors

If you encounter InvalidArgument errors, it means you've likely provided incorrect values in either the GCPDataSource resource or PersistentVolumeClaim. The error log will pinpoint the exact fields containing the invalid data. Check your Cloud Storage bucket URI and other relevant fields for accuracy.
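
To confirm the values that the populator actually reads, you can print the GCPDataSource resource and re-check its serviceAccountName and uri fields. This sketch assumes the CRD's plural resource name is gcpdatasources:

    kubectl get gcpdatasources.datalayer.gke.io GCP_DATA_SOURCE -n NAMESPACE -o yaml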

Verify that the PersistentVolumeClaim provisioning completed

GKE Volume Populator uses a temporary PersistentVolumeClaim in the gke-managed-volumepopulator namespace for volume provisioning.

The temporary PersistentVolumeClaim is essentially a snapshot of your PersistentVolumeClaim that is still in transit (waiting for data to be fully loaded). Its name has the format prime- YOUR_PVC_UID .

To check its status:

  1. Run the following commands:

     PVC_UID=$(kubectl get pvc PVC_NAME -n NAMESPACE -o yaml | grep uid | awk '{print $2}')
     TEMP_PVC=prime-$PVC_UID
     echo $TEMP_PVC
     kubectl describe pvc ${TEMP_PVC?} -n gke-managed-volumepopulator

    If the output is empty, this means the temporary PersistentVolumeClaim was not created. Run the following command to check for PersistentVolumeClaim event warnings:

     kubectl describe pvc PVC_NAME -n NAMESPACE

    If provisioning is successful, the output is similar to the following. Look for the ProvisioningSucceeded log:

     Warning  ProvisioningFailed     9m12s                   parallelstore.csi.storage.gke.io_gke-10fedd76bae2494db688-2237-793f-vm_5f284e53-b25c-46bb-b231-49e894cbba6c  failed to provision volume with StorageClass "parallelstore-class": rpc error: code = DeadlineExceeded desc = context deadline exceeded
    Warning  ProvisioningFailed     3m41s (x11 over 9m11s)  parallelstore.csi.storage.gke.io_gke-10fedd76bae2494db688-2237-793f-vm_5f284e53-b25c-46bb-b231-49e894cbba6c  failed to provision volume with StorageClass "parallelstore-class": rpc error: code = DeadlineExceeded desc = Volume pvc-808e41a4-b688-4afe-9131-162fe5d672ec not ready, current state: CREATING
    Normal   ExternalProvisioning   3m10s (x43 over 13m)    persistentvolume-controller                                                                                  Waiting for a volume to be created either by the external provisioner 'parallelstore.csi.storage.gke.io' or manually by the system administrator. If volume creation is delayed, please verify that the provisioner is running and correctly registered.
    Normal  Provisioning  8s (x13 over 10m)  "xxx"  External provisioner is provisioning volume for claim "xxx"
    Normal  ProvisioningSucceeded  7s  "xxx"  Successfully provisioned volume "xxx" 
    
  2. Check if the Parallelstore instance creation has started.

     gcloud beta parallelstore instances list \
         --project=PROJECT_ID \
         --location=-

    The output is similar to the following. Verify that your volume is in the CREATING state. When the Parallelstore instance creation is finished, the state will change to ACTIVE .

     "projects/ PROJECT_ID 
    /locations/<my-location>/<my-volume>"  12000  2024-10-09T17:59:42.582857261Z  2024-10-09T17:59:42.582857261Z  CREATING  projects/ PROJECT_ID 
    /global/ NETWORK_NAME 
     
    

If provisioning fails, refer to the Parallelstore troubleshooting guide for additional guidance.

What's next
