Transfer data

Data transfers can occur between the following:

  1. Persistent Volume Claim (PVC) and object storage
  2. Object storage and object storage (within GDC)

Object storage on GDC is S3-compatible and is referred to as the s3 type in Kubernetes YAML files.

Types of data sources/destinations

  1. Object storage (referred to as 's3'): Object storage present on GDC
  2. Local storage (referred to as 'local'): Storage on attached PVCs

Copying from object storage to object storage

Ensure you have the following prerequisites:

  • An S3 endpoint with read permissions for the source, and an S3 endpoint with write permissions for the destination.
  • If your credentials do not have bucket creation permission, the transfer fails when the destination bucket does not exist. In that case, ensure that the destination bucket exists before you start the transfer.
  • Privileges to create Jobs and to create or read Secrets inside your cluster or namespace. See the following steps for an example of the required permissions; a quick self-check follows this list.
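
    As a quick self-check, you can verify these privileges with kubectl auth can-i. This is a minimal sketch; it assumes the transfer-ns namespace that the following steps create:

      kubectl auth can-i create jobs -n transfer-ns
      kubectl auth can-i create secrets -n transfer-ns
      kubectl auth can-i get secrets -n transfer-ns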

Create a job

To create a job, work through these steps:

  1. Create a namespace:

      apiVersion: v1
      kind: Namespace
      metadata:
        name: transfer-ns

  2. Create credentials:

      ---
      apiVersion: v1
      kind: Secret
      metadata:
        name: src-secret
        namespace: transfer-ns
      data:
        access-key-id: NkFDTUg3WDBCVDlQMVpZMU5MWjU= # base64-encoded version of the key
        access-key: VkRkeWJsbFgzb2FZanMvOVpnSi83SU5YUjk3Y0Q2TUdxZ2d4Q3dpdw== # base64-encoded version of the secret key
      ---
      apiVersion: v1
      kind: Secret
      metadata:
        name: dst-secret
        namespace: transfer-ns
      data:
        access-key-id: NkFDTUg3WDBCVDlQMVpZMU5MWjU= # base64-encoded version of the key
        access-key: VkRkeWJsbFgzb2FZanMvOVpnSi83SU5YUjk3Y0Q2TUdxZ2d4Q3dpdw== # base64-encoded version of the secret key
      ---


    These are the same credentials that you obtained in the object storage section.
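
    If you prefer to generate the values from the command line, the following sketch shows one way to do it. The file name secrets.yaml and the literal key values are placeholders; the Secret key names (access-key-id, access-key) must stay as shown because the transfer tool looks them up by name.

      # Base64-encode the credentials for the Secret manifest
      # (-n avoids encoding a trailing newline).
      echo -n 'YOUR_ACCESS_KEY_ID' | base64 -w0
      echo -n 'YOUR_SECRET_ACCESS_KEY' | base64 -w0

      # Apply the Secret manifests created above.
      kubectl apply -f secrets.yaml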

  3. Create a service account (SA) for your transfer to use, and then grant the account permission to read Secrets using roles and role bindings, as in the following example. You do not need to add the permissions if your default namespace SA or a custom SA already has them.

      ---
      apiVersion: v1
      kind: ServiceAccount
      metadata:
        name: transfer-service-account
        namespace: transfer-ns
      ---
      apiVersion: rbac.authorization.k8s.io/v1
      kind: Role
      metadata:
        name: read-secrets-role
        namespace: transfer-ns
      rules:
      - apiGroups: [""]
        resources: ["secrets"]
        verbs: ["get", "watch", "list"]
      ---
      apiVersion: rbac.authorization.k8s.io/v1
      kind: RoleBinding
      metadata:
        name: read-secrets-rolebinding
        namespace: transfer-ns
      subjects:
      - kind: ServiceAccount
        name: transfer-service-account
        namespace: transfer-ns
      roleRef:
        kind: Role
        name: read-secrets-role
        apiGroup: rbac.authorization.k8s.io
      ---
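
     To confirm that the role binding took effect, you can impersonate the service account with kubectl auth can-i (a quick check, assuming the names above):

      kubectl auth can-i get secrets -n transfer-ns \
        --as=system:serviceaccount:transfer-ns/transfer-service-account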
     
    
  4. Obtain the CA certificates for your object storage systems. You can obtain these certificates from your AO/PA.

      ---
      apiVersion: v1
      kind: Secret
      metadata:
        name: src-cert
        namespace: transfer-ns
      data:
        ca.crt: LS0tLS1CRUdJTiBDRVJUSUZJQ0FURS0tLS0tCk1JSURBekNDQWV1Z0F3SUJBZ0lSQUpHM2psOFZhTU85a1FteGdXUFl3N3d3RFFZSktvWklodmNOQVFFTEJRQXcKR3pFWk1CY0dBMVVFQXhNUVltOXZkSE4wY21Gd0xYZGxZaTFqWVRBZUZ3MHlNekF5TVRVd01USXlNakZhRncweQpNekExTVRZd01USXlNakZhTUJzeEdUQVhCZ05WQkFNVEVHSnZiM1J6ZEhKaGNDMTNaV0l0WTJFd2dnRWlNQTBHCkNTcUdTSWI= # base64-encoded version of the certificate
      ---
      apiVersion: v1
      kind: Secret
      metadata:
        name: dst-cert
        namespace: transfer-ns
      data:
        ca.crt: LS0tLS1CRUdJTiBDRVJUSUZJQ0FURS0tLS0tCk1JSURBekNDQWV1Z0F3SUJBZ0lSQUtoaEJXWWo3VGZlUUZWUWo0U0RpckV3RFFZSktvWklodmNOQVFFTEJRQXcKR3pFWk1CY0dBMVVFQXhNUVltOXZkSE4wY21Gd0xYZGxZaTFqWVRBZUZ3MHlNekF6TURZeU16TTROVEJhRncweQpNekEyTURReU16TTROVEJhTUJzeEdUQVhCZ05WQkFNVEVHSnZiM1J6ZEhKaGNDMTNaV0l0WTJFd2dnRWlNQTBHCkNTcUdTSWIzRFFF== # base64-encoded version of the certificate. Can be the same as or different from the source certificate.
      ---
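
     Alternatively, you can create these Secrets directly from the certificate files and let kubectl handle the base64 encoding; the file paths here are placeholders:

      kubectl create secret generic src-cert -n transfer-ns \
        --from-file=ca.crt=/path/to/src-ca.crt
      kubectl create secret generic dst-cert -n transfer-ns \
        --from-file=ca.crt=/path/to/dst-ca.crt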
     
    
  5. Optional: Create a LoggingTarget to see transfer-service logs in Loki.

      apiVersion: logging.gdc.goog/v1
      kind: LoggingTarget
      metadata:
        namespace: transfer-ns # Same namespace as your transfer job
        name: logtarg1
      spec:
        # Choose a matching pattern that identifies the pods for this job
        # Optional
        # Relationship between different selectors: AND
        selector:
          # Choose the pod name prefix(es) to consider for this job.
          # The observability platform scrapes all pods whose names
          # start with the specified prefix(es).
          # Must contain [a-z0-9-] characters only.
          # Relationship between different list elements: OR
          matchPodNames:
          - transfer-job # Choose the prefix here that matches your transfer job name
        serviceName: transfer-service
     
    
  6. Create the job:

      ---
      apiVersion: batch/v1
      kind: Job
      metadata:
        name: transfer-job
        namespace: transfer-ns
      spec:
        template:
          spec:
            serviceAccountName: transfer-service-account # Service account created earlier
            containers:
            - name: storage-transfer-pod
              image: gcr.io/private-cloud-staging/storage-transfer:latest
              imagePullPolicy: Always # Always pulls the latest image
              command:
              - /storage-transfer
              args:
              - '--src_endpoint=objectstorage.zone1.google.gdch.test' # Your endpoint here
              - '--dst_endpoint=objectstorage.zone1.google.gdch.test' # Your endpoint here
              - '--src_path=aecvd-bucket1' # Use the fully qualified name
              - '--dst_path=aklow-bucket2' # Use the fully qualified name
              - '--src_credentials=transfer-ns/src-secret' # Created earlier
              - '--dst_credentials=transfer-ns/dst-secret' # Created earlier
              - '--dst_ca_certificate_reference=transfer-ns/dst-cert' # Created earlier
              - '--src_ca_certificate_reference=transfer-ns/src-cert' # Created earlier
              - '--src_type=s3'
              - '--dst_type=s3'
              - '--bandwidth_limit=10M' # Optional; of the form '10K', '100M', '1G' bytes per second
            restartPolicy: OnFailure # Restarts on failure
      ---
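
     Apply the manifest and watch the Job until it completes; the file name transfer-job.yaml is a placeholder:

      kubectl apply -f transfer-job.yaml
      kubectl get jobs -n transfer-ns -w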
     
    

Monitor your data transfer

After you instantiate the Job, you can monitor its status using kubectl commands, such as kubectl describe. To verify the transfer, list the objects in your destination bucket to validate that your data transferred. The data transfer tool is agnostic to the location of the endpoints involved in the transfer.

Run the following:

  kubectl describe job transfer-job -n transfer-ns
 

The preceding command tells you the status of the job.

The Job creates a Pod to transfer the data. You can get the name of the Pod and look at the logs to see whether there are any errors during the transfer.

To view pod logs, run the following:

  kubectl logs transfer-job-<pod_id_suffix_obtained_from_describe_operation_on_job> -n transfer-ns 
 

Successful job logs:

  DEBUG : Starting main for transfer
  I0607 21:34:39.183106       1 transfer.go:103] "msg"="Starting transfer " "destination"="sample-bucket" "source"="/data"
  2023/06/07 21:34:39 NOTICE: Bandwidth limit set to {100Mi 100Mi}
  I0607 21:34:49.238901       1 transfer.go:305] "msg"="Job finished polling " "Finished"=true "Number of Attempts"=2 "Success"=true
  I0607 21:34:49.239675       1 transfer.go:153] "msg"="Transfer completed." "AvgSpeed"="10 KB/s" "Bytes Moved"="10.0 kB" "Errors"=0 "Files Moved"=10 "FilesComparedAtSourceAndDest"=3 "Time since beginning of transfer"="1.0s"
 

The logs show the data transfer speed (which is not the same as the bandwidth used), the bytes moved, the number of errored files, and the number of files moved.

Copy block storage to object storage

Ensure that you meet the following prerequisites:

  • An S3 endpoint with an S3 key ID and secret access key that have at least WRITE permissions to the dedicated bucket that you want to transfer data to.
  • A working cluster with connectivity to the S3 endpoint.
  • Privileges to create Jobs and Secrets inside your cluster.
  • For replication of block storage, a Pod with an attached PersistentVolumeClaim (PVC) that you want to back up to object storage, and privileges to inspect running Jobs and PVCs.
  • For replication of the block storage, a window during which no writes take place to the PersistentVolume (PV).
  • For the restoration of block storage from an object storage endpoint, privileges to allocate a PV with sufficient capacity.

To replicate a PV to object storage, you must attach a volume to an existing Pod. During the window of the transfer, the Pod must not perform any writes. To avoid detaching the mounted PV from the Job, the data transfer process works by running the transfer Job on the same machine as the Pod, and using a hostPath mount to expose the volume on the disk. In preparation for the transfer, you must first find the node on which the Pod is running, and additional metadata such as the Pod UID and PVC type to reference the appropriate path on the Node. You must substitute this metadata into the sample YAML file outlined in the following section.

To collect the metadata required to create the data transfer Job, work through these steps:

  1. Find the Node that has the scheduled Pod:

     kubectl get pod POD_NAME -o jsonpath='{.spec.nodeName}'
     
    

    Record the output of this command as the NODE_NAME to use in the data transfer Job YAML file.

  2. Find the Pod UID:

     kubectl get pod POD_NAME -o 'jsonpath={.metadata.uid}'
     
    

    Record the output of this command as the POD_UID to use in the data transfer Job YAML file.

  3. Find the PVC name:

     kubectl get pvc www-web-0 -o 'jsonpath={.spec.volumeName}'
     
    

    Record the output of this command as the PVC_NAME to use in the data transfer Job YAML file.

  4. Find the PVC storage provisioner:

     kubectl get pvc www-web-0 -o jsonpath='{.metadata.annotations.volume\.v1\.kubernetes\.io\/storage-provisioner}'
     
    

    Record the output of this command as the PROVISIONER_TYPE to use in the data transfer Job YAML file.
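
If you script these steps, you can capture all four values in one pass and preview the hostPath that the transfer Job mounts. This is a minimal sketch assuming bash, the example PVC www-web-0, and a placeholder Pod name:

  POD_NAME=web-0 # placeholder: your Pod's name
  NODE_NAME=$(kubectl get pod "$POD_NAME" -o jsonpath='{.spec.nodeName}')
  POD_UID=$(kubectl get pod "$POD_NAME" -o jsonpath='{.metadata.uid}')
  PVC_NAME=$(kubectl get pvc www-web-0 -o 'jsonpath={.spec.volumeName}')
  PROVISIONER_TYPE=$(kubectl get pvc www-web-0 -o jsonpath='{.metadata.annotations.volume\.v1\.kubernetes\.io\/storage-provisioner}')
  # Preview the hostPath used by the transfer Job below.
  echo "/var/lib/kubelet/pods/${POD_UID}/volumes/${PROVISIONER_TYPE}/${PVC_NAME}"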

Create secrets

To replicate files to object storage across clusters, you must first instantiate the Secrets inside your Kubernetes cluster. The Secret data must use the matching key names shown below for the tool to pull the credentials.

To perform the transfer in an existing namespace, see the following example of creating Secrets in a transfer namespace:

  apiVersion: v1
  kind: Secret
  metadata:
    name: src-secret
    namespace: transfer
  data:
    access-key-id: c3JjLWtleQ== # echo -n src-key| base64 -w0
    access-key: c3JjLXNlY3JldA== # echo -n src-secret| base64 -w0
  ---
  apiVersion: v1
  kind: Secret
  metadata:
    name: dst-secret
    namespace: transfer
  data:
    access-key-id: ZHN0LWtleQ== # echo -n dst-key| base64 -w0
    access-key: ZHN0LXNlY3JldA== # echo -n dst-secret| base64 -w0
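
Equivalently, you can let kubectl do the encoding with --from-literal; the values shown are the placeholders from the manifest above:

  kubectl create secret generic src-secret -n transfer \
    --from-literal=access-key-id=src-key \
    --from-literal=access-key=src-secret
  kubectl create secret generic dst-secret -n transfer \
    --from-literal=access-key-id=dst-key \
    --from-literal=access-key=dst-secret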
 

Create the Job

With the data that you collected in the previous section, create a Job with the data transfer tool. The data transfer Job has a hostPath mount referencing the path for the PV of interest, and a nodeSelector for the relevant node.

The following is an example of a data transfer Job:

  apiVersion: batch/v1
  kind: Job
  metadata:
    name: transfer-job
    namespace: transfer
  spec:
    template:
      spec:
        nodeSelector:
          kubernetes.io/hostname: NODE_NAME # nodeSelector takes a label map; this label matches the node name recorded earlier
        serviceAccountName: data-transfer-sa
        containers:
        - name: storage-transfer-pod
          image: storage-transfer
          command:
          - /storage-transfer
          args:
          - --dst_endpoint=https://your-dst-endpoint.com
          - --src_path=/pvc-data
          - --dst_path=transfer-dst-bucket
          - --dst_credentials=transfer/dst-secret
          - --src_type=local
          - --dst_type=s3
          volumeMounts:
          - mountPath: /pvc-data
            name: pvc-volume
        volumes:
        - name: pvc-volume
          hostPath:
            path: /var/lib/kubelet/pods/POD_UID/volumes/PROVISIONER_TYPE/PVC_NAME
        restartPolicy: Never
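
Apply the Job and wait for it to complete; the file name block-transfer-job.yaml is a placeholder:

  kubectl apply -f block-transfer-job.yaml
  kubectl wait --for=condition=complete job/transfer-job -n transfer --timeout=30m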
 

As with the S3 data transfer, you must create a Secret containing the access keys for the destination endpoint in the Kubernetes cluster, and the data transfer Job must run with a service account that has adequate privileges to read the Secret from the API server. Monitor the status of the transfer with standard kubectl commands operating on the Job.

Consider the following details when transferring block storage to object storage:

  • By default, symbolic links are followed and replicated to object storage; that is, a deep rather than shallow copy is performed. Upon restoration, the symbolic links are destroyed.
  • As with object storage replication, cloning into a subdirectory of the bucket is destructive. Ensure that the bucket is available exclusively for your volume.

Restore from object storage to block storage

Allocate a PV

To restore block storage from an object storage endpoint, follow these steps:

  1. Allocate a persistent volume to target in the restore. Use a PVC to allocate the volume, as shown in the following example:

      apiVersion: v1
      kind: PersistentVolumeClaim
      metadata:
        name: restore-pvc
        namespace: restore-ns
      spec:
        storageClassName: "default"
        accessModes: # accessModes takes a list
        - ReadWriteOnce
        resources:
          requests:
            storage: 1Gi # Needs sufficient capacity for a full restoration.
     
    
  2. Check the status of the PVC:

     kubectl get pvc restore-pvc -n restore-ns
    

    After the PVC is in a Bound state, it is ready to be consumed by the Pod that rehydrates it.
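
    You can also block until the claim binds; kubectl wait supports a jsonpath condition in recent kubectl versions:

     kubectl wait --for=jsonpath='{.status.phase}'=Bound pvc/restore-pvc -n restore-ns --timeout=5m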

  3. If a StatefulSet eventually consumes the PV, you must match the rendered StatefulSet PVC names. The Pods that the StatefulSet produces consume the hydrated volumes. The following example shows volume claim templates in a StatefulSet named ss.

       
      volumeClaimTemplates:
      - metadata:
          name: pvc-name
        spec:
          accessModes: [ "ReadWriteOnce" ]
          storageClassName: "default"
          resources:
            requests:
              storage: 1Gi
    
  4. Pre-allocate PVCs with names such as pvc-name-ss-0 and pvc-name-ss-1 (the claim template name, followed by the StatefulSet name and ordinal, as with www-web-0 earlier) to ensure that the resulting Pods consume the pre-allocated volumes.

Hydrate the PV

After the PVC is bound to a PV, start the Job to populate the PV:

  apiVersion: batch/v1
  kind: Job
  metadata:
    name: transfer-job
    namespace: transfer
  spec:
    template:
      spec:
        serviceAccountName: data-transfer-sa
        volumes:
        - name: data-transfer-restore-volume
          persistentVolumeClaim:
            claimName: restore-pvc
        containers:
        - name: storage-transfer-pod
          image: storage-transfer
          command:
          - /storage-transfer
          args:
          - --src_endpoint=https://your-src-endpoint.com
          - --src_path=/your-src-bucket
          - --src_credentials=transfer/src-secret
          - --dst_path=/restore-pv-mnt-path
          - --src_type=s3
          - --dst_type=local
          volumeMounts:
          - mountPath: /restore-pv-mnt-path
            name: data-transfer-restore-volume
        restartPolicy: Never # A Job's pod template requires Never or OnFailure
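
To confirm the restore, you can follow the Job's logs and look for the "Transfer completed." line shown earlier:

  kubectl logs -f job/transfer-job -n transfer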
 

After the Job has finished running, the data from the object storage bucket populates the volume. A separate Pod can consume the data by using the same standard mechanisms for mounting a volume.
