Spanner to Cloud Storage Text template

The Spanner to Cloud Storage Text template is a batch pipeline that reads data from a Spanner table and writes it to Cloud Storage as CSV text files.

Pipeline requirements

  • The input Spanner table must exist before running the pipeline.

Template parameters

Required parameters

  • spannerTable: The Spanner table to read the data from.
  • spannerProjectId: The ID of the Google Cloud project that contains the Spanner database to read data from.
  • spannerInstanceId: The instance ID of the requested table.
  • spannerDatabaseId: The database ID of the requested table.
  • textWritePrefix: The Cloud Storage path prefix that specifies where the data is written. For example, gs://mybucket/somefolder/.

Optional parameters

Run the template

Console

  1. Go to the Dataflow Create job from template page.
  2. In the Job name field, enter a unique job name.
  3. Optional: For Regional endpoint, select a value from the drop-down menu. The default region is us-central1.

    For a list of regions where you can run a Dataflow job, see Dataflow locations.

  4. From the Dataflow template drop-down menu, select the Cloud Spanner to Text Files on Cloud Storage template.
  5. In the provided parameter fields, enter your parameter values.
  6. Click Run job.

gcloud

In your shell or terminal, run the template:

gcloud dataflow jobs run JOB_NAME \
    --gcs-location gs://dataflow-templates-REGION_NAME/VERSION/Spanner_to_GCS_Text \
    --region REGION_NAME \
    --parameters \
spannerProjectId=SPANNER_PROJECT_ID,\
spannerDatabaseId=DATABASE_ID,\
spannerInstanceId=INSTANCE_ID,\
spannerTable=TABLE_ID,\
textWritePrefix=gs://BUCKET_NAME/output/

Replace the following:

  • JOB_NAME: a unique job name of your choice
  • VERSION: the version of the template that you want to use

    You can use the following values:

      • latest to use the latest version of the template
      • a version name, for example 2023-09-12-00_RC00, to use a specific version of the template

  • REGION_NAME: the region where you want to deploy your Dataflow job, for example us-central1
  • SPANNER_PROJECT_ID: the Google Cloud project ID of the Spanner database from which you want to read data
  • DATABASE_ID: the Spanner database ID
  • BUCKET_NAME: the name of your Cloud Storage bucket
  • INSTANCE_ID: the Spanner instance ID
  • TABLE_ID: the Spanner table ID
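As a concrete illustration, the sketch below assembles the command with hypothetical values (the job name, project, instance, database, table, and bucket are all placeholders you would replace with your own). Building the --parameters string in a variable first keeps long commands readable:

```shell
# Hypothetical values -- substitute your own project, instance,
# database, table, and bucket names before running.
JOB_NAME="spanner-to-gcs-text-example"
REGION_NAME="us-central1"

# Backslash-newline inside double quotes joins the lines into one
# comma-separated parameters string.
PARAMS="spannerProjectId=my-project,\
spannerDatabaseId=my-database,\
spannerInstanceId=my-instance,\
spannerTable=Singers,\
textWritePrefix=gs://my-bucket/output/"

# Print the assembled command; remove the leading 'echo' to actually
# run it (requires the gcloud CLI and appropriate credentials).
echo gcloud dataflow jobs run "$JOB_NAME" \
  --gcs-location "gs://dataflow-templates-${REGION_NAME}/latest/Spanner_to_GCS_Text" \
  --region "$REGION_NAME" \
  --parameters "$PARAMS"
```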

API

To run the template using the REST API, send an HTTP POST request. For more information on the API and its authorization scopes, see projects.templates.launch.

POST https://dataflow.googleapis.com/v1b3/projects/PROJECT_ID/locations/LOCATION/templates:launch?gcsPath=gs://dataflow-templates-LOCATION/VERSION/Spanner_to_GCS_Text
{
  "jobName": "JOB_NAME",
  "parameters": {
    "spannerProjectId": "SPANNER_PROJECT_ID",
    "spannerDatabaseId": "DATABASE_ID",
    "spannerInstanceId": "INSTANCE_ID",
    "spannerTable": "TABLE_ID",
    "textWritePrefix": "gs://BUCKET_NAME/output/"
  },
  "environment": {
    "zone": "us-central1-f"
  }
}

Replace the following:

  • PROJECT_ID: the Google Cloud project ID where you want to run the Dataflow job
  • JOB_NAME: a unique job name of your choice
  • VERSION: the version of the template that you want to use

    You can use the following values:

      • latest to use the latest version of the template
      • a version name, for example 2023-09-12-00_RC00, to use a specific version of the template

  • LOCATION: the region where you want to deploy your Dataflow job, for example us-central1
  • SPANNER_PROJECT_ID: the Google Cloud project ID of the Spanner database from which you want to read data
  • DATABASE_ID: the Spanner database ID
  • BUCKET_NAME: the name of your Cloud Storage bucket
  • INSTANCE_ID: the Spanner instance ID
  • TABLE_ID: the Spanner table ID
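When calling the REST endpoint directly, one common pattern is to keep the request body in a file and send it with curl, authenticating with an access token from the gcloud CLI. All identifiers in this sketch are hypothetical placeholders:

```shell
# Write the launch request body to a file. All values are hypothetical
# placeholders; substitute your own before sending the request.
cat > launch-request.json <<'EOF'
{
  "jobName": "spanner-to-gcs-text-example",
  "parameters": {
    "spannerProjectId": "my-project",
    "spannerDatabaseId": "my-database",
    "spannerInstanceId": "my-instance",
    "spannerTable": "Singers",
    "textWritePrefix": "gs://my-bucket/output/"
  },
  "environment": {
    "zone": "us-central1-f"
  }
}
EOF

# Sanity-check that the file is valid JSON before sending it.
python3 -m json.tool launch-request.json > /dev/null && echo "body OK"

# Uncomment to send the request (requires gcloud credentials):
# curl -X POST \
#   -H "Authorization: Bearer $(gcloud auth print-access-token)" \
#   -H "Content-Type: application/json" \
#   -d @launch-request.json \
#   "https://dataflow.googleapis.com/v1b3/projects/my-project/locations/us-central1/templates:launch?gcsPath=gs://dataflow-templates-us-central1/latest/Spanner_to_GCS_Text"
```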
