Bigtable to Cloud Storage Avro template

The Bigtable to Cloud Storage Avro template is a pipeline that reads data from a Bigtable table and writes it to a Cloud Storage bucket in Avro format. You can use the template to move data from Bigtable to Cloud Storage.

Pipeline requirements

  • The Bigtable table must exist.
  • The output Cloud Storage bucket must exist before running the pipeline.
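
Both requirements can be checked from the command line before you launch the pipeline. The commands below are a minimal sketch, not part of the template; they assume the cbt CLI and the gcloud CLI are installed, and the project, instance, and bucket names are placeholders:

# List the tables in the instance to confirm that the source table exists.
cbt -project=PROJECT_ID -instance=INSTANCE_ID ls

# Confirm that the output bucket exists.
gcloud storage buckets describe gs://BUCKET_NAME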

Template parameters

Required parameters

  • bigtableProjectId: The ID of the Google Cloud project that contains the Bigtable instance that you want to read data from.
  • bigtableInstanceId: The ID of the Bigtable instance that contains the table.
  • bigtableTableId: The ID of the Bigtable table to export.
  • outputDirectory: The Cloud Storage path where data is written. For example, gs://mybucket/somefolder.
  • filenamePrefix: The prefix of the Avro filename. For example, output-. Defaults to: part.

Optional parameters

Run the template

Console

  1. Go to the Dataflow Create job from template page.
  2. In the Job name field, enter a unique job name.
  3. Optional: For Regional endpoint, select a value from the drop-down menu. The default region is us-central1.

    For a list of regions where you can run a Dataflow job, see Dataflow locations.

  4. From the Dataflow template drop-down menu, select the Cloud Bigtable to Avro Files on Cloud Storage template.
  5. In the provided parameter fields, enter your parameter values.
  6. Click Run job.

gcloud

In your shell or terminal, run the template:

gcloud dataflow jobs run JOB_NAME \
    --gcs-location gs://dataflow-templates-REGION_NAME/VERSION/Cloud_Bigtable_to_GCS_Avro \
    --region REGION_NAME \
    --parameters \
bigtableProjectId=BIGTABLE_PROJECT_ID,\
bigtableInstanceId=INSTANCE_ID,\
bigtableTableId=TABLE_ID,\
outputDirectory=OUTPUT_DIRECTORY,\
filenamePrefix=FILENAME_PREFIX

Replace the following:

  • JOB_NAME: a unique job name of your choice
  • VERSION: the version of the template that you want to use

    You can use the following values:

    • latest to use the latest version of the template, which is available in the non-dated parent folder in the bucket: gs://dataflow-templates-REGION_NAME/latest/
    • a version name, to use a specific release of the template, which is nested in the respective dated parent folder in the bucket

  • REGION_NAME: the region where you want to deploy your Dataflow job, for example, us-central1
  • BIGTABLE_PROJECT_ID: the ID of the Google Cloud project of the Bigtable instance that you want to read data from
  • INSTANCE_ID: the ID of the Bigtable instance that contains the table
  • TABLE_ID: the ID of the Bigtable table to export
  • OUTPUT_DIRECTORY: the Cloud Storage path where data is written, for example, gs://mybucket/somefolder
  • FILENAME_PREFIX: the prefix of the Avro filename, for example, output-
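
For example, a complete command might look like the following. Every value in it is a placeholder for illustration (job name, project, instance, table, and bucket path), and it assumes you are running the latest template version in the us-central1 region:

gcloud dataflow jobs run bigtable-avro-export \
    --gcs-location gs://dataflow-templates-us-central1/latest/Cloud_Bigtable_to_GCS_Avro \
    --region us-central1 \
    --parameters \
bigtableProjectId=my-project,\
bigtableInstanceId=my-instance,\
bigtableTableId=my-table,\
outputDirectory=gs://my-bucket/bigtable-export,\
filenamePrefix=output-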

API

To run the template using the REST API, send an HTTP POST request. For more information on the API and its authorization scopes, see projects.templates.launch.

POST https://dataflow.googleapis.com/v1b3/projects/PROJECT_ID/locations/LOCATION/templates:launch?gcsPath=gs://dataflow-templates-LOCATION/VERSION/Cloud_Bigtable_to_GCS_Avro
{
   "jobName": "JOB_NAME",
   "parameters": {
       "bigtableProjectId": "BIGTABLE_PROJECT_ID",
       "bigtableInstanceId": "INSTANCE_ID",
       "bigtableTableId": "TABLE_ID",
       "outputDirectory": "OUTPUT_DIRECTORY",
       "filenamePrefix": "FILENAME_PREFIX"
   },
   "environment": {
       "zone": "us-central1-f"
   }
}

Replace the following:

  • PROJECT_ID: the Google Cloud project ID where you want to run the Dataflow job
  • JOB_NAME: a unique job name of your choice
  • VERSION: the version of the template that you want to use

    You can use the following values:

    • latest to use the latest version of the template, which is available in the non-dated parent folder in the bucket: gs://dataflow-templates-LOCATION/latest/
    • a version name, to use a specific release of the template, which is nested in the respective dated parent folder in the bucket

  • LOCATION: the region where you want to deploy your Dataflow job, for example, us-central1
  • BIGTABLE_PROJECT_ID: the ID of the Google Cloud project of the Bigtable instance that you want to read data from
  • INSTANCE_ID: the ID of the Bigtable instance that contains the table
  • TABLE_ID: the ID of the Bigtable table to export
  • OUTPUT_DIRECTORY: the Cloud Storage path where data is written, for example, gs://mybucket/somefolder
  • FILENAME_PREFIX: the prefix of the Avro filename, for example, output-
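
One way to send this request is with curl. The following is a minimal sketch, assuming the gcloud CLI is installed to mint an access token and that the JSON body shown above is saved locally as request.json (a file name chosen for this example):

curl -X POST \
    -H "Authorization: Bearer $(gcloud auth print-access-token)" \
    -H "Content-Type: application/json" \
    -d @request.json \
    "https://dataflow.googleapis.com/v1b3/projects/PROJECT_ID/locations/LOCATION/templates:launch?gcsPath=gs://dataflow-templates-LOCATION/VERSION/Cloud_Bigtable_to_GCS_Avro"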
