Datastore to Cloud Storage Text template [Deprecated]

This template is deprecated and will be removed in Q3 2023. Please migrate to the Firestore to Cloud Storage Text template.

The Datastore to Cloud Storage Text template is a batch pipeline that reads Datastore entities and writes them to Cloud Storage as text files. You can provide a JavaScript user-defined function (UDF) to process each entity as a JSON string. If you don't provide such a function, every line in the output file is a JSON-serialized entity.

Pipeline requirements

Datastore must be set up in the project before running the pipeline.
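
If you're not sure whether Datastore is set up, one way to check and enable the API from the command line is sketched below. It assumes the gcloud CLI is installed and authenticated; YOUR_PROJECT_ID is a placeholder.

# List the Datastore API if it is already enabled (YOUR_PROJECT_ID is a placeholder).
gcloud services list --enabled --project=YOUR_PROJECT_ID \
    --filter="config.name=datastore.googleapis.com"

# Enable the API if the list above is empty.
gcloud services enable datastore.googleapis.com --project=YOUR_PROJECT_ID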

Template parameters

Required parameters

  • datastoreReadGqlQuery: A GQL (https://cloud.google.com/datastore/docs/reference/gql_reference) query that specifies which entities to grab. For example, SELECT * FROM MyKind.
  • datastoreReadProjectId: The ID of the Google Cloud project that contains the Datastore instance that you want to read data from.
  • textWritePrefix: The Cloud Storage path prefix that specifies where the data is written. For example, gs://mybucket/somefolder/.

Optional parameters

  • datastoreReadNamespace: The namespace of the requested entities. To use the default namespace, leave this parameter blank.
  • javascriptTextTransformGcsPath: The Cloud Storage URI of the .js file that defines the JavaScript user-defined function (UDF) to use. For example, gs://my-bucket/my-udfs/my_file.js.
  • javascriptTextTransformFunctionName: The name of the JavaScript user-defined function (UDF) to use. For example, if your JavaScript function code is myTransform(inJson) { /*...do stuff...*/ }, then the function name is myTransform. For sample JavaScript UDFs, see UDF Examples (https://github.com/GoogleCloudPlatform/DataflowTemplates#udf-examples) and the sketch after this list.
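
To make the two UDF parameters concrete, the following is a minimal sketch of writing and staging a UDF file. Everything in it is hypothetical: the file name my_file.js, the bucket my-bucket, the function myTransform, and the name property it rewrites. The function is expected to accept one JSON-serialized entity as a string and return the string to write.

# Write a hypothetical UDF that uppercases one property of each entity.
cat > my_file.js <<'EOF'
function myTransform(inJson) {
  var entity = JSON.parse(inJson);
  // This sketch assumes entities serialize in the Datastore v1 JSON
  // layout, with values nested under "properties"; adjust the property
  // name and structure to match how your entities actually serialize.
  if (entity.properties && entity.properties.name &&
      entity.properties.name.stringValue) {
    entity.properties.name.stringValue =
        entity.properties.name.stringValue.toUpperCase();
  }
  return JSON.stringify(entity);
}
EOF

# Stage the file where the pipeline can read it.
gsutil cp my_file.js gs://my-bucket/my-udfs/my_file.js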

Run the template

Console

  1. Go to the Dataflow Create job from template page.
  2. In the Job name field, enter a unique job name.
  3. Optional: For Regional endpoint, select a value from the drop-down menu. The default region is us-central1.

    For a list of regions where you can run a Dataflow job, see Dataflow locations.

  4. From the Dataflow template drop-down menu, select the Datastore to Text Files on Cloud Storage template.
  5. In the provided parameter fields, enter your parameter values.
  6. Click Run job.

gcloud

In your shell or terminal, run the template:

gcloud dataflow jobs run JOB_NAME \
    --gcs-location gs://dataflow-templates-REGION_NAME/VERSION/Datastore_to_GCS_Text \
    --region REGION_NAME \
    --parameters \
datastoreReadGqlQuery="SELECT * FROM DATASTORE_KIND",\
datastoreReadProjectId=DATASTORE_PROJECT_ID,\
datastoreReadNamespace=DATASTORE_NAMESPACE,\
javascriptTextTransformGcsPath=PATH_TO_JAVASCRIPT_UDF_FILE,\
javascriptTextTransformFunctionName=JAVASCRIPT_FUNCTION,\
textWritePrefix=gs://BUCKET_NAME/output/

Replace the following:

  • JOB_NAME : a unique job name of your choice
  • REGION_NAME : the region where you want to deploy your Dataflow job, for example us-central1
  • VERSION : the version of the template that you want to use

    You can use the following values:

      • latest to use the latest version of the template, which is available in the non-dated parent folder in the bucket: gs://dataflow-templates-REGION_NAME/latest/
      • a specific version name, which can be found nested in the respective dated parent folder in the bucket: gs://dataflow-templates-REGION_NAME/

  • BUCKET_NAME : the name of your Cloud Storage bucket
  • DATASTORE_PROJECT_ID : the Google Cloud project ID where the Datastore instance exists
  • DATASTORE_KIND : the type of your Datastore entities
  • DATASTORE_NAMESPACE : the namespace of your Datastore entities
  • JAVASCRIPT_FUNCTION : the name of the JavaScript user-defined function (UDF) that you want to use

    For example, if your JavaScript function code is myTransform(inJson) { /*...do stuff...*/ }, then the function name is myTransform. For sample JavaScript UDFs, see UDF Examples.

  • PATH_TO_JAVASCRIPT_UDF_FILE : the Cloud Storage URI of the .js file that defines the JavaScript user-defined function (UDF) you want to use, for example gs://my-bucket/my-udfs/my_file.js
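
The UDF parameters are optional; if you don't transform entities, you can drop them. A minimal filled-in invocation might look like the following, where datastore-to-gcs-text-example, my-project, my-bucket, and the Users kind are all hypothetical values:

# Run the template without a UDF; all values below are placeholders.
gcloud dataflow jobs run datastore-to-gcs-text-example \
    --gcs-location gs://dataflow-templates-us-central1/latest/Datastore_to_GCS_Text \
    --region us-central1 \
    --parameters \
datastoreReadGqlQuery="SELECT * FROM Users",\
datastoreReadProjectId=my-project,\
textWritePrefix=gs://my-bucket/output/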

API

To run the template using the REST API, send an HTTP POST request. For more information on the API and its authorization scopes, see projects.templates.launch.

POST https://dataflow.googleapis.com/v1b3/projects/PROJECT_ID/locations/LOCATION/templates:launch?gcsPath=gs://dataflow-templates-LOCATION/VERSION/Datastore_to_GCS_Text
{
   "jobName": "JOB_NAME",
   "parameters": {
       "datastoreReadGqlQuery": "SELECT * FROM DATASTORE_KIND",
       "datastoreReadProjectId": "DATASTORE_PROJECT_ID",
       "datastoreReadNamespace": "DATASTORE_NAMESPACE",
       "javascriptTextTransformGcsPath": "PATH_TO_JAVASCRIPT_UDF_FILE",
       "javascriptTextTransformFunctionName": "JAVASCRIPT_FUNCTION",
       "textWritePrefix": "gs://BUCKET_NAME/output/"
   },
   "environment": {
       "zone": "us-central1-f"
   }
}

Replace the following:

  • PROJECT_ID : the Google Cloud project ID where you want to run the Dataflow job
  • JOB_NAME : a unique job name of your choice
  • LOCATION : the region where you want to deploy your Dataflow job, for example us-central1
  • VERSION : the version of the template that you want to use

    You can use the following values:

      • latest to use the latest version of the template, which is available in the non-dated parent folder in the bucket: gs://dataflow-templates-LOCATION/latest/
      • a specific version name, which can be found nested in the respective dated parent folder in the bucket: gs://dataflow-templates-LOCATION/

  • BUCKET_NAME : the name of your Cloud Storage bucket
  • DATASTORE_PROJECT_ID : the Google Cloud project ID where the Datastore instance exists
  • DATASTORE_KIND : the type of your Datastore entities
  • DATASTORE_NAMESPACE : the namespace of your Datastore entities
  • JAVASCRIPT_FUNCTION : the name of the JavaScript user-defined function (UDF) that you want to use

    For example, if your JavaScript function code is myTransform(inJson) { /*...do stuff...*/ }, then the function name is myTransform. For sample JavaScript UDFs, see UDF Examples.

  • PATH_TO_JAVASCRIPT_UDF_FILE : the Cloud Storage URI of the .js file that defines the JavaScript user-defined function (UDF) you want to use, for example gs://my-bucket/my-udfs/my_file.js
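
One way to send this request from a shell is curl, with an OAuth access token from the gcloud CLI. The sketch below assumes gcloud is authenticated and reuses the same hypothetical my-project, my-bucket, and Users values as the gcloud example, omitting the optional UDF parameters:

# Launch the template via the REST API; all values are placeholders.
curl -X POST \
  -H "Authorization: Bearer $(gcloud auth print-access-token)" \
  -H "Content-Type: application/json" \
  -d '{
    "jobName": "datastore-to-gcs-text-example",
    "parameters": {
      "datastoreReadGqlQuery": "SELECT * FROM Users",
      "datastoreReadProjectId": "my-project",
      "textWritePrefix": "gs://my-bucket/output/"
    }
  }' \
  "https://dataflow.googleapis.com/v1b3/projects/my-project/locations/us-central1/templates:launch?gcsPath=gs://dataflow-templates-us-central1/latest/Datastore_to_GCS_Text"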

What's next

  • Learn about Dataflow templates.
  • See the list of Google-provided templates.