The BigQuery to MongoDB template is a batch pipeline that reads rows from a BigQuery table and writes them to MongoDB as documents. Currently, each row is stored as a single document.
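For illustration, the following is a minimal Apache Beam (Python SDK) sketch of the same read-and-write shape. The published template is a prebuilt Flex Template, so this is not its actual source; the table, URI, database, and collection values are hypothetical examples.

    # A minimal sketch of the template's behavior: read BigQuery rows,
    # write each row as one MongoDB document. All values are hypothetical.
    import apache_beam as beam
    from apache_beam.io import ReadFromBigQuery
    from apache_beam.io.mongodbio import WriteToMongoDB
    from apache_beam.options.pipeline_options import PipelineOptions

    options = PipelineOptions()  # pass --runner, --project, etc. as needed

    with beam.Pipeline(options=options) as pipeline:
        (
            pipeline
            # Each BigQuery row is read as a Python dict keyed by column name.
            | "ReadRows" >> ReadFromBigQuery(
                table="bigquery-project:dataset.input_table")
            # Each row dict becomes one MongoDB document.
            | "WriteDocs" >> WriteToMongoDB(
                uri="mongodb+srv://user:password@host",
                db="my-db",
                coll="my-collection")
        )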
Pipeline requirements
- The source BigQuery table must exist.
- The target MongoDB instance should be accessible from the Dataflow worker machines.
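Because the workers must be able to reach MongoDB, a quick connectivity check before launching the job can save a failed run. Here is a minimal sketch using pymongo, run from a machine on the same network path as the workers; the connection URI is a hypothetical placeholder.

    # Quick reachability check with pymongo before launching the job.
    from pymongo import MongoClient

    client = MongoClient(
        "mongodb+srv://user:password@host",  # hypothetical connection URI
        serverSelectionTimeoutMS=5000,       # fail fast if unreachable
    )
    client.admin.command("ping")  # raises if the instance cannot be reached
    print("MongoDB instance is reachable")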
Template parameters
Required parameters
- mongoDbUri: The MongoDB connection URI, in the format mongodb+srv://<username>:<password>@<host>.
- database: The database in MongoDB that stores the collection. For example, my-db.
- collection: The name of the collection in the MongoDB database. For example, my-collection.
- inputTableSpec: The BigQuery table to read from. For example, bigquery-project:dataset.input_table.
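Collected together, the four required parameters might look like the following Python dict. The values reuse the examples above and are hypothetical; the gcloud and REST API sections below pass the same keys.

    # The four required template parameters, with hypothetical example values.
    parameters = {
        "mongoDbUri": "mongodb+srv://user:password@host",
        "database": "my-db",
        "collection": "my-collection",
        "inputTableSpec": "bigquery-project:dataset.input_table",
    }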
Run the template
Console
- Go to the Dataflow Create job from template page.
- In the Job name field, enter a unique job name.
- Optional: For Regional endpoint, select a value from the drop-down menu. The default region is us-central1. For a list of regions where you can run a Dataflow job, see Dataflow locations.
- From the Dataflow template drop-down menu, select the BigQuery to MongoDB template.
- In the provided parameter fields, enter your parameter values.
- Click Run job.
gcloud
In your shell or terminal, run the template:
    gcloud dataflow flex-template run JOB_NAME \
        --project=PROJECT_ID \
        --region=REGION_NAME \
        --template-file-gcs-location=gs://dataflow-templates-REGION_NAME/VERSION/flex/BigQuery_to_MongoDB \
        --parameters \
           inputTableSpec=INPUT_TABLE_SPEC,\
           mongoDbUri=MONGO_DB_URI,\
           database=DATABASE,\
           collection=COLLECTION
Replace the following:
- PROJECT_ID: the Google Cloud project ID where you want to run the Dataflow job
- JOB_NAME: a unique job name of your choice
- REGION_NAME: the region where you want to deploy your Dataflow job, for example us-central1
- VERSION: the version of the template that you want to use. You can use the following values:
  - latest to use the latest version of the template, which is available in the non-dated parent folder in the bucket: gs://dataflow-templates-REGION_NAME/latest/
  - the version name, like 2023-09-12-00_RC00, to use a specific version of the template, which can be found nested in the respective dated parent folder in the bucket: gs://dataflow-templates-REGION_NAME/
- INPUT_TABLE_SPEC: your source BigQuery table name
- MONGO_DB_URI: your MongoDB URI
- DATABASE: your MongoDB database
- COLLECTION: your MongoDB collection
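With the placeholders above filled in, the same command can also be driven from Python via subprocess, as in the following sketch; every value here is a hypothetical example.

    # Sketch of the gcloud invocation above, with hypothetical values filled in.
    import subprocess

    region = "us-central1"
    subprocess.run(
        [
            "gcloud", "dataflow", "flex-template", "run", "bq-to-mongodb-example",
            "--project=my-project",
            f"--region={region}",
            "--template-file-gcs-location="
            f"gs://dataflow-templates-{region}/latest/flex/BigQuery_to_MongoDB",
            "--parameters",
            "inputTableSpec=bigquery-project:dataset.input_table,"
            "mongoDbUri=mongodb+srv://user:password@host,"
            "database=my-db,"
            "collection=my-collection",
        ],
        check=True,  # raise CalledProcessError if gcloud exits nonzero
    )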
API
To run the template using the REST API, send an HTTP POST request. For more information on the API and its authorization scopes, see projects.locations.flexTemplates.launch.
    POST https://dataflow.googleapis.com/v1b3/projects/PROJECT_ID/locations/LOCATION/flexTemplates:launch
    {
       "launch_parameter": {
          "jobName": "JOB_NAME",
          "parameters": {
              "inputTableSpec": "INPUT_TABLE_SPEC",
              "mongoDbUri": "MONGO_DB_URI",
              "database": "DATABASE",
              "collection": "COLLECTION"
          },
          "containerSpecGcsPath": "gs://dataflow-templates-LOCATION/VERSION/flex/BigQuery_to_MongoDB"
       }
    }
Replace the following:
- PROJECT_ID: the Google Cloud project ID where you want to run the Dataflow job
- JOB_NAME: a unique job name of your choice
- LOCATION: the region where you want to deploy your Dataflow job, for example us-central1
- VERSION: the version of the template that you want to use. You can use the following values:
  - latest to use the latest version of the template, which is available in the non-dated parent folder in the bucket: gs://dataflow-templates-LOCATION/latest/
  - the version name, like 2023-09-12-00_RC00, to use a specific version of the template, which can be found nested in the respective dated parent folder in the bucket: gs://dataflow-templates-LOCATION/
- INPUT_TABLE_SPEC: your source BigQuery table name
- MONGO_DB_URI: your MongoDB URI
- DATABASE: your MongoDB database
- COLLECTION: your MongoDB collection
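As a sketch, the same request can be sent from Python using google-auth's AuthorizedSession, assuming Application Default Credentials are configured; the region and parameter values below are hypothetical examples.

    # Sketch of the flexTemplates.launch POST request shown above.
    import google.auth
    from google.auth.transport.requests import AuthorizedSession

    credentials, project_id = google.auth.default(
        scopes=["https://www.googleapis.com/auth/cloud-platform"])
    session = AuthorizedSession(credentials)

    location = "us-central1"  # hypothetical region
    body = {
        "launch_parameter": {
            "jobName": "bq-to-mongodb-example",  # hypothetical job name
            "parameters": {
                "inputTableSpec": "bigquery-project:dataset.input_table",
                "mongoDbUri": "mongodb+srv://user:password@host",
                "database": "my-db",
                "collection": "my-collection",
            },
            "containerSpecGcsPath": (
                f"gs://dataflow-templates-{location}/latest/flex/BigQuery_to_MongoDB"),
        }
    }
    response = session.post(
        f"https://dataflow.googleapis.com/v1b3/projects/{project_id}"
        f"/locations/{location}/flexTemplates:launch",
        json=body)
    response.raise_for_status()  # surface HTTP errors from the Dataflow API
    print(response.json())       # the launched job's metadata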
What's next
- Learn about Dataflow templates.
- See the list of Google-provided templates.