This page describes the basics of deploying and running pipelines in
Cloud Data Fusion.
Deploy pipelines
After you finish designing and debugging a data pipeline and are satisfied with
the data you see in Preview, you're ready to deploy the pipeline.
Note: A deployed pipeline name must be unique in the namespace. You might be
prompted to enter a unique name.
When you deploy the pipeline, the Cloud Data Fusion Studio creates the
workflow and corresponding Apache Spark jobs in the background.
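You normally deploy from the Studio, but the same operation is exposed through
the instance's CDAP REST API. The following is a minimal sketch, not a
procedure documented on this page: the instance name, region, pipeline name,
and pipeline-spec.json file (an exported pipeline definition) are all
illustrative assumptions.

    # Look up the instance's CDAP API endpoint (hypothetical instance and region).
    export CDAP_ENDPOINT=$(gcloud beta data-fusion instances describe \
        --location=us-east4 \
        --format="value(apiEndpoint)" \
        data-fusion-instance-1)

    # Deploy a pipeline from its exported JSON definition (hypothetical file and name).
    curl -X PUT \
        -H 'Content-Type: application/json' \
        -H "Authorization: Bearer $(gcloud auth print-access-token)" \
        "${CDAP_ENDPOINT}/v3/namespaces/default/apps/my-pipeline" \
        -d @pipeline-spec.json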
Run pipelines
After you deploy a pipeline, you can run it in the following ways:
To run a pipeline on demand, open a deployed pipeline and click Run. (A
programmatic alternative is sketched after this list.)
To schedule the pipeline to run at a certain time, open a deployed
pipeline and click Schedule.
To trigger the pipeline when another pipeline completes, open a
deployed pipeline and click Incoming triggers.
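For the on-demand case, a pipeline can also be started over the CDAP REST API.
A minimal sketch, reusing the CDAP_ENDPOINT variable and the illustrative
pipeline name from the deploy sketch above; batch pipelines run as the
DataPipelineWorkflow program:

    # Start a run of the deployed batch pipeline.
    curl -X POST \
        -H "Authorization: Bearer $(gcloud auth print-access-token)" \
        "${CDAP_ENDPOINT}/v3/namespaces/default/apps/my-pipeline/workflows/DataPipelineWorkflow/start"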
The Pipeline Studio saves a pipeline's history each time it runs. You can
toggle between different runtime versions of the pipeline.
If the pipeline has macros, set the runtime arguments for each macro. You
can also review and change the pipeline configurations before running the
deployed pipeline. You can see the status change during the phases of the
pipeline run, such as Provisioning, Starting, Running, and Succeeded. You can
also stop the pipeline at any time.
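Runtime arguments can also be supplied programmatically when a run is started.
A hedged sketch, continuing the earlier illustrative names; the macro keys
input.path and output.table are placeholders for whatever macros your pipeline
actually defines:

    # Start a run, passing values for the pipeline's macros as a JSON body.
    curl -X POST \
        -H "Authorization: Bearer $(gcloud auth print-access-token)" \
        "${CDAP_ENDPOINT}/v3/namespaces/default/apps/my-pipeline/workflows/DataPipelineWorkflow/start" \
        -d '{ "input.path": "gs://my-bucket/raw/", "output.table": "mydataset.results" }'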
If you enable instrumentation, you can explore the metrics generated by the
pipeline by clicking Properties on any node in your pipeline, such as a
source, transformation, or sink.
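Those per-node metrics can also be queried through the CDAP metrics API. A
rough sketch under the same assumptions as the earlier examples; the stage
name GCSFile and the metric user.GCSFile.records.out are hypothetical
instances of the user.<stage>.<metric> naming convention:

    # Query the records-out count for one pipeline stage over the last hour.
    curl -X POST \
        -H "Authorization: Bearer $(gcloud auth print-access-token)" \
        "${CDAP_ENDPOINT}/v3/metrics/query?tag=namespace:default&tag=app:my-pipeline&metric=user.GCSFile.records.out&start=now-1h&end=now"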
For more information about the pipeline runs, click Summary.
View run records
After a pipeline run completes, you can view the run record. By default, you can
view the last 30 days of run records. Cloud Data Fusion deletes them
after that period. You can extend that period using the REST API.
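Before the retention window expires, you can also fetch run records
programmatically. A minimal sketch, again assuming the CDAP_ENDPOINT variable
and illustrative pipeline name from the earlier examples; the CDAP lifecycle
API lists past runs:

    # List the 20 most recent runs of the pipeline (illustrative names).
    curl -H "Authorization: Bearer $(gcloud auth print-access-token)" \
        "${CDAP_ENDPOINT}/v3/namespaces/default/apps/my-pipeline/workflows/DataPipelineWorkflow/runs?limit=20"

Each record typically includes the run ID, start and end times, and final
status.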
REST API
To retain run records for more than 30 days, update the app.run.records.ttl
options using the following command:

    curl -X PATCH \
        -H 'Content-Type: application/json' \
        -H "Authorization: Bearer $(gcloud auth print-access-token)" \
        "https://datafusion.googleapis.com/v1beta1/projects/PROJECT_NAME/locations/REGION_NAME/instances/INSTANCE_NAME?updateMask=options" \
        -d '{ "options": { "app.run.records.ttl.days": "DAYS", "app.run.records.ttl.frequency.hours": "HOURS" } }'

Replace the following:
PROJECT_NAME: the Google Cloud project name
REGION_NAME: the Cloud Data Fusion instance's region, for example us-east4
INSTANCE_NAME: the Cloud Data Fusion instance ID
DAYS: the amount of time, in days, to retain run records for old pipeline
runs, for example 30
HOURS: the frequency, in hours, to check for and delete old run records, for
example 24

Example:

    curl -X PATCH \
        -H 'Content-Type: application/json' \
        -H "Authorization: Bearer $(gcloud auth print-access-token)" \
        "https://datafusion.googleapis.com/v1beta1/projects/project-1/locations/us-east4/instances/data-fusion-instance-1?updateMask=options" \
        -d '{ "options": { "app.run.records.ttl.days": "30", "app.run.records.ttl.frequency.hours": "24" } }'

Note: Storing run records for longer periods of time could affect the
performance of the web interface.
What's next
Learn more about pipeline configurations
(/data-fusion/docs/concepts/manage-pipeline-configurations).