Reuse clusters

This page describes how to reuse Dataproc clusters for your pipeline runs in Cloud Data Fusion. For more information, see When to reuse clusters and Run a pipeline against an existing Dataproc cluster.

Before you begin

  • You must have a Cloud Data Fusion instance running version 6.5.0 or later.

Enable cluster reuse

You can enable cluster reuse in a new compute profile, or in the profile of a pipeline that's already deployed.

Enable cluster reuse in a new profile

  1. Go to your instance:

    1. In the Google Cloud console, go to the Cloud Data Fusion page.

    2. To open the instance in the Cloud Data Fusion Studio, click Instances, and then click View instance.

      Go to Instances

  2. Click System admin > Configuration > System compute profiles.

  3. Click Create new profile.

  4. Choose the Dataproc provisioner.

  5. In the Create a profile for Dataproc window, enter the details about your cluster:

    1. In the Profile label and Profile name fields, enter a name to identify the profile, for example, execution_compute-profile.
    2. In the Description field, describe the purpose of the profile, for example, Profile used for pipeline execution.
    3. In the Max idle time field, enter a value. For more information, see Set max idle time.
    4. Set the Skip cluster delete field to True. For more information, see When to reuse clusters.
    5. Optional: Configure any other fields.
    6. Click Create.
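
The profile settings above can also be written down as data, for example to review them or keep them under version control. The following Python sketch builds such a definition. It is illustrative only: the property keys idleTTL and skipDelete, the provisioner identifier gcp-dataproc, the helper name build_reuse_profile, and the 30-minute idle time are assumptions that don't come from this page, so check them against your instance before relying on them.

```python
import json

# Assumed property keys; confirm them against your instance before use.
IDLE_TTL_KEY = "idleTTL"        # assumed key behind "Max idle time" (minutes)
SKIP_DELETE_KEY = "skipDelete"  # assumed key behind "Skip cluster delete"


def build_reuse_profile(label, description, max_idle_minutes):
    """Return a profile definition (hypothetical shape) with cluster reuse on."""
    # Rejecting non-positive values is a choice made by this sketch, so that
    # an idle cluster is always eventually eligible for cleanup.
    if max_idle_minutes <= 0:
        raise ValueError("max_idle_minutes must be positive")
    return {
        "label": label,
        "description": description,
        "provisioner": {
            "name": "gcp-dataproc",  # assumed provisioner identifier
            "properties": [
                {"name": IDLE_TTL_KEY, "value": str(max_idle_minutes)},
                {"name": SKIP_DELETE_KEY, "value": "true"},
            ],
        },
    }


if __name__ == "__main__":
    profile = build_reuse_profile(
        "execution_compute-profile",
        "Profile used for pipeline execution",
        30,  # hypothetical idle time in minutes
    )
    print(json.dumps(profile, indent=2))
```

The two properties are kept together because they work as a pair: Skip cluster delete keeps the cluster available for the next run, and Max idle time bounds how long an unused cluster stays around.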

Enable cluster reuse in a deployed pipeline

  1. Go to your instance:

    1. In the Google Cloud console, go to the Cloud Data Fusion page.

    2. To open the instance in the Cloud Data Fusion Studio, click Instances, and then click View instance.

      Go to Instances

  2. Click List.

  3. Click the Deployed tab and click a pipeline name. The deployed pipeline opens on the Studio page in the Cloud Data Fusion web interface.

  4. Click Configure.

  5. In the Compute config window, go to the chosen profile and click Customize.

  6. In the window that opens, enter the following values:

    1. In the Max Idle Time field, enter a value. For more information, see Set max idle time.
    2. Set Skip cluster delete to True. For more information, see When to reuse clusters.
  7. Click Done.
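
Customizing a profile for one deployed pipeline amounts to overriding individual profile properties for that pipeline only. A minimal Python sketch of that idea follows, assuming the overrides are expressed as key-value pairs whose keys carry a system.profile.properties. prefix. The prefix, the keys idleTTL and skipDelete, and the helper name reuse_overrides are assumptions for illustration and are not taken from this page.

```python
# Assumed prefix for per-pipeline profile overrides; verify before use.
PROFILE_PROPERTY_PREFIX = "system.profile.properties."


def reuse_overrides(max_idle_minutes):
    """Return per-pipeline overrides (hypothetical keys) enabling cluster reuse."""
    overrides = {
        "idleTTL": str(max_idle_minutes),  # assumed key for "Max Idle Time"
        "skipDelete": "true",              # assumed key for "Skip cluster delete"
    }
    # Prefix every key so that it targets the compute profile rather than
    # the pipeline's own arguments.
    return {PROFILE_PROPERTY_PREFIX + key: value for key, value in overrides.items()}


if __name__ == "__main__":
    # 30 is a hypothetical idle time in minutes.
    for key, value in sorted(reuse_overrides(30).items()):
        print(f"{key}={value}")
```

Because the overrides apply to a single pipeline, other pipelines that use the same profile keep its original settings.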

What's next
