Run a managed notebooks instance on a Dataproc cluster

Vertex AI Workbench managed notebooks is deprecated. On April 14, 2025, support for managed notebooks will end and the ability to create managed notebooks instances will be removed. Existing instances will continue to function, but patches, updates, and upgrades won't be available. To continue using Vertex AI Workbench, we recommend that you migrate your managed notebooks instances to Vertex AI Workbench instances.

This page shows you how to run a managed notebooks instance's notebook file on a Dataproc cluster.

Before you begin

  1. Sign in to your Google Cloud account. If you're new to Google Cloud, create an account to evaluate how our products perform in real-world scenarios. New customers also get $300 in free credits to run, test, and deploy workloads.
  2. In the Google Cloud console, on the project selector page, select or create a Google Cloud project.

    Go to project selector

  3. Verify that billing is enabled for your Google Cloud project.

  4. Enable the Notebooks and Dataproc APIs.

    Enable the APIs

  5. If you haven't already, create a managed notebooks instance.
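Enabling the Notebooks and Dataproc APIs can also be done from the gcloud CLI; a sketch, assuming the target project is already set as the active project:

```shell
# Enable the Notebooks and Dataproc APIs for the current project.
gcloud services enable notebooks.googleapis.com dataproc.googleapis.com
```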

Required roles

To ensure that the service account has the necessary permissions to run a notebook file on a Dataproc Serverless cluster, ask your administrator to grant the service account the following IAM roles:

  • Dataproc Worker ( roles/dataproc.worker ) on your project
  • Dataproc Editor ( roles/dataproc.editor ) on the cluster for the dataproc.clusters.use permission

For more information about granting roles, see Manage access to projects, folders, and organizations.
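As a sketch, these roles can be granted with the gcloud CLI; PROJECT_ID, CLUSTER_NAME, REGION, and SERVICE_ACCOUNT_EMAIL are placeholders for your own values:

```shell
# Grant Dataproc Worker on the project.
gcloud projects add-iam-policy-binding PROJECT_ID \
    --member="serviceAccount:SERVICE_ACCOUNT_EMAIL" \
    --role="roles/dataproc.worker"

# Grant Dataproc Editor on the cluster for the dataproc.clusters.use permission.
gcloud dataproc clusters add-iam-policy-binding CLUSTER_NAME \
    --region=REGION \
    --member="serviceAccount:SERVICE_ACCOUNT_EMAIL" \
    --role="roles/dataproc.editor"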

These predefined roles contain the permissions required to run a notebook file on a Dataproc Serverless cluster. To see the exact permissions that are required, expand the Required permissions section:

Required permissions

The following permissions are required to run a notebook file on a Dataproc Serverless cluster:

  • dataproc.agents.create
  • dataproc.agents.delete
  • dataproc.agents.get
  • dataproc.agents.update
  • dataproc.tasks.lease
  • dataproc.tasks.listInvalidatedLeases
  • dataproc.tasks.reportStatus
  • dataproc.clusters.use

Your administrator might also be able to give the service account these permissions with custom roles or other predefined roles.
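If a custom role is preferred over the predefined roles, the permissions listed above can be bundled into one; a sketch, where PROJECT_ID is a placeholder and dataprocNotebookRunner is a hypothetical role ID:

```shell
# Create a custom role containing only the required permissions.
gcloud iam roles create dataprocNotebookRunner \
    --project=PROJECT_ID \
    --title="Dataproc Notebook Runner" \
    --permissions=dataproc.agents.create,dataproc.agents.delete,dataproc.agents.get,dataproc.agents.update,dataproc.tasks.lease,dataproc.tasks.listInvalidatedLeases,dataproc.tasks.reportStatus,dataproc.clusters.use
```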

Create a Dataproc cluster

To run a managed notebooks instance's notebook file in a Dataproc cluster, your cluster must meet the following criteria:

  • The cluster's component gateway must be enabled.

  • The cluster must have the Jupyter component.

  • The cluster must be in the same region as your managed notebooks instance.

To create your Dataproc cluster, enter the following command in either Cloud Shell or another environment where the Google Cloud CLI is installed.

gcloud dataproc clusters create CLUSTER_NAME \
    --region=REGION \
    --enable-component-gateway \
    --optional-components=JUPYTER

Replace the following:

  • CLUSTER_NAME: the name of your new cluster

  • REGION: the Google Cloud location of your managed notebooks instance

After a few minutes, your Dataproc cluster is available for use. Learn more about creating Dataproc clusters .
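Before connecting, you can confirm that the cluster is ready; a sketch, using the same CLUSTER_NAME and REGION placeholders:

```shell
# Print the cluster's state; a usable cluster reports RUNNING.
gcloud dataproc clusters describe CLUSTER_NAME \
    --region=REGION \
    --format="value(status.state)"
```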

Open JupyterLab

  1. If you haven't already, create a managed notebooks instance in the same region as your Dataproc cluster.

  2. In the Google Cloud console, go to the Managed notebooks page.

    Go to Managed notebooks

  3. Next to your managed notebooks instance's name, click Open JupyterLab.

Run a notebook file in your Dataproc cluster

You can run a notebook file in your Dataproc cluster from any managed notebooks instance in the same project and region.

Run a new notebook file

  1. In your managed notebooks instance's JupyterLab interface, select File > New > Notebook.

  2. Your Dataproc cluster's available kernels appear in the Select kernel menu. Select the kernel that you want to use, and then click Select.

    Your new notebook file opens.

  3. Add code to your new notebook file, and run the code.

To change the kernel that you want to use after you've created your notebook file, see the following section.

Run an existing notebook file

  1. In your managed notebooks instance's JupyterLab interface, click the File Browser button, navigate to the notebook file that you want to run, and open it.

  2. To open the Select kernel dialog, click the kernel name of your notebook file, for example: Python (Local).

  3. To select a kernel from your Dataproc cluster, select a kernel name that includes your cluster name at the end of it. For example, a PySpark kernel on a Dataproc cluster named mycluster is named PySpark on mycluster.

  4. Click Select to close the dialog.

    You can now run your notebook file's code on the Dataproc cluster.
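As a minimal smoke test of a cluster kernel, the following sketch assumes a PySpark kernel, where SparkSession.builder.getOrCreate() returns the session attached to the Dataproc cluster:

```python
from pyspark.sql import SparkSession

# In a "PySpark on <cluster>" kernel, this returns the cluster's session.
spark = SparkSession.builder.getOrCreate()

# A small distributed computation: the sum of the integers 0 through 999.
total = spark.range(1000).groupBy().sum("id").first()[0]
print(total)  # 499500
```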
