Create a repository

This document explains what Dataform repositories are and shows you how to create a repository.

About Dataform repositories

Each Dataform repository houses a collection of SQLX and JavaScript files that make up your workflow, as well as Dataform configuration files and packages. You interact with the contents of your repository in a development workspace.

Dataform displays your repositories on the Dataform page in alphabetical order by repository ID. You can sort and filter them.

  • To view your repositories, in the Google Cloud console, go to the Dataform page.

    Go to Dataform

Each Dataform repository is connected to the default Dataform service agent or to a custom service account. When you create a repository, you can select only a custom service account. You can edit the service account later.

By default, Dataform uses a service agent or service account derived from your project number in the following format:

 service-PROJECT_NUMBER@gcp-sa-dataform.iam.gserviceaccount.com
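
The address format above can be sketched as a small helper. This is only an illustration: the function name `default_service_agent` and the sample project number are hypothetical, and the numeric check is an assumption added here to catch a project ID passed by mistake.

```python
import re

SERVICE_AGENT_DOMAIN = "gcp-sa-dataform.iam.gserviceaccount.com"


def default_service_agent(project_number):
    """Return the default Dataform service agent address for a project number."""
    number = str(project_number)
    # Project numbers are numeric; reject project IDs such as "my-project".
    if not re.fullmatch(r"[0-9]+", number):
        raise ValueError(f"expected a numeric project number, got {number!r}")
    return f"service-{number}@{SERVICE_AGENT_DOMAIN}"


# Hypothetical project number, used only for illustration.
print(default_service_agent(123456789012))
```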

Dataform uses Git to record changes and manage file versions. Each Dataform repository corresponds with a Git repository. After you create a Dataform repository, you can connect it to a remote GitHub, GitLab, or Bitbucket repository.

In a Dataform repository, Dataform stores the repository code. In a connected repository, the third-party repository stores the repository code. Dataform interacts with the third-party repository to allow you to edit and execute its contents in a Dataform development workspace.

The Dataform repository page consists of the following components:

Development Workspaces tab
Displays development workspaces created in the repository.
Workflow Execution Logs tab
Displays Dataform workflow execution logs.
Releases & scheduling tab
Lets you inspect, create, edit, and delete release configurations and workflow configurations.
Settings tab
Displays the name and location of the repository. For a repository connected to a third-party Git repository, displays the third-party repository source, default branch name, and secret token. Displays the buttons to connect the repository to a third-party Git repository and to edit the Git connection.
Create development workspace button
Lets you create a development workspace.

After you create and initialize a development workspace, you can edit your workflow settings file to configure the following Dataform settings of your repository:

  • The default database (Google Cloud project ID).
  • The default schema (BigQuery dataset ID).
  • The default BigQuery location.
  • The default schema (BigQuery dataset ID) for assertions.
  • The warehouse, which must be set to bigquery.
  • User-defined variables that are made available to project code during compilation.

For more information about Dataform repository settings, see IProjectConfig in the Dataform core reference.
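
As a rough illustration, the settings listed above can be written as a dictionary and checked before they go into a settings file. The key names follow the IProjectConfig fields as an assumption, and a workflow_settings.yaml file may name these keys differently, so check the reference for your Dataform core version. All values, the `check_project_config` helper, and the choice of which keys to treat as required are hypothetical.

```python
import json

# Hypothetical values; replace them with your own project, dataset, and location.
project_config = {
    "warehouse": "bigquery",                   # must be set to bigquery
    "defaultDatabase": "my-project-id",        # Google Cloud project ID
    "defaultSchema": "dataform",               # BigQuery dataset ID
    "defaultLocation": "US",                   # BigQuery location
    "assertionSchema": "dataform_assertions",  # BigQuery dataset ID for assertions
    "vars": {"environment": "dev"},            # user-defined compilation variables
}

# Treating these keys as required is an assumption of this sketch.
REQUIRED_KEYS = ("warehouse", "defaultDatabase", "defaultSchema", "defaultLocation")


def check_project_config(config):
    """Return a list of problems found in a settings dictionary."""
    problems = [f"missing key: {key}" for key in REQUIRED_KEYS if key not in config]
    if config.get("warehouse") != "bigquery":
        problems.append("warehouse must be set to 'bigquery'")
    return problems


print(json.dumps(project_config, indent=2))
print(check_project_config(project_config))
```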

Repository settings

When you create a Dataform repository, you need to set the following repository settings:

Repository ID
A unique ID of the repository. IDs can only include numbers, letters, hyphens, and underscores.
Region

Dataform region for storing the repository and its contents.

This storage region can be different from the processing region, where Dataform processes your code and stores the output of executions. By default, the processing region is set to your default BigQuery dataset region. You can edit the processing region in the workflow settings file after creating the repository. For more information, see Configure Dataform workflow settings.

Service agent or service account

The Dataform service agent or custom service account associated with the repository. For new repositories, you must provide a custom service account. You can select a service account associated with your Google Cloud project or manually enter a different service account.

By default, Dataform uses a service agent or service account derived from your project number in the following format:

 service-PROJECT_NUMBER@gcp-sa-dataform.iam.gserviceaccount.com

You must use a custom service account to run workflows in your repository, but the default Dataform service agent is still used for all other repository operations.

Strict act-as mode

Enables an additional security check that requires the iam.serviceAccounts.actAs permission on the service account. For new repositories, strict act-as mode is enforced. For existing repositories, we recommend using custom service accounts and enabling strict act-as mode to ensure a more secure and predictable permissions model.

Encryption

Encryption method for the repository. You can use the default encryption, a unique customer-managed Cloud KMS encryption key, or a default Dataform CMEK key. For more information about using customer-managed encryption keys (CMEK) in Dataform, see Use customer-managed encryption keys.
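
The repository ID rule in the settings above (numbers, letters, hyphens, and underscores only) can be checked locally before you submit the form. The sketch below assumes that "letters" means ASCII letters and enforces no length limit, because none is stated here. The function name `is_valid_repository_id` is hypothetical.

```python
import re

# Letters, numbers, hyphens, and underscores only.
# Assumption: ASCII letters; no length limit is enforced.
REPOSITORY_ID_PATTERN = re.compile(r"[A-Za-z0-9_-]+")


def is_valid_repository_id(repository_id):
    """Return True if the ID contains only the allowed characters."""
    return REPOSITORY_ID_PATTERN.fullmatch(repository_id) is not None


for candidate in ("sales-pipeline_01", "sales pipeline", "repo.sqlx"):
    print(candidate, is_valid_repository_id(candidate))
```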

After you create a repository, you can connect it to GitHub or GitLab.

Before you begin

  1. Sign in to your Google Cloud account. If you're new to Google Cloud, create an account to evaluate how our products perform in real-world scenarios. New customers also get $300 in free credits to run, test, and deploy workloads.
  2. In the Google Cloud console, on the project selector page, select or create a Google Cloud project.

    Roles required to select or create a project

    • Select a project: Selecting a project doesn't require a specific IAM role. You can select any project that you've been granted a role on.
    • Create a project: To create a project, you need the Project Creator role (roles/resourcemanager.projectCreator), which contains the resourcemanager.projects.create permission. Learn how to grant roles.

    Go to project selector

  3. Verify that billing is enabled for your Google Cloud project.

  4. Enable the BigQuery and Dataform APIs.

    Roles required to enable APIs

    To enable APIs, you need the Service Usage Admin IAM role (roles/serviceusage.serviceUsageAdmin), which contains the serviceusage.services.enable permission. Learn how to grant roles.

    Enable the APIs

  5. To use CMEK encryption for the repository, enable CMEK encryption of Dataform repositories.

Required roles

To get the permissions that you need to create and delete a repository, ask your administrator to grant you the following IAM roles:

  • Dataform Admin (roles/dataform.admin) on the project
  • Service Account User (roles/iam.serviceAccountUser) on the custom service account

For more information about granting roles, see Manage access to projects, folders, and organizations.

You might also be able to get the required permissions through custom roles or other predefined roles.

Grant required roles

To run workflows in your Dataform repository and in BigQuery, you can use a custom service account or your Google Account.

Your custom service account must have the following required roles:

  • BigQuery Data Editor ( roles/bigquery.dataEditor ) on projects or specific BigQuery datasets to which Dataform needs both read and write access. This usually includes the project hosting your Dataform repository.
  • BigQuery Data Viewer ( roles/bigquery.dataViewer ) on projects or specific BigQuery datasets to which Dataform needs read-only access.
  • BigQuery Job User ( roles/bigquery.jobUser ) on the project hosting your Dataform repository.

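To let Dataform use your custom service account, the default Dataform service agent must have the following roles on the custom service account resource:

  • Service Account User (roles/iam.serviceAccountUser)
  • Service Account Token Creator (roles/iam.serviceAccountTokenCreator)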

To grant these roles, follow these steps:

  1. In the Google Cloud console, go to the IAM page.

    Go to IAM

  2. Click Grant access.

  3. In the New principals field, enter your custom service account ID.

  4. In the Select a role menu, select the following roles one by one, using Add another role for each additional role:

    • BigQuery Data Editor
    • BigQuery Data Viewer
    • BigQuery Job User
  5. Click Save.

  6. In the Google Cloud console, go to the Service accounts page.

    Go to Service accounts

  7. Select your custom service account.

  8. Go to Principals with access, and then click Grant access.

  9. In the New principals field, enter your default Dataform service agent ID.

    Your default Dataform service agent ID is in the following format:

     service-PROJECT_NUMBER@gcp-sa-dataform.iam.gserviceaccount.com

    Replace PROJECT_NUMBER with the numeric project number of your Google Cloud project. You can find your project number in the Google Cloud console dashboard. For more information, see Identifying projects.

  10. In the Select a role list, add the following roles:

    • Service Account User
    • Service Account Token Creator
  11. Click Save.

For more information on granting roles, see Grant Dataform the required access.
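
If you prefer the command line to the console, the same grants can be expressed with the gcloud CLI. The sketch below only builds and prints the commands, it does not run them. It assumes that the BigQuery roles are granted at the project level and that the two service account roles are granted on the custom service account itself. The project ID, project number, service account address, and the `grant_commands` helper are hypothetical.

```python
import shlex

# Roles for the custom service account (granted on the project in this sketch).
WORKFLOW_ROLES = [
    "roles/bigquery.dataEditor",
    "roles/bigquery.dataViewer",
    "roles/bigquery.jobUser",
]
# Roles for the default Dataform service agent on the custom service account.
AGENT_ROLES = [
    "roles/iam.serviceAccountUser",
    "roles/iam.serviceAccountTokenCreator",
]


def grant_commands(project_id, project_number, custom_sa_email):
    """Build gcloud commands (as argument lists) for the grants described above."""
    agent = f"service-{project_number}@gcp-sa-dataform.iam.gserviceaccount.com"
    commands = []
    for role in WORKFLOW_ROLES:
        commands.append([
            "gcloud", "projects", "add-iam-policy-binding", project_id,
            f"--member=serviceAccount:{custom_sa_email}",
            f"--role={role}",
        ])
    for role in AGENT_ROLES:
        commands.append([
            "gcloud", "iam", "service-accounts", "add-iam-policy-binding",
            custom_sa_email,
            f"--member=serviceAccount:{agent}",
            f"--role={role}",
        ])
    return commands


# Hypothetical identifiers, used only for illustration.
for command in grant_commands(
    "my-project-id",
    "123456789012",
    "dataform-runner@my-project-id.iam.gserviceaccount.com",
):
    print(shlex.join(command))
```

Printing the commands first lets you review each binding before you run it. To grant the BigQuery roles on specific datasets instead of the whole project, use dataset-level access controls in place of the project-level commands.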

Create a repository

To create a Dataform repository, follow these steps:

  1. In the Google Cloud console, go to the Dataform page.

    Go to Dataform

  2. Click Create repository.

  3. On the Create repository page, in the Repository ID field, enter a unique ID.

    IDs can only include numbers, letters, hyphens, and underscores.

  4. In the Region drop-down list, select a Dataform region for storing the repository and its contents. Select the Dataform region nearest to your location.

    For a list of available Dataform regions, see Locations. The repository region does not have to match the location of your BigQuery datasets.

    In the workflow_settings.yaml file, you can set the processing region where Dataform processes your code and stores the output of executions. The processing region has to match the location of your BigQuery datasets, but does not need to match the repository region. For more information, see Configure Dataform workflow settings.

  5. In the Service account menu, select a custom service account for the repository.

    In the menu, you can select a custom service account associated with your Google Cloud project that you have access to. Custom service accounts are used only for workflow execution. All other repository operations are performed by the default Dataform service agent.

    1. Optional: To select a service account that is not displayed in the menu, click Enter manually and enter a service account ID.
  6. In the actAs permission checks section, enforce the permission checks on user actions on the repository. For details on these checks, see Use strict act-as mode.

  7. Configure your selected encryption mechanism for the repository:

    Default CMEK key

    Dataform displays the Use the default KMS key checkbox and selects it by default.

    • To encrypt the repository with the default Dataform CMEK key, leave the Use the default KMS key checkbox selected.

    Unique CMEK key

    To encrypt the repository with a unique CMEK key, do the following:

    1. If the Use the default KMS key checkbox is selected by default, deselect the checkbox.
    2. In the Encryption section, select the Customer-managed encryption keys (CMEK) option.
    3. In the Select a customer-managed key drop-down, select a unique CMEK key.

    Encryption at rest

    • To use the default encryption, in the Encryption section, select the Google-managed encryption key option.
  8. Click Create, and then click Go to repositories.

You must associate a custom service account with a Dataform repository for workflow execution. All other repository operations are still performed by the default Dataform service agent.
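
The console steps above can also be approximated with a call to the Dataform REST API. The sketch below builds the request without sending it. Treat the API version (`v1beta1`), the `repositoryId` query parameter, and the `serviceAccount` body field as assumptions to verify against the current API reference. The helper name, identifiers, and access token are hypothetical placeholders.

```python
import json
import urllib.parse
import urllib.request

API_ROOT = "https://dataform.googleapis.com/v1beta1"  # assumed API version


def build_create_repository_request(project_id, region, repository_id,
                                    service_account, access_token):
    """Build (but do not send) a request that creates a Dataform repository."""
    parent = f"projects/{project_id}/locations/{region}"
    query = urllib.parse.urlencode({"repositoryId": repository_id})
    # Assumed field name for the custom service account used to run workflows.
    body = json.dumps({"serviceAccount": service_account}).encode("utf-8")
    return urllib.request.Request(
        url=f"{API_ROOT}/{parent}/repositories?{query}",
        data=body,
        method="POST",
        headers={
            "Authorization": f"Bearer {access_token}",
            "Content-Type": "application/json",
        },
    )


# Hypothetical identifiers and token, used only for illustration.
request = build_create_repository_request(
    "my-project-id",
    "us-central1",
    "sales-pipeline_01",
    "dataform-runner@my-project-id.iam.gserviceaccount.com",
    "ACCESS_TOKEN",
)
print(request.get_method(), request.full_url)
```

To send the request, pass it to `urllib.request.urlopen` with a valid OAuth 2.0 access token. Keeping construction separate from sending makes the URL and body easy to inspect first.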

To edit the service account for a Dataform repository, follow these steps:

  1. In the Google Cloud console, go to the Dataform page.

    Go to Dataform

  2. Select a repository, and then click Settings.

  3. By the Service account field, click Edit Service account.

  4. In the Service account menu, select a service account for the repository.

    In the menu, you can select a custom service account associated with your Google Cloud project that you have access to.

    1. Optional: To select a service account that is not displayed in the menu, click Enter manually and enter a service account ID.
  5. Click Save.

Delete a repository

To delete a repository and all its contents, follow these steps:

  1. In the Google Cloud console, go to the Dataform page.

    Go to Dataform

  2. By the repository that you want to delete, click the More menu, and then select Delete.

  3. In the Delete repository window, enter the name of the repository to confirm deletion.

  4. Click Delete.
