Create Amazon S3 BigLake tables

This document describes how to create an Amazon Simple Storage Service (Amazon S3) BigLake table. A BigLake table lets you use access delegation to query data in Amazon S3. Access delegation decouples access to the BigLake table from access to the underlying datastore.

For information about how data flows between BigQuery and Amazon S3, see Data flow when querying data.

Before you begin

Ensure that you have a connection to access Amazon S3 data.

Required roles

To get the permissions that you need to create an external table, ask your administrator to grant you the BigQuery Admin (roles/bigquery.admin) IAM role on your dataset. For more information about granting roles, see Manage access to projects, folders, and organizations.

This predefined role contains the permissions required to create an external table. The exact permissions are listed in the following Required permissions section:

Required permissions

The following permissions are required to create an external table:

bigquery.tables.create
bigquery.connections.delegate

You might also be able to get these permissions with custom roles or other predefined roles.
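If you use a custom role, you can check its permission list against the two permissions above before assigning it. The following Java sketch assumes you already have the role's permissions as strings (for example, copied from the role definition); the class and method names are hypothetical and it uses only the JDK.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Set;

// Hypothetical helper: reports which permissions required to create an
// Amazon S3 BigLake table are missing from a role's permission list.
final class BigLakeRoleCheck {

  // The permissions listed under "Required permissions".
  static final List<String> REQUIRED =
      List.of("bigquery.tables.create", "bigquery.connections.delegate");

  // Returns the required permissions that are not in `granted`, in the order
  // listed above. An empty result means the role covers both permissions.
  static List<String> missingPermissions(Set<String> granted) {
    List<String> missing = new ArrayList<>();
    for (String permission : REQUIRED) {
      if (!granted.contains(permission)) {
        missing.add(permission);
      }
    }
    return missing;
  }

  public static void main(String[] args) {
    // Permission list of a hypothetical custom role.
    Set<String> customRole = Set.of("bigquery.tables.create", "bigquery.tables.get");
    System.out.println("Missing: " + missingPermissions(customRole));
  }
}
```

For the example role, the check reports that bigquery.connections.delegate is missing, so that role could not create the table through the connection.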

Create a dataset

Before you create an external table, you need to create a dataset in a supported region. Select one of the following options:

Console

Go to the BigQuery page.

In the left pane, click Explorer.
In the Explorer pane, select the project where you want to create the dataset.
Click View actions, and then click Create dataset.
On the Create dataset page, specify the following details:

For Dataset ID, enter a unique dataset name.
For Data location, choose a supported region.
Optional: To delete tables automatically, select the Enable table expiration checkbox and set the Default maximum table age in days. Data in Amazon S3 is not deleted when the table expires.
If you want to use default collation, expand the Advanced options section and then select the Enable default collation option.
Click Create dataset.
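The console takes the default maximum table age in days, while the BigQuery API stores a dataset's default table expiration in milliseconds (the defaultTableExpirationMs dataset field; in the Java client, DatasetInfo.Builder.setDefaultTableLifetime). The following sketch only performs that conversion with the JDK; the class name is hypothetical, and it does not call the BigQuery API.

```java
import java.util.concurrent.TimeUnit;

// Hypothetical helper: converts the console's "Default maximum table age"
// (whole days) into milliseconds, the unit the API uses for the dataset's
// default table expiration.
final class TableExpiration {

  static long daysToExpirationMs(int days) {
    if (days <= 0) {
      throw new IllegalArgumentException("days must be positive: " + days);
    }
    return TimeUnit.DAYS.toMillis(days);
  }

  public static void main(String[] args) {
    // 30 days expressed in milliseconds.
    System.out.println(daysToExpirationMs(30));
  }
}
```

As with the console setting, an expiring table does not delete the underlying data in Amazon S3.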

SQL

Use the CREATE SCHEMA DDL statement. The following example creates a dataset in the aws-us-east-1 region:

In the Google Cloud console, go to the BigQuery page.


In the query editor, enter the following statement:

  CREATE SCHEMA mydataset
    OPTIONS (
      location = 'aws-us-east-1');

Click Run.

For more information about how to run queries, see Run an interactive query.
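If you generate this statement from application code, validate the inputs and quote the dataset name with backticks. The following Java sketch builds the same CREATE SCHEMA statement as a string; the class name and the naming rules it enforces (letters, digits, and underscores for dataset names; lowercase letters, digits, and hyphens for locations) are assumptions for illustration. You could then run the string like any other query, for example with QueryJobConfiguration.newBuilder(ddl).build() and BigQuery.query in the Java client.

```java
import java.util.regex.Pattern;

// Hypothetical helper: builds a CREATE SCHEMA statement for a dataset in a
// given location, rejecting names that could break the SQL text.
final class CreateSchemaDdl {

  // Assumed rules: dataset names use letters, digits, and underscores (up to
  // 1,024 characters); locations such as aws-us-east-1 use [a-z0-9-].
  private static final Pattern DATASET = Pattern.compile("[A-Za-z0-9_]{1,1024}");
  private static final Pattern LOCATION = Pattern.compile("[a-z0-9-]+");

  static String build(String dataset, String location) {
    if (!DATASET.matcher(dataset).matches()) {
      throw new IllegalArgumentException("invalid dataset name: " + dataset);
    }
    if (!LOCATION.matcher(location).matches()) {
      throw new IllegalArgumentException("invalid location: " + location);
    }
    return "CREATE SCHEMA `" + dataset + "`\n"
        + "  OPTIONS (\n"
        + "    location = '" + location + "');";
  }

  public static void main(String[] args) {
    System.out.println(build("mydataset", "aws-us-east-1"));
  }
}
```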

bq

In a command-line environment, create a dataset using the bq mk command:

bq --location=LOCATION mk \
    --dataset \
    PROJECT_ID:DATASET_NAME

The --project_id parameter overrides the default project.

Replace the following:

LOCATION: the location of your dataset

For information about supported regions, see Locations. After you create a dataset, you can't change its location. You can set a default value for the location by using the .bigqueryrc file.
PROJECT_ID: your project ID
DATASET_NAME: the name of the dataset that you want to create

To create a dataset in a project other than your default project, add the project ID to the dataset name in the following format: PROJECT_ID:DATASET_NAME.
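To script the bq mk command from Java, pass each argument separately instead of building a single shell string, so no shell quoting is involved. This sketch assumes the bq tool is installed and on your PATH; it only prepares a ProcessBuilder, and calling start() on it would actually run the command. The class and method names are hypothetical.

```java
import java.util.List;

// Hypothetical helper: assembles the arguments for
//   bq --location=LOCATION mk --dataset PROJECT_ID:DATASET_NAME
final class BqMkDataset {

  static List<String> command(String location, String projectId, String datasetName) {
    return List.of(
        "bq",
        "--location=" + location,
        "mk",
        "--dataset",
        projectId + ":" + datasetName);
  }

  public static void main(String[] args) {
    List<String> cmd = command("aws-us-east-1", "my-project", "mydataset");
    // Inherit stdout/stderr so bq's output is visible if the process is started.
    ProcessBuilder pb = new ProcessBuilder(cmd).inheritIO();
    // pb.start() would run bq; this sketch only prints the command line.
    System.out.println(String.join(" ", pb.command()));
  }
}
```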

Java

Before trying this sample, follow the Java setup instructions in the BigQuery quickstart using client libraries. For more information, see the BigQuery Java API reference documentation.

To authenticate to BigQuery, set up Application Default Credentials. For more information, see Set up authentication for client libraries.

  import com.google.cloud.bigquery.BigQuery;
  import com.google.cloud.bigquery.BigQueryException;
  import com.google.cloud.bigquery.BigQueryOptions;
  import com.google.cloud.bigquery.Dataset;
  import com.google.cloud.bigquery.DatasetInfo;

  // Sample to create an AWS dataset
  public class CreateDatasetAws {

    public static void main(String[] args) {
      // TODO(developer): Replace these variables before running the sample.
      String projectId = "MY_PROJECT_ID";
      String datasetName = "MY_DATASET_NAME";
      // Note: Use a supported AWS location, such as aws-us-east-1.
      String location = "aws-us-east-1";
      createDatasetAws(projectId, datasetName, location);
    }

    public static void createDatasetAws(String projectId, String datasetName, String location) {
      try {
        // Initialize client that will be used to send requests. This client only needs to be created
        // once, and can be reused for multiple requests.
        BigQuery bigquery = BigQueryOptions.getDefaultInstance().getService();

        // Describe the dataset, including its AWS location, and create it.
        DatasetInfo datasetInfo =
            DatasetInfo.newBuilder(projectId, datasetName).setLocation(location).build();

        Dataset dataset = bigquery.create(datasetInfo);
        System.out.println(
            "AWS dataset created successfully: " + dataset.getDatasetId().getDataset());
      } catch (BigQueryException e) {
        System.out.println("AWS dataset was not created.\n" + e.toString());
      }
    }
  }

The following table shows how Delta Lake data types map to BigQuery data types:

Delta Lake Type	BigQuery Type
`boolean`	`BOOL`
`byte`	`INT64`
`int`	`INT64`
`long`	`INT64`
`float`	`FLOAT64`
`double`	`FLOAT64`
`decimal(P,S)`	`NUMERIC` or `BIGNUMERIC`, depending on precision
`date`	`DATE`
`time`	`TIME`
`timestamp (not partition column)`	`TIMESTAMP`
`timestamp (partition column)`	`DATETIME`
`string`	`STRING`
`binary`	`BYTES`
`array<Type>`	`ARRAY<Type>`
`struct`	`STRUCT`
`map<KeyType, ValueType>`	`ARRAY<STRUCT<key KeyType, value ValueType>>`
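The scalar rows of the Delta Lake type conversion table can be expressed as a lookup, which is handy when you want to predict the BigQuery schema before creating a table. This Java sketch covers only the scalar types (complex types such as array, struct, and map are rejected), assumes Delta Lake's decimal(P,S) notation, and uses a decimal rule that is our assumption rather than the documented one: NUMERIC when the value fits NUMERIC's 29 integer digits and 9 fractional digits, otherwise BIGNUMERIC. The class name is hypothetical.

```java
import java.util.Locale;
import java.util.Map;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper mirroring the scalar rows of the conversion table.
final class DeltaTypeMapping {

  // Delta Lake type -> BigQuery type for types with a fixed mapping.
  private static final Map<String, String> PRIMITIVES = Map.of(
      "boolean", "BOOL",
      "byte", "INT64",
      "int", "INT64",
      "long", "INT64",
      "float", "FLOAT64",
      "double", "FLOAT64",
      "date", "DATE",
      "time", "TIME",
      "string", "STRING",
      "binary", "BYTES");

  private static final Pattern DECIMAL =
      Pattern.compile("decimal\\(\\s*(\\d+)\\s*,\\s*(\\d+)\\s*\\)");

  static String toBigQuery(String deltaType, boolean isPartitionColumn) {
    String t = deltaType.trim().toLowerCase(Locale.ROOT);
    if (t.equals("timestamp")) {
      // Partition columns map to DATETIME; other timestamp columns to TIMESTAMP.
      return isPartitionColumn ? "DATETIME" : "TIMESTAMP";
    }
    String primitive = PRIMITIVES.get(t);
    if (primitive != null) {
      return primitive;
    }
    Matcher m = DECIMAL.matcher(t);
    if (m.matches()) {
      int precision = Integer.parseInt(m.group(1));
      int scale = Integer.parseInt(m.group(2));
      // Assumed rule: NUMERIC holds up to 29 integer and 9 fractional digits.
      boolean fitsNumeric = scale <= 9 && precision - scale <= 29;
      return fitsNumeric ? "NUMERIC" : "BIGNUMERIC";
    }
    throw new IllegalArgumentException("unsupported or complex type: " + deltaType);
  }

  public static void main(String[] args) {
    System.out.println(toBigQuery("long", false));           // INT64
    System.out.println(toBigQuery("timestamp", true));       // DATETIME
    System.out.println(toBigQuery("decimal(10,2)", false));  // NUMERIC
    System.out.println(toBigQuery("decimal(38,18)", false)); // BIGNUMERIC
  }
}
```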

The following table lists the BigQuery Omni VPC resource ID for each AWS region:

Region	VPC ID
aws-ap-northeast-2	vpc-0b488548024288af2
aws-ap-southeast-2	vpc-0726e08afef3667ca
aws-eu-central-1	vpc-05c7bba12ad45558f
aws-eu-west-1	vpc-0e5c646979bbe73a0
aws-us-east-1	vpc-0bf63a2e71287dace
aws-us-west-2	vpc-0cc24e567b9d2c1cb
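When you write an Amazon S3 bucket policy that allows access from the BigQuery Omni VPC, you need the VPC ID for your dataset's region. The following Java sketch keeps the region table in a map so that a script can look the ID up; the class name is hypothetical, and the hard-coded IDs are copied from the table above, so re-check them against the current documentation before use.

```java
import java.util.Map;
import java.util.Optional;

// Hypothetical helper: looks up the BigQuery Omni VPC ID for an AWS region.
final class OmniVpcIds {

  // Region -> BigQuery Omni VPC ID, copied from the table above.
  private static final Map<String, String> VPC_BY_REGION = Map.of(
      "aws-ap-northeast-2", "vpc-0b488548024288af2",
      "aws-ap-southeast-2", "vpc-0726e08afef3667ca",
      "aws-eu-central-1", "vpc-05c7bba12ad45558f",
      "aws-eu-west-1", "vpc-0e5c646979bbe73a0",
      "aws-us-east-1", "vpc-0bf63a2e71287dace",
      "aws-us-west-2", "vpc-0cc24e567b9d2c1cb");

  // Returns the VPC ID, or empty for a null or unlisted region.
  static Optional<String> vpcIdFor(String region) {
    if (region == null) {
      return Optional.empty(); // Map.of rejects null keys in get()
    }
    return Optional.ofNullable(VPC_BY_REGION.get(region));
  }

  public static void main(String[] args) {
    System.out.println(vpcIdFor("aws-us-east-1").orElse("unknown region"));
  }
}
```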
