Create Amazon S3 BigLake tables

This document describes how to create an Amazon Simple Storage Service (Amazon S3) BigLake table. A BigLake table lets you use access delegation to query data in Amazon S3. Access delegation decouples access to the BigLake table from access to the underlying data store.

For information about how data flows between BigQuery and Amazon S3, see Data flow when querying data.

Before you begin

Ensure that you have a connection to access Amazon S3 data.
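
If you still need to create the connection, one option is the bq command-line tool. The following is a minimal sketch, not the full procedure; the connection ID and the AWS IAM role ARN are placeholder values that you must replace with your own:

    bq mk --connection \
        --connection_type=AWS \
        --location=aws-us-east-1 \
        --iam_role_id=arn:aws:iam::AWS_ACCOUNT_ID:role/ROLE_NAME \
        CONNECTION_ID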

Required roles

To get the permissions that you need to create an external table, ask your administrator to grant you the BigQuery Admin (roles/bigquery.admin) IAM role on your dataset. For more information about granting roles, see Manage access to projects, folders, and organizations.

This predefined role contains the permissions required to create an external table. To see the exact permissions that are required, expand the Required permissions section:

Required permissions

The following permissions are required to create an external table:

  • bigquery.tables.create
  • bigquery.connections.delegate

You might also be able to get these permissions with custom roles or other predefined roles.
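
As a sketch, a project-level grant looks like the following with the gcloud CLI (the member value is a placeholder; to grant the role on the dataset itself, as described above, use the Google Cloud console or the BigQuery API instead):

    gcloud projects add-iam-policy-binding PROJECT_ID \
        --member="user:USER_EMAIL" \
        --role="roles/bigquery.admin"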

Create a dataset

Before you create an external table, you need to create a dataset in a supported region. Select one of the following options:

Console

  1. Go to the BigQuery page.

    Go to BigQuery

  2. In the Explorer pane, select the project where you want to create the dataset.
  3. Expand the View actions option and click Create dataset.
  4. On the Create dataset page, specify the following details:
    1. For Dataset ID, enter a unique dataset name.
    2. For Data location, choose a supported region.
    3. Optional: To delete tables automatically, select the Enable table expiration checkbox and set the Default maximum table age in days. Data in Amazon S3 is not deleted when the table expires.
    4. If you want to use default collation, expand the Advanced options section and then select the Enable default collation option.
    5. Click Create dataset.

SQL

Use the CREATE SCHEMA DDL statement. The following example creates a dataset in the aws-us-east-1 region:

  1. In the Google Cloud console, go to the BigQuery page.

    Go to BigQuery

  2. In the query editor, enter the following statement:

     CREATE SCHEMA mydataset
     OPTIONS (
       location = 'aws-us-east-1');
  3. Click Run.

For more information about how to run queries, see Run an interactive query.
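
To confirm where the dataset was created, you can query the INFORMATION_SCHEMA.SCHEMATA view with a region qualifier. This is a sketch that assumes the mydataset name from the example above and that INFORMATION_SCHEMA is available in your region:

     SELECT schema_name, location
     FROM `region-aws-us-east-1`.INFORMATION_SCHEMA.SCHEMATA
     WHERE schema_name = 'mydataset';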

bq

In a command-line environment, create a dataset using the bq mk command:

bq --location=LOCATION mk \
    --dataset \
    PROJECT_ID:DATASET_NAME
The --project_id parameter overrides the default project.

Replace the following:

  • LOCATION: the location of your dataset

    For information about supported regions, see Locations. After you create a dataset, you can't change its location. You can set a default value for the location by using the .bigqueryrc file.

  • PROJECT_ID: your project ID

  • DATASET_NAME: the name of the dataset that you want to create

    To create a dataset in a project other than your default project, add the project ID to the dataset name in the following format: PROJECT_ID:DATASET_NAME.
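
For example, the following command creates a dataset named mydataset (a hypothetical name) in the project myproject in the aws-us-east-1 region:

    bq --location=aws-us-east-1 mk \
        --dataset \
        myproject:mydataset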

Java

Before trying this sample, follow the Java setup instructions in the BigQuery quickstart using client libraries. For more information, see the BigQuery Java API reference documentation.

To authenticate to BigQuery, set up Application Default Credentials. For more information, see Set up authentication for client libraries.
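
For local development, one common way to set up Application Default Credentials is with the gcloud CLI:

    gcloud auth application-default login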

import com.google.cloud.bigquery.BigQuery;
import com.google.cloud.bigquery.BigQueryException;
import com.google.cloud.bigquery.BigQueryOptions;
import com.google.cloud.bigquery.Dataset;
import com.google.cloud.bigquery.DatasetInfo;

// Sample to create an AWS dataset
public class CreateDatasetAws {

  public static void main(String[] args) {
    // TODO(developer): Replace these variables before running the sample.
    String projectId = "MY_PROJECT_ID";
    String datasetName = "MY_DATASET_NAME";
    // Note: As of now location only supports aws-us-east-1
    String location = "aws-us-east-1";
    createDatasetAws(projectId, datasetName, location);
  }

  public static void createDatasetAws(String projectId, String datasetName, String location) {
    try {
      // Initialize client that will be used to send requests. This client only needs to be created
      // once, and can be reused for multiple requests.
      BigQuery bigquery = BigQueryOptions.getDefaultInstance().getService();

      // Build the dataset metadata with the AWS location, then create the dataset.
      DatasetInfo datasetInfo =
          DatasetInfo.newBuilder(projectId, datasetName).setLocation(location).build();

      Dataset dataset = bigquery.create(datasetInfo);
      System.out.println(
          "Aws dataset created successfully :" + dataset.getDatasetId().getDataset());
    } catch (BigQueryException e) {
      System.out.println("Aws dataset was not created. \n" + e.toString());
    }
  }
}
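
To try the sample, one option is Maven's exec plugin. This command is a sketch, not part of the official sample; it assumes a Maven project with the google-cloud-bigquery client library on its classpath:

    mvn compile exec:java -Dexec.mainClass="CreateDatasetAws"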