Apache Hive SQL translation guide
This document details the similarities and differences in SQL syntax between Apache Hive and BigQuery to help you plan your migration. To migrate your SQL scripts in bulk, use batch SQL translation. To translate ad hoc queries, use interactive SQL translation.
In some cases, there's no direct mapping between a SQL element in Hive and BigQuery. However, in most cases, BigQuery offers an alternative element to Hive to help you achieve the same functionality, as shown in the examples in this document.
The intended audience for this document is enterprise architects, database administrators, application developers, and IT security specialists. It assumes that you're familiar with Hive.
Data types
Hive and BigQuery have different data type systems. In most cases, you can map data types in Hive to BigQuery data types, with a few exceptions such as MAP and UNION. Hive supports more implicit type casting than BigQuery. As a result, the batch SQL translator inserts many explicit casts.
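In practice, this means an expression that relies on Hive's implicit coercions needs an explicit cast in BigQuery. A minimal sketch, using an illustrative orders table and order_total column:

```sql
-- Hive: implicit coercion of a string operand in arithmetic
-- SELECT order_total + '5' FROM orders;

-- BigQuery: the cast must be explicit
SELECT order_total + CAST('5' AS INT64) FROM orders;
```

The following table maps Hive data types to their BigQuery equivalents: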
Hive | BigQuery |
---|---|
TINYINT | INT64 |
SMALLINT | INT64 |
INT | INT64 |
BIGINT | INT64 |
DECIMAL | NUMERIC |
FLOAT | FLOAT64 |
DOUBLE | FLOAT64 |
BOOLEAN | BOOL |
STRING | STRING |
VARCHAR | STRING |
CHAR | STRING |
BINARY | BYTES |
DATE | DATE |
- | DATETIME |
- | TIME |
TIMESTAMP | DATETIME/TIMESTAMP |
INTERVAL | - |
ARRAY | ARRAY |
STRUCT | STRUCT |
MAP | STRUCT with key values (REPEATED field) |
UNION | STRUCT with different types |
- | GEOGRAPHY |
- | JSON |
Query syntax
This section addresses differences in query syntax between Hive and BigQuery.
SELECT statement
Most Hive SELECT statements are compatible with BigQuery. The following table contains a list of minor differences:
Case | Hive | BigQuery |
---|---|---|
Subquery | A subquery in the FROM clause must have an alias. | An alias for a subquery in the FROM clause is optional. |
Column filtering | Columns can be selected with a regular expression, for example SELECT `(ds)?+.+` FROM t. | Use SELECT * EXCEPT(ds) FROM t to exclude columns. |
Exploding an array | SELECT ... FROM t LATERAL VIEW EXPLODE(array_col) tbl AS item | SELECT ... FROM t CROSS JOIN UNNEST(array_col) AS item |
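For example, the following queries flatten an array column; the table and column names are illustrative:

```sql
-- Hive
-- SELECT o.id, item
-- FROM orders o LATERAL VIEW EXPLODE(o.items) items_table AS item;

-- BigQuery
SELECT o.id, item
FROM mydataset.orders AS o
CROSS JOIN UNNEST(o.items) AS item;
```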
FROM clause
The FROM clause in a query lists the table references from which data is selected. In Hive, possible table references include tables, views, and subqueries. BigQuery also supports all of these table references.
You can reference BigQuery tables in the FROM clause by using the following:
- [project_id].[dataset_id].[table_name]
- [dataset_id].[table_name]
- [table_name]
BigQuery also supports additional table references:
- Historical versions of the table definition and rows using FOR SYSTEM_TIME AS OF
- Field paths, or any path that resolves to a field within a data type (such as a STRUCT)
- Flattened arrays
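For example, FOR SYSTEM_TIME AS OF lets you query a table as it existed at an earlier point in time. A minimal sketch, assuming a mydataset.mytable table:

```sql
-- Read the table as it was one hour ago (within the time travel window)
SELECT *
FROM mydataset.mytable
  FOR SYSTEM_TIME AS OF TIMESTAMP_SUB(CURRENT_TIMESTAMP(), INTERVAL 1 HOUR);
```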
Comparison operators
The following table provides details about converting operators from Hive to BigQuery:
Function or operator | Hive | BigQuery |
---|---|---|
Unary minus (-), multiplication (*), division (/), addition (+), subtraction (-) | All number types | All number types. To prevent errors during the divide operation, consider using SAFE_DIVIDE. |
Bitwise not (~), bitwise OR (\|), bitwise AND (&), bitwise XOR (^) | All integer types | All integer types and BYTES. |
Left shift | shiftleft(a, b) | a << b |
Right shift | shiftright(a, b) | a >> b |
Modulus (remainder) | X % Y, all number types | MOD(X, Y) |
Integer division | A DIV B, and A/B for detailed precision | DIV(A, B), all number types. Note: To prevent errors during the divide operation, consider using SAFE_DIVIDE. |
Unary negation | !, NOT | NOT |
Types supporting equality comparisons | All primitive types | All comparable types and STRUCT. |
NULL-safe equal | a <=> b | Not supported. Translate to the following: (a = b) OR (a IS NULL AND b IS NULL) |
Not equal | a <> b | Supported, same as !=. |
Relational operators (=, ==, !=, <, >, >=) | All primitive types | All comparable types. Note: BigQuery uses = rather than == for equality comparisons. |
String comparison | RLIKE, REGEXP | REGEXP_CONTAINS built-in function. Uses BigQuery regex syntax for string functions for the regular expression patterns. |
[NOT] LIKE, [NOT] BETWEEN, IS [NOT] NULL | A [NOT] BETWEEN B AND C, A IS [NOT] (TRUE\|FALSE), A [NOT] LIKE B | Same as Hive. In addition, BigQuery also supports the IN operator. |
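The SAFE_DIVIDE function mentioned in the table returns NULL instead of raising a division-by-zero error. For example:

```sql
SELECT 10 / 2;              -- 5.0
SELECT SAFE_DIVIDE(10, 0);  -- NULL instead of a divide-by-zero error
```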
JOIN conditions
Both Hive and BigQuery support the following types of joins:
- [INNER] JOIN
- LEFT [OUTER] JOIN
- RIGHT [OUTER] JOIN
- FULL [OUTER] JOIN
- CROSS JOIN and the equivalent implicit comma cross join
For more information, see Join operation and Hive joins.
Type conversion and casting
The following table provides details about converting functions from Hive to BigQuery:
Function or operator | Hive | BigQuery |
---|---|---|
Type casting | CAST(expression AS type). When a cast fails, NULL is returned. | Same syntax as Hive. For more information about BigQuery type conversion rules, see Conversion rules. If a cast fails, you see an error. To get the same behavior as Hive, use SAFE_CAST. |
SAFE function calls | Not supported. | If you prefix function calls with SAFE., the function returns NULL instead of reporting failure. For example, SAFE.SUBSTR('foo', 0, -2) AS safe_output; returns NULL. Note: To cast safely without errors, use SAFE_CAST. |
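For example, a Hive CAST that silently yields NULL translates to SAFE_CAST in BigQuery:

```sql
SELECT CAST('apple' AS INT64);       -- raises an error in BigQuery
SELECT SAFE_CAST('apple' AS INT64);  -- returns NULL, matching Hive's CAST behavior
```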
Implicit conversion types
When migrating to BigQuery, you need to convert most of your Hive implicit conversions to BigQuery explicit conversions except for the following data types, which BigQuery implicitly converts.
From BigQuery type | To BigQuery type |
---|---|
INT64 | FLOAT64, NUMERIC, BIGNUMERIC |
BIGNUMERIC | FLOAT64 |
NUMERIC | BIGNUMERIC, FLOAT64 |
BigQuery also performs implicit conversions for the following literals:
From BigQuery type | To BigQuery type |
---|---|
STRING literal (for example, "2008-12-25") | DATE |
STRING literal (for example, "2008-12-25 15:30:00") | TIMESTAMP |
STRING literal (for example, "2008-12-25T07:30:00") | DATETIME |
STRING literal (for example, "15:30:00") | TIME |
Explicit conversion types
If you want to convert Hive data types that BigQuery doesn't implicitly convert, use the BigQuery CAST(expression AS type) function.
Functions
This section covers common functions used in Hive and BigQuery.
Aggregate functions
The following table shows mappings between common Hive aggregate, statistical aggregate, and approximate aggregate functions and their BigQuery equivalents:
Hive | BigQuery |
---|---|
count(DISTINCT expr[, expr...]) | count(DISTINCT expr[, expr...]) |
percentile_approx(DOUBLE col, array(p1 [, p2]...) [, B]) WITHIN GROUP (ORDER BY expression) | APPROX_QUANTILES(expression, 100)[OFFSET(CAST(TRUNC(percentile * 100) AS INT64))]. BigQuery doesn't support the rest of the arguments that Hive defines. |
AVG | AVG |
X \| Y | BIT_OR / X \| Y |
X ^ Y | BIT_XOR / X ^ Y |
X & Y | BIT_AND / X & Y |
COUNT | COUNT |
COLLECT_SET(col), COLLECT_LIST(col) | ARRAY_AGG(col) |
MAX | MAX |
MIN | MIN |
REGR_AVGX | AVG(ind_var_expr) |
REGR_AVGY | AVG(dep_var_expr) |
REGR_COUNT | SUM(IF(dep_var_expr IS NULL OR ind_var_expr IS NULL, 0, 1)) |
REGR_INTERCEPT | AVG(dep_var_expr) - AVG(ind_var_expr) * (COVAR_SAMP(ind_var_expr, dep_var_expr) / VAR_SAMP(ind_var_expr)) |
REGR_R2 | POWER(CORR(ind_var_expr, dep_var_expr), 2) |
REGR_SLOPE | COVAR_SAMP(ind_var_expr, dep_var_expr) / VAR_SAMP(ind_var_expr) |
REGR_SXX | SUM(POWER(ind_var_expr, 2)) - COUNT(ind_var_expr) * POWER(AVG(ind_var_expr), 2) |
REGR_SXY | SUM(ind_var_expr * dep_var_expr) - COUNT(ind_var_expr) * AVG(ind_var_expr) * AVG(dep_var_expr) |
REGR_SYY | SUM(POWER(dep_var_expr, 2)) - COUNT(dep_var_expr) * POWER(AVG(dep_var_expr), 2) |
ROLLUP | ROLLUP |
STDDEV_POP | STDDEV_POP |
STDDEV_SAMP | STDDEV_SAMP, STDDEV |
SUM | SUM |
VAR_POP | VAR_POP |
VAR_SAMP | VAR_SAMP, VARIANCE |
CONCAT_WS | STRING_AGG |
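For example, Hive's percentile_approx maps onto APPROX_QUANTILES plus an array offset. A sketch for the 90th percentile, with an illustrative requests table and latency_ms column:

```sql
-- Hive: SELECT percentile_approx(latency_ms, 0.9) FROM requests;
SELECT APPROX_QUANTILES(latency_ms, 100)[OFFSET(90)] AS p90
FROM mydataset.requests;
```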
Analytical functions
The following table shows mappings between common Hive analytical functions and their BigQuery equivalents:
Hive | BigQuery |
---|---|
AVG | AVG |
COUNT | COUNT |
COVAR_POP | COVAR_POP |
COVAR_SAMP | COVAR_SAMP |
CUME_DIST | CUME_DIST |
DENSE_RANK | DENSE_RANK |
FIRST_VALUE | FIRST_VALUE |
LAST_VALUE | LAST_VALUE |
LAG | LAG |
LEAD | LEAD |
COLLECT_LIST, COLLECT_SET | ARRAY_AGG, ARRAY_CONCAT_AGG |
MAX | MAX |
MIN | MIN |
NTILE | NTILE(constant_integer_expression) |
PERCENT_RANK | PERCENT_RANK |
RANK () | RANK |
ROW_NUMBER | ROW_NUMBER |
STDDEV_POP | STDDEV_POP |
STDDEV_SAMP | STDDEV_SAMP, STDDEV |
SUM | SUM |
VAR_POP | VAR_POP |
VAR_SAMP | VAR_SAMP, VARIANCE |
VARIANCE | VARIANCE () |
WIDTH_BUCKET | A user-defined function (UDF) can be used. |
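As the last row suggests, a SQL UDF can stand in for WIDTH_BUCKET. The following is a minimal sketch assuming Hive's WIDTH_BUCKET(expr, min_value, max_value, num_buckets) semantics; verify boundary behavior against your data before relying on it:

```sql
-- Hypothetical replacement for Hive's WIDTH_BUCKET
CREATE TEMP FUNCTION WIDTH_BUCKET(x FLOAT64, min_val FLOAT64, max_val FLOAT64, num_buckets INT64)
AS (
  CASE
    WHEN x < min_val THEN 0                  -- below the range
    WHEN x >= max_val THEN num_buckets + 1   -- above the range
    ELSE 1 + CAST(FLOOR((x - min_val) * num_buckets / (max_val - min_val)) AS INT64)
  END
);

SELECT WIDTH_BUCKET(5.3, 0.0, 10.0, 5);  -- returns 3
```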
Date and time functions
The following table shows mappings between common Hive date and time functions and their BigQuery equivalents:
Hive | BigQuery |
---|---|
DATE_ADD | DATE_ADD(date_expression, INTERVAL int64_expression date_part) |
DATE_SUB | DATE_SUB(date_expression, INTERVAL int64_expression date_part) |
CURRENT_DATE | CURRENT_DATE |
CURRENT_TIME | CURRENT_TIME |
CURRENT_TIMESTAMP | CURRENT_DATETIME is recommended, because this value is timezone-free and synonymous with CURRENT_TIMESTAMP in Hive. |
EXTRACT(field FROM source) | EXTRACT(part FROM datetime_expression) |
LAST_DAY | DATE_SUB(DATE_TRUNC(DATE_ADD(date_expression, INTERVAL 1 MONTH), MONTH), INTERVAL 1 DAY) |
MONTHS_BETWEEN | DATE_DIFF(date_expression, date_expression, MONTH) |
NEXT_DAY | DATE_ADD(DATE_TRUNC(date_expression, WEEK(day_name)), INTERVAL 1 WEEK) |
TO_DATE | PARSE_DATE |
UNIX_TIMESTAMP | UNIX_SECONDS |
FROM_UNIXTIME | FORMAT_TIMESTAMP(format_string, TIMESTAMP_SECONDS(unix_seconds)) |
YEAR, QUARTER, MONTH, HOUR, MINUTE, SECOND, WEEKOFYEAR | EXTRACT |
DATEDIFF | DATE_DIFF |
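For example, the LAST_DAY idiom in the table expands as follows:

```sql
-- Hive: SELECT last_day('2024-02-10');
SELECT DATE_SUB(
  DATE_TRUNC(DATE_ADD(DATE '2024-02-10', INTERVAL 1 MONTH), MONTH),
  INTERVAL 1 DAY);  -- 2024-02-29
```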
BigQuery offers additional date and time functions that have no direct Hive equivalent; see the GoogleSQL function reference for the full list.
String functions
The following table shows mappings between Hive string functions and their BigQuery equivalents:
Hive | BigQuery |
---|---|
ASCII | TO_CODE_POINTS(string_expr)[OFFSET(0)] |
HEX | TO_HEX |
LENGTH | CHAR_LENGTH, CHARACTER_LENGTH |
CHR | CODE_POINTS_TO_STRING |
CONCAT | CONCAT |
LOWER | LOWER |
LPAD | LPAD |
LTRIM | LTRIM |
REGEXP_EXTRACT | REGEXP_EXTRACT |
REGEXP_REPLACE | REGEXP_REPLACE |
REPLACE | REPLACE |
REVERSE | REVERSE |
RPAD | RPAD |
RTRIM | RTRIM |
SOUNDEX | SOUNDEX |
SPLIT | SPLIT(instring, delimiter)[ORDINAL(tokennum)] |
SUBSTR, SUBSTRING | SUBSTR |
TRANSLATE | TRANSLATE |
TRIM | TRIM |
UPPER | UPPER |
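Note that Hive array indexes are zero-based while BigQuery's ORDINAL is one-based, as this sketch shows:

```sql
-- Hive: SELECT split('a,b,c', ',')[1];  -- 'b' (zero-based index)
SELECT SPLIT('a,b,c', ',')[ORDINAL(2)];  -- 'b' (one-based; OFFSET(1) also works)
```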
BigQuery offers additional string functions that have no direct Hive equivalent; see the GoogleSQL function reference for the full list.
Math functions
The following table shows mappings between Hive math functions and their BigQuery equivalents:
Hive | BigQuery |
---|---|
ABS | ABS |
ACOS | ACOS |
ASIN | ASIN |
ATAN | ATAN |
CEIL | CEIL |
CEILING | CEILING |
COS | COS |
FLOOR | FLOOR |
GREATEST | GREATEST |
LEAST | LEAST |
LN | LN |
LNNVL | Not supported. Translate to the following: NOT condition OR condition IS NULL |
LOG | LOG |
MOD (% operator) | MOD |
POWER | POWER, POW |
RAND | RAND |
ROUND | ROUND |
SIGN | SIGN |
SIN | SIN |
SQRT | SQRT |
HASH | FARM_FINGERPRINT, MD5, SHA1, SHA256, SHA512 |
STDDEV_POP | STDDEV_POP |
STDDEV_SAMP | STDDEV_SAMP |
TAN | TAN |
TRUNC | TRUNC |
NVL | IFNULL(expr, 0), COALESCE(expr, 0) |
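Hive's HASH has no exact counterpart; which of the listed functions fits depends on whether you need an INT64 fingerprint or a cryptographic digest. For example:

```sql
SELECT FARM_FINGERPRINT('hello');  -- INT64 fingerprint
SELECT TO_HEX(MD5('hello'));       -- hex-encoded 128-bit digest
```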
BigQuery offers additional math functions that have no direct Hive equivalent; see the GoogleSQL function reference for the full list.
Logical and conditional functions
The following table shows mappings between Hive logical and conditional functions and their BigQuery equivalents:
Hive | BigQuery |
---|---|
CASE | CASE |
COALESCE | COALESCE |
NVL | IFNULL(expr, 0), COALESCE(expr, 0) |
NULLIF | NULLIF |
IF | IF(expr, true_result, else_result) |
ISNULL | IS NULL |
ISNOTNULL | IS NOT NULL |
UDFs and UDAFs
BigQuery supports UDFs but not user-defined aggregate functions (UDAFs).
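For example, a scalar SQL UDF in BigQuery looks like the following; the function name and logic are illustrative:

```sql
CREATE TEMP FUNCTION multiply_by_two(x INT64)
AS (x * 2);

SELECT multiply_by_two(21);  -- 42
```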
DML syntax
This section addresses differences in data manipulation language (DML) syntax between Hive and BigQuery.
INSERT statement
Most Hive INSERT statements are compatible with BigQuery. The following table shows exceptions:
Hive | BigQuery |
---|---|
INSERT INTO TABLE tablename [PARTITION (partcol1[=val1], partcol2[=val2] ...)] VALUES values_row [, values_row ...] | INSERT INTO table (...) VALUES (...); Note: In BigQuery, omitting column names in the INSERT statement works only when values are listed for all columns of the target table, in column order. |
INSERT OVERWRITE [LOCAL] DIRECTORY directory1 | BigQuery doesn't support insert-overwrite operations. This Hive syntax can be migrated to TRUNCATE and INSERT statements. |
BigQuery imposes DML quotas that restrict the number of DML statements that you can execute daily. To make the best use of your quota, consider the following approaches:
- Combine multiple rows in a single INSERT statement, instead of one row for each INSERT operation (see the example that follows this list).
- Combine multiple DML statements (including INSERT) by using a MERGE statement.
- Use CREATE TABLE ... AS SELECT to create and populate new tables.
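A minimal sketch of a multi-row INSERT, assuming a mydataset.orders table with these two columns:

```sql
INSERT INTO mydataset.orders (order_id, status)
VALUES (1, 'new'),
       (2, 'shipped'),
       (3, 'closed');
```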
UPDATE statement
Most Hive UPDATE statements are compatible with BigQuery. The following table shows exceptions:
Hive | BigQuery |
---|---|
UPDATE tablename SET column = value [, column = value ...] [WHERE expression] | UPDATE table_name SET column = value [, column = value ...] WHERE expression. Note: All UPDATE statements in BigQuery require a WHERE clause. |
DELETE and TRUNCATE statements
You can use DELETE or TRUNCATE statements to remove rows from a table without affecting the table schema or indexes.
In BigQuery, the DELETE statement must have a WHERE clause. For more information about DELETE in BigQuery, see DELETE examples.
Hive | BigQuery |
---|---|
DELETE FROM tablename [WHERE expression] | DELETE FROM table_name WHERE TRUE. The WHERE clause is mandatory; to delete all rows, use a condition that is always true, such as WHERE TRUE. |
TRUNCATE [TABLE] table_name [PARTITION partition_spec]; | TRUNCATE TABLE [[project_name.]dataset_name.]table_name |
MERGE statement
The MERGE statement can combine INSERT, UPDATE, and DELETE operations into a single upsert statement and perform the operations. The MERGE operation must match at most one source row for each target row.
Hive | BigQuery |
---|---|
MERGE INTO target USING source ON merge_condition WHEN MATCHED THEN ... WHEN NOT MATCHED THEN ... | MERGE target USING source ON merge_condition WHEN MATCHED THEN ... WHEN NOT MATCHED THEN ... Note: You must list all columns that need to be updated. |
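A minimal sketch of an upsert with MERGE; the table and column names are illustrative:

```sql
MERGE mydataset.target AS t
USING mydataset.source AS s
ON t.id = s.id
WHEN MATCHED THEN
  UPDATE SET value = s.value
WHEN NOT MATCHED THEN
  INSERT (id, value) VALUES (s.id, s.value);
```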
ALTER statement
The following table provides details about converting ALTER statements from Hive to BigQuery:
Operation | Hive | BigQuery |
---|---|---|
Rename table | ALTER TABLE table_name RENAME TO new_table_name; | Not supported in ALTER statements. Copy the table and remove the original: bq cp project.dataset.old_table project.dataset.new_table followed by bq rm --table project.dataset.old_table |
Table properties | ALTER TABLE table_name SET TBLPROPERTIES table_properties; table_properties: (property_name = property_value, ...). Table comment: ALTER TABLE table_name SET TBLPROPERTIES ('comment' = new_comment); | ALTER TABLE [[project_name.]dataset_name.]table_name SET OPTIONS (description = 'new description'); |
SerDe properties (serialize and deserialize) | ALTER TABLE table_name [PARTITION partition_spec] SET SERDE serde_class_name [WITH SERDEPROPERTIES serde_properties]; ALTER TABLE table_name [PARTITION partition_spec] SET SERDEPROPERTIES serde_properties; serde_properties: (property_name = property_value, ...) | Not applicable; BigQuery manages serialization internally. To let BigQuery read data from CSV, JSON, AVRO, PARQUET, or ORC files, see Create Cloud Storage external tables. BigQuery supports CSV, JSON, AVRO, and PARQUET export formats. For more information, see Export formats and compression types. |
Table storage properties | ALTER TABLE table_name CLUSTERED BY (col_name, col_name, ...) [SORTED BY (col_name, ...)] INTO num_buckets BUCKETS; | Not supported in ALTER statements. |
Skewed table | Skewed: ALTER TABLE table_name SKEWED BY (col_name1, col_name2, ...) ON ([(col_name1_value, col_name2_value, ...) [, (col_name1_value, col_name2_value), ...]] [STORED AS DIRECTORIES]; Not skewed: ALTER TABLE table_name NOT SKEWED; Not stored as directories: ALTER TABLE table_name NOT STORED AS DIRECTORIES; Skewed location: ALTER TABLE table_name SET SKEWED LOCATION (col_name1="location1" [, col_name2="location2", ...]); | Not supported; BigQuery manages data layout automatically. |
Table constraints | ALTER TABLE table_name ADD CONSTRAINT constraint_name PRIMARY KEY (column, ...) DISABLE NOVALIDATE; ALTER TABLE table_name ADD CONSTRAINT constraint_name FOREIGN KEY (column, ...) REFERENCES table_name(column, ...) DISABLE NOVALIDATE RELY; ALTER TABLE table_name DROP CONSTRAINT constraint_name; | ALTER TABLE [[project_name.]dataset_name.]table_name ADD [CONSTRAINT [IF NOT EXISTS] [constraint_name]] constraint NOT ENFORCED; ALTER TABLE [[project_name.]dataset_name.]table_name ADD PRIMARY KEY(column_list) NOT ENFORCED; For more information, see the ALTER TABLE ADD PRIMARY KEY statement. |
Add partition | ALTER TABLE table_name ADD [IF NOT EXISTS] PARTITION partition_spec [LOCATION 'location'][, PARTITION partition_spec [LOCATION 'location'], ...]; partition_spec: (partition_column = partition_col_value, ...) | Not required; partitions are created implicitly when data is written to them. For more information, see Managing partitioned tables. |
Rename partition | ALTER TABLE table_name PARTITION partition_spec RENAME TO PARTITION partition_spec; | Not supported. |
Exchange partition | Single partition: ALTER TABLE table_name_2 EXCHANGE PARTITION (partition_spec) WITH TABLE table_name_1; Multiple partitions: ALTER TABLE table_name_2 EXCHANGE PARTITION (partition_spec, partition_spec2, ...) WITH TABLE table_name_1; | Not supported. |
Drop partition | ALTER TABLE table_name DROP [IF EXISTS] PARTITION partition_spec[, PARTITION partition_spec, ...] [IGNORE PROTECTION] [PURGE]; | Use bq rm 'mydataset.table_name$partition_id' or DELETE from table_name$partition_id WHERE 1=1. For more information, see Delete a partition. |
(Un)Archive partition | ALTER TABLE table_name ARCHIVE PARTITION partition_spec; ALTER TABLE table_name UNARCHIVE PARTITION partition_spec; | Not supported. |
Table and partition file format | ALTER TABLE table_name [PARTITION partition_spec] SET FILEFORMAT file_format; | Not supported. |
Table and partition location | ALTER TABLE table_name [PARTITION partition_spec] SET LOCATION "new location"; | Not supported. |
Table and partition protection | ALTER TABLE table_name [PARTITION partition_spec] ENABLE\|DISABLE NO_DROP [CASCADE]; ALTER TABLE table_name [PARTITION partition_spec] ENABLE\|DISABLE OFFLINE; | Not supported. |
Table and partition compaction | ALTER TABLE table_name [PARTITION (partition_key = 'partition_value' [, ...])] COMPACT 'compaction_type' [AND WAIT] [WITH OVERWRITE TBLPROPERTIES ("property"="value" [, ...])]; | Not required; BigQuery optimizes storage automatically. |
Table and partition concatenate | ALTER TABLE table_name [PARTITION partition_spec] CONCATENATE; | Not required; BigQuery optimizes storage automatically. |
Table and partition columns (column name, type, position, and comment) | ALTER TABLE table_name [PARTITION partition_spec] CHANGE [COLUMN] col_old_name col_new_name column_type [COMMENT col_comment] [FIRST\|AFTER column_name] [CASCADE\|RESTRICT]; | Not supported in ALTER TABLE statements. |
DDL syntax
This section addresses differences in data definition language (DDL) syntax between Hive and BigQuery.
CREATE TABLE and DROP TABLE statements
The following examples show how to convert CREATE TABLE statements from Hive to BigQuery:
Create a table

Hive:

create table table_name (
  id int,
  dtDontQuery string,
  name string
)

BigQuery:

CREATE TABLE `myproject`.mydataset.table_name (
  id INT64,
  dtDontQuery STRING,
  name STRING
)

Create a partitioned table

Hive:

create table table_name (
  id int,
  dt string,
  name string
)
partitioned by (date string)

BigQuery:

CREATE TABLE `myproject`.mydataset.table_name (
  id INT64,
  dt DATE,
  name STRING
)
PARTITION BY dt
OPTIONS(
  partition_expiration_days=3,
  description="a table partitioned by date_col"
)

Create table as select (CTAS)

Hive:

CREATE TABLE new_key_value_store
ROW FORMAT SERDE "org.apache.hadoop.hive.serde2.columnar.ColumnarSerDe"
STORED AS RCFile
AS
SELECT (key % 1024) new_key, concat(key, value) key_value_pair, dt
FROM key_value_store
SORT BY new_key, key_value_pair;

BigQuery:

CREATE TABLE `myproject`.mydataset.new_key_value_store
-- When partitioning by date, uncomment the following line. Using
-- require_partition_filter is recommended when the table is partitioned:
-- PARTITION BY dt
OPTIONS(
  description="Table Description"
  -- When partitioning by date, uncomment the following line:
  -- , require_partition_filter=TRUE
) AS
SELECT (key % 1024) new_key, concat(key, value) key_value_pair, dt
FROM key_value_store
ORDER BY new_key, key_value_pair;

Create table like

The LIKE form of CREATE TABLE lets you copy an existing table definition exactly.

Hive:

CREATE TABLE empty_key_value_store
LIKE key_value_store [TBLPROPERTIES (property_name=property_value, ...)];

BigQuery:

CREATE TABLE `myproject`.mydataset.empty_key_value_store
LIKE mydataset.key_value_store;

Create a bucketed table

Hive:

CREATE TABLE page_view(
  viewTime INT,
  userid BIGINT,
  page_url STRING,
  referrer_url STRING,
  ip STRING COMMENT 'IP Address of the User'
)
COMMENT 'This is the page view table'
PARTITIONED BY(dt STRING, country STRING)
CLUSTERED BY(userid) SORTED BY(viewTime) INTO 32 BUCKETS
ROW FORMAT DELIMITED
FIELDS TERMINATED BY '\001'
COLLECTION ITEMS TERMINATED BY '\002'
MAP KEYS TERMINATED BY '\003'
STORED AS SEQUENCEFILE;

BigQuery:

CREATE TABLE `myproject`.mydataset.page_view (
  viewTime INT64,
  dt DATE,
  userId INT64,
  page_url STRING,
  referrer_url STRING,
  ip STRING OPTIONS (description="IP Address of the User")
)
PARTITION BY dt
CLUSTER BY userId
OPTIONS (
  partition_expiration_days=3,
  description="This is the page view table",
  require_partition_filter=TRUE
)

For more information, see Create and use clustered tables.

Create a skewed table

Hive:

CREATE TABLE list_bucket_multiple (col1 STRING, col2 int, col3 STRING)
SKEWED BY (col1, col2) ON (('s1',1), ('s3',3), ('s13',13), ('s78',78)) [STORED AS DIRECTORIES];

BigQuery: Skewed tables aren't supported; create the table without the SKEWED BY clause. BigQuery manages data layout automatically.

Create a temporary table

Hive:

CREATE TEMPORARY TABLE list_bucket_multiple (
  col1 STRING,
  col2 int,
  col3 STRING);

BigQuery: Use a table with an expiration time instead:

CREATE TABLE mydataset.newtable
(
  col1 STRING OPTIONS(description="An optional INTEGER field"),
  col2 INT64,
  col3 STRING
)
PARTITION BY DATE(_PARTITIONTIME)
OPTIONS(
  expiration_timestamp=TIMESTAMP "2020-01-01 00:00:00 UTC",
  partition_expiration_days=1,
  description="a table that expires in 2020, with each partition living for 24 hours",
  labels=[("org_unit", "development")]
)

Create a transactional table

Hive:

CREATE TRANSACTIONAL TABLE transactional_table_test(key string, value string) PARTITIONED BY(ds string) STORED AS ORC;

BigQuery: Not needed. BigQuery DML statements provide ACID semantics by default.

Drop a table

Hive:

DROP TABLE [IF EXISTS] table_name [PURGE];

BigQuery:

{DROP TABLE | DROP TABLE IF EXISTS} table_name

Truncate a table

Hive:

TRUNCATE TABLE table_name [PARTITION partition_spec];
partition_spec:
  : (partition_column = partition_col_value, partition_column = partition_col_value, ...)

BigQuery: In addition to the TRUNCATE TABLE statement (see the DELETE and TRUNCATE statements section), you can use one of the following alternatives:
- Drop and create the table again with the same schema.
- Set the write disposition for the table to WRITE_TRUNCATE if the truncate operation is a common use case for the given table.
- Use the CREATE OR REPLACE TABLE statement.
- Use the DELETE from table_name WHERE 1=1 statement.

Note: Specific partitions can also be truncated. For more information, see Delete a partition.
CREATE EXTERNAL TABLE and DROP EXTERNAL TABLE statements
For external table support in BigQuery, see Introduction to external data sources.
CREATE VIEW and DROP VIEW statements
The following table provides details about converting CREATE VIEW statements from Hive to BigQuery:
Hive | BigQuery |
---|---|
CREATE VIEW [IF NOT EXISTS] [db_name.]view_name [(column_name [COMMENT column_comment], ...)] | {CREATE VIEW \| CREATE VIEW IF NOT EXISTS \| CREATE OR REPLACE VIEW} |
CREATE MATERIALIZED VIEW [IF NOT EXISTS] [db_name.]materialized_view_name | CREATE MATERIALIZED VIEW [IF NOT EXISTS] [project_id].[dataset_id].materialized_view_name |
CREATE FUNCTION and DROP FUNCTION statements
The following table provides details about converting functions from Hive to BigQuery:
Hive | BigQuery |
---|---|
CREATE TEMPORARY FUNCTION function_name AS class_name; | CREATE { TEMPORARY \| TEMP } FUNCTION function_name ([named_parameter[, ...]]) |
DROP TEMPORARY FUNCTION [IF EXISTS] function_name; | Not supported. |
CREATE FUNCTION [db_name.]function_name AS class_name | Supported for allowlisted projects as an alpha feature. |
DROP FUNCTION [IF EXISTS] function_name; | DROP FUNCTION [IF EXISTS] function_name |
RELOAD FUNCTION; | Not supported. |
CREATE MACRO and DROP MACRO statements
The following table provides details about converting the procedural SQL statements that are used to create macros from Hive to BigQuery:
Hive | BigQuery |
---|---|
CREATE TEMPORARY MACRO macro_name([col_name col_type, ...]) expression; | Not supported. In some cases, this can be substituted with a UDF. |
DROP TEMPORARY MACRO [IF EXISTS] macro_name; | Not supported. |
Error codes and messages
Hive error codes and BigQuery error codes are different. If your application logic catches errors, eliminate the source of the error, because BigQuery doesn't return the same error codes.
In BigQuery, it's common to use the INFORMATION_SCHEMA views or audit logging to examine errors.
Consistency guarantees and transaction isolation
Both Hive and BigQuery support transactions with ACID semantics. Transactions are enabled by default in Hive 3.
ACID semantics
Hive supports snapshot isolation . When you execute a query, the query is provided with a consistent snapshot of the database, which it uses until the end of its execution. Hive provides full ACID semantics at the row level, letting one application add rows when another application reads from the same partition without interfering with each other.
BigQuery provides optimistic concurrency control (first to commit wins) with snapshot isolation , in which a query reads the last committed data before the query starts. This approach guarantees the same level of consistency for each row and mutation, and across rows within the same DML statement, while avoiding deadlocks. For multiple DML updates to the same table, BigQuery switches to pessimistic concurrency control . Load jobs can run independently and append tables; however, BigQuery doesn't provide an explicit transaction boundary or session.
Transactions
Hive doesn't support multi-statement transactions. It doesn't support BEGIN, COMMIT, and ROLLBACK statements. In Hive, all language operations are auto-committed.
BigQuery supports multi-statement transactions inside a single query or across multiple queries when you use sessions. A multi-statement transaction lets you perform mutating operations, such as inserting or deleting rows from one or more tables and either committing or rolling back the changes. For more information, see Multi-statement transactions .
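A minimal sketch of a BigQuery multi-statement transaction; the tables and values are illustrative:

```sql
BEGIN TRANSACTION;

-- Both statements commit or roll back together
DELETE FROM mydataset.inventory WHERE quantity = 0;
INSERT INTO mydataset.inventory_audit (action, ts)
VALUES ('cleanup', CURRENT_TIMESTAMP());

COMMIT TRANSACTION;
```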