The ML.DISTANCE function
This document describes the ML.DISTANCE scalar function, which lets you compute the distance between two vectors.
Syntax
ML.DISTANCE(vector1, vector2 [, type])
Arguments
ML.DISTANCE has the following arguments:
- vector1: an ARRAY value that represents the first vector, in one of the following forms:
  - ARRAY<Numerical type>
  - ARRAY<STRUCT<STRING, Numerical type>>
  - ARRAY<STRUCT<INT64, Numerical type>>

  where Numerical type is BIGNUMERIC, FLOAT64, INT64, or NUMERIC. For example, ARRAY<STRUCT<INT64, BIGNUMERIC>>.

  When a vector is expressed as ARRAY<Numerical type>, each element of the array denotes one dimension of the vector. An example of a four-dimensional vector is [0.0, 1.0, 1.0, 0.0].

  When a vector is expressed as ARRAY<STRUCT<STRING, Numerical type>> or ARRAY<STRUCT<INT64, Numerical type>>, each STRUCT array item denotes one dimension of the vector. An example of a three-dimensional vector is [("a", 0.0), ("b", 1.0), ("c", 1.0)].

  The initial INT64 or STRING value in the STRUCT is used as an identifier to match the STRUCT values in vector2. The ordering of data in the array doesn't matter; the values are matched by the identifier rather than by their position in the array. If either vector has any STRUCT values with duplicate identifiers, running this function returns an error.

- vector2: an ARRAY value that represents the second vector. vector2 must have the same type as vector1.

  For example, if vector1 is an ARRAY<STRUCT<STRING, FLOAT64>> column with three elements, like [("a", 0.0), ("b", 1.0), ("c", 1.0)], then vector2 must also be an ARRAY<STRUCT<STRING, FLOAT64>> column.

  When vector1 and vector2 are ARRAY<Numerical type> columns, they must have the same array length.

- type: a STRING value that specifies the type of distance to calculate. Valid values are EUCLIDEAN, MANHATTAN, and COSINE. If this argument isn't specified, the default value is EUCLIDEAN.
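To make the identifier-matching rule for STRUCT vectors concrete, here is a minimal plain-Python sketch (not BigQuery's implementation) that turns two (identifier, value) vectors into position-aligned lists. It follows the documented rules: values are matched by identifier rather than position, and duplicate identifiers raise an error. The function name align_struct_vectors is hypothetical, and the sketch assumes both vectors contain exactly the same identifiers; the description above doesn't say how identifiers present in only one vector are handled, so this sketch simply rejects that case.

```python
from typing import List, Sequence, Tuple, Union

Identifier = Union[str, int]


def align_struct_vectors(
    vector1: Sequence[Tuple[Identifier, float]],
    vector2: Sequence[Tuple[Identifier, float]],
) -> Tuple[List[float], List[float]]:
    """Hypothetical helper: align two (identifier, value) vectors by identifier."""
    maps = []
    for name, vector in (("vector1", vector1), ("vector2", vector2)):
        values = {}
        for identifier, value in vector:
            # Documented rule: duplicate identifiers in either vector are an error.
            if identifier in values:
                raise ValueError(f"{name} has duplicate identifier {identifier!r}")
            values[identifier] = value
        maps.append(values)

    first, second = maps
    # Assumption (not from the source): both vectors must share the same identifiers.
    if first.keys() != second.keys():
        raise ValueError("vectors have different identifiers")

    # Order dimensions by identifier so the two lists line up position by position;
    # the original array order is irrelevant, matching the documented behavior.
    keys = sorted(first)
    return [first[k] for k in keys], [second[k] for k in keys]
```

For example, align_struct_vectors([("a", 0.0), ("b", 1.0), ("c", 1.0)], [("c", 2.0), ("a", 1.0), ("b", 0.0)]) pairs the values by "a", "b", and "c" even though vector2 lists them in a different order.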
Output
ML.DISTANCE returns a FLOAT64 value that represents the distance between the vectors. Returns NULL if either vector1 or vector2 is NULL.
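The following plain-Python sketch shows what each value of the type argument computes for two dense vectors of equal length, with None standing in for NULL. It is an illustration under stated assumptions, not BigQuery's implementation: the function name distance and the parameter name distance_type are hypothetical, and COSINE is assumed to mean 1 minus the cosine similarity, which is how GoogleSQL's COSINE_DISTANCE function defines it. BigQuery's behavior for zero-length (all-zero) vectors under COSINE isn't described here; this sketch just lets Python's ZeroDivisionError propagate.

```python
import math
from typing import Optional, Sequence


def distance(
    vector1: Optional[Sequence[float]],
    vector2: Optional[Sequence[float]],
    distance_type: str = "EUCLIDEAN",
) -> Optional[float]:
    """Hypothetical reference sketch of EUCLIDEAN, MANHATTAN, and COSINE distance."""
    # NULL in, NULL out: mirror the documented NULL handling with None.
    if vector1 is None or vector2 is None:
        return None
    # Dense vectors must have the same length, as documented.
    if len(vector1) != len(vector2):
        raise ValueError("vectors must have the same length")

    pairs = list(zip(vector1, vector2))
    if distance_type == "EUCLIDEAN":
        # Square root of the sum of squared coordinate differences.
        return math.sqrt(sum((x - y) ** 2 for x, y in pairs))
    if distance_type == "MANHATTAN":
        # Sum of absolute coordinate differences.
        return float(sum(abs(x - y) for x, y in pairs))
    if distance_type == "COSINE":
        # Assumed definition: 1 - (x . y) / (||x|| * ||y||).
        dot = sum(x * y for x, y in pairs)
        norms = math.sqrt(sum(x * x for x in vector1)) * math.sqrt(sum(y * y for y in vector2))
        return 1.0 - dot / norms
    raise ValueError(f"unsupported distance type: {distance_type!r}")
```

With these definitions, [0.0, 0.0] and [3.0, 4.0] are 5.0 apart under EUCLIDEAN, while [1.0, 2.0] and [4.0, 0.0] are 5.0 apart under MANHATTAN (3 + 2).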
Example
Get the Euclidean distance between two vectors of ARRAY<FLOAT64> values:
- Create the table t1:

    CREATE TABLE mydataset.t1 (v1 ARRAY<FLOAT64>, v2 ARRAY<FLOAT64>);

- Populate t1:

    INSERT mydataset.t1 (v1, v2) VALUES ([4.1, 0.5, 1.0], [3.0, 0.0, 2.5]);

- Calculate the Euclidean distance between v1 and v2:

    SELECT v1, v2, ML.DISTANCE(v1, v2, 'EUCLIDEAN') AS output FROM mydataset.t1;
This query produces the following output:
+-----------------+-----------------+-------------------+
| v1              | v2              | output            |
+-----------------+-----------------+-------------------+
| [4.1, 0.5, 1.0] | [3.0, 0.0, 2.5] | 1.926136028425822 |
+-----------------+-----------------+-------------------+
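As a sanity check on the example, the output value can be reproduced by hand: the coordinate differences are 1.1, 0.5, and -1.5, their squares sum to 3.71, and the square root of 3.71 is about 1.926136. This short plain-Python sketch (an illustration, not how BigQuery evaluates the query) performs the same arithmetic; the last few digits of floating-point output may differ slightly from the BigQuery result.

```python
import math

# Values copied from the example rows in mydataset.t1.
v1 = [4.1, 0.5, 1.0]
v2 = [3.0, 0.0, 2.5]

# Euclidean distance: sqrt((4.1 - 3.0)^2 + (0.5 - 0.0)^2 + (1.0 - 2.5)^2) = sqrt(3.71)
output = math.sqrt(sum((a - b) ** 2 for a, b in zip(v1, v2)))
print(output)
```

Python's standard library also offers math.dist(v1, v2) (Python 3.8 and later), which computes the same Euclidean distance in one call.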
What's next
- For information about the supported SQL statements and functions for each model type, see End-to-end user journey for each model.

