The ML.DISTANCE function
This document describes the ML.DISTANCE scalar function, which lets you
compute the distance between two vectors.
Syntax
ML.DISTANCE(vector1, vector2 [, type])
Arguments
ML.DISTANCE has the following arguments:
vector1: an ARRAY value that represents the first vector, in one of the
following forms:
ARRAY<Numerical type>
ARRAY<STRUCT<STRING, Numerical type>>
ARRAY<STRUCT<INT64, Numerical type>>
where Numerical type is BIGNUMERIC, FLOAT64, INT64, or NUMERIC.
For example, ARRAY<STRUCT<INT64, BIGNUMERIC>>.
When a vector is expressed as ARRAY<Numerical type>, each element
of the array denotes one dimension of the vector. An example of a
four-dimensional vector is [0.0, 1.0, 1.0, 0.0].
When a vector is expressed as ARRAY<STRUCT<STRING, Numerical type>> or
ARRAY<STRUCT<INT64, Numerical type>>, each STRUCT array item
denotes one dimension of the vector. An example of a three-dimensional
vector is [("a", 0.0), ("b", 1.0), ("c", 1.0)].
The initial INT64 or STRING value in the STRUCT is used as an
identifier to match the STRUCT values in vector2. The ordering of data
in the array doesn't matter; the values are matched by the identifier rather
than by their position in the array. If either vector has any STRUCT
values with duplicate identifiers, running this function returns an error.
vector2: an ARRAY value that represents the second vector.
vector2 must have the same type as vector1.
For example, if vector1 is an ARRAY<STRUCT<STRING, FLOAT64>> column with
three elements, like [("a", 0.0), ("b", 1.0), ("c", 1.0)], then vector2 must
also be an ARRAY<STRUCT<STRING, FLOAT64>> column.
When vector1 and vector2 are ARRAY<Numerical type> columns,
they must have the same array length.
type: a STRING value that specifies the type of distance to calculate.
Valid values are EUCLIDEAN, MANHATTAN, and COSINE.
If this argument isn't specified, the default value is EUCLIDEAN.
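The argument rules above can be sketched with literal vectors, so no table is needed. This is an illustrative query, not part of the original reference: the second dense vector and the column aliases are made up for the example, and the keyed vectors reuse the three-dimensional example with its items reordered.

```sql
-- Illustrative literals only; the aliases are hypothetical names.
SELECT
  -- Dense form: position i of each array is dimension i.
  -- type is omitted here, so EUCLIDEAN is used.
  ML.DISTANCE([0.0, 1.0, 1.0, 0.0], [1.0, 1.0, 0.0, 0.0]) AS dense_default,
  ML.DISTANCE([0.0, 1.0, 1.0, 0.0], [1.0, 1.0, 0.0, 0.0], 'MANHATTAN') AS dense_manhattan,
  ML.DISTANCE([0.0, 1.0, 1.0, 0.0], [1.0, 1.0, 0.0, 0.0], 'COSINE') AS dense_cosine,
  -- Keyed form: both arrays are ARRAY<STRUCT<STRING, FLOAT64>> and hold the
  -- same identifiers in a different order.
  ML.DISTANCE(
    [('a', 0.0), ('b', 1.0), ('c', 1.0)],
    [('c', 1.0), ('a', 0.0), ('b', 1.0)]
  ) AS keyed_reordered;
```

Working the arithmetic by hand with the usual definitions, the dense vectors differ by 1 in two dimensions, so the first three columns should come out near 1.414, 2.0, and 0.5. Because keyed values are matched by identifier rather than position, keyed_reordered should be 0.0.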
Output
ML.DISTANCE returns a FLOAT64 value that represents the distance between
the vectors. The function returns NULL if either vector1 or vector2 is NULL.
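To make the returned value concrete, the following sketch recomputes each distance with ordinary SQL and places it next to the ML.DISTANCE result. It assumes ML.DISTANCE follows the textbook definitions of Euclidean, Manhattan, and cosine distance (cosine distance taken as 1 minus cosine similarity). The manual_* columns and the pair CTE are hypothetical names introduced for this illustration.

```sql
-- Illustrative literal vectors; manual_* columns recompute each distance
-- from its textbook formula for comparison with ML.DISTANCE.
WITH pair AS (
  SELECT [4.1, 0.5, 1.0] AS v1, [3.0, 0.0, 2.5] AS v2
)
SELECT
  ML.DISTANCE(v1, v2, 'EUCLIDEAN') AS ml_euclidean,
  (SELECT SQRT(SUM(POW(x - v2[OFFSET(i)], 2)))
   FROM UNNEST(v1) AS x WITH OFFSET AS i) AS manual_euclidean,
  ML.DISTANCE(v1, v2, 'MANHATTAN') AS ml_manhattan,
  (SELECT SUM(ABS(x - v2[OFFSET(i)]))
   FROM UNNEST(v1) AS x WITH OFFSET AS i) AS manual_manhattan,
  ML.DISTANCE(v1, v2, 'COSINE') AS ml_cosine,
  (SELECT 1 - SUM(x * v2[OFFSET(i)])
            / (SQRT(SUM(x * x)) * SQRT(SUM(POW(v2[OFFSET(i)], 2))))
   FROM UNNEST(v1) AS x WITH OFFSET AS i) AS manual_cosine,
  -- A NULL vector on either side should yield NULL rather than an error.
  ML.DISTANCE(CAST(NULL AS ARRAY<FLOAT64>), v2) AS null_input
FROM pair;
```

If the assumption holds, each ml_* column should agree with its manual_* counterpart up to floating-point rounding, and null_input should be NULL.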
Example
Get the Euclidean distance between two vectors of ARRAY<FLOAT64> values:
1. Create the table t1:

   ```sql
   CREATE TABLE mydataset.t1
   (
     v1 ARRAY<FLOAT64>,
     v2 ARRAY<FLOAT64>
   );
   ```

2. Populate t1:

   ```sql
   INSERT mydataset.t1 (v1, v2)
   VALUES ([4.1, 0.5, 1.0], [3.0, 0.0, 2.5]);
   ```

3. Calculate the Euclidean distance between v1 and v2:

   ```sql
   SELECT v1, v2, ML.DISTANCE(v1, v2, 'EUCLIDEAN') AS output
   FROM mydataset.t1;
   ```

   This query produces the following output:

   +---------------+---------------+-------------------+
   | v1            | v2            | output            |
   +---------------+---------------+-------------------+
   | [4.1,0.5,1.0] | [3.0,0.0,2.5] | 1.926136028425822 |
   +---------------+---------------+-------------------+

Note: The VECTOR_SEARCH function
(/bigquery/docs/reference/standard-sql/search_functions#vector_search) is
another vector function that calculates the distance between vectors. Use
VECTOR_SEARCH if you need to search a dataset for vectors similar to an input
vector. Use ML.DISTANCE if you need to compare two specific vectors to
determine the distance between them.

What's next
For information about the supported SQL statements and functions for each
model type, see End-to-end user journey for each model
(/bigquery/docs/e2e-journey).

Last updated 2025-08-29 UTC.