Distinct (Transformation Stage)

Description

Find out the distinct values of a field or an expression from the previous stages.

Syntax

distinct stage has similar syntax as select . It takes one or more selectable expressions to select and find distinct values on. Strings can be used when the expression is just a field reference:

Node.js

  const 
  
 cities 
  
 = 
  
 await 
  
 db 
 . 
 pipeline 
 () 
  
 . 
 collection 
 ( 
 '/cities' 
 ) 
  
 . 
 distinct 
 ( 
 "country" 
 ) 
  
 . 
 execute 
 (); 
 const 
  
 cities 
  
 = 
  
 await 
  
 db 
 . 
 pipeline 
 () 
  
 . 
 collection 
 ( 
 '/cities' 
 ) 
  
 . 
 distinct 
 ( 
  
 field 
 ( 
 "state" 
 ). 
 toLower 
 (). 
 as 
 ( 
 "normalized_state" 
 ), 
  
 field 
 ( 
 "country" 
 )) 
  
 . 
 execute 
 (); 
 

Client examples

Node.js
 let 
  
 cities 
  
 = 
  
 await 
  
 db 
 . 
 pipeline 
 () 
  
 . 
 collection 
 ( 
 "cities" 
 ) 
  
 . 
 distinct 
 ( 
 "country" 
 ) 
  
 . 
 execute 
 (); 
 cities 
  
 = 
  
 await 
  
 db 
 . 
 pipeline 
 () 
  
 . 
 collection 
 ( 
 "cities" 
 ) 
  
 . 
 distinct 
 ( 
  
 field 
 ( 
 "state" 
 ). 
 toLower 
 (). 
 as 
 ( 
 "normalizedState" 
 ), 
  
 field 
 ( 
 "country" 
 )) 
  
 . 
 execute 
 (); 
  

Web

 let 
  
 cities 
  
 = 
  
 await 
  
 execute 
 ( 
 db 
 . 
 pipeline 
 () 
  
 . 
 collection 
 ( 
 "cities" 
 ) 
  
 . 
 distinct 
 ( 
 "country" 
 )); 
 cities 
  
 = 
  
 await 
  
 execute 
 ( 
 db 
 . 
 pipeline 
 () 
  
 . 
 collection 
 ( 
 "cities" 
 ) 
  
 . 
 distinct 
 ( 
  
 field 
 ( 
 "state" 
 ). 
 toLower 
 (). 
 as 
 ( 
 "normalizedState" 
 ), 
  
 field 
 ( 
 "country" 
 ))); 
  
Swift
 let 
  
 results 
  
 = 
  
 try 
  
 await 
  
 db 
 . 
 pipeline 
 () 
  
 . 
 collection 
 ( 
 "books" 
 ) 
  
 . 
 distinct 
 ([ 
  
 Field 
 ( 
 "author" 
 ). 
 toUpper 
 (). 
 as 
 ( 
 "author" 
 ), 
  
 Field 
 ( 
 "genre" 
 ) 
  
 ]) 
  
 . 
 execute 
 () 
  
Kotlin
Android
 var 
  
 cities 
  
 = 
  
 db 
 . 
 pipeline 
 () 
  
 . 
 collection 
 ( 
 "cities" 
 ) 
  
 . 
 distinct 
 ( 
 "country" 
 ) 
  
 . 
 execute 
 () 
 cities 
  
 = 
  
 db 
 . 
 pipeline 
 () 
  
 . 
 collection 
 ( 
 "cities" 
 ) 
  
 . 
 distinct 
 ( 
  
 field 
 ( 
 "state" 
 ). 
 toLower 
 (). 
 alias 
 ( 
 "normalizedState" 
 ), 
  
 field 
 ( 
 "country" 
 ) 
  
 ) 
  
 . 
 execute 
 () 
  
Java
Android
 Task<Pipeline 
 . 
 Snapshot 
>  
 cities 
 ; 
 cities 
  
 = 
  
 db 
 . 
 pipeline 
 () 
  
 . 
 collection 
 ( 
 "cities" 
 ) 
  
 . 
 distinct 
 ( 
 "country" 
 ) 
  
 . 
 execute 
 (); 
 cities 
  
 = 
  
 db 
 . 
 pipeline 
 () 
  
 . 
 collection 
 ( 
 "cities" 
 ) 
  
 . 
 distinct 
 ( 
  
 field 
 ( 
 "state" 
 ). 
 toLower 
 (). 
 alias 
 ( 
 "normalizedState" 
 ), 
  
 field 
 ( 
 "country" 
 )) 
  
 . 
 execute 
 (); 
  
Python
 from 
  
 google.cloud.firestore_v1.pipeline_expressions 
  
 import 
 Field 
 cities 
 = 
 client 
 . 
 pipeline 
 () 
 . 
 collection 
 ( 
 "cities" 
 ) 
 . 
 distinct 
 ( 
 "country" 
 ) 
 . 
 execute 
 () 
 cities 
 = 
 ( 
 client 
 . 
 pipeline 
 () 
 . 
 collection 
 ( 
 "cities" 
 ) 
 . 
 distinct 
 ( 
 Field 
 . 
 of 
 ( 
 "state" 
 ) 
 . 
 to_lower 
 () 
 . 
 as_ 
 ( 
 "normalizedState" 
 ), 
 "country" 
 ) 
 . 
 execute 
 () 
 ) 
  
Java
 Pipeline 
 . 
 Snapshot 
  
 cities1 
  
 = 
  
 firestore 
 . 
 pipeline 
 (). 
 collection 
 ( 
 "cities" 
 ). 
 distinct 
 ( 
 "country" 
 ). 
 execute 
 (). 
 get 
 (); 
 Pipeline 
 . 
 Snapshot 
  
 cities2 
  
 = 
  
 firestore 
  
 . 
 pipeline 
 () 
  
 . 
 collection 
 ( 
 "cities" 
 ) 
  
 . 
 distinct 
 ( 
 toLower 
 ( 
 field 
 ( 
 "state" 
 )). 
 as 
 ( 
 "normalizedState" 
 ), 
  
 field 
 ( 
 "country" 
 )) 
  
 . 
 execute 
 () 
  
 . 
 get 
 (); 
  

Behavior

In terms of projection behaviors, distinct is similar to select with deduplication, therefore any selectable expression available to select can also be used for distinct .

The distinct stage works similarly to an aggregate stage without groups.

See also Aggregate Stage and Select Stage .

Find Distinct Field Values

For example, to get a list of every country in the following cities collection:

Node.js

  await 
  
 db 
 . 
 collection 
 ( 
 'cities' 
 ). 
 doc 
 ( 
 'SF' 
 ). 
 set 
 ({ 
 name 
 : 
  
 'San Francisco' 
 , 
  
 state 
 : 
  
 'CA' 
 , 
  
 country 
 : 
  
 'USA' 
 }); 
 await 
  
 db 
 . 
 collection 
 ( 
 'cities' 
 ). 
 doc 
 ( 
 'LA' 
 ). 
 set 
 ({ 
 name 
 : 
  
 'Los Angeles' 
 , 
  
 state 
 : 
  
 'CA' 
 , 
  
 country 
 : 
  
 'USA' 
 }); 
 await 
  
 db 
 . 
 collection 
 ( 
 'cities' 
 ). 
 doc 
 ( 
 'NY' 
 ). 
 set 
 ({ 
 name 
 : 
  
 'New York' 
 , 
  
 state 
 : 
  
 'NY' 
 , 
  
 country 
 : 
  
 'USA' 
 }); 
 await 
  
 db 
 . 
 collection 
 ( 
 'cities' 
 ). 
 doc 
 ( 
 'TOR' 
 ). 
 set 
 ({ 
 name 
 : 
  
 'Toronto' 
 , 
  
 state 
 : 
  
 null 
 , 
  
 country 
 : 
  
 'Canada' 
 }); 
 await 
  
 db 
 . 
 collection 
 ( 
 'cities' 
 ). 
 doc 
 ( 
 'MEX' 
 ). 
 set 
 ({ 
 name 
 : 
  
 'Mexico City' 
 , 
  
 state 
 : 
  
 null 
 , 
  
 country 
 : 
  
 'Mexico' 
 }); 
 

Distinct countries can be found using:

Node.js

  const 
  
 cities 
  
 = 
  
 await 
  
 db 
 . 
 pipeline 
 () 
  
 . 
 collection 
 ( 
 '/cities' 
 ) 
  
 . 
 distinct 
 ( 
 "country" 
 ) 
  
 . 
 execute 
 (); 
 

which generates the following result:

 {country: "USA"}
{country: "Canada"}
{country: "Mexico"} 

Distinct Output of Expressions

You can also find the distinct combinations of multiple fields, or more complicated expressions. For example:

Node.js

  const 
  
 cities 
  
 = 
  
 await 
  
 db 
 . 
 pipeline 
 () 
  
 . 
 collection 
 ( 
 '/cities' 
 ) 
  
 . 
 distinct 
 ( 
  
 field 
 ( 
 "state" 
 ). 
 toLower 
 (). 
 as 
 ( 
 "normalized_state" 
 ), 
  
 field 
 ( 
 "country" 
 )) 
  
 . 
 execute 
 (); 
 

to get:

 {country: "USA", normalized_state: "ca"}
{country: "USA", normalized_state: "ny"}
{country: "Canada", normalized_state: null}
{country: "Mexico", normalized_state: null} 

Equivalence Behaviors

The equivalence behavior on distinct values follows the same semantics as equalities.

This means that equivalent values, for example mathmatically equivalent numeric values, regardless of original types (32-bit integer, 64-bit integer, floating point numbers, decimal numbers, etc), are considered the same distinct value.

As an example, in a collection numerics with different documents containing foo values of 32-bit integer 1 , 64-bit integer 1L and floating point 1.0 respectively, distinct will only return 1 result.

In such cases of having different equivalent values present in the dataset, the output value of the group can be anyof these equivalent values. In this example, this value of foo could be returned as 1 , 1L , or 1.0 .

Even if it appears to be deterministic, you should notattempt to rely on the behavior of one specific value getting selected.

Memory Usage

How the distinct stage is executed depends on the available indexes. When there is not an appropriate index chosen by the query optimizer, distinct requires buffering all distinct values in the memory.

In the event of having a very large number of distinct values, or values being very large (e.g. distinct on huge values), this stage may run out of memory.

In such cases, you should apply filters to limit the dataset to perform distinct on, or create indexes as recommended to avoid large memory usages.

Query Explain will provide information on the actual query execution plan and profiling data to help with the debugging.

Create a Mobile Website
View Site in Mobile | Classic
Share by: