Use the Population Dynamics Insights embeddings

Prepare Ground Truth Data

To use Population Dynamics embeddings, your ground truth data must be aggregated to a supported geographic unit. Because administrative boundary types vary globally, you can align your data either to a global grid system (such as S2 cells) or to local administrative regions (such as counties or districts, depending on the country dataset).
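As a minimal sketch of this aggregation step, the following Python groups point-level observations by a precomputed `geo_id` (for example, an S2 cell token or a county identifier) and averages the target value per unit. The `(geo_id, value)` record layout, the hypothetical `aggregate_by_geo_id` helper, the example tokens, and the choice of the mean as the aggregate are all illustrative assumptions; your use case may call for a sum, count, or rate instead, and mapping raw coordinates to `geo_id` is assumed to happen upstream.

```python
from collections import defaultdict
from statistics import mean


def aggregate_by_geo_id(records):
    """Average the target value for each geo_id (hypothetical helper).

    `records` is an iterable of (geo_id, value) pairs. Converting raw
    coordinates into geo_id values (e.g., S2 tokens) is assumed to be done upstream.
    """
    grouped = defaultdict(list)
    for geo_id, value in records:
        grouped[geo_id].append(value)
    # One value per geographic unit, matching the granularity of the embeddings table
    return {geo_id: mean(values) for geo_id, values in grouped.items()}


# Hypothetical observations keyed by S2 token
observations = [
    ("80ead45", 12.0),
    ("80ead45", 18.0),
    ("80ead47", 5.0),
]
ground_truth = aggregate_by_geo_id(observations)
print(ground_truth)  # {'80ead45': 15.0, '80ead47': 5.0}
```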

Option 1: Incorporate Embeddings into an Existing Model

  • Prepare existing model-based ground truth: Use the embeddings as geospatial covariates to enhance an existing model.
  • Train an error correction model: Improve an existing model by training a second model that takes the original model's output and the embeddings as inputs and is fit against the ground truth (expected values), so that it learns to correct the original model's errors.
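The error correction approach can be sketched with a linear model fit by least squares. This is a minimal illustration on synthetic data, assuming NumPy is available: the inputs are the original model's prediction plus the embedding vector, and the target is the residual (ground truth minus original prediction). The helper names, the 8-dimensional toy embeddings, and the linear form of the correction are assumptions; in practice you might use a GBDT or MLP for the correction model and evaluate it on held-out data.

```python
import numpy as np


def fit_error_correction(original_pred, embeddings, ground_truth):
    """Fit a linear model that predicts the residual of an existing model."""
    # Design matrix: intercept, original prediction, and embedding dimensions
    X = np.column_stack([np.ones(len(original_pred)), original_pred, embeddings])
    residual = ground_truth - original_pred
    coef, *_ = np.linalg.lstsq(X, residual, rcond=None)
    return coef


def apply_error_correction(coef, original_pred, embeddings):
    """Return the original prediction plus the learned correction."""
    X = np.column_stack([np.ones(len(original_pred)), original_pred, embeddings])
    return original_pred + X @ coef


rng = np.random.default_rng(0)
n, dim = 500, 8  # toy sizes; real embedding vectors are longer
emb = rng.normal(size=(n, dim))
truth = emb @ rng.normal(size=dim) + 10.0 + rng.normal(scale=0.1, size=n)
# Simulated existing model that ignores the signal carried by the embeddings
orig = np.full(n, truth.mean()) + rng.normal(scale=0.1, size=n)

coef = fit_error_correction(orig, emb, truth)
corrected = apply_error_correction(coef, orig, emb)
# In-sample error only; use held-out data for a real evaluation
print("MAE before:", np.abs(truth - orig).mean())
print("MAE after: ", np.abs(truth - corrected).mean())
```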

Option 2: Tune for Specific Use Cases

  • Choose a prediction model: Any model type, such as a gradient-boosted decision tree (GBDT), a multilayer perceptron (MLP), or a linear model, can be used for predictions.
  • Use embeddings for prediction: Use Population Dynamics embeddings as input features, alongside other contextual data, to improve prediction accuracy.
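A minimal sketch of this option, assuming NumPy: it concatenates embedding vectors with two contextual features and fits a closed-form ridge regression (a linear model, one of the model types listed above), then compares test error with and without the embeddings. All data here is synthetic and generated so that the target depends on the embeddings, so the printed errors illustrate the workflow only and say nothing about real-world accuracy. The `fit_ridge` and `predict` helpers and the contextual feature names are hypothetical.

```python
import numpy as np


def fit_ridge(X, y, alpha=1.0):
    """Closed-form ridge regression with an unpenalized intercept."""
    X1 = np.column_stack([np.ones(len(X)), X])
    penalty = alpha * np.eye(X1.shape[1])
    penalty[0, 0] = 0.0  # do not shrink the intercept
    return np.linalg.solve(X1.T @ X1 + penalty, X1.T @ y)


def predict(coef, X):
    return np.column_stack([np.ones(len(X)), X]) @ coef


rng = np.random.default_rng(42)
n, dim = 1000, 16  # toy sizes
embeddings = rng.normal(size=(n, dim))
context = rng.normal(size=(n, 2))  # e.g., hypothetical price and store size
y = (embeddings @ rng.normal(size=dim)
     + context @ np.array([2.0, -1.0])
     + rng.normal(scale=0.5, size=n))

train, test = slice(0, 800), slice(800, None)
results = {}
for name, X in [("context only", context),
                ("context + embeddings", np.hstack([context, embeddings]))]:
    coef = fit_ridge(X[train], y[train])
    results[name] = float(np.sqrt(np.mean((predict(coef, X[test]) - y[test]) ** 2)))
    print(f"{name}: test RMSE = {results[name]:.3f}")
```

The same pattern applies to any model type: the embeddings are simply extra input columns next to your existing features.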

Query examples

In the following examples, replace your-project.your_dataset.embeddings_table with your own project ID, dataset name, and table name.
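If you run these queries from Python, one option is to build the fully qualified table name once and substitute it into each query. The `table_ref` helper below is hypothetical: it applies only a minimal sanity check (non-empty, no backticks or whitespace), which is an assumption for illustration and not BigQuery's full naming rules, and it wraps the name in backticks, as the examples on this page do.

```python
def table_ref(project: str, dataset: str, table: str) -> str:
    """Return a backtick-quoted project.dataset.table reference (hypothetical helper)."""
    for part in (project, dataset, table):
        # Minimal sanity check only; not BigQuery's full naming rules
        if not part or "`" in part or any(c.isspace() for c in part):
            raise ValueError(f"invalid identifier part: {part!r}")
    return f"`{project}.{dataset}.{table}`"


EMBEDDINGS_TABLE = table_ref("your-project", "your_dataset", "embeddings_table")
query = f"SELECT geo_id, features FROM {EMBEDDINGS_TABLE} LIMIT 10"
print(query)
# SELECT geo_id, features FROM `your-project.your_dataset.embeddings_table` LIMIT 10
```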

SQL: Fetch Embeddings

This query retrieves the embedding vector and administrative metadata (state and county names) for 10 S2 cells in your provisioned dataset.

```sql
SELECT
  geo_id,
  administrative_area_level_1_name AS state,
  administrative_area_level_2_name AS county,
  features  -- The 330-dimensional embedding vector
FROM
  `your-project.your_dataset.embeddings_table`
LIMIT 10;
```

SQL: Find Similar Locations

This query finds locations that are behaviorally similar to a target location using only the embeddings table, without requiring external data.

It uses the ML.DISTANCE function to compute the cosine distance between each cell's vector and the target cell's vector, converts that distance into a similarity score (1 minus the distance), and returns the top 20 matches. This approach supports expansion planning scenarios, such as deciding where to open a new store based on the profile of a successful existing location.

```sql
WITH TargetLocation AS (
  SELECT
    features AS target_vector
  FROM
    `your-project.your_dataset.embeddings_table`
  -- Replace with your target S2 hex token (e.g., '80ead45')
  WHERE
    geo_id = 'YOUR_TARGET_S2_TOKEN'
)
SELECT
  t.geo_id,
  t.administrative_area_level_1_name AS state,
  t.administrative_area_level_2_name AS county,
  -- Cosine similarity = 1 - cosine distance (1.0 = same direction; lower values = less similar)
  (1 - ML.DISTANCE(t.features, p.target_vector, 'COSINE')) AS similarity_score
FROM
  `your-project.your_dataset.embeddings_table` t,
  TargetLocation p
WHERE
  t.geo_id != 'YOUR_TARGET_S2_TOKEN'  -- Exclude the target itself (string literal, not a backtick identifier)
ORDER BY
  similarity_score DESC
LIMIT 20;
```
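For a small set of vectors already loaded into Python, the ranking in the query above can be reproduced with NumPy, for example to sanity-check results offline. This is a minimal sketch, assuming the vectors fit in memory and none is all zeros; `top_similar` and the toy 3-dimensional vectors are hypothetical. Note that cosine similarity ranges from -1.0 to 1.0, so a vector pointing in the opposite direction from the target scores -1.0.

```python
import numpy as np


def top_similar(geo_ids, vectors, target_id, k=20):
    """Rank locations by cosine similarity to target_id, excluding the target itself."""
    vectors = np.asarray(vectors, dtype=float)
    idx = geo_ids.index(target_id)
    # Normalize rows so a dot product equals cosine similarity (assumes no zero vectors)
    unit = vectors / np.linalg.norm(vectors, axis=1, keepdims=True)
    scores = unit @ unit[idx]  # same as 1 - cosine distance
    order = [i for i in np.argsort(-scores, kind="stable") if i != idx]
    return [(geo_ids[i], float(scores[i])) for i in order[:k]]


# Hypothetical 3-dimensional vectors (the real embeddings are 330-dimensional)
ids = ["a", "b", "c", "d"]
vecs = [[1.0, 0.0, 0.0],
        [0.9, 0.1, 0.0],
        [0.0, 1.0, 0.0],
        [-1.0, 0.0, 0.0]]
print(top_similar(ids, vecs, "a", k=3))
# 'b' ranks first; 'd', pointing the opposite way, scores -1.0
```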

SQL: Join Customer Data

This example shows how to enrich your own internal data (for instance, a store performance table) with behavioral embeddings. Ensure that your internal data includes S2 cell tokens (hex strings) at the same S2 level as the embeddings table, so that the tokens match exactly.

```sql
SELECT
  store.store_id,
  store.s2_token,
  store.total_revenue,
  embeddings.features AS pdfm_vector
FROM
  `your-project.internal_data.store_performance` AS store
JOIN
  `your-project.your_dataset.embeddings_table` AS embeddings
ON
  -- Join based on the S2 hex token string
  store.s2_token = embeddings.geo_id;
```
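When your store table is small, or when you want to check match rates before running the join in BigQuery, a pure-Python equivalent of the inner join above is a dictionary lookup keyed on the S2 token. The row layouts, example tokens, and the `join_on_s2_token` helper are hypothetical, and the sketch assumes `geo_id` is unique in the embeddings rows. As with the SQL `JOIN` (an inner join), stores whose token has no match are left out; the helper returns their IDs separately so you can inspect them.

```python
def join_on_s2_token(stores, embeddings):
    """Inner-join store rows to embedding rows on s2_token == geo_id.

    stores: list of dicts with 'store_id', 's2_token', 'total_revenue'
    embeddings: list of dicts with 'geo_id', 'features' (geo_id assumed unique)
    Returns (joined_rows, unmatched_store_ids).
    """
    by_geo_id = {row["geo_id"]: row["features"] for row in embeddings}
    joined, unmatched = [], []
    for store in stores:
        features = by_geo_id.get(store["s2_token"])
        if features is None:
            # No embedding for this token (e.g., wrong S2 level or uncovered cell)
            unmatched.append(store["store_id"])
            continue
        joined.append({**store, "pdfm_vector": features})
    return joined, unmatched


stores = [
    {"store_id": 1, "s2_token": "80ead45", "total_revenue": 1200.0},
    {"store_id": 2, "s2_token": "80ead4c", "total_revenue": 950.0},
]
embeddings = [{"geo_id": "80ead45", "features": [0.1, 0.2, 0.3]}]
joined, unmatched = join_on_s2_token(stores, embeddings)
print(len(joined), unmatched)  # 1 [2]
```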

Python: Load Data for Machine Learning

The features column is stored as a BigQuery ARRAY. To use the embeddings with ML libraries, convert the column into a NumPy matrix.

```python
from google.cloud import bigquery
import numpy as np
import pandas as pd  # to_dataframe() returns a pandas DataFrame

client = bigquery.Client()

query = """
SELECT
  geo_id,
  features  -- Returns as a list of floats
FROM
  `your-project.your_dataset.embeddings_table`
LIMIT 1000
"""

# 1. Load data into a DataFrame
df = client.query(query).to_dataframe()

# 2. Convert the 'features' column (a Series of lists) into a matrix (2D array)
X_matrix = np.stack(df["features"].values)

print(f"Data Loaded. Matrix Shape: {X_matrix.shape}")
# Output: Data Loaded. Matrix Shape: (1000, 330)
```
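`np.stack` requires every row's vector to have the same length, so a single empty or malformed vector makes it fail. A minimal defensive sketch, assuming NumPy, is the hypothetical `features_to_matrix` helper below: it keeps only rows whose length equals `expected_dim` (330 here, per the comment in the first query) and reports the skipped row positions. Casting to `float32` is a design choice to halve memory; use `float64` if your model needs it.

```python
import numpy as np


def features_to_matrix(feature_lists, expected_dim=330):
    """Stack per-row embedding vectors into an (n_rows, expected_dim) float32 matrix.

    Rows whose length differs from expected_dim (or that are None) are skipped;
    their positions are returned so the caller can drop the matching rows elsewhere.
    """
    kept, skipped = [], []
    for i, vec in enumerate(feature_lists):
        if vec is not None and len(vec) == expected_dim:
            kept.append(np.asarray(vec, dtype=np.float32))
        else:
            skipped.append(i)
    if not kept:
        return np.empty((0, expected_dim), dtype=np.float32), skipped
    return np.stack(kept), skipped


# Toy input: the middle row is empty and is skipped
matrix, skipped = features_to_matrix([[0.1] * 330, [], [0.2] * 330])
print(matrix.shape, skipped)  # (2, 330) [1]
```

With the DataFrame from the example above, you could call `features_to_matrix(df["features"])` and then remove the skipped rows with `df.drop(df.index[skipped])` so that the matrix rows stay aligned with `geo_id`.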