Aggregate data using a parallel loop

Separate queries to a public BigQuery dataset each return the number of distinct words in one corpus (a document or set of documents). A shared variable accumulates the counts across the parallel iterations and is read after all the iterations complete.

Code sample

YAML

# Use a parallel loop to make ten queries to a public BigQuery dataset and
# use a shared variable to accumulate a count of words; after all iterations
# complete, return the total number of words across all documents
main:
  params: [input]
  steps:
    - init:
        assign:
          - numWords: 0
          - corpuses:
              - sonnets
              - various
              - 1kinghenryvi
              - 2kinghenryvi
              - 3kinghenryvi
              - comedyoferrors
              - kingrichardiii
              - titusandronicus
              - tamingoftheshrew
              - loveslabourslost
    - runQueries:
        parallel:  # 'numWords' is shared so it can be written within the parallel loop
          shared: [numWords]
          for:
            value: corpus
            in: ${corpuses}
            steps:
              - runQuery:
                  call: googleapis.bigquery.v2.jobs.query
                  args:
                    projectId: ${sys.get_env("GOOGLE_CLOUD_PROJECT_ID")}
                    body:
                      useLegacySql: false
                      query: ${"SELECT COUNT(DISTINCT word) FROM `bigquery-public-data.samples.shakespeare` " + " WHERE corpus='" + corpus + "' "}
                  result: query
              - add:
                  assign:
                    - numWords: ${numWords + int(query.rows[0].f[0].v)}  # first result is the count
    - done:
        return: ${numWords}
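
As a possible variation on the sample, the sketch below caps how many loop iterations run at the same time by adding `concurrency_limit` to the `parallel` step. The limit of 2 and the shortened three-item corpus list are illustrative assumptions, not part of the original sample; the query and the shared-variable accumulation are unchanged.

```yaml
# Sketch only: same pattern as the sample, with an assumed concurrency cap
main:
  steps:
    - init:
        assign:
          - numWords: 0
          - corpuses:  # shortened list, chosen for illustration
              - sonnets
              - various
              - comedyoferrors
    - runQueries:
        parallel:
          shared: [numWords]    # writable from inside the parallel loop
          concurrency_limit: 2  # assumed value: at most two iterations at once
          for:
            value: corpus
            in: ${corpuses}
            steps:
              - runQuery:
                  call: googleapis.bigquery.v2.jobs.query
                  args:
                    projectId: ${sys.get_env("GOOGLE_CLOUD_PROJECT_ID")}
                    body:
                      useLegacySql: false
                      query: ${"SELECT COUNT(DISTINCT word) FROM `bigquery-public-data.samples.shakespeare` WHERE corpus='" + corpus + "'"}
                  result: query
              - add:
                  assign:
                    - numWords: ${numWords + int(query.rows[0].f[0].v)}
    - done:
        return: ${numWords}
```

A lower limit can help keep the workflow within API quotas, at the cost of a longer total run time. Because only `numWords` is listed under `shared`, `query` stays local to each iteration.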
 

What's next

To search and filter code samples for other Google Cloud products, see the Google Cloud sample browser.