Query a public dataset with the BigQuery Python client library

Query a public dataset with the BigQuery Python client library

Learn how to:
  1. Activate Cloud Shell in a Google Cloud project.
  2. Open the Cloud Shell Editor.
  3. Prepare files for queries.
  4. Query a public dataset in BigQuery.
  5. Clean up.

Estimated time to complete:

Click Startto begin.

Activate Cloud Shell in a Google Cloud project

  1. If you do not enable billing for a project, you automatically work in the BigQuery sandbox. The BigQuery sandbox lets you learn BigQuery with a limited set of BigQuery features at no charge. If you do not plan to use your project beyond this document, we recommend that you use the BigQuery sandbox.

  2. Click Activate Cloud Shell. Show me

To learn how to open the Cloud Shell Editor, click Next.

Open the Cloud Shell Editor

  1. In Cloud Shell, create a new Python project and file:

     mkdir  
    bigquery-python-quickstart  
     \ 
     && 
    touch  
     \ 
      
    bigquery-python-quickstart/app.py 
    

    This command creates a Python project that's named bigquery-python-quickstart and a file that's named app.py .

  2. Open the Cloud Shell Editor:

     cloudshell  
    workspace  
    bigquery-python-quickstart 
    

To learn how to prepare files for queries, click Next.

Prepare files for queries

  1. To open a terminal in the Cloud Shell Editor, click Open Terminal.

  2. Open your project directory:

      cd 
      
    bigquery-python-quickstart 
    
  3. Install the BigQuery client library for Python:

     pip  
    install  
    --upgrade  
    google-cloud-bigquery 
    

    The output is similar to the following. Several lines are omitted to simplify the output.

     Installing collected packages: google-cloud-bigquery
    ...
    Successfully installed google-cloud-bigquery-3.9.0
    ... 
    

To learn how to query a public dataset in BigQuery, click Next.

Query a public dataset in BigQuery

  1. Click Open Editor.

  2. In the Explorerpane, locate your BIGQUERY-PYTHON-QUICKSTART project.

  3. Click the app.py file to open it.

  4. To create a query against the bigquery-public-data.stackoverflow dataset that returns the top 10 most viewed Stack Overflow pages and their view counts, copy the following code into the app.py file:

      from 
      
     google.cloud 
      
     import 
      bigquery 
     
     def 
      
     query_stackoverflow 
     () 
     - 
    > None 
     : 
     client 
     = 
      bigquery 
     
     . 
      Client 
     
     () 
     results 
     = 
     client 
     . 
      query_and_wait 
     
     ( 
      
     """ 
     SELECT 
     CONCAT( 
     'https://stackoverflow.com/questions/', 
     CAST(id as STRING)) as url, 
     view_count 
     FROM `bigquery-public-data.stackoverflow.posts_questions` 
     WHERE tags like '%google-bigquery%' 
     ORDER BY view_count DESC 
     LIMIT 10""" 
     ) 
     # Waits for job to complete. 
     for 
     row 
     in 
     results 
     : 
     print 
     ( 
     " 
     {} 
     : 
     {} 
     views" 
     . 
     format 
     ( 
     row 
     . 
     url 
     , 
     row 
     . 
     view_count 
     )) 
     if 
     __name__ 
     == 
     "__main__" 
     : 
     query_stackoverflow 
     () 
     
    
  • Click Open Terminal.

  • In the terminal, run the app.py script. If you are prompted to authorize Cloud Shell and agree to the terms, click Authorize.

     python  
    app.py 
    

    The result is similar to the following:

     https://stackoverflow.com/questions/35159967 : 170023 views
    https://stackoverflow.com/questions/22879669 : 142581 views
    https://stackoverflow.com/questions/10604135 : 132406 views
    https://stackoverflow.com/questions/44564887 : 128781 views
    https://stackoverflow.com/questions/27060396 : 127008 views
    https://stackoverflow.com/questions/12482637 : 120766 views
    https://stackoverflow.com/questions/20673986 : 115720 views
    https://stackoverflow.com/questions/39109817 : 108368 views
    https://stackoverflow.com/questions/11057219 : 105175 views
    https://stackoverflow.com/questions/43195143 : 101878 views 
    
  • You have successfully queried a public dataset with the BigQuery Python client library.

    To avoid incurring charges to your account and learn about next steps, click Next.

    Next steps

    Keep the resources that you created and do more with BigQuery, or clean up to avoid billing charges.

    Do more with BigQuery

    Clean up

    To avoid incurring charges to your Google Cloud account, either delete your Google Cloud project, or delete the resources that you created in this walkthrough.

    Delete the project

    If you created a new project to learn about BigQuery and you no longer need the project, delete it . Be aware that deleting a project deletes everything in the project and custom project IDs are lost.

    Delete the resources

    If you used an existing project, delete the bigquery-python-quickstart folder that you created:

    1. In Cloud Shell, move up a directory:

        cd 
        
      .. 
      
    2. Delete the resources that you created:

       rm  
      -R  
      bigquery-python-quickstart 
      

      The -R flag deletes all assets in a folder.

    Design a Mobile Site
    View Site in Mobile | Classic
    Share by: