Work with measures

This document shows you how to define and query measures on your graphs. You can use measures to ensure that aggregations are performed correctly across joins.

Overview

A measure is an aggregate property defined within the PROPERTIES clause of a node or edge table. Measures are defined using the MEASURE keyword and one of the following supported aggregate functions:

  • SUM
  • AVG
  • COUNT
  • COUNT(DISTINCT)
  • MIN
  • MAX

Measures define their aggregation in reference to the KEY of the node or edge table on which they're defined. This means that when you query a measure, the aggregation is performed correctly even if the underlying table is joined in a way that causes rows to be duplicated.

You can't reference a property defined by a measure in a GQL query. Instead, you access measures by calling the GRAPH_EXPAND TVF to create a flattened table representation of your graph. This function doesn't accept all types of graphs. For more information about which graphs form valid input, see the input limitations .

You can call the AGG function on output from the GRAPH_EXPAND TVF to aggregate the properties according to the aggregation function you defined in the measure.

Define measures

You define measures within the PROPERTIES clause of a node or edge table definition by using the MEASURE() keyword around a supported aggregate function.

The following example creates a dataset called university and tables for colleges, departments, and courses:

  CREATE 
  
 SCHEMA 
  
 IF 
  
 NOT 
  
 EXISTS 
  
 university 
 ; 
 CREATE 
  
 OR 
  
 REPLACE 
  
 TABLE 
  
 university 
 . 
 College 
  
 ( 
  
 college_id 
  
 INT64 
  
 PRIMARY 
  
 KEY 
  
 NOT 
  
 ENFORCED 
 , 
  
 college_name 
  
 STRING 
 ); 
 CREATE 
  
 OR 
  
 REPLACE 
  
 TABLE 
  
 university 
 . 
 Department 
  
 ( 
  
 dept_id 
  
 INT64 
  
 PRIMARY 
  
 KEY 
  
 NOT 
  
 ENFORCED 
 , 
  
 dept_name 
  
 STRING 
 , 
  
 college_id 
  
 INT64 
 , 
  
 budget 
  
 FLOAT64 
 , 
  
 FOREIGN 
  
 KEY 
  
 ( 
 college_id 
 ) 
  
 REFERENCES 
  
 university 
 . 
 College 
 ( 
 college_id 
 ) 
  
 NOT 
  
 ENFORCED 
 ); 
 CREATE 
  
 OR 
  
 REPLACE 
  
 TABLE 
  
 university 
 . 
 Course 
  
 ( 
  
 course_id 
  
 INT64 
  
 PRIMARY 
  
 KEY 
  
 NOT 
  
 ENFORCED 
 , 
  
 course_name 
  
 STRING 
 , 
  
 dept_id 
  
 INT64 
 , 
  
 credits 
  
 INT64 
 , 
  
 FOREIGN 
  
 KEY 
  
 ( 
 dept_id 
 ) 
  
 REFERENCES 
  
 university 
 . 
 Department 
 ( 
 dept_id 
 ) 
  
 NOT 
  
 ENFORCED 
 ); 
 INSERT 
  
 INTO 
  
 university 
 . 
 College 
  
 ( 
 college_id 
 , 
  
 college_name 
 ) 
 VALUES 
  
 ( 
 101 
 , 
  
 'College of Engineering' 
 ), 
  
 ( 
 102 
 , 
  
 'College of Arts' 
 ); 
 INSERT 
  
 INTO 
  
 university 
 . 
 Department 
  
 ( 
 dept_id 
 , 
  
 dept_name 
 , 
  
 college_id 
 , 
  
 budget 
 ) 
 VALUES 
  
 ( 
 1001 
 , 
  
 'Computer Science' 
 , 
  
 101 
 , 
  
 500000 
 ), 
  
 ( 
 1002 
 , 
  
 'Mechanical Engineering' 
 , 
  
 101 
 , 
  
 400000 
 ), 
  
 ( 
 1003 
 , 
  
 'Fine Arts' 
 , 
  
 102 
 , 
  
 200000 
 ), 
  
 ( 
 1004 
 , 
  
 'Research' 
 , 
  
 101 
 , 
  
 50000 
 ); 
 INSERT 
  
 INTO 
  
 university 
 . 
 Course 
  
 ( 
 course_id 
 , 
  
 course_name 
 , 
  
 dept_id 
 , 
  
 credits 
 ) 
 VALUES 
  
 ( 
 1 
 , 
  
 'Intro to CS' 
 , 
  
 1001 
 , 
  
 3 
 ), 
  
 ( 
 2 
 , 
  
 'Data Structures' 
 , 
  
 1001 
 , 
  
 4 
 ), 
  
 ( 
 3 
 , 
  
 'Thermodynamics' 
 , 
  
 1002 
 , 
  
 3 
 ), 
  
 ( 
 4 
 , 
  
 'Oil Painting' 
 , 
  
 1003 
 , 
  
 2 
 ); 
 

The following statement creates a graph called SchoolGraph that defines measures on some of the properties in the Department and Course nodes. You must provide an alias for properties defined by a measure.

  CREATE 
  
 OR 
  
 REPLACE 
  
 PROPERTY 
  
 GRAPH 
  
 university 
 . 
 SchoolGraph 
  
 NODE 
  
 TABLES 
  
 ( 
  
 university 
 . 
 College 
  
 KEY 
 ( 
 college_id 
 ) 
  
 PROPERTIES 
 ( 
 college_id 
 , 
  
 college_name 
 ), 
  
 university 
 . 
 Department 
  
 KEY 
 ( 
 dept_id 
 ) 
  
 PROPERTIES 
 ( 
 dept_id 
 , 
  
 dept_name 
 , 
  
 college_id 
 , 
  
 budget 
  
 OPTIONS 
 ( 
 description 
 = 
 "Department budget in USD" 
 ), 
  
 MEASURE 
 ( 
 SUM 
 ( 
 budget 
 )) 
  
 AS 
  
 total_budget 
 ), 
  
 university 
 . 
 Course 
  
 KEY 
 ( 
 course_id 
 ) 
  
 PROPERTIES 
 ( 
  
 course_id 
 , 
  
 course_name 
 , 
  
 credits 
 , 
  
 dept_id 
 , 
  
 MEASURE 
 ( 
 AVG 
 ( 
 credits 
 )) 
  
 AS 
  
 avg_credits 
 , 
  
 MEASURE 
 ( 
 SUM 
 ( 
 credits 
 )) 
  
 AS 
  
 total_credits 
 , 
  
 MEASURE 
 ( 
 COUNT 
 ( 
 course_id 
 )) 
  
 AS 
  
 course_count 
 ) 
  
 ) 
  
 EDGE 
  
 TABLES 
  
 ( 
  
 university 
 . 
 Department 
  
 AS 
  
 CollegeDept 
  
 SOURCE 
  
 KEY 
  
 ( 
 college_id 
 ) 
  
 REFERENCES 
  
 College 
  
 ( 
 college_id 
 ) 
  
 DESTINATION 
  
 KEY 
  
 ( 
 dept_id 
 ) 
  
 REFERENCES 
  
 Department 
  
 ( 
 dept_id 
 ), 
  
 university 
 . 
 Course 
  
 AS 
  
 DeptCourse 
  
 SOURCE 
  
 KEY 
  
 ( 
 dept_id 
 ) 
  
 REFERENCES 
  
 Department 
  
 ( 
 dept_id 
 ) 
  
 DESTINATION 
  
 KEY 
  
 ( 
 course_id 
 ) 
  
 REFERENCES 
  
 Course 
  
 ( 
 course_id 
 ) 
  
 ); 
 

A visualization of the SchoolGraph graph.

The total_budget measure is defined as MEASURE(SUM(budget)) . The measure locks the aggregation to the KEY , which is dept_id .

Understanding overcounting

When you join tables, data is repeated every time there is a one-to-many relationship in your data. For example, if you join the Course , Department , and College tables, a department with multiple courses appears in multiple rows in the output:

  SELECT 
  
 college_name 
 , 
  
 dept_name 
 , 
  
 course_name 
 , 
  
 budget 
 FROM 
  
 university 
 . 
 Course 
 LEFT 
  
 JOIN 
  
 university 
 . 
 Department 
  
 ON 
  
 Course 
 . 
 dept_id 
  
 = 
  
 Department 
 . 
 dept_id 
 LEFT 
  
 JOIN 
  
 university 
 . 
 College 
  
 ON 
  
 Department 
 . 
 college_id 
  
 = 
  
 College 
 . 
 college_id 
 ; 
 /*------------------------+------------------------+-----------------+----------+ 
 | college_name           | dept_name              | course_name     | budget   | 
 +------------------------+------------------------+-----------------+----------+ 
 | College of Engineering | Computer Science       | Intro to CS     | 500000.0 | 
 | College of Engineering | Computer Science       | Data Structures | 500000.0 | 
 | College of Engineering | Mechanical Engineering | Thermodynamics  | 400000.0 | 
 | College of Arts        | Fine Arts              | Oil Painting    | 200000.0 | 
 +------------------------+------------------------+-----------------+----------*/ 
 

If you try to compute the total budget per college using SUM(budget) , the Computer Science department's budget is counted twice. You can avoid this problem by querying the Department table directly, but this approach doesn't work if you want to compute multiple aggregations from different tables that contribute to your joined data. The following section shows how measures solve this problem.

Query a graph with measures

You can query a graph with measures using GQL, but your query can't directly reference any property defined by a measure. For example, the following query references nodes that have measure properties defined, but doesn't use or return any of the measure fields:

  GRAPH 
  
 university 
 . 
 SchoolGraph 
 MATCH 
  
 ( 
 c 
 : 
 College 
 ) 
 -[]- 
 ( 
 d 
 : 
 Department 
 ) 
 -[]- 
> ( 
 course 
 : 
 Course 
 ) 
 RETURN 
  
 c 
 . 
 college_name 
 , 
  
 d 
 . 
 dept_name 
 , 
  
 course 
 . 
 course_name 
 ; 
 /*------------------------+------------------------+-----------------+ 
 | college_name           | dept_name              | course_name     | 
 +------------------------+------------------------+-----------------+ 
 | College of Engineering | Computer Science       | Intro to CS     | 
 | College of Engineering | Computer Science       | Data Structures | 
 | College of Engineering | Mechanical Engineering | Thermodynamics  | 
 | College of Arts        | Fine Arts              | Oil Painting    | 
 +------------------------+------------------------+-----------------*/ 
 

Work with measures

To work with measures, you use the GRAPH_EXPAND table-valued function (TVF) to query your graph as a single flattened table. Columns in the output table are derived from the properties defined in the graph for each node and edge table. To prevent naming conflicts, columns are named by concatenating the node or edge table label and the property name—for example, Course_course_name or Department_total_budget . The following query shows some sample output from the GRAPH_EXPAND TVF:

  SELECT 
  
 College_college_name 
 , 
  
 Department_dept_name 
 , 
  
 Department_budget 
 , 
  
 Course_course_name 
 FROM 
  
 GRAPH_EXPAND 
 ( 
 "university.SchoolGraph" 
 ); 
 /*------------------------+------------------------+-------------------+--------------------+ 
 | College_college_name   | Department_dept_name   | Department_budget | Course_course_name | 
 +------------------------+------------------------+-------------------+--------------------+ 
 | College of Engineering | Computer Science       | 500000.0          | Intro to CS        | 
 | College of Engineering | Computer Science       | 500000.0          | Data Structures    | 
 | College of Engineering | Mechanical Engineering | 400000.0          | Thermodynamics     | 
 | College of Arts        | Fine Arts              | 200000.0          | Oil Painting       | 
 +------------------------+------------------------+-------------------+--------------------+ 
 

The GRAPH_EXPAND function produces the flattened table by applying a series of LEFT JOIN operations to node and edge tables. A valid input graph must have exactly one root node table , which is a table whose KEY value doesn't appear in any other table. Data that isn't reachable from the root node table through a series of joins doesn't appear in the output. In the previous example, the Course table is the root node table. The Research department is omitted from the output because it doesn't have any courses.

You can't directly select a column for a property defined by a measure. Instead, you must wrap them in the AGG() function . This function ensures that the aggregation defined on the measures is performed exactly once per key.

The following query simultaneously calculates the total budget and total number of courses for each college:

  SELECT 
  
 College_college_name 
 , 
  
 AGG 
 ( 
 Department_total_budget 
 ) 
  
 AS 
  
 college_budget 
 , 
  
 AGG 
 ( 
 Course_course_count 
 ) 
  
 AS 
  
 total_courses 
 FROM 
  
 GRAPH_EXPAND 
 ( 
 "university.SchoolGraph" 
 ) 
 GROUP 
  
 BY 
  
 College_college_name 
 ; 
 /*------------------------+----------------+---------------+ 
 | College_college_name   | college_budget | total_courses | 
 +------------------------+----------------+---------------+ 
 | College of Engineering | 900000.0       | 3             | 
 | College of Arts        | 200000.0       | 1             | 
 +------------------------+----------------+---------------*/ 
 

View GRAPH_EXPAND schema

To view the schema of the table returned by the GRAPH_EXPAND function without calling the function, use the BQ.SHOW_GRAPH_EXPAND_SCHEMA system procedure :

  DECLARE 
  
 schema 
  
 STRING 
  
 DEFAULT 
  
 '' 
 ; 
 CALL 
  
 BQ 
 . 
 SHOW_GRAPH_EXPAND_SCHEMA 
 ( 
 'university.SchoolGraph' 
 , 
  
 schema 
 ); 
 SELECT 
  
 schema 
 ; 
 

This procedure populates the schema variable with the name, type, and mode of each column. It also indicates whether the property referenced by the column is a measure, and lists any description or synonyms that you defined on it.

What's next

Create a Mobile Website
View Site in Mobile | Classic
Share by: