Noise injection

  • Noise injection protects user privacy by adding random noise to aggregated data in database queries, providing reasonably accurate results without traditional difference checks.

  • Using noise injection simplifies troubleshooting, requires no new query syntax, and provides clear result accuracy.

  • Noise injection does not rely on existing difference checks but outputs data based on approximate user thresholds (20+ for impressions, 10+ for clicks/conversions).

  • Ads Data Hub injects noise by clamping outlier contributions, aggregating clamped contributions, adding noise to each aggregate result, and eliminating rows with too few users.

  • You can choose between implicit clamping (automatic bounds) and explicit clamping (user-defined bounds) to limit outlier contributions, with explicit clamping generally resulting in less noise if reasonable bounds can be estimated.

Noise injection is a technique used to protect user privacy when querying a database. It works by adding random noise to an aggregating SELECT clause of a query. This noise protects user privacy while providing reasonably accurate results, eliminating the need for difference checks, and reducing the required aggregation threshold for output. Most existing queries can be executed in noise mode, with some limitations .

Learn the benefits of using noise injection

Difference checks do not apply:When running queries with noise injection, Ads Data Hub does not filter rows due to similarity to previous result sets. This means that you can still get a holistic view of the data while protecting user privacy.

Troubleshooting is simplified:Rows are only omitted due to aggregation requirements, making it simpler to troubleshoot and adapt queries.

There is no new syntax to learn:You don't need to learn any new query syntax or be versed in privacy concepts to use noise instead of difference checks.

Result accuracy is reported:A successful job shows the total percentage of data that could have been affected by noise.

Learn how noise impacts privacy requirements

Difference checks:Noise injection does not rely on existing difference checks in Ads Data Hub. When you use noise injection, difference checks are disabled.

Aggregation requirement:Noise injection outputs impression data represented by approximately 20 or more unique users, and click or conversion data represented by approximately 10 or more unique users.

Static checks:No impact.

Budgets and query limits:Queries executed using noise share the data access budget used with difference checks. As with difference checks, if you execute the same query on the same dataset many times, you might lose access to frequently queried dates in the dataset. This can happen if you run sliding window queries, or if you make the same request multiple times.

Noise mode imposes additional, stricter limits on recomputing the same aggregate results within or across queries. As with the data access budget, you can lose access to frequently queried dates in the dataset; but limitations due to recomputing the same aggregate results will only restrict queries in noise mode, not queries in difference check mode. For more information, see Repeated results .

Learn more about privacy checks .

Understand how noise injection affects results

Ads Data Hub injects noise to mitigate disclosure risk—the risk that someone can learn information about an individual user. It balances privacy against utility.

Noise injection in Ads Data Hub transforms the query results as follows:

  • It clamps outlier users' contributions in aggregate results. It sums each user's contribution in each aggregation and then caps each contribution with minimum and maximum clamping bounds.
  • It aggregates the clamped per-user contributions.
  • It adds noise to each aggregate result—the result of each aggregation function call in each row. The scale of this random noise is proportional to the clamped bounds.
  • It computes a noisy user count for each row and eliminates rows with too few users. This is similar to k-anonymity in difference check mode, but because of the noise, jobs running on the same dataset can drop different rows. Also, noise mode drops fewer rows because the aggregation requirement is lower (approximately 20 versus exactly 50).

The final result is a dataset where each row has noisy aggregate results and small groups have been eliminated. This masks an individual user's effect on returned results.

About aggregation clamping

Noise injection in Ads Data Hub uses implicit or explicit aggregation clamping to limit the contribution of outliers. You can choose which type of clamping to use, depending on your use case.

Implicit clamping

You don't need any special SQL syntax to use implicit clamping, it is applied by default. Implicit bounds are derived from the data itself and are determined for each aggregation. If some aggregations have a wider range of values than others, implicit bounding can infer different bounds for different aggregations as appropriate. This typically results in lower errors. Note that COUNT(DISTINCT user_id) automatically clamps the per user contribution to 1 .

Explicit clamping

Explicit clamping clamps the total contribution from each user to a specified range. Explicit bounds are uniformly applied to all aggregations and must be literal values . Explicit clamping may provide better results when bounds are generally known. For example, bounding ages between 0 and 100 reflects public information because the age of most people is generally within this range.

Ads Data Hub provides supplemental ADH.ANON aggregate functions for explicit clamping. To use explicit clamping, set the bounds for each supported aggregate function by adding integers representing the lower bound and the upper bound. For example:

  SELECT 
 campaign_name 
 , 
 -- Set lower and upper bounds to 0 and 1, respectively 
 ADH 
 . 
 ANON_COUNT 
 ( 
 * 
 , 
  
 contribution_bounds_per_group 
  
 = 
>  
 ( 
 0 
 , 
 1 
 )) 
 FROM 
  
 data 
 GROUP 
  
 BY 
  
 1 
 

Run a query using noise injection

  1. Open a report.
  2. Click the Privacy noise settingstoggle to the Use noiseposition.
  3. Run the query .
  4. Review the impact of the added noise.
  5. Optional: Adapt the query to reduce noise impact.

Review noise impact

Once a job completes successfully, Ads Data Hub displays the reliability of the result in the privacy summary. Reliability is based on the percentage of cells in the output that may be highly impacted by noise. A value in the result table is considered impacted if the scale of the added noise is greater than 5% of the result in the cell.

For impacted output data sets, the privacy summary lists the ten noisiest columns from highest to lowest impact and their corresponding contribution to noise. This is the breakdown of the noise impact labels.

% of impacted results Indicator color Impact
<5%
Green Low impact
5%-15%
Yellow Medium impact
15%-25%
Orange High impact
>25%
Red Very high impact

You can also preview the privacy summary for recent report jobs on the Homepage. To preview the privacy for a particular job, hold the pointer over the privacy tip icon privacy_tip in the job card under Recent activity.

Adapt queries

Aggregations are more likely to be impacted by noise when few users contribute to the result. This can happen when aggregations are computed from small user sets or when some users don't affect results, which can happen, for example, with the COUNTIF function. Based on the noise report, you may want to adjust your query to reduce the percentage of impacted results.

The following are general guidelines:

  • Expand the date range.
  • Rewrite the query to reduce the granularity of the data, such as by grouping by fewer parameters or replacing COUNTIF with COUNT .
  • Remove noisy columns.
  • Try explicit clamping when reasonable bounds can be chosen.

Supported aggregate functions

The following aggregate functions are supported with noise:

  • SUM(...)
  • COUNT(*)
  • COUNT(...)
  • COUNTIF(...)
  • COUNT(DISTINCT ...)
  • APPROX_COUNT_DISTINCT(...)
  • AVG(...)

The DISTINCT keyword is only supported with the COUNT function. When used with a direct reference to the user_id column from an Ads Data Hub table or an expression that returns either user_id or NULL , such as COUNT(DISTINCT IF(..., user_id, NULL)) , the COUNT DISTINCT and APPROX_COUNT_DISTINCT(...) functions are computed by clamping the per user contribution to 1 . When COUNT DISTINCT references a non- user_id column, it is approximated using APPROX_COUNT_DISTINCT with implicit clamping.

Supplemental aggregate functions

In addition to supporting regular aggregators, Ads Data Hub introduces supplemental ADH.ANON aggregate functions that support explicit clamping. These aggregators share the syntax with the BigQuery differentially private aggregate functions , however, they don't require the WITH DIFFERENTIAL_PRIVACY clause:

  • ADH.ANON_SUM( ..., [ contribution_bounds_per_group => (lower_bound, upper_bound) ] )

  • ADH.ANON_COUNT( *, [ contribution_bounds_per_group => (lower_bound, upper_bound) ] )

  • ADH.ANON_COUNT( ..., [ contribution_bounds_per_group => (lower_bound, upper_bound) ] )

  • ADH.ANON_AVG( ..., [ contribution_bounds_per_group => (lower_bound, upper_bound) ] )

  • ADH.ANON_PERCENTILE_CONT( ..., percentile, contribution_bounds_per_row => (lower_bound, upper_bound) )

  • ADH.ANON_COUNT_DISTINCT( ..., [ max_contributions_per_group => upper_bound ] )

ADH.ANON_SUM , ADH.ANON_COUNT and ADH.ANON_AVG parameters:

  • contribution_bounds_per_group : Contributions per user are clamped for each partition defined by the GROUP BY keys. The upper and lower bounds are applied to values per group after the values are aggregated per user.
  • lower_bound : Numeric literal that represents the smallest value to include in an aggregation.
  • upper_bound : Numeric literal that represents the largest value to include in an aggregation.

ADH.ANON_PERCENTILE_CONT parameters:

  • percentile : The percentile to compute, a literal in the range [0, 1] .
  • contribution_bounds_per_row : Contributions per user are clamped on a per-row (per-record) basis. Note that explicit clamping bounds are required for the percentile and therefore it is only supported as a supplemental function.
  • lower_bound : Numeric literal that represents the smallest value to include in an aggregation.
  • upper_bound : Numeric literal that represents the largest value to include in an aggregation.

ADH.ANON_COUNT_DISTINCT parameters:

  • max_contributions_per_group : Contributions per user are clamped for each partition defined by the GROUP BY keys. The upper bound limits the maximum user contribution per group after the values are aggregated per user.
  • upper_bound : Numeric literal that represents the largest value to include in an aggregation.

Compute MIN and MAX

The MIN and MAX functions are not directly supported in noise aggregations, but there are often alternative methods to compute these results.

If you have a MIN or MAX of values that can be used as grouping keys, such as event date, you can first GROUP BY that value, then compute MIN / MAX afterward. This returns the minimum or maximum value that passes aggregation thresholding.

Example:

  WITH 
  
 campaign_date_ranges 
  
 AS 
  
 ( 
  
 SELECT 
  
 campaign_id 
 , 
  
 MIN 
 ( 
 event_date 
 ) 
  
 AS 
  
 min_date 
 , 
  
 MAX 
 ( 
 event_date 
 ) 
  
 AS 
  
 max_date 
  
 FROM 
  
 ( 
  
 # 
  
 Aggregation 
  
 thresholding 
  
 will 
  
 be 
  
 applied 
  
 here 
  
 SELECT 
  
 DISTINCT 
  
 campaign_id 
 , 
  
 DATE 
 ( 
 query_id 
 . 
 time_usec 
 , 
  
 @ 
 time_zone 
 ) 
  
 AS 
  
 event_date 
  
 FROM 
  
 adh 
 . 
 google_ads_impressions 
  
 ) 
 ) 
 SELECT 
  
 campaign_id 
 , 
  
 num_impressions 
 , 
  
 min_date 
 , 
  
 max_date 
 FROM 
  
 ( 
  
 # 
  
 Noise 
  
 and 
  
 aggregation 
  
 thresholding 
  
 will 
  
 be 
  
 applied 
  
 here 
  
 SELECT 
  
 campaign_id 
 , 
  
 COUNT 
 ( 
 * 
 ) 
  
 AS 
  
 num_impressions 
  
 FROM 
  
 adh 
 . 
 google_ads_impressions 
 ) 
 JOIN 
  
 campaign_date_ranges 
  
 USING 
 ( 
 campaign_id 
 ) 
 

Alternatively, if you have a MIN or MAX of granular values with known bounds, you can use PERCENTILE_CONT with explicit bounds for an approximate result.

Example:

  SELECT 
  
 campaign_id 
 , 
  
 COUNT 
 ( 
 * 
 ) 
  
 AS 
  
 num_impressions 
 , 
  
 ADH 
 . 
 ANON_PERCENTILE_CONT 
 ( 
  
 query_id 
 . 
 time_usec 
 , 
  
 0 
 , 
  
 contribution_bounds_per_row 
  
 = 
>  
 ( 
 @ 
 min_timestamp 
 , 
  
 @ 
 max_timestamp 
 )) 
  
 AS 
  
 min_timestamp 
 , 
  
 ADH 
 . 
 ANON_PERCENTILE_CONT 
 ( 
  
 query_id 
 . 
 time_usec 
 , 
  
 1 
 , 
  
 contribution_bounds_per_row 
  
 = 
>  
 ( 
 @ 
 min_timestamp 
 , 
  
 @ 
 max_timestamp 
 )) 
  
 AS 
  
 max_timestamp 
 FROM 
  
 adh 
 . 
 google_ads_impressions 
 

About integer results

Although Ads Data Hub will automatically inject noise for these aggregate functions, the function signatures don't change. Because functions like COUNT or SUM of INT64 return INT64 , any decimal part of the noised result is rounded. This is usually negligible relative to the size of the result and noise.

If you need the granularity of the decimal in your result, then avoid writing functions that return INT64 –for example, by using SUM with its input cast to FLOAT64 .

About negative results

In principle, noise with very small values can result in negative numbers, even when this should be semantically impossible for the query. To maintain expected behavior, all forms of COUNT and COUNTIF are automatically clamped at zero, so they never give negative results. If you want this same behavior with another function, such as SUM , then you can clamp results manually using GREATEST(0, SUM(...)) .

This change is usually negligible, but it does introduce a small positive bias to overall results.

Public groups

With a GROUP BY clause, the anonymized results of a query are aggregated over groups. Aggregation thresholding is applied to make sure that a sufficient number of users are present in the group so that individual user data is protected. The process of determining which groups can be released is called "partition selection".

In many cases groups may be public knowledge. For example, grouping by browser version, day of the week, or a geographical region does not depend on user data if the grouping key values are known in advance. In this instance, the partition selection can be omitted, since the presence or absence of a group in the output does not provide any new information about the users.

Ads Data Hub identifies queries eligible for public groups and does not apply aggregation thresholding to these queries. This means that no output rows are filtered out. Note that results computed from a small number of users can be heavily impacted by noise.

To be eligible for public groups, the query must be structured to ensure that all grouping keys are known in advance. The grouping columns must satisfy the following conditions:

  • They come from a public table (a table or SELECT clause with no Ads Data Hub user data).
  • They have SELECT DISTINCT applied to enforce unique values.
  • They are joined into the query with an OUTER JOIN on all of the individual columns.

Examples of public groups queries:

  SELECT 
  
 age_group_id 
 , 
  
 COUNT 
 ( 
 * 
 ) 
  
 FROM 
  
 adh 
 . 
 google_ads_impressions 
 RIGHT 
  
 OUTER 
  
 JOIN 
  
 ( 
 SELECT 
  
 DISTINCT 
  
 age_group_id 
  
 FROM 
  
 adh 
 . 
 age_group 
 ) 
 ON 
  
 demographics 
 . 
 age_group 
  
 = 
  
 age_group_id 
 GROUP 
  
 BY 
  
 age_group_id 
 
  SELECT 
  
 age_group_id 
 , 
  
 COUNT 
 ( 
 * 
 ) 
  
 FROM 
  
 adh 
 . 
 google_ads_impressions 
 RIGHT 
  
 OUTER 
  
 JOIN 
  
 ( 
 SELECT 
  
 DISTINCT 
  
 * 
  
 FROM 
  
 UNNEST 
 ([ 
 1 
 , 
  
 2 
 , 
  
 3 
 ]) 
  
 AS 
  
 age_group_id 
 ) 
 ON 
  
 demographics 
 . 
 age_group 
  
 = 
  
 age_group_id 
 GROUP 
  
 BY 
  
 age_group_id 
 

In the first example, the protected adh.google_ads_impressions table is joined with the adh.age_group table that does not contain user data on the age_group_id column. The same public table age_group_id column appears in the GROUP BY clause.

Similarly, in the second example the protected adh.google_ads_impressions table is joined with the public table, which is provided explicitly as UNNEST([1, 2, 3]) . Notice that in both examples, the grouping key age_group_id comes from the public table.

Multiple grouping items can be provided as well, for example:

  SELECT 
  
 campaign_id 
 , 
  
 COUNT 
 ( 
 * 
 ) 
  
 FROM 
  
 adh 
 . 
 google_ads_impressions 
 RIGHT 
  
 OUTER 
  
 JOIN 
  
 ( 
 SELECT 
  
 DISTINCT 
  
 campaign_id 
 , 
  
 customer_id 
  
 FROM 
  
 adh 
 . 
 google_ads_campaign 
 ) 
 USING 
  
 ( 
 campaign_id 
 , 
  
 customer_id 
 ) 
 GROUP 
  
 BY 
  
 campaign_id 
 , 
  
 customer_id 
 
  SELECT 
  
 p 
 . 
 campaign_id 
 , 
  
 p 
 . 
 browser 
 , 
  
 COUNT 
 ( 
 * 
 ) 
  
 FROM 
  
 adh 
 . 
 google_ads_impressions 
  
 AS 
  
 i 
 RIGHT 
  
 OUTER 
  
 JOIN 
  
 ( 
  
 SELECT 
  
 DISTINCT 
  
 * 
  
 FROM 
  
 UNNEST 
 ([ 
 1 
 , 
  
 2 
 ]) 
  
 AS 
  
 campaign_id 
  
 CROSS 
  
 JOIN 
  
 UNNEST 
 ([ 
 'Chrome' 
 , 
  
 'Other' 
 ]) 
  
 AS 
  
 browser 
 ) 
  
 AS 
  
 p 
  
 ON 
  
 i 
 . 
 campaign_id 
  
 = 
  
 p 
 . 
 campaign_id 
  
 AND 
  
 i 
 . 
 browser 
  
 = 
  
 p 
 . 
 browser 
 GROUP 
  
 BY 
  
 campaign_id 
 , 
  
 browser 
 ; 
 

The absence of filtering in the public groups queries can be beneficial for recurrently run queries, since the output is always returned for the same fixed grouping keys' values. This can be particularly useful, for example, for building periodical dashboards.

A caveat: if a public table provides a very large number of grouping key values, then you may get many rows with little or no data, and these rows will all be reported as having high noise impact. In this case, you should consider explicitly providing a smaller list of keys with just the values you are interested in.


Supported query patterns

Important: Most of Ads Data Hub's standard best practices still apply to queries that use noise injection. In particular, we recommend that you review the guidance on repeatedly querying the same data .

This section describes query patterns that are supported when running queries using noise injection.

User-level aggregates

Unrestricted user-level aggregates are supported in the same way that they are in difference check mode. Noise is only injected in aggregations that combine data across multiple users. Aggregations that explicitly group by user_id , or analytic functions that partition by user_id , don't receive any noise and any function is allowed. User-level aggregations that don't explicitly group by user_id –for example, GROUP BY impression_id , are treated as cross-user aggregations, so noise is added.

Grouping by external_cookie is not enough. While external_cookie can be used to join *_match tables with customer-owned tables, any single-user aggregations should explicitly group by user_id column, not just the external_cookie column.

Aggregate function example:

  WITH 
  
 user_paths 
  
 AS 
  
 ( 
  
 # 
  
 Grouping 
  
 by 
  
 user_id 
 , 
  
 no 
  
 noise 
  
 needed 
 , 
  
 all 
  
 functions 
  
 allowed 
  
 SELECT 
  
 user_id 
 , 
  
 STRING_AGG 
 ( 
 campaign_id 
 , 
  
 ">" 
  
 ORDER 
  
 BY 
  
 query_id 
 . 
 time_usec 
 ) 
  
 AS 
  
 path 
  
 FROM 
  
 adh 
 . 
 google_ads_impressions 
  
 GROUP 
  
 BY 
  
 1 
 ) 
 # 
  
 Noise 
  
 applied 
  
 here 
  
 to 
  
 num_users 
 SELECT 
  
 path 
 , 
  
 COUNT 
 ( 
 * 
 ) 
  
 AS 
  
 num_users 
 FROM 
  
 user_paths 
 GROUP 
  
 BY 
  
 1 
 ; 
 

Analytic function example:

  WITH 
  
 events 
  
 AS 
  
 ( 
  
 # 
  
 Partitioning 
  
 by 
  
 user_id 
 , 
  
 no 
  
 noise 
  
 needed 
 , 
  
 all 
  
 functions 
  
 allowed 
  
 SELECT 
  
 campaign_id 
 , 
  
 ROW_NUMBER 
 () 
  
 OVER 
 ( 
 PARTITION 
  
 BY 
  
 user_id 
  
 ORDER 
  
 BY 
  
 query_id 
 . 
 time_usec 
 ) 
  
 AS 
  
 index 
  
 FROM 
  
 adh 
 . 
 google_ads_impressions 
 ) 
 # 
  
 Noise 
  
 applied 
  
 here 
  
 to 
  
 first_impressions 
 SELECT 
  
 campaign_id 
 , 
  
 COUNT 
 ( 
 * 
 ) 
  
 AS 
  
 first_impressions 
 FROM 
  
 events 
 WHERE 
  
 index 
  
 = 
  
 1 
 GROUP 
  
 BY 
  
 1 
 ; 
 

Parallel aggregates

Each cross-user aggregation receives noise independently. You can run multiple such aggregations in a single statement, combining results into one table using a JOIN or UNION .

Example:

  WITH 
  
 result_1 
  
 AS 
  
 ( 
  
 # 
  
 Noise 
  
 applied 
  
 here 
  
 to 
  
 num_impressions 
  
 SELECT 
  
 campaign_id 
 , 
  
 COUNT 
 ( 
 * 
 ) 
  
 AS 
  
 num_impressions 
  
 FROM 
  
 adh 
 . 
 google_ads_impressions 
  
 GROUP 
  
 BY 
  
 1 
 ), 
  
 result_2 
  
 AS 
  
 ( 
  
 # 
  
 Noise 
  
 applied 
  
 here 
  
 to 
  
 num_clicks 
  
 SELECT 
  
 campaign_id 
 , 
  
 COUNT 
 ( 
 * 
 ) 
  
 AS 
  
 num_clicks 
  
 FROM 
  
 adh 
 . 
 google_ads_creative_conversions 
  
 GROUP 
  
 BY 
  
 1 
 ) 
 SELECT 
  
 * 
  
 FROM 
  
 result_1 
  
 JOIN 
  
 result_2 
  
 USING 
 ( 
 campaign_id 
 ) 
 

Note that this would be supported but should be avoided in difference check mode. This practice is not a problem with noise, as each parallel aggregate is noised and filtered independently.

Aggregated data joined with unaggregated data

Since Ads Data Hub only supports analytic windows that partition by user_id , it is a common workaround to aggregate these results separately and self-join them before aggregating again. These queries are supported in noise mode, and often perform better than they would with in difference check mode due to privacy requirements being resolved earlier.

Example:

  WITH 
  
 campaign_totals 
  
 AS 
  
 ( 
  
 # 
  
 Noise 
  
 applied 
  
 here 
  
 to 
  
 campaign_imps 
  
 SELECT 
  
 campaign_id 
 , 
  
 COUNT 
 ( 
 * 
 ) 
  
 AS 
  
 campaign_imps 
  
 FROM 
  
 adh 
 . 
 google_ads_impressions 
  
 GROUP 
  
 BY 
  
 1 
 ) 
 # 
  
 Noise 
  
 applied 
  
 here 
  
 to 
  
 imps 
 SELECT 
  
 campaign_id 
 , 
  
 demographics 
 , 
  
 campaign_imps 
 , 
  
 COUNT 
 ( 
 * 
 ) 
  
 AS 
  
 imps 
 FROM 
  
 adh 
 . 
 google_ads_impressions 
  
 JOIN 
  
 campaign_totals 
  
 USING 
 ( 
 campaign_id 
 ) 
 GROUP 
  
 BY 
  
 1 
 , 
 2 
 , 
 3 
 

Noise mode discourages reaggregating aggregate results, such as AVG(campaign_imps) .


Unsupported query patterns

This section describes query patterns that are not supported when running queries using noise injection.

Today-inclusive queries

Noise mode queries don't support querying the current day's data. (This is discouraged in difference check mode.) The current date is not selectable for queries that use noise injection.

Repeated results

In noise mode, Ads Data Hub limits how often you can repeat the same aggregation. If you reach these limits, your noise mode queries will lose access to frequently queried dates in the dataset. The following are examples of how this can occur.

Query repetitionhappens when the same query is run multiple times with the same parameters or highly similar parameters, such as overlapping date ranges. You can avoid this by using data that is already exported to your BigQuery project.

Note that if two jobs are querying overlapping date ranges, they might produce repetitions if performing the same computation on the same users. For example, the following query, executed on overlapping date ranges, creates repetitions because it partitions by date:

  SELECT 
  
 DATE 
 ( 
 TIMESTAMP_MICROS 
 ( 
 event 
 . 
 event_time 
 )) 
  
 AS 
  
 date 
 , 
 COUNT 
 ( 
 * 
 ) 
  
 AS 
  
 cnt 
 FROM 
  
 adh 
 . 
 cm_dt_clicks 
 GROUP 
  
 BY 
  
 1 
 

In this case, you should run the query on disjoint date segments.

Another example of a repetition happens when data is somewhat date independent. The following query produces repetitions when executed on overlapping dates, where both jobs cover the entire lifetime of a campaign:

  SELECT 
  
 campaign_id 
 , 
  
 COUNT 
 ( 
 * 
 ) 
  
 AS 
  
 cnt 
 FROM 
  
 adh 
 . 
 google_ads_impressions 
 GROUP 
  
 BY 
  
 1 
 

In this case, you should run this query only once since the result doesn't change.

Aggregation repetitionhappens when the same aggregation is repeated multiple times within a query:

  SELECT 
  
 COUNT 
 ( 
 * 
 ) 
  
 AS 
  
 cnt1 
 , 
  
 COUNT 
 ( 
 * 
 ) 
  
 AS 
  
 cnt2 
 FROM 
  
 table 
 

In this case, you should remove one of the repetitions.

Note that even if the aggregations are syntactically different but compute the same value, it would count as a repetition. In other words, if the values of condition1 and condition2 are the same for all users with some value of key , the following query would have a repetition:

  SELECT 
  
 key 
 , 
  
 COUNTIF 
 ( 
 condition1 
 ) 
  
 AS 
  
 cnt1 
 , 
  
 COUNTIF 
 ( 
 condition2 
 ) 
  
 AS 
  
 cnt2 
 FROM 
  
 table 
 GROUP 
  
 BY 
  
 key 
 

If you have conditions that are very similar for some groups of users, you might consider rewriting the query to have only one COUNT .

Row duplicationhappens when an Ads Data Hub table is joined with a BigQuery table in a way that each row from the Ads Data Hub table matches multiple rows in the BigQuery table. For example, the following query produces a repetition if there are multiple rows with the same campaign ID in bq_table :

  SELECT 
  
 r 
 . 
 campaign_id 
 , 
  
 COUNT 
 ( 
 * 
 ) 
  
 AS 
  
 cnt 
 FROM 
  
 adh_table 
 INNER 
  
 JOIN 
  
 bq_table 
  
 ON 
  
 l 
 . 
 campaign_id 
  
 = 
  
 r 
 . 
 campaign_id 
 

In this case, you should restructure the query so that bq_table would have only one row per join key value ( campaign_id , in this case).

Note that unnesting an array from the Ads Data Hub table could produce the same effect if most users have the same arrays of values:

  SELECT 
  
 in_market_id 
 , 
  
 COUNT 
 ( 
 * 
 ) 
 FROM 
  
 adh 
 . 
 dv360_youtube_impressions 
 , 
 UNNEST 
 ( 
 in_market 
 ) 
  
 AS 
  
 in_market_id 
 GROUP 
  
 BY 
  
 1 
 

Learn about other query best practices .

About Lookback Windows

Some query patterns generate reports over a large timeframe, periodically regenerating to include new results. These queries may need adjustments to work in noise mode because if they recompute previous results, they will be blocked. Instead, each job should only generate new results, then new results can be combined with results from previous jobs for a full report.

For example, if you are creating a report of metrics by date, refreshed daily:

  SELECT 
  
 campaign_id 
 , 
  
 DATE 
 ( 
 TIMESTAMP_MICROS 
 ( 
 query_id 
 . 
 time_usec 
 ), 
  
 @ 
 time_zone 
 ) 
  
 AS 
  
 event_date 
 , 
  
 COUNT 
 ( 
 * 
 ) 
  
 AS 
  
 impressions 
 FROM 
  
 adh 
 . 
 google_ads_impressions 
 GROUP 
  
 BY 
  
 1 
 , 
 2 
 

You shouldn't run this with a large date range as this will recompute results of previous days. Instead, you should run each job only the latest day, which has new data, then combine with results from previous jobs.

If you do need to refresh a previous result (for example to account for late arriving data), then you should avoid recomputing any single result more than 1 or 2 times. Otherwise, you may get errors due to repeated query attempts.

Direct reaggregation

Noise is applied to the first layer of cross-user aggregation in the query. Queries with multiple layers of aggregation will combine noisy results, so final aggregates may have much higher noise. These queries receive a warning on validation:

  WITH 
  
 layer_1 
  
 AS 
  
 ( 
  
 # 
  
 Noise 
  
 applied 
  
 here 
  
 to 
  
 partial_result 
  
 SELECT 
  
 campaign_id 
 , 
  
 demographics 
 , 
  
 location 
 , 
  
 COUNT 
 ( 
 * 
 ) 
  
 AS 
  
 partial_result 
  
 FROM 
  
 adh 
 . 
 google_ads_impressions 
  
 GROUP 
  
 BY 
  
 1 
 , 
 2 
 , 
 3 
  
 HAVING 
  
 partial_result 
 > 
 5 
 ) 
 # 
  
 Reaggregation 
  
 of 
  
 partial_result 
  
 with 
  
 no 
  
 user 
 - 
 level 
  
 data 
 , 
  
 will 
  
 be 
  
 rejected 
 SELECT 
  
 campaign_id 
 , 
  
 SUM 
 ( 
 partial_result 
 ) 
  
 AS 
  
 final_result 
 FROM 
  
 layer_1 
 GROUP 
  
 BY 
  
 1 
 

To get the best results from noise, compute all cross-user operations within a single aggregation. For example, take a SUM of events rather than a SUM of intermediate counts.

If multi-layer aggregation is unavoidable, you can resolve the warning by exporting results directly from the first layer instead. To do this within a single job without changing script results, create a temp table (or a table exported to your BigQuery project) with the OPTIONS(privacy_checked_export=true) syntax. For example:

  CREATE 
  
 TEMP 
  
 TABLE 
  
 layer_1 
  
 OPTIONS 
 ( 
 privacy_checked_export 
 = 
 true 
 ) 
  
 AS 
  
 ( 
  
 # 
  
 Noise 
  
 applied 
  
 here 
  
 to 
  
 partial_result 
  
 SELECT 
  
 campaign_id 
 , 
  
 demographics 
 , 
  
 location 
 , 
  
 COUNT 
 ( 
 * 
 ) 
  
 AS 
  
 partial_result 
  
 FROM 
  
 adh 
 . 
 google_ads_impressions 
  
 GROUP 
  
 BY 
  
 1 
 , 
 2 
 , 
 3 
  
 HAVING 
  
 partial_result 
 > 
 5 
 ); 
 # 
  
 Reaggregation 
  
 of 
  
 privacy 
  
 checked 
  
 data 
 , 
  
 no 
  
 noise 
  
 needed 
 SELECT 
  
 campaign_id 
 , 
  
 SUM 
 ( 
 partial_result 
 ) 
  
 AS 
  
 final_result 
 FROM 
  
 layer_1 
 GROUP 
  
 BY 
  
 1 
 

Learn more about temp tables .

If the first layer of aggregation is too granular for privacy checks, consider rewriting the query with user-level aggregates . If this is not possible, then this query is not supported in noise mode.

Unjoined user IDs

Queries in noise mode must not combine data from separate users into a single row, except when performing an aggregation with noise. As a result, joins of unaggregated Ads Data Hub data should explicitly join on the user_id column.

This query does not explicitly join on the user_id column, which results in a validation warning:

  SELECT 
  
  
 FROM 
  
 adh 
 . 
 google_ads_impressions 
 JOIN 
  
 adh 
 . 
 google_ads_creative_conversions 
  
 USING 
 ( 
 impression_id 
 ) 
 

Joins like this may not behave as expected because only rows with the same user_id value will match. This can be fixed by adjusting the USING clause to explicitly include user_id – for example, USING(impression_id, user_id) .

Note that this limitation applies only to joins between Ads Data Hub tables (with the exception of dimension tables). It does not apply to customer-owned tables. For example, the following is allowed:

  SELECT 
  
  
 FROM 
  
 adh 
 . 
 google_ads_impressions 
 JOIN 
  
 bigquery_project 
 . 
 dataset 
 . 
 table 
  
 USING 
 ( 
 any_column 
 ) 
 

Ads Data Hub-BigQuery right joins

Outer joins with customer-owned data can lead to rows with missing user identifiers, which prevents noise from working well.

Both of these queries result in validation warnings because they allow for unmatched rows with missing user identifiers on the Ads Data Hub side:

  SELECT 
  
  
 FROM 
  
 adh 
 . 
 google_ads_impressions 
 RIGHT 
  
 JOIN 
  
 bigquery_project 
 . 
 dataset 
 . 
 table 
  
 USING 
 ( 
 column 
 ) 
 
  SELECT 
  
  
 FROM 
  
 bigquery_project 
 . 
 dataset 
 . 
 table 
 LEFT 
  
 JOIN 
  
 adh 
 . 
 google_ads_impressions 
  
 USING 
 ( 
 column 
 ) 
 

Note that either join would work if the order of the tables was reversed. There is also an exception for RDID tables that join directly on device_id_md5 . For example, the following query will work with no warnings:

  SELECT 
  
  
 FROM 
  
 bigquery_project 
 . 
 dataset 
 . 
 table 
 LEFT 
  
 JOIN 
  
 adh 
 . 
 google_ads_impressions_rdid 
  
 USING 
 ( 
 device_id_md5 
 ) 
 

Filtered Row Summary

The filtered row summary spec is not supported in noise mode. This feature is most often unnecessary with noise due to lower filtering rates and the lack of filtering from difference checks.

If you observe significant data filtering in a noise result, then increase the aggregated data. You may perform a parallel aggregation over the full dataset to compare an estimate of the total, for example:

  SELECT 
  
 campaign_name 
 , 
  
 COUNT 
 ( 
 * 
 ) 
 FROM 
  
 data 
 GROUP 
  
 BY 
  
 1 
 UNION 
  
 ALL 
 SELECT 
  
 'Total' 
 , 
  
 COUNT 
 ( 
 * 
 ) 
 FROM 
  
 data 
 GROUP 
  
 BY 
  
 1 
 

Note that the total count is independently noised and total values may not add up, but the total count is often more accurate than taking the sum of noised rows.

Cross-mode created tables

Unexported tables in Ads Data Hub can only be used with the same privacy mode where they were created. You can't create a table in normal aggregation mode and use it in noise mode, or the other way around (unless that table is exported to BigQuery first).

Design a Mobile Site
View Site in Mobile | Classic
Share by: