About data insights for structured data

The Knowledge Catalog (formerly Dataplex Universal Catalog) data insights automatically generates descriptions, relationship graphs, and SQL queries from your table and dataset metadata. This information helps you quickly understand data structure, content, and relationships without extensive manual setup. To investigate further, you can ask follow-up questions in data canvas .

When exploring a new, unfamiliar table, data analysts often face the cold-start problem of how to get started writing queries. The problem can involve uncertainties about the data structure and key patterns in the data. The Knowledge Catalog data insights feature offers an automated way to explore and understand your data. This helps you uncover patterns, assess data quality, and perform statistical analysis.

How data insights work?

Data insights use Gemini to analyze your metadata and generate the following:

  • Descriptions: AI-generated summaries explaining the purpose of the dataset, the structure of the table, and the details of specific columns.

  • Sample queries: Tailored SQL queries designed specifically for the schema and content of your dataset or table.

  • Relationship graphs: Visualizations that show the connections and dependencies between different tables within your dataset.

Supported resources

Data insights are available for the following structured data types:

  • BigQuery datasets, tables, and views
  • BigLake tables (including Apache Iceberg)
  • External tables
  • Iceberg REST Catalog tables

Example of an insights run

Data insights automatically generates natural language queries and their SQL equivalents based on a table's metadata.

Consider a table called telco_churn with the following metadata:

Field name Type
CustomerID STRING
Gender STRING
Tenure INT64
InternetService STRING
StreamingTV STRING
OnlineBackup STRING
Contract STRING
TechSupport STRING
PaymentMethod STRING
MonthlyCharges FLOAT
Churn BOOLEAN

The following are some of the sample queries that data insights generates for this table:

  • Identify customers who have subscribed to all premium services and have been a customer for more than 50 months.

      SELECT 
      
     CustomerID 
     , 
      
     Contract 
     , 
      
     Tenure 
     FROM 
      
     agentville_datasets 
     . 
     telco_churn 
     WHERE 
      
     OnlineBackup 
      
     = 
      
     'Yes' 
      
     AND 
      
     TechSupport 
      
     = 
      
     'Yes' 
      
     AND 
      
     StreamingTV 
      
     = 
      
     'Yes' 
      
     AND 
      
     Tenure 
     > 
     50 
     ; 
     
    
  • Identify which internet service has the most churned customers.

      SELECT 
      
     InternetService 
     , 
      
     COUNT 
     ( 
     DISTINCT 
      
     CustomerID 
     ) 
      
     AS 
      
     total_customers 
     FROM 
      
     agentville_datasets 
     . 
     telco_churn 
     WHERE 
      
     Churn 
      
     = 
      
     TRUE 
     GROUP 
      
     BY 
      
     InternetService 
     ORDER 
      
     BY 
      
     total_customers 
      
     DESC 
     LIMIT 
      
     1 
     ; 
     
    
  • Identify churn rates by segment among high-value customers.

      SELECT 
      
     Contract 
     , 
      
     InternetService 
     , 
      
     Gender 
     , 
      
     PaymentMethod 
     , 
      
     COUNT 
     ( 
     DISTINCT 
      
     CustomerID 
     ) 
      
     AS 
      
     total_customers 
     , 
      
     SUM 
     ( 
     CASE 
      
     WHEN 
      
     Churn 
      
     = 
      
     TRUE 
      
     THEN 
      
     1 
      
     ELSE 
      
     0 
      
     END 
     ) 
      
     AS 
      
     churned_customers 
     , 
      
     ( 
     SUM 
     ( 
     CASE 
      
     WHEN 
      
     Churn 
      
     = 
      
     TRUE 
      
     THEN 
      
     1 
      
     ELSE 
      
     0 
      
     END 
     ) 
      
     / 
      
     COUNT 
     ( 
     DISTINCT 
      
     CustomerID 
     )) 
      
     * 
      
     100 
      
     AS 
      
     churn_rate 
     FROM 
      
     agentville_datasets 
     . 
     telco_churn 
     WHERE 
      
     MonthlyCharges 
     > 
     100 
     GROUP 
      
     BY 
      
     Contract 
     , 
      
     InternetService 
     , 
      
     Gender 
     , 
      
     PaymentMethod 
     ; 
     
    

Modes for generating data insights

When generating data insights, Knowledge Catalog provides two modes:

Mode
Description
Usage
Generate and publish

Persists generated data insights into Knowledge Catalog as metadata aspects. You must have the necessary permissions to publish. When you use Generate and publish , the following actions occur:

  • Stores table and column descriptions in Knowledge Catalog.
  • Captures suggested queries and questions as reusable aspects.
  • Makes published insights accessible to all users who have appropriate Knowledge Catalog access, ensuring shared organizational knowledge.
  • Lets you edit and save descriptions directly in Knowledge Catalog.

Use this mode for enterprise-wide data documentation that persists and is reusable, or when building catalog-driven governance workflows.

Generate without publishing

Creates data insights such as descriptions, natural language questions, and SQL queries on demand. Generate without publishing doesn't publish insights to Knowledge Catalog.

Use this mode for quick, ad hoc exploration to avoid cluttering the catalog.

Pricing

For details about pricing for this feature, see Gemini in BigQuery pricing overview .

Quotas and limits

For information about quotas and limits for this feature, see Quotas for Gemini in BigQuery .

Locations

You can use data insights in all BigQuery locations . To learn about where Gemini in BigQuery processes your data, see Where Gemini in BigQuery processes your data .

Limitations

  • For multi-cloud customers, data from other clouds isn't available.
  • Data insights doesn't support Geo or JSON column types.
  • Insights runs don't guarantee the presentation of queries every time. To increase the likelihood of generating more engaging queries, regenerate insights in BigQuery Studio.

What's next

Design a Mobile Site
View Site in Mobile | Classic
Share by: