Create a search data store

To create a data store and ingest data for search, go to the section for the source that you plan to use.

To sync data from a third-party data source instead, see Connect a third-party data source.

Limitations

If you have CMEK organization policies, you must create new data stores by using the API; creating them in the Google Cloud console fails when CMEK organization policies are enabled. For more information about CMEK support for Vertex AI Search, see Customer-managed encryption keys.

Create a data store using website content

Use the following procedure to create a data store and index websites.

To use a website data store after creating it, you must attach it to an app that has Enterprise features turned on. You can turn on Enterprise Edition for an app when you create it; this incurs additional costs. See Create a search app and About advanced features.
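If you prefer to create the app through the API rather than the console, the following is a minimal sketch (not an official sample) of attaching an existing website data store to an Enterprise-tier search app with the Python client library. The project, location, data store ID, engine ID, and display name are placeholders to replace with your own values.

from google.cloud import discoveryengine_v1 as discoveryengine

# Placeholder values -- replace with your own project, location, data store, and app IDs.
project_id = "YOUR_PROJECT_ID"
location = "global"
data_store_id = "YOUR_DATA_STORE_ID"
engine_id = "my-search-app"  # hypothetical app ID

client = discoveryengine.EngineServiceClient()

engine = discoveryengine.Engine(
    display_name="My Search App",
    industry_vertical=discoveryengine.IndustryVertical.GENERIC,
    solution_type=discoveryengine.SolutionType.SOLUTION_TYPE_SEARCH,
    # Attach the website data store created in this section.
    data_store_ids=[data_store_id],
    search_engine_config=discoveryengine.Engine.SearchEngineConfig(
        # Enterprise tier turns on the Enterprise features mentioned above.
        search_tier=discoveryengine.SearchTier.SEARCH_TIER_ENTERPRISE,
        search_add_ons=[discoveryengine.SearchAddOn.SEARCH_ADD_ON_LLM],
    ),
)

operation = client.create_engine(
    parent=f"projects/{project_id}/locations/{location}/collections/default_collection",
    engine=engine,
    engine_id=engine_id,
)
print(operation.result())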

Console

To use the Google Cloud console to make a data store and index websites, follow these steps:

  1. In the Google Cloud console, go to the Agent Builder page.

  2. In the navigation menu, click Data Stores.

  3. Click Create data store.

  4. On the Source page, select Website Content.

  5. Choose whether to turn on Advanced website indexing for this data store. This option can't be turned on or off later.

    Advanced website indexing provides additional features such as search summarization, search with follow-ups, and extractive answers. Advanced website indexing incurs additional cost and requires that you verify domain ownership for any website that you index. For more information, see Advanced website indexing and Pricing.

  6. In the Sites to include field, enter the URL patterns matching the websites that you want to include in your data store. Include one URL pattern per line, without comma separators. For example, www.example.com/docs/*

  7. Optional: In the Sites to exclude field, enter URL patterns that you want to exclude from your data store.

    To see the number of URL patterns that you can include or exclude, see Website data. For an API-based sketch of include and exclude patterns, see the example after this procedure.

  8. Click Continue.

  9. Select a location for your data store. Advanced website indexing must be turned on to select a location.

  10. Enter a name for your data store.

  11. Click Create. Vertex AI Search creates your data store and displays your data stores on the Data Stores page.

  12. To view information about your data store, click the name of your data store in the Name column. Your data store page appears.

    • If you turned on Advanced website indexing, a warning appears prompting you to verify the domains in your data store.
    • If you have a quota shortfall (the number of pages in the websites that you specified exceeds the "Number of documents per project" quota for your project), an additional warning appears prompting you to upgrade your quota.
  13. To verify the domains for the URL patterns in your data store, follow the instructions on the Verify website domains page.

  14. To upgrade your quota, follow these steps:

    1. Click Upgrade quota. The IAM and Admin page of the Google Cloud console appears.
    2. Follow the instructions at Request a higher quota limit in the Google Cloud documentation. The quota to increase is Number of documents in the Discovery Engine API service.
    3. After submitting your request for a higher quota limit, go back to the Agent Builder page and click Data Stores in the navigation menu.
    4. Click the name of your data store in the Name column. The Status column indicates that indexing is in progress for the websites that had surpassed the quota. When the Status column for a URL shows Indexed, advanced website indexing features are available for that URL or URL pattern.

    For more information, see Quota for web page indexing in the "Quotas and limits" page.
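If you manage site patterns through the API rather than the console, the include and exclude choices from steps 6 and 7 correspond to the type of each TargetSite. The following is a minimal sketch under that assumption: the project, location, and data store values are placeholders, www.example.com/docs/* is the example from step 6, and www.example.com/internal/* is a hypothetical pattern to exclude.

from google.cloud import discoveryengine_v1 as discoveryengine

# Placeholder values -- replace with your own.
project_id = "YOUR_PROJECT_ID"
location = "global"
data_store_id = "YOUR_DATA_STORE_ID"

client = discoveryengine.SiteSearchEngineServiceClient()
site_search_engine = client.site_search_engine_path(
    project=project_id, location=location, data_store=data_store_id
)

# One include pattern (console: "Sites to include") and one exclude pattern
# (console: "Sites to exclude"). The exclude pattern here is hypothetical.
patterns = [
    ("www.example.com/docs/*", discoveryengine.TargetSite.Type.INCLUDE),
    ("www.example.com/internal/*", discoveryengine.TargetSite.Type.EXCLUDE),
]

for pattern, site_type in patterns:
    operation = client.create_target_site(
        parent=site_search_engine,
        target_site=discoveryengine.TargetSite(
            provided_uri_pattern=pattern,
            type_=site_type,
            exact_match=False,
        ),
    )
    print(f"Waiting for operation to complete: {operation.operation.name}")
    operation.result()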

Python

For more information, see the Vertex AI Agent Builder Python API reference documentation.

To authenticate to Vertex AI Agent Builder, set up Application Default Credentials. For more information, see Set up authentication for a local development environment.
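Before running the samples, you can confirm that Application Default Credentials are available with a small check like the following sketch, which uses the google-auth library that the client libraries depend on.

import google.auth

# Raises DefaultCredentialsError if Application Default Credentials are not configured.
credentials, project = google.auth.default()
print(f"Application Default Credentials found for project: {project}")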

Create a data store

from google.api_core.client_options import ClientOptions
from google.cloud import discoveryengine

# TODO(developer): Uncomment these variables before running the sample.
# project_id = "YOUR_PROJECT_ID"
# location = "YOUR_LOCATION" # Values: "global"
# data_store_id = "YOUR_DATA_STORE_ID"


def create_data_store_sample(
    project_id: str,
    location: str,
    data_store_id: str,
) -> str:
    # For more information, refer to:
    # https://cloud.google.com/generative-ai-app-builder/docs/locations#specify_a_multi-region_for_your_data_store
    client_options = (
        ClientOptions(api_endpoint=f"{location}-discoveryengine.googleapis.com")
        if location != "global"
        else None
    )

    # Create a client
    client = discoveryengine.DataStoreServiceClient(client_options=client_options)

    # The full resource name of the collection
    # e.g. projects/{project}/locations/{location}/collections/default_collection
    parent = client.collection_path(
        project=project_id,
        location=location,
        collection="default_collection",
    )

    data_store = discoveryengine.DataStore(
        display_name="My Data Store",
        # Options: GENERIC, MEDIA, HEALTHCARE_FHIR
        industry_vertical=discoveryengine.IndustryVertical.GENERIC,
        # Options: SOLUTION_TYPE_RECOMMENDATION, SOLUTION_TYPE_SEARCH, SOLUTION_TYPE_CHAT, SOLUTION_TYPE_GENERATIVE_CHAT
        solution_types=[discoveryengine.SolutionType.SOLUTION_TYPE_SEARCH],
        # TODO(developer): Update content_config based on data store type.
        # Options: NO_CONTENT, CONTENT_REQUIRED, PUBLIC_WEBSITE
        content_config=discoveryengine.DataStore.ContentConfig.CONTENT_REQUIRED,
    )

    request = discoveryengine.CreateDataStoreRequest(
        parent=parent,
        data_store_id=data_store_id,
        data_store=data_store,
        # Optional: For Advanced Site Search Only
        # create_advanced_site_search=True,
    )

    # Make the request
    operation = client.create_data_store(request=request)

    print(f"Waiting for operation to complete: {operation.operation.name}")
    response = operation.result()

    # After the operation is complete,
    # get information from operation metadata
    metadata = discoveryengine.CreateDataStoreMetadata(operation.metadata)

    # Handle the response
    print(response)
    print(metadata)

    return operation.operation.name

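The sample above only defines a function. A minimal call with placeholder values might look like this; the data store ID is a hypothetical name of your choosing.

operation_name = create_data_store_sample(
    project_id="YOUR_PROJECT_ID",
    location="global",
    data_store_id="my-website-data-store",  # hypothetical data store ID
)
print(operation_name)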
Import websites

from google.api_core.client_options import ClientOptions
from google.cloud import discoveryengine_v1 as discoveryengine

# TODO(developer): Uncomment these variables before running the sample.
# project_id = "YOUR_PROJECT_ID"
# location = "YOUR_LOCATION" # Values: "global"
# data_store_id = "YOUR_DATA_STORE_ID"
# NOTE: Do not include http or https protocol in the URI pattern
# uri_pattern = "cloud.google.com/generative-ai-app-builder/docs/*"

# For more information, refer to:
# https://cloud.google.com/generative-ai-app-builder/docs/locations#specify_a_multi-region_for_your_data_store
client_options = (
    ClientOptions(api_endpoint=f"{location}-discoveryengine.googleapis.com")
    if location != "global"
    else None
)

# Create a client
client = discoveryengine.SiteSearchEngineServiceClient(client_options=client_options)

# The full resource name of the data store
# e.g. projects/{project}/locations/{location}/dataStores/{data_store_id}
site_search_engine = client.site_search_engine_path(
    project=project_id, location=location, data_store=data_store_id
)

# Target Site to index
target_site = discoveryengine.TargetSite(
    provided_uri_pattern=uri_pattern,
    # Options: INCLUDE, EXCLUDE
    type_=discoveryengine.TargetSite.Type.INCLUDE,
    exact_match=False,
)

# Make the request
operation = client.create_target_site(
    parent=site_search_engine,
    target_site=target_site,
)

print(f"Waiting for operation to complete: {operation.operation.name}")
response = operation.result()

# After the operation is complete,
# get information from operation metadata
metadata = discoveryengine.CreateTargetSiteMetadata(operation.metadata)

# Handle the response
print(response)
print(metadata)
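Indexing of the target site runs asynchronously after the operation completes, which is what the Status column in the console reflects. As a follow-up sketch, reusing the client and site_search_engine variables from the sample above, you can list the target sites in the data store; based on the v1 API, each returned TargetSite should report an indexing_status field, the programmatic counterpart of the console's Status column.

# List the target sites in the data store and print their indexing status.
for site in client.list_target_sites(parent=site_search_engine):
    print(site.provided_uri_pattern, site.indexing_status)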