Create a search data store

To create a data store and ingest data for search, go to the section for the source that you plan to use.

To sync data from a third-party data source instead, see Connect a third-party data source.

Limitations

If you have CMEK organization policies, you must create new data stores by using the API; creating them in the Google Cloud console fails when CMEK organization policies are enabled. For more information about CMEK support for Vertex AI Search, see Customer-managed encryption keys.

Create a data store using website content

Use the following procedure to create a data store and index websites.

To use a website data store after creating it, you must attach it to an app that has Enterprise features turned on. You can turn on Enterprise Edition for an app when you create it; this incurs additional costs. See Create a search app and About advanced features.
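If you prefer to create the app through the API rather than the console, the following is a minimal sketch (not an official sample) of attaching an existing website data store to an Enterprise-tier search app with the Python client library. The project, location, data store ID, engine ID, and display name are placeholders to replace with your own values.

from google.cloud import discoveryengine_v1 as discoveryengine

# Placeholder values -- replace with your own project, location, data store, and app IDs.
project_id = "YOUR_PROJECT_ID"
location = "global"
data_store_id = "YOUR_DATA_STORE_ID"
engine_id = "my-search-app"  # hypothetical app ID

client = discoveryengine.EngineServiceClient()

engine = discoveryengine.Engine(
    display_name="My Search App",
    industry_vertical=discoveryengine.IndustryVertical.GENERIC,
    solution_type=discoveryengine.SolutionType.SOLUTION_TYPE_SEARCH,
    # Attach the website data store created in this section.
    data_store_ids=[data_store_id],
    search_engine_config=discoveryengine.Engine.SearchEngineConfig(
        # Enterprise tier turns on the Enterprise features mentioned above.
        search_tier=discoveryengine.SearchTier.SEARCH_TIER_ENTERPRISE,
        search_add_ons=[discoveryengine.SearchAddOn.SEARCH_ADD_ON_LLM],
    ),
)

operation = client.create_engine(
    parent=f"projects/{project_id}/locations/{location}/collections/default_collection",
    engine=engine,
    engine_id=engine_id,
)
print(operation.result())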

Console

To use the Google Cloud console to make a data store and index websites, follow these steps:

  1. In the Google Cloud console, go to the Agent Builder page.

  2. In the navigation menu, click Data Stores.

  3. Click Create data store.

  4. On the Source page, select Website Content.

  5. Choose whether to turn on Advanced website indexing for this data store. This option can't be turned on or off later.

    Advanced website indexing provides additional features such as search summarization, search with follow-ups, and extractive answers. Advanced website indexing incurs additional cost and requires that you verify domain ownership for any website that you index. For more information, see Advanced website indexing and Pricing.

  6. In the Sites to include field, enter the URL patterns matching the websites that you want to include in your data store. Include one URL pattern per line, without comma separators. For example, www.example.com/docs/*

  7. Optional: In the Sites to exclude field, enter URL patterns that you want to exclude from your data store.

    To see the number of URL patterns that you can include or exclude, see Website data. For an API-based sketch of include and exclude patterns, see the example after this procedure.

  8. Click Continue.

  9. Select a location for your data store. Advanced website indexing must be turned on to select a location.

  10. Enter a name for your data store.

  11. Click Create. Vertex AI Search creates your data store and displays your data stores on the Data Stores page.

  12. To view information about your data store, click the name of your data store in the Name column. Your data store page appears.

    • If you turned on Advanced website indexing, a warning appears prompting you to verify the domains in your data store.
    • If you have a quota shortfall (the number of pages in the websites that you specified exceeds the "Number of documents per project" quota for your project), an additional warning appears prompting you to upgrade your quota.
  13. To verify the domains for the URL patterns in your data store, follow the instructions on the Verify website domains page.

  14. To upgrade your quota, follow these steps:

    1. Click Upgrade quota. The IAM and Admin page of the Google Cloud console appears.
    2. Follow the instructions at Request a higher quota limit in the Google Cloud documentation. The quota to increase is Number of documents in the Discovery Engine API service.
    3. After submitting your request for a higher quota limit, go back to the Agent Builder page and click Data Stores in the navigation menu.
    4. Click the name of your data store in the Name column. The Status column indicates that indexing is in progress for the websites that had surpassed the quota. When the Status column for a URL shows Indexed, advanced website indexing features are available for that URL or URL pattern.

    For more information, see Quota for web page indexing in the "Quotas and limits" page.
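If you manage site patterns through the API rather than the console, the include and exclude choices from steps 6 and 7 correspond to the type of each TargetSite. The following is a minimal sketch under that assumption: the project, location, and data store values are placeholders, www.example.com/docs/* is the example from step 6, and www.example.com/internal/* is a hypothetical pattern to exclude.

from google.cloud import discoveryengine_v1 as discoveryengine

# Placeholder values -- replace with your own.
project_id = "YOUR_PROJECT_ID"
location = "global"
data_store_id = "YOUR_DATA_STORE_ID"

client = discoveryengine.SiteSearchEngineServiceClient()
site_search_engine = client.site_search_engine_path(
    project=project_id, location=location, data_store=data_store_id
)

# One include pattern (console: "Sites to include") and one exclude pattern
# (console: "Sites to exclude"). The exclude pattern here is hypothetical.
patterns = [
    ("www.example.com/docs/*", discoveryengine.TargetSite.Type.INCLUDE),
    ("www.example.com/internal/*", discoveryengine.TargetSite.Type.EXCLUDE),
]

for pattern, site_type in patterns:
    operation = client.create_target_site(
        parent=site_search_engine,
        target_site=discoveryengine.TargetSite(
            provided_uri_pattern=pattern,
            type_=site_type,
            exact_match=False,
        ),
    )
    print(f"Waiting for operation to complete: {operation.operation.name}")
    operation.result()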

Python

For more information, see the Vertex AI Agent Builder Python API reference documentation.

To authenticate to Vertex AI Agent Builder, set up Application Default Credentials. For more information, see Set up authentication for a local development environment.
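Before running the samples, you can confirm that Application Default Credentials are available with a small check like the following sketch, which uses the google-auth library that the client libraries depend on.

import google.auth

# Raises DefaultCredentialsError if Application Default Credentials are not configured.
credentials, project = google.auth.default()
print(f"Application Default Credentials found for project: {project}")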

Create a data store

from google.api_core.client_options import ClientOptions
from google.cloud import discoveryengine

# TODO(developer): Uncomment these variables before running the sample.
# project_id = "YOUR_PROJECT_ID"
# location = "YOUR_LOCATION" # Values: "global"
# data_store_id = "YOUR_DATA_STORE_ID"


def create_data_store_sample(
    project_id: str,
    location: str,
    data_store_id: str,
) -> str:
    # For more information, refer to:
    # https://cloud.google.com/generative-ai-app-builder/docs/locations#specify_a_multi-region_for_your_data_store
    client_options = (
        ClientOptions(api_endpoint=f"{location}-discoveryengine.googleapis.com")
        if location != "global"
        else None
    )

    # Create a client
    client = discoveryengine.DataStoreServiceClient(client_options=client_options)

    # The full resource name of the collection
    # e.g. projects/{project}/locations/{location}/collections/default_collection
    parent = client.collection_path(
        project=project_id,
        location=location,
        collection="default_collection",
    )

    data_store = discoveryengine.DataStore(
        display_name="My Data Store",
        # Options: GENERIC, MEDIA, HEALTHCARE_FHIR
        industry_vertical=discoveryengine.IndustryVertical.GENERIC,
        # Options: SOLUTION_TYPE_RECOMMENDATION, SOLUTION_TYPE_SEARCH, SOLUTION_TYPE_CHAT, SOLUTION_TYPE_GENERATIVE_CHAT
        solution_types=[discoveryengine.SolutionType.SOLUTION_TYPE_SEARCH],
        # TODO(developer): Update content_config based on data store type.
        # Options: NO_CONTENT, CONTENT_REQUIRED, PUBLIC_WEBSITE
        content_config=discoveryengine.DataStore.ContentConfig.CONTENT_REQUIRED,
    )

    request = discoveryengine.CreateDataStoreRequest(
        parent=parent,
        data_store_id=data_store_id,
        data_store=data_store,
        # Optional: For Advanced Site Search Only
        # create_advanced_site_search=True,
    )

    # Make the request
    operation = client.create_data_store(request=request)

    print(f"Waiting for operation to complete: {operation.operation.name}")
    response = operation.result()

    # After the operation is complete,
    # get information from operation metadata
    metadata = discoveryengine.CreateDataStoreMetadata(operation.metadata)

    # Handle the response
    print(response)
    print(metadata)

    return operation.operation.name

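The sample above only defines a function. A minimal call with placeholder values might look like this; the data store ID is a hypothetical name of your choosing.

operation_name = create_data_store_sample(
    project_id="YOUR_PROJECT_ID",
    location="global",
    data_store_id="my-website-data-store",  # hypothetical data store ID
)
print(operation_name)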
Import websites

from google.api_core.client_options import ClientOptions
from google.cloud import discoveryengine_v1 as discoveryengine

# TODO(developer): Uncomment these variables before running the sample.
# project_id = "YOUR_PROJECT_ID"
# location = "YOUR_LOCATION" # Values: "global"
# data_store_id = "YOUR_DATA_STORE_ID"
# NOTE: Do not include http or https protocol in the URI pattern
# uri_pattern = "cloud.google.com/generative-ai-app-builder/docs/*"

# For more information, refer to:
# https://cloud.google.com/generative-ai-app-builder/docs/locations#specify_a_multi-region_for_your_data_store
client_options = (
    ClientOptions(api_endpoint=f"{location}-discoveryengine.googleapis.com")
    if location != "global"
    else None
)

# Create a client
client = discoveryengine.SiteSearchEngineServiceClient(client_options=client_options)

# The full resource name of the data store
# e.g. projects/{project}/locations/{location}/dataStores/{data_store_id}
site_search_engine = client.site_search_engine_path(
    project=project_id, location=location, data_store=data_store_id
)

# Target Site to index
target_site = discoveryengine.TargetSite(
    provided_uri_pattern=uri_pattern,
    # Options: INCLUDE, EXCLUDE
    type_=discoveryengine.TargetSite.Type.INCLUDE,
    exact_match=False,
)

# Make the request
operation = client.create_target_site(
    parent=site_search_engine,
    target_site=target_site,
)

print(f"Waiting for operation to complete: {operation.operation.name}")
response = operation.result()

# After the operation is complete,
# get information from operation metadata
metadata = discoveryengine.CreateTargetSiteMetadata(operation.metadata)

# Handle the response
print(response)
print(metadata)
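Indexing of the target site runs asynchronously after the operation completes, which is what the Status column in the console reflects. As a follow-up sketch, reusing the client and site_search_engine variables from the sample above, you can list the target sites in the data store; based on the v1 API, each returned TargetSite should report an indexing_status field, the programmatic counterpart of the console's Status column.

# List the target sites in the data store and print their indexing status.
for site in client.list_target_sites(parent=site_search_engine):
    print(site.provided_uri_pattern, site.indexing_status)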