Create a custom recommendations data store

To create a data store and ingest data for custom recommendations, go to the section for the source that you plan to use:

BigQuery
Cloud Storage
Upload structured JSON data with the API

BigQuery

You can create data stores from BigQuery tables in two ways:

One-time ingestion: You import data from a BigQuery table into a data store. The data in the data store does not change unless you manually refresh the data.
Periodic ingestion: You import data from one or more BigQuery tables, and you set a sync frequency that determines how often the data stores are updated with the most recent data from the BigQuery dataset.

The following table compares the two ways that you can import BigQuery data into Vertex AI Search data stores.

One-time ingestion	Periodic ingestion
Generally available (GA).	Public preview.
Data must be refreshed manually.	Data updates automatically every 1, 3, or 5 days. Data cannot be manually refreshed.
Vertex AI Search creates a single data store from one table in BigQuery.	Vertex AI Search creates a data connector for a BigQuery dataset and a data store (called an entity data store) for each table specified. For each data connector, the tables must have the same data type (for example, structured) and be in the same BigQuery dataset.
Data from multiple tables can be combined in one data store by first ingesting data from one table and then more data from another source or BigQuery table.	Because manual data import is not supported, the data in an entity data store can only be sourced from one BigQuery table.
Data source access control is supported.	Data source access control is not supported. The imported data can contain access controls but these controls won't be respected.
You can create a data store using either the Google Cloud console or the API.	You must use the console to create data connectors and their entity data stores.
CMEK-compliant.	CMEK-compliant.

Import once from BigQuery

To ingest data from a BigQuery table, use the following steps to create a data store and ingest data using either the Google Cloud console or the API.

Before importing your data, review Prepare data for ingesting.

Console

To use the Google Cloud console to ingest data from BigQuery, follow these steps:

In the Google Cloud console, go to the AI Applications page.
Go to the Data Stores page.
Click Create data store.
On the Source page, select BigQuery.
In the What kind of data are you importing section, select the data type that you are going to import.
In the Synchronization frequency section, select One time.
In the BigQuery path field, click Browse, select a table that you have prepared for ingesting, and then click Select. Alternatively, enter the table location directly in the BigQuery path field.
Click Continue.
If you are doing a one-time import of structured data:
1. Map fields to key properties.
2. If important fields are missing from the schema, use Add new field to add them.
  
  For more information, see About auto-detect and edit.
3. Click Continue.
Choose a region for your data store.
Enter a name for your data store.
Click Create.
To check the status of your ingestion, go to the Data Stores page and click your data store name to see details about it on its Data page. When the status column on the Activity tab changes from In progress to Import completed, the ingestion is complete.

Depending on the size of your data, ingestion can take several minutes to several hours.

REST

To use the command line to create a data store and import data from BigQuery, follow these steps.

Create a data store.

curl -X POST \
-H "Authorization: Bearer $(gcloud auth print-access-token)" \
-H "Content-Type: application/json" \
-H "X-Goog-User-Project: PROJECT_ID" \
"https://discoveryengine.googleapis.com/v1/projects/PROJECT_ID/locations/global/collections/default_collection/dataStores?dataStoreId=DATA_STORE_ID" \
-d '{
  "displayName": "DATA_STORE_DISPLAY_NAME",
  "industryVertical": "GENERIC",
  "solutionTypes": ["SOLUTION_TYPE_RECOMMENDATION"]
}'

Replace the following:

PROJECT_ID: the ID of your Google Cloud project.
DATA_STORE_ID: the ID of the Vertex AI Search data store that you want to create. This ID can contain only lowercase letters, digits, underscores, and hyphens.
DATA_STORE_DISPLAY_NAME: the display name of the Vertex AI Search data store that you want to create.
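
The request above can also be assembled programmatically. The following is a minimal Python sketch, not an official client: the function name `build_create_data_store_request` is hypothetical, the ID check only enforces the character rule stated above (no length limit is assumed), and the `Authorization` header is left for the caller to add.

```python
import json
import re
from urllib.parse import quote

API_ROOT = "https://discoveryengine.googleapis.com/v1"
# Character rule taken from the DATA_STORE_ID description; length is not checked.
_ID_PATTERN = re.compile(r"[a-z0-9_-]+")


def build_create_data_store_request(project_id, data_store_id, display_name):
    """Return (url, headers, json_body) for the create-data-store call."""
    if not _ID_PATTERN.fullmatch(data_store_id):
        raise ValueError(
            "data store ID may contain only lowercase letters, digits, "
            f"underscores, and hyphens: {data_store_id!r}"
        )
    url = (
        f"{API_ROOT}/projects/{project_id}/locations/global"
        "/collections/default_collection/dataStores"
        f"?dataStoreId={quote(data_store_id)}"
    )
    headers = {
        "Content-Type": "application/json",
        "X-Goog-User-Project": project_id,
        # The caller still has to add "Authorization: Bearer <token>".
    }
    body = {
        "displayName": display_name,
        "industryVertical": "GENERIC",
        "solutionTypes": ["SOLUTION_TYPE_RECOMMENDATION"],
    }
    return url, headers, json.dumps(body)


if __name__ == "__main__":
    url, headers, body = build_create_data_store_request(
        "my-project", "my_store-1", "My store"
    )
    print(url)
    print(body)
```

Validating the ID before sending the request surfaces a malformed value locally instead of as an API error.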

Import data from BigQuery.

If you defined a schema, make sure the data conforms to that schema.
```
curl -X POST \
-H "Authorization: Bearer $(gcloud auth print-access-token)" \
-H "Content-Type: application/json" \
"https://discoveryengine.googleapis.com/v1/projects/PROJECT_ID/locations/global/collections/default_collection/dataStores/DATA_STORE_ID/branches/0/documents:import" \
-d '{
  "bigquerySource": {
    "projectId": "PROJECT_ID",
    "datasetId":"DATASET_ID",
    "tableId": "TABLE_ID",
    "dataSchema": "DATA_SCHEMA",
    "aclEnabled": "BOOLEAN"
  },
  "reconciliationMode": "RECONCILIATION_MODE",
  "autoGenerateIds": "AUTO_GENERATE_IDS",
  "idField": "ID_FIELD",
  "errorConfig": {
    "gcsPrefix": "ERROR_DIRECTORY"
  }
}'
```
Replace the following:
- PROJECT_ID: the ID of your Google Cloud project.
- DATA_STORE_ID: the ID of the Vertex AI Search data store.
- DATASET_ID: the ID of the BigQuery dataset.
- TABLE_ID: the ID of the BigQuery table.
  - If the BigQuery table is not under PROJECT_ID, you need to give the service account service-<project number>@gcp-sa-discoveryengine.iam.gserviceaccount.com "BigQuery Data Viewer" permission for the BigQuery table. For example, if you are importing a BigQuery table from source project "123" to destination project "456", give service-456@gcp-sa-discoveryengine.iam.gserviceaccount.com permissions for the BigQuery table under project "123".
- DATA_SCHEMA: optional. Values are document and custom. The default is document.
  - document: the BigQuery table that you use must conform to the default BigQuery schema provided in Prepare data for ingesting. You can define the ID of each document yourself, while wrapping all the data in the jsonData string.
  - custom: any BigQuery table schema is accepted, and Vertex AI Search automatically generates the IDs for each document that is imported.
- ERROR_DIRECTORY: optional. A Cloud Storage directory for error information about the import, for example gs://<your-gcs-bucket>/directory/import_errors. Google recommends leaving this field empty to let Vertex AI Search automatically create a temporary directory.
- RECONCILIATION_MODE: optional. Values are FULL and INCREMENTAL. The default is INCREMENTAL. Specifying INCREMENTAL causes an incremental refresh of data from BigQuery to your data store. This is an upsert operation: it adds new documents and replaces existing documents with updated documents that have the same ID. Specifying FULL causes a full rebase of the documents in your data store. In other words, new and updated documents are added to your data store, and documents that are not in BigQuery are removed from your data store. The FULL mode is helpful if you want to automatically delete documents that you no longer need.
- AUTO_GENERATE_IDS: optional. Specifies whether to automatically generate document IDs. If set to true, document IDs are generated based on a hash of the payload. Note that generated document IDs might not remain consistent over multiple imports. If you auto-generate IDs over multiple imports, Google highly recommends setting reconciliationMode to FULL to maintain consistent document IDs.
  
  Specify autoGenerateIds only when bigquerySource.dataSchema is set to custom. Otherwise an INVALID_ARGUMENT error is returned. If you don't specify autoGenerateIds or set it to false, you must specify idField. Otherwise the documents fail to import.
- ID_FIELD: optional. Specifies which field contains the document IDs. For BigQuery sources, idField indicates the name of the column in the BigQuery table that contains the document IDs.
  
  Specify idField only when: (1) bigquerySource.dataSchema is set to custom, and (2) autoGenerateIds is set to false or is unspecified. Otherwise an INVALID_ARGUMENT error is returned.
  
  The value of the BigQuery column name must be of string type, must be between 1 and 63 characters, and must conform to RFC-1034. Otherwise, the documents fail to import.
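
The rules for combining dataSchema, autoGenerateIds, and idField are easy to get wrong, so it can help to check them before sending the request. The following Python sketch is illustrative only: `build_import_body` and `simulate_reconcile` are hypothetical helpers, the checks are a conservative client-side reading of the rules above (the service remains the authority), and `errorConfig` and `aclEnabled` are left out. `simulate_reconcile` is a toy model of the two reconciliation modes on plain dictionaries, not a call to the API.

```python
def build_import_body(project_id, dataset_id, table_id, data_schema="document",
                      reconciliation_mode="INCREMENTAL",
                      auto_generate_ids=None, id_field=None):
    """Build a documents:import request body, applying the field rules above."""
    if data_schema not in ("document", "custom"):
        raise ValueError("dataSchema must be 'document' or 'custom'")
    if reconciliation_mode not in ("FULL", "INCREMENTAL"):
        raise ValueError("reconciliationMode must be 'FULL' or 'INCREMENTAL'")
    # Assumption: reject both fields for non-custom schemas before the API does.
    if data_schema != "custom" and (auto_generate_ids is not None or id_field):
        raise ValueError("autoGenerateIds and idField require dataSchema 'custom'")
    if auto_generate_ids and id_field:
        raise ValueError("idField is only valid when autoGenerateIds is false or unset")
    if data_schema == "custom" and not auto_generate_ids and not id_field:
        raise ValueError("idField is required when autoGenerateIds is false or unset")

    body = {
        "bigquerySource": {
            "projectId": project_id,
            "datasetId": dataset_id,
            "tableId": table_id,
            "dataSchema": data_schema,
        },
        "reconciliationMode": reconciliation_mode,
    }
    if auto_generate_ids is not None:
        body["autoGenerateIds"] = auto_generate_ids
    if id_field:
        body["idField"] = id_field
    return body


def simulate_reconcile(store, source, mode):
    """Toy model of reconciliation on {document_id: content} dictionaries."""
    if mode == "FULL":
        # Full rebase: documents that are absent from the source are removed.
        return dict(source)
    # INCREMENTAL: upsert, so source documents add to or replace store documents.
    return {**store, **source}


if __name__ == "__main__":
    print(build_import_body("my-project", "my_dataset", "my_table",
                            data_schema="custom", id_field="doc_id"))
    store = {"a": "old", "b": "old"}
    source = {"b": "new", "c": "new"}
    print(simulate_reconcile(store, source, "INCREMENTAL"))
    print(simulate_reconcile(store, source, "FULL"))
```

In the toy model, an INCREMENTAL import keeps document "a" while a FULL import drops it, which mirrors the difference between an upsert and a full rebase.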

C#

For more information, see the AI Applications C# API reference documentation.

To authenticate to AI Applications, set up Application Default Credentials. For more information, see Set up authentication for a local development environment.

Create a data store

using Google.Cloud.DiscoveryEngine.V1;
using Google.LongRunning;

public sealed partial class GeneratedDataStoreServiceClientSnippets
{
    /// <summary>Snippet for CreateDataStore</summary>
    /// <remarks>
    /// This snippet has been automatically generated and should be regarded as a code template only.
    /// It will require modifications to work:
    /// - It may require correct/in-range values for request initialization.
    /// - It may require specifying regional endpoints when creating the service client as shown in
    ///   https://cloud.google.com/dotnet/docs/reference/help/client-configuration#endpoint.
    /// </remarks>
    public void CreateDataStoreRequestObject()
    {
        // Create client
        DataStoreServiceClient dataStoreServiceClient = DataStoreServiceClient.Create();
        // Initialize request argument(s)
        CreateDataStoreRequest request = new CreateDataStoreRequest
        {
            ParentAsCollectionName = CollectionName.FromProjectLocationCollection("[PROJECT]", "[LOCATION]", "[COLLECTION]"),
            DataStore = new DataStore(),
            DataStoreId = "",
            CreateAdvancedSiteSearch = false,
            CmekConfigNameAsCmekConfigName = CmekConfigName.FromProjectLocation("[PROJECT]", "[LOCATION]"),
            SkipDefaultSchemaCreation = false,
        };
        // Make the request
        Operation<DataStore, CreateDataStoreMetadata> response = dataStoreServiceClient.CreateDataStore(request);
        // Poll until the returned long-running operation is complete
        Operation<DataStore, CreateDataStoreMetadata> completedResponse = response.PollUntilCompleted();
        // Retrieve the operation result
        DataStore result = completedResponse.Result;
        // Or get the name of the operation
        string operationName = response.Name;
        // This name can be stored, then the long-running operation retrieved later by name
        Operation<DataStore, CreateDataStoreMetadata> retrievedResponse = dataStoreServiceClient.PollOnceCreateDataStore(operationName);
        // Check if the retrieved long-running operation has completed
        if (retrievedResponse.IsCompleted)
        {
            // If it has completed, then access the result
            DataStore retrievedResult = retrievedResponse.Result;
        }
    }
}

Import documents

using Google.Cloud.DiscoveryEngine.V1;
using Google.LongRunning;
using Google.Protobuf.WellKnownTypes;

public sealed partial class GeneratedDocumentServiceClientSnippets
{
    /// <summary>Snippet for ImportDocuments</summary>
    /// <remarks>
    /// This snippet has been automatically generated and should be regarded as a code template only.
    /// It will require modifications to work:
    /// - It may require correct/in-range values for request initialization.
    /// - It may require specifying regional endpoints when creating the service client as shown in
    ///   https://cloud.google.com/dotnet/docs/reference/help/client-configuration#endpoint.
    /// </remarks>
    public void ImportDocumentsRequestObject()
    {
        // Create client
        DocumentServiceClient documentServiceClient = DocumentServiceClient.Create();
        // Initialize request argument(s)
        ImportDocumentsRequest request = new ImportDocumentsRequest
        {
            ParentAsBranchName = BranchName.FromProjectLocationDataStoreBranch("[PROJECT]", "[LOCATION]", "[DATA_STORE]", "[BRANCH]"),
            InlineSource = new ImportDocumentsRequest.Types.InlineSource(),
            ErrorConfig = new ImportErrorConfig(),
            ReconciliationMode = ImportDocumentsRequest.Types.ReconciliationMode.Unspecified,
            UpdateMask = new FieldMask(),
            AutoGenerateIds = false,
            IdField = "",
            ForceRefreshContent = false,
        };
        // Make the request
        Operation<ImportDocumentsResponse, ImportDocumentsMetadata> response = documentServiceClient.ImportDocuments(request);
        // Poll until the returned long-running operation is complete
        Operation<ImportDocumentsResponse, ImportDocumentsMetadata> completedResponse = response.PollUntilCompleted();
        // Retrieve the operation result
        ImportDocumentsResponse result = completedResponse.Result;
        // Or get the name of the operation
        string operationName = response.Name;
        // This name can be stored, then the long-running operation retrieved later by name
        Operation<ImportDocumentsResponse, ImportDocumentsMetadata> retrievedResponse = documentServiceClient.PollOnceImportDocuments(operationName);
        // Check if the retrieved long-running operation has completed
        if (retrievedResponse.IsCompleted)
        {
            // If it has completed, then access the result
            ImportDocumentsResponse retrievedResult = retrievedResponse.Result;
        }
    }
}
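
Both C# samples above poll a long-running operation until it completes. For callers that use the REST API directly, the same pattern might be sketched in Python as follows. This is an assumption-laden illustration: `poll_until_done` and its `fetch_operation` argument are hypothetical, and `fetch_operation` is expected to return the operation resource as a dictionary with the standard `done`, `error`, and `response` fields of a long-running operation.

```python
import time


def poll_until_done(fetch_operation, interval_s=5.0, timeout_s=3600.0,
                    sleep=time.sleep):
    """Call fetch_operation() until the operation reports done.

    fetch_operation is a caller-supplied function (hypothetical here) that
    fetches the operation by name and returns its JSON as a dict.
    """
    deadline = time.monotonic() + timeout_s
    while True:
        operation = fetch_operation()
        if operation.get("done"):
            if "error" in operation:
                raise RuntimeError(f"operation failed: {operation['error']}")
            return operation.get("response", {})
        if time.monotonic() >= deadline:
            raise TimeoutError("operation did not finish before the timeout")
        sleep(interval_s)


if __name__ == "__main__":
    # Fake operation states stand in for real API responses.
    states = iter([
        {"name": "operations/example"},
        {"name": "operations/example", "done": True, "response": {"ok": True}},
    ])
    print(poll_until_done(lambda: next(states), interval_s=0))
```

Because ingestion can take from several minutes to several hours, a generous timeout and a polling interval of at least a few seconds are reasonable starting points.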
