If you have CMEK organization policies, you must create new data stores using the API, not the Google Cloud console. Creating new data stores with the Google Cloud console fails if you have CMEK organization policies enabled. For more information about CMEK support for Vertex AI Search, see Customer-managed encryption keys.
Create a data store using website content
Use the following procedure to create a data store and index websites.
To use a website data store after creating it, you must attach it to an app that
has Enterprise features turned on. You can turn on Enterprise Edition for an app
when you create it. This incurs additional costs. See Create a search app and About advanced features.
Console
To use the Google Cloud console to create a data store and index websites, follow these steps:

In the Google Cloud console, go to the Agent Builder page.

Choose whether to turn on Advanced website indexing for this data store.
This option can't be turned on or off later.
Advanced website indexing provides additional features such as search summarization, search with follow-ups, and extractive answers. Advanced website indexing incurs additional cost and requires that you verify domain ownership for any website that you index. For more information, see Advanced website indexing and Pricing.

In the Sites to include field, enter the URL patterns matching the websites that you want to include in your data store. Include one URL pattern per line, without comma separators. For example: www.example.com/docs/*

Optional: In the Sites to exclude field, enter URL patterns that you want to exclude from your data store.

To see the number of URL patterns you can include or exclude, see Website data.

Click Continue.
Select a location for your data store. Advanced website indexing must be
turned on to select a location.
Enter a name for your data store.
Click Create. Vertex AI Search creates your data store and displays your data stores on the Data Stores page.

To view information about your data store, click the name of your data store in the Name column. Your data store page appears.

If you turned on Advanced website indexing, a warning appears prompting you to verify the domains in your data store.

If you have a quota shortfall (the number of pages in the websites that you specified exceeds the "Number of documents per project" quota for your project), an additional warning appears prompting you to upgrade your quota.

To verify the domains for the URL patterns in your data store, follow the instructions on the Verify website domains page.

To upgrade your quota, follow these steps:

Click Upgrade quota. The IAM and Admin page of the Google Cloud console appears.

Follow the instructions at Request a higher quota limit in the Google Cloud documentation. The quota to increase is Number of documents in the Discovery Engine API service.

After submitting your request for a higher quota limit, go back to the Agent Builder page and click Data Stores in the navigation menu.

Click the name of your data store in the Name column. The Status column indicates that indexing is in progress for the websites that had surpassed the quota. When the Status column for a URL shows Indexed, advanced website indexing features are available for that URL or URL pattern.
from google.api_core.client_options import ClientOptions
from google.cloud import discoveryengine

# TODO(developer): Uncomment these variables before running the sample.
# project_id = "YOUR_PROJECT_ID"
# location = "YOUR_LOCATION"  # Values: "global"
# data_store_id = "YOUR_DATA_STORE_ID"


def create_data_store_sample(
    project_id: str,
    location: str,
    data_store_id: str,
) -> str:
    # For more information, refer to:
    # https://cloud.google.com/generative-ai-app-builder/docs/locations#specify_a_multi-region_for_your_data_store
    client_options = (
        ClientOptions(api_endpoint=f"{location}-discoveryengine.googleapis.com")
        if location != "global"
        else None
    )

    # Create a client
    client = discoveryengine.DataStoreServiceClient(client_options=client_options)

    # The full resource name of the collection
    # e.g. projects/{project}/locations/{location}/collections/default_collection
    parent = client.collection_path(
        project=project_id,
        location=location,
        collection="default_collection",
    )

    data_store = discoveryengine.DataStore(
        display_name="My Data Store",
        # Options: GENERIC, MEDIA, HEALTHCARE_FHIR
        industry_vertical=discoveryengine.IndustryVertical.GENERIC,
        # Options: SOLUTION_TYPE_RECOMMENDATION, SOLUTION_TYPE_SEARCH, SOLUTION_TYPE_CHAT, SOLUTION_TYPE_GENERATIVE_CHAT
        solution_types=[discoveryengine.SolutionType.SOLUTION_TYPE_SEARCH],
        # TODO(developer): Update content_config based on data store type.
        # Options: NO_CONTENT, CONTENT_REQUIRED, PUBLIC_WEBSITE
        content_config=discoveryengine.DataStore.ContentConfig.CONTENT_REQUIRED,
    )

    request = discoveryengine.CreateDataStoreRequest(
        parent=parent,
        data_store_id=data_store_id,
        data_store=data_store,
        # Optional: For Advanced Site Search Only
        # create_advanced_site_search=True,
    )

    # Make the request
    operation = client.create_data_store(request=request)

    print(f"Waiting for operation to complete: {operation.operation.name}")
    response = operation.result()

    # After the operation is complete,
    # get information from operation metadata
    metadata = discoveryengine.CreateDataStoreMetadata(operation.metadata)

    # Handle the response
    print(response)
    print(metadata)

    return operation.operation.name
Import websites
from google.api_core.client_options import ClientOptions
from google.cloud import discoveryengine_v1 as discoveryengine

# TODO(developer): Uncomment these variables before running the sample.
# project_id = "YOUR_PROJECT_ID"
# location = "YOUR_LOCATION"  # Values: "global"
# data_store_id = "YOUR_DATA_STORE_ID"
# NOTE: Do not include http or https protocol in the URI pattern
# uri_pattern = "cloud.google.com/generative-ai-app-builder/docs/*"

# For more information, refer to:
# https://cloud.google.com/generative-ai-app-builder/docs/locations#specify_a_multi-region_for_your_data_store
client_options = (
    ClientOptions(api_endpoint=f"{location}-discoveryengine.googleapis.com")
    if location != "global"
    else None
)

# Create a client
client = discoveryengine.SiteSearchEngineServiceClient(client_options=client_options)

# The full resource name of the data store
# e.g. projects/{project}/locations/{location}/dataStores/{data_store_id}
site_search_engine = client.site_search_engine_path(
    project=project_id, location=location, data_store=data_store_id
)

# Target Site to index
target_site = discoveryengine.TargetSite(
    provided_uri_pattern=uri_pattern,
    # Options: INCLUDE, EXCLUDE
    type_=discoveryengine.TargetSite.Type.INCLUDE,
    exact_match=False,
)

# Make the request
operation = client.create_target_site(
    parent=site_search_engine,
    target_site=target_site,
)

print(f"Waiting for operation to complete: {operation.operation.name}")
response = operation.result()

# After the operation is complete,
# get information from operation metadata
metadata = discoveryengine.CreateTargetSiteMetadata(operation.metadata)

# Handle the response
print(response)
print(metadata)
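The sample above registers an include pattern. If you also listed exclusion patterns (the equivalent of the console's Sites to exclude field), the same create_target_site call can register them by setting the target site type to EXCLUDE. The following is a minimal sketch rather than an official sample: it assumes the client and site_search_engine objects defined above, and exclude_uri_pattern is a placeholder you would set yourself.

# Minimal sketch: register an exclusion pattern for the same data store.
# Assumes `client` and `site_search_engine` are defined as in the sample above.
# exclude_uri_pattern = "www.example.com/unwanted/*"  # placeholder pattern

exclude_site = discoveryengine.TargetSite(
    provided_uri_pattern=exclude_uri_pattern,
    # EXCLUDE keeps matching pages out of the index instead of adding them.
    type_=discoveryengine.TargetSite.Type.EXCLUDE,
    exact_match=False,
)

operation = client.create_target_site(
    parent=site_search_engine,
    target_site=exclude_site,
)
print(operation.result())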
Next steps
To attach your website data store to an app, create an app with Enterprise
features enabled and select your data store following the steps in Create a search app.
To preview how your search results appear after your app and data store are
set up, see Get search results.
Import from BigQuery
You can create data stores from BigQuery tables in two ways:
One-time ingestion: You import data from a BigQuery table into a
data store. The data in the data store does not change unless you manually refresh the data.
Periodic ingestion: You import data from one or more BigQuery
tables, and you set a sync frequency that determines how often the data
stores are updated with the most recent data from the BigQuery
dataset.
The following comparison describes the two ways that you can import BigQuery data into Vertex AI Search data stores.

Availability: One-time ingestion is generally available (GA). Periodic ingestion is in Public preview.

Data refresh: With one-time ingestion, data must be refreshed manually. With periodic ingestion, data updates automatically every one, three, or five days; data cannot be manually refreshed.

Data stores created: With one-time ingestion, Vertex AI Search creates a single data store from one table in BigQuery. With periodic ingestion, Vertex AI Search creates a data connector for a BigQuery dataset and a data store (called an entity data store) for each table specified. For each data connector, the tables must have the same data type (for example, structured) and be in the same BigQuery dataset.

Combining data: With one-time ingestion, data from multiple tables can be combined in one data store by first ingesting data from one table and then more data from another source or BigQuery table. With periodic ingestion, because manual data import is not supported, the data in an entity data store can only be sourced from one BigQuery table.

Data source access control: Supported for one-time ingestion. Not supported for periodic ingestion; the imported data can contain access controls, but these controls won't be respected.

Data store creation: For one-time ingestion, you can create a data store using either the Google Cloud console or the API. For periodic ingestion, you must use the console to create data connectors and their entity data stores.

CMEK: One-time ingestion is CMEK-compliant. Periodic ingestion is not CMEK-compliant.
Import once from BigQuery
To ingest data from a BigQuery table, use the following steps to create
a data store and ingest data using either the Google Cloud console or the API.
In the BigQuery path field, click Browse, select a table that you have prepared for ingesting, and then click Select. Alternatively, enter the table location directly in the BigQuery path field.
Click Continue.
If you are doing a one-time import of structured data:

Map fields to key properties.

If there are important fields missing from the schema, use Add new field to add them.
To check the status of your ingestion, go to the Data Stores page and click your data store name to see details about it on its Data page. When the status column on the Activity tab changes from In progress to Import completed, the ingestion is complete.
Depending on the size of your data, ingestion can take several
minutes to several hours.
REST
To use the command line to create a data store and import data from
BigQuery, follow these steps.
DATA_STORE_ID: the ID of the Vertex AI Search data store that you want to create. This ID can contain only lowercase
letters, digits, underscores, and hyphens.
DATA_STORE_DISPLAY_NAME: the display name of the Vertex AI
Search data store that you want to create.
Optional: If you're uploading unstructured data and want to configure document parsing or to turn on document chunking for RAG, specify the documentProcessingConfig object and include it in your data store creation request. Configuring an OCR parser for PDFs is recommended if you're ingesting scanned PDFs. For how to configure parsing or chunking options, see Parse and chunk documents.
Import data from BigQuery.
If you defined a schema, make sure the data conforms to that schema.
DATA_STORE_ID: the ID of the Vertex AI Search data store.
DATASET_ID: the ID of the BigQuery
dataset.
TABLE_ID: the ID of the BigQuery table.
If the BigQuery table is not under PROJECT_ID, you need to give the service account service-<project number>@gcp-sa-discoveryengine.iam.gserviceaccount.com the "BigQuery Data Viewer" permission for the BigQuery table. For example, if you are importing a BigQuery table from source project "123" to destination project "456", give service-456@gcp-sa-discoveryengine.iam.gserviceaccount.com permissions for the BigQuery table under project "123".
DATA_SCHEMA: Optional. Values are document and custom. The default is document.

document: the BigQuery table that you use must conform to the default BigQuery schema provided in Prepare data for ingesting. You can define the ID of each document yourself, while wrapping all the data in the jsonData string.

custom: Any BigQuery table schema is accepted, and Vertex AI Search automatically generates the IDs for each document that is imported.
ERROR_DIRECTORY: Optional. A Cloud Storage directory
for error information about the import—for example, gs://<your-gcs-bucket>/directory/import_errors. Google recommends
leaving this field empty to let Vertex AI Search
automatically create a temporary directory.
RECONCILIATION_MODE: Optional. Values are FULL and INCREMENTAL. Default is INCREMENTAL. Specifying INCREMENTAL causes an incremental refresh of data from BigQuery to your data store. This does an upsert operation, which adds new documents and replaces existing documents with updated documents that have the same ID. Specifying FULL causes a full rebase of the documents in your data store. In other words, new and updated documents are added to your data store, and documents that are not in BigQuery are removed from your data store. The FULL mode is helpful if you want to automatically delete documents that you no longer need. A minimal client-library sketch showing these import options follows this parameter list.
AUTO_GENERATE_IDS: Optional. Specifies whether to automatically generate document IDs. If set to true, document IDs are generated based on a hash of the payload. Note that generated document IDs might not remain consistent over multiple imports. If you auto-generate IDs over multiple imports, Google highly recommends setting reconciliationMode to FULL to maintain consistent document IDs.
Specify autoGenerateIds only when bigquerySource.dataSchema is set to custom. Otherwise an INVALID_ARGUMENT error is returned. If you don't specify autoGenerateIds or set it to false, you must specify idField. Otherwise the documents fail to import.
ID_FIELD: Optional. Specifies which fields are the document IDs. For BigQuery source files, idField indicates the name of the column in the BigQuery table that contains the document IDs.

Specify idField only when: (1) bigquerySource.dataSchema is set to custom, and (2) auto_generate_ids is set to false or is unspecified. Otherwise an INVALID_ARGUMENT error is returned.

The value of the BigQuery column name must be of string type, must be between 1 and 63 characters, and must conform to RFC-1034. Otherwise, the documents fail to import.
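These parameters correspond to fields on the import request in the client libraries as well as in the REST body. As a rough illustration only, the following Python sketch performs a FULL rebase from a BigQuery table whose document IDs are stored in a column named my_id; the variable values and the my_id column name are placeholders, and the call mirrors the Python import sample later in this section.

from google.cloud import discoveryengine

# Placeholders (assumptions for illustration) -- replace with your own values:
# project_id, location, data_store_id, bigquery_dataset, bigquery_table

client = discoveryengine.DocumentServiceClient()

# The default branch of the data store receives the imported documents.
parent = client.branch_path(
    project=project_id,
    location=location,
    data_store=data_store_id,
    branch="default_branch",
)

request = discoveryengine.ImportDocumentsRequest(
    parent=parent,
    bigquery_source=discoveryengine.BigQuerySource(
        project_id=project_id,
        dataset_id=bigquery_dataset,
        table_id=bigquery_table,
        # "custom" accepts any table schema; document IDs then come from id_field.
        data_schema="custom",
    ),
    # FULL rebases the data store: documents missing from BigQuery are removed.
    reconciliation_mode=discoveryengine.ImportDocumentsRequest.ReconciliationMode.FULL,
    # Column that holds the document IDs (hypothetical column name).
    id_field="my_id",
)

operation = client.import_documents(request=request)
print(operation.result())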
using Google.Cloud.DiscoveryEngine.V1;
using Google.LongRunning;

public sealed partial class GeneratedDataStoreServiceClientSnippets
{
    /// <summary>Snippet for CreateDataStore</summary>
    /// <remarks>
    /// This snippet has been automatically generated and should be regarded as a code template only.
    /// It will require modifications to work:
    /// - It may require correct/in-range values for request initialization.
    /// - It may require specifying regional endpoints when creating the service client as shown in
    ///   https://cloud.google.com/dotnet/docs/reference/help/client-configuration#endpoint.
    /// </remarks>
    public void CreateDataStoreRequestObject()
    {
        // Create client
        DataStoreServiceClient dataStoreServiceClient = DataStoreServiceClient.Create();
        // Initialize request argument(s)
        CreateDataStoreRequest request = new CreateDataStoreRequest
        {
            ParentAsCollectionName = CollectionName.FromProjectLocationCollection("[PROJECT]", "[LOCATION]", "[COLLECTION]"),
            DataStore = new DataStore(),
            DataStoreId = "",
            CreateAdvancedSiteSearch = false,
            SkipDefaultSchemaCreation = false,
        };
        // Make the request
        Operation<DataStore, CreateDataStoreMetadata> response = dataStoreServiceClient.CreateDataStore(request);

        // Poll until the returned long-running operation is complete
        Operation<DataStore, CreateDataStoreMetadata> completedResponse = response.PollUntilCompleted();
        // Retrieve the operation result
        DataStore result = completedResponse.Result;

        // Or get the name of the operation
        string operationName = response.Name;
        // This name can be stored, then the long-running operation retrieved later by name
        Operation<DataStore, CreateDataStoreMetadata> retrievedResponse = dataStoreServiceClient.PollOnceCreateDataStore(operationName);
        // Check if the retrieved long-running operation has completed
        if (retrievedResponse.IsCompleted)
        {
            // If it has completed, then access the result
            DataStore retrievedResult = retrievedResponse.Result;
        }
    }
}
Import documents
using Google.Cloud.DiscoveryEngine.V1;
using Google.LongRunning;
using Google.Protobuf.WellKnownTypes;

public sealed partial class GeneratedDocumentServiceClientSnippets
{
    /// <summary>Snippet for ImportDocuments</summary>
    /// <remarks>
    /// This snippet has been automatically generated and should be regarded as a code template only.
    /// It will require modifications to work:
    /// - It may require correct/in-range values for request initialization.
    /// - It may require specifying regional endpoints when creating the service client as shown in
    ///   https://cloud.google.com/dotnet/docs/reference/help/client-configuration#endpoint.
    /// </remarks>
    public void ImportDocumentsRequestObject()
    {
        // Create client
        DocumentServiceClient documentServiceClient = DocumentServiceClient.Create();
        // Initialize request argument(s)
        ImportDocumentsRequest request = new ImportDocumentsRequest
        {
            ParentAsBranchName = BranchName.FromProjectLocationDataStoreBranch("[PROJECT]", "[LOCATION]", "[DATA_STORE]", "[BRANCH]"),
            InlineSource = new ImportDocumentsRequest.Types.InlineSource(),
            ErrorConfig = new ImportErrorConfig(),
            ReconciliationMode = ImportDocumentsRequest.Types.ReconciliationMode.Unspecified,
            UpdateMask = new FieldMask(),
            AutoGenerateIds = false,
            IdField = "",
        };
        // Make the request
        Operation<ImportDocumentsResponse, ImportDocumentsMetadata> response = documentServiceClient.ImportDocuments(request);

        // Poll until the returned long-running operation is complete
        Operation<ImportDocumentsResponse, ImportDocumentsMetadata> completedResponse = response.PollUntilCompleted();
        // Retrieve the operation result
        ImportDocumentsResponse result = completedResponse.Result;

        // Or get the name of the operation
        string operationName = response.Name;
        // This name can be stored, then the long-running operation retrieved later by name
        Operation<ImportDocumentsResponse, ImportDocumentsMetadata> retrievedResponse = documentServiceClient.PollOnceImportDocuments(operationName);
        // Check if the retrieved long-running operation has completed
        if (retrievedResponse.IsCompleted)
        {
            // If it has completed, then access the result
            ImportDocumentsResponse retrievedResult = retrievedResponse.Result;
        }
    }
}
package main

import (
	"context"

	discoveryengine "cloud.google.com/go/discoveryengine/apiv1"
	discoveryenginepb "cloud.google.com/go/discoveryengine/apiv1/discoveryenginepb"
)

func main() {
	ctx := context.Background()
	// This snippet has been automatically generated and should be regarded as a code template only.
	// It will require modifications to work:
	// - It may require correct/in-range values for request initialization.
	// - It may require specifying regional endpoints when creating the service client as shown in:
	//   https://pkg.go.dev/cloud.google.com/go#hdr-Client_Options
	c, err := discoveryengine.NewDataStoreClient(ctx)
	if err != nil {
		// TODO: Handle error.
	}
	defer c.Close()

	req := &discoveryenginepb.CreateDataStoreRequest{
		// TODO: Fill request struct fields.
		// See https://pkg.go.dev/cloud.google.com/go/discoveryengine/apiv1/discoveryenginepb#CreateDataStoreRequest.
	}
	op, err := c.CreateDataStore(ctx, req)
	if err != nil {
		// TODO: Handle error.
	}

	resp, err := op.Wait(ctx)
	if err != nil {
		// TODO: Handle error.
	}
	// TODO: Use resp.
	_ = resp
}
Import documents
package main

import (
	"context"

	discoveryengine "cloud.google.com/go/discoveryengine/apiv1"
	discoveryenginepb "cloud.google.com/go/discoveryengine/apiv1/discoveryenginepb"
)

func main() {
	ctx := context.Background()
	// This snippet has been automatically generated and should be regarded as a code template only.
	// It will require modifications to work:
	// - It may require correct/in-range values for request initialization.
	// - It may require specifying regional endpoints when creating the service client as shown in:
	//   https://pkg.go.dev/cloud.google.com/go#hdr-Client_Options
	c, err := discoveryengine.NewDocumentClient(ctx)
	if err != nil {
		// TODO: Handle error.
	}
	defer c.Close()

	req := &discoveryenginepb.ImportDocumentsRequest{
		// TODO: Fill request struct fields.
		// See https://pkg.go.dev/cloud.google.com/go/discoveryengine/apiv1/discoveryenginepb#ImportDocumentsRequest.
	}
	op, err := c.ImportDocuments(ctx, req)
	if err != nil {
		// TODO: Handle error.
	}

	resp, err := op.Wait(ctx)
	if err != nil {
		// TODO: Handle error.
	}
	// TODO: Use resp.
	_ = resp
}
import com.google.cloud.discoveryengine.v1.CollectionName;
import com.google.cloud.discoveryengine.v1.CreateDataStoreRequest;
import com.google.cloud.discoveryengine.v1.DataStore;
import com.google.cloud.discoveryengine.v1.DataStoreServiceClient;

public class SyncCreateDataStore {

  public static void main(String[] args) throws Exception {
    syncCreateDataStore();
  }

  public static void syncCreateDataStore() throws Exception {
    // This snippet has been automatically generated and should be regarded as a code template only.
    // It will require modifications to work:
    // - It may require correct/in-range values for request initialization.
    // - It may require specifying regional endpoints when creating the service client as shown in
    //   https://cloud.google.com/java/docs/setup#configure_endpoints_for_the_client_library
    try (DataStoreServiceClient dataStoreServiceClient = DataStoreServiceClient.create()) {
      CreateDataStoreRequest request =
          CreateDataStoreRequest.newBuilder()
              .setParent(CollectionName.of("[PROJECT]", "[LOCATION]", "[COLLECTION]").toString())
              .setDataStore(DataStore.newBuilder().build())
              .setDataStoreId("dataStoreId929489618")
              .setCreateAdvancedSiteSearch(true)
              .setSkipDefaultSchemaCreation(true)
              .build();
      DataStore response = dataStoreServiceClient.createDataStoreAsync(request).get();
    }
  }
}
Import documents
import com.google.cloud.discoveryengine.v1.BranchName;
import com.google.cloud.discoveryengine.v1.DocumentServiceClient;
import com.google.cloud.discoveryengine.v1.ImportDocumentsRequest;
import com.google.cloud.discoveryengine.v1.ImportDocumentsResponse;
import com.google.cloud.discoveryengine.v1.ImportErrorConfig;
import com.google.protobuf.FieldMask;

public class SyncImportDocuments {

  public static void main(String[] args) throws Exception {
    syncImportDocuments();
  }

  public static void syncImportDocuments() throws Exception {
    // This snippet has been automatically generated and should be regarded as a code template only.
    // It will require modifications to work:
    // - It may require correct/in-range values for request initialization.
    // - It may require specifying regional endpoints when creating the service client as shown in
    //   https://cloud.google.com/java/docs/setup#configure_endpoints_for_the_client_library
    try (DocumentServiceClient documentServiceClient = DocumentServiceClient.create()) {
      ImportDocumentsRequest request =
          ImportDocumentsRequest.newBuilder()
              .setParent(
                  BranchName.ofProjectLocationDataStoreBranchName(
                          "[PROJECT]", "[LOCATION]", "[DATA_STORE]", "[BRANCH]")
                      .toString())
              .setErrorConfig(ImportErrorConfig.newBuilder().build())
              .setUpdateMask(FieldMask.newBuilder().build())
              .setAutoGenerateIds(true)
              .setIdField("idField1629396127")
              .build();
      ImportDocumentsResponse response = documentServiceClient.importDocumentsAsync(request).get();
    }
  }
}
/**
 * This snippet has been automatically generated and should be regarded as a code template only.
 * It will require modifications to work.
 * It may require correct/in-range values for request initialization.
 * TODO(developer): Uncomment these variables before running the sample.
 */
/**
 *  Required. The parent resource name, such as
 *  `projects/{project}/locations/{location}/collections/{collection}`.
 */
// const parent = 'abc123'
/**
 *  Required. The DataStore google.cloud.discoveryengine.v1.DataStore to
 *  create.
 */
// const dataStore = {}
/**
 *  Required. The ID to use for the
 *  DataStore google.cloud.discoveryengine.v1.DataStore, which will become
 *  the final component of the
 *  DataStore google.cloud.discoveryengine.v1.DataStore's resource name.
 *  This field must conform to RFC-1034 (https://tools.ietf.org/html/rfc1034)
 *  standard with a length limit of 63 characters. Otherwise, an
 *  INVALID_ARGUMENT error is returned.
 */
// const dataStoreId = 'abc123'
/**
 *  A boolean flag indicating whether user want to directly create an advanced
 *  data store for site search.
 *  If the data store is not configured as site
 *  search (GENERIC vertical and PUBLIC_WEBSITE content_config), this flag will
 *  be ignored.
 */
// const createAdvancedSiteSearch = true
/**
 *  A boolean flag indicating whether to skip the default schema creation for
 *  the data store. Only enable this flag if you are certain that the default
 *  schema is incompatible with your use case.
 *  If set to true, you must manually create a schema for the data store before
 *  any documents can be ingested.
 *  This flag cannot be specified if `data_store.starting_schema` is specified.
 */
// const skipDefaultSchemaCreation = true

// Imports the Discoveryengine library
const {DataStoreServiceClient} = require('@google-cloud/discoveryengine').v1;

// Instantiates a client
const discoveryengineClient = new DataStoreServiceClient();

async function callCreateDataStore() {
  // Construct request
  const request = {
    parent,
    dataStore,
    dataStoreId,
  };

  // Run request
  const [operation] = await discoveryengineClient.createDataStore(request);
  const [response] = await operation.promise();
  console.log(response);
}

callCreateDataStore();
Import documents
/**
 * This snippet has been automatically generated and should be regarded as a code template only.
 * It will require modifications to work.
 * It may require correct/in-range values for request initialization.
 * TODO(developer): Uncomment these variables before running the sample.
 */
/**
 *  The Inline source for the input content for documents.
 */
// const inlineSource = {}
/**
 *  Cloud Storage location for the input content.
 */
// const gcsSource = {}
/**
 *  BigQuery input source.
 */
// const bigquerySource = {}
/**
 *  FhirStore input source.
 */
// const fhirStoreSource = {}
/**
 *  Spanner input source.
 */
// const spannerSource = {}
/**
 *  Cloud SQL input source.
 */
// const cloudSqlSource = {}
/**
 *  Firestore input source.
 */
// const firestoreSource = {}
/**
 *  AlloyDB input source.
 */
// const alloyDbSource = {}
/**
 *  Cloud Bigtable input source.
 */
// const bigtableSource = {}
/**
 *  Required. The parent branch resource name, such as
 *  `projects/{project}/locations/{location}/collections/{collection}/dataStores/{data_store}/branches/{branch}`.
 *  Requires create/update permission.
 */
// const parent = 'abc123'
/**
 *  The desired location of errors incurred during the Import.
 */
// const errorConfig = {}
/**
 *  The mode of reconciliation between existing documents and the documents to
 *  be imported. Defaults to
 *  ReconciliationMode.INCREMENTAL google.cloud.discoveryengine.v1.ImportDocumentsRequest.ReconciliationMode.INCREMENTAL.
 */
// const reconciliationMode = {}
/**
 *  Indicates which fields in the provided imported documents to update. If
 *  not set, the default is to update all fields.
 */
// const updateMask = {}
/**
 *  Whether to automatically generate IDs for the documents if absent.
 *  If set to `true`, Document.id google.cloud.discoveryengine.v1.Document.id s are
 *  automatically generated based on the hash of the payload, where IDs may not
 *  be consistent during multiple imports. In which case
 *  ReconciliationMode.FULL google.cloud.discoveryengine.v1.ImportDocumentsRequest.ReconciliationMode.FULL
 *  is highly recommended to avoid duplicate contents. If unset or set to
 *  `false`, Document.id google.cloud.discoveryengine.v1.Document.id s have
 *  to be specified using
 *  id_field google.cloud.discoveryengine.v1.ImportDocumentsRequest.id_field,
 *  otherwise, documents without IDs fail to be imported.
 *  Supported data sources:
 *  * GcsSource google.cloud.discoveryengine.v1.GcsSource.
 *    GcsSource.data_schema google.cloud.discoveryengine.v1.GcsSource.data_schema
 *    must be `custom` or `csv`. Otherwise, an INVALID_ARGUMENT error is thrown.
 *  * BigQuerySource google.cloud.discoveryengine.v1.BigQuerySource.
 *    BigQuerySource.data_schema google.cloud.discoveryengine.v1.BigQuerySource.data_schema
 *    must be `custom` or `csv`. Otherwise, an INVALID_ARGUMENT error is thrown.
 *  * SpannerSource google.cloud.discoveryengine.v1.SpannerSource.
 *  * CloudSqlSource google.cloud.discoveryengine.v1.CloudSqlSource.
 *  * FirestoreSource google.cloud.discoveryengine.v1.FirestoreSource.
 *  * BigtableSource google.cloud.discoveryengine.v1.BigtableSource.
 */
// const autoGenerateIds = true
/**
 *  The field indicates the ID field or column to be used as unique IDs of
 *  the documents.
 *  For GcsSource google.cloud.discoveryengine.v1.GcsSource it is the key of
 *  the JSON field. For instance, `my_id` for JSON `{"my_id": "some_uuid"}`.
 *  For others, it may be the column name of the table where the unique ids are
 *  stored.
 *  The values of the JSON field or the table column are used as the
 *  Document.id google.cloud.discoveryengine.v1.Document.id s. The JSON field
 *  or the table column must be of string type, and the values must be set as
 *  valid strings conform to RFC-1034 (https://tools.ietf.org/html/rfc1034)
 *  with 1-63 characters. Otherwise, documents without valid IDs fail to be
 *  imported.
 *  Only set this field when
 *  auto_generate_ids google.cloud.discoveryengine.v1.ImportDocumentsRequest.auto_generate_ids
 *  is unset or set as `false`. Otherwise, an INVALID_ARGUMENT error is thrown.
 *  If it is unset, a default value `_id` is used when importing from the
 *  allowed data sources.
 *  Supported data sources:
 *  * GcsSource google.cloud.discoveryengine.v1.GcsSource.
 *    GcsSource.data_schema google.cloud.discoveryengine.v1.GcsSource.data_schema
 *    must be `custom` or `csv`. Otherwise, an INVALID_ARGUMENT error is thrown.
 *  * BigQuerySource google.cloud.discoveryengine.v1.BigQuerySource.
 *    BigQuerySource.data_schema google.cloud.discoveryengine.v1.BigQuerySource.data_schema
 *    must be `custom` or `csv`. Otherwise, an INVALID_ARGUMENT error is thrown.
 *  * SpannerSource google.cloud.discoveryengine.v1.SpannerSource.
 *  * CloudSqlSource google.cloud.discoveryengine.v1.CloudSqlSource.
 *  * FirestoreSource google.cloud.discoveryengine.v1.FirestoreSource.
 *  * BigtableSource google.cloud.discoveryengine.v1.BigtableSource.
 */
// const idField = 'abc123'

// Imports the Discoveryengine library
const {DocumentServiceClient} = require('@google-cloud/discoveryengine').v1;

// Instantiates a client
const discoveryengineClient = new DocumentServiceClient();

async function callImportDocuments() {
  // Construct request
  const request = {
    parent,
  };

  // Run request
  const [operation] = await discoveryengineClient.importDocuments(request);
  const [response] = await operation.promise();
  console.log(response);
}

callImportDocuments();
from google.api_core.client_options import ClientOptions
from google.cloud import discoveryengine

# TODO(developer): Uncomment these variables before running the sample.
# project_id = "YOUR_PROJECT_ID"
# location = "YOUR_LOCATION"  # Values: "global"
# data_store_id = "YOUR_DATA_STORE_ID"


def create_data_store_sample(
    project_id: str,
    location: str,
    data_store_id: str,
) -> str:
    # For more information, refer to:
    # https://cloud.google.com/generative-ai-app-builder/docs/locations#specify_a_multi-region_for_your_data_store
    client_options = (
        ClientOptions(api_endpoint=f"{location}-discoveryengine.googleapis.com")
        if location != "global"
        else None
    )

    # Create a client
    client = discoveryengine.DataStoreServiceClient(client_options=client_options)

    # The full resource name of the collection
    # e.g. projects/{project}/locations/{location}/collections/default_collection
    parent = client.collection_path(
        project=project_id,
        location=location,
        collection="default_collection",
    )

    data_store = discoveryengine.DataStore(
        display_name="My Data Store",
        # Options: GENERIC, MEDIA, HEALTHCARE_FHIR
        industry_vertical=discoveryengine.IndustryVertical.GENERIC,
        # Options: SOLUTION_TYPE_RECOMMENDATION, SOLUTION_TYPE_SEARCH, SOLUTION_TYPE_CHAT, SOLUTION_TYPE_GENERATIVE_CHAT
        solution_types=[discoveryengine.SolutionType.SOLUTION_TYPE_SEARCH],
        # TODO(developer): Update content_config based on data store type.
        # Options: NO_CONTENT, CONTENT_REQUIRED, PUBLIC_WEBSITE
        content_config=discoveryengine.DataStore.ContentConfig.CONTENT_REQUIRED,
    )

    request = discoveryengine.CreateDataStoreRequest(
        parent=parent,
        data_store_id=data_store_id,
        data_store=data_store,
        # Optional: For Advanced Site Search Only
        # create_advanced_site_search=True,
    )

    # Make the request
    operation = client.create_data_store(request=request)

    print(f"Waiting for operation to complete: {operation.operation.name}")
    response = operation.result()

    # After the operation is complete,
    # get information from operation metadata
    metadata = discoveryengine.CreateDataStoreMetadata(operation.metadata)

    # Handle the response
    print(response)
    print(metadata)

    return operation.operation.name
Import documents
from google.api_core.client_options import ClientOptions
from google.cloud import discoveryengine

# TODO(developer): Uncomment these variables before running the sample.
# project_id = "YOUR_PROJECT_ID"
# location = "YOUR_LOCATION"  # Values: "global"
# data_store_id = "YOUR_DATA_STORE_ID"
# bigquery_dataset = "YOUR_BIGQUERY_DATASET"
# bigquery_table = "YOUR_BIGQUERY_TABLE"

# For more information, refer to:
# https://cloud.google.com/generative-ai-app-builder/docs/locations#specify_a_multi-region_for_your_data_store
client_options = (
    ClientOptions(api_endpoint=f"{location}-discoveryengine.googleapis.com")
    if location != "global"
    else None
)

# Create a client
client = discoveryengine.DocumentServiceClient(client_options=client_options)

# The full resource name of the search engine branch.
# e.g. projects/{project}/locations/{location}/dataStores/{data_store_id}/branches/{branch}
parent = client.branch_path(
    project=project_id,
    location=location,
    data_store=data_store_id,
    branch="default_branch",
)

request = discoveryengine.ImportDocumentsRequest(
    parent=parent,
    bigquery_source=discoveryengine.BigQuerySource(
        project_id=project_id,
        dataset_id=bigquery_dataset,
        table_id=bigquery_table,
        data_schema="custom",
    ),
    # Options: `FULL`, `INCREMENTAL`
    reconciliation_mode=discoveryengine.ImportDocumentsRequest.ReconciliationMode.INCREMENTAL,
)

# Make the request
operation = client.import_documents(request=request)

print(f"Waiting for operation to complete: {operation.operation.name}")
response = operation.result()

# After the operation is complete,
# get information from operation metadata
metadata = discoveryengine.ImportDocumentsMetadata(operation.metadata)

# Handle the response
print(response)
print(metadata)
require "google/cloud/discovery_engine/v1"

##
# Snippet for the create_data_store call in the DataStoreService service
#
# This snippet has been automatically generated and should be regarded as a code
# template only. It will require modifications to work:
# - It may require correct/in-range values for request initialization.
# - It may require specifying regional endpoints when creating the service
#   client as shown in https://cloud.google.com/ruby/docs/reference.
#
# This is an auto-generated example demonstrating basic usage of
# Google::Cloud::DiscoveryEngine::V1::DataStoreService::Client#create_data_store.
#
def create_data_store
  # Create a client object. The client can be reused for multiple calls.
  client = Google::Cloud::DiscoveryEngine::V1::DataStoreService::Client.new

  # Create a request. To set request fields, pass in keyword arguments.
  request = Google::Cloud::DiscoveryEngine::V1::CreateDataStoreRequest.new

  # Call the create_data_store method.
  result = client.create_data_store request

  # The returned object is of type Gapic::Operation. You can use it to
  # check the status of an operation, cancel it, or wait for results.
  # Here is how to wait for a response.
  result.wait_until_done! timeout: 60
  if result.response?
    p result.response
  else
    puts "No response received."
  end
end
Import documents
require "google/cloud/discovery_engine/v1"

##
# Snippet for the import_documents call in the DocumentService service
#
# This snippet has been automatically generated and should be regarded as a code
# template only. It will require modifications to work:
# - It may require correct/in-range values for request initialization.
# - It may require specifying regional endpoints when creating the service
#   client as shown in https://cloud.google.com/ruby/docs/reference.
#
# This is an auto-generated example demonstrating basic usage of
# Google::Cloud::DiscoveryEngine::V1::DocumentService::Client#import_documents.
#
def import_documents
  # Create a client object. The client can be reused for multiple calls.
  client = Google::Cloud::DiscoveryEngine::V1::DocumentService::Client.new

  # Create a request. To set request fields, pass in keyword arguments.
  request = Google::Cloud::DiscoveryEngine::V1::ImportDocumentsRequest.new

  # Call the import_documents method.
  result = client.import_documents request

  # The returned object is of type Gapic::Operation. You can use it to
  # check the status of an operation, cancel it, or wait for results.
  # Here is how to wait for a response.
  result.wait_until_done! timeout: 60
  if result.response?
    p result.response
  else
    puts "No response received."
  end
end
The following procedure describes how to create a data connector that associates a BigQuery dataset with Vertex AI Search, and how to specify a table in the dataset for each data store that you want to create. Data stores that are children of data connectors are called entity data stores.
Data from the dataset is synced periodically to the entity data stores. You can
specify synchronization daily, every three days, or every five days.
Console
To use the Google Cloud console to create a connector that periodically syncs data
from a BigQuery dataset to Vertex AI Search, follow these
steps:
In the Google Cloud console, go to the Agent Builder page.
Select the Sync frequency, that is, how often you want the Vertex AI Search connector to sync with the BigQuery dataset. You can change the frequency later.
In the BigQuery dataset path field, click Browse, and then select the dataset that contains the tables that you have prepared for ingesting. Alternatively, enter the dataset location directly in the BigQuery path field. The format for the path is projectname.datasetname.
In the Tables to sync field, click Browse, and then select a table that contains the data that you want for your data store.

If there are additional tables in the dataset that you want to use for data stores, click Add table and specify those tables too.
Click Continue.
Choose a region for your data store, enter a name for your data connector,
and clickCreate.
You have now created a data connector that periodically syncs data with the BigQuery dataset, and you have created one or more entity data stores. The data stores have the same names as the BigQuery tables.
To check the status of your ingestion, go to the Data Stores page and click your data connector name to see details about it on its Data page > Data ingestion activity tab. When the status column on the Activity tab changes from In progress to succeeded, the first ingestion is complete.
Depending on the size of your data, ingestion can take several
minutes to several hours.
After you set up your data source and import data the first time, the data store
syncs data from that source at a frequency that you select during setup.
About an hour after the data connector is created, the first sync occurs.
The next sync then occurs around 24 hours, 72 hours,
or 120 hours later.
Next steps
To attach your data store to an app, create an app and select your data store
following the steps in Create a search app.
To preview how your search results appear after your app and data store are
set up, see Get search results.
Import from Cloud Storage
You can create data stores from Cloud Storage data in two ways:
One-time ingestion: You import data from a Cloud Storage folder or file
into a data store. The data in the data store doesn't change unless you
manually refresh the data.
Periodic ingestion: You import data from a Cloud Storage folder or
file, and you set a sync frequency that determines how often the data
store is updated with the most recent data from that Cloud Storage
location.
The following comparison describes the two ways that you can import Cloud Storage data into Vertex AI Search data stores.

Availability: One-time ingestion is generally available (GA). Periodic ingestion is in Public preview.

Data refresh: With one-time ingestion, data must be refreshed manually. With periodic ingestion, data updates automatically every one, three, or five days; data cannot be manually refreshed.

Data stores created: With one-time ingestion, Vertex AI Search creates a single data store from one folder or file in Cloud Storage. With periodic ingestion, Vertex AI Search creates a data connector and associates a data store (called an entity data store) with it for the file or folder that is specified. Each Cloud Storage data connector can have a single entity data store.

Combining data: With one-time ingestion, data from multiple files, folders, and buckets can be combined in one data store by first ingesting data from one Cloud Storage location and then more data from another location. With periodic ingestion, because manual data import is not supported, the data in an entity data store can only be sourced from one Cloud Storage file or folder.
Optional: If you selected unstructured documents, you can select parsing and chunking options for your documents. To compare parsers, see Parse documents. For information about chunking, see Chunk documents for RAG.
To select a parser, expand Document processing options and specify the parser options that you want to use.

Click Create.
To check the status of your ingestion, go to the Data Stores page and click your data store name to see details about it on its Data page. When the status column on the Activity tab changes from In progress to Import completed, the ingestion is complete.
Depending on the size of your data, ingestion can take several
minutes or several hours.
REST
To use the command line to create a data store and ingest data from
Cloud Storage, follow these steps.
DATA_STORE_ID: the ID of the Vertex AI Search data store that you want to create. This ID can contain only lowercase
letters, digits, underscores, and hyphens.
DATA_STORE_DISPLAY_NAME: the display name of the Vertex AI
Search data store that you want to create.
Optional: To configure document parsing or to turn on document chunking for RAG, specify the documentProcessingConfig object and include it in your data store creation request. Configuring an OCR parser for PDFs is recommended if you're ingesting scanned PDFs. For how to configure parsing or chunking options, see Parse and chunk documents.
DATA_STORE_ID: the ID of the Vertex AI Search data store.
INPUT_FILE_PATTERN: A file pattern in Cloud Storage
containing your documents.
For structured data or for unstructured data with metadata, an example of the input file pattern is gs://<your-gcs-bucket>/directory/object.json, and an example of a pattern matching one or more files is gs://<your-gcs-bucket>/directory/*.json.

For unstructured documents, an example is gs://<your-gcs-bucket>/directory/*.pdf. Each file that is matched by the pattern becomes a document.
If <your-gcs-bucket> is not under PROJECT_ID, you need to give the service account service-<project number>@gcp-sa-discoveryengine.iam.gserviceaccount.com the "Storage Object Viewer" permission for the Cloud Storage bucket. For example, if you are importing a Cloud Storage bucket from source project "123" to destination project "456", give service-456@gcp-sa-discoveryengine.iam.gserviceaccount.com permissions on the Cloud Storage bucket under project "123".
DATA_SCHEMA: Optional. Values are document, custom, csv, and content. The default is document.

document: Upload unstructured data with metadata for unstructured documents. Each line of the file has to follow one of the following formats. You can define the ID of each document:

custom: Upload JSON for structured documents. The data is organized according to a schema. You can specify the schema; otherwise it is auto-detected. You can put the JSON string of the document in a consistent format directly in each line, and Vertex AI Search automatically generates the IDs for each document imported.

content: Upload unstructured documents (PDF, HTML, DOC, TXT, PPTX). The ID of each document is automatically generated as the first 128 bits of SHA256(GCS_URI) encoded as a hex string. You can specify multiple input file patterns as long as the matched files don't exceed the 100K files limit.

csv: Include a header row in your CSV file, with each header mapped to a document field. Specify the path to the CSV file using the inputUris field.
ERROR_DIRECTORY: Optional. A Cloud Storage directory
for error information about the import—for example, gs://<your-gcs-bucket>/directory/import_errors. Google recommends
leaving this field empty to let Vertex AI Search
automatically create a temporary directory.
RECONCILIATION_MODE: Optional. Values are FULL and INCREMENTAL. Default is INCREMENTAL. Specifying INCREMENTAL causes an incremental refresh of data from Cloud Storage to your data store. This does an upsert operation, which adds new documents and replaces existing documents with updated documents that have the same ID. Specifying FULL causes a full rebase of the documents in your data store. In other words, new and updated documents are added to your data store, and documents that are not in Cloud Storage are removed from your data store. The FULL mode is helpful if you want to automatically delete documents that you no longer need. A minimal client-library sketch showing these import options follows this parameter list.
AUTO_GENERATE_IDS: Optional. Specifies whether to automatically generate document IDs. If set to true, document IDs are generated based on a hash of the payload. Note that generated document IDs might not remain consistent over multiple imports. If you auto-generate IDs over multiple imports, Google highly recommends setting reconciliationMode to FULL to maintain consistent document IDs.

Specify autoGenerateIds only when gcsSource.dataSchema is set to custom or csv. Otherwise an INVALID_ARGUMENT error is returned. If you don't specify autoGenerateIds or set it to false, you must specify idField. Otherwise the documents fail to import.
ID_FIELD: Optional. Specifies which fields are the document IDs. For Cloud Storage source documents, idField specifies the name of the JSON field that contains the document IDs. For example, if {"my_id": "some_uuid"} is the document ID field in one of your documents, specify "idField": "my_id". This identifies all JSON fields with the name "my_id" as document IDs.

Specify this field only when: (1) gcsSource.dataSchema is set to custom or csv, and (2) auto_generate_ids is set to false or is unspecified. Otherwise an INVALID_ARGUMENT error is returned.

Note that the value of the Cloud Storage JSON field must be of string type, must be between 1 and 63 characters, and must conform to RFC-1034. Otherwise, the documents fail to import.

Note that the JSON field name specified by id_field must be of string type, must be between 1 and 63 characters, and must conform to RFC-1034. Otherwise, the documents fail to import.
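As with the BigQuery import, these parameters map onto fields of the import request in the client libraries. The following Python sketch is a rough illustration only: it ingests unstructured PDFs matched by a Cloud Storage pattern using the content data schema, with an incremental refresh. The bucket path and the variable values are placeholders.

from google.cloud import discoveryengine

# Placeholders (assumptions for illustration) -- replace with your own values:
# project_id, location, data_store_id

client = discoveryengine.DocumentServiceClient()

# The default branch of the data store receives the imported documents.
parent = client.branch_path(
    project=project_id,
    location=location,
    data_store=data_store_id,
    branch="default_branch",
)

request = discoveryengine.ImportDocumentsRequest(
    parent=parent,
    gcs_source=discoveryengine.GcsSource(
        # One or more Cloud Storage patterns; each matched PDF becomes a document.
        input_uris=["gs://your-gcs-bucket/directory/*.pdf"],  # placeholder bucket
        # "content" ingests unstructured documents (PDF, HTML, DOC, TXT, PPTX).
        data_schema="content",
    ),
    # INCREMENTAL upserts documents without removing anything already indexed.
    reconciliation_mode=discoveryengine.ImportDocumentsRequest.ReconciliationMode.INCREMENTAL,
)

operation = client.import_documents(request=request)
print(operation.result())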
using Google.Cloud.DiscoveryEngine.V1;
using Google.LongRunning;

public sealed partial class GeneratedDataStoreServiceClientSnippets
{
    /// <summary>Snippet for CreateDataStore</summary>
    /// <remarks>
    /// This snippet has been automatically generated and should be regarded as a code template only.
    /// It will require modifications to work:
    /// - It may require correct/in-range values for request initialization.
    /// - It may require specifying regional endpoints when creating the service client as shown in
    ///   https://cloud.google.com/dotnet/docs/reference/help/client-configuration#endpoint.
    /// </remarks>
    public void CreateDataStoreRequestObject()
    {
        // Create client
        DataStoreServiceClient dataStoreServiceClient = DataStoreServiceClient.Create();
        // Initialize request argument(s)
        CreateDataStoreRequest request = new CreateDataStoreRequest
        {
            ParentAsCollectionName = CollectionName.FromProjectLocationCollection("[PROJECT]", "[LOCATION]", "[COLLECTION]"),
            DataStore = new DataStore(),
            DataStoreId = "",
            CreateAdvancedSiteSearch = false,
            SkipDefaultSchemaCreation = false,
        };
        // Make the request
        Operation<DataStore, CreateDataStoreMetadata> response = dataStoreServiceClient.CreateDataStore(request);

        // Poll until the returned long-running operation is complete
        Operation<DataStore, CreateDataStoreMetadata> completedResponse = response.PollUntilCompleted();
        // Retrieve the operation result
        DataStore result = completedResponse.Result;

        // Or get the name of the operation
        string operationName = response.Name;
        // This name can be stored, then the long-running operation retrieved later by name
        Operation<DataStore, CreateDataStoreMetadata> retrievedResponse = dataStoreServiceClient.PollOnceCreateDataStore(operationName);
        // Check if the retrieved long-running operation has completed
        if (retrievedResponse.IsCompleted)
        {
            // If it has completed, then access the result
            DataStore retrievedResult = retrievedResponse.Result;
        }
    }
}
Import documents
using Google.Cloud.DiscoveryEngine.V1;
using Google.LongRunning;
using Google.Protobuf.WellKnownTypes;

public sealed partial class GeneratedDocumentServiceClientSnippets
{
    /// <summary>Snippet for ImportDocuments</summary>
    /// <remarks>
    /// This snippet has been automatically generated and should be regarded as a code template only.
    /// It will require modifications to work:
    /// - It may require correct/in-range values for request initialization.
    /// - It may require specifying regional endpoints when creating the service client as shown in
    ///   https://cloud.google.com/dotnet/docs/reference/help/client-configuration#endpoint.
    /// </remarks>
    public void ImportDocumentsRequestObject()
    {
        // Create client
        DocumentServiceClient documentServiceClient = DocumentServiceClient.Create();
        // Initialize request argument(s)
        ImportDocumentsRequest request = new ImportDocumentsRequest
        {
            ParentAsBranchName = BranchName.FromProjectLocationDataStoreBranch("[PROJECT]", "[LOCATION]", "[DATA_STORE]", "[BRANCH]"),
            InlineSource = new ImportDocumentsRequest.Types.InlineSource(),
            ErrorConfig = new ImportErrorConfig(),
            ReconciliationMode = ImportDocumentsRequest.Types.ReconciliationMode.Unspecified,
            UpdateMask = new FieldMask(),
            AutoGenerateIds = false,
            IdField = "",
        };
        // Make the request
        Operation<ImportDocumentsResponse, ImportDocumentsMetadata> response = documentServiceClient.ImportDocuments(request);

        // Poll until the returned long-running operation is complete
        Operation<ImportDocumentsResponse, ImportDocumentsMetadata> completedResponse = response.PollUntilCompleted();
        // Retrieve the operation result
        ImportDocumentsResponse result = completedResponse.Result;

        // Or get the name of the operation
        string operationName = response.Name;
        // This name can be stored, then the long-running operation retrieved later by name
        Operation<ImportDocumentsResponse, ImportDocumentsMetadata> retrievedResponse = documentServiceClient.PollOnceImportDocuments(operationName);
        // Check if the retrieved long-running operation has completed
        if (retrievedResponse.IsCompleted)
        {
            // If it has completed, then access the result
            ImportDocumentsResponse retrievedResult = retrievedResponse.Result;
        }
    }
}
package main

import (
	"context"

	discoveryengine "cloud.google.com/go/discoveryengine/apiv1"
	discoveryenginepb "cloud.google.com/go/discoveryengine/apiv1/discoveryenginepb"
)

func main() {
	ctx := context.Background()
	// This snippet has been automatically generated and should be regarded as a code template only.
	// It will require modifications to work:
	// - It may require correct/in-range values for request initialization.
	// - It may require specifying regional endpoints when creating the service client as shown in:
	//   https://pkg.go.dev/cloud.google.com/go#hdr-Client_Options
	c, err := discoveryengine.NewDataStoreClient(ctx)
	if err != nil {
		// TODO: Handle error.
	}
	defer c.Close()

	req := &discoveryenginepb.CreateDataStoreRequest{
		// TODO: Fill request struct fields.
		// See https://pkg.go.dev/cloud.google.com/go/discoveryengine/apiv1/discoveryenginepb#CreateDataStoreRequest.
	}
	op, err := c.CreateDataStore(ctx, req)
	if err != nil {
		// TODO: Handle error.
	}

	resp, err := op.Wait(ctx)
	if err != nil {
		// TODO: Handle error.
	}
	// TODO: Use resp.
	_ = resp
}
Import documents
package main

import (
	"context"

	discoveryengine "cloud.google.com/go/discoveryengine/apiv1"
	discoveryenginepb "cloud.google.com/go/discoveryengine/apiv1/discoveryenginepb"
)

func main() {
	ctx := context.Background()
	// This snippet has been automatically generated and should be regarded as a code template only.
	// It will require modifications to work:
	// - It may require correct/in-range values for request initialization.
	// - It may require specifying regional endpoints when creating the service client as shown in:
	//   https://pkg.go.dev/cloud.google.com/go#hdr-Client_Options
	c, err := discoveryengine.NewDocumentClient(ctx)
	if err != nil {
		// TODO: Handle error.
	}
	defer c.Close()

	req := &discoveryenginepb.ImportDocumentsRequest{
		// TODO: Fill request struct fields.
		// See https://pkg.go.dev/cloud.google.com/go/discoveryengine/apiv1/discoveryenginepb#ImportDocumentsRequest.
	}
	op, err := c.ImportDocuments(ctx, req)
	if err != nil {
		// TODO: Handle error.
	}

	resp, err := op.Wait(ctx)
	if err != nil {
		// TODO: Handle error.
	}
	// TODO: Use resp.
	_ = resp
}
import com.google.cloud.discoveryengine.v1.CollectionName;
import com.google.cloud.discoveryengine.v1.CreateDataStoreRequest;
import com.google.cloud.discoveryengine.v1.DataStore;
import com.google.cloud.discoveryengine.v1.DataStoreServiceClient;

public class SyncCreateDataStore {

  public static void main(String[] args) throws Exception {
    syncCreateDataStore();
  }

  public static void syncCreateDataStore() throws Exception {
    // This snippet has been automatically generated and should be regarded as a code template only.
    // It will require modifications to work:
    // - It may require correct/in-range values for request initialization.
    // - It may require specifying regional endpoints when creating the service client as shown in
    //   https://cloud.google.com/java/docs/setup#configure_endpoints_for_the_client_library
    try (DataStoreServiceClient dataStoreServiceClient = DataStoreServiceClient.create()) {
      CreateDataStoreRequest request =
          CreateDataStoreRequest.newBuilder()
              .setParent(CollectionName.of("[PROJECT]", "[LOCATION]", "[COLLECTION]").toString())
              .setDataStore(DataStore.newBuilder().build())
              .setDataStoreId("dataStoreId929489618")
              .setCreateAdvancedSiteSearch(true)
              .setSkipDefaultSchemaCreation(true)
              .build();
      DataStore response = dataStoreServiceClient.createDataStoreAsync(request).get();
    }
  }
}
Import documents
import com.google.cloud.discoveryengine.v1.BranchName;
import com.google.cloud.discoveryengine.v1.DocumentServiceClient;
import com.google.cloud.discoveryengine.v1.ImportDocumentsRequest;
import com.google.cloud.discoveryengine.v1.ImportDocumentsResponse;
import com.google.cloud.discoveryengine.v1.ImportErrorConfig;
import com.google.protobuf.FieldMask;

public class SyncImportDocuments {

  public static void main(String[] args) throws Exception {
    syncImportDocuments();
  }

  public static void syncImportDocuments() throws Exception {
    // This snippet has been automatically generated and should be regarded as a code template only.
    // It will require modifications to work:
    // - It may require correct/in-range values for request initialization.
    // - It may require specifying regional endpoints when creating the service client as shown in
    //   https://cloud.google.com/java/docs/setup#configure_endpoints_for_the_client_library
    try (DocumentServiceClient documentServiceClient = DocumentServiceClient.create()) {
      ImportDocumentsRequest request =
          ImportDocumentsRequest.newBuilder()
              .setParent(
                  BranchName.ofProjectLocationDataStoreBranchName(
                          "[PROJECT]", "[LOCATION]", "[DATA_STORE]", "[BRANCH]")
                      .toString())
              .setErrorConfig(ImportErrorConfig.newBuilder().build())
              .setUpdateMask(FieldMask.newBuilder().build())
              .setAutoGenerateIds(true)
              .setIdField("idField1629396127")
              .build();
      ImportDocumentsResponse response = documentServiceClient.importDocumentsAsync(request).get();
    }
  }
}
/**
 * This snippet has been automatically generated and should be regarded as a code template only.
 * It will require modifications to work.
 * It may require correct/in-range values for request initialization.
 * TODO(developer): Uncomment these variables before running the sample.
 */
/**
 *  Required. The parent resource name, such as
 *  `projects/{project}/locations/{location}/collections/{collection}`.
 */
// const parent = 'abc123'
/**
 *  Required. The DataStore google.cloud.discoveryengine.v1.DataStore to
 *  create.
 */
// const dataStore = {}
/**
 *  Required. The ID to use for the
 *  DataStore google.cloud.discoveryengine.v1.DataStore, which will become
 *  the final component of the
 *  DataStore google.cloud.discoveryengine.v1.DataStore's resource name.
 *  This field must conform to RFC-1034 (https://tools.ietf.org/html/rfc1034)
 *  standard with a length limit of 63 characters. Otherwise, an
 *  INVALID_ARGUMENT error is returned.
 */
// const dataStoreId = 'abc123'
/**
 *  A boolean flag indicating whether user want to directly create an advanced
 *  data store for site search.
 *  If the data store is not configured as site
 *  search (GENERIC vertical and PUBLIC_WEBSITE content_config), this flag will
 *  be ignored.
 */
// const createAdvancedSiteSearch = true
/**
 *  A boolean flag indicating whether to skip the default schema creation for
 *  the data store. Only enable this flag if you are certain that the default
 *  schema is incompatible with your use case.
 *  If set to true, you must manually create a schema for the data store before
 *  any documents can be ingested.
 *  This flag cannot be specified if `data_store.starting_schema` is specified.
 */
// const skipDefaultSchemaCreation = true

// Imports the Discoveryengine library
const {DataStoreServiceClient} = require('@google-cloud/discoveryengine').v1;

// Instantiates a client
const discoveryengineClient = new DataStoreServiceClient();

async function callCreateDataStore() {
  // Construct request
  const request = {
    parent,
    dataStore,
    dataStoreId,
  };

  // Run request
  const [operation] = await discoveryengineClient.createDataStore(request);
  const [response] = await operation.promise();
  console.log(response);
}

callCreateDataStore();
Import documents
/**
 * This snippet has been automatically generated and should be regarded as a code template only.
 * It will require modifications to work.
 * It may require correct/in-range values for request initialization.
 * TODO(developer): Uncomment these variables before running the sample.
 */
/**
 *  The Inline source for the input content for documents.
 */
// const inlineSource = {}
/**
 *  Cloud Storage location for the input content.
 */
// const gcsSource = {}
/**
 *  BigQuery input source.
 */
// const bigquerySource = {}
/**
 *  FhirStore input source.
 */
// const fhirStoreSource = {}
/**
 *  Spanner input source.
 */
// const spannerSource = {}
/**
 *  Cloud SQL input source.
 */
// const cloudSqlSource = {}
/**
 *  Firestore input source.
 */
// const firestoreSource = {}
/**
 *  AlloyDB input source.
 */
// const alloyDbSource = {}
/**
 *  Cloud Bigtable input source.
 */
// const bigtableSource = {}
/**
 *  Required. The parent branch resource name, such as
 *  `projects/{project}/locations/{location}/collections/{collection}/dataStores/{data_store}/branches/{branch}`.
 *  Requires create/update permission.
 */
// const parent = 'abc123'
/**
 *  The desired location of errors incurred during the Import.
 */
// const errorConfig = {}
/**
 *  The mode of reconciliation between existing documents and the documents to
 *  be imported. Defaults to
 *  ReconciliationMode.INCREMENTAL google.cloud.discoveryengine.v1.ImportDocumentsRequest.ReconciliationMode.INCREMENTAL.
 */
// const reconciliationMode = {}
/**
 *  Indicates which fields in the provided imported documents to update. If
 *  not set, the default is to update all fields.
 */
// const updateMask = {}
/**
 *  Whether to automatically generate IDs for the documents if absent.
 *  If set to `true`, Document.id google.cloud.discoveryengine.v1.Document.id s are
 *  automatically generated based on the hash of the payload, where IDs may not
 *  be consistent during multiple imports. In which case
 *  ReconciliationMode.FULL google.cloud.discoveryengine.v1.ImportDocumentsRequest.ReconciliationMode.FULL
 *  is highly recommended to avoid duplicate contents. If unset or set to
 *  `false`, Document.id google.cloud.discoveryengine.v1.Document.id s have
 *  to be specified using
 *  id_field google.cloud.discoveryengine.v1.ImportDocumentsRequest.id_field,
 *  otherwise, documents without IDs fail to be imported.
 *  Supported data sources:
 *  * GcsSource google.cloud.discoveryengine.v1.GcsSource.
 *    GcsSource.data_schema google.cloud.discoveryengine.v1.GcsSource.data_schema
 *    must be `custom` or `csv`. Otherwise, an INVALID_ARGUMENT error is thrown.
 *  * BigQuerySource google.cloud.discoveryengine.v1.BigQuerySource.
 *    BigQuerySource.data_schema google.cloud.discoveryengine.v1.BigQuerySource.data_schema
 *    must be `custom` or `csv`. Otherwise, an INVALID_ARGUMENT error is thrown.
 *  * SpannerSource google.cloud.discoveryengine.v1.SpannerSource.
 *  * CloudSqlSource google.cloud.discoveryengine.v1.CloudSqlSource.
 *  * FirestoreSource google.cloud.discoveryengine.v1.FirestoreSource.
 *  * BigtableSource google.cloud.discoveryengine.v1.BigtableSource.
 */
// const autoGenerateIds = true
/**
 *  The field indicates the ID field or column to be used as unique IDs of
 *  the documents.
 *  For GcsSource google.cloud.discoveryengine.v1.GcsSource it is the key of
 *  the JSON field. For instance, `my_id` for JSON `{"my_id": "some_uuid"}`.
 *  For others, it may be the column name of the table where the unique ids are
 *  stored.
 *  The values of the JSON field or the table column are used as the
 *  Document.id google.cloud.discoveryengine.v1.Document.id s. The JSON field
 *  or the table column must be of string type, and the values must be set as
 *  valid strings conform to RFC-1034 (https://tools.ietf.org/html/rfc1034)
 *  with 1-63 characters. Otherwise, documents without valid IDs fail to be
 *  imported.
 *  Only set this field when
 *  auto_generate_ids google.cloud.discoveryengine.v1.ImportDocumentsRequest.auto_generate_ids
 *  is unset or set as `false`. Otherwise, an INVALID_ARGUMENT error is thrown.
 *  If it is unset, a default value `_id` is used when importing from the
 *  allowed data sources.
 *  Supported data sources:
 *  * GcsSource google.cloud.discoveryengine.v1.GcsSource.
 *    GcsSource.data_schema google.cloud.discoveryengine.v1.GcsSource.data_schema
 *    must be `custom` or `csv`. Otherwise, an INVALID_ARGUMENT error is thrown.
 *  * BigQuerySource google.cloud.discoveryengine.v1.BigQuerySource.
 *    BigQuerySource.data_schema google.cloud.discoveryengine.v1.BigQuerySource.data_schema
 *    must be `custom` or `csv`. Otherwise, an INVALID_ARGUMENT error is thrown.
 *  * SpannerSource google.cloud.discoveryengine.v1.SpannerSource.
 *  * CloudSqlSource google.cloud.discoveryengine.v1.CloudSqlSource.
 *  * FirestoreSource google.cloud.discoveryengine.v1.FirestoreSource.
 *  * BigtableSource google.cloud.discoveryengine.v1.BigtableSource.
 */
// const idField = 'abc123'

// Imports the Discoveryengine library
const {DocumentServiceClient} = require('@google-cloud/discoveryengine').v1;

// Instantiates a client
const discoveryengineClient = new DocumentServiceClient();

async function callImportDocuments() {
  // Construct request
  const request = {
    parent,
  };

  // Run request
  const [operation] = await discoveryengineClient.importDocuments(request);
  const [response] = await operation.promise();
  console.log(response);
}

callImportDocuments();
from google.api_core.client_options import ClientOptions
from google.cloud import discoveryengine

# TODO(developer): Uncomment these variables before running the sample.
# project_id = "YOUR_PROJECT_ID"
# location = "YOUR_LOCATION"  # Values: "global"
# data_store_id = "YOUR_DATA_STORE_ID"


def create_data_store_sample(
    project_id: str,
    location: str,
    data_store_id: str,
) -> str:
    # For more information, refer to:
    # https://cloud.google.com/generative-ai-app-builder/docs/locations#specify_a_multi-region_for_your_data_store
    client_options = (
        ClientOptions(api_endpoint=f"{location}-discoveryengine.googleapis.com")
        if location != "global"
        else None
    )

    # Create a client
    client = discoveryengine.DataStoreServiceClient(client_options=client_options)

    # The full resource name of the collection
    # e.g. projects/{project}/locations/{location}/collections/default_collection
    parent = client.collection_path(
        project=project_id,
        location=location,
        collection="default_collection",
    )

    data_store = discoveryengine.DataStore(
        display_name="My Data Store",
        # Options: GENERIC, MEDIA, HEALTHCARE_FHIR
        industry_vertical=discoveryengine.IndustryVertical.GENERIC,
        # Options: SOLUTION_TYPE_RECOMMENDATION, SOLUTION_TYPE_SEARCH, SOLUTION_TYPE_CHAT, SOLUTION_TYPE_GENERATIVE_CHAT
        solution_types=[discoveryengine.SolutionType.SOLUTION_TYPE_SEARCH],
        # TODO(developer): Update content_config based on data store type.
        # Options: NO_CONTENT, CONTENT_REQUIRED, PUBLIC_WEBSITE
        content_config=discoveryengine.DataStore.ContentConfig.CONTENT_REQUIRED,
    )

    request = discoveryengine.CreateDataStoreRequest(
        parent=parent,
        data_store_id=data_store_id,
        data_store=data_store,
        # Optional: For Advanced Site Search Only
        # create_advanced_site_search=True,
    )

    # Make the request
    operation = client.create_data_store(request=request)

    print(f"Waiting for operation to complete: {operation.operation.name}")
    response = operation.result()

    # After the operation is complete,
    # get information from operation metadata
    metadata = discoveryengine.CreateDataStoreMetadata(operation.metadata)

    # Handle the response
    print(response)
    print(metadata)

    return operation.operation.name
Import documents
from google.api_core.client_options import ClientOptions
from google.cloud import discoveryengine

# TODO(developer): Uncomment these variables before running the sample.
# project_id = "YOUR_PROJECT_ID"
# location = "YOUR_LOCATION"  # Values: "global"
# data_store_id = "YOUR_DATA_STORE_ID"

# Examples:
# - Unstructured documents
#   - `gs://bucket/directory/file.pdf`
#   - `gs://bucket/directory/*.pdf`
# - Unstructured documents with JSONL Metadata
#   - `gs://bucket/directory/file.json`
# - Unstructured documents with CSV Metadata
#   - `gs://bucket/directory/file.csv`
# gcs_uri = "YOUR_GCS_PATH"

# For more information, refer to:
# https://cloud.google.com/generative-ai-app-builder/docs/locations#specify_a_multi-region_for_your_data_store
client_options = (
    ClientOptions(api_endpoint=f"{location}-discoveryengine.googleapis.com")
    if location != "global"
    else None
)

# Create a client
client = discoveryengine.DocumentServiceClient(client_options=client_options)

# The full resource name of the search engine branch.
# e.g. projects/{project}/locations/{location}/dataStores/{data_store_id}/branches/{branch}
parent = client.branch_path(
    project=project_id,
    location=location,
    data_store=data_store_id,
    branch="default_branch",
)

request = discoveryengine.ImportDocumentsRequest(
    parent=parent,
    gcs_source=discoveryengine.GcsSource(
        # Multiple URIs are supported
        input_uris=[gcs_uri],
        # Options:
        # - `content` - Unstructured documents (PDF, HTML, DOC, TXT, PPTX)
        # - `custom` - Unstructured documents with custom JSONL metadata
        # - `document` - Structured documents in the discoveryengine.Document format.
        # - `csv` - Unstructured documents with CSV metadata
        data_schema="content",
    ),
    # Options: `FULL`, `INCREMENTAL`
    reconciliation_mode=discoveryengine.ImportDocumentsRequest.ReconciliationMode.INCREMENTAL,
)

# Make the request
operation = client.import_documents(request=request)

print(f"Waiting for operation to complete: {operation.operation.name}")
response = operation.result()

# After the operation is complete,
# get information from operation metadata
metadata = discoveryengine.ImportDocumentsMetadata(operation.metadata)

# Handle the response
print(response)
print(metadata)
require"google/cloud/discovery_engine/v1"### Snippet for the create_data_store call in the DataStoreService service## This snippet has been automatically generated and should be regarded as a code# template only. It will require modifications to work:# - It may require correct/in-range values for request initialization.# - It may require specifying regional endpoints when creating the service# client as shown in https://cloud.google.com/ruby/docs/reference.## This is an auto-generated example demonstrating basic usage of# Google::Cloud::DiscoveryEngine::V1::DataStoreService::Client#create_data_store.#defcreate_data_store# Create a client object. The client can be reused for multiple calls.client=Google::Cloud::DiscoveryEngine::V1::DataStoreService::Client.new# Create a request. To set request fields, pass in keyword arguments.request=Google::Cloud::DiscoveryEngine::V1::CreateDataStoreRequest.new# Call the create_data_store method.result=client.create_data_storerequest# The returned object is of type Gapic::Operation. You can use it to# check the status of an operation, cancel it, or wait for results.# Here is how to wait for a response.result.wait_until_done!timeout:60ifresult.response?presult.responseelseputs"No response received."endend
Import documents
require"google/cloud/discovery_engine/v1"### Snippet for the import_documents call in the DocumentService service## This snippet has been automatically generated and should be regarded as a code# template only. It will require modifications to work:# - It may require correct/in-range values for request initialization.# - It may require specifying regional endpoints when creating the service# client as shown in https://cloud.google.com/ruby/docs/reference.## This is an auto-generated example demonstrating basic usage of# Google::Cloud::DiscoveryEngine::V1::DocumentService::Client#import_documents.#defimport_documents# Create a client object. The client can be reused for multiple calls.client=Google::Cloud::DiscoveryEngine::V1::DocumentService::Client.new# Create a request. To set request fields, pass in keyword arguments.request=Google::Cloud::DiscoveryEngine::V1::ImportDocumentsRequest.new# Call the import_documents method.result=client.import_documentsrequest# The returned object is of type Gapic::Operation. You can use it to# check the status of an operation, cancel it, or wait for results.# Here is how to wait for a response.result.wait_until_done!timeout:60ifresult.response?presult.responseelseputs"No response received."endend
The following procedure describes how to create a data connector that associates
a Cloud Storage location with Vertex AI Search, and how to specify a folder or
file in that location for the data store that you want to create. Data stores
that are children of data connectors are called entity data stores.
Data is synced periodically to the entity data store. You can set the
synchronization frequency to daily, every three days, or every five days.
Console
In the Google Cloud console, go to the Agent Builder page.
Select the Synchronization frequency, which determines how often you want the
Vertex AI Search connector to sync with the Cloud Storage
location. You can change the frequency later.
In the Select a folder or file you want to import section, select Folder or File.
Click Browse, choose the data you have prepared for ingesting, and then click Select.
Alternatively, enter the location directly in the gs:// field.
Click Continue.
Choose a region for your data connector.
Enter a name for your data connector.
Optional: If you selected unstructured documents, you can select parsing and
chunking options for your documents. To compare parsers, see Parse
documents. For information about chunking, see Chunk documents for
RAG.
To select a parser, expand Document processing options and specify the
parser options that you want to use.
Click Create.
You have now created a data connector, which will periodically sync data
with the Cloud Storage location. You have also created an entity
data store, which is named gcs_store.
To check the status of your ingestion, go to the Data Stores page and
click your data connector name to see details about it on its Data page,
on the Data ingestion activity tab. When the status column on the Data
ingestion activity tab changes from In progress to Succeeded, the
first ingestion is complete.
Depending on the size of your data, ingestion can take several
minutes to several hours.
After you set up your data source and import data the first time, data is
synced from that source at the frequency that you selected during setup.
About an hour after the data connector is created, the first sync occurs.
The next sync then occurs around 24 hours, 72 hours,
or 120 hours later.
Next steps
To attach your data store to an app, create an app and select your data store
following the steps in Create a search app.
To preview how your search results appear after your app and data store are
set up, see Get search results.
Sync from Google Drive
To sync data from Google Drive, use the following steps to create
a data store and ingest data using the Google Cloud console.
Data from Google Drive continuously syncs to Vertex AI Search after
you create your data store.
Before you begin:
You must be signed in to the Google Cloud console with the
same account that you use for the Google Drive instance that you plan to
connect. Vertex AI Search uses your Google Workspace customer ID
to connect to Google Drive.
Set up access control for Google Drive. For information
about setting up access control, see Use data source access control.
Console
To use the console to make Google Drive data searchable, follow these
steps:
In the Google Cloud console, go to the Agent Builder page.
Click Create. Depending on the size of your data, ingestion can take
several minutes to several hours. Wait at least an hour before using your
data store for searching.
Next steps
To attach your data store to an app, create an app and select your data store
following the steps in Create a search app.
To preview how your search results appear after your app and data store are
set up, see Get search results.
Import from Cloud SQL
To ingest data from Cloud SQL, use the following steps to set up
Cloud SQL access, create a data store, and ingest data.
Set up staging bucket access for Cloud SQL instances
When ingesting data from Cloud SQL, data is first staged to a
Cloud Storage bucket. Follow these steps to give a Cloud SQL
instance access to
Cloud Storage buckets.
Click the Cloud SQL instance that you plan to import from.
Copy the identifier for the instance's service account, which looks like an
email address, for example, p9876-abcd33f@gcp-sa-cloud-sql.iam.gserviceaccount.com.
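For example, the following gcloud command is one way to give that service account
write access to a staging bucket. The bucket name and the Storage Object Admin
role shown here are illustrative assumptions; adjust them for your setup.

# Grant the Cloud SQL service account permission to write export files
# to the intermediate Cloud Storage bucket (names are placeholders).
gcloud storage buckets add-iam-policy-binding gs://YOUR_STAGING_BUCKET \
  --member="serviceAccount:p9876-abcd33f@gcp-sa-cloud-sql.iam.gserviceaccount.com" \
  --role="roles/storage.objectAdmin"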
To give Vertex AI Search access to Cloud SQL data that's in a
different project, follow these steps:
Replace the following PROJECT_NUMBER variable with your
Vertex AI Search project number, and then copy the contents of the
code block. This is your Vertex AI Search service account
identifier:
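The identifier typically has the following form, based on the Discovery Engine
service agent naming convention; treat this as an assumption and confirm the
exact value on your project's IAM page.

service-PROJECT_NUMBER@gcp-sa-discoveryengine.iam.gserviceaccount.com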
Specify the project ID, instance ID, database ID, and table ID of the data
that you plan to import.
Click Browse and choose an intermediate Cloud Storage location to
export data to, and then click Select. Alternatively, enter the location
directly in the gs:// field.
Select whether to turn on serverless export. Serverless export incurs
additional cost. For information about serverless export, see Minimize the
performance impact of exports in
the Cloud SQL documentation.
Click Continue.
Choose a region for your data store.
Enter a name for your data store.
Click Create.
To check the status of your ingestion, go to the Data Stores page
and click your data store name to see details about it on its Data page.
When the status column on the Activity tab changes from In progress to Import completed, the ingestion is complete.
Depending on the size of your data, ingestion can take several
minutes or several hours.
REST
To use the command line to create a data store and ingest data from
Cloud SQL, follow these steps:
PROJECT_ID: The ID of your Vertex AI Search
project.
DATA_STORE_ID: The ID of the data store. The ID can
contain only lowercase letters, digits, underscores, and hyphens.
SQL_PROJECT_ID: The ID of your Cloud SQL
project.
INSTANCE_ID: The ID of your Cloud SQL instance.
DATABASE_ID: The ID of your Cloud SQL database.
TABLE_ID: The ID of your Cloud SQL table.
STAGING_DIRECTORY: Optional. A Cloud Storage
directory, for example, gs://<your-gcs-bucket>/directory/import_errors.
RECONCILIATION_MODE: Optional. Values are FULL and INCREMENTAL. Default is INCREMENTAL. Specifying INCREMENTAL causes an incremental refresh of data from Cloud SQL to your
data store. This does an upsert operation, which adds new documents and
replaces existing documents with updated documents with the same ID.
Specifying FULL causes a full rebase of the documents in your data
store. In other words, new and updated documents are added to your data
store, and documents that are not in Cloud SQL are removed
from your data store. The FULL mode is helpful if you want to
automatically delete documents that you no longer need.
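As a rough sketch, an import request that uses these values might look like the
following. The endpoint, the global location, and the cloudSqlSource field names
are assumptions based on the Discovery Engine v1 REST API; verify them against
the current API reference before use.

curl -X POST \
  -H "Authorization: Bearer $(gcloud auth print-access-token)" \
  -H "Content-Type: application/json" \
  "https://discoveryengine.googleapis.com/v1/projects/PROJECT_ID/locations/global/collections/default_collection/dataStores/DATA_STORE_ID/branches/0/documents:import" \
  -d '{
    "cloudSqlSource": {
      "projectId": "SQL_PROJECT_ID",
      "instanceId": "INSTANCE_ID",
      "databaseId": "DATABASE_ID",
      "tableId": "TABLE_ID",
      "gcsStagingDir": "STAGING_DIRECTORY"
    },
    "reconciliationMode": "RECONCILIATION_MODE"
  }'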
from google.api_core.client_options import ClientOptions
from google.cloud import discoveryengine

# TODO(developer): Uncomment these variables before running the sample.
# project_id = "YOUR_PROJECT_ID"
# location = "YOUR_LOCATION"  # Values: "global"
# data_store_id = "YOUR_DATA_STORE_ID"


def create_data_store_sample(
    project_id: str,
    location: str,
    data_store_id: str,
) -> str:
    # For more information, refer to:
    # https://cloud.google.com/generative-ai-app-builder/docs/locations#specify_a_multi-region_for_your_data_store
    client_options = (
        ClientOptions(api_endpoint=f"{location}-discoveryengine.googleapis.com")
        if location != "global"
        else None
    )

    # Create a client
    client = discoveryengine.DataStoreServiceClient(client_options=client_options)

    # The full resource name of the collection
    # e.g. projects/{project}/locations/{location}/collections/default_collection
    parent = client.collection_path(
        project=project_id,
        location=location,
        collection="default_collection",
    )

    data_store = discoveryengine.DataStore(
        display_name="My Data Store",
        # Options: GENERIC, MEDIA, HEALTHCARE_FHIR
        industry_vertical=discoveryengine.IndustryVertical.GENERIC,
        # Options: SOLUTION_TYPE_RECOMMENDATION, SOLUTION_TYPE_SEARCH, SOLUTION_TYPE_CHAT, SOLUTION_TYPE_GENERATIVE_CHAT
        solution_types=[discoveryengine.SolutionType.SOLUTION_TYPE_SEARCH],
        # TODO(developer): Update content_config based on data store type.
        # Options: NO_CONTENT, CONTENT_REQUIRED, PUBLIC_WEBSITE
        content_config=discoveryengine.DataStore.ContentConfig.CONTENT_REQUIRED,
    )

    request = discoveryengine.CreateDataStoreRequest(
        parent=parent,
        data_store_id=data_store_id,
        data_store=data_store,
        # Optional: For Advanced Site Search Only
        # create_advanced_site_search=True,
    )

    # Make the request
    operation = client.create_data_store(request=request)

    print(f"Waiting for operation to complete: {operation.operation.name}")
    response = operation.result()

    # After the operation is complete,
    # get information from operation metadata
    metadata = discoveryengine.CreateDataStoreMetadata(operation.metadata)

    # Handle the response
    print(response)
    print(metadata)

    return operation.operation.name
Import documents
from google.api_core.client_options import ClientOptions
from google.cloud import discoveryengine

# TODO(developer): Uncomment these variables before running the sample.
# project_id = "YOUR_PROJECT_ID"
# location = "YOUR_LOCATION"  # Values: "global"
# data_store_id = "YOUR_DATA_STORE_ID"
# sql_project_id = "YOUR_SQL_PROJECT_ID"
# sql_instance_id = "YOUR_SQL_INSTANCE_ID"
# sql_database_id = "YOUR_SQL_DATABASE_ID"
# sql_table_id = "YOUR_SQL_TABLE_ID"

# For more information, refer to:
# https://cloud.google.com/generative-ai-app-builder/docs/locations#specify_a_multi-region_for_your_data_store
client_options = (
    ClientOptions(api_endpoint=f"{location}-discoveryengine.googleapis.com")
    if location != "global"
    else None
)

# Create a client
client = discoveryengine.DocumentServiceClient(client_options=client_options)

# The full resource name of the search engine branch.
# e.g. projects/{project}/locations/{location}/dataStores/{data_store_id}/branches/{branch}
parent = client.branch_path(
    project=project_id,
    location=location,
    data_store=data_store_id,
    branch="default_branch",
)

request = discoveryengine.ImportDocumentsRequest(
    parent=parent,
    cloud_sql_source=discoveryengine.CloudSqlSource(
        project_id=sql_project_id,
        instance_id=sql_instance_id,
        database_id=sql_database_id,
        table_id=sql_table_id,
    ),
    # Options: `FULL`, `INCREMENTAL`
    reconciliation_mode=discoveryengine.ImportDocumentsRequest.ReconciliationMode.INCREMENTAL,
)

# Make the request
operation = client.import_documents(request=request)

print(f"Waiting for operation to complete: {operation.operation.name}")
response = operation.result()

# After the operation is complete,
# get information from operation metadata
metadata = discoveryengine.ImportDocumentsMetadata(operation.metadata)

# Handle the response
print(response)
print(metadata)
Next steps
To attach your data store to an app, create an app and select your data store
following the steps in Create a search app.
To preview how your search results appear after your app and data store are
set up, see Get search results.
Import from Spanner
To ingest data from Spanner, use the following steps to create
a data store and ingest data using either the Google Cloud console or the API.
Set up Spanner access from a different project
If your Spanner data is in the same project as
Vertex AI Search, skip to Import data from
Spanner.
To give Vertex AI Search access to Spanner data that is
in a different project, follow these steps:
Replace the following PROJECT_NUMBER variable with your
Vertex AI Search project number, and then copy the contents of this
code block. This is your Vertex AI Search service account
identifier:
Switch to your Spanner project on the IAM & Admin page
and click Grant Access.
For New principals, enter the identifier for the service account and
select one of the following:
If you won't use Data Boost during import, select the Cloud Spanner >
Cloud Spanner Database Reader role.
If you plan to use Data Boost during import, select the Cloud Spanner >
Cloud Spanner Database Admin role, or a custom role with the permissions of Cloud Spanner Database Reader and spanner.databases.useDataBoost.
For information about Data Boost, see Data Boost overview in the
Spanner documentation.
Specify the project ID, instance ID, database ID, and table ID of the data
that you plan to import.
Select whether to turn on Data Boost. For information about Data Boost, see Data Boost overview in the
Spanner documentation.
Click Continue.
Choose a region for your data store.
Enter a name for your data store.
Click Create.
To check the status of your ingestion, go to the Data Stores page
and click your data store name to see details about it on its Data page.
When the status column on the Activity tab changes from In progress to Import completed, the ingestion is complete.
Depending on the size of your data, ingestion can take several
minutes or several hours.
REST
To use the command line to create a data store and ingest data from
Spanner, follow these steps:
PROJECT_ID: The ID of your Vertex AI Search project.
DATA_STORE_ID: The ID of the data store.
SPANNER_PROJECT_ID: The ID of your Spanner
project.
INSTANCE_ID: The ID of your Spanner instance.
DATABASE_ID: The ID of your Spanner database.
TABLE_ID: The ID of your Spanner table.
DATA_BOOST_BOOLEAN: Optional. Whether to turn on Data Boost.
For information about Data Boost, see Data Boost
overview in the
Spanner documentation.
RECONCILIATION_MODE: Optional. Values are FULL and INCREMENTAL. Default is INCREMENTAL. Specifying INCREMENTAL causes an incremental refresh of data from
Spanner to your data store. This does an upsert
operation, which adds new documents and replaces existing documents
with updated documents with the same ID. Specifying FULL causes a
full rebase of the documents in your data store. In other words, new
and updated documents are added to your data store, and documents that
are not in Spanner are removed from your data store. The FULL mode is helpful if you want to automatically delete documents
that you no longer need.
AUTO_GENERATE_IDS: Optional. Specifies whether to
automatically generate document IDs. If set to true, document IDs
are generated based on a hash of the payload. Note that generated
document IDs might not remain consistent over multiple imports. If
you auto-generate IDs over multiple imports, Google highly
recommends setting reconciliationMode to FULL to maintain
consistent document IDs.
ID_FIELD: Optional. Specifies which fields are the
document IDs.
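As a rough sketch, an import request that uses these values might look like the
following. The endpoint, the global location, and the spannerSource field names
(including enableDataBoost) are assumptions based on the Discovery Engine v1
REST API; verify them against the current API reference before use.

curl -X POST \
  -H "Authorization: Bearer $(gcloud auth print-access-token)" \
  -H "Content-Type: application/json" \
  "https://discoveryengine.googleapis.com/v1/projects/PROJECT_ID/locations/global/collections/default_collection/dataStores/DATA_STORE_ID/branches/0/documents:import" \
  -d '{
    "spannerSource": {
      "projectId": "SPANNER_PROJECT_ID",
      "instanceId": "INSTANCE_ID",
      "databaseId": "DATABASE_ID",
      "tableId": "TABLE_ID",
      "enableDataBoost": DATA_BOOST_BOOLEAN
    },
    "reconciliationMode": "RECONCILIATION_MODE",
    "autoGenerateIds": AUTO_GENERATE_IDS,
    "idField": "ID_FIELD"
  }'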
from google.api_core.client_options import ClientOptions
from google.cloud import discoveryengine

# TODO(developer): Uncomment these variables before running the sample.
# project_id = "YOUR_PROJECT_ID"
# location = "YOUR_LOCATION"  # Values: "global"
# data_store_id = "YOUR_DATA_STORE_ID"


def create_data_store_sample(
    project_id: str,
    location: str,
    data_store_id: str,
) -> str:
    # For more information, refer to:
    # https://cloud.google.com/generative-ai-app-builder/docs/locations#specify_a_multi-region_for_your_data_store
    client_options = (
        ClientOptions(api_endpoint=f"{location}-discoveryengine.googleapis.com")
        if location != "global"
        else None
    )

    # Create a client
    client = discoveryengine.DataStoreServiceClient(client_options=client_options)

    # The full resource name of the collection
    # e.g. projects/{project}/locations/{location}/collections/default_collection
    parent = client.collection_path(
        project=project_id,
        location=location,
        collection="default_collection",
    )

    data_store = discoveryengine.DataStore(
        display_name="My Data Store",
        # Options: GENERIC, MEDIA, HEALTHCARE_FHIR
        industry_vertical=discoveryengine.IndustryVertical.GENERIC,
        # Options: SOLUTION_TYPE_RECOMMENDATION, SOLUTION_TYPE_SEARCH, SOLUTION_TYPE_CHAT, SOLUTION_TYPE_GENERATIVE_CHAT
        solution_types=[discoveryengine.SolutionType.SOLUTION_TYPE_SEARCH],
        # TODO(developer): Update content_config based on data store type.
        # Options: NO_CONTENT, CONTENT_REQUIRED, PUBLIC_WEBSITE
        content_config=discoveryengine.DataStore.ContentConfig.CONTENT_REQUIRED,
    )

    request = discoveryengine.CreateDataStoreRequest(
        parent=parent,
        data_store_id=data_store_id,
        data_store=data_store,
        # Optional: For Advanced Site Search Only
        # create_advanced_site_search=True,
    )

    # Make the request
    operation = client.create_data_store(request=request)

    print(f"Waiting for operation to complete: {operation.operation.name}")
    response = operation.result()

    # After the operation is complete,
    # get information from operation metadata
    metadata = discoveryengine.CreateDataStoreMetadata(operation.metadata)

    # Handle the response
    print(response)
    print(metadata)

    return operation.operation.name
Import documents
from google.api_core.client_options import ClientOptions
from google.cloud import discoveryengine

# TODO(developer): Uncomment these variables before running the sample.
# project_id = "YOUR_PROJECT_ID"
# location = "YOUR_LOCATION"  # Values: "global"
# data_store_id = "YOUR_DATA_STORE_ID"
# spanner_project_id = "YOUR_SPANNER_PROJECT_ID"
# spanner_instance_id = "YOUR_SPANNER_INSTANCE_ID"
# spanner_database_id = "YOUR_SPANNER_DATABASE_ID"
# spanner_table_id = "YOUR_SPANNER_TABLE_ID"

# For more information, refer to:
# https://cloud.google.com/generative-ai-app-builder/docs/locations#specify_a_multi-region_for_your_data_store
client_options = (
    ClientOptions(api_endpoint=f"{location}-discoveryengine.googleapis.com")
    if location != "global"
    else None
)

# Create a client
client = discoveryengine.DocumentServiceClient(client_options=client_options)

# The full resource name of the search engine branch.
# e.g. projects/{project}/locations/{location}/dataStores/{data_store_id}/branches/{branch}
parent = client.branch_path(
    project=project_id,
    location=location,
    data_store=data_store_id,
    branch="default_branch",
)

request = discoveryengine.ImportDocumentsRequest(
    parent=parent,
    spanner_source=discoveryengine.SpannerSource(
        project_id=spanner_project_id,
        instance_id=spanner_instance_id,
        database_id=spanner_database_id,
        table_id=spanner_table_id,
    ),
    # Options: `FULL`, `INCREMENTAL`
    reconciliation_mode=discoveryengine.ImportDocumentsRequest.ReconciliationMode.INCREMENTAL,
)

# Make the request
operation = client.import_documents(request=request)

print(f"Waiting for operation to complete: {operation.operation.name}")
response = operation.result()

# After the operation is complete,
# get information from operation metadata
metadata = discoveryengine.ImportDocumentsMetadata(operation.metadata)

# Handle the response
print(response)
print(metadata)
Next steps
To attach your data store to an app, create an app and select your data store
following the steps in Create a search app.
To preview how your search results appear after your app and data store are
set up, see Get search results.
Import from Firestore
To ingest data from Firestore, use the following steps to create
a data store and ingest data using either the Google Cloud console or the API.
If your Firestore data is in a different project than your
Vertex AI Search project, go to Set up Firestore
access.
Set up Firestore access from a different project
To give Vertex AI Search access to Firestore data that's
in a different project, follow these steps:
Replace the following PROJECT_NUMBER variable with your
Vertex AI Search project number, and then copy the contents of this
code block. This is your Vertex AI Search service account
identifier:
Specify the project ID, database ID, and collection ID of the data that you
plan to import.
Click Continue.
Choose a region for your data store.
Enter a name for your data store.
Click Create.
To check the status of your ingestion, go to the Data Stores page
and click your data store name to see details about it on its Data page.
When the status column on the Activity tab changes from In progress to Import completed, the ingestion is complete.
Depending on the size of your data, ingestion can take several
minutes or several hours.
REST
To use the command line to create a data store and ingest data from
Firestore, follow these steps:
PROJECT_ID: The ID of your Vertex AI Search project.
DATA_STORE_ID: The ID of the data store. The ID can
contain only lowercase letters, digits, underscores, and hyphens.
FIRESTORE_PROJECT_ID: The ID of your
Firestore project.
DATABASE_ID: The ID of your Firestore
database.
COLLECTION_ID: The ID of your Firestore
collection.
RECONCILIATION_MODE: Optional. Values are FULL and INCREMENTAL. Default is INCREMENTAL. Specifying INCREMENTAL causes an incremental refresh of data from Firestore to your
data store. This does an upsert operation, which adds new documents and
replaces existing documents with updated documents with the same ID.
Specifying FULL causes a full rebase of the documents in your data
store. In other words, new and updated documents are added to your data
store, and documents that are not in Firestore are removed
from your data store. The FULL mode is helpful if you want to
automatically delete documents that you no longer need.
AUTO_GENERATE_IDS: Optional. Specifies whether to
automatically generate document IDs. If set to true, document IDs
are generated based on a hash of the payload. Note that generated
document IDs might not remain consistent over multiple imports. If
you auto-generate IDs over multiple imports, Google highly
recommends setting reconciliationMode to FULL to maintain
consistent document IDs.
ID_FIELD: Optional. Specifies which fields are the
document IDs.
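As a rough sketch, an import request that uses these values might look like the
following. The endpoint, the global location, and the firestoreSource field names
are assumptions based on the Discovery Engine v1 REST API; verify them against
the current API reference before use.

curl -X POST \
  -H "Authorization: Bearer $(gcloud auth print-access-token)" \
  -H "Content-Type: application/json" \
  "https://discoveryengine.googleapis.com/v1/projects/PROJECT_ID/locations/global/collections/default_collection/dataStores/DATA_STORE_ID/branches/0/documents:import" \
  -d '{
    "firestoreSource": {
      "projectId": "FIRESTORE_PROJECT_ID",
      "databaseId": "DATABASE_ID",
      "collectionId": "COLLECTION_ID"
    },
    "reconciliationMode": "RECONCILIATION_MODE",
    "autoGenerateIds": AUTO_GENERATE_IDS,
    "idField": "ID_FIELD"
  }'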
from google.api_core.client_options import ClientOptions
from google.cloud import discoveryengine

# TODO(developer): Uncomment these variables before running the sample.
# project_id = "YOUR_PROJECT_ID"
# location = "YOUR_LOCATION"  # Values: "global"
# data_store_id = "YOUR_DATA_STORE_ID"


def create_data_store_sample(
    project_id: str,
    location: str,
    data_store_id: str,
) -> str:
    # For more information, refer to:
    # https://cloud.google.com/generative-ai-app-builder/docs/locations#specify_a_multi-region_for_your_data_store
    client_options = (
        ClientOptions(api_endpoint=f"{location}-discoveryengine.googleapis.com")
        if location != "global"
        else None
    )

    # Create a client
    client = discoveryengine.DataStoreServiceClient(client_options=client_options)

    # The full resource name of the collection
    # e.g. projects/{project}/locations/{location}/collections/default_collection
    parent = client.collection_path(
        project=project_id,
        location=location,
        collection="default_collection",
    )

    data_store = discoveryengine.DataStore(
        display_name="My Data Store",
        # Options: GENERIC, MEDIA, HEALTHCARE_FHIR
        industry_vertical=discoveryengine.IndustryVertical.GENERIC,
        # Options: SOLUTION_TYPE_RECOMMENDATION, SOLUTION_TYPE_SEARCH, SOLUTION_TYPE_CHAT, SOLUTION_TYPE_GENERATIVE_CHAT
        solution_types=[discoveryengine.SolutionType.SOLUTION_TYPE_SEARCH],
        # TODO(developer): Update content_config based on data store type.
        # Options: NO_CONTENT, CONTENT_REQUIRED, PUBLIC_WEBSITE
        content_config=discoveryengine.DataStore.ContentConfig.CONTENT_REQUIRED,
    )

    request = discoveryengine.CreateDataStoreRequest(
        parent=parent,
        data_store_id=data_store_id,
        data_store=data_store,
        # Optional: For Advanced Site Search Only
        # create_advanced_site_search=True,
    )

    # Make the request
    operation = client.create_data_store(request=request)

    print(f"Waiting for operation to complete: {operation.operation.name}")
    response = operation.result()

    # After the operation is complete,
    # get information from operation metadata
    metadata = discoveryengine.CreateDataStoreMetadata(operation.metadata)

    # Handle the response
    print(response)
    print(metadata)

    return operation.operation.name
Import documents
from google.api_core.client_options import ClientOptions
from google.cloud import discoveryengine

# TODO(developer): Uncomment these variables before running the sample.
# project_id = "YOUR_PROJECT_ID"
# location = "YOUR_LOCATION"  # Values: "global"
# data_store_id = "YOUR_DATA_STORE_ID"
# firestore_project_id = "YOUR_FIRESTORE_PROJECT_ID"
# firestore_database_id = "YOUR_FIRESTORE_DATABASE_ID"
# firestore_collection_id = "YOUR_FIRESTORE_COLLECTION_ID"

# For more information, refer to:
# https://cloud.google.com/generative-ai-app-builder/docs/locations#specify_a_multi-region_for_your_data_store
client_options = (
    ClientOptions(api_endpoint=f"{location}-discoveryengine.googleapis.com")
    if location != "global"
    else None
)

# Create a client
client = discoveryengine.DocumentServiceClient(client_options=client_options)

# The full resource name of the search engine branch.
# e.g. projects/{project}/locations/{location}/dataStores/{data_store_id}/branches/{branch}
parent = client.branch_path(
    project=project_id,
    location=location,
    data_store=data_store_id,
    branch="default_branch",
)

request = discoveryengine.ImportDocumentsRequest(
    parent=parent,
    firestore_source=discoveryengine.FirestoreSource(
        project_id=firestore_project_id,
        database_id=firestore_database_id,
        collection_id=firestore_collection_id,
    ),
    # Options: `FULL`, `INCREMENTAL`
    reconciliation_mode=discoveryengine.ImportDocumentsRequest.ReconciliationMode.INCREMENTAL,
)

# Make the request
operation = client.import_documents(request=request)

print(f"Waiting for operation to complete: {operation.operation.name}")
response = operation.result()

# After the operation is complete,
# get information from operation metadata
metadata = discoveryengine.ImportDocumentsMetadata(operation.metadata)

# Handle the response
print(response)
print(metadata)
Next steps
To attach your data store to an app, create an app and select your data store
following the steps in Create a search app.
To preview how your search results appear after your app and data store are
set up, see Get search results.
Import from Bigtable
To ingest data from Bigtable, use the following steps to create
a data store and ingest data using the API.
Set up Bigtable access
To give Vertex AI Search access to Bigtable data that's
in a different project, follow these steps:
Replace the following PROJECT_NUMBER variable with your
Vertex AI Search project number, then copy the contents of this
code block. This is your Vertex AI Search service account
identifier:
PROJECT_ID: The ID of your Vertex AI Search
project.
DATA_STORE_ID: The ID of the data store. The ID can
contain only lowercase letters, digits, underscores, and hyphens.
BIGTABLE_PROJECT_ID: The ID of your
Bigtable project.
INSTANCE_ID: The ID of your Bigtable
instance.
TABLE_ID: The ID of your Bigtable
table.
KEY_FIELD_NAME: Optional but recommended. The field name to
use for the row key value after ingesting to Vertex AI Search.
KEY: Required. A string value for the column family key.
ENCODING: Optional. The encoding mode of the values when the
type is not STRING. This can be overridden for a specific column by
listing that column in columns and specifying an encoding for it.
COLUMN_TYPE: Optional. The type of values in this column
family.
QUALIFIER: Required. Qualifier of the column.
FIELD_NAME: Optional but recommended. The field name to use
for this column after ingesting to Vertex AI Search.
COLUMN_ENCODING: Optional. The encoding mode of the values
for a specific column when the type is not STRING.
RECONCILIATION_MODE: Optional. Values are FULL and INCREMENTAL. Default is INCREMENTAL. Specifying INCREMENTAL causes an incremental refresh of data from Bigtable to
your data store. This does an upsert operation, which adds new
documents and replaces existing documents with updated documents with
the same ID. Specifying FULL causes a full rebase of the documents in
your data store. In other words, new and updated documents are added to
your data store, and documents that are not in Bigtable
are removed from your data store. The FULL mode is helpful if you
want to automatically delete documents that you no longer need.
AUTO_GENERATE_IDS: Optional. Specifies whether to
automatically generate document IDs. If set to true, document IDs
are generated based on a hash of the payload. Note that generated
document IDs might not remain consistent over multiple imports. If
you auto-generate IDs over multiple imports, Google highly
recommends setting reconciliationMode to FULL to maintain
consistent document IDs.
Specify autoGenerateIds only when bigquerySource.dataSchema is
set to custom. Otherwise an INVALID_ARGUMENT error is
returned. If you don't specify autoGenerateIds or set it to false, you must specify idField. Otherwise the documents fail to
import.
ID_FIELD: Optional. Specifies which fields are the
document IDs.
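As a rough sketch, an import request that uses these values might look like the
following. The endpoint, the global location, and the bigtableSource and
bigtableOptions field names are assumptions based on the Discovery Engine v1
REST API, and non-UTF-8 column qualifiers may need to be base64 encoded; verify
the exact structure against the current API reference before use.

curl -X POST \
  -H "Authorization: Bearer $(gcloud auth print-access-token)" \
  -H "Content-Type: application/json" \
  "https://discoveryengine.googleapis.com/v1/projects/PROJECT_ID/locations/global/collections/default_collection/dataStores/DATA_STORE_ID/branches/0/documents:import" \
  -d '{
    "bigtableSource": {
      "projectId": "BIGTABLE_PROJECT_ID",
      "instanceId": "INSTANCE_ID",
      "tableId": "TABLE_ID",
      "bigtableOptions": {
        "keyFieldName": "KEY_FIELD_NAME",
        "families": {
          "KEY": {
            "type": "COLUMN_TYPE",
            "encoding": "ENCODING",
            "columns": [
              {
                "qualifier": "QUALIFIER",
                "fieldName": "FIELD_NAME",
                "encoding": "COLUMN_ENCODING"
              }
            ]
          }
        }
      }
    },
    "reconciliationMode": "RECONCILIATION_MODE",
    "autoGenerateIds": AUTO_GENERATE_IDS,
    "idField": "ID_FIELD"
  }'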
from google.api_core.client_options import ClientOptions
from google.cloud import discoveryengine

# TODO(developer): Uncomment these variables before running the sample.
# project_id = "YOUR_PROJECT_ID"
# location = "YOUR_LOCATION"  # Values: "global"
# data_store_id = "YOUR_DATA_STORE_ID"


def create_data_store_sample(
    project_id: str,
    location: str,
    data_store_id: str,
) -> str:
    # For more information, refer to:
    # https://cloud.google.com/generative-ai-app-builder/docs/locations#specify_a_multi-region_for_your_data_store
    client_options = (
        ClientOptions(api_endpoint=f"{location}-discoveryengine.googleapis.com")
        if location != "global"
        else None
    )

    # Create a client
    client = discoveryengine.DataStoreServiceClient(client_options=client_options)

    # The full resource name of the collection
    # e.g. projects/{project}/locations/{location}/collections/default_collection
    parent = client.collection_path(
        project=project_id,
        location=location,
        collection="default_collection",
    )

    data_store = discoveryengine.DataStore(
        display_name="My Data Store",
        # Options: GENERIC, MEDIA, HEALTHCARE_FHIR
        industry_vertical=discoveryengine.IndustryVertical.GENERIC,
        # Options: SOLUTION_TYPE_RECOMMENDATION, SOLUTION_TYPE_SEARCH, SOLUTION_TYPE_CHAT, SOLUTION_TYPE_GENERATIVE_CHAT
        solution_types=[discoveryengine.SolutionType.SOLUTION_TYPE_SEARCH],
        # TODO(developer): Update content_config based on data store type.
        # Options: NO_CONTENT, CONTENT_REQUIRED, PUBLIC_WEBSITE
        content_config=discoveryengine.DataStore.ContentConfig.CONTENT_REQUIRED,
    )

    request = discoveryengine.CreateDataStoreRequest(
        parent=parent,
        data_store_id=data_store_id,
        data_store=data_store,
        # Optional: For Advanced Site Search Only
        # create_advanced_site_search=True,
    )

    # Make the request
    operation = client.create_data_store(request=request)

    print(f"Waiting for operation to complete: {operation.operation.name}")
    response = operation.result()

    # After the operation is complete,
    # get information from operation metadata
    metadata = discoveryengine.CreateDataStoreMetadata(operation.metadata)

    # Handle the response
    print(response)
    print(metadata)

    return operation.operation.name
Import documents
from google.api_core.client_options import ClientOptions
from google.cloud import discoveryengine

# TODO(developer): Uncomment these variables before running the sample.
# project_id = "YOUR_PROJECT_ID"
# location = "YOUR_LOCATION"  # Values: "global"
# data_store_id = "YOUR_DATA_STORE_ID"
# bigtable_project_id = "YOUR_BIGTABLE_PROJECT_ID"
# bigtable_instance_id = "YOUR_BIGTABLE_INSTANCE_ID"
# bigtable_table_id = "YOUR_BIGTABLE_TABLE_ID"

# For more information, refer to:
# https://cloud.google.com/generative-ai-app-builder/docs/locations#specify_a_multi-region_for_your_data_store
client_options = (
    ClientOptions(api_endpoint=f"{location}-discoveryengine.googleapis.com")
    if location != "global"
    else None
)

# Create a client
client = discoveryengine.DocumentServiceClient(client_options=client_options)

# The full resource name of the search engine branch.
# e.g. projects/{project}/locations/{location}/dataStores/{data_store_id}/branches/{branch}
parent = client.branch_path(
    project=project_id,
    location=location,
    data_store=data_store_id,
    branch="default_branch",
)

bigtable_options = discoveryengine.BigtableOptions(
    families={
        "family_name_1": discoveryengine.BigtableOptions.BigtableColumnFamily(
            type_=discoveryengine.BigtableOptions.Type.STRING,
            encoding=discoveryengine.BigtableOptions.Encoding.TEXT,
            columns=[
                discoveryengine.BigtableOptions.BigtableColumn(
                    qualifier="qualifier_1".encode("utf-8"),
                    field_name="field_name_1",
                ),
            ],
        ),
        "family_name_2": discoveryengine.BigtableOptions.BigtableColumnFamily(
            type_=discoveryengine.BigtableOptions.Type.INTEGER,
            encoding=discoveryengine.BigtableOptions.Encoding.BINARY,
        ),
    }
)

request = discoveryengine.ImportDocumentsRequest(
    parent=parent,
    bigtable_source=discoveryengine.BigtableSource(
        project_id=bigtable_project_id,
        instance_id=bigtable_instance_id,
        table_id=bigtable_table_id,
        bigtable_options=bigtable_options,
    ),
    # Options: `FULL`, `INCREMENTAL`
    reconciliation_mode=discoveryengine.ImportDocumentsRequest.ReconciliationMode.INCREMENTAL,
)

# Make the request
operation = client.import_documents(request=request)

print(f"Waiting for operation to complete: {operation.operation.name}")
response = operation.result()

# After the operation is complete,
# get information from operation metadata
metadata = discoveryengine.ImportDocumentsMetadata(operation.metadata)

# Handle the response
print(response)
print(metadata)
Next steps
To attach your data store to an app, create an app and select your data store
following the steps in Create a search app.
To preview how your search results appear after your app and data store are
set up, see Get search results.
Import from AlloyDB for PostgreSQL
To ingest data from AlloyDB for PostgreSQL, use the following steps to create
a data store and ingest data using either the Google Cloud console or the API.
Set up AlloyDB for PostgreSQL access from a different project
To give Vertex AI Search access to AlloyDB for PostgreSQL data that's
in a different project, follow these steps:
Replace the following PROJECT_NUMBER variable with your
Vertex AI Search project number, and then copy the contents of this
code block. This is your Vertex AI Search service account
identifier:
Specify the project ID, location ID, cluster ID, database ID, and table ID
of the data that you plan to import.
Click Continue.
Choose a region for your data store.
Enter a name for your data store.
Click Create.
To check the status of your ingestion, go to the Data Stores page
and click your data store name to see details about it on its Data page.
When the status column on the Activity tab changes from In progress to Import completed, the ingestion is complete.
Depending on the size of your data, ingestion can take several
minutes or several hours.
REST
To use the command line to create a data store and ingest data from
AlloyDB for PostgreSQL, follow these steps:
PROJECT_ID: The ID of your Vertex AI Search project.
DATA_STORE_ID: The ID of the data store. The ID can
contain only lowercase letters, digits, underscores, and hyphens.
ALLOYDB_PROJECT_ID: The ID of your
AlloyDB for PostgreSQL project.
LOCATION_ID: The ID of your AlloyDB for PostgreSQL
location.
CLUSTER_ID: The ID of your AlloyDB for PostgreSQL
cluster.
DATABASE_ID: The ID of your AlloyDB for PostgreSQL
database.
TABLE_ID: The ID of your AlloyDB for PostgreSQL
table.
RECONCILIATION_MODE: Optional. Values are FULL and INCREMENTAL. Default is INCREMENTAL. Specifying INCREMENTAL causes an incremental refresh of data from AlloyDB for PostgreSQL to your
data store. This does an upsert operation, which adds new documents and
replaces existing documents with updated documents with the same ID.
Specifying FULL causes a full rebase of the documents in your data
store. In other words, new and updated documents are added to your data
store, and documents that are not in AlloyDB for PostgreSQL are removed
from your data store. The FULL mode is helpful if you want to
automatically delete documents that you no longer need.
AUTO_GENERATE_IDS: Optional. Specifies whether to
automatically generate document IDs. If set to true, document IDs
are generated based on a hash of the payload. Note that generated
document IDs might not remain consistent over multiple imports. If
you auto-generate IDs over multiple imports, Google highly
recommends setting reconciliationMode to FULL to maintain
consistent document IDs.
ID_FIELD: Optional. Specifies which fields are the
document IDs.
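As a rough sketch, an import request that uses these values might look like the
following. The endpoint, the global location, and the alloyDbSource field names
are assumptions based on the Discovery Engine v1 REST API; verify them against
the current API reference before use.

curl -X POST \
  -H "Authorization: Bearer $(gcloud auth print-access-token)" \
  -H "Content-Type: application/json" \
  "https://discoveryengine.googleapis.com/v1/projects/PROJECT_ID/locations/global/collections/default_collection/dataStores/DATA_STORE_ID/branches/0/documents:import" \
  -d '{
    "alloyDbSource": {
      "projectId": "ALLOYDB_PROJECT_ID",
      "locationId": "LOCATION_ID",
      "clusterId": "CLUSTER_ID",
      "databaseId": "DATABASE_ID",
      "tableId": "TABLE_ID"
    },
    "reconciliationMode": "RECONCILIATION_MODE",
    "autoGenerateIds": AUTO_GENERATE_IDS,
    "idField": "ID_FIELD"
  }'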
from google.api_core.client_options import ClientOptions
from google.cloud import discoveryengine

# TODO(developer): Uncomment these variables before running the sample.
# project_id = "YOUR_PROJECT_ID"
# location = "YOUR_LOCATION"  # Values: "global"
# data_store_id = "YOUR_DATA_STORE_ID"


def create_data_store_sample(
    project_id: str,
    location: str,
    data_store_id: str,
) -> str:
    # For more information, refer to:
    # https://cloud.google.com/generative-ai-app-builder/docs/locations#specify_a_multi-region_for_your_data_store
    client_options = (
        ClientOptions(api_endpoint=f"{location}-discoveryengine.googleapis.com")
        if location != "global"
        else None
    )

    # Create a client
    client = discoveryengine.DataStoreServiceClient(client_options=client_options)

    # The full resource name of the collection
    # e.g. projects/{project}/locations/{location}/collections/default_collection
    parent = client.collection_path(
        project=project_id,
        location=location,
        collection="default_collection",
    )

    data_store = discoveryengine.DataStore(
        display_name="My Data Store",
        # Options: GENERIC, MEDIA, HEALTHCARE_FHIR
        industry_vertical=discoveryengine.IndustryVertical.GENERIC,
        # Options: SOLUTION_TYPE_RECOMMENDATION, SOLUTION_TYPE_SEARCH, SOLUTION_TYPE_CHAT, SOLUTION_TYPE_GENERATIVE_CHAT
        solution_types=[discoveryengine.SolutionType.SOLUTION_TYPE_SEARCH],
        # TODO(developer): Update content_config based on data store type.
        # Options: NO_CONTENT, CONTENT_REQUIRED, PUBLIC_WEBSITE
        content_config=discoveryengine.DataStore.ContentConfig.CONTENT_REQUIRED,
    )

    request = discoveryengine.CreateDataStoreRequest(
        parent=parent,
        data_store_id=data_store_id,
        data_store=data_store,
        # Optional: For Advanced Site Search Only
        # create_advanced_site_search=True,
    )

    # Make the request
    operation = client.create_data_store(request=request)

    print(f"Waiting for operation to complete: {operation.operation.name}")
    response = operation.result()

    # After the operation is complete,
    # get information from operation metadata
    metadata = discoveryengine.CreateDataStoreMetadata(operation.metadata)

    # Handle the response
    print(response)
    print(metadata)

    return operation.operation.name
Import documents
from google.api_core.client_options import ClientOptions
from google.cloud import discoveryengine_v1 as discoveryengine

# TODO(developer): Uncomment these variables before running the sample.
# project_id = "YOUR_PROJECT_ID"
# location = "YOUR_LOCATION"  # Values: "global"
# data_store_id = "YOUR_DATA_STORE_ID"
# alloy_db_project_id = "YOUR_ALLOY_DB_PROJECT_ID"
# alloy_db_location_id = "YOUR_ALLOY_DB_LOCATION_ID"
# alloy_db_cluster_id = "YOUR_ALLOY_DB_CLUSTER_ID"
# alloy_db_database_id = "YOUR_ALLOY_DB_DATABASE_ID"
# alloy_db_table_id = "YOUR_ALLOY_DB_TABLE_ID"

# For more information, refer to:
# https://cloud.google.com/generative-ai-app-builder/docs/locations#specify_a_multi-region_for_your_data_store
client_options = (
    ClientOptions(api_endpoint=f"{location}-discoveryengine.googleapis.com")
    if location != "global"
    else None
)

# Create a client
client = discoveryengine.DocumentServiceClient(client_options=client_options)

# The full resource name of the search engine branch.
# e.g. projects/{project}/locations/{location}/dataStores/{data_store_id}/branches/{branch}
parent = client.branch_path(
    project=project_id,
    location=location,
    data_store=data_store_id,
    branch="default_branch",
)

request = discoveryengine.ImportDocumentsRequest(
    parent=parent,
    alloy_db_source=discoveryengine.AlloyDbSource(
        project_id=alloy_db_project_id,
        location_id=alloy_db_location_id,
        cluster_id=alloy_db_cluster_id,
        database_id=alloy_db_database_id,
        table_id=alloy_db_table_id,
    ),
    # Options: `FULL`, `INCREMENTAL`
    reconciliation_mode=discoveryengine.ImportDocumentsRequest.ReconciliationMode.INCREMENTAL,
)

# Make the request
operation = client.import_documents(request=request)

print(f"Waiting for operation to complete: {operation.operation.name}")
response = operation.result()

# After the operation is complete,
# get information from operation metadata
metadata = discoveryengine.ImportDocumentsMetadata(operation.metadata)

# Handle the response
print(response)
print(metadata)
Next steps
To attach your data store to an app, create an app and select your data store
following the steps in Create a search app.
To preview how your search results appear after your app and data store are
set up, see Get search results.
Upload structured JSON data with the API
To directly upload a JSON document or object using the API, follow these steps.
DATA_STORE_ID: the ID of the Vertex AI Search data store that you want to create. This ID can contain only lowercase
letters, digits, underscores, and hyphens.
DATA_STORE_DISPLAY_NAME: the display name of the Vertex AI
Search data store that you want to create.
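As a rough sketch, the data store creation request with these values might look
like the following. The endpoint, the global location, and the field names are
assumptions based on the Discovery Engine v1 REST API; verify them against the
current API reference before use.

curl -X POST \
  -H "Authorization: Bearer $(gcloud auth print-access-token)" \
  -H "Content-Type: application/json" \
  "https://discoveryengine.googleapis.com/v1/projects/PROJECT_ID/locations/global/collections/default_collection/dataStores?dataStoreId=DATA_STORE_ID" \
  -d '{
    "displayName": "DATA_STORE_DISPLAY_NAME",
    "industryVertical": "GENERIC",
    "solutionTypes": ["SOLUTION_TYPE_SEARCH"]
  }'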
Import structured data.
There are a few approaches that you can use to upload data, including:
DOCUMENT_ID: a unique ID for the document.
This ID can be up to 63 characters long and contain only lowercase
letters, digits, underscores, and hyphens.
JSON_DOCUMENT_STRING: the JSON document as a
single string. This must conform to the JSON schema that you
provided in the previous step—for example:
Replace JSON_DOCUMENT_OBJECT with the JSON document as a
JSON object. This must conform to the JSON schema that you provided
in the previous step.
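For example, one approach is to create a single document with an inline JSON
payload. The following sketch assumes the Discovery Engine v1 documents endpoint
and its jsonData and structData fields; verify them against the current API
reference before use.

curl -X POST \
  -H "Authorization: Bearer $(gcloud auth print-access-token)" \
  -H "Content-Type: application/json" \
  "https://discoveryengine.googleapis.com/v1/projects/PROJECT_ID/locations/global/collections/default_collection/dataStores/DATA_STORE_ID/branches/0/documents?documentId=DOCUMENT_ID" \
  -d '{
    "jsonData": "JSON_DOCUMENT_STRING"
  }'

To pass the document as a JSON object instead of a string, replace the jsonData
field with a structData field set to JSON_DOCUMENT_OBJECT.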
To attach your data store to an app, create an app and select your data store
following the steps in Create a search app.
To preview how your search results appear after your app and data store are
set up, see Get search results.
Troubleshoot data ingestion
If you are having problems with data ingestion, review these tips:
If you're using customer-managed encryption keys and data import fails
(with error message The caller does not have permission), then make sure
that the CryptoKey Encrypter/Decrypter IAM role
(roles/cloudkms.cryptoKeyEncrypterDecrypter) on the key has been granted to
the Cloud Storage service agent. For more information, see Before you begin in "Customer-managed encryption
keys".
If you are using advanced website indexing and the Document usage for the
data store is much lower than you expect, then review the URL patterns that you
specified for indexing, make sure that they cover the pages that you want to
index, and expand them if needed. For example, if
you used *.en.example.com/*, you might also need to add *.example.com/* to the
sites that you want indexed.
Create a data store using Terraform
You can use Terraform to create an empty data store. After the empty data store
is created, you can ingest data into the data store using the Google Cloud console
or API commands.
[[["Easy to understand","easyToUnderstand","thumb-up"],["Solved my problem","solvedMyProblem","thumb-up"],["Other","otherUp","thumb-up"]],[["Hard to understand","hardToUnderstand","thumb-down"],["Incorrect information or sample code","incorrectInformationOrSampleCode","thumb-down"],["Missing the information/samples I need","missingTheInformationSamplesINeed","thumb-down"],["Other","otherDown","thumb-down"]],["Last updated 2024-11-06 UTC."],[],[]]