Once you've deployed a VPC Network Peering or Private Service Connect index endpoint, querying it differs slightly depending on how it was deployed:
Deployed with Private Service Connect automation
For IndexEndpoints deployed with Private Service Connect automation, the Python SDK automatically maps the Private Service Connect network to the appropriate endpoint. If you aren't using the Python SDK, you must connect directly to the IP address created for your endpoint, following the instructions for querying a Private Service Connect manual deployment.
Python
To learn how to install or update the Vertex AI SDK for Python, see Install the Vertex AI SDK for Python . For more information, see the Python API reference documentation .
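As a rough sketch of what an SDK query looks like, the following helper uses the SDK's find_neighbors method, which resolves the Private Service Connect address automatically. The resource names and the query vector are placeholders, and this is an illustrative sketch rather than a complete sample:

```python
def find_index_neighbors(index_endpoint_name, deployed_index_id, query_vector, num_neighbors=10):
    """Return the nearest neighbors for a single query embedding.

    Illustrative sketch; index_endpoint_name and deployed_index_id are
    placeholders for your own resource names.
    """
    # Imported inside the function so this module loads even where the
    # google-cloud-aiplatform package isn't installed.
    from google.cloud import aiplatform

    endpoint = aiplatform.MatchingEngineIndexEndpoint(
        index_endpoint_name=index_endpoint_name
    )
    # find_neighbors handles the Private Service Connect lookup for you.
    return endpoint.find_neighbors(
        deployed_index_id=deployed_index_id,
        queries=[query_vector],
        num_neighbors=num_neighbors,
    )
```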
Deployed with Private Service Connect manual configuration
For Private Service Connect IndexEndpoints deployed with a manually configured connection, your endpoint is accessed using the IP address of the compute address forwarded to your endpoint's Private Service Connect service attachment.
If not already known, you can obtain the IP address forwarded to the service attachment URI using the gcloud ai index-endpoints describe and gcloud compute forwarding-rules list commands.
Make the following replacements:
- INDEX_ENDPOINT_ID : Fully qualified index endpoint ID.
- REGION : The region where your index endpoint is deployed.
SERVICE_ATTACHMENT_URI=`gcloud ai index-endpoints describe INDEX_ENDPOINT_ID \
  --region=REGION \
  --format="value(deployedIndexes.privateEndpoints.serviceAttachment)"`
gcloud compute forwarding-rules list --filter="TARGET:${SERVICE_ATTACHMENT_URI}"
The output includes the internal IP address to use when querying the IndexEndpoint.
Python
To learn how to install or update the Vertex AI SDK for Python, see Install the Vertex AI SDK for Python . For more information, see the Python API reference documentation .
Command-line
To query a DeployedIndex, connect to its TARGET_IP at port 10000 and call the Match or BatchMatch method. Additionally, you can query using a specific embedding ID.
The following examples use the open source tool grpc_cli to send gRPC requests to the deployed index server.
In the first example, you send a single query to the Match method.
 ./grpc_cli call ${TARGET_IP}:10000 google.cloud.aiplatform.container.v1.MatchService.Match 'deployed_index_id: "${DEPLOYED_INDEX_ID}", float_val: [-0.1,..]'
In the second example, you combine two separate queries into the same BatchMatch 
request.
 ./grpc_cli call ${TARGET_IP}:10000 google.cloud.aiplatform.container.v1.MatchService.BatchMatch 'requests: [{deployed_index_id: "${DEPLOYED_INDEX_ID}", requests: [{deployed_index_id: "${DEPLOYED_INDEX_ID}", float_val: [-0.1,..]}, {deployed_index_id: "${DEPLOYED_INDEX_ID}", float_val: [-0.2,..]}]}]' 
 
You must make calls to these APIs from a client running in the same VPC that the service was peered with.
To run a query using an embedding_id 
, use the following example.
 ./grpc_cli call ${TARGET_IP}:10000  google.cloud.aiplatform.container.v1.MatchService.Match "deployed_index_id:'"test_index1"',embedding_id: '"606431"'" 
 
In this example, you send a query using token and numeric restricts .
 ./grpc_cli call ${TARGET_IP}:10000 google.cloud.aiplatform.container.v1.MatchService.Match 'deployed_index_id: "${DEPLOYED_INDEX_ID}", float_val: [1, 1], "sparse_embedding": {"values": [111.0,111.1,111.2], "dimensions": [10,20,30]}, numeric_restricts: [{name: "double-ns", value_double: 0.3, op: LESS_EQUAL}, {name: "double-ns", value_double: -1.2, op: GREATER}, {name: "double-ns", value_double: 0., op: NOT_EQUAL}], restricts: [{name: "color", allow_tokens: ["red"]}]' 
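If you generate these request bodies programmatically, a small helper can keep the textproto quoting straight. This is an illustrative utility, not part of any SDK; the field names simply mirror the grpc_cli examples above:

```python
def build_match_request(deployed_index_id, float_val, allow_tokens=None):
    """Build a textproto body for a MatchService.Match call.

    Illustrative helper only; field names follow the grpc_cli examples.
    allow_tokens maps a namespace name to the tokens allowed in it.
    """
    parts = ['deployed_index_id: "%s"' % deployed_index_id]
    parts.append("float_val: [%s]" % ", ".join(str(v) for v in float_val))
    if allow_tokens:
        for name, tokens in allow_tokens.items():
            quoted = ", ".join('"%s"' % t for t in tokens)
            parts.append('restricts: [{name: "%s", allow_tokens: [%s]}]' % (name, quoted))
    return ", ".join(parts)

# Example: a query restricted to items tagged "red" in the "color" namespace.
body = build_match_request("my_index", [1.0, 1.0], {"color": ["red"]})
```

The resulting string can be passed as the final argument of a grpc_cli call.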
 
To learn more, see Client libraries explained .
Console
Use these instructions to query a VPC index from the console.
- In the Vertex AI section of the Google Cloud console, go to the Deploy and Use section, and then select Vector Search.
- Select the VPC index you want to query. The Index info page opens.
- Scroll down to the Deployed indexes section and select the deployed index you want to query. The Deployed index info page opens.
- From the Query index section, select your query parameters. You can query by a vector or by a specific data point.
- Execute the query using the open source tool grpc_cli, or by using the Vertex AI SDK for Python.
Deployed with VPC Network Peering
Python
To learn how to install or update the Vertex AI SDK for Python, see Install the Vertex AI SDK for Python . For more information, see the Python API reference documentation .
Note: The Python SDK automatically looks up the IP address for an IndexEndpoint deployed with VPC Network Peering.
Command-line
Each DeployedIndex has a TARGET_IP, which you can retrieve in your list of IndexEndpoints.
To query a DeployedIndex, connect to its TARGET_IP at port 10000 and call the Match or BatchMatch method. Additionally, you can query using a specific embedding ID.
The following examples use the open source tool grpc_cli to send gRPC requests to the deployed index server.
In the first example, you send a single query to the Match method.
 ./grpc_cli call ${TARGET_IP}:10000 google.cloud.aiplatform.container.v1.MatchService.Match 'deployed_index_id: "${DEPLOYED_INDEX_ID}", float_val: [-0.1,..]'
In the second example, you combine two separate queries into the same BatchMatch 
request.
 ./grpc_cli call ${TARGET_IP}:10000 google.cloud.aiplatform.container.v1.MatchService.BatchMatch 'requests: [{deployed_index_id: "${DEPLOYED_INDEX_ID}", requests: [{deployed_index_id: "${DEPLOYED_INDEX_ID}", float_val: [-0.1,..]}, {deployed_index_id: "${DEPLOYED_INDEX_ID}", float_val: [-0.2,..]}]}]' 
 
You must make calls to these APIs from a client running in the same VPC that the service was peered with.
To run a query using an embedding_id 
, use the following example.
 ./grpc_cli call ${TARGET_IP}:10000  google.cloud.aiplatform.container.v1.MatchService.Match "deployed_index_id:'"test_index1"',embedding_id: '"606431"'" 
 
In this example, you send a query using token and numeric restricts .
 ./grpc_cli call ${TARGET_IP}:10000 google.cloud.aiplatform.container.v1.MatchService.Match 'deployed_index_id: "${DEPLOYED_INDEX_ID}", float_val: [1, 1], "sparse_embedding": {"values": [111.0,111.1,111.2], "dimensions": [10,20,30]}, numeric_restricts: [{name: "double-ns", value_double: 0.3, op: LESS_EQUAL}, {name: "double-ns", value_double: -1.2, op: GREATER}, {name: "double-ns", value_double: 0., op: NOT_EQUAL}], restricts: [{name: "color", allow_tokens: ["red"]}]' 
 
To learn more, see Client libraries explained .
Console
Use these instructions to query a VPC index from the console.
- In the Vertex AI section of the Google Cloud console, go to the Deploy and Use section, and then select Vector Search.
- Select the VPC index you want to query. The Index info page opens.
- Scroll down to the Deployed indexes section and select the deployed index you want to query. The Deployed index info page opens.
- From the Query index section, select your query parameters. You can query by a vector or by a specific data point.
- Execute the query using the open source tool grpc_cli, or by using the Vertex AI SDK for Python.
Query-time settings that impact performance
The following query-time parameters can affect latency, availability, and cost when using Vector Search. This guidance applies to most cases. However, always experiment with your configurations to make sure that they work for your use case.
For parameter definitions, see Index configuration parameters .
approximateNeighborsCount
Tells the algorithm the number of approximate results to retrieve from each shard.
The value of approximateNeighborsCount should always be greater than the value of setNeighborCount. If the value of setNeighborCount is small, 10 times that value is recommended for approximateNeighborsCount. For larger setNeighborCount values, a smaller multiplier can be used.
The corresponding REST API name for this field is approximate_neighbor_count.
Increasing the value of approximateNeighborsCount can affect performance in the following ways:
- Recall: Increased
- Latency: Potentially increased
- Availability: No impact
- Cost: Can increase because more data is processed during a search
Decreasing the value of approximateNeighborsCount can affect performance in the following ways:
- Recall: Decreased
- Latency: Potentially decreased
- Availability: No impact
- Cost: Can decrease cost because less data is processed during a search
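The sizing guidance above can be expressed as a small helper. The taper points for larger result counts are illustrative assumptions, not official recommendations; tune them for your own workload:

```python
def suggested_approximate_neighbors(neighbor_count):
    """Suggest an approximateNeighborsCount for a given setNeighborCount.

    Follows the rule of thumb above: roughly 10x for small result counts,
    with a smaller multiplier as neighbor_count grows. The taper thresholds
    are illustrative assumptions, not official recommendations.
    """
    if neighbor_count <= 0:
        raise ValueError("neighbor_count must be positive")
    if neighbor_count <= 50:
        multiplier = 10
    elif neighbor_count <= 200:
        multiplier = 5
    else:
        multiplier = 2
    # Always request strictly more candidates than returned neighbors.
    return max(neighbor_count * multiplier, neighbor_count + 1)

# For example, suggested_approximate_neighbors(10) returns 100.
```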
setNeighborCount
Specifies the number of results that you want the query to return.
The corresponding REST API name for this field is neighbor_count.
Values less than or equal to 300 remain performant in most use cases. For larger values, test for your specific use case.
fractionLeafNodesToSearch
Controls the percentage of leaf nodes to search for each query. This is related to leafNodeEmbeddingCount in that the more embeddings per leaf node, the more data examined per leaf.
The corresponding REST API name for this field is fraction_leaf_nodes_to_search_override.
Increasing the value of fractionLeafNodesToSearch can affect performance in the following ways:
- Recall: Increased
- Latency: Increased
- Availability: No impact
- Cost: Can increase because higher latency occupies more machine resources
Decreasing the value of fractionLeafNodesToSearch can affect performance in the following ways:
- Recall: Decreased
- Latency: Decreased
- Availability: No impact
- Cost: Can decrease because lower latency occupies fewer machine resources
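If your SDK version exposes per-query overrides for these settings, a query-time override might look like the following sketch. The keyword argument names are assumptions based on the REST field names discussed above, so verify them against your installed SDK's reference before relying on them:

```python
def query_with_overrides(endpoint, deployed_index_id, query_vector):
    """Query a deployed index with query-time tuning overrides.

    Sketch only: the override keyword arguments mirror the REST field
    names discussed above and may differ across SDK versions.
    """
    return endpoint.find_neighbors(
        deployed_index_id=deployed_index_id,
        queries=[query_vector],
        num_neighbors=100,                           # setNeighborCount
        approx_num_neighbors=1000,                   # approximateNeighborsCount
        fraction_leaf_nodes_to_search_override=0.05, # fractionLeafNodesToSearch
    )
```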
What's next
- Learn how to Update and rebuild your index
- Learn how to Filter vector matches
- Learn how to Monitor an index

