You can generate and manage vector embeddings for an entire table column, which gives you a scalable solution for large-scale vector embedding creation. This solution is particularly useful for facilitating semantic search and Retrieval Augmented Generation (RAG) on text content, including the following:
- Creating initial vector embeddings for a new table
- Generating embeddings after a large data import
- Refreshing embeddings after significant data changes
- Maintaining embeddings incrementally
Understand auto vector embeddings
Auto vector embeddings in AlloyDB provide a scalable way to automate the generation and maintenance of vector embeddings for your data. Instead of manually generating embeddings for each new or updated piece of text, you can configure auto vector embeddings to handle this process for you. This is particularly useful for applications that rely on up-to-date embeddings for semantic search, Retrieval Augmented Generation (RAG), and other AI-powered features.
With auto vector embeddings, you can do the following:
- Initialize embeddings for an entire table: generate embeddings for all existing data in a table column with a single command.
- Keep embeddings in sync: automatically update embeddings when the source data changes, ensuring that your AI applications always work with the most current information.
- Generate embeddings at scale: efficiently create embeddings for large tables with millions of rows.
- Configure and manage embeddings for multiple columns in the same table by calling the management functions for each embedding column.
This feature simplifies the development and maintenance of AI applications by abstracting away the complexity of vector embedding creation and maintenance.
Before you begin
Before you can generate and manage vector embeddings for large tables, do the following:
- Connect to your database using `psql` or AlloyDB for PostgreSQL Studio as the `postgres` user.
- Verify that the `google_ml_integration` extension is installed.
- Verify that the `google_ml_integration.enable_model_support` flag is set to `on`.
- Verify that the `google_ml_integration` extension is version 1.5.6 or higher and that the `google_ml_integration.enable_faster_embedding_generation` flag is set to `on`. To check your extension version, run the following command:

  ```sql
  SELECT extversion FROM pg_extension WHERE extname = 'google_ml_integration';
  ```

  If you need to update the extension, run `ALTER EXTENSION google_ml_integration UPDATE;`.
- Before you can generate embeddings from an AlloyDB database, you must configure AlloyDB to work with Vertex AI. For more information, see Integrate your database with Vertex AI.
- For managing and monitoring auto embedding generation, users have `SELECT` access to the `google_ml.embed_gen_progress` and `google_ml.embed_gen_settings` tables by default. To let a user manage auto embedding generation, grant `INSERT`, `UPDATE`, and `DELETE` permissions on the `google_ml.embed_gen_progress` and `google_ml.embed_gen_settings` tables:

  ```sql
  GRANT INSERT, UPDATE, DELETE ON google_ml.embed_gen_progress TO USER_NAME;
  GRANT INSERT, UPDATE, DELETE ON google_ml.embed_gen_settings TO USER_NAME;
  ```

  Replace the following:
  - USER_NAME: the name of the user for whom the permissions are granted.
- Finish the initial setup described in Generate text embeddings.
- Verify that `AUTOCOMMIT` is set to `ON` in the PostgreSQL client that you use.
- Confirm that the embedding model that you use has sufficient quota for auto vector embeddings. Insufficient quota can result in slow or failed auto embedding operations. For example, the following limits apply to Vertex AI embedding models:
  - Text embedding limits: each request can have up to 250 input texts (generating one embedding per input text) and 20,000 tokens per request. Only the first 2,048 tokens in each input text are used to compute the embeddings.
  - Requests per minute: the relevant quota is "Regional online prediction requests per base model per minute per region per base_model".
    - For the `text-embedding-005` model, the `base_model` dimension in the Google Cloud console is `textembedding-gecko`.
    - For the `gemini-embedding-001` model, the `base_model` dimension is `gemini-embedding`.
  - Gemini Embedding model token limit: unlike other embedding models, which are primarily limited by RPM quotas, the Gemini Embedding model series is limited to 5,000,000 tokens per minute per project. The relevant quota is "Embed content input tokens per minute per region per base_model".
Prepare your table
Before you can generate auto embeddings, you must create a column in your table to store the resulting vectors. This column typically uses the `vector(DIMENSION)` type and must have a `DEFAULT NULL` value.

For example, to add a column for 768-dimension embeddings to a table named `user_reviews`:

```sql
ALTER TABLE user_reviews
ADD COLUMN IF NOT EXISTS content_embeddings vector(768) DEFAULT NULL;
```
Initialize embeddings for a table
The functions for managing auto vector embeddings are available in both the `ai` and `google_ml` schemas. The `ai` schema provides a simplified interface for the latest AI features in AlloyDB.

Use the `ai.initialize_embeddings()` SQL function to generate embeddings for a table's content column. This is a blocking call: the database session waits for the operation to complete before returning a result and allowing you to issue new commands in that session. However, other connections to the database are not blocked and can continue to work with the table:
- If the function returns success, the vector embedding creation is complete.
- The function automatically attempts to recover from transient issues such as model quota errors, and returns a failure only if these recovery attempts are unsuccessful. For persistent problems, such as a misconfigured `batch_size` that causes the request to exceed size limits, or if the operation was canceled manually, you must reissue the call manually.
This function supports Google-provided models, such as Vertex AI's `text-embedding-005`, as well as custom models that you have registered.
Perform batch generation
By default, AlloyDB uses batching to generate embeddings for multiple text inputs in a single request, which improves efficiency. If you don't provide a specific batch size, AlloyDB applies an automatically determined default value.
Hint batch size
The `batch_size` parameter in `ai.initialize_embeddings` lets you guide AlloyDB's query optimizer by suggesting a preferred batch size for directly supported models. AlloyDB might dynamically reduce this size based on model limits or quotas, but your hint helps influence the query execution plan.
```sql
CALL ai.initialize_embeddings(
  model_id => 'text-embedding-005',
  table_name => 'user_reviews',
  content_column => 'content',
  embedding_column => 'content_embeddings',
  batch_size => 50
);
```
Use a custom embedding model with batch support
If you want to use a custom or externally supported model that supports batching, define the batch transform functions and specify them as model_batch_in_transform_fn
and model_batch_out_transform_fn
when you create a model. You can also specify a batch_size
in the initialize_embeddings
call. For models that support batching, we recommend that you use a batch_size
greater than 1 for better performance.
- Define your custom model's input, output, and batch transform functions.

  ```sql
  -- Scalar transform functions
  CREATE OR REPLACE FUNCTION acme_text_input_transform(model_id TEXT, input TEXT) RETURNS JSON;
  CREATE OR REPLACE FUNCTION acme_text_output_transform(model_id TEXT, model_output JSON) RETURNS real[];
  CREATE OR REPLACE FUNCTION acme_generate_headers(model_id TEXT, input TEXT) RETURNS JSON;

  -- Batch transform functions
  CREATE OR REPLACE FUNCTION acme_text_batch_input_transform(model_id TEXT, input TEXT[]) RETURNS JSON;
  CREATE OR REPLACE FUNCTION acme_text_batch_output_transform(model_id TEXT, model_output JSON) RETURNS real[][];
  ```

- To create your model, specify the batch transform functions.

  ```sql
  CALL ai.create_model(
    model_id => 'custom-embedding-model',
    model_request_url => 'https://acme.com/models/text/embeddings/v1',
    model_type => 'text_embedding',
    model_in_transform_fn => 'acme_text_input_transform',
    model_out_transform_fn => 'acme_text_output_transform',
    generate_headers_fn => 'acme_generate_headers',
    model_batch_in_transform_fn => 'acme_text_batch_input_transform',
    model_batch_out_transform_fn => 'acme_text_batch_output_transform'
  );
  ```

- Generate vector embeddings with your custom model.

  ```sql
  CALL ai.initialize_embeddings(
    model_id => 'custom-embedding-model',
    table_name => 'user_reviews',
    content_column => 'content',
    embedding_column => 'content_embeddings',
    batch_size => 10
  );
  ```
You can also use the auto embedding feature with custom models that don't natively support batching. To do this, you must still define the batch transform functions `model_batch_in_transform_fn` and `model_batch_out_transform_fn`. For a non-batching model, define these functions to process a single input at a time from the input array. When you call `ai.initialize_embeddings` for this model, set the `batch_size` to `1`.
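The following sketch shows one way to satisfy this requirement by delegating to scalar transforms like the `acme_*` examples above. The wrapper function names are illustrative, not part of the extension:

```sql
-- Hypothetical wrappers for a model without native batching. With
-- batch_size => 1, the input array always holds exactly one element,
-- so the batch functions can delegate to the scalar transforms.
CREATE OR REPLACE FUNCTION acme_single_batch_input_transform(
  model_id TEXT,
  input TEXT[]
) RETURNS JSON
LANGUAGE SQL
AS $$
  SELECT acme_text_input_transform(model_id, input[1]);
$$;

CREATE OR REPLACE FUNCTION acme_single_batch_output_transform(
  model_id TEXT,
  model_output JSON
) RETURNS real[][]
LANGUAGE SQL
AS $$
  -- Wrap the single embedding in a one-row two-dimensional array.
  SELECT ARRAY[acme_text_output_transform(model_id, model_output)];
$$;
```

Register these wrappers as `model_batch_in_transform_fn` and `model_batch_out_transform_fn` in `ai.create_model`, and call `ai.initialize_embeddings` with `batch_size => 1`.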
Use custom dimensionality with batch support
To use a custom batch input transform function with the `OUTPUT_DIMENSIONALITY` parameter, define a function that specifies the selected dimensions in the request parameters. This is useful for optimizing embedding generation with models that support variable output sizes.

For example, the following function defines a custom batch input transform for a model that requires an output dimensionality of 768:
```sql
CREATE OR REPLACE FUNCTION google_ml.vertexai_text_embedding_batch_input_transform_with_768_dims(
  model_id VARCHAR(100),
  input_list TEXT[]
)
RETURNS JSON
LANGUAGE SQL
AS $$
SELECT pg_catalog.json_build_object(
  'instances', pg_catalog.json_agg(pg_catalog.json_build_object('content', content)),
  'parameters', pg_catalog.json_build_object('outputDimensionality', 768)
)
FROM unnest(input_list) AS content;
$$;
```
Use JSONB optimization with custom models
You can use the `JSONB` data type in your custom output transform functions for improved performance. When you create a custom model with batch support, the `google_ml_integration` extension automatically looks for and uses a `JSONB` variant of your transform functions.

This optimization can significantly improve the performance of auto embedding generation because `JSONB` is a more efficient binary format for storing and processing JSON data in PostgreSQL.

To take advantage of this feature, provide another version of your transform function that accepts a `JSONB` argument instead of `JSON`.
For example, if you have a batch output transform function with the following signature:
```sql
CREATE OR REPLACE FUNCTION my_batch_output_transform(
  model_id TEXT,
  model_output JSON
) RETURNS real[][];
```
You can create a `JSONB` variant like this:

```sql
CREATE OR REPLACE FUNCTION my_batch_output_transform(
  model_id TEXT,
  model_output JSONB
) RETURNS real[][];
```
The extension automatically detects the presence of the `JSONB` version of the function and uses it for batch processing. If you already created a model registration that uses a `JSON` transform function, you don't need to update your `ai.create_model` call. As long as the new `JSONB` function uses the exact same name as the existing `JSON` function, the extension transparently detects and uses the `JSONB` variant for batch processing.
Refresh embeddings incrementally
When you refresh an embedding, it is regenerated based on the latest value in the input content column.
To give you control over consistency and performance, AlloyDB supports multiple modes for incremental embedding refreshes. You can select a mode using the `incremental_refresh_mode` enum argument in `ai.initialize_embeddings()`. The following modes are available:

- `transactional`: embeddings refresh as part of the transaction that updates the content column. This process, which uses a mechanism similar to a database trigger to generate embeddings automatically when the content column is updated, can introduce overhead and slow down update operations. The overhead is a trade-off for maintaining transactional semantics and ensuring that embeddings stay in sync with the content. This mode relies on the scalar transform functions of your model, so you must define `model_in_transform_fn` and `model_out_transform_fn` when you create the model. To use the `transactional` mode, you must have the owner role on the table.

  ```sql
  CALL ai.initialize_embeddings(
    model_id => 'text-embedding-005',
    table_name => 'user_reviews',
    content_column => 'content',
    embedding_column => 'content_embeddings',
    batch_size => 10,
    incremental_refresh_mode => 'transactional'
  );
  ```

  In `transactional` mode, the `ai.refresh_embeddings()` function is disabled because embeddings are kept in sync automatically using triggers. To regenerate embeddings for the entire table, or to recover from an interrupted `ai.initialize_embeddings()` call in this mode, first drop the configuration using the `ai.drop_embedding_config()` function and then reissue the `ai.initialize_embeddings()` call.

  This automated mode is useful for demos, small datasets, or tables where the volume of inserts and updates is low compared to the initial load, for example, a few hundred daily updates on a million-row table. It is appropriate when maintaining immediate data consistency is more important than the additional latency introduced during update operations.

- `manual`: the default mode. In this mode, a new boolean tracking column is added to your table to track stale embeddings. Calling the `ai.refresh_embeddings()` function performs a periodic incremental refresh by generating embeddings only for new or updated rows. We recommend this mode when you need more control over performance, especially when handling a large number of inserts or updates after the initial load. It is useful for scenarios where minimizing write latency is a priority and where temporarily stale or null embeddings are acceptable until a periodic incremental refresh is triggered using `ai.refresh_embeddings()`. To update stale or new rows, use the `ai.refresh_embeddings()` function as described in the Refresh all embeddings for a table section.
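In `manual` mode, you can estimate how much refresh work is pending before triggering a refresh. The following query is a sketch that counts rows whose embeddings were never generated; rows whose content changed after generation are additionally marked by the boolean tracking column that AlloyDB adds:

```sql
-- Sketch: count rows that still lack embeddings before calling
-- ai.refresh_embeddings(). Assumes the user_reviews example table.
SELECT count(*) AS rows_without_embeddings
FROM user_reviews
WHERE content_embeddings IS NULL;
```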
Choose a refresh mode
The following table compares the two incremental refresh modes to help you choose the best approach for your application.
| Mode | Description | Consistency and performance | Best for |
|---|---|---|---|
| `transactional` | Embeddings are updated as part of the database transaction using a mechanism similar to a trigger. `ai.refresh_embeddings()` is disabled in this mode. | Immediate consistency. High write latency overhead, because each `INSERT` or `UPDATE` triggers an embedding generation. | Demos, small datasets, or tables where data consistency is critical. |
| `manual` (default) | Uses a boolean tracking column to track stale or new rows. You must call `ai.refresh_embeddings()` to trigger a periodic incremental refresh. | Eventual consistency. Minimal write latency, because embeddings are generated in bulk at a time of your choosing. | Large datasets, production environments with high write frequency, or performance-sensitive applications. |
Refresh all embeddings for a table
After you successfully run `ai.initialize_embeddings()` for a table in the `manual` incremental refresh mode, you can perform periodic incremental refreshes of your embeddings using `ai.refresh_embeddings`. You can use a refresh operation to update embeddings for rows that were modified concurrently during the initial `initialize_embeddings` call, or to perform a periodic incremental refresh.

If the embedding creation process is interrupted in `manual` mode, for example by `pg_cancel_backend()`, call the `ai.refresh_embeddings()` function to complete the generation for the remaining rows.

The refresh function reuses the settings from the initial call, so you only need to specify the table and embedding column. You can also provide an optional `batch_size` to override the default value.
```sql
CALL ai.refresh_embeddings(
  table_name => 'user_reviews',
  embedding_column => 'content_embeddings',
  batch_size => 50  -- Optional override
);
```
Work with table data during vector embedding creation
While `ai.initialize_embeddings()` is a blocking call for the session that it runs in, other connections can continue to work with the table. The auto vector embedding process updates rows in batches using standard row-level locking. This means that concurrent data manipulation language (DML) operations from other connections, such as `UPDATE` or `DELETE`, are only briefly blocked if they attempt to modify the same rows targeted by the active embedding task. Non-modifying `SELECT` queries are not blocked.
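As an illustration of this locking behavior, assume a hypothetical `review_id` key column on the `user_reviews` example table. While one session runs the initialization, a second session can keep working:

```sql
-- Session A (blocked for the duration of the call in this session only):
-- CALL ai.initialize_embeddings(..., table_name => 'user_reviews', ...);

-- Session B, running concurrently:
SELECT count(*) FROM user_reviews;     -- reads are never blocked

UPDATE user_reviews
SET content = 'revised review text'
WHERE review_id = 42;                  -- hypothetical key column; waits briefly
                                       -- only if this row is in the batch
                                       -- currently being embedded
```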
Delete auto vector embedding settings
If you need to remove the auto vector embedding configuration for a specific table and embedding column combination, use the `ai.drop_embedding_config()` function. This function is useful for cleanup or when you reconfigure embedding management for a column.
```sql
CALL ai.drop_embedding_config(
  table_name => 'user_reviews',
  embedding_column => 'content_embeddings'
);
```
Work with partitioned tables
The auto vector embedding feature supports partitioned tables. This lets you manage embeddings for large, partitioned datasets efficiently. Here are the common use cases for working with partitioned tables.
Initialize embeddings on a partitioned table
You can only initialize embeddings on the root partition of a table. This is a one-time operation for the entire partitioned table.
```sql
CALL ai.initialize_embeddings(
  model_id => 'text-embedding-005',
  table_name => 'documents',  -- This is the root partitioned table
  content_column => 'content',
  embedding_column => 'content_embeddings'
);
```
Refresh embeddings on a partitioned table
After initialization, you can refresh embeddings on any partition, including the root partition, sub-partitions, or individual leaf partitions. For large datasets, you can improve performance by refreshing embeddings for distinct partitions in parallel from different database connections:
- To refresh the entire table, run the following:
  ```sql
  CALL ai.refresh_embeddings(
    table_name => 'documents',  -- This is the root partitioned table
    embedding_column => 'content_embeddings'
  );
  ```
- To refresh a single partition, run the following:
  ```sql
  CALL ai.refresh_embeddings(
    table_name => 'documents_eu',
    embedding_column => 'content_embeddings'
  );
  ```
Refresh embeddings for newly added or attached partitions
The auto-embedding feature supports generating embeddings for partitions that are incorporated into your main table after the initial setup. The specific steps depend on whether you are adding a completely new partition or attaching a pre-existing table.
- Newly added partition: if you add a new partition to the table, you can generate embeddings for it by calling `ai.refresh_embeddings` on the new partition.

  ```sql
  -- Add a new partition
  CREATE TABLE documents_africa PARTITION OF documents FOR VALUES IN ('africa');

  -- Refresh embeddings for the new partition
  CALL ai.refresh_embeddings(
    table_name => 'documents_africa',
    embedding_column => 'content_embeddings'
  );
  ```
- Newly attached partition: to attach an existing table as a partition, first use the `ai.embedding_prepare_partition` procedure to ensure that its schema is compatible with the partitioned table:
  ```sql
  -- Prepare the table to be attached
  CALL ai.embedding_prepare_partition(
    parent_table => 'documents',
    child_table => 'documents_misc'
  );

  -- Attach the partition
  ALTER TABLE documents ATTACH PARTITION documents_misc DEFAULT;

  -- Refresh embeddings for the newly attached partition
  CALL ai.refresh_embeddings(
    table_name => 'documents_misc',
    embedding_column => 'content_embeddings'
  );
  ```
The auto-embedding feature supports adding or attaching partitions at any level of a hierarchical partition setup. The `ai.embedding_prepare_partition` procedure ensures that the schema is compatible with any parent table in the hierarchy:
```sql
-- Prepare a sub-partition for a non-root parent table
CALL ai.embedding_prepare_partition(
  parent_table => 'documents_eu',  -- An existing partition
  child_table => 'documents_eu_germany'
);

-- Attach the new sub-partition
ALTER TABLE documents_eu ATTACH PARTITION documents_eu_germany FOR VALUES IN ('germany');

-- Refresh embeddings for the new sub-partition
CALL ai.refresh_embeddings(
  table_name => 'documents_eu_germany',
  embedding_column => 'content_embeddings'
);
```
Monitor embedding generation progress
You can monitor the real-time status of active `initialize_embeddings` and `refresh_embeddings` calls by querying `ai.embedding_progress_view`. This view provides details about the progress of the operation, including the percentage completed, elapsed time, and estimated time remaining.
To check the progress, run the following query:
```sql
SELECT
  table_name,
  content_column,
  embedding_column,
  model_id,
  percent_progress,
  status,
  elapsed_time,
  rows_processed,
  partition_root
FROM ai.embedding_progress_view;
```
The view provides the following information:
| Column | Description |
|---|---|
| `table_name` | The name of the table or partition being processed. |
| `content_column` | The column containing the source content for the embedding. |
| `embedding_column` | The column where the embeddings are being stored. |
| `model_id` | The model being used for generation. |
| `percent_progress` | The percentage of the operation that is complete. |
| `status` | The current status of the operation (for example, `running` or `success`). |
| `elapsed_time` | The time that has passed since the operation began. |
| `rows_processed` | The number of rows processed so far. |
| `partition_root` | The name of the root partitioned table. |
Examples for generating auto embeddings
This section provides examples of generating auto embeddings using registered model endpoints.
OpenAI embedding model
To generate embeddings using the registered `text-embedding-3-small` model endpoint provided by OpenAI, run the following statement:
```sql
CALL ai.initialize_embeddings(
  model_id => 'text-embedding-3-small',
  table_name => 'user_reviews',
  content_column => 'content',
  embedding_column => 'content_embeddings'
);
```
Custom embedding models
For your own or externally supported models, you must define input and output transform functions and register them with `ai.create_model`. If you plan to use the auto embedding feature, you must specify both scalar transform functions (for example, `acme_text_input_transform` and `acme_text_output_transform`) and batch transform functions (for example, `acme_text_batch_input_transform` and `acme_text_batch_output_transform`).