In a Spanner database, Spanner automatically creates an index for each table's primary key. For example, you don't need to do anything to index the primary key of Singers, because it's automatically indexed for you.
You can also create secondary indexes for other columns. Adding a secondary index on a column makes it more efficient to look up data in that column. For example, if you need to quickly look up an album by title, you should create a secondary index on AlbumTitle, so that Spanner does not need to scan the entire table.
If the lookup in the previous example is done within a read-write transaction,
then the more efficient lookup also avoids holding locks on the entire table,
which allows concurrent inserts and updates to the table for rows outside of the AlbumTitle
lookup range.
In addition to the benefits they bring to lookups, secondary indexes can also help Spanner execute scans more efficiently, enabling index scans rather than full table scans.
Spanner stores the following data in each secondary index:
- All key columns from the base table
- All columns that are included in the index
- All columns specified in the optional STORING clause (GoogleSQL-dialect databases) or INCLUDE clause (PostgreSQL-dialect databases) of the index definition.
Over time, Spanner analyzes your tables to ensure that your secondary indexes are used for the appropriate queries.
Add a secondary index
The most efficient time to add a secondary index is when you create the table. To create a table and its indexes at the same time, send the DDL statements for the new table and the new indexes in a single request to Spanner.
In Spanner, you can also add a new secondary index to an existing table while the database continues to serve traffic. Like any other schema changes in Spanner, adding an index to an existing database does not require taking the database offline and does not lock entire columns or tables.
Whenever a new index is added to an existing table, Spanner automatically backfills, or populates, the index to reflect an up-to-date view of the data being indexed. Spanner manages this backfill process for you, and the process runs in the background using node resources at low priority. Index backfill speed adapts to changing node resources during index creation, and backfilling doesn't significantly affect the performance of the database.
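If you manage schema changes programmatically, you can send the same DDL through a client library and wait for the backfill to finish. The following is a minimal sketch using the Python client library; the instance and database IDs are placeholders.
from google.cloud import spanner

# Placeholder instance and database IDs.
client = spanner.Client()
database = client.instance("test-instance").database("example-db")

# Adding the index is an online schema change: the database keeps serving
# reads and writes while Spanner backfills the index in the background.
operation = database.update_ddl(
    ["CREATE INDEX SingersByFirstLastName ON Singers(FirstName, LastName)"]
)

# result() blocks until the schema change, including the backfill, completes.
operation.result()
print("Added the SingersByFirstLastName index.")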
Index creation can take from several minutes to many hours. Because index creation is a schema update, it is bound by the same performance constraints as any other schema update. The time needed to create a secondary index depends on several factors:
- The size of the dataset
- The compute capacity of the instance
- The load on the instance
To view the progress of an index backfill, see the Check index backfill progress section.
Be aware that using the commit timestamp column as the first part of the secondary index can create hotspots and reduce write performance.
Use the CREATE INDEX
statement to define a secondary index
in your schema. Here are some examples:
To index all Singers
in the database by their first and last name:
GoogleSQL
CREATE INDEX SingersByFirstLastName ON Singers(FirstName, LastName);
PostgreSQL
CREATE INDEX SingersByFirstLastName ON Singers(FirstName, LastName);
To create an index of all Songs in the database by the value of SongName:
GoogleSQL
CREATE INDEX SongsBySongName ON Songs(SongName);
PostgreSQL
CREATE INDEX SongsBySongName ON Songs(SongName);
To index only the songs for a particular singer, use the INTERLEAVE IN clause to interleave the index in the table Singers:
GoogleSQL
CREATE INDEX SongsBySingerSongName ON Songs(SingerId, SongName), INTERLEAVE IN Singers;
PostgreSQL
CREATE INDEX SongsBySingerSongName ON Songs(SingerId, SongName) INTERLEAVE IN Singers;
To index only the songs on a particular album:
GoogleSQL
CREATE INDEX SongsBySingerAlbumSongName ON Songs(SingerId, AlbumId, SongName), INTERLEAVE IN Albums;
PostgreSQL
CREATE INDEX SongsBySingerAlbumSongName ON Songs(SingerId, AlbumId, SongName) INTERLEAVE IN Albums;
To index by descending order of SongName:
GoogleSQL
CREATE INDEX SongsBySingerAlbumSongNameDesc ON Songs(SingerId, AlbumId, SongName DESC), INTERLEAVE IN Albums;
PostgreSQL
CREATE INDEX SongsBySingerAlbumSongNameDesc ON Songs(SingerId, AlbumId, SongName DESC) INTERLEAVE IN Albums;
Note that the previous DESC annotation applies only to SongName. To index by descending order of other index keys, annotate them with DESC as well: SingerId DESC, AlbumId DESC.
Also note that PRIMARY_KEY is a reserved word and cannot be used as the name of an index. It is the name given to the pseudo-index that is created when a table with a PRIMARY KEY specification is created.
For more details and best practices for choosing non-interleaved indexes and interleaved indexes, see Index options and Use an interleaved index on a column whose value monotonically increases or decreases .
Indexes and interleaving
Spanner indexes can be interleaved with other tables in order to colocate index rows with those of another table. Similar to Spanner table interleaving, the primary key columns of the index's parent must be a prefix of the indexed columns, matching in type and sort order. Unlike interleaved tables, column name matching is not required. Each row of an interleaved index is physically stored together with the associated parent row.
For example, consider the following schema:
CREATE TABLE Singers (
  SingerId   INT64 NOT NULL,
  FirstName  STRING(1024),
  LastName   STRING(1024),
  SingerInfo BYTES(MAX)
) PRIMARY KEY (SingerId);
CREATE TABLE Albums (
  SingerId     INT64 NOT NULL,
  AlbumId      INT64 NOT NULL,
  AlbumTitle   STRING(MAX),
  PublisherId  INT64 NOT NULL
) PRIMARY KEY (SingerId, AlbumId),
  INTERLEAVE IN PARENT Singers ON DELETE CASCADE;
CREATE TABLE Songs (
  SingerId     INT64 NOT NULL,
  AlbumId      INT64 NOT NULL,
  TrackId      INT64 NOT NULL,
  PublisherId  INT64 NOT NULL,
  SongName     STRING(MAX)
) PRIMARY KEY (SingerId, AlbumId, TrackId),
  INTERLEAVE IN PARENT Albums ON DELETE CASCADE;
CREATE TABLE Publishers (
  Id             INT64 NOT NULL,
  PublisherName  STRING(MAX)
) PRIMARY KEY (Id);
To index all Singers in the database by their first and last name, you must create an index. Here's how to define the index SingersByFirstLastName:
CREATE INDEX SingersByFirstLastName ON Singers(FirstName, LastName);
If you want to create an index of Songs on (SingerId, AlbumId, SongName), you could do the following:
CREATE INDEX SongsBySingerAlbumSongName ON Songs(SingerId, AlbumId, SongName);
Or you could create an index that is interleaved with an ancestor of Songs, such as the following:
CREATE INDEX SongsBySingerAlbumSongName ON Songs(SingerId, AlbumId, SongName), INTERLEAVE IN Albums;
Further, you could also create an index of Songs on (PublisherId, SingerId, AlbumId, SongName) that's interleaved with a table that isn't an ancestor of Songs, like Publishers. Note that the primary key of the Publishers table (Id) doesn't match the name of the first indexed column (PublisherId) in the following example. This is still allowed because Publishers.Id and Songs.PublisherId share the same type, sort order, and nullability.
CREATE INDEX SongsByPublisherSingerAlbumSongName ON Songs(PublisherId, SingerId, AlbumId, SongName), INTERLEAVE IN Publishers;
Check index backfill progress
Console
- In the Spanner navigation menu, click the Operations tab. The Operations page shows a list of running operations.
- Find the backfill operation in the list. If it's still running, the progress indicator in the End time column shows the percentage of the operation that is complete.
gcloud
Use gcloud spanner operations describe
to check the progress of an operation.
-
Get the operation ID:
gcloud spanner operations list --instance=INSTANCE-NAME \
  --database=DATABASE-NAME --type=DATABASE_UPDATE_DDL
Replace the following:
- INSTANCE-NAME with the Spanner instance name.
- DATABASE-NAME with the name of the database.
Usage notes:
- To limit the list, specify the --filter flag. For example:
  - --filter="metadata.name:example-db" only lists the operations on a specific database.
  - --filter="error:*" only lists the operations that failed.
  For information on filter syntax, see gcloud topic filters. For information on filtering backup operations, see the filter field in ListBackupOperationsRequest.
- The --type flag is not case sensitive.
The output looks similar to the following:
OPERATION_ID     STATEMENTS                                                                                            DONE   @TYPE
_auto_op_123456  CREATE INDEX SingersByFirstLastName ON Singers(FirstName, LastName)                                   False  UpdateDatabaseDdlMetadata
                 CREATE INDEX SongsBySingerAlbumSongName ON Songs(SingerId, AlbumId, SongName), INTERLEAVE IN Albums
_auto_op_234567                                                                                                        True   CreateDatabaseMetadata
- Run gcloud spanner operations describe:
  gcloud spanner operations describe \
    --instance=INSTANCE-NAME \
    --database=DATABASE-NAME \
    projects/PROJECT-NAME/instances/INSTANCE-NAME/databases/DATABASE-NAME/operations/OPERATION-ID
Replace the following:
- INSTANCE-NAME : The Spanner instance name.
- DATABASE-NAME : The Spanner database name.
- PROJECT-NAME : The project name.
- OPERATION-ID : The operation ID of the operation that you want to check.
The progress section in the output shows the percentage of the operation that's complete. The output looks similar to the following:
done: true
...
progress:
- endTime: '2021-01-22T21:58:42.912540Z'
  progressPercent: 100
  startTime: '2021-01-22T21:58:11.053996Z'
- progressPercent: 67
  startTime: '2021-01-22T21:58:11.053996Z'
...
REST v1
Get the operation ID:
gcloud spanner operations list --instance=INSTANCE-NAME \
  --database=DATABASE-NAME --type=DATABASE_UPDATE_DDL
Replace the following:
- INSTANCE-NAME with the Spanner instance name.
- DATABASE-NAME with the name of the database.
Before using any of the request data, make the following replacements:
- PROJECT-ID : the project ID.
- INSTANCE-ID : the instance ID.
- DATABASE-ID : the database ID.
- OPERATION-ID : the operation ID.
HTTP method and URL:
GET https://spanner.googleapis.com/v1/projects/PROJECT-ID/instances/INSTANCE-ID/databases/DATABASE-ID/operations/OPERATION-ID
Send the request. You should receive a JSON response similar to the following:
{
  ...
  "progress": [
    {
      "progressPercent": 100,
      "startTime": "2023-05-27T00:52:27.366688Z",
      "endTime": "2023-05-27T00:52:30.184845Z"
    },
    {
      "progressPercent": 100,
      "startTime": "2023-05-27T00:52:30.184845Z",
      "endTime": "2023-05-27T00:52:40.750959Z"
    }
  ],
  ...
  "done": true,
  "response": {
    "@type": "type.googleapis.com/google.protobuf.Empty"
  }
}
For gcloud and REST, you can find the progress of each index backfill statement in the progress section. For each statement in the statements array, there is a corresponding entry in the progress array, and the order of the progress array corresponds to the order of the statements array. Once available, the startTime, progressPercent, and endTime fields are populated accordingly.
Note that the output doesn't show an estimated time for when the backfill
progress will complete.
If the operation takes too long, you can cancel it. For more information, see Cancel index creation .
Scenarios when viewing index backfill progress
There are different scenarios that you can encounter when trying to check the progress of an index backfill. Index creation statements that require an index backfill are part of schema update operations, and there can be several statements that are part of a schema update operation.
The first scenario is the simplest, which is when the index creation statement
is the first statement in the schema update operation. Since the index creation
statement is the first statement, it is the first one processed and executed due
to the order of execution
.
Immediately, the startTime
field of the index creation statement will
populate with the start time of the schema update operation. Next, the index
creation statement's progressPercent
field is populated when the progress of
the index backfill is over 0%. Finally, the endTime
field is populated once
the statement is committed.
The second scenario is when the index creation statement is not the first
statement in the schema update operation. No fields related to the index
creation statement will populate until the previous statement(s) have been
committed due to the order of execution
.
Similar to the previous scenario, once the previous statements are committed, the startTime
field of the index creation statement populates first, followed by
the progressPercent
field. Lastly, the endTime
field populates once the
statement finishes committing.
Cancel index creation
You can use the Google Cloud CLI to cancel index creation. To retrieve a list of
schema-update operations for a Spanner database, use the gcloud spanner operations list
command, and include
the --filter
option:
gcloud spanner operations list \
  --instance=INSTANCE \
  --database=DATABASE \
  --filter="@TYPE:UpdateDatabaseDdlMetadata"
Find the OPERATION_ID
for the operation you want to cancel, then use the gcloud spanner operations cancel
command to cancel
it:
gcloud spanner operations cancel OPERATION_ID \
  --instance=INSTANCE \
  --database=DATABASE
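If you started the index creation from a client library and still hold the returned long-running operation, you can also cancel it in code. The following is a minimal sketch using the Python client library, assuming operation is the value returned earlier by database.update_ddl:
# Assumes `operation` is the long-running operation returned by
# database.update_ddl(["CREATE INDEX ..."]).
if not operation.done():
    # Requests cancellation of the schema change; the partially built
    # index is not retained.
    operation.cancel()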
View existing indexes
To view information about existing indexes in a database, you can use the Google Cloud console or the Google Cloud CLI:
Console
- Go to the Spanner Instances page in the Google Cloud console.
- Click the name of the instance you want to view.
- In the left pane, click the database you want to view, then click the table you want to view.
- Click the Indexes tab. The Google Cloud console shows a list of indexes.
- Optional: To get details about an index, such as the columns that it includes, click the name of the index.
gcloud
Use the gcloud spanner databases ddl describe
command:
gcloud spanner databases ddl describe DATABASE \
  --instance=INSTANCE
The gcloud CLI prints the Data Definition Language (DDL)
statements to create the database's tables and indexes. The CREATE
INDEX
statements describe the existing indexes. For
example:
--- |-
  CREATE TABLE Singers (
    SingerId INT64 NOT NULL,
    FirstName STRING(1024),
    LastName STRING(1024),
    SingerInfo BYTES(MAX),
  ) PRIMARY KEY(SingerId)
---
  CREATE INDEX SingersByFirstLastName ON Singers(FirstName, LastName)
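You can also list indexes from a client by querying INFORMATION_SCHEMA. The following is a minimal sketch using the Python client library against a GoogleSQL-dialect database; the instance and database IDs are placeholders.
from google.cloud import spanner

client = spanner.Client()
database = client.instance("test-instance").database("example-db")

# INFORMATION_SCHEMA.INDEXES describes every index in the database,
# including the PRIMARY_KEY pseudo-indexes created automatically.
with database.snapshot() as snapshot:
    rows = snapshot.execute_sql(
        """
        SELECT table_name, index_name, index_type, is_unique, is_null_filtered
        FROM information_schema.indexes
        WHERE table_schema = ''
        """
    )
    for row in rows:
        print(row)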
Query with a specific index
The following sections explain how to specify an index in a SQL statement and
with the read interface for Spanner. The examples in these sections
assume that you added a MarketingBudget
column to the Albums
table and
created an index called AlbumsByAlbumTitle
:
GoogleSQL
CREATE TABLE Albums (
  SingerId         INT64 NOT NULL,
  AlbumId          INT64 NOT NULL,
  AlbumTitle       STRING(MAX),
  MarketingBudget  INT64,
) PRIMARY KEY (SingerId, AlbumId),
  INTERLEAVE IN PARENT Singers ON DELETE CASCADE;

CREATE INDEX AlbumsByAlbumTitle ON Albums(AlbumTitle);
PostgreSQL
CREATE TABLE Albums (
  SingerId         BIGINT NOT NULL,
  AlbumId          BIGINT NOT NULL,
  AlbumTitle       VARCHAR,
  MarketingBudget  BIGINT,
  PRIMARY KEY (SingerId, AlbumId)
) INTERLEAVE IN PARENT Singers ON DELETE CASCADE;

CREATE INDEX AlbumsByAlbumTitle ON Albums(AlbumTitle);
Specify an index in a SQL statement
When you use SQL to query a Spanner table, Spanner automatically
uses any indexes that are likely to make the query more efficient. As a result,
you don't need to specify an index for SQL queries. However,
for queries that are critical for your workload, Google advises you to use FORCE_INDEX
directives in your SQL statements for more consistent performance.
In a few cases, Spanner might choose an index that causes query latency to increase. If you've followed the troubleshooting steps for performance regressions and confirmed that it makes sense to try a different index for the query, you can specify the index as part of your query.
To specify an index in a SQL statement, use the FORCE_INDEX
hint to provide an index directive
. Index directives use the following syntax:
GoogleSQL
FROM MyTable@{FORCE_INDEX=MyTableIndex}
PostgreSQL
FROM MyTable /*@ FORCE_INDEX = MyTableIndex */
You can also use an index directive to tell Spanner to scan the base table instead of using an index:
GoogleSQL
FROM MyTable@{FORCE_INDEX=_BASE_TABLE}
PostgreSQL
FROM MyTable /*@ FORCE_INDEX = _BASE_TABLE */
You can use an index directive to tell Spanner to scan an index in a table with named schemas:
GoogleSQL
FROM MyNamedSchema.MyTable@{FORCE_INDEX="MyNamedSchema.MyTableIndex"}
PostgreSQL
FROM MyNamedSchema.MyTable /*@ FORCE_INDEX = MyNamedSchema.MyTableIndex */
The following example shows a SQL query that specifies an index:
GoogleSQL
SELECT AlbumId, AlbumTitle, MarketingBudget
FROM Albums@{FORCE_INDEX=AlbumsByAlbumTitle}
WHERE AlbumTitle >= "Aardvark" AND AlbumTitle < "Goo";
PostgreSQL
SELECT AlbumId, AlbumTitle, MarketingBudget
FROM Albums /*@ FORCE_INDEX = AlbumsByAlbumTitle */
WHERE AlbumTitle >= 'Aardvark' AND AlbumTitle < 'Goo';
An index directive might force Spanner's query processor to read
additional columns that are required by the query but not stored in the index.
The query processor retrieves these columns by joining the index and the base
table. To avoid this extra join, use a STORING
clause (GoogleSQL-dialect databases) or INCLUDE
clause (PostgreSQL-dialect databases)
to
store the additional columns in the index.
In the previous example, the MarketingBudget
column is not
stored in the index, but the SQL query selects this column. As a result,
Spanner must look up the MarketingBudget
column in the base table,
then join it with data from the index, to return the query results.
Spanner raises an error if the index directive has any of the following issues:
- The index does not exist.
- The index is on a different base table.
- The query is missing a required NULL filtering expression for a NULL_FILTERED index.
The following example shows how to write and execute a query that fetches the values of AlbumId, AlbumTitle, and MarketingBudget using the index AlbumsByAlbumTitle. Spanner provides client libraries for C++, C#, Go, Java, Node.js, PHP, Python, and Ruby.
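Here is a minimal sketch using the Python client library; the instance and database IDs are placeholders, and the FORCE_INDEX hint is embedded in the SQL text exactly as shown earlier.
from google.cloud import spanner

client = spanner.Client()
database = client.instance("test-instance").database("example-db")

# The FORCE_INDEX hint directs the query to AlbumsByAlbumTitle; the rest of
# the statement is ordinary SQL with query parameters for the title range.
with database.snapshot() as snapshot:
    results = snapshot.execute_sql(
        "SELECT AlbumId, AlbumTitle, MarketingBudget "
        "FROM Albums@{FORCE_INDEX=AlbumsByAlbumTitle} "
        "WHERE AlbumTitle >= @start_title AND AlbumTitle < @end_title",
        params={"start_title": "Aardvark", "end_title": "Goo"},
        param_types={
            "start_title": spanner.param_types.STRING,
            "end_title": spanner.param_types.STRING,
        },
    )
    for album_id, album_title, marketing_budget in results:
        print(album_id, album_title, marketing_budget)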
Specify an index in the read interface
When you use the read interface to Spanner, and you want Spanner to use an index, you must specify the index. The read interface does not select the index automatically.
In addition, your index must contain all of the data that appears in the query results, excluding columns that are part of the primary key. This restriction exists because the read interface does not support joins between the index and the base table. If you need to include other columns in the query results, you have a few options:
- Use a STORING or INCLUDE clause to store the additional columns in the index.
- Query without including the additional columns, then use the primary keys to send another query that reads the additional columns.
Spanner returns values from the index in ascending sort order by index key. To retrieve values in descending order, complete these steps:
- Annotate the index key with DESC. For example:
  CREATE INDEX AlbumsByAlbumTitle ON Albums(AlbumTitle DESC);
  The DESC annotation applies to a single index key. If the index includes more than one key, and you want the query results to appear in descending order based on all keys, include a DESC annotation for each key.
- If the read specifies a key range, ensure that the key range is also in descending order. In other words, the value of the start key must be greater than the value of the end key.
The following example shows how to retrieve the values of AlbumId and AlbumTitle using the index AlbumsByAlbumTitle:
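A minimal sketch using the Python client library's read call follows; the instance and database IDs are placeholders. Note that the index argument is required here, because the read interface doesn't choose an index for you.
from google.cloud import spanner

client = spanner.Client()
database = client.instance("test-instance").database("example-db")

# Read every row through AlbumsByAlbumTitle instead of the base table's
# primary key. AlbumId is part of the primary key, so it can be returned
# even though it isn't an index key column.
with database.snapshot() as snapshot:
    results = snapshot.read(
        table="Albums",
        columns=("AlbumId", "AlbumTitle"),
        keyset=spanner.KeySet(all_=True),
        index="AlbumsByAlbumTitle",
    )
    for album_id, album_title in results:
        print(album_id, album_title)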
Create an index for index-only scans
Optionally, you can use the STORING
clause (for GoogleSQL-dialect databases) or INCLUDE
clause (for PostgreSQL-dialect databases) to store a copy of a column in the
index. This type of index provides advantages for queries and read calls using
the index, at the cost of using extra storage:
- SQL queries that use the index and select columns stored in the STORING or INCLUDE clause don't require an extra join to the base table.
- read() calls that use the index can read columns stored by the STORING/INCLUDE clause.
For example, suppose you created an alternate version of AlbumsByAlbumTitle that stores a copy of the MarketingBudget column in the index (note the STORING or INCLUDE clause):
GoogleSQL
CREATE INDEX AlbumsByAlbumTitle2 ON Albums(AlbumTitle) STORING (MarketingBudget);
PostgreSQL
CREATE INDEX AlbumsByAlbumTitle2 ON Albums(AlbumTitle) INCLUDE (MarketingBudget);
With the old AlbumsByAlbumTitle
index, Spanner must join the index
with the base table, then retrieve the column from the base table. With the new AlbumsByAlbumTitle2
index, Spanner reads the column directly from the
index, which is more efficient.
If you use the read interface instead of SQL, the new AlbumsByAlbumTitle2 index also lets you read the MarketingBudget column directly:
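A minimal sketch using the Python client library follows; the instance and database IDs are placeholders. Because MarketingBudget is stored in AlbumsByAlbumTitle2, the read is served entirely from the index.
from google.cloud import spanner

client = spanner.Client()
database = client.instance("test-instance").database("example-db")

# MarketingBudget is stored in the index, so no join back to Albums is needed.
with database.snapshot() as snapshot:
    results = snapshot.read(
        table="Albums",
        columns=("AlbumId", "AlbumTitle", "MarketingBudget"),
        keyset=spanner.KeySet(all_=True),
        index="AlbumsByAlbumTitle2",
    )
    for album_id, album_title, marketing_budget in results:
        print(album_id, album_title, marketing_budget)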
Alter an index
You can use the ALTER INDEX statement to add columns to an existing index or to drop columns from it. This updates the column list defined by the STORING clause (GoogleSQL-dialect databases) or INCLUDE clause (PostgreSQL-dialect databases) when you create the index. You can't use this statement to add columns to or drop columns from the index key. For example, instead of creating a new index AlbumsByAlbumTitle2, you can use ALTER INDEX to add a column to AlbumsByAlbumTitle, as shown in the following example:
GoogleSQL
ALTER INDEX AlbumsByAlbumTitle ADD STORED COLUMN MarketingBudget
PostgreSQL
ALTER INDEX AlbumsByAlbumTitle ADD INCLUDE COLUMN MarketingBudget
When you add a new column to an existing index, Spanner uses a background backfilling process. While the backfill is ongoing, the column in the index is not readable, so you might not get the expected performance boost. You can use the gcloud spanner operations command to list the long-running operation and view its status. For more information, see describe operation. You can also use cancel operation to cancel a running operation.
After the backfill is done, Spanner adds the column to the index. As the index grows bigger, this might slow down the queries that use the index.
The following example shows how to drop a column from an index:
GoogleSQL
ALTER INDEX AlbumsByAlbumTitle DROP STORED COLUMN MarketingBudget
PostgreSQL
ALTER INDEX AlbumsByAlbumTitle DROP INCLUDE COLUMN MarketingBudget
Index of NULL values
By default, Spanner indexes NULL values. For example, recall the definition of the index SingersByFirstLastName on the table Singers:
CREATE INDEX SingersByFirstLastName ON Singers(FirstName, LastName);
All rows of Singers
are indexed even if either FirstName
or LastName
, or
both, are NULL
.
When NULL
values are indexed, you can perform efficient SQL queries and reads
over data that includes NULL
values. For example, use this SQL query statement
to find all Singers
with a NULL
FirstName
:
GoogleSQL
SELECT s.SingerId, s.FirstName, s.LastName
FROM Singers@{FORCE_INDEX=SingersByFirstLastName} AS s
WHERE s.FirstName IS NULL;
PostgreSQL
SELECT s.SingerId, s.FirstName, s.LastName
FROM Singers /*@ FORCE_INDEX = SingersByFirstLastName */ AS s
WHERE s.FirstName IS NULL;
Sort order for NULL values
Spanner sorts NULL
as the smallest value for any given type. For a
column in ascending ( ASC
) order, NULL
values sort first. For a column in
descending ( DESC
) order, NULL
values sort last.
Disable indexing of NULL values
GoogleSQL
To disable the indexing of nulls, add the NULL_FILTERED
keyword to the index
definition. NULL_FILTERED
indexes are particularly useful for indexing sparse
columns, where most rows contain a NULL
value. In these cases, the NULL_FILTERED
index can be considerably smaller and more efficient to maintain
than a normal index that includes NULL
values.
Here's an alternate definition of SingersByFirstLastName
that does
not index NULL
values:
CREATE NULL_FILTERED INDEX SingersByFirstLastNameNoNulls ON Singers(FirstName, LastName);
The NULL_FILTERED
keyword applies to all index key columns. You cannot specify NULL
filtering on a per-column basis.
PostgreSQL
To filter out rows with null values in one or more indexed columns, use the WHERE COLUMN
IS NOT NULL
predicate.
Null-filtered indexes are particularly useful for indexing sparse
columns, where most rows contain a NULL
value. In these cases, the
null-filtered index can be considerably smaller and more efficient to maintain
than a normal index that includes NULL
values.
Here's an alternate definition of SingersByFirstLastName
that does
not index NULL
values:
CREATE INDEX SingersByFirstLastNameNoNulls ON Singers(FirstName, LastName)
WHERE FirstName IS NOT NULL AND LastName IS NOT NULL;
Filtering out NULL values prevents Spanner from using the index for some queries. For example, Spanner does not use the index for the following query, because the index omits any Singers rows for which LastName is NULL; as a result, using the index would prevent the query from returning the correct rows:
GoogleSQL
SELECT FirstName, LastName
FROM Singers@{FORCE_INDEX=SingersByFirstLastNameNoNulls}
WHERE FirstName = "John";
PostgreSQL
SELECT FirstName, LastName
FROM Singers /*@ FORCE_INDEX = SingersByFirstLastNameNoNulls */
WHERE FirstName = 'John';
To enable Spanner to use the index, you must rewrite the query so it excludes the rows that are also excluded from the index:
GoogleSQL
SELECT FirstName, LastName
FROM Singers@{FORCE_INDEX=SingersByFirstLastNameNoNulls}
WHERE FirstName = 'John' AND LastName IS NOT NULL;
PostgreSQL
SELECT FirstName, LastName
FROM Singers /*@ FORCE_INDEX = SingersByFirstLastNameNoNulls */
WHERE FirstName = 'John' AND LastName IS NOT NULL;
Index proto fields
Use generated columns
to index
fields in protocol buffers stored in PROTO
columns, as long as the fields
being indexed use the primitive or ENUM
data types.
If you define an index on a protocol message field, you can't modify or remove that field from the proto schema. For more information, see Updates to schemas that contain an index on proto fields .
The following is an example of the Singers
table with a SingerInfo
proto
message column. To define an index on the nationality
field of the PROTO
,
you need to create a stored generated column:
GoogleSQL
CREATE PROTO BUNDLE (
  googlesql.example.SingerInfo,
  googlesql.example.SingerInfo.Residence
);

CREATE TABLE Singers (
  SingerId INT64 NOT NULL,
  ...
  SingerInfo googlesql.example.SingerInfo,
  SingerNationality STRING(MAX) AS (SingerInfo.nationality) STORED
) PRIMARY KEY (SingerId);
The googlesql.example.SingerInfo proto type has the following definition:
GoogleSQL
package googlesql.example;

message SingerInfo {
  optional string nationality = 1;
  repeated Residence residence = 2;

  message Residence {
    required int64 start_year = 1;
    optional int64 end_year = 2;
    optional string city = 3;
    optional string country = 4;
  }
}
Then define an index on the nationality
field of the proto:
GoogleSQL
CREATE INDEX SingersByNationality ON Singers(SingerNationality);
The following SQL query reads data using the previous index:
GoogleSQL
SELECT s.SingerId, s.FirstName
FROM Singers AS s
WHERE s.SingerNationality = "English";
Notes:
- Use an index directive to access indexes on the fields of protocol buffer columns.
- You can't create an index on repeated protocol buffer fields.
Updates to schemas that contain an index on proto fields
If you define an index on a protocol message field, you can't modify or remove that field from the proto schema. This is because after you define the index, type checking is performed every time the schema is updated. Spanner captures the type information for all fields in the path that are used in the index definition.
Unique indexes
Indexes can be declared UNIQUE
. UNIQUE
indexes add a constraint to the
data being indexed that prohibits duplicate entries for a given index key.
This constraint is enforced by Spanner at transaction commit time.
Specifically, any transaction that would cause multiple index entries for the
same key to exist will fail to commit.
If a table already contains data that violates the uniqueness constraint, attempting to create a UNIQUE index on it will fail.
A note about UNIQUE NULL_FILTERED indexes
A UNIQUE NULL_FILTERED
index does not enforce index key uniqueness when at
least one of the index's key parts is NULL.
For example, suppose that you created the following table and index:
GoogleSQL
CREATE TABLE ExampleTable (
  Key1 INT64 NOT NULL,
  Key2 INT64,
  Key3 INT64,
  Col1 INT64,
) PRIMARY KEY (Key1, Key2, Key3);

CREATE UNIQUE NULL_FILTERED INDEX ExampleIndex ON ExampleTable(Key1, Key2, Col1);
PostgreSQL
CREATE TABLE ExampleTable (
  Key1 BIGINT NOT NULL,
  Key2 BIGINT,
  Key3 BIGINT,
  Col1 BIGINT,
  PRIMARY KEY (Key1, Key2, Key3)
);

CREATE UNIQUE INDEX ExampleIndex ON ExampleTable(Key1, Key2, Col1)
WHERE Key1 IS NOT NULL AND Key2 IS NOT NULL AND Col1 IS NOT NULL;
The following two rows in ExampleTable have the same values for the secondary index keys Key1, Key2, and Col1:
1, NULL, 1, 1
1, NULL, 2, 1
Because Key2
is NULL
and the index is null-filtered, the rows won't be
present in the index ExampleIndex
. Because they are not inserted into the
index, the index won't reject them for violating uniqueness on (Key1, Key2,
Col1)
.
If you want the index to enforce the uniqueness of values of the tuple ( Key1
, Key2
, Col1
), then you must annotate Key2
with NOT NULL
in the table
definition or create the index without filtering nulls.
Drop an index
Use the DROP INDEX
statement to drop a secondary index from
your schema.
To drop the index named SingersByFirstLastName:
DROP INDEX SingersByFirstLastName;
Index for faster scanning
When Spanner needs to perform a table scan (rather than an indexed lookup) to fetch values from one or more columns, you can receive faster results if an index exists for those columns, and in the order specified by the query. If you frequently perform queries that require scans, consider creating secondary indexes to help these scans happen more efficiently.
In particular, if you need Spanner to frequently scan a table's primary key or other index in reverse order, then you can increase its efficiency through a secondary index that makes the chosen order explicit.
For example, the following query always returns a fast result, even though
Spanner needs to scan Songs
to find the lowest value of SongId
:
SELECT SongId FROM Songs LIMIT 1;
SongId
is the table's primary key, stored (as with all primary keys)
in ascending order. Spanner can scan that key's index and find
the first result rapidly.
However, without the help of a secondary index, the following query wouldn't
return as quickly, especially if Songs
holds a lot of data:
SELECT SongId FROM Songs ORDER BY SongId DESC LIMIT 1;
Even though SongId
is the table's primary key, Spanner has no
way to fetch the column's highest value without resorting to a full
table scan.
Adding the following index would allow this query to return more quickly:
CREATE INDEX SongIdDesc ON Songs(SongId DESC);
With this index in place, Spanner would use it to return a result for the second query much more quickly.
What's next
- Learn about SQL best practices for Spanner .
- Understand query execution plans for Spanner .
- Find out how to troubleshoot performance regressions in SQL queries .