Deploy a database connector

Warning: The Cloud Search reference connectors are provided "as is" as sample code for use in creating your own working connectors. This sample code requires substantial customization and testing before being used in proof-of-concept or production environments. For production use, we highly recommend obtaining help from one of our Cloud Search partners. For further help finding a suitable Cloud Search partner, contact your Google Account Manager.

You can set up Google Cloud Search to discover and index data from your organization's databases by using the Google Cloud Search database connector.

Important considerations

You can install and run the Cloud Search database connector in almost any environment where Java apps can run, so long as the connector has access to both the internet and the database.

System requirements

Operating system: Windows or Linux

SQL database: Any SQL database with a JDBC 4.0 or later compliant driver, including the following:
  • MS SQL Server (2008, 2012, 2014, 2016)
  • Oracle (11g, 12c)
  • Google Cloud SQL
  • MySQL

Software: JDBC driver for the connector to use to access the database (downloaded and installed separately)

Deploy the connector

The following steps describe how to install the connector and configure it to index the specified databases and return the results to Cloud Search users.

Prerequisites

Before you deploy the Cloud Search database connector, gather the following information:

  • Google Workspace private key, which also contains the service account ID. To learn how to get a private key, go to Configure access to the Google Cloud Search REST API.
  • Google Workspace data source ID. To learn how to get a data source ID, go to Add a data source to search.
Step 1. Download and build the database connector software

    1. Clone the connector repository from GitHub.
       $ git clone https://github.com/google-cloudsearch/database-connector.git
       $ cd database-connector
    2. Check out the desired version of the connector:
       $ git checkout tags/v1-0.0.3
    3. Build the connector.
       $ mvn package
       To skip the tests when building the connector, use mvn package -DskipTests.
    4. Copy the connector zip file to your local installation directory and unzip it:
       $ cp target/google-cloudsearch-database-connector-v1-0.0.3.zip installation-dir
       $ cd installation-dir
       $ unzip google-cloudsearch-database-connector-v1-0.0.3.zip
       $ cd google-cloudsearch-database-connector-v1-0.0.3

    Step 2. Configure the database connector

    1. Create a text file and name it connector-config.properties (the default) or similar. Google recommends that you name configuration files with the .properties or .config extension and keep the file in the same directory as the connector. If you use a different name or path, you must specify the path when you run the connector.
    2. Add parameters as key/value pairs to the file contents. The configuration file must specify the parameters for data source access, database access, a database full traversal SQL statement, a content field title, and column definitions. You can also configure other connector behavior with optional parameters. For example:
       # Required parameters for data source access
       api.sourceId=1234567890abcdef
       api.identitySourceId=0987654321lmnopq
       api.serviceAccountPrivateKeyFile=./PrivateKey.json
       #
       # Required parameters for database access
       db.url=jdbc:mysql://localhost:3306/mysql_test
       db.user=root
       db.password=passw0rd
       #
       # Required full traversal SQL statement parameter
       db.allRecordsSql=select customer_id, first_name, last_name, phone, change_timestamp from address_book
       #
       # Required parameters for column definitions and URL format
       db.allColumns=customer_id, first_name, last_name, phone, change_timestamp
       db.uniqueKeyColumns=customer_id
       url.columns=customer_id
       #
       # Required content field parameter
       contentTemplate.db.title=customer_id
       #
       # Optional parameters to set ACLs to "entire domain" access
       defaultAcl.mode=fallback
       defaultAcl.public=true
       #
       # Optional parameters for schedule traversals
       schedule.traversalIntervalSecs=36000
       schedule.performTraversalOnStart=true
       schedule.incrementalTraversalIntervalSecs=3600

      For detailed descriptions of the database-specific parameters, go to the Configuration parameters reference at the end of this article.

      To learn about the parameters that are common to all Cloud Search connectors, such as metadata configuration, datetime formats, and ACL options, go to Google-supplied connector parameters.

      If applicable, specify properties of the schema object in the traversal SQL query parameters. Usually you can add aliases to the SQL statement. For example, if you have a movie database and the data source schema contains a property definition named "ActorName", a SQL statement could have the form: SELECT …, last_name AS ActorName, … FROM ….
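
      As a minimal sketch (assuming a hypothetical movies table and an ActorName property defined in the data source schema), the alias maps the last_name column to that schema property:

       db.allRecordsSql = SELECT movie_id, title, last_name AS ActorName FROM movies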

    Step 3. Run the database connector

    The following example assumes the required components are located in the local directory on a Linux system.

    To run the connector from the command line, enter the following command:

     java \
       -cp "google-cloudsearch-database-connector-v1-0.0.3.jar:mysql-connector-java-5.1.41-bin.jar" \
       com.google.enterprise.cloudsearch.database.DatabaseFullTraversalConnector \
       [-Dconfig=mysql.config]

    Where:

    • google-cloudsearch-database-connector-v1-0.0.3.jar is the database connector .jar file
    • mysql-connector-java-5.1.41-bin.jar is the JDBC driver being used to access the database
    • mysql.config is a custom-named configuration file. To ensure the connector recognizes your configuration file, specify its path on the command line. Otherwise, the connector uses connector-config.properties in your local directory as the default filename.
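
    For example, a hypothetical invocation that reads a configuration file stored outside the working directory (the path /opt/connectors/mysql.config is illustrative):

     # Assumes the connector .jar, the JDBC driver, and a config file at the illustrative path below.
     java -cp "google-cloudsearch-database-connector-v1-0.0.3.jar:mysql-connector-java-5.1.41-bin.jar" \
       com.google.enterprise.cloudsearch.database.DatabaseFullTraversalConnector \
       -Dconfig=/opt/connectors/mysql.config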

    The connector reports configuration errors as it detects them. Some errors are reported when the connector initializes, such as when a database column is defined as part of the record content (in db.allColumns ), but the column isn't used in the traversal SQL query of the database (in db.allRecordsSql ). Other errors are only detected and reported when the connector attempts to access the database for the first traversal, such as invalid SQL statement syntax.

    Configuration parameters reference

    Data source access parameters

    Setting Parameter
    Data source ID api.sourceId = source-ID

    Required. The Cloud Search source ID that the Google Workspace administrator set up.

    Identity source ID api.identitySourceId = identity-source-ID

    Required to use external users and groups for ACLs. The Cloud Search identity source ID that the Google Workspace administrator set up.

    Service account api.serviceAccountPrivateKeyFile = path-to-private-key

     Required. The path to the Cloud Search service account key file that the Google Workspace administrator created.

    Database access parameters

    Setting Parameter
    Database URL db.url = database-URL

    Required. The full path of the database to be accessed, such as jdbc:mysql://127.0.0.1/dbname .

    Database username and password db.user = username
    db.password = password

    Required. A valid username and password that the connector uses to access the database. This database user must have read access to the relevant records of the database being read.

    JDBC driver db.driverClass = oracle.jdbc.OracleDriver

    Required only if the JDBC 4.0 driver is not already specified in the class path.

    Traversal SQL query parameters

    The connector traverses database records with SQL SELECT queries in the configuration file. You must configure a full traversal query; queries for incremental traversals are optional.

     A full traversal reads every database record configured for indexing. A full traversal is required to index new records for Cloud Search and also to re-index all existing records.

     An incremental traversal reads and re-indexes only newly modified database records and records recently added to the database. Incremental traversals can be more efficient than full traversals. To use incremental traversals, your database must contain timestamp fields to indicate modified records.

     The connector executes these traversals according to the schedules you define in the traversal schedule parameters.

     Setting Parameter
    Full traversal query
    db.allRecordsSql = SELECT column-1 [, column-2 ,...] FROM database-name

    Required. The query run for every full traversal.

    Every column name that the connector will use in any capacity (content, unique ID, ACLs) must be present in this query. The connector performs some preliminary verifications at startup to detect errors and omissions. For this reason, do not use a general " SELECT * FROM … " query.
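
     For example, a minimal sketch (column names are illustrative, taken from the sample configuration above) that keeps db.allColumns and the full traversal query in agreement so the startup verification passes:

      # Every column listed in db.allColumns also appears explicitly in db.allRecordsSql.
      db.allColumns = customer_id, first_name, last_name, change_timestamp
      db.allRecordsSql = SELECT customer_id, first_name, last_name, change_timestamp FROM address_book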

    Full traversal pagination
    db.allRecordsSql.pagination = {none | offset}

    Value can be:

    • none : do not use pagination
    • offset : use pagination by row offset

      To use pagination by offset, the SQL query must have a placeholder question mark ( ? ) for a row offset, starting with zero. In each full traversal, the query is executed repeatedly until no results are returned.

    Incremental traversal query
    db.incrementalUpdateSql = SELECT column-1 [, column-2 ,...] FROM database-name WHERE last_update_time > ?

    Required if you schedule incremental traversals .

    The "?" in the query is a mandatory placeholder for a timestamp value. The connector uses the timestamp to track modifications between incremental traversal SQL queries.

     To track the last update time with a database timestamp column, add the timestamp_column alias to the SQL statement; otherwise, the connector uses the current timestamp of the traversal.

     For the first incremental traversal, the connector uses its own start time. After the first incremental traversal, Cloud Search stores the timestamp so that the connector can access the previous incremental traversal timestamp after a restart.

    Database time zone
    db.timestamp.timezone = America/Los_Angeles

     Specifies the time zone to use for database timestamps, which identify new record additions and newly modified database records. The default is the local time zone where the connector is running.

    Traversal SQL query examples

    • Basic full traversal query that reads every record of interest in an employee database for indexing:
       db.allRecordsSql = SELECT customer_id, first_name, last_name, employee_id, interesting_field \
         FROM employee
    • Specify pagination by offset, and break up a full traversal into multiple queries.

      For SQL Server 2012 or Oracle 12c (standard SQL 2008 syntax):

       db.allRecordsSql = SELECT customer_id, first_name, last_name, employee_id, interesting_field \
         FROM employee \
         ORDER BY customer_id OFFSET ? ROWS FETCH FIRST 1000 ROWS ONLY
       db.allRecordsSql.pagination = offset

      or, for MySQL or Google Cloud SQL:

       db.allRecordsSql = SELECT customer_id, first_name, last_name, employee_id, interesting_field \
         FROM employee \
         ORDER BY customer_id LIMIT 1000 OFFSET ?
       db.allRecordsSql.pagination = offset
    • Full traversal query that applies individual ACLs with aliases:
       db.allRecordsSql = SELECT customer_id, first_name, last_name, employee_id, interesting_field, last_update_time, \
         permitted_readers AS readers_users, \
         denied_readers AS denied_users, \
         permitted_groups AS readers_groups, \
         denied_groups AS denied_groups \
         FROM employee
    • Basic incremental traversal query:
       db.incrementalUpdateSql = SELECT customer_id, first_name, last_name, employee_id, interesting_field, last_update_time \
         FROM employee \
         WHERE last_update_time > ?
    • Incremental traversal query that applies individual ACLs with aliases:
       db.incrementalUpdateSql = SELECT customer_id, first_name, last_name, employee_id, interesting_field, last_update_time, \
         permitted_readers AS readers_users, \
         denied_readers AS denied_users, \
         permitted_groups AS readers_groups, \
         denied_groups AS denied_groups \
         FROM employee \
         WHERE last_update_time > ?
    • Incremental traversal query that uses the database timestamp rather than the current time:
       db.incrementalUpdateSql = SELECT customer_id, first_name, last_name, employee_id, interesting_field, \
         last_update_time AS timestamp_column \
         FROM employee \
         WHERE last_update_time > ?

    Column definition parameters

    The following parameters specify the columns that you use in the traversal statements and to uniquely identify each record.

    Setting Parameter
    All columns db.allColumns = column-1 , column-2 , ... column-N

    Required. Identifies all the columns that are required in a SQL query when accessing the database. The columns defined with this parameter must be explicitly referenced in the queries. Every other column definition parameter is compared against this set of columns.

    Example:

    db.allColumns = customer_id, first_name, last_name, phone, change_timestamp
    Unique key columns db.uniqueKeyColumns = column-1 [, column-2 ]

     Required. Lists either a single database column that contains unique values or a combination of columns whose values together define a unique ID.

    Cloud Search requires every searchable document to have a unique identifier within a data source. You must be able to define a unique ID for each database record from column values. If you run multiple connectors on separate databases but index into a common dataset, make sure that you specify a unique ID across all documents.

    Examples:

    db.uniqueKeyColumns = customer_id
    # or
    db.uniqueKeyColumns = last_name, first_name
    URL link column url.columns = column-1 [, column-2 ]

     Required. Specifies one or more valid, defined column names used to build the URL for a clickable search result. For databases that have no relevant URL associated with each record, a static link can be used for every record.

     However, if the column values do define a valid link for each record, specify the view URL columns and format configuration values.

    URL format url.format = https://www.example.com/{0}

     Defines the format of the view URL. Numbered parameters refer to the columns specified in url.columns, in order, starting with zero.

     If not specified, the default is "{0}".

    Examples follow this table.

    Percent-encoded columns for URL url.columnsToEscape = column-1 [, column-2 ]

     Specifies columns from url.columns whose values will be percent-encoded before they're included in the formatted URL string.

    URL column examples

    To specify the columns used in traversal queries and the format of the view URL:

     • To use a static URL not using any database record values:
       url.format = https://www.example.com
     • To use a single column value that is the view URL:
       url.format = {0}
       url.columns = customer_id
     • To use a single column value that is substituted into the view URL at position {0}:
       url.format = https://www.example.com/customer/id={0}
       url.columns = customer_id
       url.columnsToEscape = customer_id
     • To use multiple column values to build the view URL (columns are order-dependent):
       url.format = {1}/customer={0}
       url.columns = customer_id, linked_url
       url.columnsToEscape = customer_id

    Content fields

    Use the content options to define which record values should be made part of the searchable content.

    Setting Parameter
    Highest-quality search column contentTemplate.db.title = column-name

    Required. The highest-quality column for search indexing and result prioritization.

    Column prioritization for search contentTemplate.db.quality.high = column-1 [, column-2 ...]
    contentTemplate.db.quality.medium = column-1 [, column-2 ...]
    contentTemplate.db.quality.low = column-1 [, column-2 ...]

     Designate content columns (except the column set for contentTemplate.db.title) as high, medium, or low search quality fields. Unspecified columns default to low.
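
     Example (hypothetical column names for a customer table):

     contentTemplate.db.quality.high = first_name, last_name
     contentTemplate.db.quality.medium = phone
     contentTemplate.db.quality.low = change_timestamp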

    Content data columns db.contentColumns = column-1 [, column-2 ...]

    Specify content columns in the database. These are formatted and uploaded to Cloud Search as searchable document content.

     If you don't specify a value, the default is "*", indicating that all columns should be used for content.
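
     Example (hypothetical column names):

     db.contentColumns = customer_id, first_name, last_name, phone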

    Blob column db.blobColumn = column-name

    Specify the name of a single blob column to use for document content instead of a combination of content columns.

    If a blob column is specified, then it is considered an error if content columns are also defined. However, metadata and structured data column definitions are still allowed along with blob columns.
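
     Example (assuming a hypothetical blob column named document_content):

     db.blobColumn = document_content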
