Get started with media search

You can quickly build a state-of-the-art media search app. Media search enables your audiences to discover content, with Google-quality results.

For general information about Vertex AI Search for media, see Introduction to media search and recommendations.

In this getting-started tutorial, you will use the Movielens dataset to demonstrate how to upload your media content catalog into Vertex AI Search. The Movielens dataset contains a catalog of movies (documents).

After uploading the movie data, you'll create a search app and test it through the preview page.

If you completed the Get started with media recommendations tutorial and you still have the data store (suggested name quickstart-media-data-store), then you can use that data store instead of creating another. In this case, you should begin the tutorial at Create an app for media search.

Estimated time to complete this tutorial: ~1 hour.

Objectives

  • Learn how to import media documents to create a media data store.
  • Create, configure and test a search app.

Before following this tutorial, make sure you have done the steps in Before you begin.




Before you begin

  1. Sign in to your Google Cloud account. If you're new to Google Cloud, create an account to evaluate how our products perform in real-world scenarios. New customers also get $300 in free credits to run, test, and deploy workloads.
  2. In the Google Cloud console, on the project selector page, select or create a Google Cloud project.

    Go to project selector

  3. Make sure that billing is enabled for your Google Cloud project.

  4. Enable the Vertex AI Agent Builder, Cloud Storage, BigQuery APIs.

    Enable the APIs

Prepare the dataset

Note: If you completed the Get started with media recommendations tutorial and you still have the data store (suggested name quickstart-media-data-store), skip to Create an app for media search.

You use the Cloud Shell to import the Movielens dataset and restructure the dataset for Vertex AI Search for media.

Open the Cloud Shell

  1. Open the Google Cloud console.
  2. Select your Google Cloud project.
  3. Take note of the project ID in the Project info card on the dashboard page. You will need the project ID for the following procedures.
  4. Click the Activate Cloud Shell button at the top of the console. A Cloud Shell session opens inside a new frame at the bottom of the Google Cloud console and displays a command-line prompt.

Import the dataset

The Movielens dataset is available in a public Cloud Storage bucket to make it easier to import.

  1. Run the following command, replacing PROJECT_ID with your project ID, to set the default project for the command line:

    gcloud config set project PROJECT_ID
    
  2. Create a BigQuery dataset:

    bq mk movielens
    
  3. Load movies.csv into a new movies BigQuery table:

    bq load --skip_leading_rows=1 movielens.movies \
      gs://cloud-samples-data/gen-app-builder/media-recommendations/movies.csv \
      movieId:integer,title,genres
    
  4. Load ratings.csv into a new ratings BigQuery table:

    bq load --skip_leading_rows=1 movielens.ratings \
      gs://cloud-samples-data/gen-app-builder/media-recommendations/ratings.csv \
      userId:integer,movieId:integer,rating:float,time:timestamp
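The last argument of each bq load command above is an inline schema: a comma-separated list of column:type pairs. The following Python sketch shows how those strings break down into columns. It assumes that a column listed without a type (title and genres here) is loaded as STRING. The helper name parse_inline_schema is hypothetical and is not part of the bq tool.

```python
def parse_inline_schema(schema: str) -> list[tuple[str, str]]:
    """Split a bq-style inline schema into (column, TYPE) pairs."""
    columns = []
    for field in schema.split(","):
        name, _, type_name = field.strip().partition(":")
        # Assumption: a column given without a type is treated as STRING.
        columns.append((name, (type_name or "string").upper()))
    return columns


movies_schema = parse_inline_schema("movieId:integer,title,genres")
ratings_schema = parse_inline_schema(
    "userId:integer,movieId:integer,rating:float,time:timestamp"
)
print(movies_schema)
print(ratings_schema)
```

If a load fails with a type error, comparing the CSV columns against this breakdown is a reasonable first check.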
    

Create BigQuery views

In this step, you will restructure the Movielens dataset so it follows the expected format for media data stores.
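As a rough illustration of the target format, the following Python sketch mirrors, for a single row, what the movies view created in this section computes: a string ID, a fixed schema ID, and a JSON string that holds the title, categories, URI, availability window, and media type. The function name movie_to_document and the sample row are illustrative assumptions. The slice to 128 characters only approximates the SUBSTR call in the view.

```python
import json


def movie_to_document(movie_id: int, title: str, genres: str) -> dict:
    """Mirror the movies view for one row of movies.csv."""
    doc_id = str(movie_id)  # CAST(movieId AS string)
    struct_data = {
        "title": title[:128],  # at most 128 characters, approximating SUBSTR
        "categories": genres.split("|"),  # SPLIT(genres, "|")
        "uri": "http://mytestdomain.movie/content/" + doc_id,
        "available_time": "2023-01-01T00:00:00Z",
        "expire_time": "2033-01-01T00:00:00Z",
        "media_type": "movie",
    }
    return {
        "id": doc_id,
        "schemaId": "default_schema",
        "parentDocumentId": None,
        "jsonData": json.dumps(struct_data),  # TO_JSON_STRING(STRUCT(...))
    }


# Illustrative row in the movieId,title,genres layout.
doc = movie_to_document(
    1, "Toy Story (1995)", "Adventure|Animation|Children|Comedy|Fantasy"
)
print(json.dumps(doc, indent=2))
```

Note that jsonData is a string containing JSON, not a nested object, which matches the TO_JSON_STRING call in the view.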

For this guide, you will create fake view-item user events during the past 90 days from positive ratings (ratings of 4 or higher).

  1. Create a view that converts the movies table into the Document schema:

    bq mk --project_id=PROJECT_ID \
     --use_legacy_sql=false \
     --view '
      WITH t AS (
        SELECT
          CAST(movieId AS string) AS id,
          SUBSTR(title, 0, 128) AS title,
          SPLIT(genres, "|") AS categories
        FROM `PROJECT_ID.movielens.movies`)
      SELECT
        id,
        "default_schema" as schemaId,
        null as parentDocumentId,
        TO_JSON_STRING(STRUCT(
          title as title,
          categories as categories,
          CONCAT("http://mytestdomain.movie/content/", id) as uri,
          "2023-01-01T00:00:00Z" as available_time,
          "2033-01-01T00:00:00Z" as expire_time,
          "movie" as media_type)) as jsonData
      FROM t;' \
      movielens.movies_view
    

    Now the new view has the schema that the Vertex AI Agent Builder API expects.

  2. Go to the BigQuery page in the Google Cloud console.

    Go to BigQuery

  3. In the Explorer pane, expand your project name, expand the movielens dataset, and click movies_view to open the query page for this view.

  4. Go to the Table explorer tab.

  5. In the Generated query pane, click the Copy to query button. The query editor opens.

  6. Click Run to see movie data in the view that you created.

  7. Create fictitious user events from movie ratings by running the following Cloud Shell command:

    bq mk --project_id=PROJECT_ID \
     --use_legacy_sql=false \
     --view '
     WITH t AS (
      SELECT
        MIN(UNIX_SECONDS(time)) AS old_start,
        MAX(UNIX_SECONDS(time)) AS old_end,
        UNIX_SECONDS(TIMESTAMP_SUB(
        CURRENT_TIMESTAMP(), INTERVAL 90 DAY)) AS new_start,
        UNIX_SECONDS(CURRENT_TIMESTAMP()) AS new_end
      FROM `PROJECT_ID.movielens.ratings`)
      SELECT
        CAST(userId AS STRING) AS userPseudoId,
        "view-item" AS eventType,
        FORMAT_TIMESTAMP("%Y-%m-%dT%X%Ez",
        TIMESTAMP_SECONDS(CAST(
          (t.new_start + (UNIX_SECONDS(time) - t.old_start) *
          (t.new_end - t.new_start) / (t.old_end - t.old_start))
        AS int64))) AS eventTime,
        [STRUCT(movieId AS id, null AS name)] AS documents,
      FROM `PROJECT_ID.movielens.ratings`, t
      WHERE rating >= 4;' \
      movielens.user_events
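The eventTime expression in the user events view linearly rescales each original rating timestamp from the dataset's own time range onto the most recent 90 days, and the WHERE clause keeps only ratings of 4 or higher. The following Python sketch reproduces that arithmetic on made-up values. The function name rescale_event_time, the sample ratings, and the fixed "now" are assumptions chosen so the output is reproducible.

```python
from datetime import datetime, timedelta, timezone


def rescale_event_time(ts, old_start, old_end, new_start, new_end):
    """Linearly map ts from [old_start, old_end] onto [new_start, new_end]."""
    scaled = new_start + (ts - old_start) * (new_end - new_start) / (old_end - old_start)
    # The view casts this value to int64; round() is used here as a close
    # stand-in (behavior may differ on exact .5 ties).
    return round(scaled)


# Hypothetical inputs: ratings spanning Unix seconds 0..1000 and a fixed "now".
now = datetime(2024, 1, 1, tzinfo=timezone.utc)
new_end = int(now.timestamp())
new_start = int((now - timedelta(days=90)).timestamp())

ratings = [(0, 5.0), (500, 4.0), (750, 2.5), (1000, 4.5)]  # (time, rating)
events = [
    datetime.fromtimestamp(
        rescale_event_time(ts, 0, 1000, new_start, new_end), tz=timezone.utc
    ).isoformat()
    for ts, rating in ratings
    if rating >= 4  # same filter as the WHERE clause in the view
]
print(events)
```

The oldest rating lands at the start of the 90-day window, the newest lands at the current time, and everything in between keeps its relative position.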
    

Activate Vertex AI Agent Builder

  1. In the Google Cloud console, go to the Agent Builder page.

    Agent Builder

  2. Read and agree to the Terms of Service, then click Continue and activate the API.

Create an app for media search

The procedures in this section guide you through creating and deploying a media search app.

  1. In the Google Cloud console, go to the Agent Builder page.

    Agent Builder

  2. Click Create app.

  3. On the Create app page, select Search.

  4. Under Content, click Media.

  5. In the Your app name field, enter a name for your app, such as quickstart-media-search. Your app ID appears under the engine name.

  6. Click Continue.

  7. If you completed the Get started with media recommendations tutorial and you still have the data store (suggested name quickstart-media-data-store), then select it, click Create, and skip to Preview search.

  8. If you don't have a data store that contains the Movielens dataset, create a new data store and select it:

    1. On the Data Stores page, click Create data store.

    2. Enter a display name for your data store, such as quickstart-media-data-store, and then click Create.

    3. Select the data store you just created, and then click Create to create your app. You will be redirected to the Select a data source page.

Import data

Next, import the movies and user events data that were formatted earlier.

Import documents

  1. If you are not automatically redirected to the Select a data source page:

    • Open the Documents tab.
    • Click Import Data.
  2. On the Select a data source page, select BigQuery.

  3. Enter the name of the movies BigQuery view you created and click Import.

    PROJECT_ID.movielens.movies_view

  4. Wait until all documents have been imported, which should take about 15 minutes. There should be 86537 documents when complete.

    You can check the Activity tab for the import operation status. When the import is complete, the import operation status changes to Completed.

Import user events

  1. Open the Events tab.

  2. Click Import Events.

  3. Select BigQuery.

  4. Enter the name of the user_events BigQuery view you created and click Import.

    PROJECT_ID.movielens.user_events

  5. You can proceed to the next step before the events are imported, but the search results won't yet contain the full dataset.

    You can check the Activity tab for the operation status. The process takes about an hour to complete because you are importing millions of rows.

Preview search

  1. In the navigation menu, click Configurations.

  2. In the Search here box, type the name of a movie, such as "The Lord of the Rings".

  3. Notice that the search results are relevant to the movie title entered.

  4. On this page, you can customize how the search widget displays the search result information. See Configure results for the search widget to learn more.

    For media search apps, you can:

    After making changes, click Save and publish to update the widget.

Deploy the search widget

  1. In the navigation menu, click Integration.

  2. Make sure the Widget tab is selected.

  3. Select JWT or OAuth based as the widget authorization type.

  4. In the Domain field, enter the domain name for the web page where you will put the widget. For example, if you are going to copy the widget to the web page example.com/ai.html, enter example.com as the domain.

  5. Click Add, and then click Save.

  6. Copy the code snippet provided in the Copy the following code to your web application section.

  7. In your codebase, generate an authorization token.

  8. To pass the authorization token to your widget, use the "Set authorization token" code snippet provided in the Copy the following code to your web application section and replace the text <JWT or OAuth token provided by you backend> with your authorization token.

  9. For help integrating the search app into your web app, see the code samples at Get search results.
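If you call the search app from your own backend instead of using the widget, a request might look roughly like the following Python sketch. It only builds the HTTP request and does not send it. The endpoint path (a default_collection collection in the global location, with a default_search serving config), the pageSize field, and the placeholders PROJECT_ID, APP_ID, and ACCESS_TOKEN are assumptions here. Check them against the Get search results samples before relying on them.

```python
import json
import urllib.request


def build_search_request(project_id, app_id, query, access_token, page_size=10):
    """Build (but do not send) a search request for a media search app."""
    # Assumed endpoint layout; verify it against the official samples.
    url = (
        "https://discoveryengine.googleapis.com/v1/"
        f"projects/{project_id}/locations/global/"
        f"collections/default_collection/engines/{app_id}/"
        "servingConfigs/default_search:search"
    )
    body = json.dumps({"query": query, "pageSize": page_size}).encode("utf-8")
    return urllib.request.Request(
        url,
        data=body,
        method="POST",
        headers={
            "Authorization": f"Bearer {access_token}",
            "Content-Type": "application/json",
        },
    )


# Placeholder values; replace them with your own project ID, app ID, and token.
request = build_search_request(
    "PROJECT_ID", "APP_ID", "The Lord of the Rings", "ACCESS_TOKEN"
)
print(request.get_method(), request.full_url)
# To send it with a real token: urllib.request.urlopen(request)
```

Keeping request construction separate from sending makes it easy to inspect the URL and body before any call is made against your project.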

Clean up

To avoid incurring charges to your Google Cloud account for the resources used on this page, follow these steps.

You can reuse the data store you created for media recommendations in the Get started with media recommendations tutorial. Try that tutorial before doing this clean up procedure.

  1. To avoid unnecessary Google Cloud charges, use the Google Cloud console to delete your project if you don't need it.
  2. If you created a new project to learn about Vertex AI Agent Builder and you no longer need the project, delete the project.
  3. If you used an existing Google Cloud project, delete the resources you created to avoid incurring charges to your account. For more information, see Delete an app, Purge data from a data store, and Delete a data store.
  4. Follow the steps in Turn off Vertex AI Agent Builder.
  5. If you created a BigQuery dataset, delete it in Cloud Shell:

    bq rm --recursive --dataset movielens
    

What's next