Tutorial: Perform evaluation using the GenAI Client in Agent Platform SDK

This page shows you how to evaluate your generative AI models and applications across a range of use cases using the GenAI Client in Agent Platform SDK.

Before you begin

Sign in to your Google Cloud account. If you're new to Google Cloud, create an account to evaluate how our products perform in real-world scenarios. New customers also get $300 in free credits to run, test, and deploy workloads.
In the Google Cloud console, on the project selector page, select or create a Google Cloud project.
Roles required to select or create a project
- Select a project: Selecting a project doesn't require a specific IAM role; you can select any project that you've been granted a role on.
- Create a project: To create a project, you need the Project Creator role (roles/resourcemanager.projectCreator), which contains the resourcemanager.projects.create permission. Learn how to grant roles.
Note: If you don't plan to keep the resources that you create in this procedure, create a project instead of selecting an existing project. After you finish these steps, you can delete the project, removing all resources associated with the project.

Go to project selector

Verify that billing is enabled for your Google Cloud project.

Install the Agent Platform SDK:

    !pip install google-cloud-aiplatform[evaluation]
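
The `!pip` prefix is notebook syntax; in a terminal, drop the `!`. To confirm that the install succeeded before continuing, a check along the following lines may help. This is a minimal sketch: the helper name `is_installed` is hypothetical, and it assumes the SDK is importable as `vertexai`, the module name used later in this tutorial.

```python
import importlib.util


def is_installed(module_name: str) -> bool:
    """Return True if a top-level module can be located without importing it."""
    return importlib.util.find_spec(module_name) is not None


if __name__ == "__main__":
    # "vertexai" is the module imported later in this tutorial.
    if is_installed("vertexai"):
        print("SDK found.")
    else:
        print("SDK not found; run the pip install command above.")
```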

Set up your credentials. If you are running this tutorial in Colaboratory, run the following:
```
from google.colab import auth

auth.authenticate_user()
```
For other environments, refer to Authenticate to Agent Platform.

Initialize the GenAI Client

To initialize the GenAI Client, run the following:

    from vertexai import Client

    client = Client(project="YOUR_PROJECT_ID", location="YOUR_LOCATION")

Where:

YOUR_PROJECT_ID: your Google Cloud project ID.
YOUR_LOCATION: your cloud region, for example, us-central1.
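
Rather than hard-coding the two placeholders, you might read them from the environment. The following is a sketch under stated assumptions: the helper names `resolve_client_settings` and `make_client` are hypothetical, and the environment variable names `GOOGLE_CLOUD_PROJECT` and `GOOGLE_CLOUD_LOCATION` are a choice made for this example, not something this tutorial prescribes.

```python
import os
from typing import Mapping, Optional


def resolve_client_settings(env: Optional[Mapping[str, str]] = None) -> dict:
    """Read the project ID and location from environment variables.

    The variable names are an assumption of this sketch.
    """
    env = os.environ if env is None else env
    project = env.get("GOOGLE_CLOUD_PROJECT")
    if not project:
        raise ValueError("Set GOOGLE_CLOUD_PROJECT to your project ID.")
    # Fall back to the example region mentioned in this tutorial.
    location = env.get("GOOGLE_CLOUD_LOCATION") or "us-central1"
    return {"project": project, "location": location}


def make_client(env: Optional[Mapping[str, str]] = None):
    """Build the GenAI Client with the same keyword arguments as above.

    The import is deferred so resolve_client_settings can be used
    even where the SDK is not installed.
    """
    from vertexai import Client

    return Client(**resolve_client_settings(env))
```

Failing early on a missing project ID tends to produce a clearer error than letting the first API call fail.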

Generate responses

Generate model responses for your dataset using run_inference():

Prepare your dataset as a Pandas DataFrame:

    import pandas as pd

    eval_df = pd.DataFrame({
        "prompt": [
            "Explain software 'technical debt' using a concise analogy of planting a garden.",
            "Write a Python function to find the nth Fibonacci number using recursion with memoization, but without using any imports.",
            "Write a four-line poem about a lonely robot, where every line must be a question and the word 'and' cannot be used.",
            "A drawer has 10 red socks and 10 blue socks. In complete darkness, what is the minimum number of socks you must pull out to guarantee you have a matching pair?",
            "An AI discovers a cure for a major disease, but the cure is based on private data it analyzed without consent. Should the cure be released? Justify your answer.",
        ]
    })
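
If your prompts come from a file or another system, it may be worth cleaning them before building the DataFrame, so that empty or duplicated prompts don't skew the summary metrics. The following is a minimal sketch; the helper names `clean_prompts` and `build_eval_df` are hypothetical and not part of the SDK.

```python
from typing import Iterable, List


def clean_prompts(prompts: Iterable[str]) -> List[str]:
    """Strip whitespace, then drop empty and duplicate prompts, keeping order."""
    seen = set()
    cleaned = []
    for prompt in prompts:
        text = prompt.strip()
        if text and text not in seen:
            seen.add(text)
            cleaned.append(text)
    return cleaned


def build_eval_df(prompts: Iterable[str]):
    """Build the single-column "prompt" DataFrame used in this tutorial."""
    import pandas as pd  # deferred so clean_prompts works without pandas

    return pd.DataFrame({"prompt": clean_prompts(prompts)})
```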

Generate model responses using run_inference():

    eval_dataset = client.evals.run_inference(
        model="gemini-2.5-flash",
        src=eval_df,
    )

Visualize your inference results by calling .show() on the EvaluationDataset object to inspect the model's outputs alongside your original prompts and references:
```
eval_dataset.show()
```

The following image displays the evaluation dataset with prompts and their corresponding generated responses:

Figure: A table showing an evaluation dataset with columns for prompts and responses.

Run the evaluation

Run evaluate() to evaluate the model responses:

Evaluate the model responses using the default GENERAL_QUALITY adaptive rubric-based metric:

    eval_result = client.evals.evaluate(dataset=eval_dataset)

Visualize your evaluation results by calling .show() on the EvaluationResult object to display summary metrics and detailed results:
```
eval_result.show()
```

The following image displays an evaluation report, which shows summary metrics and detailed results for each prompt-response pair.

Figure: An evaluation report displaying summary metrics alongside detailed results for each prompt-response pair.
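
The inference and evaluation steps above can be combined into one function, which may be convenient when you rerun the same dataset against several models. This is a sketch, not an SDK feature: the function name `run_evaluation` is hypothetical, and it uses only the two calls shown in this tutorial, with the same keyword arguments.

```python
def run_evaluation(client, eval_df, model="gemini-2.5-flash"):
    """Generate responses for eval_df, then evaluate them.

    Returns the inference dataset and the evaluation result so that
    .show() can be called on either one afterwards.
    """
    # Step 1: generate model responses for each prompt.
    eval_dataset = client.evals.run_inference(model=model, src=eval_df)
    # Step 2: score the responses with the default metric.
    eval_result = client.evals.evaluate(dataset=eval_dataset)
    return eval_dataset, eval_result
```

With the `client` and `eval_df` defined earlier, a call would look like `eval_dataset, eval_result = run_evaluation(client, eval_df)`.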

Clean up

No Gemini Enterprise Agent Platform resources are created during this tutorial.

What's next

Define your evaluation metrics.