This page shows you how to evaluate your generative AI models and applications across a range of use cases using the GenAI Client in the Agent Platform SDK.
Before you begin
- Sign in to your Google Cloud account. If you're new to Google Cloud, create an account to evaluate how our products perform in real-world scenarios. New customers also get $300 in free credits to run, test, and deploy workloads.
- In the Google Cloud console, on the project selector page, select or create a Google Cloud project.
  Roles required to select or create a project:
  - Select a project: Selecting a project doesn't require a specific IAM role. You can select any project that you've been granted a role on.
  - Create a project: To create a project, you need the Project Creator role (roles/resourcemanager.projectCreator), which contains the resourcemanager.projects.create permission. Learn how to grant roles.
- Verify that billing is enabled for your Google Cloud project.
- Install the Agent Platform SDK:

      !pip install google-cloud-aiplatform[evaluation]

- Set up your credentials. If you are running this tutorial in Colaboratory, run the following:

      from google.colab import auth
      auth.authenticate_user()

  For other environments, refer to Authenticate to Agent Platform.
Initialize the GenAI Client
To initialize the GenAI Client, run the following:
    from vertexai import Client

    client = Client(project="YOUR_PROJECT_ID", location="YOUR_LOCATION")

Where:
- YOUR_PROJECT_ID: your Google Cloud project ID.
- YOUR_LOCATION: your cloud region, for example, us-central1.
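
If you prefer not to hard-code the project ID and region, one option is to read them from environment variables before constructing the client. The following is a minimal sketch, not part of the SDK: the helper names `resolve_client_config` and `make_client`, the variable names `GOOGLE_CLOUD_PROJECT` and `GOOGLE_CLOUD_LOCATION`, and the `us-central1` fallback are all assumptions made for illustration. Only the `Client(project=..., location=...)` call comes from this page.

```python
import os


def resolve_client_config(env=None):
    """Return (project, location) read from environment variables.

    GOOGLE_CLOUD_PROJECT and GOOGLE_CLOUD_LOCATION are assumed variable
    names chosen for this sketch; this page does not prescribe them.
    """
    env = os.environ if env is None else env
    project = env.get("GOOGLE_CLOUD_PROJECT", "").strip()
    # Fall back to the example region used on this page (an assumption).
    location = env.get("GOOGLE_CLOUD_LOCATION", "us-central1").strip()
    if not project:
        raise ValueError("Set GOOGLE_CLOUD_PROJECT to your Google Cloud project ID.")
    return project, location


def make_client(env=None):
    """Build the GenAI Client with the constructor shown above."""
    # Imported lazily so resolve_client_config works without the SDK installed.
    from vertexai import Client

    project, location = resolve_client_config(env)
    return Client(project=project, location=location)
```

Calling `make_client()` with no arguments reads from the process environment. Passing a dictionary makes the configuration easy to check in isolation.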
Generate responses
Generate model responses for your dataset using run_inference():
- Prepare your dataset as a Pandas DataFrame:

      import pandas as pd

      eval_df = pd.DataFrame({
          "prompt": [
              "Explain software 'technical debt' using a concise analogy of planting a garden.",
              "Write a Python function to find the nth Fibonacci number using recursion with memoization, but without using any imports.",
              "Write a four-line poem about a lonely robot, where every line must be a question and the word 'and' cannot be used.",
              "A drawer has 10 red socks and 10 blue socks. In complete darkness, what is the minimum number of socks you must pull out to guarantee you have a matching pair?",
              "An AI discovers a cure for a major disease, but the cure is based on private data it analyzed without consent. Should the cure be released? Justify your answer.",
          ]
      })

- Generate model responses using run_inference():

      eval_dataset = client.evals.run_inference(
          model="gemini-2.5-flash",
          src=eval_df,
      )

- Visualize your inference results by calling .show() on the EvaluationDataset object to inspect the model's outputs alongside your original prompts and references:

      eval_dataset.show()

The following image displays the evaluation dataset with prompts and their corresponding generated responses:

Run the evaluation
Run evaluate() to evaluate the model responses:
- Evaluate the model responses using the default GENERAL_QUALITY adaptive rubric-based metric:

      eval_result = client.evals.evaluate(dataset=eval_dataset)

- Visualize your evaluation results by calling .show() on the EvaluationResult object to display summary metrics and detailed results:

      eval_result.show()

The following image displays an evaluation report, which shows summary metrics and detailed results for each prompt-response pair.

Clean up
No Gemini Enterprise Agent Platform resources are created during this tutorial.

