This page shows you how to evaluate your generative AI models and applications across a range of use cases using the GenAI Client in the Vertex AI SDK.
Before you begin
- Sign in to your Google Cloud account. If you're new to Google Cloud, create an account to evaluate how our products perform in real-world scenarios. New customers also get $300 in free credits to run, test, and deploy workloads.
- In the Google Cloud console, on the project selector page, select or create a Google Cloud project.

  Roles required to select or create a project
  - Select a project: Selecting a project doesn't require a specific IAM role; you can select any project that you've been granted a role on.
  - Create a project: To create a project, you need the Project Creator role (`roles/resourcemanager.projectCreator`), which contains the `resourcemanager.projects.create` permission. Learn how to grant roles.
- Verify that billing is enabled for your Google Cloud project.
- Install the Vertex AI SDK for Python:

  ```
  !pip install google-cloud-aiplatform[evaluation]
  ```
- Set up your credentials. If you are running this tutorial in Colaboratory, run the following:

  ```python
  from google.colab import auth
  auth.authenticate_user()
  ```

  For other environments, refer to Authenticate to Vertex AI.
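Before moving on, it can be useful to confirm that the setup steps took effect. The following is a minimal sketch using only the Python standard library. The helper names `installed_version` and `running_in_colab` are hypothetical and are not part of the Vertex AI SDK. The check confirms only that the `google-cloud-aiplatform` distribution is present, not that the `evaluation` extra was installed.

```python
import importlib.util
from importlib import metadata
from typing import Optional


def installed_version(package: str) -> Optional[str]:
    """Return the installed version of `package`, or None if it is not installed."""
    try:
        return metadata.version(package)
    except metadata.PackageNotFoundError:
        return None


def running_in_colab() -> bool:
    """Return True if the `google.colab` module can be located (likely a Colab runtime)."""
    try:
        return importlib.util.find_spec("google.colab") is not None
    except (ImportError, ValueError):
        # Raised when the parent `google` package is missing or has no usable spec.
        return False


if __name__ == "__main__":
    version = installed_version("google-cloud-aiplatform")
    if version is None:
        print("google-cloud-aiplatform is not installed; run the pip command above.")
    else:
        print(f"google-cloud-aiplatform {version} is installed.")

    if running_in_colab():
        print("Colab detected: authenticate with google.colab.auth.authenticate_user().")
    else:
        print("Not in Colab: follow the authentication guide for your environment.")
```

Running the check in a fresh notebook cell after installation helps catch the common case where the kernel has not picked up the newly installed package.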
Generate responses
Generate model responses for your dataset using `run_inference()`:
- Prepare your dataset as a Pandas DataFrame:

  ```python
  import pandas as pd

  eval_df = pd.DataFrame({
      "prompt": [
          "Explain software 'technical debt' using a concise analogy of planting a garden.",
          "Write a Python function to find the nth Fibonacci number using recursion with memoization, but without using any imports.",
          "Write a four-line poem about a lonely robot, where every line must be a question and the word 'and' cannot be used.",
          "A drawer has 10 red socks and 10 blue socks. In complete darkness, what is the minimum number of socks you must pull out to guarantee you have a matching pair?",
          "An AI discovers a cure for a major disease, but the cure is based on private data it analyzed without consent. Should the cure be released? Justify your answer.",
      ]
  })
  ```
- Generate model responses using `run_inference()`:

  ```python
  # `client` is assumed to be an already initialized GenAI Client from the
  # Vertex AI SDK; its setup is not shown on this page.
  eval_dataset = client.evals.run_inference(
      model="gemini-2.5-flash",
      src=eval_df,
  )
  ```
- Visualize your inference results by calling `.show()` on the `EvaluationDataset` object to inspect the model's outputs alongside your original prompts and references:

  ```python
  eval_dataset.show()
  ```
The following image displays the evaluation dataset with prompts and their corresponding generated responses:

Run the evaluation
Run `evaluate()` to evaluate the model responses:
- Evaluate the model responses using the default `GENERAL_QUALITY` adaptive rubric-based metric:

  ```python
  eval_result = client.evals.evaluate(dataset=eval_dataset)
  ```
- Visualize your evaluation results by calling `.show()` on the `EvaluationResult` object to display summary metrics and detailed results:

  ```python
  eval_result.show()
  ```
The following image displays an evaluation report, which shows summary metrics and detailed results for each prompt-response pair.

Clean up
No Vertex AI resources are created during this tutorial.

