Learn how to get started with the Gen AI evaluation service using the Google Cloud console.
Before you begin
- Sign in to your Google Cloud account. If you're new to Google Cloud, create an account to evaluate how our products perform in real-world scenarios. New customers also get $300 in free credits to run, test, and deploy workloads.
- In the Google Cloud console, on the project selector page, select or create a Google Cloud project.
- Verify that billing is enabled for your Google Cloud project.
- Make sure that you have the following role or roles on the project: Storage Admin. (This role also lets you create the Cloud Storage output bucket used later; a sketch follows this list.)
Check for the roles
- In the Google Cloud console, go to the IAM page.
- Select the project.
- In the Principal column, find all rows that identify you or a group that you're included in. To learn which groups you're included in, contact your administrator.
- For all rows that specify or include you, check the Role column to see whether the list of roles includes the required roles.
Grant the roles
- In the Google Cloud console, go to the IAM page.
- Select the project.
- Click Grant access.
- In the New principals field, enter your user identifier. This is typically the email address for a Google Account.
- In the Select a role list, select a role.
- To grant additional roles, click Add another role and add each additional role.
- Click Save.
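If you prefer to prepare resources programmatically, the following is a minimal sketch of creating the Cloud Storage bucket that you later point the evaluation's output path to. It assumes the google-cloud-storage Python client library and Application Default Credentials; the project ID, bucket name, and location are placeholders.

```python
# Minimal sketch: create the Cloud Storage bucket used as the evaluation output path.
# Assumes the google-cloud-storage client library and Application Default Credentials.
# "your-project-id", "your-eval-output-bucket", and the location are placeholders.
from google.cloud import storage

client = storage.Client(project="your-project-id")
bucket = client.create_bucket("your-eval-output-bucket", location="us-central1")
print(f"Created bucket: {bucket.name}")
```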
Evaluate your model
To evaluate your model:
- In the Google Cloud console, go to the Gen AI Evaluation page.
- Click New evaluation to open the evaluation page.
- For Define evaluation dataset, select an option:
- Upload file: Click Upload to upload a CSV or JSONL file. The dataset should contain either prompts or records to use in a prompt template and, optionally, model responses, with a maximum of 200 rows. (A sketch of a JSONL file in this shape follows these steps.)
- Generate data: Enter a Prompt template to guide the Gen AI evaluation service in generating a dataset. Variables you define in your prompt template are generated and populated in the dataset. For more information, see Use prompt templates. (A sketch of how template variables map to dataset values follows these steps.)
- Define variables to generate: Specify the variables to generate and a description of each variable to guide generation. If needed, click Add another variable description.
- Enter a Number of samples to generate.
- Click Generate and preview dataset to display a generated dataset based on your prompt template and variables. To adjust the dataset, you can add more details to the variable descriptions and click Re-generate.
Use model logs: Use the snapshot of prompts and responses from the logged traffic of the selected model. You can only use this option if you have request-response logs enabled on a deployed model in Vertex AI. If you just enabled logging, allow time for sufficient samples to accumulate.
- Select the Model and the Region you want to log traffic from. You must have already enabled logging on your selected model and region.
- Enter a Sampling count.
- (Optional) Enable Filter by prompt template to use only logs that match your Prompt template. This can be useful if you use your selected model for a variety of use cases and want to evaluate one specific use case.
- For Define model responses to evaluate, select an option:
- From dataset (only available if you selected Upload file for Define evaluation dataset): If you want to use one of the fields in the uploaded dataset as your response, select a Response column.
- From model (only available if you selected Use model logs for Define evaluation dataset): If you're using model logs as the evaluation dataset, the Gen AI evaluation service uses the model responses from the model logs.
- Call model: Select a model. The Gen AI evaluation service runs the prompts on the selected model and uses the responses for evaluation.
- (Optional) For Auto-generated evaluation metrics, you can Specify custom instructions to guide the rubrics generated from each prompt. For example, Evaluate the dataset on cultural sensitivity to the countries {name}. For more information, see Define your evaluation metrics.
- For Name and output directory, enter the following:
- Evaluation name: Enter a name for your evaluation.
- Output private data path: Enter the name of a Cloud Storage bucket where you want to store your evaluation, or click Browse to choose the bucket.
- Click Evaluate.
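For the Upload file option, the following is a minimal sketch of writing a JSONL evaluation dataset with plain Python. The rows and the field names ("prompt", "response") are illustrative assumptions; use whatever fields you plan to select in the console, and keep the file within the 200-row limit.

```python
import json

# Hypothetical example rows. The field names "prompt" and "response" are
# illustrative; match them to the columns you intend to use in the console.
rows = [
    {"prompt": "Summarize the benefits of unit testing in two sentences.",
     "response": "Unit tests catch regressions early and document expected behavior."},
    {"prompt": "Explain what a load balancer does in one sentence.",
     "response": "A load balancer distributes incoming traffic across multiple backends."},
]

# Write one JSON object per line (JSONL), the format accepted for uploaded datasets.
with open("eval_dataset.jsonl", "w") as f:
    for row in rows:
        f.write(json.dumps(row) + "\n")
```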
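To make the Generate data and custom-instruction steps more concrete, the sketch below shows the general idea of a prompt template whose {variables} are filled in per row. The template text and variable names are made up for illustration and don't reflect the console's exact behavior; the console generates and populates the variable values for you.

```python
# Illustrative only: how a prompt template with {variables} expands per row.
template = "Write a short travel guide for {city} aimed at {audience}."

rows = [
    {"city": "Lisbon", "audience": "families"},
    {"city": "Kyoto", "audience": "solo travelers"},
]

# Each row's values are substituted into the template to form a concrete prompt.
prompts = [template.format(**row) for row in rows]
for p in prompts:
    print(p)
```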
View your evaluation results
To view an evaluation result:
- In the Google Cloud console, go to the Gen AI Evaluation page.
- Click the evaluation name.
- For each prompt in your evaluation dataset, the model's response is displayed along with the evaluation results.
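If you also want to inspect what was written to the Cloud Storage output path, the sketch below simply lists the objects in the bucket using the google-cloud-storage client. The project ID and bucket name are placeholders, and the exact layout of the exported files isn't described here, so treat this only as a starting point for locating them.

```python
# Minimal sketch: list the objects written under the evaluation's output path.
# Assumes the google-cloud-storage client library and Application Default Credentials;
# "your-project-id" and "your-eval-output-bucket" are placeholders.
from google.cloud import storage

client = storage.Client(project="your-project-id")
for blob in client.list_blobs("your-eval-output-bucket"):
    print(blob.name)
```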