This page provides a full list of the managed rubric-based metrics offered by the Gen AI evaluation service, which you can use through the GenAI Client in the Vertex AI SDK.
For more information about test-driven evaluation, see Define your evaluation metrics.
Overview
The Gen AI evaluation service offers the following managed rubric-based metrics for the test-driven evaluation framework:
- For metrics with adaptive rubrics, most include both per-prompt rubric generation and rubric validation in their workflow. You can run these steps separately if needed; see Run an evaluation for details and the usage sketch after this list.
- For metrics with static rubrics, no per-prompt rubrics are generated. For details about the intended outputs, see Metric details.
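For example, here's a minimal sketch of running one of these metrics through the GenAI Client in the Vertex AI SDK. The project ID, location, and sample data are placeholders, and the client.evals.evaluate call with a pandas DataFrame is an assumption based on the Run an evaluation workflow rather than something defined on this page, so adapt it to your environment:

```python
import pandas as pd

import vertexai
from vertexai import types

# Placeholder project and location; replace with your own values.
client = vertexai.Client(project="your-project-id", location="us-central1")

# Illustrative sample data; real evaluations typically load prompts and
# responses from your own dataset or from a model inference step.
eval_dataset = pd.DataFrame(
    {
        "prompt": ["Explain the difference between a list and a tuple in Python."],
        "response": ["Lists are mutable sequences, while tuples are immutable sequences."],
    }
)

# Run an adaptive rubric-based metric. Rubric generation and validation
# run as part of the evaluation unless you run those steps separately.
eval_result = client.evals.evaluate(
    dataset=eval_dataset,
    metrics=[types.RubricMetric.GENERAL_QUALITY],
)
```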
Each managed rubric-based metric has a version number. The metric uses the latest version by default, but you can pin to a specific version if needed:
```python
from vertexai import types

# Uses the latest version by default.
text_quality_metric = types.RubricMetric.TEXT_QUALITY

# Pins the metric to a specific version.
general_quality_v1 = types.RubricMetric.GENERAL_QUALITY(version='v1')
```
Backward compatibility
For metrics offered as Metric prompt templates, you can still access the pointwise metrics through the GenAI Client in the Vertex AI SDK using the same approach. Pairwise metrics aren't supported by the GenAI Client in the Vertex AI SDK; see Run an evaluation to compare two models in the same evaluation instead.
```python
from vertexai import types

# Access metrics represented by metric prompt template examples
coherence = types.RubricMetric.COHERENCE
fluency = types.RubricMetric.FLUENCY
```
Managed metrics details
This section lists the managed metrics with details such as their SDK reference, required inputs, and expected output:
- General quality
- Text quality
- Instruction following
- Grounding
- Safety
- Multi-turn general quality
- Multi-turn text quality
- Agent final response match
- Agent final response reference free
General quality
- Metric version: general_quality_v1
- SDK reference: types.RubricMetric.GENERAL_QUALITY
- Required inputs: prompt, response, and (optional) rubric_groups
- Output: score, plus rubrics and their corresponding verdicts
Text quality
- Metric version: text_quality_v1
- SDK reference: types.RubricMetric.TEXT_QUALITY
- Required inputs: prompt, response, and (optional) rubric_groups
- Output: score, plus rubrics and their corresponding verdicts
Instruction following
- Metric version: instruction_following_v1
- SDK reference: types.RubricMetric.INSTRUCTION_FOLLOWING
- Required inputs: prompt, response, and (optional) rubric_groups
- Output: score (passing rate), plus rubrics and their corresponding verdicts
Grounding
- Metric version: grounding_v1
- SDK reference: types.RubricMetric.GROUNDING
- Required inputs: prompt, response, and context (shown in the sketch after this list)
- Output: score and explanation. The score ranges from 0 to 1 and represents the rate of claims labeled as supported or no_rad (not requiring factual attribution, such as greetings, questions, or disclaimers) in the response to the input prompt. The explanation contains groupings of sentence, label, reasoning, and excerpt from the context.
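Because grounding also requires a context input, the evaluation dataset needs a context column alongside prompt and response. This sketch reuses the same assumed client setup and evaluate call as the earlier example, with illustrative data:

```python
import pandas as pd

import vertexai
from vertexai import types

# Placeholder project and location; replace with your own values.
client = vertexai.Client(project="your-project-id", location="us-central1")

# Grounding scores how well the response is supported by the provided context.
grounding_dataset = pd.DataFrame(
    {
        "prompt": ["When did the bridge open to traffic?"],
        "response": ["The bridge opened to vehicle traffic in 1937."],
        "context": ["The Golden Gate Bridge opened to vehicle traffic on May 28, 1937."],
    }
)

grounding_result = client.evals.evaluate(
    dataset=grounding_dataset,
    metrics=[types.RubricMetric.GROUNDING],
)
```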
Safety
- Metric version: safety_v1
- Covered policies:
  - PII & Demographic Data
  - Hate Speech
  - Dangerous Content
  - Harassment
  - Sexually Explicit
- SDK reference: types.RubricMetric.SAFETY
- Required inputs: prompt, response
- Output: score and explanation. A score of 0 is unsafe and 1 is safe. The explanation field includes the violated policies.
Multi-turn general quality
- Metric version: multi_turn_general_quality_v1
- SDK reference: types.RubricMetric.MULTI_TURN_GENERAL_QUALITY
- Required inputs: prompt with multi-turn conversations, response, and (optional) rubric_groups
- Output: score, plus rubrics and their corresponding verdicts
Multi-turn text quality
- Metric version: multi_turn_text_quality_v1
- SDK reference: types.RubricMetric.MULTI_TURN_TEXT_QUALITY
- Required inputs: prompt with multi-turn conversations, response, and (optional) rubric_groups
- Output: score, plus rubrics and their corresponding verdicts
Agent final response match
- Metric version: final_response_match_v2
- SDK reference: types.RubricMetric.FINAL_RESPONSE_MATCH
- Required inputs: prompt, response, and reference (see the sketch after this list)
- Output: score, where 1 is a valid response that matches the reference and 0 is an invalid response that does not match the reference.
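As a sketch, a dataset for this metric carries the documented prompt, response, and reference columns, where reference holds the expected final answer. The client setup and evaluate call are the same assumptions as in the earlier examples, and the values are illustrative:

```python
import pandas as pd

import vertexai
from vertexai import types

# Placeholder project and location; replace with your own values.
client = vertexai.Client(project="your-project-id", location="us-central1")

# The reference column holds the expected final answer for each prompt.
agent_dataset = pd.DataFrame(
    {
        "prompt": ["What is the capital of France?"],
        "response": ["The capital of France is Paris."],
        "reference": ["Paris"],
    }
)

# Each row is scored 1 if the response matches the reference, otherwise 0.
match_result = client.evals.evaluate(
    dataset=agent_dataset,
    metrics=[types.RubricMetric.FINAL_RESPONSE_MATCH],
)
```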
Agent final response reference free
- Metric version: final_response_reference_free_v1
- Note: You need to provide rubrics for this metric because it doesn't support auto-generated rubrics.
- SDK reference: types.RubricMetric.FINAL_RESPONSE_REFERENCE_FREE
- Required inputs: prompt, response, and rubric_groups
- Output: score, plus rubrics and their corresponding verdicts