This page provides a full list of the managed rubric-based metrics offered by the Gen AI evaluation service, which you can use through the GenAI Client in the Vertex AI SDK.
For more information about test-driven evaluation, see Define your evaluation metrics.
Overview
The Gen AI evaluation service offers the following managed rubric-based metrics for the test-driven evaluation framework:
- For metrics with adaptive rubrics, most include both per-prompt rubric generation and rubric validation in their workflow. You can run these steps separately if needed; see Run an evaluation for details and the usage sketch after this list.
- For metrics with static rubrics, no per-prompt rubrics are generated. For details about the intended outputs, see Metric details.
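For example, here's a minimal sketch of running one of these metrics through the GenAI Client in the Vertex AI SDK. The project ID, location, and sample data are placeholders, and the client.evals.evaluate call with a pandas DataFrame is an assumption based on the Run an evaluation workflow rather than something defined on this page, so adapt it to your environment:

```python
import pandas as pd

import vertexai
from vertexai import types

# Placeholder project and location; replace with your own values.
client = vertexai.Client(project="your-project-id", location="us-central1")

# Illustrative sample data; real evaluations typically load prompts and
# responses from your own dataset or from a model inference step.
eval_dataset = pd.DataFrame(
    {
        "prompt": ["Explain the difference between a list and a tuple in Python."],
        "response": ["Lists are mutable sequences, while tuples are immutable sequences."],
    }
)

# Run an adaptive rubric-based metric. Rubric generation and validation
# run as part of the evaluation unless you run those steps separately.
eval_result = client.evals.evaluate(
    dataset=eval_dataset,
    metrics=[types.RubricMetric.GENERAL_QUALITY],
)
```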
Each managed rubric-based metric has a version number. The metric uses the latest version by default, but you can pin to a specific version if needed:
```python
from vertexai import types

# Uses the latest version by default.
text_quality_metric = types.RubricMetric.TEXT_QUALITY

# Pins the metric to a specific version.
general_quality_v1 = types.RubricMetric.GENERAL_QUALITY(version='v1')
```
Backward compatibility
For metrics offered as Metric prompt templates, you can still access the pointwise metrics through the GenAI Client in the Vertex AI SDK using the same approach. Pairwise metrics aren't supported by the GenAI Client in the Vertex AI SDK; see Run an evaluation to compare two models in the same evaluation instead.
```python
from vertexai import types

# Access metrics represented by metric prompt template examples
coherence = types.RubricMetric.COHERENCE
fluency = types.RubricMetric.FLUENCY
```
Managed metrics details
This section lists the managed metrics with details such as their SDK reference, required inputs, and expected output:
- General quality
- Text quality
- Instruction following
- Grounding
- Safety
- Multi-turn general quality
- Multi-turn text quality
- Agent final response match
- Agent final response reference free
General quality
- Metric version: general_quality_v1
- SDK reference: types.RubricMetric.GENERAL_QUALITY
- Required inputs: prompt, response, and (optional) rubric_groups
- Output: score, plus rubrics and their corresponding verdicts
Text quality
- Metric version: text_quality_v1
- SDK reference: types.RubricMetric.TEXT_QUALITY
- Required inputs: prompt, response, and (optional) rubric_groups
- Output: score, plus rubrics and their corresponding verdicts
Instruction following
- Metric version: instruction_following_v1
- SDK reference: types.RubricMetric.INSTRUCTION_FOLLOWING
- Required inputs: prompt, response, and (optional) rubric_groups
- Output: score (passing rate), plus rubrics and their corresponding verdicts
Grounding
- Metric version: grounding_v1
- SDK reference: types.RubricMetric.GROUNDING
- Required inputs: prompt, response, and context (shown in the sketch after this list)
- Output: score and explanation. The score ranges from 0 to 1 and represents the rate of claims labeled as supported or no_rad (not requiring factual attribution, such as greetings, questions, or disclaimers) in the response to the input prompt. The explanation contains groupings of sentence, label, reasoning, and excerpt from the context.
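Because grounding also requires a context input, the evaluation dataset needs a context column alongside prompt and response. This sketch reuses the same assumed client setup and evaluate call as the earlier example, with illustrative data:

```python
import pandas as pd

import vertexai
from vertexai import types

# Placeholder project and location; replace with your own values.
client = vertexai.Client(project="your-project-id", location="us-central1")

# Grounding scores how well the response is supported by the provided context.
grounding_dataset = pd.DataFrame(
    {
        "prompt": ["When did the bridge open to traffic?"],
        "response": ["The bridge opened to vehicle traffic in 1937."],
        "context": ["The Golden Gate Bridge opened to vehicle traffic on May 28, 1937."],
    }
)

grounding_result = client.evals.evaluate(
    dataset=grounding_dataset,
    metrics=[types.RubricMetric.GROUNDING],
)
```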
Safety
- Metric version: safety_v1
- Covered policies:
  - PII & Demographic Data
  - Hate Speech
  - Dangerous Content
  - Harassment
  - Sexually Explicit
- SDK reference: types.RubricMetric.SAFETY
- Required inputs: prompt, response
- Output: score and explanation. A score of 0 is unsafe and 1 is safe. The explanation field includes the violated policies.
Multi-turn general quality
- Metric version: multi_turn_general_quality_v1
- SDK reference: types.RubricMetric.MULTI_TURN_GENERAL_QUALITY
- Required inputs: prompt with multi-turn conversations, response, and (optional) rubric_groups
- Output: score, plus rubrics and their corresponding verdicts
Multi-turn text quality
- Metric version: multi_turn_text_quality_v1
- SDK reference: types.RubricMetric.MULTI_TURN_TEXT_QUALITY
- Required inputs: prompt with multi-turn conversations, response, and (optional) rubric_groups
- Output: score, plus rubrics and their corresponding verdicts
Agent final response match
- Metric version: final_response_match_v2
- SDK reference: types.RubricMetric.FINAL_RESPONSE_MATCH
- Required inputs: prompt, response, and reference (see the sketch after this list)
- Output: score, where 1 is a valid response that matches the reference and 0 is an invalid response that does not match the reference.
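As a sketch, a dataset for this metric carries the documented prompt, response, and reference columns, where reference holds the expected final answer. The client setup and evaluate call are the same assumptions as in the earlier examples, and the values are illustrative:

```python
import pandas as pd

import vertexai
from vertexai import types

# Placeholder project and location; replace with your own values.
client = vertexai.Client(project="your-project-id", location="us-central1")

# The reference column holds the expected final answer for each prompt.
agent_dataset = pd.DataFrame(
    {
        "prompt": ["What is the capital of France?"],
        "response": ["The capital of France is Paris."],
        "reference": ["Paris"],
    }
)

# Each row is scored 1 if the response matches the reference, otherwise 0.
match_result = client.evals.evaluate(
    dataset=agent_dataset,
    metrics=[types.RubricMetric.FINAL_RESPONSE_MATCH],
)
```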
Agent final response reference free
- Metric version: final_response_reference_free_v1
- Note: You need to provide rubrics for this metric because it doesn't support auto-generated rubrics.
- SDK reference: types.RubricMetric.FINAL_RESPONSE_REFERENCE_FREE
- Required inputs: prompt, response, and rubric_groups
- Output: score, plus rubrics and their corresponding verdicts