The metric used for running evaluations.
aggregationMetrics[]
enum (AggregationMetric)
Optional. The aggregation metrics to use.
metric_spec
Union type
metric_spec can be only one of the following:
predefinedMetricSpec
object (PredefinedMetricSpec)
The spec for a pre-defined metric.
llmBasedMetricSpec
object (LLMBasedMetricSpec)
Spec for an LLM based metric.
customCodeExecutionSpec
object (CustomCodeExecutionSpec)
Spec for custom code execution metric.
pointwiseMetricSpec
object (PointwiseMetricSpec)
Spec for pointwise metric.
pairwiseMetricSpec
object (PairwiseMetricSpec)
Spec for pairwise metric.
exactMatchSpec
object (ExactMatchSpec)
Spec for exact match metric.
bleuSpec
object (BleuSpec)
Spec for bleu metric.
rougeSpec
object (RougeSpec)
Spec for rouge metric.
| JSON representation |
|---|
{ "aggregationMetrics" : [ enum ( AggregationMetric ) ] , // metric_spec "predefinedMetricSpec" : { object ( PredefinedMetricSpec ) } , "llmBasedMetricSpec" : { object ( LLMBasedMetricSpec ) } , "customCodeExecutionSpec" : { object ( CustomCodeExecutionSpec ) } , "pointwiseMetricSpec" : { object ( PointwiseMetricSpec ) } , "pairwiseMetricSpec" : { object ( PairwiseMetricSpec ) } , "exactMatchSpec" : { object ( ExactMatchSpec ) } , "bleuSpec" : { object ( BleuSpec ) } , "rougeSpec" : { object ( RougeSpec ) } // Union type } |
PredefinedMetricSpec
The spec for a pre-defined metric.
metricSpecName
string
Required. The name of a pre-defined metric, such as "instruction_following_v1" or "text_quality_v1".
metricSpecParameters
object (Struct format)
Optional. The parameters needed to run the pre-defined metric.
| JSON representation |
|---|
{ "metricSpecName" : string , "metricSpecParameters" : { object } } |
LLMBasedMetricSpec
Specification for an LLM based metric.
rubrics_source
Union type
rubrics_source can be only one of the following:
rubricGroupKey
string
Use a pre-defined group of rubrics associated with the input. Refers to a key in the rubricGroups map of EvaluationInstance.
rubricGenerationSpec
object (RubricGenerationSpec)
Dynamically generate rubrics using this specification.
predefinedRubricGenerationSpec
object (PredefinedMetricSpec)
Dynamically generate rubrics using a predefined spec.
metricPromptTemplate
string
Required. Template for the prompt sent to the judge model.
systemInstruction
string
Optional. System instructions for the judge model.
judgeAutoraterConfig
object (AutoraterConfig)
Optional. Configuration for the judge LLM (Autorater).
additionalConfig
object (Struct format)
Optional. Additional configuration for the metric.
| JSON representation |
|---|
{ // rubrics_source "rubricGroupKey" : string , "rubricGenerationSpec" : { object ( RubricGenerationSpec ) } , "predefinedRubricGenerationSpec" : { object ( PredefinedMetricSpec ) } // Union type , "metricPromptTemplate" : string , "systemInstruction" : string , "judgeAutoraterConfig" : { object ( AutoraterConfig ) } , "additionalConfig" : { object } } |
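For illustration, the sketch below fills in an LLMBasedMetricSpec that uses the rubricGenerationSpec branch of the rubrics_source union; the prompt wording, system instruction, and the {response} placeholder are assumptions, as this page does not define the template syntax.

    # Sketch of an LLMBasedMetricSpec with dynamically generated rubrics.
    # Prompt text and the {response} placeholder are assumed, not documented here.
    llm_based_metric_spec = {
        "metricPromptTemplate": (
            "Rate how well the response satisfies the rubrics.\n"
            "Response: {response}"
        ),
        "systemInstruction": "You are a strict but fair evaluator.",
        "rubricGenerationSpec": {
            "rubricContentType": "NL_QUESTION_ANSWER",
        },
    }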
RubricGenerationSpec
Specification for how rubrics should be generated.
promptTemplate
string
Template for the prompt used to generate rubrics. The details should be updated based on the most-recent recipe requirements.
rubricContentType
enum (RubricContentType)
The type of rubric content to be generated.
rubricTypeOntology[]
string
Optional. An optional, pre-defined list of allowed types for generated rubrics. If this field is provided, it implies include_rubric_type should be true, and the generated rubric types should be chosen from this ontology.
modelConfig
object (AutoraterConfig)
Configuration for the model used in rubric generation. Configs including sampling count and base model can be specified here. Flipping is not supported for rubric generation.
| JSON representation |
|---|
{ "promptTemplate" : string , "rubricContentType" : enum ( |
RubricContentType
Specifies the type of rubric content to generate.
| Enums | |
|---|---|
| RUBRIC_CONTENT_TYPE_UNSPECIFIED | The content type to generate is not specified. |
| PROPERTY | Generate rubrics based on properties. |
| NL_QUESTION_ANSWER | Generate rubrics in an NL question answer format. |
| PYTHON_CODE_ASSERTION | Generate rubrics in a unit test format. |
CustomCodeExecutionSpec
Specifies a metric that is populated by evaluating user-defined Python code.
evaluationFunction
string
Required. Python function. The user is expected to define a function with the following signature, e.g. def evaluate(instance: dict[str, Any]) -> float. Include this function signature in the code snippet. instance is the evaluation instance; any fields populated in the instance are available to the function as instance[fieldName].
Example input:
instance = EvaluationInstance(
    response=EvaluationInstance.InstanceData(text="The answer is 4."),
    reference=EvaluationInstance.InstanceData(text="4")
)
Example converted input:
{
    'response': {'text': 'The answer is 4.'},
    'reference': {'text': '4'}
}
Example python function:
def evaluate(instance: dict[str, Any]) -> float:
    if instance['response']['text'] == instance['reference']['text']:
        return 1.0
    return 0.0
| JSON representation |
|---|
{ "evaluationFunction" : string } |
PointwiseMetricSpec
Spec for pointwise metric.
customOutputFormatConfig
object (CustomOutputFormatConfig)
Optional. CustomOutputFormatConfig allows customization of metric output. By default, metrics return a score and explanation. When this config is set, the default output is replaced with either the raw output string or a parsed output based on a user-defined schema. If a custom format is chosen, the score and explanation fields in the corresponding metric result will be empty.
metricPromptTemplate
string
Required. Metric prompt template for pointwise metric.
systemInstruction
string
Optional. System instructions for pointwise metric.
| JSON representation |
|---|
{ "customOutputFormatConfig" : { object ( CustomOutputFormatConfig ) } , "metricPromptTemplate" : string , "systemInstruction" : string } |
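To make the fields concrete, the sketch below shows a PointwiseMetricSpec as a Python dict; the prompt wording and the {response} placeholder are assumptions rather than documented syntax.

    # Sketch of a PointwiseMetricSpec; prompt text and placeholder syntax are
    # assumed for illustration.
    pointwise_metric_spec = {
        "metricPromptTemplate": (
            "Score the response from 1 to 5 for fluency.\n"
            "Response: {response}"
        ),
        "systemInstruction": "Return a score and a short explanation.",
    }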
CustomOutputFormatConfig
Spec for custom output format configuration.
custom_output_format_config
Union type
custom_output_format_config can be only one of the following:
returnRawOutput
boolean
Optional. Whether to return raw output.
| JSON representation |
|---|
{ // custom_output_format_config "returnRawOutput" : boolean // Union type } |
PairwiseMetricSpec
Spec for pairwise metric.
candidateResponseFieldName
string
Optional. The field name of the candidate response.
baselineResponseFieldName
string
Optional. The field name of the baseline response.
customOutputFormatConfig
object (CustomOutputFormatConfig)
Optional. CustomOutputFormatConfig allows customization of metric output. When this config is set, the default output is replaced with the raw output string. If a custom format is chosen, the pairwiseChoice and explanation fields in the corresponding metric result will be empty.
metricPromptTemplate
string
Required. Metric prompt template for pairwise metric.
systemInstruction
string
Optional. System instructions for pairwise metric.
| JSON representation |
|---|
{ "candidateResponseFieldName" : string , "baselineResponseFieldName" : string , "customOutputFormatConfig" : { object ( CustomOutputFormatConfig ) } , "metricPromptTemplate" : string , "systemInstruction" : string } |
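As a sketch, a PairwiseMetricSpec might be populated as below; the response field names and prompt wording are illustrative, not values documented on this page.

    # Sketch of a PairwiseMetricSpec; the response field names are hypothetical.
    pairwise_metric_spec = {
        "candidateResponseFieldName": "candidate_response",  # hypothetical
        "baselineResponseFieldName": "baseline_response",    # hypothetical
        "metricPromptTemplate": "Which response better answers the user prompt?",
        "systemInstruction": "Pick the better response and explain briefly.",
    }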
ExactMatchSpec
This type has no fields.
Spec for exact match metric - returns 1 if prediction and reference exactly matches, otherwise 0.
BleuSpec
Spec for bleu score metric - calculates the precision of n-grams in the prediction as compared to the reference - returns a score ranging between 0 and 1.
useEffectiveOrder
boolean
Optional. Whether to use effective order to compute the bleu score.
| JSON representation |
|---|
{ "useEffectiveOrder" : boolean } |
RougeSpec
Spec for rouge score metric - calculates the recall of n-grams in prediction as compared to reference - returns a score ranging between 0 and 1.
rougeType
string
Optional. Supported rouge types are rougen[1-9], rougeL, and rougeLsum.
useStemmer
boolean
Optional. Whether to use stemmer to compute rouge score.
splitSummaries
boolean
Optional. Whether to split summaries while using rougeLsum.
| JSON representation |
|---|
{ "rougeType" : string , "useStemmer" : boolean , "splitSummaries" : boolean } |

