DatasetCustomMetric

Defines a custom dataset-level aggregation.

Fields
displayName string

Optional. A display name for this custom summary metric. Used to prefix keys in the output summaryMetrics map. If not provided, a default name like "dataset_custom_metric_1", "dataset_custom_metric_2", etc., will be generated based on the order in the repeated field.

aggregationFunction string

Required. The Python code string containing the aggregation function. Expected function signature:

    def aggregate(instances: list[dict[str, Any]]) -> dict[str, float]:

The instances argument is a list of dictionaries, where each dictionary represents a single evaluation result item. The structure of each dictionary corresponds to the fields in the EvaluationResult message.

This includes:
- "request": Contains the original input data and model inputs (from EvaluationResult.EvaluationRequest).
- "candidateResults": Contains the results of any instance-level metrics (from EvaluationResult.CandidateResults).

Example of a single item in the instances list:

    {
      "request": {
        "prompt": {"text": "What is the capital of France?"},
        "goldenResponse": {"text": "Paris"},
        "candidateResponses": [{"candidate": "model-v1", "text": "Paris"}]
      },
      "candidateResults": [
        {"metric": "exactMatch", "score": 1.0},
        {"metric": "bleu", "score": 0.9}
      ]
    }
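As an illustration of the expected signature, the following is a minimal sketch of an aggregation function that averages each instance-level metric across the dataset. It assumes every item follows the example shape shown above ("candidateResults" entries carrying "metric" and "score" keys). The "mean_" key prefix is a hypothetical naming choice, and how the service treats missing fields or which imports are available at execution time is not specified here.

```python
from typing import Any


def aggregate(instances: list[dict[str, Any]]) -> dict[str, float]:
    """Average each instance-level metric score over all instances.

    Assumes each item has a "candidateResults" list whose entries carry
    "metric" and "score" keys, as in the example item; entries missing
    either key are skipped.
    """
    totals: dict[str, float] = {}
    counts: dict[str, int] = {}
    for instance in instances:
        # Treat a missing "candidateResults" field as an empty list.
        for result in instance.get("candidateResults", []):
            name = result.get("metric")
            score = result.get("score")
            if name is None or score is None:
                continue
            totals[name] = totals.get(name, 0.0) + float(score)
            counts[name] = counts.get(name, 0) + 1
    # "mean_" is a hypothetical prefix chosen for this sketch.
    return {f"mean_{name}": totals[name] / counts[name] for name in totals}
```

Returning one key per metric keeps the output a flat map of floats, which matches the declared dict[str, float] return type.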

JSON representation
{
  "displayName": string,
  "aggregationFunction": string
}
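Because aggregationFunction is a string field, the Python source has to be embedded as text in the JSON object. The sketch below shows one way to assemble such an object in Python. The display name "exact_match_accuracy", the "accuracy" output key, and the body of the function are hypothetical, and the enclosing request that carries this message is not shown.

```python
import json
import textwrap

# Hypothetical aggregation source, kept as text so it can be sent as a string.
aggregation_source = textwrap.dedent(
    '''
    def aggregate(instances):
        scores = [
            r["score"]
            for inst in instances
            for r in inst.get("candidateResults", [])
            if r.get("metric") == "exactMatch"
        ]
        return {"accuracy": sum(scores) / len(scores) if scores else 0.0}
    '''
).strip()

# The two fields of DatasetCustomMetric as described in this reference.
dataset_custom_metric = {
    "displayName": "exact_match_accuracy",  # hypothetical name
    "aggregationFunction": aggregation_source,
}

# json.dumps escapes the newlines and quotes inside the source string.
payload = json.dumps(dataset_custom_metric, indent=2)
print(payload)
```

Keeping the function in a dedented multi-line string makes it readable in the client code while leaving the escaping to the JSON serializer.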