Class PairwiseMetric (1.90.0)

```
PairwiseMetric(
    *,
    metric: str,
    metric_prompt_template: typing.Union[
        vertexai.evaluation.metrics.metric_prompt_template.PairwiseMetricPromptTemplate,
        str,
    ],
    baseline_model: typing.Optional[
        typing.Union[
            vertexai.generative_models.GenerativeModel,
            typing.Callable[[str], str],
        ]
    ] = None,
)
```

A Model-based Pairwise Metric.

A model-based evaluation metric that compares two generative models' responses side by side, allowing users to A/B test their generative models and determine which one performs better.

For more details on when to use pairwise metrics, see Evaluation methods and metrics.

Result Details:

* In `EvalResult.summary_metrics`, win rates are computed for both the
  baseline and candidate model. A win rate is the proportion of one model's
  wins out of the total number of comparisons, expressed as a decimal value
  between 0 and 1.

* In `EvalResult.metrics_table`, a pairwise metric produces two
  evaluation results per dataset row:
    * `pairwise_choice`: Indicates whether the candidate model or the
      baseline model performed better, or whether they are equally good.
    * `explanation`: The rationale behind each verdict, generated with
      chain-of-thought reasoning. The explanation helps users scrutinize
      the judgment and build appropriate trust in the decisions.

See the [documentation
page](https://cloud.google.com/vertex-ai/generative-ai/docs/models/determine-eval#understand-results)
for more details on understanding the metric results.
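
For illustration, here is a minimal sketch of inspecting both outputs. It assumes `pairwise_result` comes from the `EvalTask.evaluate()` call in the usage example below, and it assumes per-metric result columns are prefixed with the metric name; that column naming is an assumption, not confirmed by this page.

```
# Hedged sketch: inspecting pairwise results. Assumes `pairwise_result`
# comes from the EvalTask.evaluate() call in the usage example below.
print(pairwise_result.summary_metrics)
# Expected to include win-rate entries for the baseline and candidate
# models, each a decimal between 0 and 1.

table = pairwise_result.metrics_table  # a pandas DataFrame
print(table[[
    "prompt",
    "pairwise_groundedness/pairwise_choice",  # assumed column name
    "pairwise_groundedness/explanation",      # assumed column name
]].head())
```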

Usage Examples:

```
import pandas as pd

from vertexai.evaluation import (
    EvalTask,
    MetricPromptTemplateExamples,
    PairwiseMetric,
)
from vertexai.generative_models import GenerativeModel

baseline_model = GenerativeModel("gemini-1.0-pro")
candidate_model = GenerativeModel("gemini-1.5-pro")

pairwise_groundedness = PairwiseMetric(
    # `metric` is a required keyword-only argument per the signature above.
    metric="pairwise_groundedness",
    metric_prompt_template=MetricPromptTemplateExamples.get_prompt_template(
        "pairwise_groundedness"
    ),
    baseline_model=baseline_model,
)
eval_dataset = pd.DataFrame({
    "prompt": [...],
})
pairwise_task = EvalTask(
    dataset=eval_dataset,
    metrics=[pairwise_groundedness],
    experiment="my-pairwise-experiment",
)
pairwise_result = pairwise_task.evaluate(
    model=candidate_model,
    experiment_run_name="gemini-pairwise-eval-run",
)
```
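
Per the signature, `baseline_model` also accepts a plain `Callable[[str], str]`, which is useful when the baseline responses come from a cached run or a non-Gemini model. A hedged sketch under that reading, where `saved_responses` is a hypothetical prompt-to-response mapping you would populate yourself:

```
# Hypothetical mapping from prompt text to a previously generated
# baseline response; how it is populated is up to you.
saved_responses: dict[str, str] = {}

def baseline_fn(prompt: str) -> str:
    # Assumption: the callable receives the prompt text and must return
    # the baseline response text.
    return saved_responses[prompt]

pairwise_with_callable = PairwiseMetric(
    metric="pairwise_groundedness",
    metric_prompt_template=MetricPromptTemplateExamples.get_prompt_template(
        "pairwise_groundedness"
    ),
    baseline_model=baseline_fn,
)
```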

Methods

PairwiseMetric

```
PairwiseMetric(
    *,
    metric: str,
    metric_prompt_template: typing.Union[
        vertexai.evaluation.metrics.metric_prompt_template.PairwiseMetricPromptTemplate,
        str,
    ],
    baseline_model: typing.Optional[
        typing.Union[
            vertexai.generative_models.GenerativeModel,
            typing.Callable[[str], str],
        ]
    ] = None,
)
```

Initializes a pairwise evaluation metric.
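
Per the signature, `metric_prompt_template` also accepts a raw string instead of a `PairwiseMetricPromptTemplate` instance. A sketch of that usage, reusing `baseline_model` from the example above; the template variable names (`{prompt}`, `{baseline_model_response}`, `{response}`) are assumptions, not confirmed by this page:

```
# Sketch: constructing a PairwiseMetric from a raw template string.
# The {prompt}, {baseline_model_response}, and {response} placeholders
# are assumed names; see PairwiseMetricPromptTemplate for the real ones.
custom_pairwise = PairwiseMetric(
    metric="my_custom_pairwise_metric",
    metric_prompt_template=(
        "Compare the two responses to the user prompt and reply with "
        "which one is better.\n"
        "Prompt: {prompt}\n"
        "Baseline response: {baseline_model_response}\n"
        "Candidate response: {response}\n"
    ),
    baseline_model=baseline_model,
)
```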
