PairwiseMetric(
    *,
    metric: str,
    metric_prompt_template: typing.Union[
        vertexai.evaluation.metrics.metric_prompt_template.PairwiseMetricPromptTemplate,
        str,
    ],
    baseline_model: typing.Optional[
        typing.Union[
            vertexai.generative_models.GenerativeModel,
            typing.Callable[[str], str],
        ]
    ] = None
)
A model-based pairwise metric.
A model-based evaluation metric that compares the responses of two generative models side by side, letting users A/B test the models to determine which one performs better.
For more details on when to use pairwise metrics, see Evaluation methods and metrics.
Result Details:
* In `EvalResult.summary_metrics`, win rates for both the baseline and
  candidate model are computed. The win rate is the proportion of one
  model's wins to total attempts, expressed as a decimal value between
  0 and 1.
* In `EvalResult.metrics_table`, a pairwise metric produces two
  evaluation results per dataset row:
    * `pairwise_choice`: Shows whether the candidate model or the
      baseline model performs better, or whether they are equally good.
    * `explanation`: The rationale behind each verdict, produced with
      chain-of-thought reasoning. The explanation helps users scrutinize
      the judgment and builds appropriate trust in the decisions.
See [documentation page](https://cloud.google.com/vertex-ai/generative-ai/docs/models/determine-eval#understand-results)
for more details on understanding the metric results.
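The win-rate definition above can be sketched in a few lines of plain Python. This is an illustration of the arithmetic only, not the SDK's implementation: the `win_rates` helper is hypothetical, and the verdict labels `"CANDIDATE"`, `"BASELINE"`, and `"TIE"` are assumed stand-ins for whatever values appear in the `pairwise_choice` column.

```python
from collections import Counter


def win_rates(choices):
    """Return (candidate_win_rate, baseline_win_rate) for a list of verdicts.

    Hypothetical helper: the labels "CANDIDATE" and "BASELINE" are assumed,
    not taken from the SDK. Ties count as attempts but as a win for neither.
    """
    if not choices:
        raise ValueError("choices must not be empty")
    counts = Counter(choices)
    total = len(choices)
    return counts["CANDIDATE"] / total, counts["BASELINE"] / total


# Four dataset rows: two candidate wins, one baseline win, one tie.
verdicts = ["CANDIDATE", "BASELINE", "CANDIDATE", "TIE"]
candidate_rate, baseline_rate = win_rates(verdicts)
print(candidate_rate, baseline_rate)  # 0.5 0.25
```

Because ties stay in the denominator, the two win rates need not sum to 1.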
Usage Examples:
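```
import pandas as pd
from vertexai.evaluation import EvalTask, MetricPromptTemplateExamples, PairwiseMetric
from vertexai.generative_models import GenerativeModel

# The baseline is fixed on the metric; the candidate is passed to evaluate().
baseline_model = GenerativeModel("gemini-1.0-pro")
candidate_model = GenerativeModel("gemini-1.5-pro")

pairwise_groundedness = PairwiseMetric(
    metric="pairwise_groundedness",  # required keyword-only argument
    metric_prompt_template=MetricPromptTemplateExamples.get_prompt_template(
        "pairwise_groundedness"
    ),
    baseline_model=baseline_model,
)

eval_dataset = pd.DataFrame({
    "prompt": [...],
})

pairwise_task = EvalTask(
    dataset=eval_dataset,
    metrics=[pairwise_groundedness],
    experiment="my-pairwise-experiment",
)

pairwise_result = pairwise_task.evaluate(
    model=candidate_model,
    experiment_run_name="gemini-pairwise-eval-run",
)
```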
Methods
PairwiseMetric
PairwiseMetric(
    *,
    metric: str,
    metric_prompt_template: typing.Union[
        vertexai.evaluation.metrics.metric_prompt_template.PairwiseMetricPromptTemplate,
        str,
    ],
    baseline_model: typing.Optional[
        typing.Union[
            vertexai.generative_models.GenerativeModel,
            typing.Callable[[str], str],
        ]
    ] = None
)
Initializes a pairwise evaluation metric.
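The signature allows `baseline_model` to be a plain `Callable[[str], str]` instead of a `GenerativeModel`. A minimal sketch of such a callable follows, assuming it receives the prompt text and returns the baseline response text. The `make_lookup_baseline` helper and its lookup table are hypothetical, introduced only for illustration.

```python
from typing import Callable, Dict


def make_lookup_baseline(
    responses: Dict[str, str], default: str = ""
) -> Callable[[str], str]:
    """Build a baseline "model" that replays precomputed responses by prompt.

    Hypothetical helper: it only matches the Callable[[str], str] shape
    shown in the signature; it is not part of the SDK.
    """

    def baseline(prompt: str) -> str:
        # Fall back to `default` for prompts with no stored response.
        return responses.get(prompt, default)

    return baseline


baseline_fn = make_lookup_baseline(
    {"What is 2 + 2?": "4"}, default="I don't know."
)
print(baseline_fn("What is 2 + 2?"))
```

A function like this could then be passed as `baseline_model=baseline_fn`, which may be useful when the baseline responses come from a system outside Vertex AI.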