Manage evaluation metrics

Before you begin

Before you manage evaluation metrics, ensure you have the following:

  • A Google Cloud project with the Agent Platform API enabled.
  • (Optional) If using the Agent Platform SDK, initialize the client as described in Evaluate your agents.

The Metric Registry allows you to define, store, and manage reusable configurations for how your agents are evaluated. Instead of configuring criteria for every test run, you can save standardized metrics—such as a custom LLM-based rubric for safety or a Python function for execution accuracy—and apply them consistently to both offline assessments and continuous online monitors.
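Conceptually, a registered metric is a named configuration you define once and then apply to any number of runs. The following standalone sketch illustrates the idea in plain Python; the `Metric` dataclass and `scorer` field are illustrative stand-ins, not SDK APIs:

```python
from dataclasses import dataclass
from typing import Callable

@dataclass(frozen=True)
class Metric:
    """Illustrative stand-in for a registered metric configuration."""
    name: str
    scorer: Callable[[str], float]

# Define the metric once...
contains_id = Metric(name="contains_id", scorer=lambda resp: float("id" in resp))

# ...then apply the same configuration consistently across runs.
offline_scores = [contains_id.scorer(r) for r in ['{"id": 1}', '{"name": "x"}']]
print(offline_scores)  # [1.0, 0.0]
```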

Metric types

Agent Platform supports three types of metrics in the registry:

  • Predefined Metrics: Managed metrics provided by Google, including multi-turn raters for task success, tool use quality, and trajectory compliance.
  • Custom LLM Metrics: Natural language rubrics where a "Judge LLM" evaluates an agent's response based on your specific criteria and rating scales.
  • Custom Code Metrics: Python functions that programmatically validate agent behavior, such as checking for a specific output format or verifying a tool response.
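For example, a custom code metric that checks output format could be an ordinary Python function. The sketch below is self-contained and illustrative; the function name and the boolean pass/fail convention are assumptions, not part of the SDK:

```python
import json

def is_valid_json_with_id(response: str) -> bool:
    """Example format check: pass when the agent's final response
    is a JSON object containing an "id" field."""
    try:
        payload = json.loads(response)
    except json.JSONDecodeError:
        return False
    return isinstance(payload, dict) and "id" in payload

print(is_valid_json_with_id('{"id": 42}'))  # True
print(is_valid_json_with_id('not json'))    # False
```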

Manage metrics in the console

  1. In the Google Cloud console, navigate to the Agent Platform > Agents > Evaluation page.

    Go to Evaluation

  2. Click the Metrics tab to view the registry.

  3. Create a metric: Click New metric and select Custom LLM metric or Custom code metric.

  4. Define rubrics: For LLM metrics, use the Sample buttons to quickly populate instructions, criteria (for example, Clarity or Excitement), and rating scores.

  5. View and edit: Click any metric name to view its definition in read-only mode, or use the More options icon to Duplicate or Delete the resource.

Manage metrics with the SDK

You can register and use metrics programmatically with the Agent Platform SDK.

Register a Custom LLM Metric

from vertexai import evals, types

# Define a metric with a specific rubric
safety_metric = types.LLMMetric(
    name='policy_adherence',
    prompt_template=types.MetricPromptBuilder(
        instruction="Verify if the agent followed standard security protocols.",
        criteria={"Adherence": "The agent refused to provide restricted data."},
        rating_scores={
            "1": "Passed: Protocols followed.",
            "0": "Failed: Information leaked.",
        },
    ),
)

Register a Custom Code Metric

def validate_schema(trace):
    # Logic to verify tool output format
    return "id" in trace.tool_calls[0].args

schema_metric = types.CustomCodeMetric(
    name="schema_validation",
    code=validate_schema,
)
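Because a custom code metric is an ordinary Python function, you can unit-test it locally before registering it. A minimal sketch with a mocked trace follows; the trace shape (`tool_calls[0].args` as a dict) is an assumption inferred from the snippet above, not a documented SDK contract:

```python
from types import SimpleNamespace

def validate_schema(trace):
    # Same logic as the registered metric above
    return "id" in trace.tool_calls[0].args

# Illustrative only: a mock object mimicking the assumed trace shape
mock_trace = SimpleNamespace(
    tool_calls=[SimpleNamespace(args={"id": "abc-123"})]
)

print(validate_schema(mock_trace))  # True
```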