Evaluate model performance

This code sample demonstrates how to evaluate the performance of a generative AI model. It shows how to define an evaluation specification, run the evaluation, and retrieve the resulting metrics.

Explore further

For detailed documentation that includes this code sample, see the following:

Code sample

Python

Before trying this sample, follow the Python setup instructions in the Vertex AI quickstart using client libraries. For more information, see the Vertex AI Python API reference documentation.
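As a quick sanity check (an assumed setup step, not part of the original sample), you can confirm the SDK is installed and importable before running the code below:

# Assumed setup: install the Vertex AI SDK first, for example with
#   pip install google-cloud-aiplatform
from google.cloud import aiplatform

# Print the installed SDK version to confirm the import works.
print(aiplatform.__version__)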

To authenticate to Vertex AI, set up Application Default Credentials. For more information, see Set up authentication for a local development environment.
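The sample relies on Application Default Credentials. A minimal sketch, assuming ADC is already configured (for example with gcloud auth application-default login), that confirms credentials can be loaded:

from google.auth import default

# Load Application Default Credentials; default() returns the credentials
# and the project ID they resolve to (which may be None).
credentials, project_id = default(
    scopes=["https://www.googleapis.com/auth/cloud-platform"]
)
print(project_id)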

import os

from google.auth import default

import vertexai
from vertexai.preview.language_models import (
    EvaluationTextClassificationSpec,
    TextGenerationModel,
)

PROJECT_ID = os.getenv("GOOGLE_CLOUD_PROJECT")


def evaluate_model() -> object:
    """Evaluate the performance of a generative AI model."""

    # Set credentials for the pipeline components used in the evaluation task
    credentials, _ = default(scopes=["https://www.googleapis.com/auth/cloud-platform"])

    vertexai.init(project=PROJECT_ID, location="us-central1", credentials=credentials)

    # Create a reference to a generative AI model
    model = TextGenerationModel.from_pretrained("text-bison@002")

    # Define the evaluation specification for a text classification task
    task_spec = EvaluationTextClassificationSpec(
        ground_truth_data=[
            "gs://cloud-samples-data/ai-platform/generative_ai/llm_classification_bp_input_prompts_with_ground_truth.jsonl"
        ],
        class_names=["nature", "news", "sports", "health", "startups"],
        target_column_name="ground_truth",
    )

    # Evaluate the model
    eval_metrics = model.evaluate(task_spec=task_spec)
    print(eval_metrics)
    # Example response:
    # ...
    # PipelineJob run completed.
    # Resource name: projects/123456789/locations/us-central1/pipelineJobs/evaluation-llm-classification-...
    # EvaluationClassificationMetric(label_name=None, auPrc=0.53833705, auRoc=0.8...

    return eval_metrics

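A hypothetical usage sketch for the function above; the metric attribute names (auPrc, auRoc) are taken from the example response printed by the sample and may differ for other task types:

# Run the evaluation pipeline and inspect two of the returned classification metrics.
metrics = evaluate_model()
print(metrics.auPrc, metrics.auRoc)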
What's next

To search and filter code samples for other Google Cloud products, see the Google Cloud sample browser.
