Agent observability

In the rapidly evolving landscape of AI, building and deploying agents introduces unique challenges. AI agents can drift, hallucinate, and regress silently. They can make decisions and take actions that you don't expect. They can also fail in ways that are different from non-agentic software. Agent observability refers to the methods for gaining insights into the internal state and behavior of software agents, particularly AI-powered agents such as those built using Large Language Models (LLMs).

Benefits of agent observability

Because AI agents are non-deterministic and complex, observability is crucial for understanding, debugging, evaluating, and improving their performance, safety, and reliability.

Key aspects of agent observability include monitoring and analyzing the following:

  • LLM interactions: Track prompts, responses, token usage, latency, and error rates.
  • Tool usage: Monitor external tools and APIs the agent interacts with, including call counts, successes or failures, latency, and the data exchanged.
  • Agent behavior and reasoning: Understand the agent's decision-making process, the sequence of steps taken, and internal state changes.
  • Performance: Measure end-to-end latency of agent invocations, latency of individual steps, and resource consumption, which often involves detailed tracing.
  • Security and safety: Track policy enforcement, identify risky operations, analyze content safety, and monitor access patterns.
  • Quality and evaluation: Assess the correctness, factuality, helpfulness, and overall quality of the agent's outputs, often integrating with evaluation frameworks.
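The LLM-interaction signals listed above are typically recorded as attributes on a trace span. The following stdlib-only sketch shows one plausible shape for such a record; the `gen_ai.*` attribute names mirror the OpenTelemetry GenAI semantic conventions, but exact keys vary by spec version, and `record_llm_call` itself is a hypothetical helper, not part of any Google Cloud or OpenTelemetry API:

```python
import time


def record_llm_call(model: str, input_tokens: int, output_tokens: int,
                    started_at: float, ended_at: float) -> dict:
    """Build a span-attribute record for one LLM interaction.

    Attribute names follow the OpenTelemetry GenAI semantic
    conventions (gen_ai.request.model, gen_ai.usage.*); check the
    current spec for the exact keys your tooling expects.
    """
    return {
        "gen_ai.operation.name": "chat",
        "gen_ai.request.model": model,
        "gen_ai.usage.input_tokens": input_tokens,
        "gen_ai.usage.output_tokens": output_tokens,
        "duration_ms": round((ended_at - started_at) * 1000, 2),
    }


start = time.monotonic()
# ... call the model here ...
attrs = record_llm_call("gemini-pro", 12, 40, start, time.monotonic())
```

Recording token counts and latency per call is what later makes cost and performance metrics derivable from trace data alone.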

What is agent observability within Google Cloud?

Application Monitoring in Google Cloud provides both agent observability and application observability. This service provides dashboards and topology maps that let you understand the health and performance of your App Hub applications, services, and workloads. It also generates and displays metrics such as error rates and token usage for AI resources. To generate these metrics, Application Monitoring filters and aggregates your trace data using application-specific labels and events that follow the OpenTelemetry GenAI semantic conventions.

For agent observability, we recommend building your agents with the Agent Development Kit (ADK) framework. Because ADK relies on OpenTelemetry, the telemetry that ADK generates is consistent with the OpenTelemetry GenAI semantic conventions.
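Frameworks that build on OpenTelemetry wrap each agent step, such as a tool call or a model invocation, in a span. The following is a minimal stdlib-only sketch of that pattern; the `agent_step` context manager and the span fields are illustrative, not an ADK or OpenTelemetry API:

```python
import time
from contextlib import contextmanager

SPANS = []  # collected trace data; a real setup exports to a backend


@contextmanager
def agent_step(name: str, **attributes):
    """Record one step of an agent run as a span-like dict."""
    span = {"name": name, "attributes": attributes}
    start = time.monotonic()
    try:
        yield span
        span["status"] = "OK"
    except Exception:
        span["status"] = "ERROR"
        raise
    finally:
        span["duration_ms"] = (time.monotonic() - start) * 1000
        SPANS.append(span)


with agent_step("tool_call", tool="search", query="weather"):
    pass  # invoke the tool here
```

Because every step lands in the same trace, a failure or latency spike can be attributed to the specific tool call or model invocation that caused it.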

To debug failures, monitor costs, or analyze agent behavior—including from Gemini Enterprise Agent Platform, Agent Gateway, and Model Armor agents—you need log, metric, and trace data:

  • Logs provide information about events and errors.
  • Metrics let you monitor latency and token usage.
  • Traces provide information about execution paths and are analyzed to derive metrics such as the number of model calls or total token usage. These derived metrics provide visibility into agent performance and behavior. For more information, see View AI resources.
  • Prompt and response data lets you assess agent quality and decision-making using the Gen AI evaluation service.
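Deriving metrics from traces amounts to aggregating span attributes. A hypothetical aggregation over span records shaped like the GenAI semantic conventions (the `summarize` function and sample data are illustrative, not an Application Monitoring API):

```python
from collections import Counter


def summarize(spans: list[dict]) -> dict:
    """Aggregate LLM spans into per-model call counts and token totals."""
    calls = Counter()
    tokens = Counter()
    for span in spans:
        attrs = span.get("attributes", {})
        model = attrs.get("gen_ai.request.model")
        if model is None:
            continue  # not an LLM span (for example, a tool call)
        calls[model] += 1
        tokens[model] += (attrs.get("gen_ai.usage.input_tokens", 0)
                          + attrs.get("gen_ai.usage.output_tokens", 0))
    return {"calls": dict(calls), "total_tokens": dict(tokens)}


spans = [
    {"attributes": {"gen_ai.request.model": "gemini-pro",
                    "gen_ai.usage.input_tokens": 12,
                    "gen_ai.usage.output_tokens": 40}},
    {"attributes": {"gen_ai.request.model": "gemini-pro",
                    "gen_ai.usage.input_tokens": 8,
                    "gen_ai.usage.output_tokens": 20}},
    {"attributes": {"tool": "search"}},
]
summary = summarize(spans)
# summary["calls"] is {"gemini-pro": 2}; total tokens for the model are 80
```

This is the kind of filtering and aggregation the page describes: metrics fall out of trace data, so instrumenting spans well gives you metrics for free.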

The Application Monitoring dashboard for an application displays a list of the application's services and workloads, such as Gemini Enterprise apps, Gemini Enterprise Agent Platform agents, and MCP servers:

An overview that lists the services and workloads in an application.

You can identify agentic services and workloads by using the infrastructure type or the App Hub functional type. The functional type column is hidden by default.

Get started with agent observability

For information about how to build, deploy, and manage AI agents that use reasoning and tools to perform complex enterprise tasks, see Agents overview.

To learn how to perform evaluations, which provide information about agent quality, see Agent evaluation.

For code samples, see the following:

What's next
