This document describes a high-level architecture for an application that runs a data science workflow to automate complex data analytics and machine learning tasks.
This architecture uses datasets that are hosted in BigQuery or AlloyDB for PostgreSQL. The architecture is a multi-agent system that lets users run analytics tasks by using natural language commands, which eliminates the need to write complex SQL or Python code.
The intended audience for this document includes architects, developers, and administrators who build and manage agentic AI applications. This architecture lets business and data teams analyze metrics across a wide range of industries, such as retail, finance, and manufacturing. The document assumes a foundational understanding of agentic AI systems. For information about how agents differ from non-agentic systems, see What is the difference between AI agents, AI assistants, and bots?
The deployment section of this document provides links to code samples to help you experiment with deploying an agentic AI application that runs a data science workflow.
Architecture
The following diagram shows the architecture for a data science workflow agent.
This architecture includes the following components:
- Root agent: A coordinator agent that receives requests from the frontend service. The root agent interprets the user's request and attempts to resolve the request itself. If the task requires specialized tools, the root agent delegates the request to the appropriate specialized agent.
- Specialized agents: The root agent invokes the following specialized agents by using the agent-as-a-tool feature.
- Analytics agent: A specialized agent for data analysis and visualization. The analytics agent uses the AI model to generate and run Python code to process datasets, create charts, and perform statistical analysis.
- AlloyDB for PostgreSQL agent: A specialized agent for interacting with data in AlloyDB for PostgreSQL. The agent uses the AI model to interpret the user's request and to generate SQL in the PostgreSQL dialect. The agent securely connects to the database by using MCP Toolbox for Databases, and then runs the query to retrieve the requested data.
- BigQuery agent: A specialized agent for interacting with data in BigQuery. The agent uses the AI model to interpret the user's request and generate GoogleSQL queries. The agent connects to the database by using the Agent Development Kit (ADK) built-in BigQuery tool, and then runs the query to retrieve the requested data.
- BigQuery ML agent: A subagent of the root agent that is dedicated to machine learning workflows. The agent interacts with BigQuery ML to manage the end-to-end ML lifecycle. The agent can create and train models, run evaluations, and generate predictions based on user requests.
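The delegation flow described above can be sketched in plain Python. This is a framework-agnostic illustration, not the ADK API: the keyword-based router and the agent function names are hypothetical stand-ins for the model-driven interpretation that the root agent actually performs.

```python
# Minimal sketch of the root agent's delegation pattern.
# In the real architecture, an LLM interprets the request; here a
# hypothetical keyword router stands in for that interpretation step.

def analytics_agent(request: str) -> str:
    # Would generate and run Python code for charts and statistics.
    return f"analytics: {request}"

def alloydb_agent(request: str) -> str:
    # Would generate PostgreSQL-dialect SQL and run it via MCP Toolbox.
    return f"alloydb: {request}"

def bigquery_agent(request: str) -> str:
    # Would generate GoogleSQL and run it with the built-in BigQuery tool.
    return f"bigquery: {request}"

def bigquery_ml_agent(request: str) -> str:
    # Would manage the ML lifecycle with BigQuery ML.
    return f"bigquery_ml: {request}"

# Hypothetical routing table; the real system routes based on the
# model's interpretation of the request, not fixed keywords.
SPECIALIZED_AGENTS = {
    "chart": analytics_agent,
    "postgres": alloydb_agent,
    "warehouse": bigquery_agent,
    "train": bigquery_ml_agent,
}

def root_agent(request: str) -> str:
    """Resolve the request directly, or delegate to a specialized agent."""
    for keyword, agent in SPECIALIZED_AGENTS.items():
        if keyword in request.lower():
            return agent(request)
    # No specialized tool needed; the root agent answers itself.
    return f"root: {request}"

print(root_agent("Train a demand forecast model"))  # delegated to the ML agent
```

The key design point this sketch mirrors is that the coordinator stays thin: it only classifies and dispatches, while each specialized agent owns its own tools and data access.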
Products used
This example architecture uses the following Google Cloud and open-source products and tools:
- Cloud Run: A serverless compute platform that lets you run containers directly on top of Google's scalable infrastructure.
- Agent Development Kit (ADK): A set of tools and libraries to develop, test, and deploy AI agents.
- Vertex AI: An ML platform that lets you train and deploy ML models and AI applications, and customize LLMs for use in AI-powered applications.
- Gemini: A family of multimodal AI models developed by Google.
- BigQuery: An enterprise data warehouse that helps you manage and analyze your data with built-in features like machine learning, geospatial analysis, and business intelligence.
- AlloyDB for PostgreSQL: A fully managed, PostgreSQL-compatible database service that's designed for your most demanding workloads, including hybrid transactional and analytical processing.
- MCP Toolbox for Databases: An open-source Model Context Protocol (MCP) server that lets AI agents securely connect to databases by managing database complexities like connection pooling, authentication, and observability.
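To illustrate how MCP Toolbox exposes a database to an agent, the following is a minimal, hypothetical `tools.yaml` fragment. The source, tool, table, and column names are placeholders, and in practice credentials would come from a secret manager or the environment rather than plain text:

```yaml
sources:
  analytics-db:
    kind: alloydb-postgres      # hypothetical AlloyDB for PostgreSQL source
    project: my-project         # placeholder values
    region: us-central1
    cluster: my-cluster
    instance: my-instance
    database: sales
    user: toolbox-user
    password: ${DB_PASSWORD}    # injected from the environment

tools:
  daily-revenue:
    kind: postgres-sql
    source: analytics-db
    description: Returns total revenue for a given day.
    parameters:
      - name: day
        type: string
        description: Date in YYYY-MM-DD format.
    statement: |
      SELECT SUM(amount) AS revenue
      FROM orders
      WHERE order_date = $1;
```

Each entry under `tools` becomes a named, parameterized operation that an agent can call, so the agent never constructs raw connection strings or handles credentials itself.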
Deployment
To deploy a sample implementation of this architecture, use Data Science with Multiple Agents. The repository provides two sample datasets to demonstrate the system's flexibility, including a flight dataset for operational analysis and an ecommerce sales dataset for business analytics.
What's next
- (Video) Watch the Agent Factory Podcast about AI agents for data engineering and data science.
- (Notebook) Use the data science agent in Colab Enterprise.
- Learn how to host AI agents on Cloud Run.
- For an overview of architectural principles and recommendations that are specific to AI and ML workloads in Google Cloud, see the AI and ML perspective in the Well-Architected Framework.
- For more reference architectures, diagrams, and best practices, explore the Cloud Architecture Center.
Contributors
Author: Samantha He | Technical Writer
Other contributors:
- Amina Mansour | Head of Cloud Platform Evaluations Team
- Kumar Dhanagopal | Cross-Product Solution Developer
- Megan O'Keefe | Developer Advocate
- Rachael Deacon-Smith | Developer Advocate
- Shir Meir Lador | Developer Relations Engineering Manager