Cloud Run AI Cookbook

This page provides a curated list of resources to help you build and deploy AI solutions on Cloud Run.

Cloud Run is a fully managed application platform for running your code, function, or container on top of Google's highly scalable infrastructure. You can use Cloud Run to run various AI solutions, such as AI inference endpoints, generative model APIs, entire Retrieval-Augmented Generation (RAG) pipelines, and more.

Use the categories and links below to navigate official guides, quickstarts, and valuable community content. For Cloud Run documentation and recommendations, see Explore AI solutions on Cloud Run.

A note on community resources

Content labeled "Community" consists of selected resources from the developer community and is not developed or maintained by Google. Consider these cautions when using these resources:

Security audit: Always carefully review any code, especially how it handles private information, user input, and network access.
Deprecation and updates: Community code might become outdated or stop working with new Cloud Run features or AI versions without warning. Check when it was last updated and whether it is still actively maintained.
Cost efficiency: While these setups often aim for low cost, they might not follow Google's best practices for saving money in live projects. Monitor your billing closely.
License compliance: Make sure you understand and follow the open-source license for any community code or libraries you add to your application.
Test before deploying: Verify all important settings, and try community solutions in a test environment before using them for live projects.

Filter by category or keyword

Categories	Title and description	Published date
Blog Cold starts Deployment	A Guide to AI Cold Starts on Cloud Run This blog post addresses the challenge of cold starts for AI applications on Cloud Run and outlines optimizations across configuration, architecture, and runtime settings to improve latency while scaling to zero.	2026-05-28
ADK Agents Codelab MCP Security	Governing agentic workloads with Agent Gateway on Gemini Enterprise Agent Platform This codelab shows how to use Agent Gateway to govern and secure an ADK agent running in Agent Runtime as it connects to external tools hosted as MCP servers on Cloud Run.	2026-05-28
Agents Community MCP Security	Securing AI agents with MCP Authorization This article demonstrates how to configure and enforce MCP authorization to secure agentic systems deploying remote MCP tools on Cloud Run.	2026-05-26
AI Studio Blog Cloud SQL Firebase Vibe coding	AI Studio unlocks full-stack vibe coding with Cloud Run, Firebase, and Cloud SQL, no credit card required This article introduces the full-stack vibe coding updates in Google AI Studio, detailing integrations with Firebase and Cloud SQL, and a no-credit-card onboarding flow for deploying apps to Cloud Run.	2026-05-21
ADK Flutter Go Video	Build an AI agent app with Go ADK, Cloud Run, and Flutter This video shows how to build an AI agent application using the Go Agent Development Kit (ADK), deploy it as a container service on Cloud Run, and access it from a multi-platform Flutter frontend.	2026-05-21
AI Studio Cloud SQL Video	Build full-stack apps with Google AI Studio, Cloud Run, and Cloud SQL This video guides users through building full-stack applications in Google AI Studio's Build Mode and deploying them to Cloud Run with automatic database provisioning.	2026-05-21
AlloyDB BigQuery Codelab MCP Toolbox MongoDB	Build an Intelligent Ecommerce Catalog with Multi-database Persistence Build an intelligent ecommerce catalog using AlloyDB, MongoDB, Cloud Storage, BigQuery, and the MCP Toolbox on Cloud Run, then deploy a multi-agent chat app.	2026-04-22
Agents Codelab Gemini Enterprise	Next '26 Keynote: Fabric of Unified Intelligence Deploy a multi-agent system on Cloud Run and orchestrate it using Gemini Enterprise with shared context to demonstrate the fabric of unified intelligence.	2026-04-22
ADK Agents Codelab MCP	Build and Deploy a Pet Passport Agent on Cloud Run This codelab guides you through building and deploying a tool-using Pet Passport agent using ADK and Google Model Context Protocol (MCP) servers on Cloud Run.	2026-04-22
ADK Agents Codelab Eventarc	Build Event-Driven AI Agents with Eventarc, Cloud Run and ADK Learn how to build and deploy asynchronous, event-driven AI agents on Cloud Run using Eventarc and the Agent Development Kit (ADK).	2026-04-22
Agents Codelab MCP Security	Deploy an Enterprise Governance-Aware Agent with MCP and Cloud Run In Part 2 of this series, learn how to deploy a Model Context Protocol (MCP) server on Cloud Run to act as a data control plane and connect it to a governance-aware ADK agent.	2026-04-16
ADK Agents Codelab Model Armor Security	Build a Secure Agent with Model Armor and Identity Build a production-grade secure AI agent using the Agent Development Kit (ADK) and deploy it to Google Cloud. This guide covers implementing Model Armor for input/output filtering and Agent Identity for access control.	2026-04-16
ADK Agents Codelab MCP	Way Back Home - Level 1: Pinpoint Location Build a multi-agent AI system using the Agent Development Kit (ADK) that incorporates custom MCP servers and OneMCP BigQuery integration.	2026-04-16
AI Studio Codelab Deployment Vibe coding	Vibe Code with Gemini in Google AI Studio This codelab shows you how to use Build Mode in Google AI Studio to rapid-prototype a React application and deploy it to Cloud Run with one click.	2026-04-15
Codelab Gemma 4 GPUs LLMs	Run inference of Gemma 4 model on Cloud Run with RTX 6000 Pro GPU with vLLM This codelab shows how to deploy a Gemma 4 model on a Cloud Run NVIDIA RTX Pro 6000 GPU using vLLM for high-throughput serverless inference.	2026-04-13
ADK Agents Community	How I Built and Deployed Real AI Agents Using Google ADK on Cloud Run Build a multi-agent travel planner using the Google Agent Development Kit (ADK) and deploy it to Google Cloud Run.	2026-04-10
Agents Automation Community SRE Use cases	From Incident to Pull Request: Building an AI-Powered SRE Agent on GCP Build an SRE agent that automates root cause analysis and software fixes using Gemini, Spring Boot, and Cloud Run.	2026-04-10
Agents Codelab Frameworks LangChain	Deploy LangChain Agent on Cloud Run Build a LangChain-based AI agent, package it into a container, and deploy it to Google Cloud Run for serving.	2026-03-27
Agents Community Elasticsearch Gemini Multimodal	Snap, Plan, Go: Building a Multimodal Travel Agent with Google Cloud, Elasticsearch, and Gemini Build a multimodal travel agent that identifies landmarks from images and suggests travel routes using Gemini on Cloud Run.	2026-02-22
AI Studio Codelab Deployment Vibe coding	Deploy from AI Studio to Cloud Run In this codelab, you will create a simple web application using vibe coding in Google AI Studio and deploy it onto Cloud Run.	2026-02-18
Frameworks Gemini LangChain	Quickstart: Build and deploy a Python (LangChain) web app to Cloud Run This quickstart shows you how to build and deploy a LangChain application using Cloud Run and Gemini to respond to queries about city capitals.	2026-02-03
Agents Frameworks Gemini	Quickstart: Build and deploy a Python (smolagents) web app to Cloud Run This quickstart shows you how to build and deploy a smolagents application using Cloud Run and Gemini.	2026-01-28
Agents Antigravity Video	Stop coding, start architecting: Google Antigravity + Cloud Run This video introduces Google's agentic IDE, Antigravity, and uses it to build and deploy a full-stack app to Cloud Run from scratch. It shows how to write a spec sheet for the AI, force it to use modern Node.js (no build steps!), and watch it autonomously debug a port mismatch during deployment touching a config file.	2025-12-08
Codelab Tools	Deploying and Running n8n on Google Cloud Run This codelab shows how to deploy a production-ready instance of the n8n workflow automation tool on Cloud Run, complete with a Cloud SQL database for persistence and Secret Manager for sensitive data.	2025-11-20
Blog Gemma 3	Hands-on with Gemma 3 on Google Cloud This blog post announces two codelabs that show developers how to deploy Gemma 3 on Google Cloud using either Cloud Run for a serverless approach or Google Kubernetes Engine (GKE) for a platform approach.	2025-11-17
Agents GPUs Ollama Video	This AI agent runs on Cloud Run + NVIDIA GPUs This video shows how to build a real AI agent application on a serverless NVIDIA GPU. See a demo of a smart health agent that uses open source models like Gemma with Ollama on Cloud Run, and LangGraph to build a multi-agent workflow (RAG + tools).	2025-11-13
Codelab GPUs LLMs	How to run LLM inference on Cloud Run GPUs with vLLM and the OpenAI Python SDK This codelab shows how to deploy Google's Gemma 2 2b instruction-tuned model on Cloud Run with GPUs, using vLLM as an inference engine and the OpenAI Python SDK to perform sentence completion.	2025-11-13
ADK Agents Codelab	Deploy, Manage, and Observe ADK Agent on Cloud Run This codelab walks you through deploying, managing, and monitoring a powerful agent built with the Agent Development Kit (ADK) on Cloud Run.	2025-11-12
Blog Tools	Easy AI workflow automation: Deploy n8n on Cloud Run This blog post explains how to deploy agents using the n8n workflow automation tool on Cloud Run to create AI-powered workflows and integrate with tools like Google Workspace.	2025-11-07
MCP Video	Power your AI agents with MCP tools on Google Cloud Run This video introduces MCP (Model Context Protocol) and how it makes life easier for AI agent developers. Get a walkthrough of building an MCP server using FastMCP and deploying an ADK agent on Cloud Run. See how the code handles service-to-service authentication using Cloud Run's built-in OIDC tokens.	2025-11-06
Model Armor Security Video	We tried to jailbreak our AI (and Model Armor stopped it) This video shows an example of using Google's Model Armor to block threats with an API call.	2025-10-30
Codelab Gemini CLI MCP	How to deploy a secure MCP server on Cloud Run This codelab walks you through deploying a secure Model Context Protocol (MCP) server on Cloud Run and connecting to it from the Gemini CLI.	2025-10-28
ADK Agents Codelab MCP	Build and deploy an ADK agent that uses an MCP server on Cloud Run This codelab guides you through building and deploying a tool-using AI agent with the Agent Development Kit (ADK). The agent connects to a remote MCP server for its tools, and is deployed as a container on Cloud Run.	2025-10-27
Benchmarking Vertex AI Video	Don't guess: How to benchmark your AI prompts This video shows how to use Vertex AI to build reliable generative AI applications using Google Cloud's tools. Developers will learn how to use Google Cloud tools for rapid prototyping, get hard numbers with data-driven benchmarking, and finally, build an automated CI/CD pipeline for true quality control, all while avoiding common pitfalls.	2025-10-23
AI models Cloud Run jobs Codelab Model tuning	How to fine-tune a LLM using Cloud Run Jobs This codelab provides a step-by-step guide on how to use Cloud Run Jobs with GPUs to fine-tune a Gemma 3 model on the Text2Emoji dataset and then serve the resulting model on a Cloud Run service with vLLM.	2025-10-21
Batch inference Cloud Run jobs Codelab	How to run batch inference on Cloud Run jobs This codelab demonstrates how to use a GPU-powered Cloud Run job to run batch inference on a Llama 3.2-1b model and write the results directly to a Cloud Storage bucket.	2025-10-21
ADK Multi-agent Video	How to build a multi-agent app with ADK and Gemini This video shows how to build an app using Google's ADK (Agent Development Kit) that helps you refine and collaborate on content. Explore how stateful multi-agents work better than a single agent.	2025-10-16
Community Security	Securely call your Cloud Run service from anywhere This article provides a Python code example that acquires an identity token to securely call an authenticated Cloud Run service from any environment. The example uses application default credentials (ADC) to authenticate the call.	2025-10-15
Gemini Video	Build an AI app that watches videos using Gemini This video shows how to build an app that watches and understands YouTube videos using Gemini 2.5 Pro. Use smart prompts to customize your app's output for blog posts, summaries, quizzes, and more. This video covers how to integrate Gemini to generate both text content and header images from video input, discusses cost considerations, and explains how to handle longer videos with batch requests.	2025-10-06
ADK Agents Codelab GPUs LLMs MCP	Lab 3:Prototype to Production - Deploy Your ADK Agent to Cloud Run with GPU This codelab demonstrates how to deploy a production-ready Agent Development Kit (ADK) agent with a GPU-accelerated Gemma backend on Cloud Run. The codelab covers deployment, integration, and performance testing.	2025-10-03
Agents Codelab	How to deploy a Gradio frontend app that calls a backend ADK agent, both running on Cloud Run This codelab demonstrates how to deploy a two-tier application on Cloud Run, consisting of a Gradio frontend and an ADK agent backend, with a focus on implementing secure, authenticated service-to-service communication.	2025-09-29
AI models Community RAG	Serverless AI: EmbeddingGemma with Cloud Run This article provides a step-by-step guide on how to containerize and deploy the EmbeddingGemma model to Cloud Run with GPUs, and then use it to build a RAG application.	2025-09-24
Blog Extensions Gemini	Automate app deployment and security analysis with new Gemini CLI extensions This blog post announces the Cloud Run extension in the Gemini CLI to simplify application deployment with a single /deploy command.	2025-09-10
Community Security	Chain of Trust for AI: Securing MCP Toolbox Architecture on Cloud Run This article deconstructs a simple hotel booking application built on Google Cloud. It demonstrates a robust, zero-trust security model using service identities, and shows how a secure chain of trust is established from the end-user all the way to the database.	2025-09-03
AI models Community Containerization Docker Ollama RAG	Serverless AI: Qwen3 Embeddings with Cloud Run This article provides a tutorial on how to deploy the Qwen3 Embedding model to Cloud Run with GPUs. The article also covers containerization with Docker and Ollama, and provides an example of how to use it in a RAG application.	2025-08-20
Architecture Community LLMs	Still Packaging AI Models in Containers? Do This Instead on Cloud Run This article advocates for a more efficient and scalable architecture for serving large language models (LLMs) on Cloud Run by decoupling model files from the application container, and instead using Cloud Storage FUSE.	2025-08-11
AI models Community	Building an AI-Powered Podcast Generator with Gemini and Cloud Run This article details how to build a serverless AI-powered podcast generator that uses Gemini for content summarization and Cloud Run. The example orchestrates the automated pipeline for generating and delivering daily audio briefings from RSS feeds.	2025-08-11
GenAI Video	Let's build a GenAI app on Cloud Run This video walks you through the architecture and code, using AI to help with every step.	2025-07-17
Blog Extensions Gemini	From localhost to launch: Simplify AI app deployment with Cloud Run and Docker Compose This blog post announces a collaboration between Google Cloud and Docker that simplifies the deployment of complex AI applications by allowing developers to use the gcloud run compose up command to deploy their compose.yaml files directly to Cloud Run.	2025-07-10
Agents Firebase Video	Build AI agents with Cloud Run and Firebase Genkit This video shows how to build AI agents with Cloud Run and Firebase Genkit, a serverless AI agent builder.	2025-07-10
Community MCP	Power your MCP servers with Google Cloud Run This article explains the purpose of the Model Context Protocol (MCP) and provides a tutorial on how to build and deploy an MCP server on Cloud Run to expose resources as tools for AI applications.	2025-07-09
AI Studio Firebase Gemini LLMs Video	Cloud AI: it's just an API This video provides a demo on how to quickly build a tech support application using AI Studio, Cloud Functions, and Firebase Hosting. Learn how to leverage Large Language Models (LLMs) and see a practical example of integrating AI into a traditional web application.	2025-06-19
Blog MCP	Build and Deploy a Remote MCP Server to Google Cloud Run in Under 10 Minutes This blog post provides a step-by-step guide to building and deploying a secure, remote Model Context Protocol (MCP) server on Google Cloud Run in under 10 minutes using FastMCP, and then testing it from a local client.	2025-06-07
Community ML models Monitoring	Deploying & Monitoring ML Models with Cloud Run — Lightweight, Scalable, and Cost-Efficient This article explains how to deploy, monitor, and automatically scale a machine learning model on Cloud Run, utilizing a lightweight monitoring stack with Google Cloud services to track performance and control costs.	2025-05-29
AI models AI Studio Community LLMs	Deploying Gemma Directly from AI Studio to Cloud Run This article provides a step-by-step tutorial on how to take a Gemma model from AI Studio, adapt its code for production, and deploy it as a containerized web application on Cloud Run.	2025-05-29
ADK Agents Community MCP	The Triad of Agent Architecture: ADK, MCP, and Cloud Run This article demonstrates how to build an AI agentic architecture by setting up an Agent Development Kit (ADK) workflow that communicates with a Model Context Protocol (MCP) server hosted on Cloud Run to manage flight bookings.	2025-05-27
ADK Agents Frameworks LangGraph Vertex AI Video	Building AI agents on Google Cloud This video shows how to build and deploy AI agents using Cloud Run and Vertex AI. Explore key concepts like tool calling, model agnosticism, and the use of frameworks like LangGraph and the Agent Development Kit (ADK).	2025-05-21
Agents AI Studio Blog MCP	AI deployment made easy: Deploy your app to Cloud Run from AI Studio or MCP-compatible AI agents This blog post introduces ways to simplify AI deployments with one-click deployment from AI Studio to Cloud Run, direct deployment of Gemma 3 models, and an MCP server for agent-based deployments.	2025-05-20
A2A Agents Community Frameworks Use cases	Exploring Agent2Agent (A2A) Protocol with Purchasing Concierge Use Case on Cloud Run This article explains the Agent2Agent (A2A) protocol and demonstrates its use with a purchasing concierge application. The Cloud Run app contains multiple AI agents, built with different frameworks, that collaborate with one another to fulfill a user's order.	2025-05-15
AI models Automation CI/CD Community GitHub	Automating ML Models Deployment with GitHub Actions and Cloud Run This article provides a comprehensive guide on how to create a CI/CD pipeline with GitHub Actions to automate the build and deployment of machine learning models as containerized services on Cloud Run.	2025-05-08
AI models GPUs Ollama Video	How to host DeepSeek with Cloud Run GPUs in 3 steps This video shows how to simplify hosting the DeepSeek AI model with Cloud Run GPUs. Learn how to deploy and manage Large Language Models (LLMs) on Google Cloud with three commands. Watch along and discover the capabilities of Cloud Run and the Ollama command-line tool, allowing developers to operate AI applications rapidly with on-demand resource allocation and scaling.	2025-04-24
Agents Blog Use cases	50% faster merge and 50% fewer bugs: How CodeRabbit built its AI code review agent with Google Cloud Run This article showcases how CodeRabbit, an AI code review tool, utilizes Cloud Run to build a scalable and secure platform for executing untrusted code, ultimately cutting code review time and bugs in half.	2025-04-22
Community LLMs Security	Building Sovereign AI Solutions with Google Cloud - Cloud Run This article provides a step-by-step guide on how to build and deploy a sovereign AI solution on Google Cloud by using Sovereign Controls by Partners. The example runs a Gemma model on Cloud Run, ensuring data residency and compliance with European regulations.	2025-04-03
Codelab Gemini	How to deploy a FastAPI chatbot app to Cloud Run using Gemini This codelab shows you how to deploy a FastAPI chatbot app to Cloud Run.	2025-04-02
Cloud Run functions Codelab LLMs	How to host a LLM in a sidecar for a Cloud Run function This codelab shows you how to host a gemma3:4b model in a sidecar for a Cloud Run function.	2025-03-27
Blog Deployment	How to deploy serverless AI with Gemma 3 on Cloud Run This blog post announces Gemma 3, a family of lightweight, open AI models, and explains how to deploy them on Cloud Run for scalable and cost-effective serverless AI applications.	2025-03-12
Architecture RAG Vertex AI	RAG infrastructure for generative AI using Vertex AI and Vector Search This document presents a reference architecture for building a generative AI application with Retrieval-Augmented Generation (RAG) on Google Cloud, utilizing Vector Search for large-scale similarity matching and Vertex AI for managing embeddings and models.	2025-03-07
Blog Vertex AI	Create shareable generative AI apps in less than 60 seconds with Vertex AI and Cloud Run This article introduces a feature in Vertex AI that allows for one-click deployment of web applications on Cloud Run. Use generative AI prompts to streamline the process of turning a generative AI concept into a shareable prototype.	2025-02-20
Blog GPUs Inference RAG Vertex AI	Unlock Inference-as-a-Service with Cloud Run and Vertex AI This blog post explains how developers can accelerate the development of generative AI applications by adopting an Inference-as-a-Service model on Cloud Run. This enables hosting and scaling of LLMs with GPU support and integrating them with Retrieval-Augmented Generation (RAG) for context-specific responses.	2025-02-20
Community LLMs	From Zero to Deepseek on Cloud Run during my morning commute This article shows how to rapidly deploy the Deepseek R1 model on Cloud Run with GPUs using Ollama during a morning commute. This article explores advanced topics such as embedding the model in the container, A/B testing with traffic splitting, and adding a web UI with a sidecar container.	2025-02-11
Function calling Gemini Video	How to use Gemini function calling with Cloud Run This video explores the power of Gemini function calling and shows how to integrate external APIs into your AI applications. Build a weather app that leverages Gemini's natural language understanding to process user requests and fetch weather data from an external API, providing a practical example of function calling in action.	2025-01-23
Community LLMs Ollama	How to run (any) open LLM with Ollama on Google Cloud Run [Step-by-step] This article shows how to host any open LLM, such as Gemma 2, on Google Cloud Run using Ollama. The article also includes instructions for creating a Cloud Storage bucket for model persistence and testing the deployment.	2025-01-20
Community ML models	Deployment of Serverless Machine Learning models with GPUs using Google Cloud: Cloud Run This article provides a step-by-step guide to deploying a machine learning (ML) model with GPU support on Cloud Run. The article covers everything from project setup and containerization to automated deployment with Cloud Build and testing with curl and JavaScript.	2025-01-17
Image generation Vertex AI Video	Text to image with Google Cloud's Vertex AI on Cloud Run This video shows how to build an image generation app using Vertex AI on Google Cloud. With the Vertex AI image generation model, developers can create stunning visuals without the need for complex infrastructure or model management.	2025-01-16
Data protection Security Video	Protecting sensitive data in AI apps This video shows how to safeguard sensitive data in AI applications. Explore key concepts, best practices, and tools for protecting data throughout the AI lifecycle.	2024-11-21
Large prompt window Model tuning RAG Video	RAG vs Model tuning vs Large prompt window This video discusses the three primary methods for integrating your data into AI applications: prompts with long context windows, Retrieval Augmented Generation (RAG), and model tuning. Learn the strengths, limitations, and ideal use cases for each approach to make informed decisions for your AI projects in this episode of Serverless Expeditions.	2024-11-14
Prompt engineering Video	Prompt engineering for developers This video shows how to use prompt engineering to improve the quality of AI responses. Watch the video to learn how to unlock more accurate and relevant responses from generative AI with chain of thought, few-shot, and multi-shot prompting techniques.	2024-10-31
