This document describes how to deploy and serve open models on Gemini Enterprise Agent Platform using prebuilt container images. Gemini Enterprise Agent Platform provides prebuilt containers for popular serving frameworks such as vLLM, Hex-LLM, and SGLang, and also supports Hugging Face Text Generation Inference (TGI), Text Embeddings Inference (TEI), the Hugging Face Inference Toolkit (via Google Cloud Hugging Face PyTorch Inference Containers), and TensorRT-LLM containers for serving supported models.
vLLM is an open-source library for fast inference and serving of large language models (LLMs). Gemini Enterprise Agent Platform uses an optimized, customized version of vLLM designed for enhanced performance, reliability, and seamless integration within Google Cloud. You can use this customized vLLM container image to serve models on Gemini Enterprise Agent Platform; the prebuilt vLLM container can download models from Hugging Face or from Cloud Storage. For more information, see Model serving with Gemini Enterprise Agent Platform prebuilt vLLM container images.
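Once a model is deployed with the prebuilt vLLM container, it can typically be queried using vLLM's OpenAI-compatible chat completions format. The sketch below builds such a request body; the endpoint URL and model ID are placeholder assumptions, not values from this document, and the actual route exposed by your deployment may differ.

```python
import json

# Placeholder assumptions: adjust the endpoint and model ID to match
# your own deployment of the prebuilt vLLM container.
ENDPOINT = "http://localhost:8000/v1/chat/completions"
MODEL_ID = "google/gemma-3-4b-it"

def build_chat_request(prompt: str, max_tokens: int = 128) -> dict:
    """Build a request body in the OpenAI-compatible chat completions
    format that vLLM's server accepts."""
    return {
        "model": MODEL_ID,
        "messages": [{"role": "user", "content": prompt}],
        "max_tokens": max_tokens,
    }

payload = build_chat_request("What is vLLM?")
# The payload would be POSTed to ENDPOINT with an HTTP client;
# here we just print it to show the request shape.
print(json.dumps(payload, indent=2))
```

Sending this payload as JSON to the deployed endpoint (with appropriate authentication) returns a chat completion response in the same OpenAI-compatible schema.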
Example Notebooks
The following notebooks demonstrate how to use Gemini Enterprise Agent Platform prebuilt containers for model serving. You can find more sample notebooks in the GitHub repository for Gemini Enterprise Agent Platform samples .
| Notebook Name | Description | Direct Link (GitHub/Colab) |
|---|---|---|
| Gemini Enterprise Agent Platform Model Garden - Gemma 3 (deployment) | Demonstrates deploying Gemma 3 models on GPU using vLLM. | View on GitHub |
| Gemini Enterprise Agent Platform Model Garden - Serve Multimodal Llama 3.2 with vLLM | Deploys multimodal Llama 3.2 models using the vLLM prebuilt container. | View on GitHub |
| Gemini Enterprise Agent Platform Model Garden - Hugging Face Text Generation Inference Deployment | Demonstrates deploying the Gemma-2-2b-it model with Text Generation Inference (TGI) from Hugging Face. | View on GitHub |
| Gemini Enterprise Agent Platform Model Garden - Hugging Face Text Embeddings Inference Deployment | Demonstrates deploying nomic-ai/nomic-embed-text-v1 with Text Embeddings Inference (TEI) from Hugging Face. | View on GitHub |
| Gemini Enterprise Agent Platform Model Garden - Hugging Face PyTorch Inference Deployment | Demonstrates deploying distilbert/distilbert-base-uncased-finetuned-sst-2-english with Hugging Face PyTorch Inference. | View on GitHub |
| Gemini Enterprise Agent Platform Model Garden - DeepSeek Deployment | Demonstrates serving DeepSeek models with vLLM, SGLang, or TensorRT-LLM. | View on GitHub |
| Gemini Enterprise Agent Platform Model Garden - Qwen3 Deployment | Demonstrates serving Qwen3 models with SGLang. | View on GitHub |
| Gemini Enterprise Agent Platform Model Garden - Gemma 3n Deployment | Demonstrates serving Gemma 3n models with SGLang. | View on GitHub |
| Gemini Enterprise Agent Platform Model Garden - Deep dive: Deploy Llama 3.1 and 3.2 with Hex-LLM | Demonstrates deploying Llama 3.1 and 3.2 models using Hex-LLM on TPUs through Gemini Enterprise Agent Platform Model Garden. | View on GitHub |