This document describes how to deploy and serve open models on Gemini Enterprise Agent Platform using prebuilt container images. Gemini Enterprise Agent Platform provides prebuilt containers for popular serving frameworks such as vLLM, Hex-LLM, and SGLang, and also supports Hugging Face Text Generation Inference (TGI), Text Embeddings Inference (TEI), the Hugging Face Inference Toolkit (via Google Cloud Hugging Face PyTorch Inference Containers), and TensorRT-LLM containers for serving supported models.
vLLM is an open-source library for fast inference and serving of large language models (LLMs). Gemini Enterprise Agent Platform uses an optimized, customized version of vLLM designed for enhanced performance, reliability, and seamless integration within Google Cloud. You can use this customized vLLM container image to serve models on Gemini Enterprise Agent Platform; the prebuilt vLLM container can download models from Hugging Face or from Cloud Storage. For more information, see Model serving with Gemini Enterprise Agent Platform prebuilt vLLM container images.
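Once a model is deployed with the prebuilt vLLM container, it can typically be queried using vLLM's OpenAI-compatible chat completions format. The sketch below builds such a request body; the endpoint URL and model ID are placeholder assumptions, not values from this document, and the actual route exposed by your deployment may differ.

```python
import json

# Placeholder assumptions: adjust the endpoint and model ID to match
# your own deployment of the prebuilt vLLM container.
ENDPOINT = "http://localhost:8000/v1/chat/completions"
MODEL_ID = "google/gemma-3-4b-it"

def build_chat_request(prompt: str, max_tokens: int = 128) -> dict:
    """Build a request body in the OpenAI-compatible chat completions
    format that vLLM's server accepts."""
    return {
        "model": MODEL_ID,
        "messages": [{"role": "user", "content": prompt}],
        "max_tokens": max_tokens,
    }

payload = build_chat_request("What is vLLM?")
# The payload would be POSTed to ENDPOINT with an HTTP client;
# here we just print it to show the request shape.
print(json.dumps(payload, indent=2))
```

Sending this payload as JSON to the deployed endpoint (with appropriate authentication) returns a chat completion response in the same OpenAI-compatible schema.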
Example Notebooks
The following notebooks demonstrate how to use Gemini Enterprise Agent Platform prebuilt containers for model serving. You can find more sample notebooks in the GitHub repository for Gemini Enterprise Agent Platform samples .
| Notebook Name | Description | Direct Link (GitHub/Colab) |
|---|---|---|
| Gemini Enterprise Agent Platform Model Garden - Gemma 3 (deployment) | Demonstrates deploying Gemma 3 models on GPU using vLLM. | View on GitHub |
| Gemini Enterprise Agent Platform Model Garden - Serve Multimodal Llama 3.2 with vLLM | Deploys multimodal Llama 3.2 models using the vLLM prebuilt container. | View on GitHub |
| Gemini Enterprise Agent Platform Model Garden - Hugging Face Text Generation Inference Deployment | Demonstrates deploying the Gemma-2-2b-it model with Text Generation Inference (TGI) from Hugging Face. | View on GitHub |
| Gemini Enterprise Agent Platform Model Garden - Hugging Face Text Embeddings Inference Deployment | Demonstrates deploying nomic-ai/nomic-embed-text-v1 with Text Embeddings Inference (TEI) from Hugging Face. | View on GitHub |
| Gemini Enterprise Agent Platform Model Garden - Hugging Face PyTorch Inference Deployment | Demonstrates deploying distilbert/distilbert-base-uncased-finetuned-sst-2-english with Hugging Face PyTorch Inference. | View on GitHub |
| Gemini Enterprise Agent Platform Model Garden - DeepSeek Deployment | Demonstrates serving DeepSeek models with vLLM, SGLang, or TensorRT-LLM. | View on GitHub |
| Gemini Enterprise Agent Platform Model Garden - Qwen3 Deployment | Demonstrates serving Qwen3 models with SGLang. | View on GitHub |
| Gemini Enterprise Agent Platform Model Garden - Gemma 3n Deployment | Demonstrates serving Gemma 3n models with SGLang. | View on GitHub |
| Gemini Enterprise Agent Platform Model Garden - Deep dive: Deploy Llama 3.1 and 3.2 with Hex-LLM | Demonstrates deploying Llama 3.1 and 3.2 models using Hex-LLM on TPUs through Gemini Enterprise Agent Platform Model Garden. | View on GitHub |