Training
-
Train Llama 3-8B using JAX, Ray, and GKE on Trillium
Perform distributed training of the Llama 3-8B model on GKE using JAX, Ray Train, and TPU v6e (Trillium) with MaxText for optimized multi-host scaling.
-
Pretrain Llama 3.1-70B using GKE clusters on Ironwood
Train the Llama 3.1-70B model on TPU7x (Ironwood) using the MaxText framework.
-
Pretrain DeepSeek 3-671B using GKE clusters on Ironwood
Train the DeepSeek 3-671B model on TPU7x using optimized recipes for large-scale Mixture-of-Experts (MoE) architectures.
-
Pretrain GPT OSS-120B using GKE clusters on Ironwood
Train the GPT OSS-120B reasoning model on TPU7x using optimized recipes for large-scale distributed training.
-
Pretrain Qwen 3-235B using GKE clusters on Ironwood
Train the Qwen 3-235B-A22B MoE model on TPU7x using optimized recipes for high-performance reasoning.
-
Pretrain Wan 2.1-14B using GKE clusters on Ironwood
Train the Wan 2.1-14B video generation model on TPU7x using optimized recipes for high-performance video synthesis.
-
Pretrain GPT3-175B using GKE clusters on Trillium
Train the GPT3-175B model on TPU v6e using MaxText and optimized recipes for cost-effective training at scale.
-
Pretrain Gemma3-12B using GKE clusters on Trillium
Train the Gemma3-12B model on TPU v6e using MaxText and optimized recipes for high-performance open-model development.
-
Pretrain Llama 3.1-70B using GKE clusters on Trillium
Train Llama 3.1-70B on TPU v6e using MaxText and optimized recipes for high-throughput, large-scale model training.
-
Pretrain Llama 3.1-8B using GKE clusters on Trillium
Train Llama 3.1-8B on TPU v6e using MaxText and an optimized recipe for scalable, high-performance pretraining; a minimal training-step sketch follows this list.
-
Pretrain Mixtral-8x22B using GKE clusters on Trillium
Train Mixtral-8x22B on TPU v6e using MaxText for optimized performance and efficiency.
-
Pretrain Mixtral-8x7B using GKE clusters on Trillium
Train Mixtral-8x7B using MaxText on TPU v6e with optimized configurations for high-throughput MoE performance on Google Cloud.
-
Pretrain DeepSeek 3-671B using GKE clusters on v5p
Train and deploy the DeepSeek 3-671B model on TPU v5p using MaxText for optimized large-scale performance.
-
Pretrain GPT3-175B using GKE clusters on v5p
Train the GPT3-175B model on TPU v5p using MaxText with optimized configurations for large-scale distributed training.
-
Pretrain Mixtral-8x7B using GKE clusters on v5p
Train Mixtral-8x7B on TPU v5p using MaxText with optimized configurations for high-performance MoE workloads.
-
Pretrain SDXL using GKE clusters on v5p
Train and scale Stable Diffusion XL (SDXL) on TPU v5p using MaxDiffusion for high-performance generative AI workloads.
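The MaxText and MaxDiffusion recipes above all reduce to the same core pattern: a jitted training step executed data-parallel over a sharded device mesh. Below is a minimal, self-contained JAX sketch of that pattern. The single "data" mesh axis, the toy linear model, and the hyperparameters are illustrative stand-ins, not values from any recipe; the real recipes configure FSDP, tensor, and data-parallel axes through MaxText's YAML configs instead.

```python
import jax
import jax.numpy as jnp
import numpy as np
from jax.sharding import Mesh, NamedSharding, PartitionSpec as P

# One mesh axis spanning all attached accelerators: TPU chips on a slice,
# or CPU devices when trying this locally.
mesh = Mesh(np.array(jax.devices()), axis_names=("data",))

def loss_fn(params, batch):
    # Toy linear model standing in for the transformer forward pass.
    preds = batch["x"] @ params["w"]
    return jnp.mean((preds - batch["y"]) ** 2)

@jax.jit
def train_step(params, batch):
    loss, grads = jax.value_and_grad(loss_fn)(params, batch)
    # Plain SGD; the real recipes use optax optimizers plus checkpointing.
    new_params = jax.tree.map(lambda p, g: p - 1e-3 * g, params, grads)
    return new_params, loss

replicated = NamedSharding(mesh, P())      # params: copied to every device
sharded = NamedSharding(mesh, P("data"))   # batch: split along the data axis

params = {"w": jax.device_put(jnp.zeros((128, 1)), replicated)}
# The global batch (64 here) must divide evenly across the devices.
batch = {
    "x": jax.device_put(jnp.ones((64, 128)), sharded),
    "y": jax.device_put(jnp.ones((64, 1)), sharded),
}

params, loss = train_step(params, batch)
print("step loss:", float(loss))
```

Because the inputs carry shardings, the jitted step compiles to a per-device computation with the cross-device gradient reduction inserted automatically; at recipe scale the same mechanism maps mesh axes onto the ICI and DCN topology of the TPU slice.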
Inference
-
Serve Llama 3.1-70B using GKE and vLLM on Trillium
Serve Llama 3.1-70B on GKE using TPU v6e and vLLM, featuring optimized autoscaling and high-performance model serving on Google Cloud.
-
Serve LLMs using GKE with KubeRay
Serve an LLM using TPUs on GKE with the Ray Operator add-on and the vLLM serving framework.
-
Serve open LLMs using GKE with Terraform
Provision a GKE inference environment and deploy open LLMs using TPUs and a pre-configured Terraform architecture.
-
Serve Stable Diffusion XL (SDXL) using GKE
Serve Stable Diffusion XL (SDXL) on GKE using Cloud TPUs and the MaxDiffusion framework for high-performance image generation.
-
Serve GPT OSS-120B with vLLM using GKE clusters on Ironwood
Run high-performance inference for GPT-OSS models on TPU7x using vLLM for optimized throughput and low-latency serving on Google Cloud.
-
Serve Qwen3-Coder-480B with vLLM using GKE clusters on Ironwood
Serve Qwen3-Coder-480B-A35B on TPU7x using vLLM for optimized, high-throughput code generation and inference.
-
Serve Llama 3.1-8B with vLLM on Trillium
Serve Llama 3.1-8B on TPU v6e using vLLM for optimized, low-latency inference and high-throughput serving; a minimal vLLM sketch follows this list.
-
Serve Qwen 3 with vLLM on Trillium
Serve Qwen 3 models on TPU v6e using vLLM for high-performance, scalable inference and optimized throughput.
-
Serve Qwen2.5-32B with vLLM on Trillium
Serve the Qwen2.5-32B model on TPU v6e using vLLM for optimized, high-throughput inference.
-
Serve Qwen2.5-VL with vLLM on Trillium
Serve Qwen2.5-VL vision-language models on TPU v6e using vLLM for optimized, high-performance multimodal inference.
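The vLLM recipes above all wrap the same serving engine. Below is a minimal offline-inference sketch of the underlying Python API, assuming access to the gated Hugging Face Llama weights; the model ID, prompt, and sampling values are illustrative and not taken from any recipe.

```python
# Minimal offline-inference sketch of the vLLM pattern these recipes wrap.
# The model ID is illustrative (the weights are gated on Hugging Face);
# on a TPU host, vLLM selects its TPU backend automatically.
from vllm import LLM, SamplingParams

llm = LLM(model="meta-llama/Llama-3.1-8B-Instruct")
sampling = SamplingParams(temperature=0.7, max_tokens=128)

outputs = llm.generate(["Explain TPU pod slices in one paragraph."], sampling)
for out in outputs:
    print(out.outputs[0].text)
```

In the recipes themselves, the same engine runs as a long-lived OpenAI-compatible server (`vllm serve`) inside a GKE Deployment on TPU node pools, with a Kubernetes Service in front and GKE handling TPU scheduling and autoscaling.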

