Training
-
Train Llama 3-8B using JAX, Ray, and GKE on Trillium
Perform distributed training of the Llama 3-8B model on GKE using JAX, Ray Train, and TPU v6e (Trillium) with MaxText for optimized multi-host scaling.
-
Pretrain Llama 3.1-70B using GKE clusters on Ironwood
Train the Llama 3.1-70B model on TPU7x (Ironwood) using the MaxText framework.
-
Pretrain DeepSeek 3-671B using GKE clusters on Ironwood
Train the DeepSeek 3-671B model on TPU7x using optimized recipes for large-scale Mixture-of-Experts (MoE) architectures.
-
Pretrain GPT OSS-120B using GKE clusters on Ironwood
Train the GPT OSS-120B reasoning model on TPU7x using optimized recipes for large-scale distributed training.
-
Pretrain Qwen 3-235B using GKE clusters on Ironwood
Train the Qwen 3-235B-A22B MoE model on TPU7x using optimized recipes for high-performance reasoning.
-
Pretrain Wan 2.1-14B using GKE clusters on Ironwood
Train the Wan 2.1-14B video generation model on TPU7x using optimized recipes for high-performance video synthesis.
-
Pretrain GPT3-175B using GKE clusters on Trillium
Train the GPT3-175B model on TPU v6e using MaxText and optimized recipes for cost-effective training at scale.
-
Pretrain Gemma3-12B using GKE clusters on Trillium
Train the Gemma3-12B model on TPU v6e using MaxText and optimized recipes for high-performance open-model development.
-
Pretrain Llama 3.1-70B using GKE clusters on Trillium
Train Llama 3.1-70B on TPU v6e using MaxText and optimized recipes for high-throughput, large-scale model training.
-
Pretrain Llama 3.1-8B using GKE clusters on Trillium
Train Llama 3.1-8B on TPU v6e using MaxText and an optimized recipe for scalable, high-performance pretraining; a minimal training-step sketch follows this list.
-
Pretrain Mixtral-8x22B using GKE clusters on Trillium
Train Mixtral-8x22B on TPU v6e using MaxText for optimized performance and efficiency.
-
Pretrain Mixtral-8x7B using GKE clusters on Trillium
Train Mixtral-8x7B using MaxText on TPU v6e with optimized configurations for high-throughput MoE performance on Google Cloud.
-
Pretrain DeepSeek 3-671B using GKE clusters on v5p
Train and deploy the DeepSeek 3-671B model on TPU v5p using MaxText for optimized large-scale performance.
-
Pretrain GPT3-175B using GKE clusters on v5p
Train the GPT3-175B model on TPU v5p using MaxText with optimized configurations for large-scale distributed training.
-
Pretrain Mixtral-8x7B using GKE clusters on v5p
Train Mixtral-8x7B on TPU v5p using MaxText with optimized configurations for high-performance MoE workloads.
-
Pretrain SDXL using GKE clusters on v5p
Train and scale Stable Diffusion XL (SDXL) on TPU v5p using MaxDiffusion for high-performance generative AI workloads.
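The MaxText and MaxDiffusion recipes above all reduce to the same core pattern: a jitted training step executed data-parallel over a sharded device mesh. Below is a minimal, self-contained JAX sketch of that pattern. The single "data" mesh axis, the toy linear model, and the hyperparameters are illustrative stand-ins, not values from any recipe; the real recipes configure FSDP, tensor, and data-parallel axes through MaxText's YAML configs instead.

```python
import jax
import jax.numpy as jnp
import numpy as np
from jax.sharding import Mesh, NamedSharding, PartitionSpec as P

# One mesh axis spanning all attached accelerators: TPU chips on a slice,
# or CPU devices when trying this locally.
mesh = Mesh(np.array(jax.devices()), axis_names=("data",))

def loss_fn(params, batch):
    # Toy linear model standing in for the transformer forward pass.
    preds = batch["x"] @ params["w"]
    return jnp.mean((preds - batch["y"]) ** 2)

@jax.jit
def train_step(params, batch):
    loss, grads = jax.value_and_grad(loss_fn)(params, batch)
    # Plain SGD; the real recipes use optax optimizers plus checkpointing.
    new_params = jax.tree.map(lambda p, g: p - 1e-3 * g, params, grads)
    return new_params, loss

replicated = NamedSharding(mesh, P())      # params: copied to every device
sharded = NamedSharding(mesh, P("data"))   # batch: split along the data axis

params = {"w": jax.device_put(jnp.zeros((128, 1)), replicated)}
# The global batch (64 here) must divide evenly across the devices.
batch = {
    "x": jax.device_put(jnp.ones((64, 128)), sharded),
    "y": jax.device_put(jnp.ones((64, 1)), sharded),
}

params, loss = train_step(params, batch)
print("step loss:", float(loss))
```

Because the inputs carry shardings, the jitted step compiles to a per-device computation with the cross-device gradient reduction inserted automatically; at recipe scale the same mechanism maps mesh axes onto the ICI and DCN topology of the TPU slice.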
Inference
-
Serve Llama 3.1-70B using GKE and vLLM on Trillium
Serve Llama 3.1-70B on GKE using TPU v6e and vLLM, featuring optimized autoscaling and high-performance model serving on Google Cloud.
-
Serve LLMs using GKE with KubeRay
Serve an LLM using TPUs on GKE with the Ray Operator add-on and the vLLM serving framework.
-
Serve open LLMs using GKE with Terraform
Provision a GKE inference environment and deploy open LLMs using TPUs and a pre-configured Terraform architecture.
-
Serve Stable Diffusion XL (SDXL) using GKE
Serve Stable Diffusion XL (SDXL) on GKE using Cloud TPUs and the MaxDiffusion framework for high-performance image generation.
-
Serve GPT OSS-120B with vLLM using GKE clusters on Ironwood
Run high-performance inference for GPT-OSS models on TPU7x using vLLM for optimized throughput and low-latency serving on Google Cloud.
-
Serve Qwen3-Coder-480B with vLLM using GKE clusters on Ironwood
Serve Qwen3-Coder-480B-A35B on TPU7x using vLLM for optimized, high-throughput code generation and inference.
-
Serve Llama 3.1-8B with vLLM on Trillium
Serve Llama 3.1-8B on TPU v6e using vLLM for optimized, low-latency inference and high-throughput serving; a minimal vLLM sketch follows this list.
-
Serve Qwen 3 with vLLM on Trillium
Serve Qwen 3 models on TPU v6e using vLLM for high-performance, scalable inference and optimized throughput.
-
Serve Qwen2.5-32B with vLLM on Trillium
Serve the Qwen2.5-32B model on TPU v6e using vLLM for optimized, high-throughput inference.
-
Serve Qwen2.5-VL with vLLM on Trillium
Serve Qwen2.5-VL vision-language models on TPU v6e using vLLM for optimized, high-performance multimodal inference.
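The vLLM recipes above all wrap the same serving engine. Below is a minimal offline-inference sketch of the underlying Python API, assuming access to the gated Hugging Face Llama weights; the model ID, prompt, and sampling values are illustrative and not taken from any recipe.

```python
# Minimal offline-inference sketch of the vLLM pattern these recipes wrap.
# The model ID is illustrative (the weights are gated on Hugging Face);
# on a TPU host, vLLM selects its TPU backend automatically.
from vllm import LLM, SamplingParams

llm = LLM(model="meta-llama/Llama-3.1-8B-Instruct")
sampling = SamplingParams(temperature=0.7, max_tokens=128)

outputs = llm.generate(["Explain TPU pod slices in one paragraph."], sampling)
for out in outputs:
    print(out.outputs[0].text)
```

In the recipes themselves, the same engine runs as a long-lived OpenAI-compatible server (`vllm serve`) inside a GKE Deployment on TPU node pools, with a Kubernetes Service in front and GKE handling TPU scheduling and autoscaling.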

