Use a custom OS image

You can use a custom OS image for your TPU VMs to pre-load software, use a specific OS distribution, or apply custom kernel modifications. Creating a custom image involves making specific system modifications during the image creation process and configuring the image to handle boot-time tasks required for TPU functionality.

Keep the following disclaimers in mind if you use a custom OS image with TPUs:

  • Google provides default TPU-optimized Ubuntu long-term support (LTS) images. The OS changes listed on this page are only validated for the Google-supported, TPU-optimized Ubuntu LTS images.
  • You are responsible for extrapolating the required OS changes for any other OS distribution or custom images. Google doesn't guarantee that the modifications for Ubuntu listed on this page work with other OS distributions or another Ubuntu image with a custom kernel.
  • Google doesn't build or provide testing for any OS images other than the default TPU-optimized Ubuntu LTS images. You must build and test your custom OS image.

For more information about the default TPU-optimized Ubuntu LTS images, see TPU OS images.

Prerequisites

Your base image must have the following components installed:

  • Python 3
  • gcloud CLI

Make modifications during image creation

Apply the following modifications while building your custom Ubuntu image.

Bind TPU devices to VFIO

To allow the guest OS to access TPU hardware, you must bind TPU devices to the vfio-pci driver.

  1. Create a udev rules file named 99-tpu-vfiopci.rules in /etc/udev/rules.d/:

      # Rules for binding vfio-enabled TPU devices to vfio-pci.
      # v5p
      SUBSYSTEM=="pci", ACTION=="add", ATTRS{vendor}=="0x1ae0", ATTRS{device}=="0x0062", ATTRS{subsystem_vendor}=="0x1ae0", ATTRS{subsystem_device}=="0x00ad", DRIVER!="vfio-pci", TAG+="bind_to_vfio_pci"
      # v6e
      SUBSYSTEM=="pci", ACTION=="add", ATTRS{vendor}=="0x1ae0", ATTRS{device}=="0x006f", ATTRS{subsystem_vendor}=="0x1ae0", ATTRS{subsystem_device}=="0x00d1", DRIVER!="vfio-pci", TAG+="bind_to_vfio_pci"
      # TPU7x
      SUBSYSTEM=="pci", ACTION=="add", ATTRS{vendor}=="0x1ae0", ATTRS{device}=="0x0076", ATTRS{subsystem_vendor}=="0x1ae0", ATTRS{subsystem_device}=="0x00f2", DRIVER!="vfio-pci", TAG+="bind_to_vfio_pci"
      # Bind all 'bind_to_vfio_pci' tagged devices to vfio-pci.
      TAG=="bind_to_vfio_pci", RUN+="/lib/udev/bind_to_vfio_pci.sh $kernel"

  2. Create a script named bind_to_vfio_pci.sh in /lib/udev/:

      #!/bin/bash
      # Usage: ./bind_to_vfio_pci.sh <DBDF>
      # Binds the device at <DBDF> to vfio-pci.
      # If the device is already bound to a driver, unbinds it first.

      # Load the vfio-pci module into the kernel. No-op if already loaded.
      modprobe vfio-pci

      # Domain:bus:device.function, for example 0000:00:04.0.
      DBDF_REGEX="^[[:xdigit:]]{4}:[[:xdigit:]]{2}:[[:xdigit:]]{2}\.[[:xdigit:]]$"

      unset BDF
      if [[ $1 =~ $DBDF_REGEX ]]; then
        BDF=$1
      else
        echo "Error: BDF arg ($1) is not in form dddd:bb:dd.f"
        exit 1
      fi

      PCI_PATH="/sys/bus/pci/devices/$BDF"
      echo "vfio-pci" > "$PCI_PATH/driver_override"

      PCI_DRIVER_PATH="$PCI_PATH/driver"
      if [[ -d "$PCI_DRIVER_PATH" ]]; then
        curr_driver=$(readlink "$PCI_DRIVER_PATH")
        curr_driver=${curr_driver##*/}
        if [[ $curr_driver == "vfio-pci" ]]; then
          echo "$BDF already bound to vfio-pci"
          exit 0
        else
          echo "$BDF" > "$PCI_DRIVER_PATH/unbind"
          if [[ -d "$PCI_DRIVER_PATH" ]]; then
            echo "Error: Unable to unbind $PCI_DRIVER_PATH"
            exit 1
          fi
          echo "Unbound $BDF from driver $curr_driver"
        fi
      fi

      echo "$BDF" > /sys/bus/pci/drivers_probe
      echo "Bound $BDF to vfio-pci"

      # Grant read/write access on VFIO device to all users
      IOMMU_GROUP=$(readlink "$PCI_PATH/iommu_group" | xargs basename)
      VFIO_DEV="/dev/vfio/$IOMMU_GROUP"
      if [[ -c "$VFIO_DEV" ]]; then
        chmod 0666 "$VFIO_DEV"
      else
        echo "$VFIO_DEV not found"
        exit 1
      fi

      # Set allow_unsafe_interrupts for x86 platforms.
      (uname -a | grep -q x86_64) && echo 1 > /sys/module/vfio_iommu_type1/parameters/allow_unsafe_interrupts

      # This is only needed to avoid non-zero exit code from previous command.
      echo "All Done!"

  3. Make the script executable:

      chmod +x /lib/udev/bind_to_vfio_pci.sh

  4. Grant all users on the system access to the TPU device:

      echo 'KERNEL=="accel*" MODE="0666"' >> /etc/udev/rules.d/99-tpu.rules
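The udev rules and bind script above match TPUs by PCI vendor, device, and subsystem IDs and expect a dddd:bb:dd.f address. As a rough read-only check of which devices those rules would match on a running VM, the following Python sketch reads the same IDs from sysfs. It assumes the standard /sys/bus/pci/devices layout, and the helper names (is_valid_dbdf, classify, find_tpu_devices) are hypothetical; it never binds or unbinds anything.

```python
import re
from pathlib import Path

# (vendor, device, subsystem_vendor, subsystem_device) -> generation,
# copied from the rules in 99-tpu-vfiopci.rules.
TPU_PCI_IDS = {
    ("0x1ae0", "0x0062", "0x1ae0", "0x00ad"): "v5p",
    ("0x1ae0", "0x006f", "0x1ae0", "0x00d1"): "v6e",
    ("0x1ae0", "0x0076", "0x1ae0", "0x00f2"): "TPU7x",
}

# Same shape as DBDF_REGEX in bind_to_vfio_pci.sh: dddd:bb:dd.f in hex.
DBDF_RE = re.compile(r"[0-9a-fA-F]{4}:[0-9a-fA-F]{2}:[0-9a-fA-F]{2}\.[0-9a-fA-F]")


def is_valid_dbdf(address: str) -> bool:
    """Return True if address is a full domain:bus:device.function string."""
    return DBDF_RE.fullmatch(address) is not None


def classify(vendor: str, device: str, sub_vendor: str, sub_device: str):
    """Return the TPU generation for a PCI ID tuple, or None for non-TPUs."""
    key = tuple(s.strip().lower() for s in (vendor, device, sub_vendor, sub_device))
    return TPU_PCI_IDS.get(key)


def find_tpu_devices(root: str = "/sys/bus/pci/devices"):
    """Yield (dbdf, generation, current_driver) for each TPU found under root."""
    base = Path(root)
    if not base.is_dir():
        return
    for dev in sorted(base.iterdir()):
        try:
            ids = [(dev / name).read_text() for name in
                   ("vendor", "device", "subsystem_vendor", "subsystem_device")]
        except OSError:
            continue  # not a readable PCI device entry
        generation = classify(*ids)
        if generation is None:
            continue
        # The "driver" symlink exists only while a driver is bound.
        driver_link = dev / "driver"
        driver = driver_link.resolve().name if driver_link.exists() else None
        yield dev.name, generation, driver


if __name__ == "__main__":
    for dbdf, gen, driver in find_tpu_devices():
        print(f"{dbdf} {gen} driver={driver}")
```

After the rules fire, each TPU should report driver=vfio-pci.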
    

Modify the image to enhance performance

To ensure optimal performance, adjust the following system limits and parameters.

Memory limits

Allow a single process to lock unlimited memory by updating /etc/security/limits.conf:

    echo '*  hard  memlock  unlimited' >> /etc/security/limits.conf
    echo '*  soft  memlock  unlimited' >> /etc/security/limits.conf
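/etc/security/limits.conf is applied by the pam_limits PAM module when a session starts, so new values only appear in sessions opened after the change. A minimal Python sketch for checking what a process actually received, using the standard resource module (Linux assumed; describe_limit and current_limits are hypothetical helper names):

```python
import resource


def describe_limit(value: int) -> str:
    """Render an rlimit value the way limits.conf spells it."""
    return "unlimited" if value == resource.RLIM_INFINITY else str(value)


def current_limits() -> dict:
    """Return {name: (soft, hard)} for the memlock and nofile limits."""
    limits = {}
    for name, rlimit in (("memlock", resource.RLIMIT_MEMLOCK),
                         ("nofile", resource.RLIMIT_NOFILE)):
        soft, hard = resource.getrlimit(rlimit)
        limits[name] = (describe_limit(soft), describe_limit(hard))
    return limits


if __name__ == "__main__":
    for name, (soft, hard) in current_limits().items():
        print(f"{name}: soft={soft} hard={hard}")
```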

File limits

Increase the number of open files by updating /etc/security/limits.conf:

    echo "*    soft    nofile       100000" >> /etc/security/limits.conf
    echo "*    hard    nofile       100000" >> /etc/security/limits.conf
    echo "root soft    nofile       100000" >> /etc/security/limits.conf
    echo "root hard    nofile       100000" >> /etc/security/limits.conf
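Running these echo commands more than once appends duplicate lines. If your image build might rerun this step, a sketch like the following appends only entries whose domain, type, and item are not already set. This is an assumption-laden example rather than a recommended tool: an existing entry with a different value is left untouched instead of being overwritten, and the column widths are cosmetic.

```python
from pathlib import Path

# (domain, type, item, value) entries from this page.
WANTED = [
    ("*", "hard", "memlock", "unlimited"),
    ("*", "soft", "memlock", "unlimited"),
    ("*", "soft", "nofile", "100000"),
    ("*", "hard", "nofile", "100000"),
    ("root", "soft", "nofile", "100000"),
    ("root", "hard", "nofile", "100000"),
]


def missing_entries(text: str, wanted=WANTED) -> list:
    """Return entries whose (domain, type, item) is not already set in text."""
    present = set()
    for line in text.splitlines():
        fields = line.split("#", 1)[0].split()  # ignore comments
        if len(fields) == 4:
            present.add(tuple(fields[:3]))
    return [entry for entry in wanted if entry[:3] not in present]


def update_limits(path: str = "/etc/security/limits.conf") -> int:
    """Append only the missing entries; return how many lines were added."""
    conf = Path(path)
    text = conf.read_text() if conf.exists() else ""
    missing = missing_entries(text)
    if missing:
        with conf.open("a") as f:
            if text and not text.endswith("\n"):
                f.write("\n")
            for entry in missing:
                f.write("{:<6}{:<6}{:<9}{}\n".format(*entry))
    return len(missing)
```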

Kernel parameters

Update your GRUB configuration (typically in /etc/default/grub) to include the following parameters in GRUB_CMDLINE_LINUX:

  • idle=poll : Prevents the CPU from entering low-power idle states.
  • intel_iommu=on,sm_on : Enables the Intel Input-Output Memory Management Unit (IOMMU). Required for TPU7x and v5p.
  • transparent_hugepage=always : Enables Transparent Huge Pages (THP).
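To see which of these parameters are active after a reboot, you can compare /proc/cmdline with the list above. The Python sketch below splits the command line on whitespace, an assumption that holds for these parameters but not for quoted values containing spaces. Keep in mind that intel_iommu applies only to TPU7x and v5p; parse_cmdline and missing_params are hypothetical names.

```python
from pathlib import Path

# Parameters from the list above.
REQUIRED = {
    "idle": "poll",
    "intel_iommu": "on,sm_on",
    "transparent_hugepage": "always",
}


def parse_cmdline(cmdline: str) -> dict:
    """Split a kernel command line into {name: value}; bare flags map to None."""
    params = {}
    for token in cmdline.split():
        name, sep, value = token.partition("=")
        params[name] = value if sep else None
    return params


def missing_params(cmdline: str, required=REQUIRED) -> list:
    """Return 'name=value' strings that are absent or set to a different value."""
    params = parse_cmdline(cmdline)
    return [f"{k}={v}" for k, v in required.items() if params.get(k) != v]


if __name__ == "__main__":
    proc = Path("/proc/cmdline")
    if proc.exists():
        print(missing_params(proc.read_text()) or "all parameters present")
```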

The following steps show how to update these kernel parameters:

  1. Prevent the CPU from entering a low-power idle state by setting the following variable, which you use in the next step.

      kernel_cmdline="idle=poll"

  2. Enable the Intel Input-Output Memory Management Unit (IOMMU). This step is required for TPU7x and TPU v5p. The sed command in this step also writes the accumulated kernel_cmdline value into GRUB_CMDLINE_LINUX, and it only matches an empty GRUB_CMDLINE_LINUX="" entry:

      kernel_cmdline="${kernel_cmdline} intel_iommu=on,sm_on"
      sed -i "s/GRUB_CMDLINE_LINUX=\"\"/GRUB_CMDLINE_LINUX=\"${kernel_cmdline}\"/" /etc/default/grub
      echo "Status: New kernel cmdline: $(grep -e '^GRUB_CMDLINE_LINUX=' /etc/default/grub)"
      update-grub

  3. Enable Transparent Huge Pages (THP):

      echo "Status: Enabling THP"
      sed -i -r 's/^GRUB_CMDLINE_LINUX="[^"]*/& transparent_hugepage=always/' /etc/default/grub
      update-grub
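If you prefer to edit /etc/default/grub from a build script instead of with sed, the following Python sketch appends parameters to an existing GRUB_CMDLINE_LINUX value, whether or not it is empty, and skips parameters that are already present. It matches everything up to the closing quote, so values that contain commas, such as intel_iommu=on,sm_on, are preserved. add_kernel_params is a hypothetical name; you still need to write the result back and run update-grub.

```python
import re

# Only the GRUB_CMDLINE_LINUX line; GRUB_CMDLINE_LINUX_DEFAULT does not match
# because "_" follows "LINUX" there instead of "=".
LINE_RE = re.compile(r'^(GRUB_CMDLINE_LINUX=")([^"]*)(")', re.MULTILINE)


def add_kernel_params(grub_text: str, params) -> str:
    """Append params to GRUB_CMDLINE_LINUX, skipping ones already present."""
    def _append(match):
        current = match.group(2).split()
        for param in params:
            if param not in current:
                current.append(param)
        return match.group(1) + " ".join(current) + match.group(3)

    new_text, count = LINE_RE.subn(_append, grub_text, count=1)
    if count == 0:
        raise ValueError("no GRUB_CMDLINE_LINUX line found")
    return new_text
```

Because already-present parameters are skipped, running it twice leaves the file unchanged.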
    

Install vBar agent

The vBar agent is required for the inter-chip interconnect (ICI) network to function.

To install the vBar agent, run the following commands:

  1. Authenticate Docker with Artifact Registry:

      gcloud auth configure-docker us-docker.pkg.dev
    
  2. Pull the Docker image from Artifact Registry:

      docker pull gcr.io/cloud-tpu-v2-images/vbar_control_agent:0.0.1
    
  3. Run a container using the vBar agent image:

      docker run --privileged --net=host gcr.io/cloud-tpu-v2-images/vbar_control_agent:0.0.1
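Docker resolves a short name such as vbar_control_agent:0.0.1 against Docker Hub, so the run command needs the same full reference that was pulled. The following Python sketch builds the command as an argument list and checks that the image is present locally with docker image inspect before starting it. The --detach flag and the helper names (docker_run_argv, image_present, start_agent) are assumptions, not part of this page's instructions.

```python
import shutil
import subprocess

VBAR_IMAGE = "gcr.io/cloud-tpu-v2-images/vbar_control_agent:0.0.1"


def docker_run_argv(image: str, detach: bool = True) -> list:
    """Build the privileged, host-networked docker run command used on this page."""
    argv = ["docker", "run", "--privileged", "--net=host"]
    if detach:
        argv.append("--detach")  # assumption: run the agent in the background
    argv.append(image)
    return argv


def image_present(image: str) -> bool:
    """Return True if the image has already been pulled locally."""
    if shutil.which("docker") is None:
        return False
    result = subprocess.run(["docker", "image", "inspect", image],
                            stdout=subprocess.DEVNULL, stderr=subprocess.DEVNULL)
    return result.returncode == 0


def start_agent(image: str = VBAR_IMAGE) -> subprocess.CompletedProcess:
    """Start the container, failing early if the image was never pulled."""
    if not image_present(image):
        raise RuntimeError(f"pull {image} first")
    return subprocess.run(docker_run_argv(image), check=True)
```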
    

Optional: Install and run AI Telemetry Collector

The AI Telemetry Collector runs inside the TPU VM and lets you access runtime and infrastructure metrics through Cloud Monitoring or through your own Prometheus-based monitoring pipeline. You can use the AI Telemetry Collector with a custom OS by using the ai-telemetry-collector Docker image. You can install the image onto your custom OS and use a config.yaml file to set the collection intervals, enable or disable specific metrics, or change the export destinations.

To install the AI Telemetry Collector, run the following commands:

  1. Authenticate Docker with Artifact Registry:

      gcloud auth configure-docker us-docker.pkg.dev
    
  2. Pull the Docker image from Artifact Registry:

      docker pull gcr.io/cloud-tpu-v2-images/ai-telemetry-collector:latest
    
  3. Run a container using the AI Telemetry Collector image with the default configuration:

      docker run --privileged --net=host gcr.io/cloud-tpu-v2-images/ai-telemetry-collector:latest
    

    For information about using a custom configuration file or adding additional configuration files, see AI Telemetry Collector.

Make boot time modifications

Configure your image to perform the tasks in the following sections every time a VM boots. You can use the cloud-init tool to configure boot-time tasks by passing metadata to your instances. The configurations in the following sections use cloud-init modules such as write_files and runcmd. In your cloud-init configuration, put snippets that define files under the write_files: key and commands to run at boot time under the runcmd: key.
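As an illustration of how these pieces fit together, the following Python sketch renders a minimal #cloud-config document with write_files and runcmd entries. The runcmd command in the example is an assumption: the scripts on this page are written with 0444 permissions (not executable), so it invokes them through bash. render_cloud_config is a hypothetical helper; it quotes permissions so YAML keeps them as strings and JSON-encodes each command, which YAML reads as a double-quoted string.

```python
import json


def render_cloud_config(files, runcmd) -> str:
    """Render a minimal #cloud-config with write_files and runcmd sections.

    files:  list of (path, permissions, owner, content) tuples.
    runcmd: list of shell command strings.
    """
    lines = ["#cloud-config", "write_files:"]
    for path, permissions, owner, content in files:
        lines.append(f"- path: {path}")
        lines.append(f"  permissions: '{permissions}'")
        lines.append(f"  owner: {owner}")
        lines.append("  content: |")
        # Indent the body so YAML keeps it as one literal block.
        lines.extend(("    " + body) if body else "" for body in content.splitlines())
    lines.append("runcmd:")
    for command in runcmd:
        lines.append("- " + json.dumps(command))
    return "\n".join(lines) + "\n"
```

For example, render_cloud_config([("/var/scripts/configure-gcs-timeouts.sh", "0444", "root", script_text)], ["bash /var/scripts/configure-gcs-timeouts.sh"]) produces a document you could pass as user-data.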

Start the vBar agent

Initiate the vBar control agent with the appropriate user and group IDs:

    vbar_control_agent --logtostderr --gid= --uid= --chroot= --census_enabled=false --loas_pwd_fallback_in_corp
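The --gid, --uid, and --chroot values are left empty on this page. One way to fill in the IDs is to look them up from a local account, as in the following Python sketch. The account name and chroot path you pass are placeholders you must choose; vbar_agent_argv and ids_for_user are hypothetical helpers, and the flags themselves are copied unchanged from the command above.

```python
import pwd


def vbar_agent_argv(uid: int, gid: int, chroot: str) -> list:
    """Return the vbar_control_agent command line with IDs and chroot filled in."""
    return [
        "vbar_control_agent",
        "--logtostderr",
        f"--gid={gid}",
        f"--uid={uid}",
        f"--chroot={chroot}",
        "--census_enabled=false",
        "--loas_pwd_fallback_in_corp",
    ]


def ids_for_user(username: str):
    """Look up (uid, gid) for a local account; raises KeyError if it is missing."""
    entry = pwd.getpwnam(username)
    return entry.pw_uid, entry.pw_gid
```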

Configure environment variables

To ensure your environment is correctly initialized for TPU workloads, you must retrieve runtime configuration variables from the Compute Engine metadata server during the system boot process. To do this, add the following snippet to the write_files: section of your cloud-init configuration, which creates a script named /var/scripts/configure-env-vars.sh. This script automates retrieval of attributes from the tpu-env metadata key and saves them in /${HOME}/tpu-env to be used by the TPU software stack.

   
    - path: /var/scripts/configure-env-vars.sh
      permissions: 0444
      owner: root
      content: |
        grep -q CLOUDSDK_PYTHON /etc/environment || echo "CLOUDSDK_PYTHON=/usr/bin/python3" >> /etc/environment
        export HOME=/home/tpu-runtime
        curl -s 'http://metadata.google.internal/computeMetadata/v1/instance/attributes/tpu-env' -H 'Metadata-Flavor: Google' > /tmp/tpu-env.yaml
        eval $(python3 -c '''
        import yaml
        stream_in=open("/tmp/tpu-env.yaml", "r")
        for k,v in yaml.safe_load(stream_in).items():
          print("{var}=\"{value}\"".format(var = k, value = str(v)))
        ''' > "/${HOME}/tpu-env"
        )
        rm -f "/tmp/tpu-env.yaml"
        printenv
        cat ${HOME}/tpu-env
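The script writes each value as KEY="value", which breaks if a value contains a double quote, a backslash, or $. A sketch of a safer rendering step uses shlex.quote, assuming the tpu-env YAML has already been loaded into a dict (for example with PyYAML's yaml.safe_load, as the script does). to_env_lines is a hypothetical name, and the keys in the usage below are illustrative, not real tpu-env keys.

```python
import re
import shlex

# POSIX shell variable names: letters, digits, underscore; no leading digit.
_NAME_RE = re.compile(r"[A-Za-z_][A-Za-z0-9_]*")


def to_env_lines(env: dict) -> str:
    """Render a tpu-env mapping as KEY=value lines that a shell can source."""
    lines = []
    for key, value in env.items():
        if not _NAME_RE.fullmatch(str(key)):
            raise ValueError(f"not a valid shell variable name: {key!r}")
        # shlex.quote leaves safe strings bare and single-quotes everything else.
        lines.append(f"{key}={shlex.quote(str(value))}\n")
    return "".join(lines)
```

For example, to_env_lines({"NOTE": 'say "hi" $HOME'}) yields NOTE='say "hi" $HOME', which a shell reads back literally.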
 

The following snippet creates a script named /var/scripts/get-vm-metadata.py, a Python utility to programmatically query the metadata server for specific instance attributes and custom metadata tags. Add the following to the write_files: section of your cloud-init configuration:

   
    - path: /var/scripts/get-vm-metadata.py
      permissions: 0444
      owner: root
      content: |
        # Usage: python3 get-vm-metadata.py KEY [DEFAULT] [ATTRIBUTE_TYPE]
        import sys
        import requests

        if len(sys.argv) < 2:
          sys.stderr.write('Must provide key\n')
          sys.exit(1)
        key = sys.argv[1]
        default = None
        if len(sys.argv) > 2:
          default = sys.argv[2]
        attribute_type = 'attributes'
        if len(sys.argv) > 3:
          attribute_type = sys.argv[3]
        request = requests.get(
            "http://metadata.google.internal/computeMetadata/v1/instance/{}/{}".format(attribute_type, key),
            headers={'Metadata-Flavor': 'Google'})
        if request.status_code == 200:
          # .text is the decoded body; .content would print a bytes literal.
          print(request.text)
        elif request.status_code in (403, 404):
          sys.stderr.write('Metadata key: {} does not exist\n'.format(key))
          if default:
            print(default)
        else:
          sys.stderr.write('Lookup failed with: {}\n'.format(request))
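The script depends on the third-party requests package. If your base image only has Python 3, a standard-library variant using urllib could look like the following sketch. Treating 403 and 404 as a missing key mirrors the script; the 5-second timeout, the injectable opener (useful for testing), and the names metadata_url and get_metadata are assumptions.

```python
import urllib.error
import urllib.request

METADATA_ROOT = "http://metadata.google.internal/computeMetadata/v1/instance"


def metadata_url(key: str, attribute_type: str = "attributes") -> str:
    """Build the metadata server URL for an instance attribute."""
    return f"{METADATA_ROOT}/{attribute_type}/{key}"


def get_metadata(key, default=None, attribute_type="attributes",
                 opener=urllib.request.urlopen, timeout=5):
    """Return the metadata value as text, or default if the key is missing."""
    request = urllib.request.Request(metadata_url(key, attribute_type),
                                     headers={"Metadata-Flavor": "Google"})
    try:
        with opener(request, timeout=timeout) as response:
            return response.read().decode("utf-8")
    except urllib.error.HTTPError as err:
        if err.code in (403, 404):
            return default
        raise  # other HTTP errors are real failures
```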
 

Increase Cloud Storage timeouts

If your workload interacts with Cloud Storage, increase timeout durations by adding timeout values to /etc/environment. To do this, add the following snippet to the write_files: section of your cloud-init configuration, which creates a script named /var/scripts/configure-gcs-timeouts.sh.

   
    - path: /var/scripts/configure-gcs-timeouts.sh
      permissions: 0444
      owner: root
      content: |
        echo "GCS_RESOLVE_REFRESH_SECS=60" >> /etc/environment
        echo "GCS_REQUEST_CONNECTION_TIMEOUT_SECS=300" >> /etc/environment
        echo "GCS_METADATA_REQUEST_TIMEOUT_SECS=300" >> /etc/environment
        echo "GCS_READ_REQUEST_TIMEOUT_SECS=300" >> /etc/environment
        echo "GCS_WRITE_REQUEST_TIMEOUT_SECS=600" >> /etc/environment
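Because these echo commands append unconditionally, running the script repeatedly adds duplicate lines. The CLOUDSDK_PYTHON line in configure-env-vars.sh avoids this with a grep -q guard. The following Python sketch applies the same idea to several keys: it rewrites existing KEY=value lines and appends only missing ones. upsert_env and write_gcs_timeouts are hypothetical names, and the sketch assumes simple KEY=value lines with no export prefix.

```python
from pathlib import Path

GCS_TIMEOUTS = {
    "GCS_RESOLVE_REFRESH_SECS": "60",
    "GCS_REQUEST_CONNECTION_TIMEOUT_SECS": "300",
    "GCS_METADATA_REQUEST_TIMEOUT_SECS": "300",
    "GCS_READ_REQUEST_TIMEOUT_SECS": "300",
    "GCS_WRITE_REQUEST_TIMEOUT_SECS": "600",
}


def upsert_env(text: str, values: dict) -> str:
    """Set KEY=value lines in /etc/environment-style text without duplicating keys."""
    remaining = dict(values)
    out = []
    for line in text.splitlines():
        key = line.split("=", 1)[0].strip()
        if key in remaining:
            out.append(f"{key}={remaining.pop(key)}")
        elif key in values:
            continue  # drop later duplicates of a key already rewritten
        else:
            out.append(line)
    out.extend(f"{k}={v}" for k, v in remaining.items())
    return "\n".join(out) + "\n"


def write_gcs_timeouts(path: str = "/etc/environment", values=GCS_TIMEOUTS) -> None:
    """Apply upsert_env to the file at path, creating it if needed."""
    env = Path(path)
    text = env.read_text() if env.exists() else ""
    env.write_text(upsert_env(text, values))
```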
 

What's next
