
LiteRT CompiledModel Python API

The LiteRT CompiledModel API is available in Python, offering a high-level interface for compiling and running TFLite models with the LiteRT runtime.

The following guide shows basic CPU inference with the CompiledModel Python API.

Install the pip package

Install the LiteRT pip package in your Python environment:

```shell
pip install ai-edge-litert
```
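To confirm that the package installed into the active environment, you can query its version with the standard library's `importlib.metadata`. This is a minimal sketch: the helper name `litert_version` is hypothetical, and only the distribution name `ai-edge-litert` comes from the install command above.

```python
from importlib import metadata
from typing import Optional


def litert_version(dist_name: str = "ai-edge-litert") -> Optional[str]:
    """Return the installed version of a distribution, or None if it is not installed."""
    try:
        return metadata.version(dist_name)
    except metadata.PackageNotFoundError:
        return None


if __name__ == "__main__":
    version = litert_version()
    if version is None:
        print("ai-edge-litert is not installed; run: pip install ai-edge-litert")
    else:
        print(f"ai-edge-litert {version} is installed")
```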

Basic inference

Create `CompiledModel`

Create a compiled model from a .tflite file. The current Python wrapper compiles for CPU by default.

```python
from ai_edge_litert.compiled_model import CompiledModel

model = CompiledModel.from_file("mymodel.tflite")
```

You can also create a compiled model from an in-memory buffer:

```python
from ai_edge_litert.compiled_model import CompiledModel

with open("mymodel.tflite", "rb") as f:
    model = CompiledModel.from_buffer(f.read())
```

Create Input and Output Buffers

Create the buffers that hold the input data you feed to the model and the output data the model produces when it runs.

```python
signature_index = 0
input_buffers = model.create_input_buffers(signature_index)
output_buffers = model.create_output_buffers(signature_index)
```

The signature_index value of 0 selects the first signature in the model.

If you are using CPU memory, fill the inputs by writing NumPy arrays directly into the input buffers.

```python
import numpy as np

input_data = np.array([[1.0, 2.0, 3.0, 4.0]], dtype=np.float32)
input_buffers[0].write(input_data)
```
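Input data often arrives as a flat Python list rather than an array of the right shape and dtype. The sketch below converts it before writing; the input shape `(1, 4)` is a hypothetical value matching the example above, so substitute your model's actual input shape. The helper name `prepare_input` is also hypothetical.

```python
import numpy as np


def prepare_input(values, shape, dtype=np.float32) -> np.ndarray:
    """Convert Python data to an array with the model's expected shape and dtype.

    Raises ValueError if the number of values does not match the shape.
    """
    return np.asarray(values, dtype=dtype).reshape(shape)


# Hypothetical example: four values for a model whose input shape is (1, 4).
input_data = prepare_input([1, 2, 3, 4], (1, 4))
print(input_data.shape, input_data.dtype)  # (1, 4) float32
# Then write it as above: input_buffers[0].write(input_data)
```

Doing the reshape explicitly means a size mismatch fails in NumPy with a clear error instead of surfacing later inside the runtime.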

Invoke the model

Run the model, passing in the input and output buffers.

```python
model.run_by_index(signature_index, input_buffers, output_buffers)
```
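Because `run_by_index` is an ordinary blocking call, you can measure inference latency by wrapping it in a callable and timing repeated runs. This is a generic stdlib sketch, not a LiteRT benchmarking tool; the helper name `time_inference` and the default warm-up and iteration counts are arbitrary choices.

```python
import statistics
import time
from typing import Callable, Dict


def time_inference(run: Callable[[], None], warmup: int = 3, iterations: int = 20) -> Dict[str, float]:
    """Call `run` repeatedly and report latency statistics in milliseconds."""
    if iterations < 1:
        raise ValueError("iterations must be at least 1")
    for _ in range(warmup):
        run()  # warm-up runs are excluded from the statistics
    samples_ms = []
    for _ in range(iterations):
        start = time.perf_counter()
        run()
        samples_ms.append((time.perf_counter() - start) * 1000.0)
    return {
        "mean_ms": statistics.mean(samples_ms),
        "median_ms": statistics.median(samples_ms),
        "min_ms": min(samples_ms),
    }


# Usage with the objects created above:
#   stats = time_inference(
#       lambda: model.run_by_index(signature_index, input_buffers, output_buffers))
#   print(stats)
```

Reporting the median alongside the mean makes occasional slow runs (for example, from other processes) easier to spot.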

Retrieve Outputs

Read the model output directly from the output buffer's memory.

```python
import numpy as np

# Replace num_elements with the size of your model's output tensor.
num_elements = 4
output_array = output_buffers[0].read(num_elements, np.float32)
```

Use `TensorBuffer`

LiteRT provides built-in support for I/O buffer interoperability through the TensorBuffer API, which supports writing NumPy arrays (`write`) and reading NumPy arrays (`read`). Supported dtypes are `np.float32`, `np.int32`, and `np.int8`.
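A common pitfall is that NumPy defaults to `float64` (for example, `np.array([1.0, 2.0])` or `np.zeros(3)`), which is not in the supported list above. A small guard like the following, a sketch with the hypothetical name `check_supported`, fails early with a readable message instead of at write time.

```python
import numpy as np

# dtypes that TensorBuffer write/read support, per the list above.
SUPPORTED_DTYPES = (np.dtype(np.float32), np.dtype(np.int32), np.dtype(np.int8))


def check_supported(array: np.ndarray) -> np.ndarray:
    """Raise early if an array's dtype is not one TensorBuffer accepts."""
    if array.dtype not in SUPPORTED_DTYPES:
        names = ", ".join(str(d) for d in SUPPORTED_DTYPES)
        raise TypeError(f"unsupported dtype {array.dtype}; expected one of: {names}")
    return array


# Usage: input_buffers[0].write(check_supported(input_data))
```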

You can also create a buffer backed by existing host memory:

```python
import numpy as np
from ai_edge_litert.tensor_buffer import TensorBuffer

input_array = np.array([[1.0, 2.0, 3.0, 4.0]], dtype=np.float32)
input_buffer = TensorBuffer.create_from_host_memory(input_array)
```
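Because this buffer is backed by existing host memory, it seems prudent to pass a C-contiguous array of a supported dtype and to keep a reference to that array for as long as the buffer is in use. Both points are our assumptions, not documented requirements. The sketch below uses `np.ascontiguousarray` to turn a strided view (here a transpose) into contiguous memory; `host_array_for_buffer` is a hypothetical helper name.

```python
import numpy as np


def host_array_for_buffer(array: np.ndarray, dtype=np.float32) -> np.ndarray:
    """Return a C-contiguous array of the given dtype (copying only if needed)."""
    return np.ascontiguousarray(array, dtype=dtype)


# A transposed view is not C-contiguous; the helper produces a contiguous array.
view = np.arange(6, dtype=np.float32).reshape(2, 3).T
backing = host_array_for_buffer(view)
print(view.flags["C_CONTIGUOUS"], backing.flags["C_CONTIGUOUS"])  # False True

# Keep `backing` alive while the buffer uses it:
#   input_buffer = TensorBuffer.create_from_host_memory(backing)
```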

To run by signature name, first inspect the model signatures and then supply maps from input/output names to TensorBuffer instances:

```python
from ai_edge_litert.tensor_buffer import TensorBuffer

signatures = model.get_signature_list()
# Example signature structure:
# {"serving_default": {"inputs": ["input_0"], "outputs": ["output_0"]}}

# input_array is the float32 NumPy array from the previous example.
input_buffer = TensorBuffer.create_from_host_memory(input_array)
output_buffer = model.create_output_buffer_by_name("serving_default", "output_0")

model.run_by_name(
    "serving_default",
    {"input_0": input_buffer},
    {"output_0": output_buffer},
)
```
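When a model has several signatures or many named tensors, it helps to check your name-to-buffer maps against `get_signature_list()` before calling `run_by_name`. This sketch assumes the dictionary structure shown in the example comment above; the helper name `build_io_maps` is hypothetical, and the test uses placeholder strings instead of real `TensorBuffer` objects.

```python
from typing import Any, Dict, Mapping, Sequence, Tuple


def build_io_maps(
    signatures: Mapping[str, Mapping[str, Sequence[str]]],
    signature_name: str,
    inputs: Mapping[str, Any],
    outputs: Mapping[str, Any],
) -> Tuple[Dict[str, Any], Dict[str, Any]]:
    """Validate input/output maps against a signature before run_by_name."""
    if signature_name not in signatures:
        raise KeyError(f"unknown signature {signature_name!r}; available: {sorted(signatures)}")
    expected = signatures[signature_name]
    for kind, given in (("inputs", inputs), ("outputs", outputs)):
        missing = set(expected[kind]) - set(given)
        extra = set(given) - set(expected[kind])
        if missing or extra:
            raise ValueError(f"{kind} mismatch: missing={sorted(missing)}, extra={sorted(extra)}")
    return dict(inputs), dict(outputs)


# Usage with the objects above:
#   ins, outs = build_io_maps(model.get_signature_list(), "serving_default",
#                             {"input_0": input_buffer}, {"output_0": output_buffer})
#   model.run_by_name("serving_default", ins, outs)
```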

For a more complete view of how the TensorBuffer API is implemented, see the TensorBuffer source code.

Use GPU Accelerator

If your device has a GPU, you can use it by passing the `HardwareAccelerator.GPU` option to the `CompiledModel` creation API.

```python
from ai_edge_litert.compiled_model import CompiledModel, HardwareAccelerator

model = CompiledModel.from_file("mymodel.tflite", HardwareAccelerator.GPU)
```
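If your code must run on devices where the GPU backend may be unavailable, one defensive pattern is to try GPU creation first and fall back to the CPU default. We have not confirmed how `CompiledModel.from_file` behaves when GPU is unsupported; this sketch assumes it raises an exception. The helper name `create_with_fallback` is hypothetical.

```python
from typing import Any, Callable, List, Sequence, Tuple


def create_with_fallback(factories: Sequence[Tuple[str, Callable[[], Any]]]) -> Tuple[str, Any]:
    """Try each (label, factory) in order and return the first that succeeds."""
    errors: List[str] = []
    for label, factory in factories:
        try:
            return label, factory()
        except Exception as exc:  # assumption: an unsupported backend raises
            errors.append(f"{label}: {exc}")
    raise RuntimeError("no backend could be created: " + "; ".join(errors))


# Usage with the API above:
#   backend, model = create_with_fallback([
#       ("gpu", lambda: CompiledModel.from_file("mymodel.tflite", HardwareAccelerator.GPU)),
#       ("cpu", lambda: CompiledModel.from_file("mymodel.tflite")),
#   ])
```

Returning the label alongside the model lets you log which backend was actually used.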

Check the LiteRT documentation to see which backends are supported on your platform.