Introducing Google AI Edge Portal : Benchmark Edge AI at scale. Sign-up to request access during private preview.

LiteRT-LM Swift API

The Swift API of LiteRT-LM lets you integrate large language models natively into iOS and macOSapplications. Features like multi-modality, tool use, and GPU acceleration(via Metal) are fully supported.

Introduction

Here is an example of using the Swift API to initialize a model and send a message:

  import 
  
 LiteRTLM 
 // 1. Initialize the Engine with your model 
 let 
  
 config 
  
 = 
  
 try 
  
 EngineConfig 
 ( 
  
 modelPath 
 : 
  
 "path/to/model.litertlm" 
 , 
  
 backend 
 : 
  
 . 
 gpu 
 , 
  
 // Use .cpu() for CPU execution 
  
 cacheDir 
 : 
  
 NSTemporaryDirectory 
 () 
 ) 
 let 
  
 engine 
  
 = 
  
 Engine 
 ( 
 engineConfig 
 : 
  
 config 
 ) 
 try 
  
 await 
  
 engine 
 . 
 initialize 
 () 
 // 2. Start a new Conversation 
 let 
  
 conversation 
  
 = 
  
 try 
  
 await 
  
 engine 
 . 
 createConversation 
 () 
 // 3. Send a message and print the response 
 let 
  
 response 
  
 = 
  
 try 
  
 await 
  
 conversation 
 . 
 sendMessage 
 ( 
 Message 
 ( 
 "What is the capital of France?" 
 )) 
 print 
 ( 
 response 
 . 
 toString 
 )

Getting Started

This section provides instructions on how to integrate the LiteRT-LM Swift API into your application.

Swift Package Manager (SPM)

You can integrate LiteRT-LM into your Xcode project using Swift Package Manager.

Open your project in Xcode and navigate to File > Add Package Dependencies...
Enter the package repository URL: https://github.com/google-ai-edge/LiteRT-LM
Select the LiteRTLMlibrary to add it to your application target.

If you are developing a package using Package.swift , add it to your dependencies:

  dependencies 
 : 
  
 [ 
  
 . 
 package 
 ( 
 url 
 : 
  
 "https://github.com/google-ai-edge/LiteRT-LM" 
 , 
  
 from 
 : 
  
 "0.12.0" 
 ) 
 ]

Core API Guide

This section details the fundamental components and workflows for using the LiteRT-LM Swift API, including engine initialization, conversation management, and sending messages.

Initialize the Engine

The Engine handles model loading, resource allocation, and lifecycle management.

  import 
  
 LiteRTLM 
 let 
  
 engineConfig 
  
 = 
  
 try 
  
 EngineConfig 
 ( 
  
 modelPath 
 : 
  
 "path/to/your/model.litertlm" 
 , 
  
 backend 
 : 
  
 . 
 gpu 
 , 
  
 // Use .gpu for Metal hardware acceleration 
  
 maxNumTokens 
 : 
  
 512 
 , 
  
 // Size of the KV-cache 
  
 cacheDir 
 : 
  
 NSTemporaryDirectory 
 () 
  
 // Writable directory for compilation cache 
 ) 
 let 
  
 engine 
  
 = 
  
 Engine 
 ( 
 engineConfig 
 : 
  
 engineConfig 
 ) 
 try 
  
 await 
  
 engine 
 . 
 initialize 
 ()

Create a Conversation

A Conversation manages chat history, system instructions, and sampler configurations.

  // Configure custom sampling parameters 
 let 
  
 samplerConfig 
  
 = 
  
 try 
  
 SamplerConfig 
 ( 
  
 topK 
 : 
  
 40 
 , 
  
 topP 
 : 
  
 0.95 
 , 
  
 temperature 
 : 
  
 0.7 
 ) 
 // Create the conversation config with system instructions 
 let 
  
 config 
  
 = 
  
 ConversationConfig 
 ( 
  
 systemMessage 
 : 
  
 Message 
 ( 
 "You are a helpful assistant." 
 ), 
  
 samplerConfig 
 : 
  
 samplerConfig 
 ) 
 let 
  
 conversation 
  
 = 
  
 try 
  
 await 
  
 engine 
 . 
 createConversation 
 ( 
 with 
 : 
  
 config 
 )

Send Messages

You can interact with the model synchronously or asynchronously (streaming).

Synchronous Example

  let 
  
 response 
  
 = 
  
 try 
  
 await 
  
 conversation 
 . 
 sendMessage 
 ( 
 Message 
 ( 
 "Hello!" 
 )) 
 print 
 ( 
 response 
 . 
 toString 
 )

Asynchronous (Streaming) Example

  let 
  
 message 
  
 = 
  
 Message 
 ( 
 "Tell me a long story." 
 ) 
 for 
  
 try 
  
 await 
  
 chunk 
  
 in 
  
 conversation 
 . 
 sendMessageStream 
 ( 
 message 
 ) 
  
 { 
  
 // Output response chunks in real-time 
  
 print 
 ( 
 chunk 
 . 
 toString 
 , 
  
 terminator 
 : 
  
 "" 
 ) 
 } 
 print 
 ()

Multi-Modality

To use vision or audio features, make sure to configure the specialized backends during engine initialization.

  let 
  
 engineConfig 
  
 = 
  
 try 
  
 EngineConfig 
 ( 
  
 modelPath 
 : 
  
 "path/to/multimodal_model.litertlm" 
 , 
  
 backend 
 : 
  
 . 
 gpu 
 , 
  
 visionBackend 
 : 
  
 . 
 cpu 
 (), 
  
 // Enable CPU vision executor 
  
 audioBackend 
 : 
  
 . 
 cpu 
 (), 
  
 // Enable CPU audio executor 
  
 cacheDir 
 : 
  
 NSTemporaryDirectory 
 () 
 ) 
 let 
  
 engine 
  
 = 
  
 Engine 
 ( 
 engineConfig 
 : 
  
 engineConfig 
 ) 
 try 
  
 await 
  
 engine 
 . 
 initialize 
 ()

Image Input (Vision)

Provide an image as a path or raw bytes:

  let 
  
 imagePath 
  
 = 
  
 Bundle 
 . 
 main 
 . 
 path 
 ( 
 forResource 
 : 
  
 "scenery" 
 , 
  
 ofType 
 : 
  
 "jpg" 
 ) 
 ! 
 let 
  
 message 
  
 = 
  
 Message 
 ( 
 contents 
 : 
  
 [ 
  
 Content 
 . 
 imageFile 
 ( 
 imagePath 
 ), 
  
 Content 
 . 
 text 
 ( 
 "Describe this image." 
 ) 
 ]) 
 let 
  
 response 
  
 = 
  
 try 
  
 await 
  
 conversation 
 . 
 sendMessage 
 ( 
 message 
 ) 
 print 
 ( 
 response 
 . 
 toString 
 )

Audio Input

Provide an audio path:

  let 
  
 audioPath 
  
 = 
  
 Bundle 
 . 
 main 
 . 
 path 
 ( 
 forResource 
 : 
  
 "recording" 
 , 
  
 ofType 
 : 
  
 "wav" 
 ) 
 ! 
 let 
  
 message 
  
 = 
  
 Message 
 ( 
 contents 
 : 
  
 [ 
  
 Content 
 . 
 audioFile 
 ( 
 audioPath 
 ), 
  
 Content 
 . 
 text 
 ( 
 "Transcribe this recording." 
 ) 
 ]) 
 let 
  
 response 
  
 = 
  
 try 
  
 await 
  
 conversation 
 . 
 sendMessage 
 ( 
 message 
 ) 
 print 
 ( 
 response 
 . 
 toString 
 )

🔴 New: Multi-Token Prediction (MTP)

Multi-Token Prediction (MTP) is a performance optimization that significantly accelerates decode speeds. It is universally recommended for all tasks using GPU/Metal backends.

To use MTP, enable speculative decoding in experimental flags before initializing the engine.

  import 
  
 LiteRTLM 
 // Opt into experimental APIs to configure MTP 
 ExperimentalFlags 
 . 
 optIntoExperimentalAPIs 
 () 
 ExperimentalFlags 
 . 
 enableSpeculativeDecoding 
  
 = 
  
 true 
 let 
  
 engineConfig 
  
 = 
  
 try 
  
 EngineConfig 
 ( 
  
 modelPath 
 : 
  
 "path/to/model.litertlm" 
 , 
  
 backend 
 : 
  
 . 
 gpu 
 , 
  
 cacheDir 
 : 
  
 NSTemporaryDirectory 
 () 
 ) 
 let 
  
 engine 
  
 = 
  
 Engine 
 ( 
 engineConfig 
 : 
  
 engineConfig 
 ) 
 try 
  
 await 
  
 engine 
 . 
 initialize 
 ()

Define and Use Tools

You can define Swift structures as tools that the model can automatically call to execute logic.

Conform to the Tool protocol.
Declare parameters using the @ToolParam property wrapper.
Implement the run() method.

  import 
  
 LiteRTLM 
 // 1. Define your custom tool 
 struct 
  
 GetCurrentWeatherTool 
 : 
  
 Tool 
  
 { 
  
 static 
  
 let 
  
 name 
  
 = 
  
 "get_current_weather" 
  
 static 
  
 let 
  
 description 
  
 = 
  
 "Get the current weather for a location." 
  
 @ 
 ToolParam 
 ( 
 description 
 : 
  
 "The city and state, e.g. San Francisco, CA" 
 ) 
  
 var 
  
 location 
 : 
  
 String 
  
 @ 
 ToolParam 
 ( 
 description 
 : 
  
 "The temperature unit to use (celsius or fahrenheit)" 
 ) 
  
 var 
  
 unit 
 : 
  
 String 
  
 = 
  
 "celsius" 
  
 func 
  
 run 
 () 
  
 async 
  
 throws 
  
 - 
>  
 Any 
  
 { 
  
 // Call your weather API here 
  
 return 
  
 [ 
  
 "location" 
 : 
  
 location 
 , 
  
 "temperature" 
 : 
  
 "22" 
 , 
  
 "unit" 
 : 
  
 unit 
 , 
  
 "condition" 
 : 
  
 "sunny" 
  
 ] 
  
 } 
 } 
 // 2. Register the tool in your conversation configuration 
 let 
  
 config 
  
 = 
  
 ConversationConfig 
 ( 
  
 tools 
 : 
  
 [ 
 GetCurrentWeatherTool 
 ()] 
 ) 
 let 
  
 conversation 
  
 = 
  
 try 
  
 await 
  
 engine 
 . 
 createConversation 
 ( 
 with 
 : 
  
 config 
 ) 
 // 3. The model will invoke the tool automatically if needed 
 let 
  
 response 
  
 = 
  
 try 
  
 await 
  
 conversation 
 . 
 sendMessage 
 ( 
 Message 
 ( 
 "What is the weather in Paris right now?" 
 )) 
 print 
 ( 
 response 
 . 
 toString 
 )