The Swift API of LiteRT-LM lets you integrate large language models natively into iOS and macOSapplications. Features like multi-modality, tool use, and GPU acceleration(via Metal) are fully supported.
Introduction
Here is an example of using the Swift API to initialize a model and send a message:
import
LiteRTLM
// 1. Initialize the Engine with your model
let
config
=
try
EngineConfig
(
modelPath
:
"path/to/model.litertlm"
,
backend
:
.
gpu
,
// Use .cpu() for CPU execution
cacheDir
:
NSTemporaryDirectory
()
)
let
engine
=
Engine
(
engineConfig
:
config
)
try
await
engine
.
initialize
()
// 2. Start a new Conversation
let
conversation
=
try
await
engine
.
createConversation
()
// 3. Send a message and print the response
let
response
=
try
await
conversation
.
sendMessage
(
Message
(
"What is the capital of France?"
))
print
(
response
.
toString
)
Getting Started
This section provides instructions on how to integrate the LiteRT-LM Swift API into your application.
Swift Package Manager (SPM)
You can integrate LiteRT-LM into your Xcode project using Swift Package Manager.
- Open your project in Xcode and navigate to File > Add Package Dependencies...
- Enter the package repository URL:
https://github.com/google-ai-edge/LiteRT-LM - Select the LiteRTLMlibrary to add it to your application target.
If you are developing a package using Package.swift
, add it to your
dependencies:
dependencies
:
[
.
package
(
url
:
"https://github.com/google-ai-edge/LiteRT-LM"
,
from
:
"0.12.0"
)
]
Core API Guide
This section details the fundamental components and workflows for using the LiteRT-LM Swift API, including engine initialization, conversation management, and sending messages.
Initialize the Engine
The Engine
handles model loading, resource allocation, and lifecycle
management.
import
LiteRTLM
let
engineConfig
=
try
EngineConfig
(
modelPath
:
"path/to/your/model.litertlm"
,
backend
:
.
gpu
,
// Use .gpu for Metal hardware acceleration
maxNumTokens
:
512
,
// Size of the KV-cache
cacheDir
:
NSTemporaryDirectory
()
// Writable directory for compilation cache
)
let
engine
=
Engine
(
engineConfig
:
engineConfig
)
try
await
engine
.
initialize
()
Create a Conversation
A Conversation
manages chat history, system instructions, and sampler
configurations.
// Configure custom sampling parameters
let
samplerConfig
=
try
SamplerConfig
(
topK
:
40
,
topP
:
0.95
,
temperature
:
0.7
)
// Create the conversation config with system instructions
let
config
=
ConversationConfig
(
systemMessage
:
Message
(
"You are a helpful assistant."
),
samplerConfig
:
samplerConfig
)
let
conversation
=
try
await
engine
.
createConversation
(
with
:
config
)
Send Messages
You can interact with the model synchronously or asynchronously (streaming).
Synchronous Example
let
response
=
try
await
conversation
.
sendMessage
(
Message
(
"Hello!"
))
print
(
response
.
toString
)
Asynchronous (Streaming) Example
let
message
=
Message
(
"Tell me a long story."
)
for
try
await
chunk
in
conversation
.
sendMessageStream
(
message
)
{
// Output response chunks in real-time
print
(
chunk
.
toString
,
terminator
:
""
)
}
print
()
Multi-Modality
To use vision or audio features, make sure to configure the specialized backends during engine initialization.
let
engineConfig
=
try
EngineConfig
(
modelPath
:
"path/to/multimodal_model.litertlm"
,
backend
:
.
gpu
,
visionBackend
:
.
cpu
(),
// Enable CPU vision executor
audioBackend
:
.
cpu
(),
// Enable CPU audio executor
cacheDir
:
NSTemporaryDirectory
()
)
let
engine
=
Engine
(
engineConfig
:
engineConfig
)
try
await
engine
.
initialize
()
Image Input (Vision)
Provide an image as a path or raw bytes:
let
imagePath
=
Bundle
.
main
.
path
(
forResource
:
"scenery"
,
ofType
:
"jpg"
)
!
let
message
=
Message
(
contents
:
[
Content
.
imageFile
(
imagePath
),
Content
.
text
(
"Describe this image."
)
])
let
response
=
try
await
conversation
.
sendMessage
(
message
)
print
(
response
.
toString
)
Audio Input
Provide an audio path:
let
audioPath
=
Bundle
.
main
.
path
(
forResource
:
"recording"
,
ofType
:
"wav"
)
!
let
message
=
Message
(
contents
:
[
Content
.
audioFile
(
audioPath
),
Content
.
text
(
"Transcribe this recording."
)
])
let
response
=
try
await
conversation
.
sendMessage
(
message
)
print
(
response
.
toString
)
🔴 New: Multi-Token Prediction (MTP)
Multi-Token Prediction (MTP) is a performance optimization that significantly accelerates decode speeds. It is universally recommended for all tasks using GPU/Metal backends.
To use MTP, enable speculative decoding in experimental flags before initializing the engine.
import
LiteRTLM
// Opt into experimental APIs to configure MTP
ExperimentalFlags
.
optIntoExperimentalAPIs
()
ExperimentalFlags
.
enableSpeculativeDecoding
=
true
let
engineConfig
=
try
EngineConfig
(
modelPath
:
"path/to/model.litertlm"
,
backend
:
.
gpu
,
cacheDir
:
NSTemporaryDirectory
()
)
let
engine
=
Engine
(
engineConfig
:
engineConfig
)
try
await
engine
.
initialize
()
Define and Use Tools
You can define Swift structures as tools that the model can automatically call to execute logic.
- Conform to the
Toolprotocol. - Declare parameters using the
@ToolParamproperty wrapper. - Implement the
run()method.
import
LiteRTLM
// 1. Define your custom tool
struct
GetCurrentWeatherTool
:
Tool
{
static
let
name
=
"get_current_weather"
static
let
description
=
"Get the current weather for a location."
@
ToolParam
(
description
:
"The city and state, e.g. San Francisco, CA"
)
var
location
:
String
@
ToolParam
(
description
:
"The temperature unit to use (celsius or fahrenheit)"
)
var
unit
:
String
=
"celsius"
func
run
()
async
throws
-
>
Any
{
// Call your weather API here
return
[
"location"
:
location
,
"temperature"
:
"22"
,
"unit"
:
unit
,
"condition"
:
"sunny"
]
}
}
// 2. Register the tool in your conversation configuration
let
config
=
ConversationConfig
(
tools
:
[
GetCurrentWeatherTool
()]
)
let
conversation
=
try
await
engine
.
createConversation
(
with
:
config
)
// 3. The model will invoke the tool automatically if needed
let
response
=
try
await
conversation
.
sendMessage
(
Message
(
"What is the weather in Paris right now?"
))
print
(
response
.
toString
)

