LiteRT-LM is the production-ready orchestration layer for running LLMs with LiteRT, engineered for high-performance, cross-platform execution.
- Cross-Platform Support: Run on Android, iOS, Web, Desktop, and IoT (e.g., Raspberry Pi).
- Hardware Acceleration: Get peak performance and system stability by leveraging GPU and NPU accelerators across diverse hardware.
- Multi-Modality: Build with LLMs that have vision and audio support.
- Tool Use: Function calling support for agentic workflows, with constrained decoding for improved accuracy.
- Broad Model Support: Run Gemma, Llama, Phi-4, Qwen, and more.
What's New (v0.12.0)
- Swift APIs: Natively integrate LiteRT-LM into iOS applications with Metal GPU acceleration. See the Swift Guide.
- Web JavaScript APIs: Run models inside web browsers with high performance using WebGPU or CPU. See the JavaScript Guide.
- LiteRT-LM CLI / Python API Update: The command-line interface and Python API now support NPU in addition to the CPU and GPU backends across Linux, macOS, and Windows. See the CLI Guide.
- Community-Maintained Flutter APIs: Build cross-platform Flutter applications using the community flutter_gemma package. See the Flutter Guide.
On-Device GenAI Showcase
The Google AI Edge Gallery is an experimental app designed to showcase on-device Generative AI capabilities running entirely offline using LiteRT-LM.
- Google Play: Use LLMs locally on supported Android devices.
- App Store: Experience on-device AI on your iOS device.
- GitHub Source: View the source code for the gallery app to learn how to integrate LiteRT-LM into your own projects.
Featured Model: Gemma-4-E2B
- Model Size: 2.58 GB
Additional technical details are available in the Hugging Face model card.
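| Platform (Device) | Backend | Prefill (tk/s) | Decode (tk/s) | Time to First Token (seconds) | Peak CPU Memory (MB) |
|---|---|---|---|---|---|
| Android (S26 Ultra) | CPU | 557 | 47 | 1.8 | 1733 |
| | GPU | 3808 | 52 | 0.3 | 676 |
| iOS (iPhone 17 Pro) | CPU | 532 | 25 | 1.9 | 607 |
| | GPU | 2878 | 56 | 0.3 | 1450 |
| Linux (Arm 2.3 & 2.8 GHz, NVIDIA GeForce RTX 4090) | CPU | 260 | 35 | 4 | 1628 |
| | GPU | 11234 | 143 | 0.1 | 913 |
| macOS (MacBook Pro M4) | CPU | 901 | 42 | 1.1 | 736 |
| | GPU | 7835 | 160 | 0.1 | 1623 |
| Windows (Intel LunarLake) | CPU | 435 | 30 | 2.4 | 3505 |
| | GPU | 3751 | 48 | 0.3 | 3540 |
| IoT (Raspberry Pi 5 16GB) | CPU | 133 | 8 | 7.8 | 1546 |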
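The prefill and decode columns can be combined into a rough end-to-end latency estimate for a given prompt and reply length. The sketch below assumes throughput stays constant regardless of context length and ignores model loading, tokenization, and sampling overhead; `Throughput` and `estimate_latency` are hypothetical helper names, not part of the LiteRT-LM API.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Throughput:
    """Benchmark throughput in tokens per second, copied from the table above."""
    prefill_tps: float
    decode_tps: float


# Gemma-4-E2B on Android (S26 Ultra), taken from the benchmark table.
S26_CPU = Throughput(prefill_tps=557, decode_tps=47)
S26_GPU = Throughput(prefill_tps=3808, decode_tps=52)


def estimate_latency(t: Throughput, prompt_tokens: int, output_tokens: int) -> tuple[float, float]:
    """Return (prefill_seconds, total_seconds) under a constant-throughput model.

    Assumption: throughput does not change with context length, and model
    loading, tokenization, and sampling overhead are ignored.
    """
    prefill_s = prompt_tokens / t.prefill_tps
    decode_s = output_tokens / t.decode_tps
    return prefill_s, prefill_s + decode_s


if __name__ == "__main__":
    for name, t in [("CPU", S26_CPU), ("GPU", S26_GPU)]:
        prefill_s, total_s = estimate_latency(t, prompt_tokens=1024, output_tokens=256)
        print(f"{name}: prefill {prefill_s:.2f} s, total {total_s:.2f} s")
```

With the S26 Ultra numbers, a 1024-token prompt takes about 1.8 s to prefill on CPU and about 0.3 s on GPU, close to the Time to First Token column. For a 256-token reply, decode time (about 5.4 s on CPU and 4.9 s on GPU) dominates on both backends, so the GPU's prefill advantage matters most for long prompts with short replies.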
Start Building
LiteRT-LM provides APIs for several programming languages and platforms to help you build on-device AI applications quickly. Select a guide below to get started:
| Language | Status | Best For... | Documentation |
|---|---|---|---|
| CLI | ✅ Stable | Getting started with LiteRT-LM in under a minute. | CLI Guide |
| Python | ✅ Stable | Rapid prototyping and development on desktop and Raspberry Pi. | Python Guide |
| Kotlin | ✅ Stable | Native Android apps and JVM-based desktop tools. Optimized for Coroutines. | Kotlin Guide |
| Swift | 🚀 Early Preview | Native iOS and macOS integration with specialized Metal support. | Swift Guide |
| JavaScript (web) | 🚀 Early Preview | Deploying models directly in web browsers with high performance. | JavaScript Guide |
| Flutter | 🚀 Community | Cross-platform Flutter apps using the community flutter_gemma package. | Flutter Guide |
| C++ | ✅ Stable | High-performance, cross-platform core logic and embedded systems. | C++ Guide |
Build from Source
If you want to customize LiteRT-LM or build it for a specific hardware configuration, you can compile it directly from the source code. For step-by-step instructions on how to set up your environment and build the framework, refer to the LiteRT-LM Build and Run Guide on GitHub.
Supported Backends & Platforms
| Acceleration | Android | iOS | macOS | Windows | Linux | IoT |
|---|---|---|---|---|---|---|
| CPU | ✅ | ✅ | ✅ | ✅ | ✅ | ✅ |
| GPU | ✅ | ✅ | ✅ | ✅ | ✅ | - |
| NPU | ✅ | - | - | 🚀 | - | - |
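To choose a backend at runtime, an application might encode the support matrix above and fall back from the most specialized accelerator to the CPU. The following is a minimal sketch, assuming an NPU > GPU > CPU preference order and assuming 🚀 marks a preview-level entry as it does in the language table; `Support`, `SUPPORT`, and `pick_backend` are hypothetical names, not LiteRT-LM functions.

```python
from enum import Enum


class Support(Enum):
    YES = "✅"
    PREVIEW = "🚀"  # assumption: preview-level support, as in the language table
    NO = "-"


# Transcribed from the "Supported Backends & Platforms" table.
SUPPORT = {
    "CPU": {"Android": Support.YES, "iOS": Support.YES, "macOS": Support.YES,
            "Windows": Support.YES, "Linux": Support.YES, "IoT": Support.YES},
    "GPU": {"Android": Support.YES, "iOS": Support.YES, "macOS": Support.YES,
            "Windows": Support.YES, "Linux": Support.YES, "IoT": Support.NO},
    "NPU": {"Android": Support.YES, "iOS": Support.NO, "macOS": Support.NO,
            "Windows": Support.PREVIEW, "Linux": Support.NO, "IoT": Support.NO},
}

# Hypothetical preference order; an application might reorder it per workload.
PREFERENCE = ("NPU", "GPU", "CPU")


def pick_backend(platform: str, allow_preview: bool = False) -> str:
    """Return the first backend in PREFERENCE that the table marks as supported.

    Preview entries are only accepted when allow_preview is True.
    Raises ValueError if the platform is not listed for any backend.
    """
    for backend in PREFERENCE:
        status = SUPPORT[backend].get(platform, Support.NO)
        if status is Support.YES or (allow_preview and status is Support.PREVIEW):
            return backend
    raise ValueError(f"no backend listed for platform {platform!r}")
```

For example, `pick_backend("Windows")` returns `"GPU"`, `pick_backend("Windows", allow_preview=True)` returns `"NPU"`, and `pick_backend("IoT")` returns `"CPU"`. The table only says a backend is supported on a platform, so an application would still need to confirm the accelerator is actually available on the specific device.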
Supported Models
The following table lists models supported by LiteRT-LM. For more detailed performance numbers and model cards, visit the LiteRT Community on Hugging Face.
| Model | Type | Size (MB) | Details | Device | CPU Prefill (tk/s) | CPU Decode (tk/s) | GPU Prefill (tk/s) | GPU Decode (tk/s) |
|---|---|---|---|---|---|---|---|---|
| Gemma4-E2B | Chat | 2583 | Model Card | Samsung S26 Ultra | 557 | 47 | 3808 | 52 |
| | | | | iPhone 17 Pro | 532 | 25 | 2878 | 57 |
| | | | | MacBook Pro M4 | 901 | 42 | 7835 | 160 |
| Gemma4-E4B | Chat | 3654 | Model Card | Samsung S26 Ultra | 195 | 18 | 1293 | 22 |
| | | | | iPhone 17 Pro | 159 | 10 | 1189 | 25 |
| | | | | MacBook Pro M4 | 277 | 27 | 2560 | 101 |
| Gemma-3n-E2B | Chat | 2965 | Model Card | MacBook Pro M3 | 233 | 28 | - | - |
| | | | | Samsung S24 Ultra | 111 | 16 | 816 | 16 |
| Gemma-3n-E4B | Chat | 4235 | Model Card | MacBook Pro M3 | 170 | 20 | - | - |
| | | | | Samsung S24 Ultra | 74 | 9 | 548 | 9 |
| Gemma3-1B | Chat | 1005 | Model Card | Samsung S24 Ultra | 177 | 33 | 1191 | 24 |
| FunctionGemma | Base | 289 | Model Card | Samsung S25 Ultra | 2238 | 154 | - | - |
| phi-4-mini | Chat | 3906 | Model Card | Samsung S24 Ultra | 67 | 7 | 314 | 10 |
| Qwen2.5-1.5B | Chat | 1598 | Model Card | Samsung S25 Ultra | 298 | 34 | 1668 | 31 |
| Qwen3-0.6B | Chat | 586 | Model Card | Vivo X300 Pro | 165 | 9 | 580 | 21 |
| Qwen2.5-0.5B | Chat | 521 | Model Card | Samsung S24 Ultra | 251 | 30 | - | - |
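The table can also be filtered to shortlist models for a particular device and size budget. The sketch below transcribes the table into a list and ranks decode throughput; `Row`, `ROWS`, and `best_decode` are hypothetical names. Throughput figures are only comparable within a single device, and Size (MB) is the listed model size, not peak runtime memory.

```python
from typing import NamedTuple, Optional


class Row(NamedTuple):
    model: str
    size_mb: int
    device: str
    cpu_decode: float
    gpu_decode: Optional[float]  # None where the table shows "-"


# Transcribed from the "Supported Models" table (decode columns only).
ROWS = [
    Row("Gemma4-E2B", 2583, "Samsung S26 Ultra", 47, 52),
    Row("Gemma4-E2B", 2583, "iPhone 17 Pro", 25, 57),
    Row("Gemma4-E2B", 2583, "MacBook Pro M4", 42, 160),
    Row("Gemma4-E4B", 3654, "Samsung S26 Ultra", 18, 22),
    Row("Gemma4-E4B", 3654, "iPhone 17 Pro", 10, 25),
    Row("Gemma4-E4B", 3654, "MacBook Pro M4", 27, 101),
    Row("Gemma-3n-E2B", 2965, "MacBook Pro M3", 28, None),
    Row("Gemma-3n-E2B", 2965, "Samsung S24 Ultra", 16, 16),
    Row("Gemma-3n-E4B", 4235, "MacBook Pro M3", 20, None),
    Row("Gemma-3n-E4B", 4235, "Samsung S24 Ultra", 9, 9),
    Row("Gemma3-1B", 1005, "Samsung S24 Ultra", 33, 24),
    Row("FunctionGemma", 289, "Samsung S25 Ultra", 154, None),
    Row("phi-4-mini", 3906, "Samsung S24 Ultra", 7, 10),
    Row("Qwen2.5-1.5B", 1598, "Samsung S25 Ultra", 34, 31),
    Row("Qwen3-0.6B", 586, "Vivo X300 Pro", 9, 21),
    Row("Qwen2.5-0.5B", 521, "Samsung S24 Ultra", 30, None),
]


def best_decode(device: str, max_size_mb: int) -> list[tuple[str, str, float]]:
    """Rank (model, backend, decode tk/s) for models no larger than max_size_mb on one device."""
    results = []
    for r in ROWS:
        if r.device != device or r.size_mb > max_size_mb:
            continue
        results.append((r.model, "CPU", r.cpu_decode))
        if r.gpu_decode is not None:
            results.append((r.model, "GPU", r.gpu_decode))
    # sorted() is stable, so ties keep their CPU-before-GPU insertion order.
    return sorted(results, key=lambda x: x[2], reverse=True)
```

For example, `best_decode("Samsung S24 Ultra", 3000)` excludes phi-4-mini and Gemma-3n-E4B and ranks Gemma3-1B on CPU (33 tk/s) first.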
Report Issues
If you encounter a bug or have a feature request, please report it on LiteRT-LM GitHub Issues.

