LiteRT-LM Web API

The Web API of LiteRT-LM for JavaScript and TypeScriptin the browser. This is an early preview that supports text-in / text-out running in WebGPU.

Supported Models

The LiteRT-LM JS API currently supports a limited set of web-compatible models. We're working on expanding this to cover general .litertlm model files, but for now, the following models are supported:

Introduction

Here is a sample REPL chat app built with the JavaScript API:

 < div 
  
 id 
 = 
 "out" 
  
 style 
 = 
 "white-space: pre-wrap; font-family: monospace;" 
>< / 
 div 
>
< input 
  
 id 
 = 
 "in" 
  
 onkeydown 
 = 
 "if(event.key === 'Enter') repl(this)" 
>

< script 
  
 type 
 = 
 "module" 
>  
 import 
  
 { 
  
 Engine 
  
 } 
  
 from 
  
 'https://cdn.jsdelivr.net/npm/@litert-lm/core/+esm' 
 ; 
  
 const 
  
 engine 
  
 = 
  
 await 
  
 Engine 
 . 
 create 
 ({ 
  
 // Load the Gemma 4 E2B model 
  
 model 
 : 
  
 'https://huggingface.co/litert-community/gemma-4-E2B-it-litert-lm/resolve/main/gemma-4-E2B-it-web.litertlm' 
  
 // Or use the E4B model by swapping in this line 
  
 // model: 'https://huggingface.co/litert-community/gemma-4-E4B-it-litert-lm/resolve/main/gemma-4-E4B-it-web.litertlm' 
  
 }); 
  
 const 
  
 chat 
  
 = 
  
 await 
  
 engine 
 . 
 createConversation 
 (); 
  
 window 
 . 
 repl 
  
 = 
  
 async 
  
 ( 
 el 
 ) 
  
 = 
>  
 { 
  
 const 
  
 text 
  
 = 
  
 el 
 . 
 value 
 ; 
  
 el 
 . 
 value 
  
 = 
  
 '' 
 ; 
  
 // Clear immediately 
  
 out 
 . 
 append 
 ( 
 `\n>>> 
 ${ 
 text 
 } 
 \nAI: ` 
 ); 
  
 for 
  
 await 
  
 ( 
 const 
  
 chunk 
  
 of 
  
 chat 
 . 
 sendMessageStreaming 
 ( 
 text 
 )) 
  
 { 
  
 out 
 . 
 append 
 ( 
 chunk 
 . 
 content 
 [ 
 0 
 ]. 
 text 
 ); 
  
 } 
  
 }; 
< /script 
> 

Getting Started

LiteRT-LM is available as an npm package. You can install the latest version from npm or directly import it from a CDN:

  # From npm 
npm  
i  
--save  
@litert-lm/core # From a CDN (in your JavaScript file) 
import  
*  
as  
litertlm  
from  
 'https://cdn.jsdelivr.net/npm/@litert-lm/core/+esm' 
 ; 
 

Initialize the Engine

The Engine is the entry point to the API. It handles model loading, session creation, and resource management. Remember to delete the engine to release resources when the model is no longer needed.

Note:Initializing the engine can take several seconds to load the model.

  import 
  
 { 
 Engine 
 , 
  
 EngineSettings 
 } 
  
 from 
  
 '@litert-lm/core' 
 ; 
 const 
  
 engineSettings 
  
 = 
  
 { 
  
 model 
 : 
  
 'url/path/to/model.litertlm' 
 , 
  
 // or a ReadableStream, or a Blob 
  
 // You can configure context length and other settings here 
  
 mainExecutorSettings 
 : 
  
 { 
  
 maxNumTokens 
 : 
  
 8192 
 , 
  
 }, 
 } 
  
 satisfies 
  
 EngineSettings 
 ; 
 const 
  
 engine 
  
 = 
  
 await 
  
 Engine 
 . 
 create 
 ( 
 engineSettings 
 ); 
 // ... Use the engine to create a conversation ... 
 // Delete the engine when done. 
 await 
  
 engine 
 . 
 delete 
 (); 
 

Create a Conversation

Once the engine is initialized, create a Conversation instance. You can provide a ConversationConfig to customize its behavior.

  const 
  
 conversation 
  
 = 
  
 await 
  
 engine 
 . 
 createConversation 
 ({ 
  
 preface 
 : 
  
 { 
  
 messages 
 : 
  
 [ 
  
 { 
 role 
 : 
  
 'system' 
 , 
  
 content 
 : 
  
 'You are a helpful assistant' 
 } 
  
 ] 
  
 } 
 }); 
 conversation 
 . 
 sendMessage 
 ({ 
  
 role 
 : 
  
 'user' 
 , 
  
 content 
 : 
  
 'Write a poem' 
 , 
 }); 
 

Send Messages

You can send messages with or without streaming.

Non-Streaming Example

  // Simple string input 
 let 
  
 response 
  
 = 
  
 await 
  
 conversation 
 . 
 sendMessage 
 ( 
 "What is the capital of France?" 
 ); 
 console 
 . 
 log 
 ( 
 response 
 . 
 content 
 [ 
 0 
 ]. 
 text 
 ); 
 // Or with full message structure 
 response 
  
 = 
  
 await 
  
 conversation 
 . 
 sendMessage 
 ({ 
 role 
 : 
  
 'user' 
 , 
  
 content 
 : 
  
 '...' 
 }); 
 

Streaming Example

  // sendMessageStreaming returns a ReadableStream of response chunks 
 const 
  
 stream 
  
 = 
  
 conversation 
 . 
 sendMessageStreaming 
 ( 
 'Tell me a long story.' 
 ); 
 for 
  
 await 
  
 ( 
 const 
  
 chunk 
  
 of 
  
 stream 
 ) 
  
 { 
  
 // Chunks are Records containing pieces of the response 
  
 for 
  
 ( 
 const 
  
 item 
  
 of 
  
 chunk 
 . 
 content 
 ) 
  
 { 
  
 if 
  
 ( 
 item 
 . 
 type 
  
 === 
  
 'text' 
 ) 
  
 { 
  
 console 
 . 
 log 
 ( 
 item 
 . 
 text 
 ); 
  
 } 
  
 } 
 } 
 

Cancel Generation

You can cancel an ongoing generation explicitly by calling cancel() on the Conversation instance:

  // Cancel any ongoing generation 
 conversation 
 . 
 cancel 
 (); 
 

If you are streaming the response, exiting the for await...of loop early (such as with break ) will also automatically cancel the ongoing generation:

  for 
  
 await 
  
 ( 
 const 
  
 chunk 
  
 of 
  
 stream 
 ) 
  
 { 
  
 if 
  
 ( 
 shouldStop 
 ()) 
  
 { 
  
 break 
 ; 
  
 // Cancels the stream and underlying generation 
  
 } 
 } 
 
Create a Mobile Website
View Site in Mobile | Classic
Share by: