Send audio and video streams

This document describes how to send audio and video streams to the Live API for real-time, bidirectional communication with Gemini models. Learn how to configure and transmit audio and video data to build dynamic and interactive applications.

Send audio streams

Implementing real-time audio requires strict adherence to sample rate specifications and careful buffer management to ensure low latency and natural interruptibility.

The Live API supports the following audio formats:

  • Input audio: Raw 16-bit PCM audio at 16 kHz, little-endian
  • Output audio: Raw 16-bit PCM audio at 24 kHz, little-endian
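
Microphone capture libraries often deliver floating-point samples, so a conversion step is usually needed before sending. A minimal sketch, assuming float input in the range [-1.0, 1.0]; the helper name is ours, not part of the API:

```python
import numpy as np

def to_pcm16(samples: np.ndarray) -> bytes:
    """Convert float samples in [-1.0, 1.0] to raw 16-bit little-endian PCM."""
    clipped = np.clip(samples, -1.0, 1.0)
    # "<i2" = little-endian signed 16-bit, matching the Live API input format
    return (clipped * 32767.0).astype("<i2").tobytes()

# Three samples become six bytes (2 bytes per 16-bit sample)
chunk_data = to_pcm16(np.array([0.0, 0.5, -0.5]))
```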

The following code sample shows you how to send streaming audio data:

```python
import asyncio

from google.genai import types

# Assumes session is an active Live API session
# and chunk_data contains bytes of raw 16-bit PCM audio at 16 kHz.

# Send audio input data in chunks
await session.send_realtime_input(
    audio=types.Blob(data=chunk_data, mime_type="audio/pcm;rate=16000")
)
```
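The snippet above sends a single chunk; a live client typically slices its capture buffer into short fixed-duration chunks and sends them in a loop. A sketch of the slicing step (the 50 ms chunk size is an assumption, not an API requirement):

```python
RATE = 16000         # required input sample rate (Hz)
SAMPLE_BYTES = 2     # 16-bit PCM
CHUNK_MS = 50        # assumed chunk duration
CHUNK_BYTES = RATE * SAMPLE_BYTES * CHUNK_MS // 1000  # 1600 bytes

def iter_chunks(pcm: bytes, size: int = CHUNK_BYTES):
    """Yield fixed-size slices of a raw PCM buffer for streaming."""
    for start in range(0, len(pcm), size):
        yield pcm[start:start + size]
```

Each yielded slice can then be wrapped in a types.Blob exactly as shown above.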

The server streams audio in chunks within server_content messages; the client is responsible for decoding, buffering, and playing that data, which means maintaining a playback buffer.

The following code sample shows you how to process streaming audio data:

```python
import asyncio

import numpy as np

# Assumes session is an active Live API session
# and audio_queue is an asyncio.Queue for buffering audio for playback.
async for msg in session.receive():
    server_content = msg.server_content
    if server_content:
        # 1. Handle interruption
        if server_content.interrupted:
            print("\n[Interrupted] Flushing buffer...")
            # Clear the Python queue
            while not audio_queue.empty():
                try:
                    audio_queue.get_nowait()
                except asyncio.QueueEmpty:
                    break
            # Send signal to worker to reset hardware buffers if needed
            await audio_queue.put(None)
            continue
        # 2. Process audio chunks
        if server_content.model_turn:
            for part in server_content.model_turn.parts:
                if part.inline_data:
                    # Add PCM data to playback queue
                    await audio_queue.put(
                        np.frombuffer(part.inline_data.data, dtype="int16")
                    )
```

Send video streams

Video streaming provides visual context. The Live API expects a sequence of discrete image frames and supports video frame input at 1 frame per second (FPS). For best results, send frames at the native 768x768 resolution at 1 FPS.

The following code sample shows you how to send streaming video data:

```python
import asyncio

from google.genai import types

# Assumes session is an active Live API session
# and chunk_data contains bytes of a JPEG image.

# Send video input data in chunks
await session.send_realtime_input(
    media=types.Blob(data=chunk_data, mime_type="image/jpeg")
)
```

The client implementation captures a frame from the video feed, encodes it as a JPEG blob, and transmits it using the realtime_input message structure.

```python
import asyncio

import cv2
from google.genai import types

async def send_video_stream(session):
    # Open webcam
    cap = cv2.VideoCapture(0)
    while True:
        ret, frame = cap.read()
        if not ret:
            break
        # 1. Resize to optimal resolution (768x768 max)
        frame = cv2.resize(frame, (768, 768))
        # 2. Encode as JPEG
        _, buffer = cv2.imencode(".jpg", frame)
        # 3. Send as realtime input
        await session.send_realtime_input(
            media=types.Blob(data=buffer.tobytes(), mime_type="image/jpeg")
        )
        # 4. Wait 1 second (1 FPS)
        await asyncio.sleep(1.0)
    cap.release()
```
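Note that resizing a non-square webcam frame directly to (768, 768) stretches the image. If distortion matters for your use case, pad the frame to a square instead. A numpy-only sketch (the helper name is ours, and in practice you would use cv2.resize for the scaling step rather than nearest-neighbour indexing):

```python
import numpy as np

def resize_letterbox(frame: np.ndarray, size: int = 768) -> np.ndarray:
    """Scale the longer side to `size` and pad the rest with black."""
    h, w = frame.shape[:2]
    scale = size / max(h, w)
    new_h, new_w = int(h * scale), int(w * scale)
    # Nearest-neighbour resize via index lookup (stand-in for cv2.resize)
    rows = (np.arange(new_h) / scale).astype(int)
    cols = (np.arange(new_w) / scale).astype(int)
    resized = frame[rows][:, cols]
    # Centre the resized frame on a black square canvas
    out = np.zeros((size, size) + frame.shape[2:], dtype=frame.dtype)
    top, left = (size - new_h) // 2, (size - new_w) // 2
    out[top:top + new_h, left:left + new_w] = resized
    return out
```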

Configure media resolution

You can specify the resolution for input media by setting the media_resolution field in the session configuration. Lower resolution reduces token usage and latency, while higher resolution improves detail recognition. Supported values include low, medium, and high.

```python
config = {
    "response_modalities": ["audio"],
    "media_resolution": "low",
}
```
