
Retrain a speech recognition model with TensorFlow Lite Model Maker

Copyright 2024 The AI Edge Authors.

Licensed under the Apache License, Version 2.0 (the "License");
you may not use this file except in compliance with the License.
You may obtain a copy of the License at

    https://www.apache.org/licenses/LICENSE-2.0

Unless required by applicable law or agreed to in writing, software
distributed under the License is distributed on an "AS IS" BASIS,
WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
See the License for the specific language governing permissions and
limitations under the License.


In this Colab notebook, you'll learn how to use TensorFlow Lite Model Maker to train a speech recognition model that can classify spoken words or short phrases using one-second sound samples. The Model Maker library uses transfer learning to retrain an existing TensorFlow model with a new dataset, which reduces the amount of sample data and time required for training.

By default, this notebook retrains the model (BrowserFft, from the TFJS Speech Command Recognizer) using a subset of words from the speech commands dataset (such as "up", "down", "left", and "right"). Then it exports a TFLite model that you can run on a mobile device or embedded system (such as a Raspberry Pi). It also exports the trained model as a TensorFlow SavedModel.

This notebook is also designed to accept a custom dataset of WAV files, uploaded to Colab in a ZIP file. The more samples you have for each class, the better your accuracy will be, but because the transfer learning process uses feature embeddings from the pre-trained model, you can still get a fairly accurate model with only a few dozen samples in each of your classes.

If you want to run the notebook with the default speech dataset, you can run the whole thing now by clicking Runtime > Run all in the Colab toolbar. However, if you want to use your own dataset, continue down to Prepare the dataset and follow the instructions there.

Import the required packages

You'll need TensorFlow, TFLite Model Maker, and some modules for audio manipulation, playback, and visualizations.

!sudo apt -y install libportaudio2
!pip install tflite-model-maker

import os
import glob
import random
import shutil

import librosa
import soundfile as sf
from IPython.display import Audio, display
import numpy as np
import matplotlib.pyplot as plt
import seaborn as sns

import tensorflow as tf
import tflite_model_maker as mm
from tflite_model_maker import audio_classifier
from tflite_model_maker.config import ExportFormat

print(f"TensorFlow Version: {tf.__version__}")
print(f"Model Maker Version: {mm.__version__}")

Prepare the dataset

To train with the default speech dataset, just run all the code below as-is.

But if you want to train with your own speech dataset, follow these steps:

Be sure each sample in your dataset is in WAV file format and about one second long. Then create a ZIP file with all your WAV files, organized into separate subfolders for each classification. For example, each sample for a speech command "yes" should be in a subfolder named "yes". Even if you have only one class, the samples must be saved in a subdirectory with the class name as the directory name. (This script assumes your dataset is not split into train/validation/test sets and performs that split for you.)
Click the Files tab in the left panel and drag and drop your ZIP file there to upload it.
Use the following drop-down option to set use_custom_dataset to True.
Then skip to Prepare a custom audio dataset to specify your ZIP filename and dataset directory name.
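If you are preparing a custom dataset, a quick local check can catch layout mistakes before you zip and upload it. The following is a minimal sketch using only the Python standard library; the helper name `check_dataset_layout` and the 0.1-second tolerance are illustrative assumptions, not part of Model Maker, and it assumes uncompressed PCM WAV files that Python's `wave` module can open.

```python
import wave
from pathlib import Path


def check_dataset_layout(root, target_seconds=1.0, tolerance=0.1):
  """Hypothetical helper: counts WAV files per class folder and flags files
  that sit outside a class folder or are not about target_seconds long."""
  root = Path(root)
  classes = {}
  problems = []
  # Every WAV file must live in a subfolder named after its class.
  for wav_path in sorted(root.glob('*.wav')):
    problems.append(f'{wav_path.name}: not inside a class subfolder')
  for class_dir in sorted(p for p in root.iterdir() if p.is_dir()):
    wav_files = sorted(class_dir.glob('*.wav'))
    classes[class_dir.name] = len(wav_files)
    for wav_path in wav_files:
      with wave.open(str(wav_path), 'rb') as wav:
        seconds = wav.getnframes() / wav.getframerate()
      if abs(seconds - target_seconds) > tolerance:
        problems.append(f'{class_dir.name}/{wav_path.name}: {seconds:.2f} s')
  return classes, problems
```

For example, `classes, problems = check_dataset_layout('./YOUR-DIRNAME')` returns a per-class file count and a list of files that are outside a class folder or not roughly one second long.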

 use_custom_dataset = False

Generate a background noise dataset

Whether you're using the default speech dataset or a custom dataset, you should have a good set of background noises so your model can distinguish speech from other noises (including silence).

Because the following background samples are provided in WAV files that are a minute long or longer, we need to split them into smaller one-second samples so we can reserve some for our test dataset. We'll also combine a couple of different sample sources to build a comprehensive set of background noises and silence:

tf.keras.utils.get_file('speech_commands_v0.01.tar.gz',
                        'http://download.tensorflow.org/data/speech_commands_v0.01.tar.gz',
                        cache_dir='./',
                        cache_subdir='dataset-speech',
                        extract=True)
tf.keras.utils.get_file('background_audio.zip',
                        'https://storage.googleapis.com/download.tensorflow.org/models/tflite/sound_classification/background_audio.zip',
                        cache_dir='./',
                        cache_subdir='dataset-background',
                        extract=True)

# Create a list of all the background wav files
files = glob.glob(os.path.join('./dataset-speech/_background_noise_', '*.wav'))
files = files + glob.glob(os.path.join('./dataset-background', '*.wav'))

background_dir = './background'
os.makedirs(background_dir, exist_ok=True)

# Loop through all files and split each into several one-second wav files
for file in files:
  filename = os.path.basename(os.path.normpath(file))
  print('Splitting', filename)
  name = os.path.splitext(filename)[0]
  rate = librosa.get_samplerate(file)
  length = round(librosa.get_duration(filename=file))
  for i in range(length - 1):
    start = i * rate
    stop = (i * rate) + rate
    data, _ = sf.read(file, start=start, stop=stop)
    sf.write(os.path.join(background_dir, name + str(i) + '.wav'), data, rate)
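The splitting loop above reads the sample range `[i * rate, (i + 1) * rate)` for each whole second `i`, and stops one second early (`range(length - 1)`), which keeps a rounded-up duration from requesting samples past the end of the file. A minimal NumPy sketch of the same index arithmetic on an in-memory array follows; `split_one_second_chunks` is a hypothetical name, and it mirrors the notebook's `round(duration)` convention.

```python
import numpy as np


def split_one_second_chunks(samples, rate):
  """Splits a 1-D sample array into one-second chunks, using the same
  index arithmetic as the background-noise loop."""
  length = round(len(samples) / rate)  # duration in seconds, rounded
  chunks = []
  for i in range(length - 1):
    start = i * rate
    stop = (i * rate) + rate
    chunks.append(samples[start:stop])
  return chunks


rate = 16000
five_seconds = np.zeros(5 * rate, dtype=np.float32)
chunks = split_one_second_chunks(five_seconds, rate)
print(len(chunks), chunks[0].shape)  # 4 (16000,)
```

Note that a five-second file yields four one-second clips, so each background file contributes roughly one clip fewer than its length in seconds.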

Prepare the speech commands dataset

We already downloaded the speech commands dataset, so now we just need to prune the number of classes for our model.

This dataset includes over 30 speech command classifications, and most of them have over 2,000 samples. But because we're using transfer learning, we don't need that many samples. So the following code does a few things:

Specify which classifications we want to use, and delete the rest.
Keep only 150 samples of each class for training (to demonstrate that transfer learning works well with smaller datasets, and to reduce the training time).
Create a separate directory for a test dataset so we can easily run inference with them later.

if not use_custom_dataset:
  commands = [ "up", "down", "left", "right", "go", "stop", "on", "off", "background"]
  dataset_dir = './dataset-speech'
  test_dir = './dataset-test'

  # Move the processed background samples
  shutil.move(background_dir, os.path.join(dataset_dir, 'background'))

  # Delete all directories that are not in our commands list
  dirs = glob.glob(os.path.join(dataset_dir, '*/'))
  for dir in dirs:
    name = os.path.basename(os.path.normpath(dir))
    if name not in commands:
      shutil.rmtree(dir)

  # Count is per class
  sample_count = 150
  test_data_ratio = 0.2
  test_count = round(sample_count * test_data_ratio)

  # Loop through child directories (each class of wav files)
  dirs = glob.glob(os.path.join(dataset_dir, '*/'))
  for dir in dirs:
    files = glob.glob(os.path.join(dir, '*.wav'))
    random.seed(42)
    random.shuffle(files)
    # Move test samples:
    for file in files[sample_count:sample_count + test_count]:
      class_dir = os.path.basename(os.path.normpath(dir))
      os.makedirs(os.path.join(test_dir, class_dir), exist_ok=True)
      os.rename(file, os.path.join(test_dir, class_dir, os.path.basename(file)))
    # Delete remaining samples
    for file in files[sample_count + test_count:]:
      os.remove(file)
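The per-class bookkeeping in the loop above is list slicing after a seeded shuffle. The following sketch reproduces that slicing without touching the file system, which can help when checking counts before anything is deleted. The function name `partition_class_files` is hypothetical. Note that a class with fewer than `sample_count + test_count` files simply gets a smaller (or empty) test split rather than an error.

```python
import random


def partition_class_files(files, sample_count=150, test_ratio=0.2, seed=42):
  """Returns (train, test, removed) using the same shuffle and slicing as
  the notebook: the first sample_count files stay for training, the next
  round(sample_count * test_ratio) move to the test set, and the rest are
  deleted."""
  files = list(files)
  random.seed(seed)
  random.shuffle(files)
  test_count = round(sample_count * test_ratio)
  train = files[:sample_count]
  test = files[sample_count:sample_count + test_count]
  removed = files[sample_count + test_count:]
  return train, test, removed


names = [f'clip_{i}.wav' for i in range(200)]
train, test, removed = partition_class_files(names)
print(len(train), len(test), len(removed))  # 150 30 20
```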

Prepare a custom dataset

If you want to train the model with your own speech dataset, you need to upload your samples as WAV files in a ZIP (as described above) and modify the following variables to specify your dataset:

if use_custom_dataset:
  # Specify the ZIP file you uploaded:
  !unzip YOUR-FILENAME.zip
  # Specify the unzipped path to your custom dataset
  # (this path contains all the subfolders with classification names):
  dataset_dir = './YOUR-DIRNAME'

After changing the filename and path name above, you're ready to train the model with your custom dataset. In the Colab toolbar, select Runtime > Run all to run the whole notebook.

The following code integrates our new background noise samples into your dataset and then separates a portion of all samples to create a test set.

def move_background_dataset(dataset_dir):
  dest_dir = os.path.join(dataset_dir, 'background')
  if os.path.exists(dest_dir):
    # The custom dataset already has a background class: add our samples to it
    files = glob.glob(os.path.join(background_dir, '*.wav'))
    for file in files:
      shutil.move(file, dest_dir)
  else:
    shutil.move(background_dir, dest_dir)

if use_custom_dataset:
  # Move background samples into custom dataset
  move_background_dataset(dataset_dir)

  # Now we separate some of the files that we'll use for testing:
  test_dir = './dataset-test'
  test_data_ratio = 0.2
  dirs = glob.glob(os.path.join(dataset_dir, '*/'))
  for dir in dirs:
    # Compute the class name before the inner loop so it is defined
    # even when a class has too few files to move any
    class_dir = os.path.basename(os.path.normpath(dir))
    files = glob.glob(os.path.join(dir, '*.wav'))
    test_count = round(len(files) * test_data_ratio)
    random.seed(42)
    random.shuffle(files)
    # Move test samples:
    for file in files[:test_count]:
      os.makedirs(os.path.join(test_dir, class_dir), exist_ok=True)
      os.rename(file, os.path.join(test_dir, class_dir, os.path.basename(file)))
    print('Moved', test_count, 'samples from', class_dir)
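After the split, it can be worth confirming how many samples each class has in the training and test folders. Here is a minimal standard-library sketch, assuming the one-subfolder-per-class layout used throughout this notebook; `count_wavs_per_class` is a hypothetical helper name.

```python
import glob
import os


def count_wavs_per_class(root_dir):
  """Returns {class_name: number_of_wav_files} for each subfolder of root_dir."""
  counts = {}
  for class_dir in sorted(glob.glob(os.path.join(root_dir, '*/'))):
    name = os.path.basename(os.path.normpath(class_dir))
    counts[name] = len(glob.glob(os.path.join(class_dir, '*.wav')))
  return counts
```

For example, `print(count_wavs_per_class(dataset_dir))` and `print(count_wavs_per_class(test_dir))` show the remaining training/validation counts and the test counts side by side.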

Play a sample

To be sure the dataset looks correct, let's play a random sample from the test set:

def get_random_audio_file(samples_dir):
  files = os.path.abspath(os.path.join(samples_dir, '*/*.wav'))
  files_list = glob.glob(files)
  random_audio_path = random.choice(files_list)
  return random_audio_path

def show_sample(audio_path):
  audio_data, sample_rate = sf.read(audio_path)
  class_name = os.path.basename(os.path.dirname(audio_path))
  print(f'Class: {class_name}')
  print(f'File: {audio_path}')
  print(f'Sample rate: {sample_rate}')
  print(f'Sample length: {len(audio_data)}')

  plt.title(class_name)
  plt.plot(audio_data)
  display(Audio(audio_data, rate=sample_rate))

 random_audio = get_random_audio_file(test_dir)
show_sample(random_audio)

Define the model

When using Model Maker to retrain any model, you have to start by defining a model spec. The spec defines the base model from which your new model will extract feature embeddings to begin learning new classes. The spec for this speech recognizer is based on the pre-trained BrowserFft model from TFJS.

The model expects input as an audio sample that's 44.1 kHz, and just under a second long: the exact sample length must be 44034 frames.

You don't need to do any resampling with your training dataset. Model Maker takes care of that for you. But when you later run inference, you must be sure that your input matches that expected format.
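As a quick check of the arithmetic, 44034 samples at 44.1 kHz is 44034 / 44100 ≈ 0.9985 seconds, which is why the input is described as just under a second. At inference time, a clip that has already been resampled to 44.1 kHz can be zero-padded or truncated to that length. The sketch below does this with NumPy; `fit_to_length` is a hypothetical helper, and the two constants are taken from the description above rather than read from the model.

```python
import numpy as np

EXPECTED_RATE = 44100     # Hz, from the model description above
EXPECTED_LENGTH = 44034   # samples, from the model description above


def fit_to_length(audio, length=EXPECTED_LENGTH):
  """Zero-pads or truncates a 1-D audio array to exactly `length` samples."""
  audio = np.asarray(audio, dtype=np.float32)
  if len(audio) < length:
    audio = np.pad(audio, (0, length - len(audio)))
  return audio[:length]


print(round(EXPECTED_LENGTH / EXPECTED_RATE, 4))  # 0.9985
print(fit_to_length(np.ones(40000)).shape)        # (44034,)
print(fit_to_length(np.ones(50000)).shape)        # (44034,)
```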

All you need to do here is instantiate the BrowserFftSpec:

 spec = audio_classifier.BrowserFftSpec()

Load your dataset

Now you need to load your dataset according to the model specifications. Model Maker includes the DataLoader API, which will load your dataset from a folder and ensure it's in the expected format for the model spec.

We already reserved some test files by moving them to a separate directory, which makes it easier to run inference with them later. Now we'll create a DataLoader for each split: the training set, the validation set, and the test set.

Load the speech commands dataset

 if not use_custom_dataset:
  train_data_ratio = 0.8
  train_data = audio_classifier.DataLoader.from_folder(
      spec, dataset_dir, cache=True)
  train_data, validation_data = train_data.split(train_data_ratio)
  test_data = audio_classifier.DataLoader.from_folder(
      spec, test_dir, cache=True)

Load a custom dataset

 if use_custom_dataset:
  train_data_ratio = 0.8
  train_data = audio_classifier.DataLoader.from_folder(
      spec, dataset_dir, cache=True)
  train_data, validation_data = train_data.split(train_data_ratio)
  test_data = audio_classifier.DataLoader.from_folder(
      spec, test_dir, cache=True)

Train the model

Now we'll use the Model Maker create() function to create a model based on our model spec and training dataset, and begin training.

If you're using a custom dataset, you might want to change the batch size as appropriate for the number of samples in your train set.

# If your dataset has fewer than 100 samples per class,
# you might want to try a smaller batch size
batch_size = 25
epochs = 25
model = audio_classifier.create(train_data, spec, validation_data, batch_size, epochs)
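To see how `batch_size` interacts with dataset size: each epoch passes over every training sample once, so the number of gradient steps per epoch is the training-set size divided by the batch size, rounded up if the final partial batch is kept. The sketch below assumes, for illustration only, 9 classes × 150 samples with 80% used for training (1,080 samples); Model Maker's exact split sizes and its handling of a partial last batch may differ.

```python
import math


def steps_per_epoch(num_train_samples, batch_size):
  """Number of batches needed to cover the training set once,
  counting a final partial batch."""
  return math.ceil(num_train_samples / batch_size)


num_train = 9 * 150 * 8 // 10  # assumed: 9 classes x 150 samples, 80% train
epochs = 25
print(num_train)                                # 1080
print(steps_per_epoch(num_train, 25))           # 44
print(steps_per_epoch(num_train, 25) * epochs)  # 1100
```

Halving the batch size roughly doubles the number of (noisier) updates per epoch, which is a common rationale for trying smaller batches on small datasets.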

Review the model performance

Even if the accuracy/loss looks good from the training output above, it's important to also run the model using test data that the model has not seen yet, which is what the evaluate() method does here:

 model.evaluate(test_data)

View the confusion matrix

When training a classification model such as this one, it's also useful to inspect the confusion matrix. The confusion matrix gives you a detailed view of how well your classifier performs for each classification in your test data.

def show_confusion_matrix(confusion, test_labels):
  """Normalize the confusion matrix by row and plot it as a heatmap."""
  # keepdims=True keeps the row totals as a column, so each row (true label)
  # is divided by its own total and sums to 1.
  confusion_normalized = confusion.astype("float") / confusion.sum(axis=1, keepdims=True)
  sns.set(rc = {'figure.figsize':(6,6)})
  sns.heatmap(
      confusion_normalized, xticklabels=test_labels, yticklabels=test_labels,
      cmap='Blues', annot=True, fmt='.2f', square=True, cbar=False)
  plt.title("Confusion matrix")
  plt.ylabel("True label")
  plt.xlabel("Predicted label")

confusion_matrix = model.confusion_matrix(test_data)
show_confusion_matrix(confusion_matrix.numpy(), test_data.index_to_label)
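The heatmap is meant to show each row (one true label) divided by that row's total, so each row sums to 1 and the diagonal reads as per-class recall. With NumPy, the row totals need `keepdims=True` (or `[:, np.newaxis]`) to broadcast down the rows; a plain `sum(axis=1)` broadcasts along the last axis and divides column j by the total of row j instead. A small self-contained illustration with a made-up 2×2 matrix:

```python
import numpy as np

confusion = np.array([[8, 2],
                      [3, 27]])  # illustrative counts, not real results

# Divide each row by its own total so every row sums to 1.
normalized = confusion.astype('float') / confusion.sum(axis=1, keepdims=True)
print(normalized)
# [[0.8 0.2]
#  [0.1 0.9]]

# Without keepdims, the (2,) vector of row totals broadcasts along the
# last axis, so column j is divided by the total of row j (wrong here).
wrong = confusion.astype('float') / confusion.sum(axis=1)
```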

Export the model

The last step is exporting your model into the TensorFlow Lite format for execution on mobile/embedded devices and into the SavedModel format for execution elsewhere.

When Model Maker exports a .tflite file, it includes model metadata that describes various details that can later help during inference. It even includes a copy of the classification labels file, so you don't need a separate labels.txt file. (In the next section, we show how to use this metadata to run inference.)

 TFLITE_FILENAME = 'browserfft-speech.tflite'
SAVE_PATH = './models'

print(f'Exporting the model to {SAVE_PATH}')
model.export(SAVE_PATH, tflite_filename=TFLITE_FILENAME)
model.export(SAVE_PATH, export_format=[ExportFormat.SAVED_MODEL, ExportFormat.LABEL])

Run inference with TF Lite model

Now your TFLite model can be deployed and run using any of the supported inferencing libraries or with the new TFLite AudioClassifier Task API. The following code shows how you can run inference with the .tflite model in Python.

# This library provides the TFLite metadata API
!pip install -q tflite_support

from tflite_support import metadata
import json

def get_labels(model):
  """Returns a list of labels, extracted from the model metadata."""
  displayer = metadata.MetadataDisplayer.with_model_file(model)
  labels_file = displayer.get_packed_associated_file_list()[0]
  labels = displayer.get_associated_file_buffer(labels_file).decode()
  return [line for line in labels.split('\n')]

def get_input_sample_rate(model):
  """Returns the model's expected sample rate, from the model metadata."""
  displayer = metadata.MetadataDisplayer.with_model_file(model)
  metadata_json = json.loads(displayer.get_metadata_json())
  input_tensor_metadata = metadata_json['subgraph_metadata'][0][
          'input_tensor_metadata'][0]
  input_content_props = input_tensor_metadata['content']['content_properties']
  return input_content_props['sample_rate']
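The `get_labels` helper above splits the decoded labels file on newlines, so a file that ends with a newline yields a trailing empty label. That is harmless for indexing, but if you want a clean list, a minimal sketch follows. It assumes the packed labels file is UTF-8 text with one label per line in output-index order, as `get_labels` already assumes; `parse_labels` is a hypothetical name. Only trailing blank lines are dropped, so indices still line up with the model output.

```python
def parse_labels(buffer):
  """Decodes a newline-separated labels file, one label per output index.

  Trailing blank lines (e.g. from a final newline) are dropped; blank lines
  in the middle are kept so indices still match the model's output order.
  """
  labels = buffer.decode('utf-8').split('\n')
  while labels and not labels[-1].strip():
    labels.pop()
  return labels


print(parse_labels(b'background\ndown\ngo\n'))  # ['background', 'down', 'go']
```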

To observe how well the model performs with real samples, run the following code block over and over. Each time, it will fetch a new test sample and run inference with it, and you can listen to the audio sample below.

# Get a WAV file for inference and list of labels from the model
tflite_file = os.path.join(SAVE_PATH, TFLITE_FILENAME)
labels = get_labels(tflite_file)
random_audio = get_random_audio_file(test_dir)

# Ensure the audio sample fits the model input
interpreter = tf.lite.Interpreter(tflite_file)
input_details = interpreter.get_input_details()
output_details = interpreter.get_output_details()
input_size = input_details[0]['shape'][1]
sample_rate = get_input_sample_rate(tflite_file)
audio_data, _ = librosa.load(random_audio, sr=sample_rate)
if len(audio_data) < input_size:
  # Zero-pad short clips; np.pad returns a new array instead of resizing in place
  audio_data = np.pad(audio_data, (0, input_size - len(audio_data)))
audio_data = np.expand_dims(audio_data[:input_size], axis=0)

# Run inference
interpreter.allocate_tensors()
interpreter.set_tensor(input_details[0]['index'], audio_data)
interpreter.invoke()
output_data = interpreter.get_tensor(output_details[0]['index'])

# Display prediction and ground truth
top_index = np.argmax(output_data[0])
label = labels[top_index]
score = output_data[0][top_index]
print('---prediction---')
print(f'Class: {label}\nScore: {score}')
print('----truth----')
show_sample(random_audio)
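The cell above reports only the single highest-scoring class. To see how close the runners-up were, you can sort the score vector. The sketch below assumes, as in the cell above, that `output_data[0]` is a 1-D array of per-class scores aligned with `labels`; `top_k_predictions` is a hypothetical helper and the example scores are made up.

```python
import numpy as np


def top_k_predictions(scores, labels, k=3):
  """Returns the k highest-scoring (label, score) pairs, best first."""
  scores = np.asarray(scores)
  top_indices = np.argsort(scores)[::-1][:k]
  return [(labels[i], float(scores[i])) for i in top_indices]


example_scores = np.array([0.05, 0.7, 0.2, 0.05])  # made-up scores
example_labels = ['background', 'down', 'go', 'left']
print(top_k_predictions(example_scores, example_labels, k=2))
# [('down', 0.7), ('go', 0.2)]
```

In the notebook, `top_k_predictions(output_data[0], labels)` would list the three most likely classes for the current sample.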

Download the TF Lite model

Now you can deploy the TF Lite model to your mobile or embedded device. You don't need to download the labels file because you can instead retrieve the labels from .tflite file metadata, as shown in the previous inferencing example.

try:
  from google.colab import files
except ImportError:
  pass
else:
  files.download(tflite_file)

Check out our end-to-end example apps that perform inferencing with TFLite audio models on Android and iOS.

Except as otherwise noted, the content of this page is licensed under the Creative Commons Attribution 4.0 License, and code samples are licensed under the Apache 2.0 License. For details, see the Google Developers Site Policies. Java is a registered trademark of Oracle and/or its affiliates.

Last updated 2026-05-28 UTC.
