
The MediaPipe Language Detector task lets you identify the language of a piece of text. This task operates on text data with a machine learning (ML) model and outputs a list of predictions, where each prediction consists of an ISO 639-1 language code and a probability.
Get Started
Start using this task by following one of these implementation guides for your target platform. These platform-specific guides walk you through a basic implementation of this task, including a recommended model and a code example with recommended configuration options:
- Android: Code example, Guide
- Python: Code example, Guide
- Web: Code example, Guide
Task details
This section describes the capabilities, inputs, outputs, and configuration options of this task.
Features
- Score threshold: Filter results based on prediction scores.
- Label allowlist and denylist: Specify which categories (language codes) are detected.
Task inputs
- String (UTF-8 text)
Task outputs
- Language code: An ISO 639-1 (https://en.wikipedia.org/wiki/List_of_ISO_639-1_codes) language or locale code, as a string (for example, "en" for English, "uz" for Uzbek, or "ja-Latn" for Japanese in romaji).
- Probability: The confidence score for this prediction, expressed as a floating-point probability between 0 and 1.
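To make the output shape concrete, here is a minimal sketch that models each prediction as a small record holding a language code and a probability, and picks the most likely language from a list of them. LanguagePrediction and most_likely are hypothetical names chosen for illustration, not part of the MediaPipe API, and the probabilities in the usage lines are made-up values; each platform guide documents the actual result types.

```python
from dataclasses import dataclass
from typing import List


@dataclass(frozen=True)
class LanguagePrediction:
    """Hypothetical stand-in for one prediction in the task's output list."""

    language_code: str  # ISO 639-1 or locale code, e.g. "en" or "ja-Latn"
    probability: float  # Confidence score between 0 and 1

    def __post_init__(self) -> None:
        # Reject values outside the documented probability range.
        if not 0.0 <= self.probability <= 1.0:
            raise ValueError(f"probability must be in [0, 1], got {self.probability}")


def most_likely(predictions: List[LanguagePrediction]) -> LanguagePrediction:
    """Return the prediction with the highest probability."""
    if not predictions:
        raise ValueError("no predictions to choose from")
    return max(predictions, key=lambda p: p.probability)


# Made-up predictions, used only to exercise the sketch.
result = [
    LanguagePrediction("ja-Latn", 0.15),
    LanguagePrediction("ja", 0.05),
    LanguagePrediction("en", 0.80),
]
best = most_likely(result)
print(f"{best.language_code}: {best.probability:.2f}")  # en: 0.80
```

Keeping the record immutable and validating the probability on construction mirrors the documented contract: every prediction is a code plus a score in the 0 to 1 range.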
Configuration options
This task has the following configuration options:
| Option Name | Description | Value Range | Default Value |
|---|---|---|---|
| max_results | Sets the optional maximum number of top-scored language predictions to return. If this value is less than zero, all available results are returned. | Any positive integer, or a negative value to return all results | -1 |
| score_threshold | Sets the prediction score threshold that overrides the one provided in the model metadata (if any). Results below this value are rejected. | Any float | Not set |
| category_allowlist | Sets the optional list of allowed language codes. If non-empty, language predictions whose language code is not in this set are filtered out. This option is mutually exclusive with category_denylist, and using both results in an error. | Any strings | Not set |
| category_denylist | Sets the optional list of language codes that are not allowed. If non-empty, language predictions whose language code is in this set are filtered out. This option is mutually exclusive with category_allowlist, and using both results in an error. | Any strings | Not set |
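The options in the table above act as filters on the model's raw predictions. The following plain-Python sketch emulates the documented semantics so their interactions are easy to see. It is not the MediaPipe API: the function apply_options, its metadata_threshold parameter, and the order in which the filters run (threshold and lists first, then sorting and truncation to max_results) are assumptions made for illustration. The real task applies these options internally.

```python
from typing import Iterable, List, Optional, Sequence, Tuple

# Assumption: a prediction is modeled as a (language_code, probability) pair.
Prediction = Tuple[str, float]


def apply_options(
    predictions: Iterable[Prediction],
    max_results: int = -1,
    score_threshold: Optional[float] = None,
    metadata_threshold: Optional[float] = None,  # hypothetical: threshold from model metadata
    category_allowlist: Sequence[str] = (),
    category_denylist: Sequence[str] = (),
) -> List[Prediction]:
    """Hypothetical post-processing that mirrors the documented option semantics."""
    if category_allowlist and category_denylist:
        # The two lists are mutually exclusive; setting both is an error.
        raise ValueError("category_allowlist and category_denylist are mutually exclusive")

    # score_threshold, when set, overrides any threshold from the model metadata.
    threshold = score_threshold if score_threshold is not None else metadata_threshold

    allowed = set(category_allowlist)
    denied = set(category_denylist)
    kept: List[Prediction] = []
    for code, prob in predictions:
        if threshold is not None and prob < threshold:
            continue  # Results below the threshold are rejected.
        if allowed and code not in allowed:
            continue  # Not in a non-empty allowlist.
        if code in denied:
            continue  # Explicitly denied.
        kept.append((code, prob))

    # Highest-probability predictions first.
    kept.sort(key=lambda p: p[1], reverse=True)

    # A negative max_results means "return all available results".
    if max_results >= 0:
        kept = kept[:max_results]
    return kept


# Made-up raw predictions, used only to exercise the sketch.
raw = [("en", 0.62), ("de", 0.21), ("nl", 0.12), ("af", 0.05)]
print(apply_options(raw, max_results=2))  # [('en', 0.62), ('de', 0.21)]
print(apply_options(raw, score_threshold=0.1, category_denylist=["de"]))  # [('en', 0.62), ('nl', 0.12)]
```

In the second call, "de" is removed by the denylist and "af" falls below the 0.1 threshold, so only "en" and "nl" survive.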
Models
We offer a default, recommended model when you start developing with this task.
Language detector model (recommended)
This model is built to be lightweight (315 KB) and uses an embedding-based neural network classification architecture. The model identifies languages by their ISO 639-1 language codes and can identify 110 languages. For a list of the languages supported by the model, see the label file, which lists languages by their ISO 639-1 code.
| Model name | Input shape | Quantization type | Model card | Versions |
|---|---|---|---|---|
| Language Detector | string UTF-8 | none (float32) | info | Latest |
Task benchmarks
Here are the task benchmarks for the whole pipeline based on the pre-trained model above. The latency result is the average latency on a Pixel 6 using CPU or GPU.
| Model Name | CPU Latency | GPU Latency |
|---|---|---|
| Language Detector | 0.31 ms | - |

