Identify the language of text with ML Kit on Android

You can use ML Kit to identify the language of a string of text. You can get the string's most likely language as well as confidence scores for all of the string's possible languages.

ML Kit recognizes text in more than 100 different languages in their native scripts. In addition, romanized text can be recognized for Arabic, Bulgarian, Chinese, Greek, Hindi, Japanese, and Russian. See the complete list of supported languages and scripts.

| | Bundled | Unbundled |
| --- | --- | --- |
| Library name | com.google.mlkit:language-id | com.google.android.gms:play-services-mlkit-language-id |
| Implementation | Model is statically linked to your app at build time. | Model is dynamically downloaded via Google Play Services. |
| App size impact | About 900 KB size increase. | About 200 KB size increase. |
| Initialization time | Model is available immediately. | Might have to wait for model to download before first use. |

Try it out

  • Play around with the sample app to see an example usage of this API.

Before you begin

  1. In your project-level build.gradle file, make sure to include Google's Maven repository in both your buildscript and allprojects sections.

  2. Add the dependencies for the ML Kit Android libraries to your module's app-level Gradle file, which is usually app/build.gradle. Choose one of the following dependencies based on your needs:

    For bundling the model with your app:

      dependencies {
        // ...
        // Use this dependency to bundle the model with your app
        implementation 'com.google.mlkit:language-id:17.0.6'
      }
    

    For using the model in Google Play Services:

      dependencies {
        // ...
        // Use this dependency to use the dynamically downloaded model in Google Play Services
        implementation 'com.google.android.gms:play-services-mlkit-language-id:17.0.0'
      }
    
  3. If you choose to use the model in Google Play Services, you can configure your app to automatically download the model to the device after your app is installed from the Play Store. To do so, add the following declaration to your app's AndroidManifest.xml file:

      <application ...>
        ...
        <meta-data
            android:name="com.google.mlkit.vision.DEPENDENCIES"
            android:value="langid" />
        <!-- To use multiple models: android:value="langid,model2,model3" -->
      </application>
    

    You can also explicitly check the model availability and request download through the Google Play services ModuleInstallClient API.

    If you don't enable install-time model downloads or request explicit download, the model is downloaded the first time you run the identifier. Requests you make before the download has completed produce no results.

Identify the language of a string

To identify the language of a string, call LanguageIdentification.getClient() to get an instance of LanguageIdentifier, and then pass the string to the identifyLanguage() method of LanguageIdentifier.

For example:

Kotlin

    val languageIdentifier = LanguageIdentification.getClient()
    languageIdentifier.identifyLanguage(text)
        .addOnSuccessListener { languageCode ->
            if (languageCode == "und") {
                Log.i(TAG, "Can't identify language.")
            } else {
                Log.i(TAG, "Language: $languageCode")
            }
        }
        .addOnFailureListener {
            // Model couldn’t be loaded or other internal error.
            // ...
        }

Java

    LanguageIdentifier languageIdentifier =
            LanguageIdentification.getClient();
    languageIdentifier.identifyLanguage(text)
            .addOnSuccessListener(
                    new OnSuccessListener<String>() {
                        @Override
                        public void onSuccess(@Nullable String languageCode) {
                            // Null-safe comparison, because the parameter is annotated @Nullable.
                            if ("und".equals(languageCode)) {
                                Log.i(TAG, "Can't identify language.");
                            } else {
                                Log.i(TAG, "Language: " + languageCode);
                            }
                        }
                    })
            .addOnFailureListener(
                    new OnFailureListener() {
                        @Override
                        public void onFailure(@NonNull Exception e) {
                            // Model couldn’t be loaded or other internal error.
                            // ...
                        }
                    });

If the call succeeds, a BCP-47 language code is passed to the success listener, indicating the language of the text. If no language is confidently detected, the code und (undetermined) is passed.
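
Because the result is a plain BCP-47 string, it can be turned into a readable name with the standard java.util.Locale class. The following is a minimal sketch, not part of ML Kit: the class name LanguageNames, the helper displayName(), and the "Undetermined" label are assumptions made for illustration.

```java
import java.util.Locale;

final class LanguageNames {
    /** Hypothetical helper: maps a BCP-47 code to an English display name. */
    static String displayName(String languageCode) {
        // "und" is the code passed when no language is confidently detected.
        if (languageCode == null || languageCode.equals("und")) {
            return "Undetermined";
        }
        Locale locale = Locale.forLanguageTag(languageCode);
        String name = locale.getDisplayLanguage(Locale.ENGLISH);
        // Fall back to the raw code if no display name is available.
        return name.isEmpty() ? languageCode : name;
    }

    public static void main(String[] args) {
        System.out.println(displayName("fr"));
        System.out.println(displayName("und"));
    }
}
```

In an app, a helper like this would typically be called from the success listener, with the languageCode argument it receives.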

By default, ML Kit returns a value other than und only when it identifies the language with a confidence value of at least 0.5. You can change this threshold by passing a LanguageIdentificationOptions object to getClient():

Kotlin

    val languageIdentifier = LanguageIdentification
        .getClient(LanguageIdentificationOptions.Builder()
            .setConfidenceThreshold(0.34f)
            .build())
  

Java

    LanguageIdentifier languageIdentifier = LanguageIdentification.getClient(
            new LanguageIdentificationOptions.Builder()
                    .setConfidenceThreshold(0.34f)
                    .build());
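
To see what the threshold changes in practice, the decision described above can be modeled in plain Java. This is only a sketch of the documented behavior, not ML Kit's implementation: the class ThresholdSketch, the method pick(), and the confidence scores are all made up for illustration.

```java
import java.util.Map;

final class ThresholdSketch {
    static final String UNDETERMINED = "und";

    /** Returns the best-scoring code if it meets the threshold, otherwise "und". */
    static String pick(Map<String, Float> scores, float threshold) {
        String best = UNDETERMINED;
        float bestScore = -1f;
        for (Map.Entry<String, Float> entry : scores.entrySet()) {
            if (entry.getValue() > bestScore) {
                best = entry.getKey();
                bestScore = entry.getValue();
            }
        }
        return bestScore >= threshold ? best : UNDETERMINED;
    }

    public static void main(String[] args) {
        // Invented scores for an ambiguous string.
        Map<String, Float> scores = Map.of("es", 0.4f, "pt", 0.3f);
        System.out.println(pick(scores, 0.5f));   // default threshold
        System.out.println(pick(scores, 0.34f));  // lowered threshold
    }
}
```

With these invented scores, the default threshold of 0.5 yields und, while the lowered threshold of 0.34 lets the top candidate through.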
  

Get the possible languages of a string

To get the confidence values of a string's most likely languages, get an instance of LanguageIdentifier and then pass the string to the identifyPossibleLanguages() method.

For example:

Kotlin

    val languageIdentifier = LanguageIdentification.getClient()
    languageIdentifier.identifyPossibleLanguages(text)
        .addOnSuccessListener { identifiedLanguages ->
            for (identifiedLanguage in identifiedLanguages) {
                val language = identifiedLanguage.languageTag
                val confidence = identifiedLanguage.confidence
                Log.i(TAG, "$language $confidence")
            }
        }
        .addOnFailureListener {
            // Model couldn’t be loaded or other internal error.
            // ...
        }
  

Java

    LanguageIdentifier languageIdentifier =
            LanguageIdentification.getClient();
    languageIdentifier.identifyPossibleLanguages(text)
            .addOnSuccessListener(new OnSuccessListener<List<IdentifiedLanguage>>() {
                @Override
                public void onSuccess(List<IdentifiedLanguage> identifiedLanguages) {
                    for (IdentifiedLanguage identifiedLanguage : identifiedLanguages) {
                        String language = identifiedLanguage.getLanguageTag();
                        float confidence = identifiedLanguage.getConfidence();
                        Log.i(TAG, language + " (" + confidence + ")");
                    }
                }
            })
            .addOnFailureListener(
                    new OnFailureListener() {
                        @Override
                        public void onFailure(@NonNull Exception e) {
                            // Model couldn’t be loaded or other internal error.
                            // ...
                        }
                    });
  

If the call succeeds, a list of IdentifiedLanguage objects is passed to the success listener. From each object, you can get the language's BCP-47 code and the confidence that the string is in that language. Note that these values indicate the confidence that the entire string is in the given language; ML Kit doesn't identify multiple languages in a single string.
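
The text above does not state what order the list arrives in, so code that wants the strongest candidate first may prefer to sort explicitly. The sketch below shows one way to do that in plain Java. Candidate is a hypothetical stand-in that mirrors the getLanguageTag() and getConfidence() accessors of IdentifiedLanguage, and the confidence values are invented.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;

final class RankingSketch {
    /** Hypothetical stand-in for ML Kit's IdentifiedLanguage. */
    static final class Candidate {
        private final String languageTag;
        private final float confidence;

        Candidate(String languageTag, float confidence) {
            this.languageTag = languageTag;
            this.confidence = confidence;
        }

        String getLanguageTag() { return languageTag; }
        float getConfidence() { return confidence; }
    }

    /** Returns a copy of the list sorted by descending confidence. */
    static List<Candidate> rank(List<Candidate> candidates) {
        List<Candidate> sorted = new ArrayList<>(candidates);
        Comparator<Candidate> byConfidence =
                Comparator.comparingDouble(Candidate::getConfidence);
        sorted.sort(byConfidence.reversed());
        return sorted;
    }

    public static void main(String[] args) {
        List<Candidate> ranked = rank(List.of(
                new Candidate("pt", 0.2f),
                new Candidate("es", 0.7f)));
        for (Candidate c : ranked) {
            System.out.println(c.getLanguageTag() + " (" + c.getConfidence() + ")");
        }
    }
}
```

The input list is copied before sorting, so the list handed to the listener is left unmodified.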

By default, ML Kit returns only languages with confidence values of at least 0.01. You can change this threshold by passing a LanguageIdentificationOptions object to getClient():

Kotlin

    val languageIdentifier = LanguageIdentification
        .getClient(LanguageIdentificationOptions.Builder()
            .setConfidenceThreshold(0.5f)
            .build())

Java

    LanguageIdentifier languageIdentifier = LanguageIdentification.getClient(
            new LanguageIdentificationOptions.Builder()
                    .setConfidenceThreshold(0.5f)
                    .build());

If no language meets this threshold, the list has one item, with the value und.
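
Calling code may want to treat that single und entry as "no result" rather than as a real candidate. A minimal sketch of such a check follows. The class UndeterminedCheck and the method isUndetermined() are hypothetical, and the method assumes the language tags have already been extracted from the IdentifiedLanguage objects.

```java
import java.util.List;

final class UndeterminedCheck {
    /** Returns true when the only entry in the result is the "und" tag. */
    static boolean isUndetermined(List<String> languageTags) {
        return languageTags.size() == 1 && "und".equals(languageTags.get(0));
    }

    public static void main(String[] args) {
        System.out.println(isUndetermined(List.of("und")));
        System.out.println(isUndetermined(List.of("en", "fr")));
    }
}
```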
