Recognizing digital ink with ML Kit on Android

With ML Kit's digital ink recognition, you can recognize text handwritten on a digital surface in hundreds of languages, as well as classify sketches.

Try it out

  • Play around with the sample app to see an example usage of this API.

Before you begin

  1. In your project-level build.gradle file, make sure to include Google's Maven repository in both your buildscript and allprojects sections.
  2. Add the dependencies for the ML Kit Android libraries to your module's app-level Gradle file, which is usually app/build.gradle:

    dependencies {
      // ...
      implementation 'com.google.mlkit:digital-ink-recognition:19.0.0'
    }

You are now ready to start recognizing text in Ink objects.

Build an Ink object

The main way to build an Ink object is to draw it on a touch screen. On Android, you can use a Canvas for this purpose. Your touch event handlers should call the addNewTouchEvent() method shown in the following code snippet to store the points of the strokes the user draws in the Ink object.

This general pattern is demonstrated in the following code snippet. See the ML Kit quickstart sample for a more complete example.

Kotlin

 var inkBuilder = Ink.builder()
 lateinit var strokeBuilder: Ink.Stroke.Builder

 // Call this each time there is a new event.
 fun addNewTouchEvent(event: MotionEvent) {
     val action = event.actionMasked
     val x = event.x
     val y = event.y
     val t = System.currentTimeMillis()

     // If your setup does not provide timing information, you can omit the
     // third parameter (t) in the calls to Ink.Point.create.
     when (action) {
         MotionEvent.ACTION_DOWN -> {
             strokeBuilder = Ink.Stroke.builder()
             strokeBuilder.addPoint(Ink.Point.create(x, y, t))
         }
         MotionEvent.ACTION_MOVE -> strokeBuilder.addPoint(Ink.Point.create(x, y, t))
         MotionEvent.ACTION_UP -> {
             strokeBuilder.addPoint(Ink.Point.create(x, y, t))
             inkBuilder.addStroke(strokeBuilder.build())
         }
         else -> {
             // Action not relevant for ink construction.
         }
     }
 }

 ...

 // This is what to send to the recognizer.
 val ink = inkBuilder.build()

Java

 Ink.Builder inkBuilder = Ink.builder();
 Ink.Stroke.Builder strokeBuilder;

 // Call this each time there is a new event.
 public void addNewTouchEvent(MotionEvent event) {
   float x = event.getX();
   float y = event.getY();
   long t = System.currentTimeMillis();

   // If your setup does not provide timing information, you can omit the
   // third parameter (t) in the calls to Ink.Point.create.
   int action = event.getActionMasked();
   switch (action) {
     case MotionEvent.ACTION_DOWN:
       strokeBuilder = Ink.Stroke.builder();
       strokeBuilder.addPoint(Ink.Point.create(x, y, t));
       break;
     case MotionEvent.ACTION_MOVE:
       strokeBuilder.addPoint(Ink.Point.create(x, y, t));
       break;
     case MotionEvent.ACTION_UP:
       strokeBuilder.addPoint(Ink.Point.create(x, y, t));
       inkBuilder.addStroke(strokeBuilder.build());
       strokeBuilder = null;
       break;
   }
 }

 ...

 // This is what to send to the recognizer.
 Ink ink = inkBuilder.build();

Get an instance of DigitalInkRecognizer

To perform recognition, send the Ink instance to a DigitalInkRecognizer object. The code below shows how to instantiate such a recognizer from a BCP-47 tag.

Kotlin

 // Specify the recognition model for a language
 var modelIdentifier: DigitalInkRecognitionModelIdentifier? = null
 try {
     modelIdentifier = DigitalInkRecognitionModelIdentifier.fromLanguageTag("en-US")
 } catch (e: MlKitException) {
     // language tag failed to parse, handle error.
 }
 if (modelIdentifier == null) {
     // no model was found, handle error.
 }
 var model: DigitalInkRecognitionModel =
     DigitalInkRecognitionModel.builder(modelIdentifier).build()

 // Get a recognizer for the language
 var recognizer: DigitalInkRecognizer =
     DigitalInkRecognition.getClient(
         DigitalInkRecognizerOptions.builder(model).build())

Java

 // Specify the recognition model for a language
 DigitalInkRecognitionModelIdentifier modelIdentifier;
 try {
   modelIdentifier =
       DigitalInkRecognitionModelIdentifier.fromLanguageTag("en-US");
 } catch (MlKitException e) {
   // language tag failed to parse, handle error.
 }
 if (modelIdentifier == null) {
   // no model was found, handle error.
 }
 DigitalInkRecognitionModel model =
     DigitalInkRecognitionModel.builder(modelIdentifier).build();

 // Get a recognizer for the language
 DigitalInkRecognizer recognizer =
     DigitalInkRecognition.getClient(
         DigitalInkRecognizerOptions.builder(model).build());

Process an Ink object

Kotlin

 recognizer.recognize(ink)
     .addOnSuccessListener { result: RecognitionResult ->
         // `result` contains the recognizer's answers as a RecognitionResult.
         // Logs the text from the top candidate.
         Log.i(TAG, result.candidates[0].text)
     }
     .addOnFailureListener { e: Exception ->
         Log.e(TAG, "Error during recognition: $e")
     }

Java

 recognizer.recognize(ink)
     .addOnSuccessListener(
         // `result` contains the recognizer's answers as a RecognitionResult.
         // Logs the text from the top candidate.
         result -> Log.i(TAG, result.getCandidates().get(0).getText()))
     .addOnFailureListener(
         e -> Log.e(TAG, "Error during recognition: " + e));

The sample code above assumes that the recognition model has already been downloaded, as described in the next section.

Managing model downloads

While the digital ink recognition API supports hundreds of languages, each language requires some data to be downloaded prior to any recognition. Around 20MB of storage is required per language. This is handled by the RemoteModelManager object.

Download a new model

Kotlin

 import com.google.mlkit.common.model.DownloadConditions
 import com.google.mlkit.common.model.RemoteModelManager

 var model: DigitalInkRecognitionModel = ...
 val remoteModelManager = RemoteModelManager.getInstance()

 remoteModelManager.download(model, DownloadConditions.Builder().build())
     .addOnSuccessListener {
         Log.i(TAG, "Model downloaded")
     }
     .addOnFailureListener { e: Exception ->
         Log.e(TAG, "Error while downloading a model: $e")
     }

Java

 import com.google.mlkit.common.model.DownloadConditions;
 import com.google.mlkit.common.model.RemoteModelManager;

 DigitalInkRecognitionModel model = ...;
 RemoteModelManager remoteModelManager = RemoteModelManager.getInstance();

 remoteModelManager
     .download(model, new DownloadConditions.Builder().build())
     .addOnSuccessListener(aVoid -> Log.i(TAG, "Model downloaded"))
     .addOnFailureListener(
         e -> Log.e(TAG, "Error while downloading a model: " + e));

Check whether a model has been downloaded already

Kotlin

 var model: DigitalInkRecognitionModel = ...
 remoteModelManager.isModelDownloaded(model)

Java

 DigitalInkRecognitionModel model = ...;
 remoteModelManager.isModelDownloaded(model);

Delete a downloaded model

Removing a model from the device's storage frees up space.

Kotlin

 var model: DigitalInkRecognitionModel = ...
 remoteModelManager.deleteDownloadedModel(model)
     .addOnSuccessListener {
         Log.i(TAG, "Model successfully deleted")
     }
     .addOnFailureListener { e: Exception ->
         Log.e(TAG, "Error while deleting a model: $e")
     }

Java

 DigitalInkRecognitionModel model = ...;
 remoteModelManager.deleteDownloadedModel(model)
     .addOnSuccessListener(
         aVoid -> Log.i(TAG, "Model successfully deleted"))
     .addOnFailureListener(
         e -> Log.e(TAG, "Error while deleting a model: " + e));

Tips to improve text recognition accuracy

The accuracy of text recognition can vary across different languages. Accuracy also depends on writing style. While Digital Ink Recognition is trained to handle many kinds of writing styles, results can vary from user to user.

Here are some ways to improve the accuracy of a text recognizer. Note that these techniques do not apply to the drawing classifiers for emojis, autodraw, and shapes.

Writing area

Many applications have a well-defined writing area for user input. The meaning of a symbol is partially determined by its size relative to the size of the writing area that contains it. For example, relative size is what distinguishes a lowercase letter "o" or "c" from an uppercase one, and a comma from a forward slash.

Telling the recognizer the width and height of the writing area can improve accuracy. However, the recognizer assumes that the writing area only contains a single line of text. If the physical writing area is large enough to allow the user to write two or more lines, you may get better results by passing in a WritingArea with a height that is your best estimate of the height of a single line of text. The WritingArea object you pass to the recognizer does not have to correspond exactly with the physical writing area on the screen. Changing the WritingArea height in this way works better in some languages than others.
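The single-line heuristic above can be sketched as a small helper that estimates the height of one line from your layout. The class and parameter names here are illustrative, not part of the ML Kit API, and how many lines fit in your box is your own estimate:

```java
public class WritingAreaHints {
    // Estimates the height of one line of text inside a taller input box.
    // `expectedLines` is your own estimate of how many lines of handwriting
    // fit in the box; the result can be passed as the WritingArea height.
    public static float estimateSingleLineHeight(float boxHeight, int expectedLines) {
        if (expectedLines <= 0) {
            throw new IllegalArgumentException("expectedLines must be positive");
        }
        return boxHeight / expectedLines;
    }
}
```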

When you specify the writing area, specify its width and height in the same units as the stroke coordinates. The x,y coordinate arguments have no unit requirement: the API normalizes all units, so the only thing that matters is the relative size and position of strokes. You are free to pass in coordinates in whatever scale makes sense for your system.

Pre-context

Pre-context is the text that immediately precedes the strokes in the Ink that you are trying to recognize. You can help the recognizer by telling it about the pre-context.

For example, the cursive letters "n" and "u" are often mistaken for one another. If the user has already entered the partial word "arg", they might continue with strokes that can be recognized as "ument" or "nment". Specifying the pre-context "arg" resolves the ambiguity, since the word "argument" is more likely than "argnment".

Pre-context can also help the recognizer identify word breaks, the spaces between words. You can type a space character but you cannot draw one, so how can a recognizer determine when one word ends and the next one starts? If the user has already written "hello" and continues with the written word "world", without pre-context the recognizer returns the string "world". However, if you specify the pre-context "hello", the model will return the string " world", with a leading space, since "hello world" makes more sense than "helloworld".

You should provide the longest possible pre-context string, up to 20 characters, including spaces. If the string is longer, the recognizer only uses the last 20 characters.
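Since only the final 20 characters matter, you can truncate the pre-context on the client side before passing it to setPreContext(). A minimal sketch; the helper and constant names are illustrative:

```java
public class PreContextUtil {
    // Maximum pre-context length used by the recognizer.
    private static final int MAX_PRE_CONTEXT_LENGTH = 20;

    // Returns the last MAX_PRE_CONTEXT_LENGTH characters of the text already
    // entered, including spaces, for use as the pre-context string.
    public static String buildPreContext(String existingText) {
        if (existingText.length() <= MAX_PRE_CONTEXT_LENGTH) {
            return existingText;
        }
        return existingText.substring(existingText.length() - MAX_PRE_CONTEXT_LENGTH);
    }
}
```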

The code sample below shows how to define a writing area and use a RecognitionContext object to specify pre-context.

Kotlin

 var preContext: String = ...
 var width: Float = ...
 var height: Float = ...
 val recognitionContext: RecognitionContext =
     RecognitionContext.builder()
         .setPreContext(preContext)
         .setWritingArea(WritingArea(width, height))
         .build()

 recognizer.recognize(ink, recognitionContext)

Java

 String preContext = ...;
 float width = ...;
 float height = ...;
 RecognitionContext recognitionContext =
     RecognitionContext.builder()
         .setPreContext(preContext)
         .setWritingArea(new WritingArea(width, height))
         .build();

 recognizer.recognize(ink, recognitionContext);

Stroke ordering

Recognition accuracy is sensitive to the order of the strokes. The recognizers expect strokes to occur in the order in which people naturally write: for example, left to right for English. Any case that departs from this pattern, such as writing an English sentence starting with the last word, gives less accurate results.

Another example is when a word in the middle of an Ink is removed and replaced with another word. The revision is probably in the middle of a sentence, but the strokes for the revision are at the end of the stroke sequence. In this case we recommend sending the newly written word separately to the API and merging the result with the prior recognitions using your own logic.
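One way to implement that merging logic is a plain string replacement, assuming your app keeps track of which word the user crossed out. This sketch is application-side logic, not part of the ML Kit API, and the names are illustrative:

```java
public class RevisionMerger {
    // Replaces the first occurrence of `removedWord` in the previously
    // recognized text with `revisedWord`, the result of recognizing the
    // replacement strokes separately.
    public static String mergeRevision(String priorText, String removedWord, String revisedWord) {
        int index = priorText.indexOf(removedWord);
        if (index < 0) {
            return priorText; // Nothing to replace; keep the prior result.
        }
        return priorText.substring(0, index)
                + revisedWord
                + priorText.substring(index + removedWord.length());
    }
}
```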

Dealing with ambiguous shapes

There are cases where the meaning of the shape provided to the recognizer is ambiguous. For example, a rectangle with very rounded edges could be seen as either a rectangle or an ellipse.

These unclear cases can be handled by using recognition scores when they are available. Only shape classifiers provide scores. If the model is very confident, the top result's score will be much better than the second best. If there is uncertainty, the scores for the top two results will be close. Also, keep in mind that the shape classifiers interpret the whole Ink as a single shape. For example, if the Ink contains a rectangle and an ellipse next to each other, the recognizer may return one or the other (or something completely different) as a result, since a single recognition candidate cannot represent two shapes.
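When scores are available, a simple application-side heuristic is to treat the result as ambiguous whenever the top two candidates score within some threshold of each other. Both the class name and the threshold value are your own choices, not values defined by ML Kit:

```java
public class ShapeAmbiguityChecker {
    // Treats a shape classification as ambiguous when the top two candidate
    // scores are closer than `minGap`. What gap counts as "confident" is an
    // application-specific tuning decision.
    public static boolean isAmbiguous(double topScore, double secondScore, double minGap) {
        return Math.abs(topScore - secondScore) < minGap;
    }
}
```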
