Practical Android AI

First Edition · Android 13 · Kotlin 2.0 · Android Studio Otter

4. On-device Intelligence with ML Kit
Written by Zahidur Rahman Faisal


Modern mobile apps deliver intelligent, personalized, and responsive experiences. For years, this intelligence was powered by large-scale cloud servers, but it is increasingly moving onto the user's hardware itself. This new paradigm, known as on-device AI, involves deploying and executing machine learning (ML) and generative AI models directly on a smartphone or tablet instead of relying on remote servers for inference. The choice between on-device and cloud-based AI is crucial for developers because it affects performance, privacy, and the overall user experience.

ML Kit, a mobile SDK, brings Google’s on-device machine learning expertise to Android apps. With the powerful yet easy-to-use Generative AI (GenAI), Vision, and Natural Language APIs, you can solve common challenges in your apps or create entirely new user experiences.

In this chapter, you’ll harness the power of ML Kit and create a sample app that will:

  • Scan documents and save them as images or PDFs.
  • Extract text from the saved documents and share it online.

Let’s get started with ML Kit!

ML Kit on Android

ML Kit is an easy-to-use SDK that brings Google’s extensive machine learning expertise to mobile developers, abstracting away the complexities of model management and inference. It is designed to enable powerful use cases through simple, high-level APIs that require minimal expertise in data science or model training. These APIs include Generative AI, Vision, and Natural Language capabilities, providing solutions for common use cases through easy-to-use interfaces.

ML Kit APIs run on-device and are optimized for fast, real-time use cases where you want to process text, images, or live camera streams. Based on their underlying ML models, the APIs are categorized as follows:

GenAI APIs

  • Text Summarization: Summarize articles or chat conversations into a concise bulleted list.
  • Proofreading: Polish short content by refining grammar and correcting spelling errors.
  • Rewriting: Rephrase short messages in various tones or styles.
  • Image Description: Generate a short description for a given image.

Vision APIs

  • Text Recognition: Recognize text from images and break it down into blocks, lines, elements, and symbols.
  • Digital Ink Recognition: Recognize hand‑drawn shapes and emojis.
  • Face Detection: Detect faces in an image, identify key facial features, and extract contours of detected faces.
  • Pose Detection: Detect the pose of a subject’s body in real time from a continuous video stream or static image.
  • Object Detection and Tracking: Detect and track objects in an image or a live camera feed.
  • Subject Segmentation: Detect and separate multiple subjects from the background of a picture.
  • Image Labeling: Detect and extract information about objects and entities in an image.
  • Barcode Scanning: Read standard barcode formats without requiring an internet connection.
  • Document Scanner: Convert physical documents into digital formats (e.g., images or PDFs).

Natural Language APIs

  • Entity Extraction: Identify specific entities within static text and trigger context-based actions for the user depending on the entity type.
  • Language Identification: Determine the language of a given string of text.
  • Translation: Dynamically translate text between more than 50 languages, even when the device is offline.
  • Smart Reply: Automatically generate contextually relevant and concise replies to messages.
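To give a sense of how lightweight these APIs are to call, here's a minimal sketch of the Language Identification API. It assumes the com.google.mlkit:language-id dependency, which isn't part of this chapter's sample project:

```kotlin
import com.google.mlkit.nl.languageid.LanguageIdentification

// Sketch: identify the language of a string entirely on-device.
val languageIdentifier = LanguageIdentification.getClient()
languageIdentifier.identifyLanguage("Bonjour tout le monde")
  .addOnSuccessListener { languageCode ->
    // languageCode is a BCP-47 tag such as "fr",
    // or "und" if the language can't be determined.
  }
  .addOnFailureListener {
    // Model download or inference failed.
  }
```

Every other ML Kit API in the lists above follows the same pattern: get a client, hand it your input, and receive the result through a Task listener.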

Creating a Document Scanner using ML Kit

You'll learn how easily you can create a custom document scanner using ML Kit by relying on its Document Scanner module. The module provides a complete scanning flow, including the camera UI, automatic edge detection, cropping, and export to image or PDF formats.

Adding Dependency

First, add the Document Scanner dependency to your app-level build.gradle file:

var documentScannerVersion = "16.0.0-beta1"
implementation "com.google.android.gms:play-services-mlkit-document-scanner:$documentScannerVersion"
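If your project uses Kotlin DSL build scripts (build.gradle.kts) rather than Groovy, the equivalent declaration looks like the sketch below; adapt it to your project's dependency-management style:

```kotlin
// build.gradle.kts: Kotlin DSL equivalent of the declaration above
val documentScannerVersion = "16.0.0-beta1"
dependencies {
  implementation("com.google.android.gms:play-services-mlkit-document-scanner:$documentScannerVersion")
}
```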

Preparing the Scanner

You’re now ready to use the dependency and its helper classes. Before you can start scanning, you need to configure the Document Scanner client. To do so, open MainViewModel.kt and update the prepareScanner() function as follows:

fun prepareScanner(): GmsDocumentScanner {
  val options = GmsDocumentScannerOptions.Builder()
    .setPageLimit(3) // Allow scanning up to 3 pages per session
    .setResultFormats(RESULT_FORMAT_JPEG) // Return each page as a JPEG image
    .setScannerMode(SCANNER_MODE_FULL) // Enable all editing features, like cropping and filters
    .build()
  return GmsDocumentScanning.getClient(options)
}
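The chapter intro mentioned saving scans as images or PDFs. The options builder accepts multiple result formats, so a variant of the configuration above can request both. This is a hedged sketch: RESULT_FORMAT_PDF and setGalleryImportAllowed() come from the same GmsDocumentScannerOptions API but aren't used in the sample app.

```kotlin
// Sketch: request a combined PDF in addition to per-page JPEGs.
// Assumes the same GmsDocumentScannerOptions static imports as above.
val options = GmsDocumentScannerOptions.Builder()
  .setPageLimit(3)
  .setResultFormats(RESULT_FORMAT_JPEG, RESULT_FORMAT_PDF) // JPEG pages plus one PDF
  .setGalleryImportAllowed(true) // Also allow importing photos from the gallery
  .setScannerMode(SCANNER_MODE_FULL)
  .build()
```

If you request RESULT_FORMAT_PDF, the scanning result exposes the document through scanResult.pdf?.uri alongside the per-page image URIs.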

Creating the Scanner Launcher

Next, open MainActivity and add the following code snippet above the onCreate() method:

val scannerLauncher = registerForActivityResult(
  contract = ActivityResultContracts.StartIntentSenderForResult()
) { result ->
  if (result.resultCode == RESULT_OK) {
    val scanResult = GmsDocumentScanningResult.fromActivityResultIntent(result.data)
    viewModel.extractPages(scanResult = scanResult)
  }
}

Handling Result

Go back to MainViewModel.kt. Near the top, you'll see a mutable state list named pageUris; it will hold the URI of each page from the scan result for later use. Update the extractPages() function as follows:

fun extractPages(scanResult: GmsDocumentScanningResult?) {
  viewModelScope.launch(Dispatchers.IO) {
    scanResult?.pages?.let { pages ->
      pageUris.clear()

      for (page in pages) {
        pageUris.add(page.imageUri)
      }
    }
  }
}

Scanning Documents

You’re all set to launch the Document Scanner at this point and use the resultant data. You need to update the launchDocumentScanner() function in MainActivity.kt as shown below:

private fun launchDocumentScanner() {
  viewModel
    .prepareScanner()
    .getStartScanIntent(this@MainActivity)
    .addOnSuccessListener { intentSender ->
      val scannerIntent = IntentSenderRequest.Builder(intentSender).build()
      scannerLauncher.launch(scannerIntent)
    }
}

Then, call launchDocumentScanner() from the onClickScan handler of the ScanButton composable:

ScanButton(
  onClickScan = {
    launchDocumentScanner()
  }
)
Scanning Documents

Extracting Text using Text Recognizer

ML Kit made it easy to turn your app into a document scanner, but what if you also want to extract text from your scanned documents using optical character recognition (OCR)? This part of the chapter will teach you how to do exactly that!

Add the Text Recognition dependency to your app-level build.gradle file:

var textRecognitionVersion = "16.0.1"
implementation "com.google.mlkit:text-recognition:$textRecognitionVersion"

Recognizing Text from Image

Remember saving your scanned pages as images? That'll come in handy now. The MainViewModel keeps references to those image URIs in the pageUris list. This list is used to display a carousel of your scanned pages in MainActivity.kt, which looks like this:

Image Carousel

Now, update the getTextFromImage() function in MainViewModel.kt as shown below:

fun getTextFromImage(image: Uri, onCompleted: (String?) -> Unit) {
  viewModelScope.launch(Dispatchers.IO) {
    // Wrap the image URI in an InputImage that ML Kit can process
    val inputImage = InputImage.fromFilePath(application, image)
    TextRecognition.getClient(TextRecognizerOptions.DEFAULT_OPTIONS)
      .process(inputImage)
      .addOnSuccessListener { visionText ->
        // visionText.text holds all recognized text as a single string
        onCompleted(visionText.text)
      }
      .addOnFailureListener {
        onCompleted(null)
      }
  }
}
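The Vision API overview noted that Text Recognition fragments text into blocks, lines, and elements. If you need more than the flat visionText.text string, for example per-line layout, you can walk that hierarchy inside the same success listener. A minimal sketch:

```kotlin
// Sketch: walk the recognized-text hierarchy instead of the flat string.
// block, line, and element each also expose boundingBox and cornerPoints.
for (block in visionText.textBlocks) {
  for (line in block.lines) {
    for (element in line.elements) {
      val word = element.text // An element is roughly one word
    }
  }
}
```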

Handling Result

At this point, viewModel.getTextFromImage(uri) will return the extracted text from the image. You might want to use or share this text within your app. To do that, open MainActivity.kt and add the following function:

private fun shareTextFromImage(uri: Uri) {
  viewModel.getTextFromImage(uri) { extractedText ->
    extractedText?.let {
      shareText(text = it)
    }
  }
}

Next, add the shareText() helper, which uses a standard share intent to send the extracted text to another app:

private fun shareText(text: String) {
  val intent = Intent().apply {
    action = Intent.ACTION_SEND
    type = "text/plain"
    putExtra(Intent.EXTRA_TEXT, text)
  }
  val shareIntent = Intent.createChooser(intent, "Text from Image")
  startActivity(shareIntent)
}

Finally, wire shareTextFromImage() into the click handler of the PageCarousel composable:

PageCarousel(
  pages = viewModel.pageUris,
  onItemClick = { uri ->
    shareTextFromImage(uri = uri)
  }
)
Sharing Text

The Trade-offs of On-device AI

On-device AI is optimized for scenarios where data processing must be immediate, private, and available without a network connection, but it comes with some strategic trade-offs.

The Benefits

These are the key benefits of using on-device AI:

Privacy and Security

With on-device processing, sensitive personal data, such as images, voice recordings, or private messages, never needs to leave the user's device. This significantly reduces the risk of data breaches or model theft and simplifies compliance with stringent data protection regulations like GDPR.

Latency

By performing inference locally, on-device AI eliminates network round-trip delays, resulting in near-instantaneous responsiveness. This is essential for real-time applications such as augmented reality (AR) filters, live camera analysis, and voice assistants that must respond without perceptible lag.

Offline Functionality

On-device models enable offline functionality, allowing applications to remain fully operational in environments with poor or nonexistent connectivity, which is a critical consideration for a global user base.

Operational Costs

For developers, on-device inference reduces ongoing server and bandwidth expenses associated with repeated cloud API calls. Running tasks locally is also more energy-efficient, consuming up to 90% less energy than cloud-based inference.

The Limitations

Despite these benefits, on-device AI is not without its challenges. The primary limitations are:

Computational Constraints

Even though modern mobile device hardware is becoming increasingly powerful, it cannot match the scale of a cloud data center. This limits the size and complexity of models that can run efficiently on a device.

Model Management

Managing models becomes more complex with on-device AI. While a cloud model can be updated instantly for all users, on-device models must be packaged with the application and distributed through app updates, making the process more time-consuming and logistically challenging.

Battery Consumption

Even optimized on-device inference can contribute to increased battery usage, particularly for computationally intensive tasks. Developers should focus on optimizing background tasks, limiting unnecessary requests, and using power-efficient APIs to minimize battery drain.

App Size

When using on-device models, you must also consider broader app performance. Managing app size is a critical consideration for on-device deployment: large model files can hinder installation on slow connections and consume valuable storage space. Best practices include using Android App Bundles, which deliver only the code and resources a particular device needs, and leveraging tools like the Android Size Analyzer to identify opportunities for size reduction.
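App Bundles are mostly a publishing-format choice, but the Android Gradle Plugin also lets you control which configuration splits Play generates. The sketch below shows the relevant bundle block in build.gradle.kts; these values are already the plugin defaults and appear here only to make the behavior explicit:

```kotlin
// build.gradle.kts: configuration splits used by Android App Bundles.
// These are the AGP defaults; shown only to make the knobs visible.
android {
  bundle {
    language { enableSplit = true } // Ship only the user's languages
    density { enableSplit = true }  // Ship only matching screen densities
    abi { enableSplit = true }      // Ship only the device's native ABI
  }
}
```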

Conclusion

A comprehensive analysis of these trade-offs reveals that the architectural decision for AI-powered features is rarely a simple, binary decision. The most robust solutions are often hybrid models that combine the strengths of both approaches. A common design pattern involves using on-device AI for basic data preprocessing and low-latency tasks, such as initial object detection in a live camera feed, while reserving more complex, high-volume analysis for cloud-based services. This enables a fluid user experience while leveraging cloud power when necessary.

Have a technical question? Want to report a bug? You can ask questions and report bugs to the book authors in our official book forum here.
© 2026 Kodeco Inc.
