Practical Android AI

First Edition · Android 13 · Kotlin 2.0 · Android Studio Otter

4. On-device Intelligence with ML Kit
Written by Zahidur Rahman Faisal


Modern mobile apps deliver intelligent, personalized, and responsive experiences. For years, this intelligence was powered by large-scale cloud servers, but it is increasingly moving onto the user's hardware itself. This new paradigm, known as on-device AI, involves deploying and executing machine learning (ML) and generative AI models directly on a smartphone or tablet instead of relying on remote servers for inference. The choice between on-device and cloud-based AI is crucial for developers because it affects performance, privacy, and the overall user experience.

ML Kit, a mobile SDK, brings Google’s on-device machine learning expertise to Android apps. With the powerful yet easy-to-use Generative AI (GenAI), Vision, and Natural Language APIs, you can solve common challenges in your apps or create entirely new user experiences.

In this chapter, you’ll harness the power of ML Kit and create a sample app that will:

  • Scan documents and save them as images or PDFs.
  • Extract text from the saved documents and share it online.

Let’s get started with ML Kit!

ML Kit on Android

ML Kit is an easy-to-use SDK that brings Google’s extensive machine learning expertise to mobile developers, abstracting away the complexities of model management and inference. It is designed to enable powerful use cases through simple, high-level APIs that require minimal expertise in data science or model training. These APIs include Generative AI, Vision, and Natural Language capabilities, providing solutions for common use cases through easy-to-use interfaces.

ML Kit APIs run on-device and are optimized for fast, real-time use cases where you want to process text, images, or live camera streams. Based on their underlying ML models, the APIs are categorized as follows:

GenAI APIs

  • Text Summarization: Summarize articles or chat conversations into a concise bulleted list.
  • Proofreading: Polish short content by refining grammar and correcting spelling errors.
  • Rewriting: Rephrase short messages in various tones or styles.
  • Image Description: Generate a short description for a given image.

Vision APIs

  • Text Recognition: Recognize text from images and break it down into blocks, lines, elements, and symbols.
  • Digital Ink Recognition: Recognize hand‑drawn shapes and emojis.
  • Face Detection: Detect faces in an image, identify key facial features, and extract contours of detected faces.
  • Pose Detection: Detect the pose of a subject’s body in real time from a continuous video stream or static image.
  • Object Detection and Tracking: Detect and track objects in an image or a live camera feed.
  • Subject Segmentation: Detect and separate multiple subjects from the background of a picture.
  • Image Labeling: Detect and extract information about objects and entities in an image.
  • Barcode Scanning: Read standard barcode formats without requiring an internet connection.
  • Document Scanner: Convert physical documents into digital formats (e.g., images or PDFs).

Natural Language APIs

  • Entity Extraction: Identify specific entities within static text and trigger context-based actions for the user depending on the entity type.
  • Language Identification: Determine the language of a given string of text.
  • Translation: Dynamically translate text between more than 50 languages, even when the device is offline.
  • Smart Reply: Automatically generate contextually relevant and concise replies to messages.
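To give a sense of how lightweight these APIs are to call, here's a minimal sketch of the Language Identification API. It assumes the com.google.mlkit:language-id dependency, which isn't part of this chapter's sample project:

```kotlin
import com.google.mlkit.nl.languageid.LanguageIdentification

// Sketch: identify the language of a string entirely on-device.
val languageIdentifier = LanguageIdentification.getClient()
languageIdentifier.identifyLanguage("Bonjour tout le monde")
  .addOnSuccessListener { languageCode ->
    // languageCode is a BCP-47 tag such as "fr",
    // or "und" if the language can't be determined.
  }
  .addOnFailureListener {
    // Model download or inference failed.
  }
```

Every other ML Kit API in the lists above follows the same pattern: get a client, hand it your input, and receive the result through a Task listener.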

Creating a Document Scanner using ML Kit

You'll learn how easily you can create a custom document scanner using ML Kit by relying on its Document Scanner module. The module provides a complete scanning flow, including the camera UI, automatic edge detection, cropping, and export to image or PDF formats.

Adding Dependency

First, add the Document Scanner dependency to your app-level build.gradle file:

var documentScannerVersion = "16.0.0-beta1"
implementation "com.google.android.gms:play-services-mlkit-document-scanner:$documentScannerVersion"
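If your project uses Kotlin DSL build scripts (build.gradle.kts) rather than Groovy, the equivalent declaration looks like the sketch below; adapt it to your project's dependency-management style:

```kotlin
// build.gradle.kts: Kotlin DSL equivalent of the declaration above
val documentScannerVersion = "16.0.0-beta1"
dependencies {
  implementation("com.google.android.gms:play-services-mlkit-document-scanner:$documentScannerVersion")
}
```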

Preparing the Scanner

You’re now ready to use the dependency and its helper classes. Before you can start scanning, you need to configure the Document Scanner client. To do so, open MainViewModel.kt and update the prepareScanner() function as follows:

fun prepareScanner(): GmsDocumentScanner {
  val options = GmsDocumentScannerOptions.Builder()
    .setPageLimit(3) // Allow scanning up to 3 pages per session
    .setResultFormats(RESULT_FORMAT_JPEG) // Return each page as a JPEG image
    .setScannerMode(SCANNER_MODE_FULL) // Enable all editing features, like cropping and filters
    .build()
  return GmsDocumentScanning.getClient(options)
}
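The chapter intro mentioned saving scans as images or PDFs. The options builder accepts multiple result formats, so a variant of the configuration above can request both. This is a hedged sketch: RESULT_FORMAT_PDF and setGalleryImportAllowed() come from the same GmsDocumentScannerOptions API but aren't used in the sample app.

```kotlin
// Sketch: request a combined PDF in addition to per-page JPEGs.
// Assumes the same GmsDocumentScannerOptions static imports as above.
val options = GmsDocumentScannerOptions.Builder()
  .setPageLimit(3)
  .setResultFormats(RESULT_FORMAT_JPEG, RESULT_FORMAT_PDF) // JPEG pages plus one PDF
  .setGalleryImportAllowed(true) // Also allow importing photos from the gallery
  .setScannerMode(SCANNER_MODE_FULL)
  .build()
```

If you request RESULT_FORMAT_PDF, the scanning result exposes the document through scanResult.pdf?.uri alongside the per-page image URIs.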

Creating the Scanner Launcher

Next, open MainActivity and add the following code snippet above the onCreate() method:

val scannerLauncher = registerForActivityResult(
  contract = ActivityResultContracts.StartIntentSenderForResult()
) { result ->
  if (result.resultCode == RESULT_OK) {
    val scanResult = GmsDocumentScanningResult.fromActivityResultIntent(result.data)
    viewModel.extractPages(scanResult = scanResult)
  }
}

Handling Result

Go back to MainViewModel.kt. Near the top, you'll see a mutable state list named pageUris; it will hold the URI of each page from the scan result for later use. Update the extractPages() function as follows:

fun extractPages(scanResult: GmsDocumentScanningResult?) {
  viewModelScope.launch(Dispatchers.IO) {
    scanResult?.pages?.let { pages ->
      pageUris.clear()

      for (page in pages) {
        pageUris.add(page.imageUri)
      }
    }
  }
}

Scanning Documents

You’re all set to launch the Document Scanner at this point and use the resultant data. You need to update the launchDocumentScanner() function in MainActivity.kt as shown below:

private fun launchDocumentScanner() {
  viewModel
    .prepareScanner()
    .getStartScanIntent(this@MainActivity)
    .addOnSuccessListener { intentSender ->
      val scannerIntent = IntentSenderRequest.Builder(intentSender).build()
      scannerLauncher.launch(scannerIntent)
    }
}

Then, call launchDocumentScanner() from the onClickScan handler of the ScanButton composable:

ScanButton(
  onClickScan = {
    launchDocumentScanner()
  }
)
Scanning Documents

Extracting Text using Text Recognizer

ML Kit made it easy to turn your app into a document scanner, but what if you also want to extract text from your scanned documents using optical character recognition (OCR)? This part of the chapter will teach you how to do exactly that!

Add the Text Recognition dependency to your app-level build.gradle file:

var textRecognitionVersion = "16.0.1"
implementation "com.google.mlkit:text-recognition:$textRecognitionVersion"

Recognizing Text from Image

Remember saving your scanned pages as images? That'll come in handy now. The MainViewModel keeps references to those image URIs in the pageUris list. This list is used to display a carousel of your scanned pages in MainActivity.kt, which looks like this:

Image Carousel

Now, update the getTextFromImage() function in MainViewModel.kt as shown below:

fun getTextFromImage(image: Uri, onCompleted: (String?) -> Unit) {
  viewModelScope.launch(Dispatchers.IO) {
    // Wrap the image URI in an InputImage that ML Kit can process
    val inputImage = InputImage.fromFilePath(application, image)
    TextRecognition.getClient(TextRecognizerOptions.DEFAULT_OPTIONS)
      .process(inputImage)
      .addOnSuccessListener { visionText ->
        // visionText.text holds all recognized text as a single string
        onCompleted(visionText.text)
      }
      .addOnFailureListener {
        onCompleted(null)
      }
  }
}
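The Vision API overview noted that Text Recognition fragments text into blocks, lines, and elements. If you need more than the flat visionText.text string, for example per-line layout, you can walk that hierarchy inside the same success listener. A minimal sketch:

```kotlin
// Sketch: walk the recognized-text hierarchy instead of the flat string.
// block, line, and element each also expose boundingBox and cornerPoints.
for (block in visionText.textBlocks) {
  for (line in block.lines) {
    for (element in line.elements) {
      val word = element.text // An element is roughly one word
    }
  }
}
```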

Handling Result

At this point, viewModel.getTextFromImage(uri) will return the extracted text from the image. You might want to use or share this text within your app. To do that, open MainActivity.kt and add the following function:

private fun shareTextFromImage(uri: Uri) {
  viewModel.getTextFromImage(uri) { extractedText ->
    extractedText?.let {
      shareText(text = it)
    }
  }
}

Next, add the shareText() helper, which uses a standard share intent to send the extracted text to another app:

private fun shareText(text: String) {
  val intent = Intent().apply {
    action = Intent.ACTION_SEND
    type = "text/plain"
    putExtra(Intent.EXTRA_TEXT, text)
  }
  val shareIntent = Intent.createChooser(intent, "Text from Image")
  startActivity(shareIntent)
}

Finally, wire shareTextFromImage() into the click handler of the PageCarousel composable:

PageCarousel(
  pages = viewModel.pageUris,
  onItemClick = { uri ->
    shareTextFromImage(uri = uri)
  }
)
Sharing Text

The Trade-offs of On-device AI

On-device AI is optimized for scenarios where data processing must be immediate, private, and available without a network connection, but it comes with some strategic trade-offs.

The Benefits

These are the key benefits of using on-device AI:

Privacy and Security

With on-device processing, sensitive personal data, such as images, voice recordings, or private messages, never needs to leave the user's device. This significantly reduces the risk of data breaches or model theft and simplifies compliance with stringent data protection regulations like GDPR.

Latency

By performing inference locally, on-device AI eliminates network round-trip delays, resulting in near-instantaneous responsiveness. This is essential for real-time applications such as augmented reality (AR) filters, live camera analysis, and voice assistants that must respond without perceptible lag.

Offline Functionality

On-device models enable offline functionality, allowing applications to remain fully operational in environments with poor or nonexistent connectivity, which is a critical consideration for a global user base.

Operational Costs

For developers, on-device inference reduces ongoing server and bandwidth expenses associated with repeated cloud API calls. Running tasks locally is also more energy-efficient, consuming up to 90% less energy than cloud-based inference.

The Limitations

Despite these benefits, on-device AI is not without its challenges. The primary limitations are:

Computational Constraints

Even though modern mobile device hardware is becoming increasingly powerful, it cannot match the scale of a cloud data center. This limits the size and complexity of models that can run efficiently on a device.

Model Management

Managing models becomes more complex with on-device AI. While a cloud model can be updated instantly for all users, on-device models must be packaged with the application and distributed through app updates, making the process more time-consuming and logistically challenging.

Battery Consumption

Even optimized on-device inference can contribute to increased battery usage, particularly for computationally intensive tasks. Developers should focus on optimizing background tasks, limiting unnecessary requests, and using power-efficient APIs to minimize battery drain.

App Size

When using on-device models, you must also consider broader app performance. Managing app size is a critical consideration for on-device deployment: large model files can hinder installation on slow connections and consume valuable storage space. Best practices include using Android App Bundles, which deliver only the code and resources a particular device needs, and leveraging tools like the Android Size Analyzer to identify opportunities for size reduction.
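App Bundles are mostly a publishing-format choice, but the Android Gradle Plugin also lets you control which configuration splits Play generates. The sketch below shows the relevant bundle block in build.gradle.kts; these values are already the plugin defaults and appear here only to make the behavior explicit:

```kotlin
// build.gradle.kts: configuration splits used by Android App Bundles.
// These are the AGP defaults; shown only to make the knobs visible.
android {
  bundle {
    language { enableSplit = true } // Ship only the user's languages
    density { enableSplit = true }  // Ship only matching screen densities
    abi { enableSplit = true }      // Ship only the device's native ABI
  }
}
```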

Conclusion

A comprehensive analysis of these trade-offs reveals that the architectural decision for AI-powered features is rarely a simple, binary decision. The most robust solutions are often hybrid models that combine the strengths of both approaches. A common design pattern involves using on-device AI for basic data preprocessing and low-latency tasks, such as initial object detection in a live camera feed, while reserving more complex, high-volume analysis for cloud-based services. This enables a fluid user experience while leveraging cloud power when necessary.

Have a technical question? Want to report a bug? You can ask questions and report bugs to the book authors in our official book forum here.
© 2026 Kodeco Inc.
