Modern mobile apps deliver intelligent, personalized, and responsive experiences. For years, that intelligence was powered by large-scale cloud servers; today, much of it can run directly on the device. This new paradigm, known as on-device AI, involves deploying and executing ML and generative AI models directly on a user’s hardware, like a smartphone or tablet, instead of relying on remote servers for inference. The choice between on-device and cloud-based AI is crucial for developers, as it impacts performance, privacy, and the overall user experience.
ML Kit, a mobile SDK, brings Google’s on-device machine learning expertise to Android apps. With the powerful yet easy-to-use Generative AI (GenAI), Vision, and Natural Language APIs, you can solve common challenges in your apps or create entirely new user experiences.
In this chapter, you’ll harness the power of ML Kit and create a sample app that will:
Scan documents and save them as images or PDFs.
Extract text from the saved documents and share it online.
Let’s get started with ML Kit!
ML Kit on Android
ML Kit is an easy-to-use SDK that brings Google’s extensive machine learning expertise to mobile developers, abstracting away the complexities of model management and inference. Its simple, high-level APIs require minimal expertise in data science or model training, and they span Generative AI, Vision, and Natural Language capabilities that solve common use cases out of the box.
ML Kit APIs run on-device and are optimized for fast, real-time use cases where you want to process text, images, or a live camera stream. The ML Kit APIs are categorized as follows, based on their underlying ML models:
GenAI APIs
Text Summarization: Summarize articles or chat conversations as a concise bulleted list.
Proofreading: Polish short content by refining grammar and correcting spelling errors.
Rewriting: Rephrase short messages in various tones or styles.
Translation: Dynamically translate text between more than 50 languages, even when the device is offline.
Smart Reply: Automatically generate contextually relevant and concise replies to messages.
The ML Kit GenAI APIs are built on AICore, an Android system service that facilitates the on-device execution of foundation models, ensuring these features process data locally and maintain a high level of privacy.
Creating a Document Scanner using ML Kit
You’ll learn how easily you can create a custom Document Scanner by relying on the Document Scanner module from ML Kit. To get started, add the following dependency to your app-level build.gradle file:
var documentScannerVersion = "16.0.0-beta1"
implementation "com.google.android.gms:play-services-mlkit-document-scanner:$documentScannerVersion"
Preparing the Scanner
You’re now ready to use the dependency and its helper classes. Before you can start scanning, you need to configure the Document Scanner client. To do so, open MainViewModel.kt and update the prepareScanner() function as follows:
fun prepareScanner(): GmsDocumentScanner {
  val options = GmsDocumentScannerOptions.Builder()
    .setPageLimit(3)
    .setResultFormats(RESULT_FORMAT_JPEG)
    .setScannerMode(SCANNER_MODE_FULL)
    .build()
  return GmsDocumentScanning.getClient(options)
}
The options variable allows you to configure the Document Scanner client. These are the configurations you declared here:
The setPageLimit() function limits the number of pages to be scanned in a session.
setResultFormats() sets the output format of results. Using RESULT_FORMAT_JPEG as the parameter will give output as images; you could use RESULT_FORMAT_PDF if you need your documents converted to PDF.
Setting SCANNER_MODE_FULL in the setScannerMode() function enables all available features of the Document Scanner. If you prefer to restrict certain advanced capabilities, such as image filters, you can use SCANNER_MODE_BASE instead to control the image editing capabilities.
Once the configuration options are prepared, the prepareScanner() function returns a GmsDocumentScanner instance, which you’ll be using in the next steps.
Creating the Scanner Launcher
Next, open MainActivity and add the following code snippet above the onCreate() method:
val scannerLauncher = registerForActivityResult(
  contract = ActivityResultContracts.StartIntentSenderForResult()
) { result ->
  if (result.resultCode == RESULT_OK) {
    val scanResult = GmsDocumentScanningResult.fromActivityResultIntent(result.data)
    viewModel.extractPages(scanResult = scanResult)
  }
}
You’re passing the scanned data through GmsDocumentScanningResult.fromActivityResultIntent(result.data) and assigning it to scanResult if the scan completed successfully. The next step is to iterate through all the pages of the scanResult object.
You’re not ready to launch the app just yet; you still need to implement the extractPages() function to extract data from each page. You’ll do that in the next section.
Handling Result
Go back to MainViewModel.kt again. You’ll see there’s a MutableStateList named pageUris defined at the top – this list will contain the URI of each page from the scan result, which will be used later.
It then iterates through the pages list. Since you set RESULT_FORMAT_JPEG as the output format earlier, each page will consist of an imageUri. The function adds the imageUri of each page to the pageUris list.
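The extractPages() implementation isn’t shown in this excerpt. A minimal sketch, assuming pageUris is the MutableStateList of Uri objects described above and that the scanner was configured with RESULT_FORMAT_JPEG, might look like this:

```kotlin
// Sketch only: collects the image Uri of every scanned page.
// Assumes pageUris is the MutableStateList<Uri> declared in MainViewModel.
fun extractPages(scanResult: GmsDocumentScanningResult?) {
  scanResult?.pages?.forEach { page ->
    // Each Page exposes the Uri of its saved JPEG via imageUri.
    pageUris.add(page.imageUri)
  }
}
```

The null-safe chain handles the case where the scan result carries no pages, so the function is safe to call with whatever fromActivityResultIntent() returns.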
Scanning Documents
You’re all set to launch the Document Scanner at this point and use the resultant data. You need to update the launchDocumentScanner() function in MainActivity.kt as shown below:
private fun launchDocumentScanner() {
  viewModel
    .prepareScanner()
    .getStartScanIntent(this@MainActivity)
    .addOnSuccessListener { intentSender ->
      val scannerIntent = IntentSenderRequest.Builder(intentSender).build()
      scannerLauncher.launch(scannerIntent)
    }
}
This function prepares the Document Scanner intent using the configurations provided in the prepareScanner() function. Once it’s ready, it creates an IntentSenderRequest using that intent and launches scannerLauncher to start scanning.
The final step is to trigger the above function on ScanButton click.
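The click wiring isn’t shown in this excerpt. If the sample UI is built with Jetpack Compose, it might look like the following sketch; ScanButton and its styling are assumptions, not the book’s exact code:

```kotlin
import androidx.compose.material3.Button
import androidx.compose.material3.Text
import androidx.compose.runtime.Composable

// Hypothetical Compose button; the real project's composable may differ.
@Composable
fun ScanButton(onScanClick: () -> Unit) {
  Button(onClick = onScanClick) {
    Text("Scan")
  }
}

// In MainActivity's content, the callback would delegate to the launcher:
// ScanButton(onScanClick = { launchDocumentScanner() })
```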
Finally, build and run the app. Tap the Scan icon at the bottom of the screen and your app will become a fully functional Document Scanner!
Scanning Documents
Extracting Text using Text Recognizer
ML Kit made it easy to turn your app into a Document Scanner, but what if you also want to extract text (OCR) from your scanned documents? This part of the chapter will teach you how to do exactly that!
The Text Recognition API can recognize text in various character sets, in real time, on a wide range of devices. Key capabilities of this API include:
Recognizing text in Chinese, Devanagari, Japanese, Korean, and Latin scripts.
To enable your app for Text Recognition, open the app-level build.gradle file and add the following dependency:
var textRecognitionVersion = "16.0.1"
implementation "com.google.mlkit:text-recognition:$textRecognitionVersion"
Then, sync the dependency with your app.
Recognizing Text from Image
Remember saving your scanned pages as images? That’ll come in handy now. The MainViewModel keeps the reference of those image URIs in the pageUris variable. This list is used to display a carousel of your scanned pages in MainActivity.kt, which looks like this:
Image Carousel
The “Extract Text” button is intended to start the Text Recognition process on each page and share the extracted text with other apps on the device.
To implement this, open MainViewModel.kt and update the getTextFromImage() function as follows:
fun getTextFromImage(image: Uri, onCompleted: (String?) -> Unit) {
  viewModelScope.launch(Dispatchers.IO) {
    // Convert the image Uri into an InputImage that ML Kit can process.
    val inputImage = InputImage.fromFilePath(application, image)
    TextRecognition.getClient(TextRecognizerOptions.DEFAULT_OPTIONS)
      .process(inputImage)
      .addOnSuccessListener { visionText ->
        val resultText = visionText.text
        onCompleted(resultText)
      }
      .addOnFailureListener { e ->
        onCompleted(null)
      }
  }
}
This function performs all the heavy lifting of Text Recognition. Here’s how it works:
It takes the image Uri from the input parameters and converts it into an InputImage from the file path.
Next, the TextRecognition client processes the InputImage. It attempts to detect and read the characters in Blocks of Elements.
The recognized text is organized in a hierarchy. A Block is a contiguous set of text lines, such as a paragraph or column. A Line is a contiguous set of words on the same horizontal axis, and an Element is a contiguous set of alphanumeric characters, which is a word in most Latin-based languages. Each of these units exposes its recognized text along with layout details such as its bounding box and corner points.
Text Structure
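To make the hierarchy concrete, here is a hedged sketch of walking a recognized Text object with its public accessors (textBlocks, lines, elements); the dumpStructure name and log tags are illustrative, not from the book:

```kotlin
import android.util.Log
import com.google.mlkit.vision.text.Text

// Sketch: walk the Block -> Line -> Element hierarchy of a recognition result.
fun dumpStructure(visionText: Text) {
  for (block in visionText.textBlocks) {   // a paragraph or column
    Log.d("OCR", "Block: ${block.text} @ ${block.boundingBox}")
    for (line in block.lines) {            // a single line of words
      for (element in line.elements) {     // roughly one word
        Log.d("OCR", "  Element: ${element.text}")
      }
    }
  }
}
```

You could call this from the addOnSuccessListener in getTextFromImage() to inspect how your scanned pages are segmented.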
If text recognition is successful, visionText.text returns the recognized text as a single String, separated by new lines, through the onCompleted() callback. You can use visionText.textBlocks instead, if you prefer a two-dimensional collection of strings to iterate through each line.
If the TextRecognition client fails to detect any text from the given image, it returns null in the onCompleted() callback.
The onCompleted() callback passes the result to MainActivity.kt, where you’ll handle and display the extracted text.
Handling Result
At this point, viewModel.getTextFromImage(uri) will return the extracted text from the image. You might want to use or share this text within your app. To do that, open MainActivity.kt and add the following function:
With this change, shareTextFromImage() will be invoked for each PageItem, using the URI from that page to extract text from the image and share it as needed.
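The body of shareTextFromImage() isn’t shown in this excerpt. A minimal sketch, assuming the getTextFromImage() function from the previous section and a standard ACTION_SEND share sheet, might look like this:

```kotlin
import android.content.Intent
import android.net.Uri

// Hypothetical implementation; the book's version may differ.
private fun shareTextFromImage(uri: Uri) {
  viewModel.getTextFromImage(uri) { extractedText ->
    if (extractedText != null) {
      // Offer the recognized text to any app that accepts plain text.
      val sendIntent = Intent(Intent.ACTION_SEND).apply {
        putExtra(Intent.EXTRA_TEXT, extractedText)
        type = "text/plain"
      }
      startActivity(Intent.createChooser(sendIntent, "Share extracted text"))
    }
  }
}
```

Using Intent.createChooser() lets the user pick the target app, which matches the chapter’s goal of sharing the extracted text outside your app.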
Congratulations! You’ve now turned your app into a full-fledged Document Scanner and OCR tool using ML Kit. Feel free to explore other APIs from ML Kit to elevate your next innovative idea!
On-device AI comes in handy for processing anywhere, but there are some trade-offs you need to consider when using it. In the next section, you’ll learn about those.
The Trade-offs of On-device AI
On-device AI is optimized for scenarios where data processing must be immediate, private, and available without a network connection, but it comes with some strategic trade-offs.
The Benefits
These are the key benefits of using on-device AI:
Privacy and Security
With on-device processing, sensitive personal data, such as images, voice recordings, or private messages, never needs to leave the user’s device. This significantly reduces the risk of data breaches and simplifies compliance with stringent data protection regulations like GDPR.
Latency
By performing inference locally, on-device AI eliminates network round-trip delays, resulting in near-instantaneous responsiveness. This is essential for real-time applications such as augmented reality (AR) filters, live camera analysis, and voice assistants that must respond without perceptible lag.
Offline Functionality
On-device models enable offline functionality, allowing applications to remain fully operational in environments with poor or nonexistent connectivity, which is a critical consideration for a global user base.
Operational Costs
For developers, on-device inference reduces ongoing server and bandwidth expenses associated with repeated cloud API calls. Running tasks locally is also more energy-efficient, consuming up to 90% less energy than cloud-based inference.
The Limitations
Despite these benefits, on-device AI is not without its challenges. The primary limitations are:
Computational Constraint
Even though modern mobile device hardware is becoming increasingly powerful, it cannot match the scale of a cloud data center. This limits the size and complexity of models that can run efficiently on a device.
Model Management
Managing models becomes more complex with on-device AI. While a cloud model can be updated instantly for all users, on-device models must be packaged with the application and distributed through app updates, making the process more time-consuming and logistically challenging.
Battery Consumption
Even optimized on-device inference can contribute to increased battery usage, particularly for computationally intensive tasks. Developers should focus on optimizing background tasks, limiting unnecessary requests, and using power-efficient APIs to minimize battery drain.
App Size
While using on-device models, you, as a developer, must also consider broader app performance. Managing app size is a critical consideration for on-device deployment: large model files can slow installation on poor connections and consume valuable storage space.
Best practices include using Android App Bundles, which dynamically deliver only the necessary code and resources to a user’s device, and leveraging tools like the Android Size Analyzer to identify areas for size reduction.
Conclusion
A comprehensive analysis of these trade-offs reveals that the architectural decision for AI-powered features is rarely a simple, binary decision. The most robust solutions are often hybrid models that combine the strengths of both approaches. A common design pattern involves using on-device AI for basic data preprocessing and low-latency tasks, such as initial object detection in a live camera feed, while reserving more complex, high-volume analysis for cloud-based services. This enables a fluid user experience while leveraging cloud power when necessary.
A closer look at the on-device AI landscape shows a strong emphasis on privacy and security. This makes it not merely a technical alternative but a broader strategic choice. With rising public awareness and increasing regulatory scrutiny, a product’s ability to keep user data private is becoming a key market differentiator.
The design of ML Kit, with its “no training needed” philosophy and simple APIs, represents a deliberate strategy to democratize AI development. By providing high-level solutions, Google is lowering the barrier to entry, allowing developers to integrate sophisticated AI-powered features without the deep expertise or the infrastructure required for training custom models. This shifts the focus from the complexity of model development to the creative application of pre-trained intelligence.
For developers, this means the future of AI on Android is becoming both more powerful and more accessible. High-level abstractions like ML Kit will continue to democratize intelligence, while the underlying services ensure consistency, efficiency, and secure execution across Android’s diverse ecosystem. The new era of mobile computing is not just about smarter applications, but about a fundamentally smarter operating system that provides a reliable and unified platform for the next generation of intelligent, private, and context-aware user experiences.