Have you checked out the Gemini Live API? It’s a total game-changer for building real-time, interactive experiences in Android. Forget managing a whole backend just to stream audio or video to an LLM — Gemini Live makes it effortless.
Imagine building an app where the user can talk to a chatbot and get instant responses, just like a real conversation. That’s what the Live API enables.
What Makes an App ‘Interactive’?
When we talk about an interactive app in this context, especially with the Gemini Live API, we’re talking about an application that doesn’t just listen and reply — it actually acts on the user’s instructions. It goes beyond a simple question-and-answer chatbot.
Think of it this way:
Standard Chatbot App: You say, “What’s the weather like?” The model figures out the answer and replies. That’s a back-and-forth conversation.
Interactive App (with Function Calling): You say, “Please add coffee to my shopping list.”
The model doesn’t just say, “Okay, I’ve added coffee.”
It recognizes that “add to shopping list” is an action this app can perform.
It executes the function call that triggers the addListItem function in the actual Android code.
The app’s internal state (the shopping list) actually changes.
Then, the model gets confirmation and tells you: “Done. I’ve added coffee to your shopping list.”
The key is that the user’s voice prompt is translated directly into app-logic execution. The app is no longer just a passive interface; it’s an agent that can manipulate its own data and features based on a natural language command. It creates a seamless, hands-free experience where the AI is integrated directly into the core functionality of the app — that’s what makes it truly ‘interactive’ in the most powerful sense.
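For instance, the shopping-list action above could be described to the model with a function declaration along these lines. This is a hypothetical sketch using the same Firebase AI Logic types you'll work with later in this chapter; addListItem and its parameter name are made up for illustration.
// Hypothetical declaration the model could use for the shopping-list example.
val addListItemDeclaration = FunctionDeclaration(
    name = "addListItem",
    description = "Adds an item to the user's shopping list",
    parameters = mapOf(
        "item" to Schema.string(
            description = "The item to add, for example coffee"
        )
    )
)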
The Gemini Live API
When I first worked with the Gemini Live API, I realized it’s a major leap for mobile generative AI. Instead of the old request–response model, it now supports real-time, two-way streaming. That means the client and model can send and receive data simultaneously — creating a live conversation rather than a sequence of turns.
It provides an optimized, low-latency stream for both the audio you send to the model (your speech or requests) and the audio/text the model sends back (its response). Since you've used Firebase AI Logic in earlier chapters, you can invoke this directly from your Android app, with no need for a custom server. It's essentially a bidirectional, real-time audio channel connecting straight to a Gemini model. It also supports:
Voice Activity Detection (VAD) to detect pauses in speech
Multimodal streaming in both directions
Note: Firebase AI Logic for the Gemini Live API is currently in preview, meaning that non-backward-compatible changes may occur in future releases.
Hands On Gemini Live
Let’s extend the Firebase AI Logic app from the previous chapter with Gemini Live bidirectional streaming.
Download the starter project, and open it in Android Studio.
Project Setup and Dependencies
First things first, ensure you’re targeting Android API level 23+ and the app is connected to Firebase.
Open the app-level build.gradle file, and add the Firebase AI Logic dependency at the end of the dependencies block.
// Firebase AI Logic: Gemini Live Dependency
var firebaseAiLogicVersion = "17.6.0"
implementation "com.google.firebase:firebase-ai:$firebaseAiLogicVersion"
Since you'll be interacting with Gemini Live through audio, add this permission in AndroidManifest.xml:
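<uses-permission android:name="android.permission.RECORD_AUDIO" />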
Build and run the app, and select an item from the list. On the item-detail screen, you'll see the status at the bottom stating:
"UNKNOWN: Gemini Live Not Initialized"
Before Initialization
Model Initialization and Configuration
The first step in using Gemini Live is initializing the backend service and creating a LiveGenerativeModel instance. The Live API configuration is handled through the liveGenerationConfig object, which determines the model’s behavior and the nature of the streaming output.
To handle initialization cleanly, the best practice is to create a utility/manager class. The starter project already has this for your convenience.
// The core Gemini Live model instance.
private lateinit var liveModel: LiveGenerativeModel
// Mutable state flow holding the current state of the live session.
private val _liveSessionState = MutableStateFlow<LiveSessionState>(LiveSessionState.Unknown())
val liveSessionState = _liveSessionState.asStateFlow()
Now, add the initializeGeminiLive() function to the manager.
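A minimal sketch of what it can look like, assuming the Google AI backend from the previous chapter and the Firebase AI Logic liveGenerationConfig builder; your starter project's version may differ slightly:
fun initializeGeminiLive() {
    try {
        // Create the Live model with audio output.
        liveModel = Firebase.ai(backend = GenerativeBackend.googleAI()).liveModel(
            modelName = "gemini-live-2.5-flash-preview",
            generationConfig = liveGenerationConfig {
                responseModality = ResponseModality.AUDIO
            }
        )
        // Let the UI know the model is ready to start a session.
        _liveSessionState.value = LiveSessionState.Ready()
    } catch (e: Exception) {
        _liveSessionState.value = LiveSessionState.Error(message = e.localizedMessage ?: "Unknown error")
    }
}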
Model Selection: The modelName parameter must explicitly reference a Live API compatible model, such as gemini-live-2.5-flash-preview.
Here's the list of models that support the Live API.
Output Modality: The responseModality parameter must be set to ResponseModality.AUDIO to generate and stream Text-to-Speech (TTS) output back to the client.
The LiveSessionState is a sealed interface containing data classes that represent the current state of the Gemini Live session. It's defined in the same package, as shown below:
sealed interface LiveSessionState {
    data class Unknown(val message: String = "UNKNOWN: Gemini Live Not Initialized") : LiveSessionState
    data class Ready(val message: String = "READY: Ask Gemini Live") : LiveSessionState
    data class Running(val message: String = "RUNNING: Gemini Live Speaking...") : LiveSessionState
    data class Error(val message: String = "ERROR: Failed to initiate Gemini Live") : LiveSessionState
}
This is used to display updates of the session on the screen.
The UI interactions are handled by the MainViewModel class, which holds a LiveModelManager instance, as shown below:
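A minimal sketch of that wiring, assuming the ViewModel simply creates the manager and re-exposes its state; the constructor arguments here are hypothetical, and the starter project may pass them differently.
class MainViewModel(application: Application) : AndroidViewModel(application) {
    // Hypothetical wiring: the manager receives a context and a coroutine scope.
    val liveModelManager = LiveModelManager(
        context = application,
        coroutineScope = viewModelScope
    )
    // Re-expose the session state so the UI can observe it.
    val liveSessionState = liveModelManager.liveSessionState
}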
Build and run again, and you'll see a permission dialog for recording audio:
Audio Permission
Approve the permission, and then select an item from the list to go to the next screen. You'll see the Gemini Live button is now displaying the READY state!
Model Initialized
Real-Time Connection: Starting The Live Session
At this point, the app can connect to Gemini and start the live session. You need to use LiveModelManager for that.
Open the LiveModelManager class and declare a LiveSession instance:
private var session: LiveSession? = null
The LiveSession object is the core abstraction for continuous interaction. It represents the persistent connection established with the model and manages all input/output streaming.
The Live API is bidirectional, and you'll use it for:
Sending voice commands.
Generating speech from text.
Transcribing audio to text.
Eventually, you will also be able to send images and a live video stream to the model. In the sample app, you'll implement text -> speech in this flow:
List of Cat Breeds > Select a breed > Gemini Live Describes it
To do so, add the startSessionFromText() function in the LiveModelManager:
@RequiresPermission(Manifest.permission.RECORD_AUDIO)
fun startSessionFromText(catBreed: String) {
    val text = "Tell me about $catBreed cats in maximum 80 words."
    coroutineScope.launch(Dispatchers.IO) {
        try {
            // Start the conversation
            session = liveModel.connect()
            session?.send(text)
            session?.startAudioConversation()
            // Update State
            _liveSessionState.value = LiveSessionState.Running()
        } catch (e: Exception) {
            _liveSessionState.value = LiveSessionState.Error(message = e.localizedMessage ?: "Unknown error")
        }
    }
}
This function accepts a single string argument, catBreed, that you have selected, and then operates as described below:
val text = "Tell me about $catBreed cats.": It constructs a text prompt using the input catBreed. For example, if "Persian" is passed to the function, this text will be "Tell me about Persian cats in maximum 80 words."
session = liveModel.connect(): It establishes a persistent WebSocket connection to the Gemini model to start a new session. This LiveSession object allows for real-time, low-latency streaming of input and output.
session?.startAudioConversation(): This is a key step that begins the audio part of the conversation. Following connection establishment, the live audio interaction is initialized by calling it. This signals to the model that the client is ready to begin streaming microphone data.
_liveSessionState.value = LiveSessionState.Running(): This updates the live session's state to "Running" as soon as the session successfully starts. This value is exposed as a StateFlow (backed by MutableStateFlow) so that the UI layer can observe it and react in real time.
You learned how to start a session, but you also need to know how to stop one. The session should be explicitly closed when the microphone is deactivated or when the user navigates away from the screen. Likewise, when you start a new session, the right approach is to stop any ongoing session first.
Stopping a session is simple. You can do that by adding this function to LiveModelManager:
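A minimal sketch of that function, assuming the stopAudioConversation() and close() calls from the Firebase AI Live API; the starter project's version may differ.
fun stopSession() {
    coroutineScope.launch(Dispatchers.IO) {
        // Stop streaming audio and release the underlying connection.
        session?.stopAudioConversation()
        session?.close()
        session = null
        // Reset the UI state so the user knows no session is active.
        _liveSessionState.value = LiveSessionState.Ready()
    }
}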
This is simply resetting the UI state with _liveSessionState.value = LiveSessionState.Ready(). This indicates to the user that there's no ongoing session and you're ready to start a new one.
This will be handy when you tap on the Mic button:
If the state is Ready, it starts a new live session.
If the state is Running, it stops the currently active session.
In other states (like Error or Unknown), it logs the current state.
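In the sample UI, that tap handler can be as simple as a when expression over the collected state. This is a sketch; onMicClick is a hypothetical name, and the starter project's handler may look slightly different.
@RequiresPermission(Manifest.permission.RECORD_AUDIO)
fun onMicClick(catBreed: String) {
    when (liveSessionState.value) {
        is LiveSessionState.Ready -> liveModelManager.startSessionFromText(catBreed)
        is LiveSessionState.Running -> liveModelManager.stopSession()
        else -> Log.d("MainViewModel", "Current state: ${liveSessionState.value}")
    }
}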
Ready to try this out? Build and run the app now. Navigate to the detail screen by selecting a cat breed and then tap the Mic button at the bottom.
You'll see the state change to RUNNING, and Gemini Live will start talking about the cat breed you selected!
Model Running
Function Calling: Making Gemini Your App’s Agent
Now you know how to turn your app into a voice assistant using the Gemini Live API. The next big step is Function Calling - the superpower that lets the model actually interact with the logic and functionality of an Android app. It’s what makes the voice assistant an agent for your app.
Function calling allows the model to determine when an action is necessary to fulfill a user request. The implementation follows a standard multi-step process.
Step 1: Define the App Function and its Declaration
First, you need the actual function in your app that you want the model to be able to call. In the sample app, you may want the user to ask for pictures of a specific cat breed - which means opening a Google Image search.
fun showPicture(catBreed: String) {
    coroutineScope.launch(Dispatchers.Default) {
        val query = Uri.encode("$catBreed cat pictures")
        val url = "https://www.google.com/search?q=$query&tbm=isch"
        val intent = Intent(Intent.ACTION_VIEW)
        intent.data = Uri.parse(url)
        intent.addFlags(Intent.FLAG_ACTIVITY_NEW_TASK)
        try {
            context.startActivity(intent)
        } catch (e: Exception) {
            Log.e(TAG, "Error opening Google Images", e)
        }
    }
}
Next, create a FunctionDeclaration to describe this function to the Gemini model. This is like creating an API reference for the model. It needs a name, a plain-English description (which is crucial for the model to understand when to use it), and the required parameters.
// The FunctionDeclaration for the model
val showPictureFunctionDeclaration = FunctionDeclaration(
    name = "showPicture",
    description = "Function to show picture of cat breed",
    parameters = mapOf(
        "catBreed" to Schema.string(
            description = "A short string describing the cat breed to show picture"
        )
    )
)
Step 2: Pass the Tool to the LiveModel
The Gemini model needs to know what tools (functions) it has available before the conversation even starts. You need to package the FunctionDeclaration into a Tool object and pass it to the liveModel initialization.
So, define functionHandlerTool below showPictureFunctionDeclaration as follows:
// Packaging the declaration into a Tool
val functionHandlerTool = Tool.functionDeclarations(listOf(showPictureFunctionDeclaration))
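The tool then needs to go into the model creation itself, something along these lines (a sketch assuming the liveModel builder accepts a tools list, as in the Firebase AI Logic documentation; the rest of the earlier initialization stays the same):
liveModel = Firebase.ai(backend = GenerativeBackend.googleAI()).liveModel(
    modelName = "gemini-live-2.5-flash-preview",
    generationConfig = liveGenerationConfig {
        responseModality = ResponseModality.AUDIO
    },
    // Make the showPicture tool available to the model for the whole session.
    tools = listOf(functionHandlerTool)
)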
Now, the model knows that if a user asks something like "Can you show me pictures of a Persian cat?", it has a tool named showPicture that can handle that request.
Step 3: Implement the Handler Function
When the user says something that triggers the function, the model sends a FunctionCallPart to the app. You need a special function, a handler, to intercept this call, execute the app logic, and send the result back to the model.
Connect the model and logic: This is where showPicture() is executed, opening the browser.
Confirm the response back: You return a FunctionResponsePart confirming the operation. The model uses that confirmation to generate its spoken reply (e.g., "I'm on it! Showing pictures of Persian cats now").
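A sketch of such a handler, assuming function arguments arrive as JSON elements (exact argument types can vary between firebase-ai versions):
fun functionCallHandler(functionCall: FunctionCallPart): FunctionResponsePart {
    return when (functionCall.name) {
        "showPicture" -> {
            // Extract the argument the model decided to pass.
            val catBreed = functionCall.args["catBreed"]?.jsonPrimitive?.content ?: "cat"
            // Execute the actual app logic.
            showPicture(catBreed)
            // Report success so the model can confirm verbally.
            FunctionResponsePart(functionCall.name, buildJsonObject { put("success", true) })
        }
        else -> FunctionResponsePart(
            functionCall.name,
            buildJsonObject { put("error", "Unknown function: ${functionCall.name}") }
        )
    }
}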
Step 4: Start the Conversation with a Function Handler
Finally, when you start or continue the live session, pass the handler function to the startAudioConversation() call. This tells the LiveSession which function to invoke when the model decides to use a tool.
Update the startSessionFromText() function as follows:
// Start the conversation
session = liveModel.connect()
session?.send(text)
session?.startAudioConversation(::functionCallHandler) // Pass the handler here!
Then the magic happens: Gemini Live handles all the real-time voice streaming and the function-calling handshake, resulting in the image search results right on your phone.
Function Calling
Conclusion
To wrap this up, what you’ve done with the Gemini Live API and Function Calling isn’t just an evolutionary step; it’s a massive leap forward in how we build mobile AI experiences.
The chapter started with the core idea: using the Gemini Live API to achieve low-latency, real-time voice streaming without needing a complex backend. That alone takes us beyond the clunky "wait-and-reply" experience of older chatbots. Talking to a "live" model that can carry a real conversation makes real-time interaction possible!
But Function Calling is where the true power of an interactive app shines. You've turned Gemini into a genuine agent for your app just by following a few steps:
Defining a function.
Declaring it as a tool to be used with a model.
Implementing a simple function call handler.
Now, when a user says, "Show me pictures of a Maine Coon cat," it's not just a conversation; it's a powerful command that triggers native Android code, opening the search Intent directly!
This ability to blend natural language with your app's internal logic is what truly unlocks next-generation, hands-free voice control on Android. It's time to stop building apps that just talk - and start building apps that act.