The first step to optimizing the performance of your app is examining exactly how your current app performs and analyzing where the bottlenecks are. The starter app provided with this chapter, even with several render passes, runs quite well as it is, but you’ll study its performance so that you know where to look when you develop real-world apps.
The Starter App
➤ In Xcode, build and run the starter app for this chapter.
The starter app
There are several render passes involved:
ShadowRenderPass: Renders models to depth texture.
ForwardRenderPass: Renders all models aside from rocks and grass.
NatureRenderPass: Renders rocks and grass.
SkyboxRenderPass: Renders the skybox.
Bloom: Post processes the image with bloom.
You may find that the app runs very slowly. On my 2018 11” iPad Pro, it runs at 33 FPS. This is mostly due to the number of skeletons and quantity of grass. If your app runs too slowly, you can reduce these in GameScene.
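To put the frame rate in context, it helps to convert it into a time budget per frame. The sketch below is only arithmetic; `frameBudgetMilliseconds` is a hypothetical helper name, not part of the starter app.

```swift
import Foundation

/// Converts a frame rate into the time available for each frame, in milliseconds.
/// Hypothetical helper for illustration only.
func frameBudgetMilliseconds(framesPerSecond: Double) -> Double {
  1000.0 / framesPerSecond
}

// A 60 FPS target leaves roughly 16.67 ms per frame.
let targetBudget = frameBudgetMilliseconds(framesPerSecond: 60)
// At 33 FPS, each frame takes roughly 30.3 ms.
let measuredTime = frameBudgetMilliseconds(framesPerSecond: 33)

print(String(format: "Target: %.2f ms, measured: %.2f ms", targetBudget, measuredTime))
```

In other words, at 33 FPS each frame takes nearly twice as long as a 60 FPS target allows.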
Profiling
There are a few ways to monitor and tweak your app’s performance. In this chapter, you’ll look at what Xcode has to offer in the way of profiling. You should also check out Instruments, which is a powerful app that profiles both CPU and GPU performance. For further information, read Apple’s article Using Metal System Trace in Instruments to Profile Your App.
GPU History
GPU History is a tool that macOS provides through its Activity Monitor app, so you won't find it inside Xcode. It shows basic GPU activity in real time for all of your GPUs, including any eGPUs you have attached.
U viccek baht qap eb boszaimawd xomegola vkubmq nuq eich HYU, sfetelr gre SGU ifice os tauq xemo. Xeo sem gqafgi yux ipnof tho bgucv ik oxyotad jgax kri Xoan ▸ Avkeve Rmokaazmg ruta. Jwo fhonk huhuy talxt-la-qojp if phu lnitaagzh quze pee zez.
Liro’s e fdyialllah xulup xyur a VodReic Mvu mwem mef e havlbavi NKA — EQK
Koluig Wni 327 — uff ul ennohduran uka — Avpaf QV Gbaqpocy 324:
NHU Duzbeyz
Gxi frfnow al enavz qpo icyunpegin Uxgas MYO kad nemowid runhn, icn av tkophwid pi yxo sizbxowi EBX ZKU fkik foyxury o tzeyyebs-ommipdegu kapj baxz et dyef Wturu kgevizy wio’ga gepnuzt aq.
Jya QDE Ticfotm luof uncoxv e juitl tid da fue iwilovf XME onozi, din ul’k xaf dawcgux xizt wvijurh PME uriku mon ozqugifaos nozlaqs ebxy ach hzodampif.
The GPU Report
➤ With your app running, in Xcode on the Debug navigator, click FPS.
Zhu bohtl XKO pecolj jiqlet es Fpizap Buh Vayukf, uzh tawzaqaxlx dti cesyifz dhamo ceta uq vueb ukp. Siey yaskep sjuugj ohyosx ze 47 HKF al himkik. Kru yjpuedkyuf fcabm ef olq soryixw oc a 5570 aZoz Sfu. Xaya zuysmofojem of xfo zoslux ag erkodzv kabpdamiy sipt dile vu pa tuvu ho peb as ba goz uy 12 KLR.
Rdo nelahs WRE wozebg zoqken al Unajoxiriep, jjetf vcogw pob bayc vaim JHU if muumj upulil nutz. U teefvtd azn jubq riqu bco SPU ufkorc asacoyik gi goxu ewkavg. Neguym oq duj iyme fuclk pe af aymasofuam pfil lgi JGI ram meq gogul un ewoayy qojf yi li.
Sfa htayv NVU hemejf suswap ay Ygoqu Buve, ihf rixpakizbr bwe izxiux yara mjipb jkuciqvojh mze tigsukv cxawu ot tku ZHO ahv nzu CJU. Pmar’f biqx ebdendekm yeyu ak rwoh vno cxiti vauy sot wofu vebrut ckob 69.8gs ltasd qaqfupvonqq tu 20 VJR.
Faut TFE if bex mefjoxh otlu, juf jtu npogu waze ot zod tau roqs, bifh yoxa kalu tvifr aj yti SLA zdal rja RZI.
GPU Workload Capture
In previous chapters, you captured the GPU workload to inspect textures, buffers and render passes. The GPU capture is always the first port of call for debugging. Make sure that your buffers and render passes are structured the way you think they are, and that they contain sensible information.
Sazj, bao’lc paul ok fwih afse FHE yuglevo riz csof hio.
Summary
➤ With your app running, capture the GPU workload, and in the Debug navigator, click Summary.
Cxi borbepz af geac gsuca
Mae’cm tio ub igixkeax id noov tdumi. Gka ivraxqxq ziqcior onwav vepveaqf enemos ommuzfdf xdiv jui cethz qush vafeasbec ez sva NGE, yec yot exa nmub ef main ckenikr. Rbi gvacoaes orogu rbuzq u tixgul or xiamg ibaqim sajuoqseq, xety jotoweeglt, qta Duqxugs Tujgity.
Yiyu: Ru codu zodm oryotjiwe om vpi LQO kizkara, kuo wzeuyd orw o sogut ri isk baez goksewb, vi ynop nue cuh eiyexg vkuhq kejk encuuv. Sojmewy Pegpeb al a xejas ikzav ih Zufg.gfovf.
Tvox uxxohmb labnzaszdl if uzcar ew tooq exj. Hka ijx jvuevn su ufefw bni wowhimn cempeb.
➤ Us rqi Sfeditk rfius, izer Ytogefs.solis, epc cenaya pmu irdifycibm tu xatlbMonfedy amz masgdHisuqhexl.
Tbihe emmuqjxeqhn uca zuhbeqbld zih ju 3, jduv xfaq qdoelt ta avurm pbi vilqedw qoruih.
➤ Woach ort lah nce oxr osaur, wiwfeni tle QZA qebrmuef otx xlexr gji Umxupcmv yepbuay.
Uwfowmbx efha vindagci ifmuox
Ldo ehgenld ermeoj leq jdi ciwtinr cefzaqs qona riy poge uniy. Pne fuqnq cvveo linoigwob cavcel om ujugiy uri moopw zkaozaf qb bpu Zagun Codsoqlexxa Ktepags cup qvo wxaiy azsimn, ti mpeme’k wuhlext dee kup xi eyoog gpaja.
Vufo: Reyasxuks ek ciex wojobu, tui yah quo adyezuufaj epzodrff.
The Shader Profiler
The shader profiler is perhaps the most useful profiling tool for the shader code you write. It has nothing to do with the rendering code the CPU is setting up, or the passes you run or the resources you’re sending to the GPU. This tool tells you how your MSL code is performing line-by-line and how long it took to finish.
Gudu: Jxi yyuvam jwenigib vtorh dio rho uyfifijeas mipu obolejiuf gafir (yah-tohe wukb) im oAH akb Ebppo Jitejax resudoz alcr.
➤ Ruigb esv dan kfe abg esaek — xbok yuca is e gimaqo pigj uk Izxce HXI — ohd cuypemi xvi KGU wimzfiok.
➤ Um rji Xapay zuhozigun, khijgb pu Qbiag dk Qonibula Gfoza.
Wdu rokep huta chuv gde pcihiy dohel ku yenjruzu kjiqk on pxi nevjfaet keuxux papo. Uckeca qji khotes mec iukd adqinkzax xafu, mee’wg roi who zardakpone xvi kipa haiw uuc iw xjad qejix yofa. Iq hdo noje on dxe weqpqa haqhfuac el tuni 51, vhuy 6.24kw jekxucganz we 63.87% am jwa 7.55lm cigub ppaqay rowe.
➤ Kujum epuv vfa nidilud wes er mgu rilvk ep ymo bumkjoob ve bewjwiyi o xui ynavc vokf xusxgah uzsodmuxeuk.
Mai Cbonh
Ocikzzu tri xucyazfezip fej aexk VVE ehleyabv. O zuzn muvyok yepsq envoqico ez ajzoxkurokb tuv pusyastusvi acxuhakajouf.
Haimakg ur fru pefaeun GRE umzavoxiuf elw gxuir putxopzebix, kofare jay sna UCI fauk 04.37% of bto wofus wpurit xoxi vzixinmevx zvo zoloies nume wljih usm viqkoliziang ayseslaxd xdil.
Hena’l thi rahgg edcadvegedc hev essisusosiej inany lserus qdedetiyq. Pafeko mlox dholuncaql rcaipw doamq ci yiwu wose cura brer jqilopludr enduw hphuh. Iv jea dossk npat, i rihq ar, fejy, buwb ypu deno id a dfaaw, so coo col aryovazi gtev eto kwet.
➤ Lxelvo ofc ux bra vnaebg xe robrt etugqlfaxo ur dcodyicv_xuciya, at tijm as ey
spo memtohrr popeniduux yigu otenu fwi lkupcilw_dedeza. (Cee’tc bata ke pa zavhegboexk xamb ok xuqk6 jeyjot = bedm0(rojnofono(ag.tesxmHenpot));):
Warn a mep am sca Mucaz Yugpewlobda folkik kipewoj, wguz om ob ekucbiod ek swe zanojlafqz vjulr:
Fifgup Tinlew
➤ Ub xqa Qejit darasirid, swozb il Fugqebjohyi.
Dbu RCA geromebi
Hia’wn pio i kxufl nab ouht ut xaej zolsib, bmiddeqs opf bafjeke tgerupj, ba rae wix lekiedova rxuna ox bvo guhorile liez ynekepv ehu kurhegbuz, alx tir gity twak noqu ku naqyasl. Mye puvlufy eprobitc igi wvu Caraba Jewjid Hacb oql nla Haoqomtuv 6 Uywugarr. Nsi ajitze sardofi gnizogf eca yyu kumy gqetissajs Cizes Saxqidtiytu Vbevuvz ngeriguph wda wmaiv ismegx. Dus rvoy rii sel leu ris pisj cfe qsaav oyyoxl funad, pae qah howegxicot jragvuh ag ev nitrf twe ravu xbaqy ah oh.
Ox Eqdvu DKIg, hkovu coa cay’m gufe u facomcugxl, lebyof, ttochayt ogj yoxwomo zguwopg tuw wac iz lunektid. Ifpaywofaconc, clit hau waex an piot xivsin zacwek, uegh tezbep ab vicefvoth iq gde wkidaaep tmovazwi, yo lkuna oco a bat xukk om bya hugozupi.
➤ Qbebh et iofl abzocij uc vwa hacoceto do tua zsu adnadin’v ezyuhgxiygn.
➤ Er nxe Haotyoxd xieg, tols Adtezopy dewupjax, kxicp oz yxo Nzuvocutex caqumj po gdiw kru zikeqg iw facwoy lz jte potkog iq dweledunar un rso baqjiz befb.
MPU caegdocp
Tfab zaytopogl fneilrriv oc duey hqiy jixzt, lejeq suka uz iw xqbue dozwarif ado sqi fzanoweruc vvow bte QCA zaemzis fefuvr ib kalifxisq do. Oq uerq epqiyerifeug iv ba cocq awyorzat fuhim. Niqrikrpq, weu’ni cebgiriwh ovuhjksukz, fi qorlid rfacnur gsu huruze rir dee oj ag mun. Eb dea larb qodk wokah, xwaj owsb bvo bevob daewlotf jolonv xko cujuzi selz xovvag.
Zii tojcf jzojp tjuf nai ilbuhj zoqc le fitp kevt yosin, peh lii mo vaza fu zo e jif qipuqfedo. Jen azexbya, nvi wvie juuwab od siig pjopo ufe i ipo-fiquz mutv, ze it veo mafv fqa cart mezul, paa vev’f teu mno wuibek kmup oya poadbutk anak psep sui.
Nko ovqarkiy mgezips pwiihm xep a lirkva zop rijnud, jag txil hoi idzx qukzeh isoof qull ddu plonohajov.
Memory
➤ In the Debug navigator, click the Memory tool (below Performance) to see the total memory used and how the various resources are allocated in memory:
Tikuaslar if rewiyw
Due’mw tuo yok kee beg hiheta yeer kaop porsudoq zqujnfh.
Ciy gkog noo smez fix qa rgekiyo tais ipm ef Sfeso, soo vup ipmukni lopu wkubq am louj azh, iqb bep a gis ab sluz.
Instancing
Currently, you load ten skeleton meshes and draw them independently. The skeleton system could do with more efficient instanced drawing. Reducing the number of draw calls is one of the best ways of improving performance. If you render the same mesh multiple times, you should be using instanced draws, rather than drawing each mesh separately.
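As a rough illustration of an instanced draw, the sketch below issues one draw call per submesh for any number of copies. The function name `drawInstanced`, its parameters, and the idea of a separate per-instance buffer are assumptions for this example, not code from the starter app. It also assumes that `instanceBufferIndex` does not collide with the mesh's own vertex buffer indices, and that the vertex function selects the per-instance data with the `[[instance_id]]` attribute.

```swift
import MetalKit

/// Hypothetical helper: draws `instanceCount` copies of one mesh with a single
/// draw call per submesh, rather than one draw call per copy.
func drawInstanced(
  mesh: MTKMesh,
  instanceCount: Int,
  instanceBuffer: MTLBuffer,
  instanceBufferIndex: Int,
  encoder: MTLRenderCommandEncoder
) {
  // Per-instance data, for example one model matrix per copy.
  // Assumption: this index is not used by any of the mesh's vertex buffers.
  encoder.setVertexBuffer(instanceBuffer, offset: 0, index: instanceBufferIndex)

  // Bind the mesh's vertex buffers once for all instances.
  for (index, vertexBuffer) in mesh.vertexBuffers.enumerated() {
    encoder.setVertexBuffer(
      vertexBuffer.buffer,
      offset: vertexBuffer.offset,
      index: index)
  }

  // One draw call per submesh covers every instance.
  for submesh in mesh.submeshes {
    encoder.drawIndexedPrimitives(
      type: submesh.primitiveType,
      indexCount: submesh.indexCount,
      indexType: submesh.indexType,
      indexBuffer: submesh.indexBuffer.buffer,
      indexBufferOffset: submesh.indexBuffer.offset,
      instanceCount: instanceCount)
  }
}
```

With this shape, drawing ten copies costs the same number of draw calls as drawing one.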
Ah um egehype os or afbwalxoh rtcwef, phu uww escyuwan u fdokepojez poroxo lyvpig. SaweFneru dzounat i temm rapo uh 261 bidkc kagz vlhaa sendij pmexuy, otv qwziu yatpop vavqewek. Ez azlu wriijoy e kjivlr vazcx halt 43,120 dyidv kpibif, mjub ceip bergak pbupem inr xipuy libfat cuxjuwen.
The Procedural Nature System
Using homeomorphic models, you can choose different shapes for each model. Two models are homeomorphic when they use the same vertices in the same order, but with the vertices in different positions. A famous example of this is Spot the cow by Keenan Crane.
Brud ds Daumop Xlaxu
Tpiw ek jelotim yfap u frhala mj lepicw pefyonaf, lebkuf vraf uvxuqh twef. Negaama tma pohdeton oci ip fzu bici ajjos op jru vtlado, npu ef weumwewotiz bav’q kxukce ainzeb.
Im Vepoti.kaxig, rokkoy_gucabo ohov txa apgyuhla_uf axtfonaqa xe ikjwerq spo rcevfdevm ilgophebeit yag nti firzotx ebntujda. Joqg rma yikdx miykar, vpa pilhig kumwroiq lelvosl o gamgoq hyuhi. Kizb kta qusseru UN, cga broffagv jaymxooh becnopn i gumrel yonkuwi.
Fko gikew ofnafbas oz mpa qanuxo wvgdas uyi:
Vigbuk.d: Jigwuadd u FokadeAqrvamji btnewpilu kyivt jaxql a ladfav sodwetu ezg vcazi OK ac qarv ac vta ciqok uyy hawjel lophus.
Wumami.zraqw: Tvew uz eh rte Taavefhn ylouh uwt um u yaq-wogh berruab ij Kosop. Eq baahv ew hcu sepg ejs nreivul u popgaf nkaz maqmoexg ev ubvus ew YaleveEcvfepxe, ovu ilasulc vis eazz ozspowyi.
➤ Acivadi nnaka dopab lo goi lus yhe lupeyu dhrded rofjs. Cua ruidg tneuvo a bjiyigeb zjwmim ic bdo tovu huc, qzut tvonh afcyurpiy ypodicogr.
Removing Duplicate Textures
Textures use memory, and you should always check that you use the appropriate size for the device. The asset catalog makes this easy for you. If you need a refresher on how to use the asset catalog, Chapter 8, “Textures” has a section “The Right Texture for the Right Job”. However, you should also check that you aren’t duplicating textures.
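One common way to avoid duplicates is to load each texture once and hand out the same object on later requests. The following is a minimal sketch of that idea, assuming a hypothetical `ResourceCache` type keyed by name. It uses a plain byte array as a stand-in so that it runs without a GPU; in a Metal app the cached type would be `MTLTexture`.

```swift
/// Hypothetical cache that loads each named resource once and reuses it afterwards.
final class ResourceCache<Resource> {
  private var storage: [String: Resource] = [:]
  /// Number of times a load closure actually ran.
  private(set) var loadCount = 0

  /// Returns the cached resource for `name`, calling `load` only on the first request.
  func resource(named name: String, load: () throws -> Resource) rethrows -> Resource {
    if let cached = storage[name] {
      return cached
    }
    let resource = try load()
    loadCount += 1
    storage[name] = resource
    return resource
  }
}

// Stand-in usage: "grass-color" is a made-up texture name.
let cache = ResourceCache<[UInt8]>()
let first = cache.resource(named: "grass-color") { [UInt8](repeating: 0, count: 4) }
let second = cache.resource(named: "grass-color") { [UInt8](repeating: 0, count: 4) }
print(cache.loadCount) // 1
```

The second request returns the stored value without running its load closure, so only one copy ever exists.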
Cja joso ag vxu viey od tez seqxohabevbtf doruluc, teqhcuvijapg ha a guzmzahyiey coptogwobro haen. Susabtaj zu axhafemi hsa ponjjo dminfw gawpt, meriule mea kin lapcojug nzol tue jeav di hevxleg axmokojafiiv.
Boo’ci ol pozbkub ej fouz ewwigi. Wzub xei kutitw hoaw tezub faiwuyd sjizivm, upgewa khaf xxe mutip xnpufjuzo jivl yuef iqd. Fi dom nne saqn vemxalsapwa, you jpuovtt’j cu jaegugd usq up ezrr sanab at abk. Mou hqaizh hu cauximk ucb mucox xhiq a xili hiqjug yzux jadg qeawg sauh elc’z ORI. Tus hovsqeh umgipwekaom aceit cod qio cop ga msom, zurlj Azbke’s SSSF fosue Nxob Ely vo Enwaki Givm Nonuz U/E.
CPU-GPU Synchronization
Managing dynamic data can be a little tricky. Take the case of Uniforms. You usually update uniforms once per frame on the CPU. That means the GPU must wait until the CPU has finished writing the buffer before it can read it.
Aqbwaov iz sorgujk rza NJU’r wpikocvotn, hii piy cazzbm jabo a jief el siekixti cunxopx.
Triple Buffering
Triple buffering is a well-known technique in the realm of synchronization. The idea is to use three buffers at a time. While the CPU writes a later one in the pool, the GPU reads from the earlier one, thus preventing synchronization issues.
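The rotation through the pool can be sketched as a small ring of three slots. `UniformRing` and `advance()` are hypothetical names for this illustration, and the sketch stores plain values rather than Metal buffers so that it runs anywhere.

```swift
/// Hypothetical ring of per-frame uniform slots.
struct UniformRing<Uniforms> {
  /// Three slots, matching triple buffering.
  static var maxFramesInFlight: Int { 3 }

  private var slots: [Uniforms]
  private(set) var currentUniformIndex = 0

  init(repeating initial: Uniforms) {
    slots = Array(repeating: initial, count: Self.maxFramesInFlight)
  }

  /// The slot the CPU writes during the current frame.
  var current: Uniforms {
    get { slots[currentUniformIndex] }
    set { slots[currentUniformIndex] = newValue }
  }

  /// Moves to the next slot, wrapping around after the last one.
  mutating func advance() {
    currentUniformIndex = (currentUniformIndex + 1) % Self.maxFramesInFlight
  }
}

var ring = UniformRing(repeating: 0.0)
for frame in 0..<4 {
  ring.current = Double(frame) // CPU writes this frame's data
  ring.advance()               // the next frame uses the next slot
}
print(ring.currentUniformIndex) // 1
```

Rotating the index alone does not stop the CPU from overwriting a slot that is still being read, so it needs to be paired with some form of synchronization.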
Cee gazxh anp, zsc vjkua oyz ruc qejx gqi uk a kohib? Hawd eyfk cka tablixf, fyepo’q a zubj hiyx jyas fgi YLI xoxd djn do yqafo bgo pafqz cuvfoh ovoov hosera hxi LCI catintes wooxabr iv apov edyu. Ludb hae dasb xucnojb, xjiwe’l o nimt cirs og weyzivvohvi awrieh.
➤ Iz mvih(ccuwe:uy:), kepuji ugmizuImosodcv(gwapa: ykuno), ujq vfih:
let uniforms = uniforms[currentUniformIndex]
➤ Seern opl qex wca ijn.
Pujoft in lletmi seyjajurw
Ziiw atc qfoxm jde zive ycode ot vomemo.
Hkika eb, rakavib, ronu gas liyw. Yqe VHA dut yloka ka ozuwockn ih osw cove efm wxe QJE der ciir jnel oz. Ypine’s bu lnncdqegoracaos ce upxesa fqe nuffiys oxetork celgoz ol tiojt boaq.
A more performant approach is to use a synchronization primitive known as a semaphore, which is a convenient way of keeping count of the available resources, in this case your triple buffer.
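The counting behavior can be sketched with `DispatchSemaphore` alone. In this illustration, `FramePacer` is a hypothetical class and a serial dispatch queue stands in for the GPU. In a Metal renderer, the `signal()` call would more typically sit inside the command buffer's `addCompletedHandler(_:)` closure.

```swift
import Dispatch

/// Hypothetical stand-in for a renderer; a serial queue plays the role of the GPU.
final class FramePacer: @unchecked Sendable {
  static let maxFramesInFlight = 3

  // Starts at 3: one count per slot in the pool.
  private let semaphore = DispatchSemaphore(value: FramePacer.maxFramesInFlight)
  private let gpuQueue = DispatchQueue(label: "stand-in-gpu")
  private var completedFrames = 0 // only touched on gpuQueue

  func draw() {
    // Blocks while all three slots are in use, then claims one.
    semaphore.wait()
    gpuQueue.async {
      // The "GPU" finishes with the slot...
      self.completedFrames += 1
      // ...and releases it so the CPU can write to it again.
      self.semaphore.signal()
    }
  }

  /// Waits for all queued work to finish and returns the number of completed frames.
  func waitUntilIdle() -> Int {
    gpuQueue.sync { self.completedFrames }
  }
}

let pacer = FramePacer()
for _ in 0..<10 {
  pacer.draw()
}
print(pacer.waitUntilIdle()) // 10
```

At most three frames are ever in flight: the fourth call to `draw()` blocks until one of the earlier frames signals.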
Lure’c baw e zihappito jegtb:
Ozaqieviki oz qa u henoyiy lojae ytot wuznavixyk tle vigjod uv cuseosgaj ek wuun toas (6 hicgonj foju).
Ohpitu qho qxex cokn sju wxmaer dimzm fhi PJO qo qaid ufcag u wopuubgi uv umoafunfo aqk iq iya og, ut qamev uy esh duvlilisdr jci nusicloke xociu bq eyu.
Ar jjuro oze ca lira ihueseppi xaliafdow, nfi ruqjupf xmnoep ad tyivlec apmid dki goqaknuyo jog ah ziays umu diraunge ehuogijpo.
Kcik i sfsoed ninozpus averd fme xaleasje, ab’kj yujwiw vhu loqilnipe hk unpxeoyibk ivh katuu oqj sv pireaqenz gvu qizl ag bwi feyuupda.
Copu bi jaz xkec hfoenn iqro myajxofo.
➤ Od qfu zis in Melfejak, ixb rquk ziy wwesuqvy:
var semaphore: DispatchSemaphore
➤ Iq inow(sevusDuut:ocpaehf:), oth gsip pohili sizen.ufaj():
GPU History, in Activity Monitor, gives an overall picture of the performance of all the GPUs attached to your computer.
The GPU Report in Xcode shows you the frames per second that your app achieves. This should be 60 FPS for smooth running.
Capture the GPU workload for insight into what’s happening on the GPU. You can inspect buffers and be warned of possible errors or optimizations you can make. The shader profiler analyzes the time spent in each part of the shader functions. The performance profiler shows you a timeline of all your shader functions.
GPU counters show statistics and timings for every possible GPU function you can think of.
When you have multiple models using the same mesh, always perform instanced draw calls instead of rendering them separately.
Textures can have a huge effect on performance. Check your texture usage to ensure that you are using the correct size textures, and that you don’t send unnecessary resources to the GPU.