When you want to squeeze the very last ounce of performance out of your app, follow a golden set of best practices. These fall into three major categories: general performance, memory bandwidth and memory footprint. This chapter guides you through all three.
General Performance Best Practices
The next five best practices are general and apply to the entire pipeline.
Choose the Right Resolution
The UI of your game or app should render at, or close to, native resolution so that it always looks crisp, no matter the display size. It’s also recommended, though not mandatory, that all resources have the same resolution. You can check the resolutions on the dependency graph in the GPU Debugger. Below is a partial view of the dependency graph from the multi-pass render in Chapter 14, “Deferred Rendering”:
The dependency graph
Notice the size of the shadow pass render target. For crisper shadows, you should have a large texture, but you should consider the performance trade-offs of each image resolution and carefully choose the scenario that best fits your app’s needs.
Optimize Shader Pipelines
Group draw calls by shader to minimize pipeline state changes. Even though the TBDR architecture of Apple silicon is very sophisticated, changing states still adds overhead.
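Here is a minimal, platform-independent Swift sketch that makes the grouping concrete. It assumes a hypothetical `DrawCall` type whose `pipelineID` stands in for an `MTLRenderPipelineState`. It counts how many pipeline switches an encoding order would cause, both before and after grouping.

```swift
// Hypothetical draw call record: `pipelineID` stands in for the
// MTLRenderPipelineState a real renderer would bind.
struct DrawCall {
    let pipelineID: Int
    let meshName: String
}

// Counts how many times the bound pipeline would change while
// encoding `calls` in the given order.
func pipelineChanges(in calls: [DrawCall]) -> Int {
    var changes = 0
    var current: Int? = nil
    for call in calls where call.pipelineID != current {
        changes += 1
        current = call.pipelineID
    }
    return changes
}

// Groups draw calls so each pipeline is bound once. Ties are broken by the
// original index, so submission order within a pipeline group is preserved.
func groupedByPipeline(_ calls: [DrawCall]) -> [DrawCall] {
    calls.enumerated()
        .sorted { ($0.element.pipelineID, $0.offset) < ($1.element.pipelineID, $1.offset) }
        .map { $0.element }
}
```

In a real renderer, the encoding loop would call `setRenderPipelineState(_:)` only when the ID differs from the previous draw call's.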
Kboj qvuogagn noit qyezurp, udo tabbzuom xguxuosozudoax pubo jau bat wud gge mkewayop an Txolgay 09, “Scatefkuq Ahamebeep”. Fme qzownadb lburuvs rusqejntw yofo nenragaajojb id qkaf xtoxi u muqnati cuf gu jupcarj. Qfoy uj ej oqqacbarelz pix ahwcawusemh wm jdobhaxs ce yottmuoq zkopeavezopauy osy toboganb txe lilfoboavamj.
Submit GPU Work Early
You can reduce latency and improve the responsiveness of your renderer by submitting all off-screen GPU work early, rather than having it wait for the on-screen part to start. To do that, use two or more command buffers per frame:
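// Assumes commandQueue is an MTLCommandQueue and view is the MTKView being drawn
// Off-screen work: encode and commit as early as possible
guard let offscreenCommandBuffer = commandQueue.makeCommandBuffer() else { return }
// encode off-screen work for the GPU (shadows, G-buffer and so on)
offscreenCommandBuffer.commit()
// ...
// On-screen work: get the drawable as late as possible
guard let drawable = view.currentDrawable,
      let onscreenCommandBuffer = commandQueue.makeCommandBuffer() else { return }
// encode on-screen work for the GPU
onscreenCommandBuffer.present(drawable)
onscreenCommandBuffer.commit()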
Create the off-screen command buffer(s) and commit the work to the GPU as early as possible. Get the drawable as late as possible in the frame, and then have a final command buffer that contains only the on-screen work.
Stream Resources Efficiently
Allocate all resources at launch time, if they’re available. Allocation takes time, and doing it up front prevents render stalls later. If the renderer streams resources and you need to allocate them at runtime, do so from a dedicated thread.
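The following sketch shows one way to keep runtime allocation off the render thread. `StreamedResource` and `ResourceStreamer` are hypothetical names. A byte array stands in for the `MTLTexture` or `MTLBuffer` a Metal app would create. The point is that loading happens on a dedicated serial queue, and the renderer polls for results instead of waiting.

```swift
import Dispatch

// Hypothetical stand-in for a streamed GPU resource. In a Metal app, this
// is where you would create an MTLTexture or MTLBuffer from the device.
struct StreamedResource {
    let name: String
    let bytes: [UInt8]
}

// Loads resources on a dedicated serial queue so the render loop never
// blocks on allocation. `@unchecked Sendable` because `ready` is only
// touched through `cacheQueue`.
final class ResourceStreamer: @unchecked Sendable {
    private let loadQueue = DispatchQueue(label: "resource-streaming", qos: .utility)
    private let cacheQueue = DispatchQueue(label: "resource-cache")
    private var ready: [String: StreamedResource] = [:]

    // Render thread: schedule a load and return immediately.
    func request(_ name: String, byteCount: Int) {
        loadQueue.async {
            // The expensive allocation happens here, off the render thread.
            let resource = StreamedResource(name: name,
                                            bytes: [UInt8](repeating: 0, count: byteCount))
            self.cacheQueue.sync { self.ready[name] = resource }
        }
    }

    // Render thread: returns nil until the resource is loaded, so the
    // renderer can draw a fallback instead of stalling.
    func resource(named name: String) -> StreamedResource? {
        cacheQueue.sync { ready[name] }
    }

    // Waits for all queued loads, for example behind a loading screen.
    func waitUntilIdle() {
        loadQueue.sync {}
    }
}
```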
You can see resource allocations in Instruments, in a Metal System Trace, under the GPU ➤ Allocation track:
You can see here that there are a few allocations, but all of them happen at launch time. If there were allocations at runtime, you would notice them later on that track and could identify any stalls they cause.
Design for Sustained Performance
Test your renderer under a serious thermal state. Designing it to perform well in that state can improve the overall thermals of the device, as well as the stability and responsiveness of your renderer.
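One way to design for sustained performance is to scale work back as the device heats up. This sketch mirrors the four cases of Foundation’s `ProcessInfo.ThermalState` (nominal, fair, serious and critical) in a local enum so it runs anywhere. `RenderSettings` and its values are hypothetical placeholders, not recommended settings.

```swift
// Mirrors the four cases of Foundation's ProcessInfo.ThermalState so the
// sketch runs on any platform.
enum ThermalLevel {
    case nominal, fair, serious, critical
}

// Hypothetical quality knobs; the values are placeholders, not recommendations.
struct RenderSettings: Equatable {
    var resolutionScale: Double
    var preferredFramesPerSecond: Int
    var ambientOcclusionEnabled: Bool
}

// Scales work back as the device heats up, aiming for a steady frame rate
// rather than one that drops when the system throttles.
func renderSettings(for level: ThermalLevel) -> RenderSettings {
    switch level {
    case .nominal, .fair:
        return RenderSettings(resolutionScale: 1.0, preferredFramesPerSecond: 60,
                              ambientOcclusionEnabled: true)
    case .serious:
        return RenderSettings(resolutionScale: 0.8, preferredFramesPerSecond: 30,
                              ambientOcclusionEnabled: true)
    case .critical:
        return RenderSettings(resolutionScale: 0.6, preferredFramesPerSecond: 30,
                              ambientOcclusionEnabled: false)
    }
}
```

In an app, you would read `ProcessInfo.processInfo.thermalState` when `ProcessInfo.thermalStateDidChangeNotification` fires, then apply the result, for example to `MTKView`’s `preferredFramesPerSecond`. Xcode’s device conditions let you force the serious state to test this path.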
Xcode can help you see and change the thermal state in the Devices window, which you open from Window ▸ Devices and Simulators.
You can also use Xcode’s Energy Impact gauge to verify the thermal state the device is running at:
Memory Bandwidth Best Practices
Since memory transfers for render targets and textures are costly, the next five best practices focus on memory bandwidth and on using shared and tile memory more efficiently.
Compress Texture Assets
Sampling large textures can be inefficient, so texture size matters. Generate mipmaps for textures that can be minified, and compress large textures to reduce their memory bandwidth needs. ASTC is the standard texture compression format on Apple GPUs. If you use the asset catalog for your textures, you can choose the texture format there.
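This sketch puts numbers on the trade-off. It computes the size of a full mipmap chain and compares uncompressed RGBA8 with ASTC at a 4×4 block size. It relies on two facts: a full chain has floor(log2(max(width, height))) + 1 levels, and every ASTC block occupies 128 bits regardless of its block size. The helper names are illustrative.

```swift
// Number of levels in a full mipmap chain: floor(log2(max(width, height))) + 1.
func mipLevelCount(width: Int, height: Int) -> Int {
    var size = max(width, height)
    var levels = 1
    while size > 1 {
        size /= 2
        levels += 1
    }
    return levels
}

// Total bytes for a texture plus its full mipmap chain, given a per-level size function.
func mipChainBytes(width: Int, height: Int, bytesForLevel: (Int, Int) -> Int) -> Int {
    var w = width, h = height, total = 0
    for _ in 0..<mipLevelCount(width: width, height: height) {
        total += bytesForLevel(w, h)
        w = max(1, w / 2)
        h = max(1, h / 2)
    }
    return total
}

// Uncompressed RGBA8: 4 bytes per pixel.
let rgba8Bytes = { (w: Int, h: Int) in w * h * 4 }
// ASTC stores every block in 16 bytes; a 4x4 block size gives 8 bits per pixel.
let astc4x4Bytes = { (w: Int, h: Int) in ((w + 3) / 4) * ((h + 3) / 4) * 16 }
```

For a 1024×1024 texture, the mipmaps add about a third to the base level (5,592,404 bytes versus 4,194,304). ASTC 4×4 brings the whole chain down to 1,398,128 bytes, roughly a quarter of the uncompressed size. In Metal, you would request the levels with `texture2DDescriptor(pixelFormat:width:height:mipmapped:)` and fill them with `generateMipmaps(for:)` on a blit command encoder.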
With the frame captured, you can use the Metal Memory Viewer to verify the compression format, mipmap status and size of each texture. You can change which columns are displayed by right-clicking the column heading:
The Metal Memory Viewer
Optimize for Faster GPU Access
Configure each texture with the storage mode that suits its use case. Use the private storage mode when only the GPU needs access to the texture data, which lets Metal optimize the contents.
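As a rough decision aid, the sketch below mirrors three cases of `MTLStorageMode` in a local enum and picks one from a hypothetical `TextureUsage` description. The rules follow this chapter’s advice. Use shared when the CPU needs access, memoryless for render targets whose contents never leave tile memory, and private for everything else. Note that `.memoryless` is available on iOS and tvOS, but on macOS only with Apple silicon.

```swift
// Mirrors three cases of Metal's MTLStorageMode so the sketch runs anywhere.
enum StorageMode {
    case shared      // CPU and GPU can both access the memory
    case `private`   // GPU-only; lets Metal optimize the contents
    case memoryless  // tile memory only; never loaded or stored
}

// Hypothetical description of how a texture is used.
struct TextureUsage {
    var cpuAccess: Bool        // CPU reads or writes the contents after creation
    var isRenderTarget: Bool
    var loadedOrStored: Bool   // contents read from or written back to memory
}

func recommendedStorageMode(for usage: TextureUsage) -> StorageMode {
    if usage.cpuAccess { return .shared }
    // A render target whose contents never leave tile memory needs no allocation.
    if usage.isRenderTarget && !usage.loadedOrStored { return .memoryless }
    // Everything else is GPU-only.
    return .private
}
```

In Metal, you would assign the result to `MTLTextureDescriptor`’s `storageMode` before calling `makeTexture(descriptor:)`. Private textures are typically filled by blitting from a shared staging buffer.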
Choosing the correct pixel format is crucial. Larger pixel formats use more bandwidth, and the sampling rate also depends on the pixel format. Avoid pixel formats with unnecessary channels, and lower the precision whenever possible. In this book, you’ve generally used the bgra8Unorm_srgb pixel format. However, when you needed greater accuracy for the G-Buffer in Chapter 14, “Deferred Rendering”, you used a 16-bit pixel format. Again, you can use the Metal Memory Viewer to see the pixel formats of your textures.
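The bandwidth cost of a format is easy to estimate. This sketch mirrors a few `MTLPixelFormat` cases with their sizes in bytes per pixel. It then computes how many bytes a single full-screen store of a render target writes at 1920×1080, which is an assumed resolution.

```swift
// Mirrors a few MTLPixelFormat cases with their size in bytes per pixel.
enum PixelFormat {
    case bgra8Unorm_srgb   // 4 x 8-bit channels
    case rgba16Float       // 4 x 16-bit channels
    case rg16Float         // 2 x 16-bit channels
    case r32Float          // 1 x 32-bit channel

    var bytesPerPixel: Int {
        switch self {
        case .bgra8Unorm_srgb: return 4
        case .rgba16Float: return 8
        case .rg16Float: return 4
        case .r32Float: return 4
        }
    }
}

// Bytes written each time a full render target of this format is stored.
func bytesPerStore(width: Int, height: Int, format: PixelFormat) -> Int {
    width * height * format.bytesPerPixel
}
```

Each store of a 1920×1080 rgba16Float target writes 16,588,800 bytes, twice as much as bgra8Unorm_srgb (8,294,400 bytes). If you only need two channels, rg16Float brings it back down to 8,294,400.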
Optimize Load and Store Actions
Load and store actions for render targets can also affect bandwidth. Unnecessary load or store actions give your pipelines a suboptimal configuration and can create false dependencies between render passes. An example of an optimized configuration would be as follows:
In this case, you’re configuring a color attachment to be transient, which means you don’t want to load anything into it or store anything from it. You can verify the current actions set on render targets in the Dependency Viewer.
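The rule of thumb behind an optimized configuration fits in a small function. This sketch mirrors a subset of `MTLLoadAction` and `MTLStoreAction`, and its three Boolean inputs are hypothetical descriptions of how an attachment is used. Load only when you need the previous contents, clear when you need a known starting value, and store only when a later pass reads the result.

```swift
// Mirrors a subset of Metal's MTLLoadAction and MTLStoreAction.
enum LoadAction { case dontCare, load, clear }
enum StoreAction { case dontCare, store }

// Picks the cheapest actions that still give an attachment correct contents.
func attachmentActions(needsPreviousContents: Bool,
                       needsClear: Bool,
                       readByLaterPass: Bool) -> (load: LoadAction, store: StoreAction) {
    // Loading reads memory, so only load when earlier contents matter.
    let load: LoadAction = needsPreviousContents ? .load : (needsClear ? .clear : .dontCare)
    // Storing writes memory, so only store when a later pass reads the result.
    let store: StoreAction = readByLaterPass ? .store : .dontCare
    return (load, store)
}
```

In Metal, the results map to the `loadAction` and `storeAction` properties of each attachment in an `MTLRenderPassDescriptor`.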
Pue yid roo ponu dcu XZU cjehhnq tfodav jqi kurxp cogfati, uley qciixt om ihz’h rogfev ba o bukbugurc gobtaz noqj.
Miwibmokq qgano esbiuk
Optimize Multi-Sampled Textures
Apple’s TBDR architecture handles MSAA efficiently in tile memory. When implementing MSAA, don’t load or store the MSAA texture, and set its storage mode to memoryless.
MSAA still increases the GPU workload, so first evaluate whether the visual improvement is worth the cost.
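A quick calculation shows why memoryless storage matters for MSAA. This sketch assumes a 1920×1080 color target with 4 bytes per pixel and 4× MSAA. It counts the resident memory for the multisample texture plus its single-sample resolve texture. The function name is illustrative.

```swift
// Resident memory for an MSAA color target plus its single-sample resolve texture.
// A memoryless multisample texture lives only in tile memory, so it adds nothing.
func msaaResidentBytes(width: Int, height: Int, bytesPerPixel: Int,
                       sampleCount: Int, memorylessMultisample: Bool) -> Int {
    let resolveBytes = width * height * bytesPerPixel
    let multisampleBytes = memorylessMultisample ? 0 : resolveBytes * sampleCount
    return resolveBytes + multisampleBytes
}
```

With a regular allocation, the multisample texture alone takes 33,177,600 bytes. As a memoryless texture, it lives only in tile memory, leaving just the 8,294,400-byte resolve texture. In Metal, you would pair the memoryless multisample texture with a `resolveTexture` and the `.multisampleResolve` store action.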
Memory Footprint Best Practices
Use Memoryless Render Targets
As mentioned previously, use the memoryless storage mode for all transient render targets: those that are never loaded from or stored to memory, and so don’t need a memory allocation.
Zau’gx fo iqca re rua vvu zruyyu inqohuefofj ip kyo jobogsudwr kwukd.
Avoid Loading Unused Assets
Loading all the assets into memory increases the memory footprint, so consider the memory and performance trade-off and load only the assets you know will be used. The Memory Viewer in a GPU frame capture shows you any unused resources.
Use Smaller Assets
Make your assets only as large as necessary, and consider the image quality and memory trade-off of their sizes. Make sure that both textures and meshes are compressed. You may want to load only the smaller mipmap levels of your textures, or use lower level-of-detail meshes for distant objects.
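The sketch below illustrates both ideas under simple assumptions. A mesh picks the first level of detail whose maximum distance covers it. A streamed texture skips its largest mip levels until it fits a resident size. The thresholds and function names are hypothetical.

```swift
// Returns the first level of detail whose maximum distance covers `distance`
// (index 0 is the full-detail mesh); beyond every threshold, the coarsest LOD.
func lodIndex(distance: Float, maxDistances: [Float]) -> Int {
    for (index, maxDistance) in maxDistances.enumerated() where distance <= maxDistance {
        return index
    }
    return max(0, maxDistances.count - 1)
}

// Returns how many of the largest mip levels to skip so the resident
// texture is at most `maxResidentSize` pixels on a side.
func firstMipToLoad(fullSize: Int, maxResidentSize: Int) -> Int {
    var size = fullSize
    var level = 0
    while size > maxResidentSize && size > 1 {
        size /= 2
        level += 1
    }
    return level
}
```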
Simplify Memory-Intensive Effects
Some effects, such as Shadow Maps and Screen-Space Ambient Occlusion, may require large off-screen buffers. Consider the image quality and memory trade-off of each of these effects. You can lower the resolution of their off-screen buffers, or even disable the most memory-intensive effects altogether when you’re memory constrained.
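Lowering resolution pays off quickly, because a square buffer’s memory grows with the square of its side. This sketch picks the largest power-of-two shadow map that fits a memory budget. If even a 256×256 map doesn’t fit, it returns nil, meaning the effect should be disabled. The 256 floor and the budget values are assumptions, and 4 bytes per texel matches a depth32Float map.

```swift
// Returns the largest power-of-two shadow map size, from `maxSize` down to a
// hypothetical 256-texel floor, whose memory fits `budgetBytes`, or nil if
// none does (a signal to disable the effect).
func shadowMapSize(maxSize: Int, bytesPerTexel: Int, budgetBytes: Int) -> Int? {
    var size = maxSize
    while size >= 256 {
        if size * size * bytesPerTexel <= budgetBytes {
            return size
        }
        size /= 2
    }
    return nil
}
```

A 4096×4096 depth32Float shadow map takes 64 MiB, while 2048×2048 takes 16 MiB, so a 20 MiB budget selects 2048.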
Use Metal Resource Heaps
Rendering a frame may require a lot of intermediate memory, especially as your game’s post-processing pipeline grows more complex. Consider using Metal Resource Heaps for those effects, and alias as much of that memory as possible. For example, you may want to reuse the memory of resources that have no dependencies on each other, such as those for Depth of Field and Screen-Space Ambient Occlusion.
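To estimate what aliasing can save, this sketch models each intermediate resource with the range of passes it is alive in. Without aliasing, every resource needs its own memory. With aliasing, the heap only needs to cover the largest set of resources alive at the same time. `TransientResource`, the pass numbers and the sizes are hypothetical. Real heap placement also depends on each resource’s size and alignment requirements, so treat the result as a lower bound.

```swift
// Hypothetical intermediate resource, alive from pass `first` through pass `last`.
struct TransientResource {
    let name: String
    let bytes: Int
    let first: Int
    let last: Int
}

// Memory needed if every resource gets its own allocation.
func unaliasedBytes(_ resources: [TransientResource]) -> Int {
    resources.reduce(0) { $0 + $1.bytes }
}

// Peak memory when resources with non-overlapping lifetimes share (alias) the
// same heap memory: the largest total of resources alive in any single pass.
func peakAliasedBytes(_ resources: [TransientResource], passCount: Int) -> Int {
    var peak = 0
    for pass in 0..<passCount {
        let live = resources.filter { $0.first <= pass && pass <= $0.last }
        peak = max(peak, live.reduce(0) { $0 + $1.bytes })
    }
    return peak
}
```

In the test data, SSAO (passes 1 to 2), depth of field (3 to 4) and bloom (5) never overlap. Aliasing therefore halves the requirement from 24,000,000 to 12,000,000 bytes. In Metal, you would allocate such resources from an `MTLHeap` and call `makeAliasable()` on each one after its last use has been encoded, so that later allocations from the heap can reuse that memory.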
Another important consideration is purgeable memory. Purgeable memory has three states: non-volatile (the data should not be discarded), volatile (the data can be discarded, even when the resource may be needed) and empty (the data has been discarded). Volatile and empty allocations don’t count toward the application’s memory footprint, because the system can either reclaim that memory at some point or has already reclaimed it.
Mark Resources as Volatile
Temporary resources can become a large part of the memory footprint, and Metal lets you set the purgeable state of resources explicitly. Focus on caches that hold mostly idle memory, and manage their purgeable state carefully, as in this example:
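// Mark each idle texture in the cache as volatile
for texture in texturePool {
  texture.setPurgeableState(.volatile)
}
// later on, before you use a cached texture again...
// setPurgeableState(_:) returns the texture's previous state
if texturePool[i].setPurgeableState(.nonVolatile) == .empty {
  // the system discarded the contents, so regenerate the texture
}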
Manage the Metal PSOs
Pipeline State Objects (PSOs) encapsulate most of the Metal render state. You create them using a descriptor that contains vertex and fragment functions as well as other state descriptors. All of these will get compiled into the final Metal PSO.
Metal allows your application to load most of the rendering state up front. However, if you have limited memory, make sure not to hold on to PSO references that you no longer need. Also, don’t hold on to Metal function references after you’ve created the PSO cache. They aren’t needed to render, only to create new PSOs.
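The sketch below shows the general shape of a PSO cache under simplifying assumptions. `PipelineKey` is a hypothetical description of a pipeline, and the cache is generic over the pipeline type so it runs without a GPU. In a Metal app, `build` would create an `MTLRenderPipelineDescriptor`, look up its functions from the library and call `device.makeRenderPipelineState(descriptor:)`, which throws, so a real closure would need to handle the error. It would not keep the `MTLFunction` objects afterward.

```swift
// Hypothetical key describing a pipeline: function names plus state that
// affects compilation.
struct PipelineKey: Hashable {
    let vertexFunction: String
    let fragmentFunction: String
    let colorPixelFormat: String
}

// Builds each pipeline once and hands out the cached object afterward.
final class PipelineCache<Pipeline> {
    private var pipelines: [PipelineKey: Pipeline] = [:]
    private let build: (PipelineKey) -> Pipeline

    init(build: @escaping (PipelineKey) -> Pipeline) {
        self.build = build
    }

    func pipeline(for key: PipelineKey) -> Pipeline {
        if let cached = pipelines[key] { return cached }
        let pipeline = build(key)
        pipelines[key] = pipeline
        return pipeline
    }

    // Drop pipelines you no longer need when memory is limited.
    func evict(_ key: PipelineKey) {
        pipelines[key] = nil
    }

    var count: Int { pipelines.count }
}
```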
Getting the last ounce of performance out of your app is paramount. You’ve had a taste of examining CPU and GPU performance using Xcode, but to go further, you’ll need to use Instruments, guided by Apple’s Instruments documentation.
Over the years, at every WWDC since Metal was introduced, Apple has produced some excellent videos describing Metal best practices and optimization techniques. Go to https://developer.apple.com/videos/graphics-and-games/metal/ and watch as many as you can, as often as you can.
Congratulations on completing the book! The world of Computer Graphics is vast and as complex as you want to make it. Now that you know the basics of Metal, you should be able to learn graphics techniques from material written for other APIs, such as OpenGL, Vulkan and DirectX, even though current internet resources for Metal itself are few. If you’re keen to learn more, check out the great books in the resources folder for this chapter.