Metal by Tutorials

Fifth Edition · macOS 26, iOS 26 · Swift 6, Metal 3 · Xcode 26

30. Profiling
Written by Marius Horga & Caroline Begbie

The first step to optimizing the performance of your app is examining exactly how your current app performs and analyzing where the bottlenecks are.

Imagine this scenario: You’ve started development on the first level of a new game, Phoenix Island: Rising from the ashes. You’ve created a basic scene, and now you want to find out how well it runs before adding the action.

The app runs fine at 60 FPS on an M1 Max Mac and an M3 iPad Air, but you’re horrified to discover that the iPad mini 6, with its older chip and lower memory, runs the app at a mere 40 FPS.

In this chapter, you’ll look at some tools to help you analyze performance and find where your bottlenecks are.

Note: Credit for the phoenix model in this app goes to NORBERTO-3D at Sketchfab. All the other models and the HDRI sky were created by the folks at Poly Haven.

The Starter App

➤ In Xcode, review the project for this chapter. There are a number of interesting features.

Assets

First, there are two Assets folders. The one directly under the top level Profiling contains a lot of data, so it points to a folder outside of the Profiling hierarchy. If the content names are red, select both Assets and game-scene.usda, and in the File inspector, click the folder icon. Then, locate and select the assets folder to reconnect the files. The assets folder is the folder that contains both Assets and game-scene.usda.

Reconnect asset files

The USD Scene

assets/game-scene.usda is an editable text file that describes the scene. If your scene is running too slowly or you want to isolate an object, you can remove elements from the file. For example, to remove the landscape, delete the following lines:

def Mesh "Landscape" (
    prepend references ...
  )
{
    token visibility = "inherited"
    matrix4d xformOp:transform ...
    uniform token[] xformOpOrder = ["xformOp:transform"]
}

The Render Passes

In Renderer.swift, you can see the usual render passes, along with these new ones:

The starter app

Profiling

There are a few ways to monitor and tweak your app’s performance. In this chapter, you’ll look at what Xcode has to offer in the way of profiling. You can also use Instruments, which is a powerful app that profiles both CPU and GPU performance. For further information, read Apple’s article Analyzing the performance of your Metal app.

Metal Performance HUD

A great place to start looking at how your app is running is the Metal Performance HUD.

Edit scheme

Diagnostics

The Metal Performance HUD
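Besides the scheme's diagnostics setting, the HUD can, as far as we know, also be requested by launching the app with the environment variable MTL_HUD_ENABLED set to 1. The following is a minimal sketch that only detects that request so that you could log it. The function name isMetalHUDRequested is hypothetical and isn't part of the chapter's project.

```swift
import Foundation

/// Hypothetical helper: returns true when the process was launched with
/// MTL_HUD_ENABLED=1. The environment is injectable to keep this testable.
func isMetalHUDRequested(
  environment: [String: String] = ProcessInfo.processInfo.environment
) -> Bool {
  environment["MTL_HUD_ENABLED"] == "1"
}

if isMetalHUDRequested() {
  print("Metal Performance HUD requested for this run")
}
```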

Culling Back Faces

You can achieve a quick performance win by not rendering so many vertices. Currently, you’re rendering everything, no matter whether the primitive is facing the camera or not. Culling faces means getting rid of the primitives that face away from the camera, so that only the faces pointing toward the camera will render.

let cullFaces = true
Face culling implemented
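How a flag like cullFaces might reach the GPU can be sketched as follows. This is an illustration under assumptions, not the project's actual code: the helper names are hypothetical, and the counterclockwise front-face winding is assumed and depends on how your models were exported.

```swift
import Metal

/// Maps the chapter's `cullFaces` flag to a Metal cull mode.
func cullMode(cullFaces: Bool) -> MTLCullMode {
  cullFaces ? .back : .none
}

/// Hypothetical helper: configures culling on a render command encoder
/// before you issue any draw calls.
func applyFaceCulling(
  to encoder: MTLRenderCommandEncoder,
  cullFaces: Bool
) {
  // Assumption: the models use counterclockwise winding for front faces.
  // Metal's default is clockwise, so set this to match your assets.
  encoder.setFrontFacing(.counterClockwise)
  encoder.setCullMode(cullMode(cullFaces: cullFaces))
}
```

If faces disappear that should be visible, the winding order is the first thing to check.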

The GPU Report

➤ With your app running, and in Xcode on the Debug navigator, click FPS.

The GPU report

GPU Workload Capture

In previous chapters, you captured the GPU workload to inspect textures, buffers and render passes. The GPU capture is always the first port of call for debugging. Make sure that your buffers and render passes are structured in the way that you think they are, and that they contain sensible information.
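You usually trigger a capture from Xcode's debug bar, but you can also start one from code when you want to capture a specific frame. The sketch below uses MTLCaptureManager. The helper names makeCaptureDescriptor(for:) and captureGPUWork(on:_:) are hypothetical, and the code assumes the app is launched from Xcode with GPU capture enabled.

```swift
import Metal

/// Builds a capture descriptor that records all work created from `device`.
func makeCaptureDescriptor(for device: MTLDevice) -> MTLCaptureDescriptor {
  let descriptor = MTLCaptureDescriptor()
  descriptor.captureObject = device
  // Send the capture to Xcode rather than to a trace file on disk.
  descriptor.destination = .developerTools
  return descriptor
}

/// Hypothetical helper: captures the GPU work issued inside `work`.
func captureGPUWork(on device: MTLDevice, _ work: () -> Void) {
  let manager = MTLCaptureManager.shared()
  do {
    try manager.startCapture(with: makeCaptureDescriptor(for: device))
  } catch {
    // Capture may be unavailable, for example outside of Xcode.
    print("GPU capture unavailable: \(error)")
    work()
    return
  }
  work()
  manager.stopCapture()
}
```

You would wrap the encoding and commit of a single frame's command buffer in the work closure.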

Summary

➤ With your app running, capture the GPU workload, and in the Debug navigator, click Summary.

The summary of your frame

.worldTangent = model->normalMatrix * in.tangent,
.worldBitangent = model->normalMatrix * in.bitangent,
Tangent buffer enabled

Bandwidth issues

API Usage insights

Encoded Command Performance

The next place to look when profiling your app is the Debug navigator, which details the performance of render passes and pipeline states.

Group by Pipeline State

Large draw call

Memory

Inefficient use of memory can do a lot of damage to performance.

Resources in memory

Landscape texture
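To get a feel for why texture size matters, you can estimate the footprint of an uncompressed texture with some simple arithmetic. This is a rough sketch with a hypothetical function name. It ignores compression, row padding and any alignment the driver adds, so treat the result as a lower bound rather than what Xcode's memory report will show.

```swift
/// Estimates the bytes used by an uncompressed 2D texture.
/// With `mipmapped` set, the full mip chain down to 1×1 is included.
func estimatedTextureBytes(
  width: Int,
  height: Int,
  bytesPerPixel: Int,
  mipmapped: Bool
) -> Int {
  var levelWidth = width
  var levelHeight = height
  var total = 0
  while true {
    total += levelWidth * levelHeight * bytesPerPixel
    // Stop after the base level, or once the smallest mip is reached.
    if !mipmapped || (levelWidth == 1 && levelHeight == 1) { break }
    levelWidth = max(1, levelWidth / 2)
    levelHeight = max(1, levelHeight / 2)
  }
  return total
}

// A 4096×4096 texture with 4 bytes per pixel (for example RGBA8).
let base = estimatedTextureBytes(
  width: 4096, height: 4096, bytesPerPixel: 4, mipmapped: false)
let withMips = estimatedTextureBytes(
  width: 4096, height: 4096, bytesPerPixel: 4, mipmapped: true)
print(base / 1_048_576, "MiB without mipmaps")  // 64 MiB
print(withMips, "bytes with mipmaps")           // 89478484 bytes
```

Halving each dimension of the texture cuts the footprint to roughly a quarter, which is why oversized textures are such an easy target.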

GPU Timeline

The GPU timeline tool gives you an overview of how your vertex, fragment and compute functions perform, broken down by render pass.

Capture the GPU workload

Render Passes

The GPU timeline
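Alongside the timeline, you can get a coarse, whole-frame GPU time from code and compare it with your frame budget. The sketch below reads gpuStartTime and gpuEndTime from a completed command buffer. The helper names are hypothetical, and the 60 FPS target is an assumption taken from the scenario at the start of the chapter.

```swift
import Metal

/// Milliseconds available per frame at the given frame rate.
func frameBudgetMilliseconds(targetFPS: Double) -> Double {
  1000.0 / targetFPS
}

/// Converts a pair of GPU timestamps (in seconds) to milliseconds.
func gpuMilliseconds(start: Double, end: Double) -> Double {
  (end - start) * 1000.0
}

/// Hypothetical helper: logs how long the GPU spent on a command buffer.
/// Call this before you commit the command buffer.
func logGPUTime(of commandBuffer: MTLCommandBuffer, targetFPS: Double = 60) {
  commandBuffer.addCompletedHandler { buffer in
    let elapsed = gpuMilliseconds(
      start: buffer.gpuStartTime, end: buffer.gpuEndTime)
    let budget = frameBudgetMilliseconds(targetFPS: targetFPS)
    let status = elapsed > budget ? "over" : "within"
    print("GPU time: \(elapsed) ms, \(status) the \(budget) ms budget")
  }
}
```

At 60 FPS the budget is about 16.7 ms per frame, whereas a device running at 40 FPS is taking about 25 ms.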

bloom.postProcess(
  view: view,
  commandBuffer: commandBuffer,
  inputTexture: descriptor.colorAttachments[0].texture)
Without the bloom render pass

Instancing

Reducing the number of draw calls is one of the best ways of improving performance. Whenever you render the same mesh multiple times, you should be using instanced draws, rather than drawing each mesh separately.

Procedural rocks

The Procedural Nature System

Using homeomorphic models, you can choose different shapes for each model. Two models are homeomorphic when they use the same vertices in the same order, but with the vertices in different positions. A famous example of this is Spot the cow by Keenan Crane.

Spot by Keenan Crane

Homeomorphic rocks

encoder.drawIndexedPrimitives(
  type: .triangle,
  indexCount: submesh.indexCount,
  indexType: submesh.indexType,
  indexBuffer: submesh.indexBuffer.buffer,
  indexBufferOffset: submesh.indexBuffer.offset,
  instanceCount: instanceCount)

Inspecting Shaders

It’s easy to debug Swift code by using breakpoints and printing out values. But how do you find out what your Metal Shading Language code is doing? The Shader editor has you covered. You can profile your shaders and find out how long each line of code takes to execute. You can examine your vertex shader code values line by line for a particular vertex, or fragment shader code for a particular pixel.

An ocean view

The Water Render Pass

The Debug Shader icon

Choose vertex or fragment

Shader function values

float3 nearColor = float3(1, 0, 0);
The Reload Shaders icon

A red sea

The Shader Profiler

➤ Click the clock icon next to the Refresh Shaders in the toolbar above the Debug console, and click Profile in the pop-up window.

The Shader Profiler

return half4(half3(color), alpha);

CPU-GPU Synchronization

Measuring GPU performance is important, but you should also consider interaction between CPU and GPU. Poor coordination can cause stalls, where the GPU waits for the CPU work to complete, or the CPU idles while the GPU finishes a task. Synchronization issues can also cause frame stutters.

Triple Buffering

Triple buffering is a well-known synchronization technique. The idea is to keep a pool of three buffers. While the CPU writes to a later buffer in the pool, the GPU reads from an earlier one, so the two never touch the same buffer at the same time.

let maxFramesInFlight = 3
Self.currentFrameIndex =
  (Self.currentFrameIndex + 1) % maxFramesInFlight
Result of triple buffering
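The index arithmetic above can be wrapped in a small ring of per-frame resources. This is a sketch under assumptions: FrameRing is a hypothetical type, and in the app the resources would be the MTLBuffer instances holding your per-frame uniforms rather than the strings used here for illustration.

```swift
/// Hypothetical ring of per-frame resources, cycled once per frame.
struct FrameRing<Resource> {
  private let resources: [Resource]
  private(set) var currentFrameIndex = 0

  init(resources: [Resource]) {
    precondition(!resources.isEmpty, "The ring needs at least one resource")
    self.resources = resources
  }

  /// The resource the CPU should write to for the current frame.
  var current: Resource { resources[currentFrameIndex] }

  /// Moves on to the next resource, wrapping around at the end.
  mutating func advance() {
    currentFrameIndex = (currentFrameIndex + 1) % resources.count
  }
}

// Stand-ins for three uniform buffers.
var ring = FrameRing(resources: ["buffer A", "buffer B", "buffer C"])
for _ in 0..<4 {
  print("CPU writes to", ring.current)
  ring.advance()
}
```

The ring on its own doesn't stop the CPU from lapping the GPU. It only decides which buffer to use, so you still need a way to wait when all three buffers are in flight.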

Resource Contention

commandBuffer.waitUntilCompleted()

Semaphores

A more performant approach is to use a synchronization primitive known as a semaphore, which is a convenient way of keeping count of the available resources: in this case, the buffers in your triple buffer.

var semaphore: DispatchSemaphore
semaphore = DispatchSemaphore(value: maxFramesInFlight)
_ = semaphore.wait(timeout: .distantFuture)
commandBuffer.addCompletedHandler { _ in
  self.semaphore.signal()
}
commandBuffer.waitUntilCompleted()
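The counting behavior of the semaphore is easier to see in isolation. The sketch below wraps DispatchSemaphore in a hypothetical FramePacer type. It isn't the project's renderer, and the comments show where the calls would sit relative to a command buffer.

```swift
import Dispatch

/// Hypothetical wrapper that limits how many frames are in flight.
final class FramePacer: @unchecked Sendable {
  private let semaphore: DispatchSemaphore

  init(maxFramesInFlight: Int = 3) {
    semaphore = DispatchSemaphore(value: maxFramesInFlight)
  }

  /// Blocks the CPU until one of the in-flight slots is free.
  /// Call this at the start of your draw method.
  func beginFrame() {
    semaphore.wait()
  }

  /// Non-blocking variant: returns false when every slot is taken.
  func tryBeginFrame() -> Bool {
    semaphore.wait(timeout: .now()) == .success
  }

  /// Releases a slot. Call this from the command buffer's
  /// completed handler, once the GPU has finished the frame:
  ///   commandBuffer.addCompletedHandler { _ in pacer.endFrame() }
  func endFrame() {
    semaphore.signal()
  }
}
```

With three slots, the CPU can encode up to three frames ahead, and only blocks on the fourth until the GPU completes one. Every beginFrame() needs a matching endFrame(), otherwise the CPU eventually blocks forever.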

MetalFX Upscaling

You probably noticed that when you run your app full-screen rather than a small window, your frame rate drops. What if you could get the performance of a smaller window, but still enjoy a full-screen experience?

let doUpscaling = true
Result of upscaling
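One way to get that result is MetalFX spatial upscaling, where you render at a reduced size and let MetalFX scale the image up to the drawable size. The following is a sketch under assumptions rather than the project's code: the helper names and the 0.5 scale are hypothetical, and MetalFX requires a supported device running macOS 13 or iOS 16 or later.

```swift
import Metal
import MetalFX

/// Size to render at before upscaling, for a scale between 0 and 1.
func renderSize(
  outputWidth: Int,
  outputHeight: Int,
  scale: Float
) -> (width: Int, height: Int) {
  let width = max(1, Int((Float(outputWidth) * scale).rounded()))
  let height = max(1, Int((Float(outputHeight) * scale).rounded()))
  return (width, height)
}

/// Hypothetical helper: creates a spatial scaler, or returns nil when
/// the device doesn't support MetalFX spatial scaling.
func makeSpatialScaler(
  device: MTLDevice,
  outputWidth: Int,
  outputHeight: Int,
  scale: Float,
  pixelFormat: MTLPixelFormat
) -> MTLFXSpatialScaler? {
  guard MTLFXSpatialScalerDescriptor.supportsDevice(device) else {
    return nil
  }
  let input = renderSize(
    outputWidth: outputWidth, outputHeight: outputHeight, scale: scale)
  let descriptor = MTLFXSpatialScalerDescriptor()
  descriptor.inputWidth = input.width
  descriptor.inputHeight = input.height
  descriptor.outputWidth = outputWidth
  descriptor.outputHeight = outputHeight
  descriptor.colorTextureFormat = pixelFormat
  descriptor.outputTextureFormat = pixelFormat
  descriptor.colorProcessingMode = .perceptual
  return descriptor.makeSpatialScaler(device: device)
}
```

Each frame, you would set the scaler's colorTexture and outputTexture and call encode(commandBuffer:) after your final render pass. At a scale of 0.5, the fragment workload drops to about a quarter of the full-size pixel count.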

Visibility Culling

The fastest geometry to render is geometry that you don’t have to render because it’s not in the frame. Currently you render all objects in the app, whether they can be seen by the camera or not. You process the fire particles even though they might not be on screen. Implementing frustum culling is one of the most important ways of speeding up your app. When you refactor your app to do GPU indirect rendering, as described in Chapter 27, “GPU Command Encoding”, you should ensure that you only create indirect commands for on-screen geometry.
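A common way to implement frustum culling is to test each object's bounding sphere against the six planes of the view frustum. The sketch below shows that test on the CPU using only Swift's built-in SIMD types. The type and function names are hypothetical. It assumes each plane is stored as (normal, distance) with a normalized normal pointing into the frustum, and it doesn't show how to extract the planes from your camera's matrices.

```swift
/// Hypothetical bounding volume for a model, in world space.
struct BoundingSphere {
  var center: SIMD3<Float>
  var radius: Float
}

/// Returns false only when the sphere lies completely outside
/// at least one plane. Planes are (nx, ny, nz, d), where a point p
/// is inside when dot(n, p) + d >= 0.
func isVisible(
  _ sphere: BoundingSphere,
  frustumPlanes: [SIMD4<Float>]
) -> Bool {
  for plane in frustumPlanes {
    let normal = SIMD3<Float>(plane.x, plane.y, plane.z)
    let signedDistance = (normal * sphere.center).sum() + plane.w
    if signedDistance < -sphere.radius {
      return false
    }
  }
  return true
}

// For illustration, a box from -10 to 10 on each axis stands in
// for a real view frustum.
let planes: [SIMD4<Float>] = [
  SIMD4(1, 0, 0, 10), SIMD4(-1, 0, 0, 10),
  SIMD4(0, 1, 0, 10), SIMD4(0, -1, 0, 10),
  SIMD4(0, 0, 1, 10), SIMD4(0, 0, -1, 10)
]
let rock = BoundingSphere(center: SIMD3(20, 0, 0), radius: 1)
print(isVisible(rock, frustumPlanes: planes))  // false
```

The test is conservative: a sphere near a corner of the frustum can pass even though it's off screen, but you never cull anything that should be visible.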

Key Points

  • The Metal Performance HUD is the easiest way to profile your app.
  • Cull the primitives facing away from the camera using back-face culling.
  • Capture the GPU workload for insight into what’s happening on the GPU. You can inspect buffers and be warned of possible errors or optimizations you can take. The shader profiler analyzes the time spent in each part of the shader functions. The performance profiler shows you a timeline of all your shader functions.
  • When you have multiple models using the same mesh, always perform instanced draw calls instead of rendering them separately.
  • Textures can have a huge effect on performance. Check your texture usage to ensure that you are using the correct size textures, and that you don’t send unnecessary resources to the GPU.

Where to go From Here

The resources for this chapter contain a list of the Apple articles and videos on profiling. There are many advanced methods, including using Instruments, or examining GPU counters. The Apple documentation and videos are very good on this topic. The resources also contain links to blog posts that tear down and examine render passes in games.

© 2026 Kodeco Inc.