Vision Tutorial for iOS: What’s New With Face Detection?

Learn what’s new with Face Detection and how the latest additions to Vision framework can help you achieve better results in image segmentation and analysis.


  • Swift 5.5, iOS 15, Xcode 13

Taking passport photos is a pain. There are so many rules to follow, and it can be hard to know if your photo is going to be acceptable or not. Luckily, you live in the 21st century! Say goodbye to the funny kiosks, and take control of your passport photo experience by using face detection from the Vision framework. Know if your photo will be acceptable before sending it off to the Passport Office!

Note: The project for this tutorial requires access to the camera. It can only run on a real device, not the simulator. Additionally, some existing knowledge of the Vision framework is assumed. You may wish to start with this earlier tutorial if you’ve never worked with the Vision framework before.

In this tutorial you will:

  • Learn how to detect roll, pitch and yaw of faces.
  • Add quality calculation to face detection.
  • Use person segmentation to mask an image.

Let’s get started!

Getting Started

Download the starter project by clicking the Download Materials button at the top or bottom of the tutorial.

The materials contain a project called PassportPhotos. In this tutorial, you'll build a simple photo-taking app that only lets the user take a photo when the resulting image would be valid for a passport. The validity rules you'll follow are for a UK passport photo, but they would be easy to adapt for any other country.

Rules for passport photos

Open the starter project, select your phone from the available run targets and build and run.

Running on a real device

Note: You may need to set up Xcode to sign the app before you can run it on your device. The easiest way to do this is to open the Signing & Capabilities editor for your target. Then, select Automatically Manage Signing.
Managing app signing

The app displays a front-facing camera view. A red rectangle and green oval are overlaid in the center of the screen. A banner across the top contains instructions. The bottom banner contains controls.

The starter project showing the red rectangle and green oval in the center of the screen, banner across the top and controls at the bottom.

The center button is a shutter release for taking photos. On the left, the top button toggles the background, and the bottom — represented by a ladybug — toggles a debug view on and off. The button to the right of the shutter is a placeholder that gets replaced with a thumbnail of the last photo taken.

Bring the phone up to your face. A yellow bounding box starts tracking your face. Some face detection is already happening!

Face Detection

A Tour of the App

Let’s take a tour of the app now to get you oriented.

In Xcode, open PassportPhotosAppView.swift. This is the root view for the application. It contains a stack of views. A CameraView is at the bottom. Then, a LayoutGuide view (that draws the green oval on the screen) and optionally a DebugView. Finally, a CameraOverlayView is on top.

There are some other files in the app. Most of them are simple views used for various parts of the UI. In this tutorial, you will mainly update three classes: CameraViewModel, the CameraViewController UIKit view controller and FaceDetector.

Open CameraViewModel.swift. This class controls the state for the entire app. It defines some published properties that views in the app can subscribe to. Views can update the state of the app by calling the single public method – perform(action:).
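
The single-entry-point pattern described above can be sketched in a few lines. This is only an illustration under stated assumptions, not the starter project's code: SketchAction, SketchViewModel and their members are hypothetical names, and the real CameraViewModel publishes far more state.

```swift
import Foundation

// Hypothetical stand-in for the app's action enum; case names are illustrative only.
enum SketchAction {
  case faceDetected(width: Double)
  case noFaceDetected
}

// Hypothetical stand-in for the view model. State is read-only from the
// outside and changes only through perform(action:).
final class SketchViewModel {
  private(set) var hasFace = false
  private(set) var lastFaceWidth: Double?

  func perform(action: SketchAction) {
    switch action {
    case .faceDetected(let width):
      hasFace = true
      lastFaceWidth = width
    case .noFaceDetected:
      hasFace = false
      lastFaceWidth = nil
    }
  }
}

let sketchModel = SketchViewModel()
sketchModel.perform(action: .faceDetected(width: 120))
print(sketchModel.hasFace)
```

Funnelling every change through one method keeps the state transitions in a single switch, which is why later steps in this tutorial mostly add a case and a handler.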

Next, open CameraView.swift. This is a simple SwiftUI UIViewControllerRepresentable struct. It instantiates a CameraViewController with a FaceDetector.

Now open CameraViewController.swift. CameraViewController configures and controls the AV capture session, which draws the pixels from the camera onto the screen. In viewDidLoad(), the delegate of the face detector object is set. It then configures and starts the AV capture session. configureCaptureSession() performs most of the setup. This is all basic setup code that you have hopefully seen before.

This class also contains some methods related to setting up Metal. Don’t worry about this just yet.

Finally, open FaceDetector.swift. This utility class has a single purpose: to be the delegate for the AVCaptureVideoDataOutput set up in the CameraViewController. This is where the face detection magic happens. :] More on this below.

Feel free to nose around the rest of the app. :]

Reviewing the Vision Framework

The Vision framework has been around since iOS 11. It provides functionality to perform a variety of computer vision algorithms on images and video. Examples include face landmark detection, text detection and barcode recognition.

Before iOS 15, the Vision framework allowed you to query the roll and yaw of detected faces. It also provided the positions of certain landmarks like eyes, nose and lips. An example of this is already implemented in the app.

Open FaceDetector.swift and find captureOutput(_:didOutput:from:). In this method, the face detector sets up a VNDetectFaceRectanglesRequest on the image buffer provided by the AVCaptureSession.

When face rectangles are detected, the completion handler, detectedFaceRectangles(request:error:), is called. This method pulls the bounding box of the face from the face observation results and performs the faceObservationDetected action on the CameraViewModel.

Looking Forward

It’s time to add your first bit of code!

Passport regulations require people to look straight at the camera. Time to add this functionality. Open CameraViewModel.swift.

Find the FaceGeometryModel struct definition. Update the struct by adding the following new properties:

let roll: NSNumber
let pitch: NSNumber
let yaw: NSNumber

This change allows you to store the roll, pitch and yaw values detected in the face in the view model.

At the top of the class, under the hasDetectedValidFace property, add the following new published properties:

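@Published private(set) var isAcceptableRoll: Bool {
  didSet { calculateDetectedFaceValidity() }
}
@Published private(set) var isAcceptablePitch: Bool {
  didSet { calculateDetectedFaceValidity() }
}
@Published private(set) var isAcceptableYaw: Bool {
  didSet { calculateDetectedFaceValidity() }
}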

This adds three new properties to store whether the roll, pitch and yaw of a detected face are acceptable for a passport photo. When each one updates, it will call the calculateDetectedFaceValidity() method.

Next, add the following to the bottom of init():

isAcceptableRoll = false
isAcceptablePitch = false
isAcceptableYaw = false

This simply sets the initial values of the properties you just added.

Now, find the invalidateFaceGeometryState() method. It’s a stub currently. Add the following code into that function:

isAcceptableRoll = false
isAcceptablePitch = false
isAcceptableYaw = false

Because no face is detected, you set the acceptable roll, pitch and yaw values to false.

Processing Faces

Next, update processUpdatedFaceGeometry() by replacing the faceFound() case with the following:

case .faceFound(let faceGeometryModel):
  let roll = faceGeometryModel.roll.doubleValue
  let pitch = faceGeometryModel.pitch.doubleValue
  let yaw = faceGeometryModel.yaw.doubleValue
  updateAcceptableRollPitchYaw(using: roll, pitch: pitch, yaw: yaw)

Here you pull the roll, pitch and yaw values for the detected face as doubles from the faceGeometryModel. You then pass these values to updateAcceptableRollPitchYaw(using:pitch:yaw:).

Now add the following into the implementation stub of updateAcceptableRollPitchYaw(using:pitch:yaw:):

isAcceptableRoll = (roll > 1.2 && roll < 1.6)
isAcceptablePitch = abs(CGFloat(pitch)) < 0.2
isAcceptableYaw = abs(CGFloat(yaw)) < 0.15

Here, you set the state for acceptable roll, pitch and yaw based on the values pulled from the face.
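
To get a feel for these thresholds, it can help to look at them in degrees. The sketch below repeats the same three comparisons as a pure function; degrees(fromRadians:) and isFacingCamera(roll:pitch:yaw:) are hypothetical helpers, not part of the project. The roll window of 1.2 to 1.6 radians is roughly 69° to 92°, so it brackets a quarter turn (π/2 ≈ 1.571), presumably because the camera buffer is rotated relative to the portrait UI. The pitch and yaw limits are about 11.5° and 8.6° either side of zero.

```swift
import Foundation

// Hypothetical helper: converts radians to degrees for easier reading.
func degrees(fromRadians radians: Double) -> Double {
  radians * 180 / Double.pi
}

// Mirrors the three checks above as a pure function (the name is illustrative).
func isFacingCamera(roll: Double, pitch: Double, yaw: Double) -> Bool {
  let isAcceptableRoll = roll > 1.2 && roll < 1.6
  let isAcceptablePitch = abs(pitch) < 0.2
  let isAcceptableYaw = abs(yaw) < 0.15
  return isAcceptableRoll && isAcceptablePitch && isAcceptableYaw
}

// A face rolled by a quarter turn with no pitch or yaw passes all three checks.
let straightOn = isFacingCamera(roll: Double.pi / 2, pitch: 0, yaw: 0)
// Turning the head by 0.3 radians (about 17 degrees) fails the yaw check.
let turnedAway = isFacingCamera(roll: Double.pi / 2, pitch: 0, yaw: 0.3)
print(straightOn, turnedAway)
```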

Finally, replace the body of calculateDetectedFaceValidity() so it uses the roll, pitch and yaw values to determine whether the face is valid:

hasDetectedValidFace =
  isAcceptableRoll &&
  isAcceptablePitch &&
  isAcceptableYaw

Now, open FaceDetector.swift. In detectedFaceRectangles(request:error:), replace the definition of faceObservationModel with the following:

let faceObservationModel = FaceGeometryModel(
  boundingBox: convertedBoundingBox,
  roll: result.roll ?? 0,
  pitch: result.pitch ?? 0,
  yaw: result.yaw ?? 0
)

This simply adds the now required roll, pitch and yaw parameters to the initialization of the FaceGeometryModel object.

Debug those Faces

It would be nice to add some information about the roll, pitch and yaw to the debug view so that you can see the values as you're using the app.

Open DebugView.swift and replace the DebugSection in the body declaration with:

DebugSection(observation: model.faceGeometryState) { geometryModel in
  DebugText("R: \(geometryModel.roll)")
    .debugTextStatus(status: model.isAcceptableRoll ? .passing : .failing)
  DebugText("P: \(geometryModel.pitch)")
    .debugTextStatus(status: model.isAcceptablePitch ? .passing : .failing)
  DebugText("Y: \(geometryModel.yaw)")
    .debugTextStatus(status: model.isAcceptableYaw ? .passing : .failing)
}

This updates the debug text to print the current values to the screen and sets the text color based on whether each value is acceptable.

Build and run.

Look straight at the camera and note how the oval is green. Now rotate your head from side to side and note how the oval turns red when you aren't looking directly at the camera. If you have debug mode turned on, notice how the yaw number changes both value and color as well.

The oval is green when the user is facing forward

The oval is red when the user is not facing forward

Selecting a Size

Next, you want the app to detect how big or small a face is within the frame of the photo. Open CameraViewModel.swift and add the following property under the isAcceptableYaw declaration:

@Published private(set) var isAcceptableBounds: FaceBoundsState {
  didSet {
    calculateDetectedFaceValidity()
  }
}

Then, set the initial value for this property at the bottom of init():

isAcceptableBounds = .unknown

As before, add the following to the end of invalidateFaceGeometryState():

isAcceptableBounds = .unknown

Next, in processUpdatedFaceGeometry(), add the following to the end of the faceFound case:

let boundingBox = faceGeometryModel.boundingBox
updateAcceptableBounds(using: boundingBox)

Then fill in the stub of updateAcceptableBounds(using:) with the following code:

// 1
if boundingBox.width > 1.2 * faceLayoutGuideFrame.width {
  isAcceptableBounds = .detectedFaceTooLarge
} else if boundingBox.width * 1.2 < faceLayoutGuideFrame.width {
  isAcceptableBounds = .detectedFaceTooSmall
} else {
  // 2
  if abs(boundingBox.midX - faceLayoutGuideFrame.midX) > 50 {
    isAcceptableBounds = .detectedFaceOffCentre
  } else if abs(boundingBox.midY - faceLayoutGuideFrame.midY) > 50 {
    isAcceptableBounds = .detectedFaceOffCentre
  } else {
    isAcceptableBounds = .detectedFaceAppropriateSizeAndPosition
  }
}

With this code, you:

  1. First check to see if the bounding box of the face is roughly the same width as the layout guide.
  2. Then, check if the bounding box of the face is roughly centered in the frame.

If both these checks pass, isAcceptableBounds is set to FaceBoundsState.detectedFaceAppropriateSizeAndPosition. Otherwise, it is set to the corresponding error case.
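
If you'd like to experiment with these tolerances outside the app, the same two checks can be written as a pure function. This is a minimal sketch under stated assumptions: SketchBoundsState and classify(faceBox:guide:) are hypothetical names that mirror the logic above, and the rectangles are made-up values. In effect, the face must be between roughly 83% and 120% of the guide's width, and its center must sit within 50 points of the guide's center on both axes.

```swift
import Foundation

// Hypothetical copy of the states used above, so the sketch is self-contained.
enum SketchBoundsState {
  case tooLarge, tooSmall, offCentre, appropriate
}

// Same comparisons as updateAcceptableBounds(using:), but returning the result.
func classify(faceBox: CGRect, guide: CGRect) -> SketchBoundsState {
  if faceBox.size.width > 1.2 * guide.size.width { return .tooLarge }
  if faceBox.size.width * 1.2 < guide.size.width { return .tooSmall }
  if abs(faceBox.midX - guide.midX) > 50 || abs(faceBox.midY - guide.midY) > 50 {
    return .offCentre
  }
  return .appropriate
}

// Made-up layout guide frame for illustration.
let guideFrame = CGRect(x: 100, y: 200, width: 200, height: 300)

let matching = classify(faceBox: guideFrame, guide: guideFrame)
let tooWide = classify(faceBox: CGRect(x: 75, y: 200, width: 250, height: 300), guide: guideFrame)
let tooNarrow = classify(faceBox: CGRect(x: 120, y: 200, width: 160, height: 300), guide: guideFrame)
let shifted = classify(faceBox: CGRect(x: 170, y: 200, width: 200, height: 300), guide: guideFrame)
print(matching, tooWide, tooNarrow, shifted)
```

Because the size checks run first, a face that is both too large and off-center is reported as too large.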

Finally, update calculateDetectedFaceValidity() to look like this:

hasDetectedValidFace =
  isAcceptableBounds == .detectedFaceAppropriateSizeAndPosition &&
  isAcceptableRoll &&
  isAcceptablePitch &&
  isAcceptableYaw

This adds a check that the bounds are acceptable.

Build and run. Move the phone toward and away from your face and note how the oval changes color.

The user is too far away

The user is properly distanced

Detecting Differences

Currently, the FaceDetector is detecting face rectangles using VNDetectFaceRectanglesRequestRevision2. iOS 15 introduced a new revision, VNDetectFaceRectanglesRequestRevision3. So what's the difference?

Version 3 provides many useful updates for detecting face rectangles, including:

  1. The pitch of the detected face is now determined. You may not have noticed, but the value for the pitch so far was always 0 because it wasn't present in the face observation.
  2. Roll, pitch and yaw values are reported in continuous space. With VNDetectFaceRectanglesRequestRevision2, roll and yaw were provided in discrete bins only. You can observe this yourself by using the app and turning your head from side to side. The yaw always jumps between 0 and ±0.785 radians.
  3. When detecting face landmarks, the location of the pupils is accurately detected. Previously, the pupils were placed at the center of the eyes even when you looked off to the side.
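
To picture what "discrete bins" means for the second point, here is a small model of the behavior. It is only a sketch: Vision's actual binning isn't shown in this tutorial, so snappedToBin(_:binWidth:) is a hypothetical function that assumes bins spaced π/4 (about 0.785 radians) apart, matching the jumps described above.

```swift
import Foundation

// Assumption: bins are spaced π/4 apart, matching the ±0.785 jumps described above.
func snappedToBin(_ angle: Double, binWidth: Double = Double.pi / 4) -> Double {
  (angle / binWidth).rounded() * binWidth
}

// A small head turn collapses to 0, while a larger one jumps straight to π/4.
let smallTurn = snappedToBin(0.30)
let largerTurn = snappedToBin(0.50)
print(smallTurn, largerTurn)
```

With continuous values, a threshold such as abs(yaw) < 0.15 becomes meaningful. With binned values, yaw could only ever pass at exactly 0.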

Time to update the app to use VNDetectFaceRectanglesRequestRevision3. You'll make use of detected pitch and observe the continuous space updates.

Open FaceDetector.swift. In captureOutput(_:didOutput:from:), update the revision property of detectFaceRectanglesRequest to revision 3:

detectFaceRectanglesRequest.revision = VNDetectFaceRectanglesRequestRevision3

Build and run.

Hold your phone up to your face. Note how the values printed in the debug output update on every frame. Pitch your head (look up to the ceiling, and down with your chin on your chest). Note how the pitch numbers also update.

The user is looking down

Masking Mayhem

Unless you've been living under a rock, you must have noticed that more and more people are wearing masks. This is great for fighting COVID, but terrible for face recognition!

Luckily, Apple has your back. With VNDetectFaceRectanglesRequestRevision3, the Vision framework can now detect faces covered by masks. While this is nice for general-purpose face detection, it's a disaster for your passport photos app. Wearing a mask is absolutely not allowed in your passport photo! So how then should you prevent people who are wearing masks from taking photos?

Luckily for you, Apple has also improved face capture quality. Face capture quality provides a score for a detected face. It takes into account attributes like lighting, occlusion (like masks!), blur, etc.

Please note that quality detection compares the same subject against copies of themselves. It does not compare one person against another. Capture quality ranges from 0 to 1. The latest revision in iOS 15 is VNDetectFaceCaptureQualityRequestRevision2.

Assuring Quality

Before requesting a quality score, your app needs a place to store the quality of the current frame. First, update the model to hold information about face quality.

Open CameraViewModel.swift. Underneath the FaceGeometryModel struct, add the following to store the quality state:

struct FaceQualityModel {
  let quality: Float
}

This struct contains a float property to store the most recent detected quality.

Under the declaration of faceGeometryState, add a property to publish face quality state:

// 1
@Published private(set) var faceQualityState: FaceObservation<FaceQualityModel> {
  didSet {
    // 2
    processUpdatedFaceQuality()
  }
}

Here's what's happening:

  1. This follows a pattern like the faceGeometryState property above. A FaceObservation enum wraps the underlying model value. FaceObservation is a generic wrapper providing type safety. It contains three states: face found, face not found and error.
  2. Updates to faceQualityState call processUpdatedFaceQuality().

Don't forget to initialize the faceQualityState in init():

faceQualityState = .faceNotFound

This sets the initial value of faceQualityState to .faceNotFound.

Next, add a new published property for acceptable quality:

@Published private(set) var isAcceptableQuality: Bool {
  didSet {
    calculateDetectedFaceValidity()
  }
}

As with the other properties, initialize it in the init() method:

isAcceptableQuality = false

Now, you can write the implementation for processUpdatedFaceQuality():

switch faceQualityState {
case .faceNotFound:
  isAcceptableQuality = false
case .errored(let error):
  print(error.localizedDescription)
  isAcceptableQuality = false
case .faceFound(let faceQualityModel):
  // A score of 0.2 or higher counts as acceptable.
  isAcceptableQuality = faceQualityModel.quality >= 0.2
}

Here, you switch over the different states of FaceObservation. An acceptable quality has a score of 0.2 or higher.
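
If the generic wrapper feels abstract, the following sketch shows one way such a type could look and how the three states reduce to a single Bool. The starter project's own FaceObservation isn't reproduced in this tutorial, so SketchFaceObservation, SketchQualityModel and isAcceptable(_:threshold:) are hypothetical names and the real definitions may differ.

```swift
import Foundation

// Hypothetical reconstruction of a generic observation wrapper with the
// three states described above. The starter project's definition may differ.
enum SketchFaceObservation<T> {
  case faceFound(T)
  case faceNotFound
  case errored(Error)
}

struct SketchQualityModel {
  let quality: Float
}

// Reduces an observation to a Bool, treating "not found" and "errored" as unacceptable.
func isAcceptable(
  _ state: SketchFaceObservation<SketchQualityModel>,
  threshold: Float = 0.2
) -> Bool {
  switch state {
  case .faceFound(let model):
    return model.quality >= threshold
  case .faceNotFound, .errored:
    return false
  }
}

let sharpFace = isAcceptable(.faceFound(SketchQualityModel(quality: 0.5)))
let maskedFace = isAcceptable(.faceFound(SketchQualityModel(quality: 0.1)))
print(sharpFace, maskedFace)
```

Writing the check as a single comparison avoids the easy mistake of setting the flag to false inside an if and then overwriting it afterward.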

Update calculateDetectedFaceValidity() to account for acceptable quality by replacing the last line with:

isAcceptableYaw && isAcceptableQuality

Handling Quality Result

The faceQualityState property is now set up to store detected face quality. But, there isn't a way for anything to update that state. Time to fix that.

In the CameraViewModelAction enum, add a new action after faceObservationDetected:

case faceQualityObservationDetected(FaceQualityModel)

Then, update the switch in perform(action:) to handle the new action:

case .faceQualityObservationDetected(let faceQualityObservation):
  publishFaceQualityObservation(faceQualityObservation)

Here, you're calling publishFaceQualityObservation() whenever the model performs the faceQualityObservationDetected action. Replace the function definition and empty implementation of publishFaceQualityObservation() with:

// 1
private func publishFaceQualityObservation(_ faceQualityModel: FaceQualityModel) {
  // 2
  DispatchQueue.main.async { [self] in
    // 3
    faceDetectedState = .faceDetected
    faceQualityState = .faceFound(faceQualityModel)
  }
}

Here, you're:

  1. Updating the function definition to pass in a FaceQualityModel.
  2. Dispatching to the main thread for safety.
  3. Updating the faceDetectedState and faceQualityState to record a face detection. The quality state stores the quality model.

Detecting Quality

Now the view model is all set up, and it's time to do some detecting. Open FaceDetector.swift.

Add a new request in captureOutput(_:didOutput:from:) after setting the revision for detectFaceRectanglesRequest:

let detectCaptureQualityRequest =
  VNDetectFaceCaptureQualityRequest(completionHandler: detectedFaceQualityRequest)
detectCaptureQualityRequest.revision =
  VNDetectFaceCaptureQualityRequestRevision2

Here, you create a new face quality request with a completion handler that calls detectedFaceQualityRequest. Then, you set it to use revision 2.

Add the request to the array passed to sequenceHandler a few lines below:

[detectFaceRectanglesRequest, detectCaptureQualityRequest],

Finally, write the implementation for the completion handler, detectedFaceQualityRequest(request:error:):

// 1
guard let model = model else {
  return
}

// 2
guard
  let results = request.results as? [VNFaceObservation],
  let result = results.first
else {
  model.perform(action: .noFaceDetected)
  return
}

// 3
let faceQualityModel = FaceQualityModel(
  quality: result.faceCaptureQuality ?? 0
)

// 4
model.perform(action: .faceQualityObservationDetected(faceQualityModel))

This implementation follows the pattern of the face rectangles completion handler above.

Here, you:

  1. Make sure the view model isn't nil, otherwise return early.
  2. Check to confirm the request contains valid VNFaceObservation results and extract the first one.
  3. Pull out the faceCaptureQuality from the result (or default to 0 if it doesn't exist). Use it to initialize a FaceQualityModel.
  4. Finally, perform the faceQualityObservationDetected action you created, passing through the new faceQualityModel.

Open DebugView.swift. After the roll/pitch/yaw DebugSection, at the end of the VStack, add a section to output the current quality:

DebugSection(observation: model.faceQualityState) { qualityModel in
  DebugText("Q: \(qualityModel.quality)")
    .debugTextStatus(status: model.isAcceptableQuality ? .passing : .failing)
}

Build and run. The debug text now shows the quality of the detected face. The shutter is only enabled if the quality is 0.2 or higher.

Showing the quality score in the debug view

Offering Helpful Hints

The app always displays the same message if one of the acceptability criteria fails. Because the model has state for each, you can make the app more helpful.

Open UserInstructionsView.swift and find faceDetectionStateLabel(). Replace the entire faceDetected case with the following:

if model.hasDetectedValidFace {
  return "Please take your photo :]"
} else if model.isAcceptableBounds == .detectedFaceTooSmall {
  return "Please bring your face closer to the camera"
} else if model.isAcceptableBounds == .detectedFaceTooLarge {
  return "Please hold the camera further from your face"
} else if model.isAcceptableBounds == .detectedFaceOffCentre {
  return "Please move your face to the centre of the frame"
} else if !model.isAcceptableRoll || !model.isAcceptablePitch || !model.isAcceptableYaw {
  return "Please look straight at the camera"
} else if !model.isAcceptableQuality {
  return "Image quality too low"
} else {
  return "We cannot take your photo right now"
}

This code picks a specific instruction depending on which criterion has failed. Build and run the app and play with moving your face into and out of the acceptable region.

Improved user instructions

Segmenting Sapiens

New in iOS 15, the Vision framework now supports person segmentation. Segmentation just means separating out a subject from everything else in the image. For example, replacing the background of an image but keeping the foreground intact — a technique you've certainly seen on a video call in the last year!

In the Vision framework, person segmentation is available using VNGeneratePersonSegmentationRequest. This feature works by analyzing a single frame at a time. There are three quality options available. Segmentation of a video stream requires analyzing the video frame by frame.

The results of the person segmentation request include a pixelBuffer. This contains a mask of the original image. White pixels represent a person in the original image and black represent the background.
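
Conceptually, using such a mask to swap the background is a per-pixel mix. The sketch below shows the idea on a single row of grayscale values. It is not how Core Image or Vision implement it: blend(foreground:background:mask:) is a hypothetical function, and the pixel values are made up.

```swift
import Foundation

// Per-pixel blend: a mask value of 1.0 keeps the foreground (person),
// a mask value of 0.0 shows the replacement background.
func blend(foreground: [Float], background: [Float], mask: [Float]) -> [Float] {
  precondition(foreground.count == background.count && foreground.count == mask.count)
  return foreground.indices.map { i in
    mask[i] * foreground[i] + (1 - mask[i]) * background[i]
  }
}

// One row of three made-up grayscale pixels from the camera.
let cameraRow: [Float] = [0.2, 0.4, 0.6]
// A pure-white replacement background.
let whiteRow: [Float] = [1.0, 1.0, 1.0]
// The first pixel is background (black in the mask), the other two are the person (white).
let maskRow: [Float] = [0.0, 1.0, 1.0]

let blendedRow = blend(foreground: cameraRow, background: whiteRow, mask: maskRow)
print(blendedRow)
```

In this function, a mask value between 0 and 1 gives a proportional mix, which is what produces a soft edge around the person.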

Passport photos need the person photographed against a pure white background. Person segmentation is a great way to replace the background but leave the person intact.

Using Metal

Before replacing the background in the image, you need to know a bit about Metal.

Metal is a very powerful API provided by Apple. It performs graphics-intensive operations on the GPU for high performance image processing. It is fast enough to process each frame in a video in real time. This sounds pretty useful!

Open CameraViewController.swift. Look at the bottom of configureCaptureSession(). The camera view controller displays the preview layer from the AVCaptureSession.

The class supports two modes: one that uses Metal and one that doesn't. Currently, it's set up to not use Metal. You'll change that now.

In viewDidLoad(), add the following code before the call to configureCaptureSession():

configureMetal()

This configures the app to use Metal. The view controller now draws the result from Metal instead of the AVCaptureSession. This isn't a tutorial on Metal, though, so the setup code is already written. Feel free to read the implementation in configureMetal() if you're curious.

With Metal configured to draw the view, you have complete control over what the view displays.

Building Better Backgrounds

The Hide Background button is above the debug button on the left-hand side of the camera controls. Toggling the button on does nothing — yet. :]

Open FaceDetector.swift. Find captureOutput(_:didOutput:from:). Underneath the declaration of detectCaptureQualityRequest, add a new person segmentation request:

// 1
let detectSegmentationRequest = VNGeneratePersonSegmentationRequest(completionHandler: detectedSegmentationRequest)
// 2
detectSegmentationRequest.qualityLevel = .balanced

Here, you:

  1. Create the segmentation request. Call the detectedSegmentationRequest method upon completion.
  2. Set the quality level. There are three quality levels available for a VNGeneratePersonSegmentationRequest: accurate, balanced and fast. The faster the algorithm runs, the lower the quality of the mask produced. Both the fast and balanced quality levels run quickly enough to be used with video data. The accurate quality level requires a static image.

Next, update the call to the sequenceHandler's perform(_:on:orientation:) method to include the new segmentation request:

[detectFaceRectanglesRequest, detectCaptureQualityRequest, detectSegmentationRequest],

Handling the Segmentation Request Result

Then add the following to detectedSegmentationRequest(request:error:):

// 1
guard
  let model = model,
  let results = request.results as? [VNPixelBufferObservation],
  let result = results.first,
  let currentFrameBuffer = currentFrameBuffer
else {
  return
}

// 2
if model.hideBackgroundModeEnabled {
  // 3
  let originalImage = CIImage(cvImageBuffer: currentFrameBuffer)
  let maskPixelBuffer = result.pixelBuffer
  let outputImage = removeBackgroundFrom(image: originalImage, using: maskPixelBuffer)
  viewDelegate?.draw(image: outputImage.oriented(.upMirrored))
} else {
  // 4
  let originalImage = CIImage(cvImageBuffer: currentFrameBuffer).oriented(.upMirrored)
  viewDelegate?.draw(image: originalImage)
}

In this code, you:

  1. Pull out the model, the first observation and the current frame buffer, or return early if any are nil.
  2. Query the model for the state of the background hiding mode.
  3. If hiding, create a core image representation of the original frame from the camera. Also, create a mask of the person segmentation result. Then, use those two to create an output image with the background removed.
  4. Otherwise, the original image is recreated without change when not hiding the background.

In either case, a delegate method on the view is then called to draw the image in the frame. This will use the Metal pipeline discussed in the previous section.

Removing the Background

Replace the implementation of removeBackgroundFrom(image:using:):

// 1
var maskImage = CIImage(cvPixelBuffer: maskPixelBuffer)

// 2
let originalImage = image.oriented(.right)

// 3
let scaleX = originalImage.extent.width / maskImage.extent.width
let scaleY = originalImage.extent.height / maskImage.extent.height
maskImage = maskImage.transformed(by: .init(scaleX: scaleX, y: scaleY)).oriented(.upMirrored)

// 4
let backgroundImage = CIImage(color: .white).clampedToExtent().cropped(to: originalImage.extent)

// 5
let blendFilter = CIFilter.blendWithRedMask()
blendFilter.inputImage = originalImage
blendFilter.backgroundImage = backgroundImage
blendFilter.maskImage = maskImage

// 6
if let outputImage = blendFilter.outputImage?.oriented(.left) {
  return outputImage
}

// 7
return originalImage

Here, you:

  1. Create a core image of the mask using the segmentation mask from the pixel buffer.
  2. Then, you rotate the original image to the right. The segmentation mask results are rotated by 90 degrees relative to the camera. Thus, you need to align the image and the mask before blending.
  3. Similarly, the mask image isn't the same size as the video frame pulled straight from the camera. So, scale the mask image to fit.
  4. Next, create a pure-white image the same size as the original image. clampedToExtent() creates an image with infinite width and height. You then crop it to the size of the original image.
  5. Now comes the actual work. Create a core image filter that blends the original image with the all-white image. Use the segmentation mask image as the mask.
  6. Finally, rotate the output from the filter back to the left and return it.
  7. Or, return the original image if the blended image couldn't be created.
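
The scaling in step 3 is plain division, but it's worth seeing why the two axes are computed separately. In this sketch, scaleFactors(from:to:) is a hypothetical helper and both sizes are made-up numbers chosen for easy arithmetic, not real camera or mask dimensions.

```swift
import Foundation

// Returns the x and y factors that stretch `mask` so it covers `frame`.
func scaleFactors(from mask: CGSize, to frame: CGSize) -> (x: CGFloat, y: CGFloat) {
  (x: frame.width / mask.width, y: frame.height / mask.height)
}

// Hypothetical sizes: the mask and the frame have different aspect ratios.
let factors = scaleFactors(
  from: CGSize(width: 512, height: 384),
  to: CGSize(width: 1920, height: 1080)
)
print(factors.x, factors.y) // 3.75 2.8125
```

When the aspect ratios differ, as they do here, a single uniform factor would leave the mask misaligned along one axis.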

Build and run. Toggle the Hide Background button on and off. Watch as the background around your body disappears.

Person segmentation

Saving the Picture

Your passport photo app is almost complete!

One more task remains: taking and saving a photo. Start by opening CameraViewModel.swift and adding a new published property underneath the isAcceptableQuality property declaration:

@Published private(set) var passportPhoto: UIImage?

passportPhoto is an optional UIImage that represents the last photo taken. It's nil before the first photo is taken.

Next, add two more actions as cases to the CameraViewModelAction enum:

case takePhoto
case savePhoto(UIImage)

The first action is performed when the user presses the shutter button. The second is performed after the image has been processed and is ready to save to the camera roll.

Next, add handlers for the new actions to the end of the switch statement in perform(action:):

case .takePhoto:
  takePhoto()
case .savePhoto(let image):
  savePhoto(image)

Then, add the implementation of takePhoto(). This one is very simple:

shutterReleased.send()

shutterReleased is a Combine PassthroughSubject that publishes a void value. Any part of the app holding a reference to the view model can subscribe to an event of the user releasing the shutter.
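
Here is a minimal, standalone sketch of that publish-and-subscribe relationship. It needs an Apple platform, because Combine isn't available elsewhere. SketchShutter and ReleaseCounter are hypothetical types used only for illustration; only PassthroughSubject, sink, store(in:) and send() come from Combine.

```swift
import Combine

// Hypothetical stand-in for the view model's shutter subject.
final class SketchShutter {
  // Publishes an event with no payload each time the shutter is released.
  let shutterReleased = PassthroughSubject<Void, Never>()

  func takePhoto() {
    shutterReleased.send()
  }
}

// Hypothetical subscriber state; in the app, this role is played by the face detector.
final class ReleaseCounter {
  var count = 0
}

let shutter = SketchShutter()
let counter = ReleaseCounter()
var subscriptions = Set<AnyCancellable>()

// Each send() on the subject runs this closure once.
shutter.shutterReleased
  .sink { _ in counter.count += 1 }
  .store(in: &subscriptions)

shutter.takePhoto()
shutter.takePhoto()
print(counter.count) // 2
```

A PassthroughSubject doesn't remember past events, so a subscriber only sees releases that happen after it subscribes.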

Add the implementation of savePhoto(_:), which is nearly as simple:

// 1
UIImageWriteToSavedPhotosAlbum(photo, nil, nil, nil)
// 2
DispatchQueue.main.async { [self] in
  // 3
  passportPhoto = photo
}

Here, you:

  1. Write the provided UIImage to the photo album on your phone.
  2. Dispatch to the main thread as needed for all the UI operations.
  3. Set the current passport photo to the photo passed into the method.

Next, open CameraControlsFooterView.swift and wire up the controls. Replace the print("TODO") in the ShutterButton action closure with the following:

model.perform(action: .takePhoto)

This tells the view model to perform the shutter release.

Then, update the ThumbnailView to show the passport photo by passing it from the model:

ThumbnailView(passportPhoto: model.passportPhoto)

Finally, open FaceDetector.swift and make the necessary changes to capture and process the photo data. First, add a new property to the class after defining the currentFrameBuffer property:

var isCapturingPhoto = false

This flag indicates that the next frame should capture a photo. You set this whenever the view model's shutterReleased property publishes a value.

Find the weak var model: CameraViewModel? property and update it like so:

weak var model: CameraViewModel? {
  didSet {
    // 1
    model?.shutterReleased.sink { completion in
      switch completion {
      case .finished:
        break
      case .failure(let error):
        print("Received error: \(error)")
      }
    } receiveValue: { [weak self] _ in
      // 2
      self?.isCapturingPhoto = true
    }
    .store(in: &subscriptions)
  }
}

Here, you:

  1. Observe updates to model's shutterReleased property after it's set.
  2. Set the isCapturingPhoto property to true when the shutter gets released.

Saving to Camera Roll

Next, in captureOutput(_:didOutput:from:), immediately before initializing detectFaceRectanglesRequest, add the following:

if isCapturingPhoto {
  isCapturingPhoto = false
  savePassportPhoto(from: imageBuffer)
}

Here, you reset the isCapturingPhoto flag if needed and call a method to save the passport photo with the data from the image buffer.

Finally, write the implementation for savePassportPhoto(from:):

// 1
guard let model = model else {
  return
}

// 2
imageProcessingQueue.async { [self] in
  // 3
  let originalImage = CIImage(cvPixelBuffer: pixelBuffer)
  var outputImage = originalImage

  // 4
  if model.hideBackgroundModeEnabled {
    // 5
    let detectSegmentationRequest = VNGeneratePersonSegmentationRequest()
    detectSegmentationRequest.qualityLevel = .accurate

    // 6
    try? sequenceHandler.perform(
      [detectSegmentationRequest],
      on: pixelBuffer,
      orientation: .leftMirrored
    )

    // 7
    if let maskPixelBuffer = detectSegmentationRequest.results?.first?.pixelBuffer {
      outputImage = removeBackgroundFrom(image: originalImage, using: maskPixelBuffer)
    }
  }

  // 8
  let coreImageWidth = outputImage.extent.width
  let coreImageHeight = outputImage.extent.height

  let desiredImageHeight = coreImageWidth * 4 / 3

  // 9
  let yOrigin = (coreImageHeight - desiredImageHeight) / 2
  let photoRect = CGRect(x: 0, y: yOrigin, width: coreImageWidth, height: desiredImageHeight)

  // 10
  let context = CIContext()
  if let cgImage = context.createCGImage(outputImage, from: photoRect) {
    // 11
    let passportPhoto = UIImage(cgImage: cgImage, scale: 1, orientation: .upMirrored)

    // 12
    DispatchQueue.main.async {
      model.perform(action: .savePhoto(passportPhoto))
    }
  }
}

It looks like a lot of code! Here's what's happening:

  1. First, return early if the model hasn't been set up.
  2. Next, dispatch to a background queue to keep the UI snappy.
  3. Create a core image representation of the input image and a variable to store the output image.
  4. Then, if the user has requested the background to be removed...
  5. Create a new person segmentation request, this time without a completion handler. You want the best possible quality for the passport photo, so set the quality to accurate. This works here because you're only processing a single image, and you're performing it on a background thread.
  6. Perform the segmentation request.
  7. Read the results synchronously. If a mask pixel buffer exists, remove the background from the original image. Do this by calling removeBackgroundFrom(image:using:), passing it the more accurate mask.
  8. At this point, outputImage contains the passport photo with the desired background. The next step is to set the width and height for the passport photo. Remember the passport photo may not have the same aspect ratio as the camera.
  9. Calculate the frame of the photo, using the full width and the vertical center of the image.
  10. Convert the output image (a Core Image object) to a Core Graphics image.
  11. Then, create a UIImage from the core graphics image.
  12. Dispatch back to the main thread and ask the model to perform the save photo action.
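
The crop arithmetic in steps 8 and 9 is easy to check in isolation. The sketch below repeats it as a pure function: passportCropRect(imageWidth:imageHeight:) is a hypothetical helper, and the 1080 × 1920 frame is a made-up size chosen for easy arithmetic. It assumes the image is at least 4/3 as tall as it is wide; otherwise, the origin would be negative.

```swift
import Foundation

// Computes a vertically centered crop with a 3:4 width-to-height ratio
// that uses the full width of the image.
func passportCropRect(imageWidth: CGFloat, imageHeight: CGFloat) -> CGRect {
  let desiredHeight = imageWidth * 4 / 3
  let yOrigin = (imageHeight - desiredHeight) / 2
  return CGRect(x: 0, y: yOrigin, width: imageWidth, height: desiredHeight)
}

// Hypothetical 1080 × 1920 frame.
let cropRect = passportCropRect(imageWidth: 1080, imageHeight: 1920)
print(cropRect.origin.y, cropRect.size.height) // 240.0 1440.0
```

Here, the crop keeps the middle 1,440 rows and trims 240 from the top and from the bottom.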


Build and run. Align your face properly and take a photo with and without background hiding enabled. After taking a photo, a thumbnail will appear on the right-hand side of the footer. Tapping the thumbnail will load a detail view of the image. If you open the Photos app, you'll also find your photos saved to the camera roll.

Note how the quality of the background replacement is better in the still image than it was in the video feed.

Primary view showing a thumbnail

Photo detail view

Where to Go From Here?

In this tutorial, you learned how to use the updated Vision framework in iOS 15 to query roll, pitch and yaw in real time. You also learned about the new person segmentation APIs.

There are still ways to improve the app. For example, you could look at using Core Image's smile detector to prevent smiling photos. Or you could invert the mask to check if the real background is white when not hiding the background.

You could also look at publishing hasDetectedValidFace through a Combine stream. By throttling the stream, you could stop the UI from flickering fast when a face is on the edge of being acceptable.
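
As a rough illustration of the flicker problem, here is a sketch of a different, simpler technique than a throttled Combine stream: a small value type that only changes its output after the input has held steady for a number of consecutive frames. StableFlag is a hypothetical type, and counting frames is a stand-in for the time-based throttling suggested above, not an implementation of it.

```swift
import Foundation

// Hypothetical helper: only changes its value after the raw input has held
// the same value for `requiredStreak` consecutive samples.
struct StableFlag {
  private(set) var value: Bool
  private var candidate: Bool
  private var streak = 0
  let requiredStreak: Int

  init(initial: Bool, requiredStreak: Int) {
    self.value = initial
    self.candidate = initial
    self.requiredStreak = requiredStreak
  }

  mutating func update(with raw: Bool) {
    if raw == candidate {
      streak += 1
    } else {
      candidate = raw
      streak = 1
    }
    if streak >= requiredStreak {
      value = candidate
    }
  }
}

var flag = StableFlag(initial: false, requiredStreak: 3)

// A face hovering on the edge of validity: the raw value flips back and forth.
for raw in [true, false, true, true] {
  flag.update(with: raw)
}
// Still false, because no value has held for three samples in a row.
let afterFlicker = flag.value

// A third consecutive `true` finally flips the stable value.
flag.update(with: true)
let afterSteady = flag.value
print(afterFlicker, afterSteady)
```

The trade-off is latency: the larger the required streak, the steadier the UI, but the longer the user waits for the oval to change color.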

The Apple documentation is a great resource for learning more about the Vision framework. If you want to learn more about Metal, try this excellent tutorial to get you started.

We hope you enjoyed this tutorial. If you have any questions or comments, please join the forum discussion below!

