Chapters

Hide chapters

Machine Learning by Tutorials

Second Edition · iOS 13 · Swift 5.1 · Xcode 11

Before You Begin

Section 0: 3 chapters
Show chapters Hide chapters

Section I: Machine Learning with Images

Section 1: 10 chapters
Show chapters Hide chapters

13. Sequence Classification
Written by Chris LaPollo

Heads up... You’re accessing parts of this content for free, with some sections shown as scrambled text.

Heads up... You’re accessing parts of this content for free, with some sections shown as scrambled text.

Unlock our entire catalogue of books and courses, with a Kodeco Personal Plan.

Unlock now

If you’ve followed along with the last couple chapters, you’ve learned some things about how working with sequences differs from other types of data, and you got some practice collecting and cleaning datasets. You also trained a neural network to recognize user gestures from iPhone sensor data. Now you’ll use your trained model in a game where players have just a few seconds to perform an activity announced by the app. When you’ve finished, you’ll have learned how to feed data from your device into your model to classify user activity.

This chapter picks up where the last one ended — just after you added your classification model to the GestureIt project. If you didn’t go through the previous chapter and train your own model, don’t fret! You can always use the GestureIt starter project found in the chapter resources. Either way, once you have the project open in Xcode, you’re ready to go!

Classifying human activity in your app

You trained a model and added it to the GestureIt project in the last chapter, and you learned a bit about how that model works. Now take a quick look through the project to see what else is there. The project’s Info.plist file already includes the keys necessary to use Core Motion, explained earlier when you built the GestureDataRecorder project.

GestureIt’s interface (not shown here) is even simpler than GestureDataRecorder’s — it’s just two buttons: Play and Instructions. Choosing Instructions shows videos of each gesture, and Play starts a game.

While playing, the game speaks out gestures for the player to make, awarding one point for each correctly recognized gesture. The game ends when the app recognizes an incorrect gesture or if the player takes too long.

The project already includes the necessary gameplay logic, but if you play it now you’ll always run out of time before scoring any points. If you want it to recognize what the player is doing, you’ll need to wire up its brain.

All the code you write for the rest of this chapter goes in GameViewController.swift, so open that file in Xcode to get started.

This file already imports the Core Motion framework and includes all the necessary code to use it. Its implementations of enableMotionUpdates and disableMotionUpdates are almost identical to what you wrote in the GestureDataRecorder project. The differences are minor and you should have no problem understanding them. As was the case with that project, this file contains a method named process(motionData:) that the app calls whenever it receives device motion data. At the moment it’s empty, but you’ll implement it later. For now, import the Core ML framework by adding the following line with the other imports near the top of the file:

import CoreML

In order to keep your code tidy and more easily maintainable, you’ll store numeric configuration values as constants in the Config struct at the top of the class, just like you did in the GestureDataRecorder project. To start, add the following three constants to that struct:

static let samplesPerSecond = 25.0
static let numberOfFeatures = 6
static let windowSize = 20

These values must match those of the model you trained. You’ll use samplesPerSecond to ensure the app processes motion data at the same rate your model saw it during training. The dataset provided in this chapter’s resources was collected at 25 samples per second, so that’s the value used here. However, change this value if you train your own model using data fed to it at a different rate.

Note: In case it’s not clear why the app’s samplesPerSecond must match that of the dataset used to train your model, consider this example: Imagine you trained your model using a prediction window of 200 samples, on data collected at 100 samples per second. That means the model would learn to recognize actions seen in highly detailed, two-second chunks. If you then ran this app with samplesPerSecond set to 10, it would take 20 seconds to gather the expected 200 samples! Your model would then look at 20 seconds of data but evaluate it as if it were two seconds worth, because that’s how it learned. This would almost certainly make the patterns in these sequences appear different from what the model saw during training. Remember, machine learning models only work well with data that is similar to what they saw during training, so getting the sampling rate wrong here could make a perfectly good model seem completely broken.

Likewise, the model discussed in this chapter expects data in blocks of 20 samples at a time, with six features for each sample. The windowSize and numFeatures constants capture those expectations.

Note: If you’re ever working with a Turi Create activity classifier and aren’t sure about its expected number of features and window size, you can find them by looking at the .mlmodel file in Xcode’s Project Navigator. However, this does not include information about the rate at which motion data needs to be processed, so that you’ll just need to know.

Now that you’ve added those constants, you can complete the starter code’s implementation of enableMotionUpdates by setting the CMMotionManager’s update interval. To do so, add the following line inside enableMotionUpdates, just before the call to startDeviceMotionUpdates:

motionManager.deviceMotionUpdateInterval = 1.0 / Config.samplesPerSecond

Just like you did in GestureDataRecorder, this tells motionManager to deliver motion updates to your app 25 times per second — once every 0.04 seconds.

Core ML models, such as GestureClassifier, expect their input in the form of MLMultiArray objects. Unfortunately, working with these objects involves quite a bit of type casting. Swift’s type safety is great, and explicit type casting forces developers to be more thoughtful about their code — but I think we can all agree code gets pretty ugly when there’s too much casting going on. To keep that ugliness — and the extra typing it requires — to a minimum, you’ll be isolating any MLMultiArray-specific code within convenience methods. Add the first of these methods below the MARK: - Core ML methods comment in GameViewController:

static private func makeMLMultiArray(numberOfSamples: Int) -> MLMultiArray? {
  try? MLMultiArray(
    shape: [1, numberOfSamples, Config.numberOfFeatures] as [NSNumber],
    dataType: .double)
}

This function takes as input the number of samples the array should contain. It then attempts to make an MLMultiArray with a shape and data type that will work with our model: [1, numSamples, Config.numFeatures] and double, respectively. Notice how the shape needs to be cast as an array of NSNumbers — you’ll see a lot of those types of casts when dealing with MLMultiArrays.

Attempting to create an MLMultiArray can fail by throwing an exception. If that occurs here, the try? causes this function to return nil. This might occur in situations such as when there is insufficient memory to create the requested array. Hopefully it doesn’t ever happen, but you’ll add some code to deal with that possibility a bit later.

Now that you have that handy function, you’ll use it to create space to store motion data to use as input to your model. Add the following property, this time to the area under the // MARK: - Core ML properties comment:

let modelInput: MLMultiArray! =
  GameViewController.makeMLMultiArray(numberOfSamples: Config.windowSize)

This creates the modelInput array, appropriately sized for the model you trained. Later you’ll populate this array with motion data prior to passing it to your model for classification.

Note: You may have noticed that modelInput is declared as an implicitly unwrapped optional, but makeMLMultiArray can return nil. Doesn’t that mean you run the risk of crashing your app elsewhere if you try to unwrap modelInput when it’s nil? Normally, that would be a problem, but later you’ll add some code that ensures this can never happen.

Overlapping prediction windows

Now, you could work with just a single MLMultiArray like modelInput, repeatedly filling it up over time and passing it to the model.

Reusing a single array to make predictions
Raigijp i liyvti uhsow to qodu wfobudfoatl

What if an activity spans across predictions?
Gyoh id aw aqrakell gjibs iyfiny lmicomfoiyb?

What if one prediction sees data for multiple activities?
Fqil el uwu srijoldoel luom veva pox celxifde izdemifuew?

Overlapping predictions
Ajeldihcert rxovutxuawp

static let windowOffset = 5
static let numberOfWindows = windowSize / windowOffset
Gesture It’s overlapping predictions — windowSize=20, windowOffset=5
Mosgewa Aw’m apiwbesgonm dbocibteezy — nadrulFepe=13, pimzafEfvsag=2

static let bufferSize =
  windowSize + windowOffset * (numberOfWindows - 1)
let dataBuffer: MLMultiArray! =
  GameViewController.makeMLMultiArray(numberOfSamples: Config.bufferSize)
var bufferIndex = 0
var isDataAvailable = false
Buffer contents over time
Jawhag nemlipww ameh pape

Buffering motion data

Now you’re going to add code to handle MLMultiArrays that end up as nil. Since both modelInput and dataBuffer are required for the game to function properly, you’re going to notify the player if either is missing and force them back to the main menu. However, you may want to make your own apps more robust. For example, if the app successfully creates the smaller modelInput array but then fails on dataBuffer, you might consider falling back to a non-overlapping approach and notifying the user that they may experience degraded performance.

guard modelInput != nil, dataBuffer != nil else {
  displayFatalError("Failed to create required memory storage")
  return
}
@inline(__always) func addToBuffer(
  _ sample: Int, _ feature: Int, _ value: Double) {
  dataBuffer[[0, sample, feature] as [NSNumber]] =
    value as NSNumber
}
// 1
func buffer(motionData: CMDeviceMotion) {
  // 2
  for offset in [0, Config.windowSize] {
    let index = bufferIndex + offset
    if index >= Config.bufferSize {
      continue
    }
    // 3
    addToBuffer(index, 0, motionData.rotationRate.x)
    addToBuffer(index, 1, motionData.rotationRate.y)
    addToBuffer(index, 2, motionData.rotationRate.z)
    addToBuffer(index, 3, motionData.userAcceleration.x)
    addToBuffer(index, 4, motionData.userAcceleration.y)
    addToBuffer(index, 5, motionData.userAcceleration.z)
  }
}
static let windowSizeAsBytes = doubleSize * numberOfFeatures * windowSize
static let windowOffsetAsBytes = doubleSize * numberOfFeatures * windowOffset
// 1
guard expectedGesture != nil else {
  return
}
// 2
buffer(motionData: motionData)
// 3
bufferIndex = (bufferIndex + 1) % Config.windowSize
// 4
if bufferIndex == 0 {
  isDataAvailable = true
}
// 5
if isDataAvailable &&
   bufferIndex % Config.windowOffset == 0 &&
   bufferIndex + Config.windowOffset <= Config.windowSize {
  // 6
  let window = bufferIndex / Config.windowOffset
  // 7
  memcpy(modelInput.dataPointer,
         dataBuffer.dataPointer.advanced(
           by: window * Config.windowOffsetAsBytes),
         Config.windowSizeAsBytes)
  // 8
  // TODO: predict the gesture
}

Making predictions with your model

At long last, your project is ready to start recognizing gestures. Almost. So far the app contains a lot of data processing and business logic — it still needs the machine learning bit!

let gestureClassifier = GestureClassifier()
var modelOutputs = [GestureClassifierOutput?](
  repeating: nil,
  count: Config.numberOfWindows)
static let predictionThreshold = 0.9
func predictGesture(window: Int) {
  // 1
  let previousOutput = modelOutputs[window]
  let modelOutput = try?
    gestureClassifier.prediction(
      features: modelInput,
      hiddenIn: previousOutput?.hiddenOut,
      cellIn: previousOutput?.cellOut)
  // 2
  modelOutputs[window] = modelOutput

  guard
    // 3
    let prediction = modelOutput?.activity,
    let probability = modelOutput?.activityProbability[prediction],
    // 4
    prediction != Config.restItValue,
    // 5
    probability > Config.predictionThreshold
  else {
      return
  }

  // 6
  if prediction == expectedGesture {
    updateScore()
  } else {
    gameOver(incorrectPrediction: prediction)
  }
  // 7
  expectedGesture = nil
}
predictGesture(window: window)
func resetPredictionWindows() {
  // 1
  bufferIndex = 0
  // 2
  isDataAvailable = false
  // 3
  for i in 0..<modelOutputs.count {
    modelOutputs[i] = nil
  }
}
resetPredictionWindows()

Challenges

Challenge 1: Expanding Gesture

It would be a good way to get some practice with activity recognition. Adding new gesture types to the GestureDataRecorder project is a straightforward process, so start there, and then collect some data. Next, add your new data to the provided dataset and train a new model. Replace the model in the GestureIt project with your newly trained model, and make the few modifications necessary to add your new gesture to the game.

Challenge 2: Recognizing activites

After that, you could try recognizing activities other than gestures. For example, you could make an app that automatically tracks the time a user spends doing different types of exercises. Building a dataset for something like that will be more difficult, because you have less control over the position of the device and more variation in what each activity looks like. In those cases, you’ll need to collect a more varied dataset from many different people to train a model that will generalize well.

Challenge 3: Using other devices

Keep in mind, these models work on other devices, too. The Apple Watch is a particularly fitting choice — a device containing multiple useful sensors, that remains in a known position on the user and is worn for all or most of the day. If you have access to one, give it a try!

Key points

  • Use overlapping prediction windows to provide faster, more accurate responses.
  • Call your model’s prediction method to classify data.
  • Pass multi-feature inputs to your models via MLMultiArray objects.
  • Arrange input feature values in the same order you used during training. The model will produce invalid results if you arrange them in any other order.
  • When processing sequences over multiple calls to prediction, pass the hidden and cell state outputs from one timestep as additional inputs to the next timestep.
  • Ignore predictions made with probabilities lower than some reasonable threshold. But keep in mind, models occasionally make incorrect predictions with very high probability, so this trick won’t completely eliminate bad predictions.
Have a technical question? Want to report a bug? You can ask questions and report bugs to the book authors in our official book forum here.
© 2025 Kodeco Inc.

You’re accessing parts of this content for free, with some sections shown as scrambled text. Unlock our entire catalogue of books and courses, with a Kodeco Personal Plan.

Unlock now