If ‘Hey Siri’ Fails Once You Don’t Have To Worry – The Device Deals With This Failure By Becoming Extra Sensitive To Commands


A new entry in Apple’s Machine Learning Journal explains how hardware, software and internet services work together to deliver the hands-free Hey Siri feature, which is available on the latest iPhone and iPad Pro models. So how does this combination work, and why does the feature respond so readily? The entire process has a certain element of grace in it, but that’s just my opinion.

The Hey Siri sensitivity process

A tiny speech recognizer runs all the time on the embedded motion co-processor and waits for the two magic words, Hey Siri. When these words are detected, Siri treats anything said afterwards as a command or a query. The detector uses a Deep Neural Network to convert the acoustic pattern of a person’s voice into a probability distribution over speech sounds. A temporal integration process then computes a confidence score that the phrase Hey Siri was actually spoken. If the score is high enough, Siri wakes up and handles the command.
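To make the scoring step more concrete, here is a minimal Python sketch of one way per-frame probabilities could be integrated over time into a single confidence score. It is not Apple's implementation: the function name `phrase_confidence`, the idea of splitting the phrase into a fixed sequence of ordered states, and the choice of an average log-probability along the best left-to-right path are all assumptions made for illustration.

```python
import math

def phrase_confidence(frame_probs):
    """Average log-probability of the best left-to-right path.

    frame_probs[t][s] is the (assumed) network output: the probability
    that audio frame t belongs to phrase state s. States must be visited
    in order, and each frame either stays in a state or moves to the next.
    """
    n_states = len(frame_probs[0])
    neg_inf = float("-inf")
    floor = 1e-10  # avoid log(0) for frames with zero probability

    # The path has to start in the first state.
    best = [neg_inf] * n_states
    best[0] = math.log(max(frame_probs[0][0], floor))

    for probs in frame_probs[1:]:
        new = [neg_inf] * n_states
        for s in range(n_states):
            # Either stay in state s or arrive from state s - 1.
            prev = best[s] if s == 0 else max(best[s], best[s - 1])
            if prev > neg_inf:
                new[s] = prev + math.log(max(probs[s], floor))
        best = new

    # The path has to end in the last state; normalize by length.
    return best[-1] / len(frame_probs)

# Frames that move through the states in order score higher than
# frames that present the same states in the wrong order.
in_order = [[0.9, 0.1], [0.8, 0.2], [0.1, 0.9]]
out_of_order = [[0.1, 0.9], [0.2, 0.8], [0.9, 0.1]]
print(phrase_confidence(in_order) > phrase_confidence(out_of_order))
```

Dividing by the number of frames keeps scores comparable between a quickly spoken and a slowly spoken phrase, which is why the sketch returns an average rather than a sum.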

So what happens when the score is low? According to Apple, if the score falls short of the normal threshold but still exceeds a lower threshold, the device becomes more sensitive for a few seconds. This makes it easier for Siri to recognize the phrase if the user repeats it.

"This second-chance mechanism improves the usability of the system significantly, without increasing the false alarm rate too much because it is only in this extra-sensitive state for a short time," said Apple.
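The second-chance behavior can be sketched in a few lines of Python. This is only an illustration of the two-threshold idea described above: the class name `SecondChanceTrigger`, the threshold values and the three-second window are hypothetical and do not come from Apple.

```python
from dataclasses import dataclass

@dataclass
class SecondChanceTrigger:
    upper: float = 0.8      # assumed normal trigger threshold
    lower: float = 0.6      # assumed "near miss" threshold
    window_s: float = 3.0   # assumed length of the extra-sensitive state
    sensitive_until: float = float("-inf")

    def observe(self, score: float, now: float) -> bool:
        """Return True if this score should wake the assistant."""
        # While the extra-sensitive window is open, the lower threshold applies.
        in_window = now < self.sensitive_until
        threshold = self.lower if in_window else self.upper

        if score >= threshold:
            self.sensitive_until = float("-inf")  # back to the normal state
            return True

        # A near miss opens a short extra-sensitive window.
        if score >= self.lower:
            self.sensitive_until = now + self.window_s
        return False

trigger = SecondChanceTrigger()
print(trigger.observe(0.7, now=0.0))   # near miss, no trigger
print(trigger.observe(0.7, now=1.0))   # same score, repeated within the window
print(trigger.observe(0.7, now=10.0))  # same score, long after the window closed
```

Because the window closes after a few seconds, a borderline score only triggers when it closely follows another borderline score, which is how the false alarm rate stays low.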

To prevent Siri from being triggered by strangers, the company asks users to complete a short enrollment session in which they say five phrases that begin with ‘Hey Siri’. These examples are saved on the device.

"We compare the distances to the reference patterns created during enrollment with another threshold to decide whether the sound that triggered the detector is likely to be ‘Hey Siri’ spoken by the enrolled user. This process not only reduces the probability that ‘Hey Siri’ spoken by another person will trigger the iPhone, but also reduces the rate at which other, similar-sounding phrases trigger Siri," Apple explains.
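As a rough illustration of that comparison, the Python sketch below checks a new utterance against stored enrollment examples. Everything specific here is an assumption: the function name `is_enrolled_speaker`, the use of plain feature vectors, Euclidean distance, averaging over the references and the threshold value are chosen only to show the idea.

```python
import math

def is_enrolled_speaker(candidate, references, max_distance):
    """Compare a candidate vector with the enrollment references.

    candidate and each reference are equal-length feature vectors
    (hypothetical summaries of a 'Hey Siri' utterance). The utterance
    is accepted when its average distance to the references is at or
    below max_distance.
    """
    distances = [math.dist(candidate, ref) for ref in references]
    return sum(distances) / len(distances) <= max_distance

# Two made-up enrollment vectors and two candidate utterances.
references = [[0.0, 0.0], [0.0, 2.0]]
print(is_enrolled_speaker([0.0, 1.0], references, max_distance=1.5))  # close to both references
print(is_enrolled_speaker([3.0, 1.0], references, max_distance=1.5))  # far from both references
```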

The company has also collected ‘Hey Siri’ recordings at both close and far range in environments such as the kitchen, car, bedroom and restaurant. The recordings also take into account the native languages of different places around the world.

News Sources: Apple Says '