Alexa, please stop listening to me.

Amazon's patent application describing how Alexa could listen and record 24/7.

Sep 04, 2020

Hi there! Thank you so much for reading Patents Today! We are a small community of 369 curious readers. I’d like to set a goal of 1,000 subscribers before the end of this year. We have had great growth, and I believe we can get there! Please share this newsletter! Thank you, and enjoy this week’s article :)

Brief Summary

Amazon has submitted a patent application (not yet granted) detailing how Alexa could constantly listen and record users’ speech before they say ‘Alexa’.

Date of patent application: May 23, 2019

[2] Link to the patent application

Disclaimer: This is my personal take on the patent and opinions from other people may differ.

Full Summary

Problem with Alexa currently

Amazon dominates the smart speaker market, owning 70% of it. Currently, to use Alexa, users must say a ‘wakeword’ to trigger Alexa to start recording speech and processing commands. For example, when I say ‘Alexa, what’s the weather today?’, the wakeword ‘Alexa’ triggers the smart speaker to start recording my speech until I pause. Alexa then sends my recorded speech to Amazon servers, which process the command.

“A user may not always structure a spoken command in the form of a wakeword followed by a command (e.g. “Alexa play some music”). Instead a user may include the command before the wakeword (e.g. “play some music Alexa”) or even insert the wakeword in the middle of a command (e.g. “play some music Alexa, the Beatles please”). While such phrasing may be natural for a user, certain speech processing systems are not configured to handle commands that are not preceded by a wakeword.” [1]

Amazon acknowledges that this isn’t the ideal way to interact with Alexa, because colloquial speech is not always structured. When users forget to say the wakeword before the command, they must repeat the command in the proper structure for Alexa to understand it, which makes for a clumsy experience.
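To make the limitation concrete, here is a toy sketch of wakeword-first handling. It works on text rather than audio, and the function name and the plain string matching are my own assumptions for illustration, not anything taken from the patent or from how Alexa is actually built.

```python
WAKEWORD = "alexa"  # assumption: the wakeword is matched as a plain lowercase word


def command_after_wakeword(utterance: str) -> str:
    """Return only the words spoken after the wakeword, or "" if there are none."""
    # Normalize each word: drop surrounding punctuation and lowercase it.
    words = [w.strip(",.?!").lower() for w in utterance.split()]
    if WAKEWORD not in words:
        return ""
    start = words.index(WAKEWORD) + 1
    return " ".join(w for w in words[start:] if w)


print(command_after_wakeword("Alexa, play some music"))                     # play some music
print(command_after_wakeword("play some music Alexa"))                      # (empty line)
print(command_after_wakeword("play some music Alexa, the Beatles please"))  # the beatles please
```

The second and third phrasings from the quote lose some or all of the command, because everything said before the wakeword is never captured.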

Currently, smart speakers and voice assistants are used for very rudimentary tasks and are not integrated seamlessly into our daily lives. If Amazon is working towards Alexa becoming an actual ‘assistant’, it must understand non-structured human speech.

Proposed solution

The problem isn’t necessarily that we have to use a wakeword to trigger Alexa. It’s that we must always structure our speech in the same way (the wakeword must precede the command), which makes it far from ideal.

Amazon details a solution where Alexa would constantly listen and record speech, and once it hears a wakeword, it would analyze the speech before and after the wakeword to determine what the command was. This makes for a much more natural interaction with Alexa and all the examples from quote [1] would be understood by Alexa.

This figure shows the general architecture of the proposed speech processing system.

“As part of a distributed speech processing system, the local device may be configured to continuously send all detected audio to the remote device.”

For example, if I were to say “play some music Alexa, the Beatles please”, Alexa would be recording my entire speech and once it hears the wakeword ‘Alexa’, it looks at the words before and after the wakeword to determine what the command was. Essentially, it seems Alexa would listen and record everything it hears and once it hears the wakeword, it begins processing what it should do.

You can see in this figure that after the wakeword is said (#708), Alexa has a buffer of recorded speech before and after the wakeword. Alexa would then send this entire recorded speech to Amazon servers to process the command.
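The buffering idea can be sketched roughly as follows. This is a minimal toy version that works on a stream of words instead of audio. The `<pause>` marker, the `capture_command` name, the buffer size, and the rule that a pause discards the previous utterance are all assumptions of mine for illustration, not details from the patent.

```python
from collections import deque

WAKEWORD = "alexa"
PAUSE = "<pause>"  # hypothetical marker standing in for detected silence


def capture_command(stream, pre_buffer_size=8):
    """Return (before, after): the words around the wakeword in a word stream."""
    before = deque(maxlen=pre_buffer_size)  # rolling buffer of the most recent words
    after = []
    heard_wakeword = False
    for token in stream:
        if token == PAUSE:
            if heard_wakeword:
                break       # the utterance containing the wakeword has ended
            before.clear()  # assumption: a pause ends the previous utterance
            continue
        word = token.strip(",.?!").lower()
        if not heard_wakeword and word == WAKEWORD:
            heard_wakeword = True
        elif heard_wakeword:
            after.append(word)
        else:
            before.append(word)
    if not heard_wakeword:
        return [], []
    return list(before), after


stream = "nice weather today <pause> play some music Alexa, the Beatles please <pause> anyway".split()
before, after = capture_command(stream)
print(before)  # ['play', 'some', 'music']
print(after)   # ['the', 'beatles', 'please']
```

Both halves would then be handed to the command processor together, which is what lets the mid-sentence wakeword example work.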

Security concern

While it may be interesting and useful for Alexa to understand non-structured speech, the big concern for many people would be security and privacy. With hacks and data leaks becoming more common, having every conversation recorded and sent to Amazon servers is a huge security risk and a breach of privacy for many people. For years, people have speculated that large tech companies already listen to them 24/7, and now this pending patent could make those speculations true.

“Another drawback to such an approach is that privacy concerns may make it undesirable for a local device to send all captured audio to a remote device.”

Amazon has acknowledged these security and privacy concerns around ‘always on speech recording and processing’ and detailed some possibilities to reduce them. A simple and likely option is to let the user control how much audio Alexa can record. For example, Alexa could be customized to record only 30-second (or whatever time frame) snippets of audio before clearing its memory. This would mean that Alexa could hold at most 30 seconds of recorded audio from the most recent conversation (similar to the sliding window concept in computer science).
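For readers unfamiliar with the sliding window idea, here is a small sketch of a buffer that only ever retains the most recent 30 seconds. The class name, the timestamps, and the string "chunks" are hypothetical stand-ins for real audio, and this is my own illustration of the concept, not how the patent says it would be implemented.

```python
from collections import deque


class RollingAudioBuffer:
    """Keeps only the audio chunks recorded within the last `max_seconds`."""

    def __init__(self, max_seconds=30.0):
        self.max_seconds = max_seconds
        self._chunks = deque()  # (timestamp, chunk) pairs, oldest first

    def add(self, timestamp, chunk):
        self._chunks.append((timestamp, chunk))
        # Discard anything that has fallen outside the retention window.
        while self._chunks and timestamp - self._chunks[0][0] > self.max_seconds:
            self._chunks.popleft()

    def contents(self):
        return [chunk for _, chunk in self._chunks]


buf = RollingAudioBuffer(max_seconds=30)
for t in range(0, 100, 10):  # one hypothetical chunk every 10 seconds
    buf.add(t, f"chunk@{t}s")
print(buf.contents())  # ['chunk@60s', 'chunk@70s', 'chunk@80s', 'chunk@90s']
```

Shrinking `max_seconds` is the knob a user-facing setting would presumably turn: a smaller window means less audio is ever held at once.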

Although this may reduce the amount Alexa can record, I would imagine Amazon would have to do far more before people would be okay with this technology in their houses and offices.

Finishing Thoughts

I wonder if adding GPT-3 to Alexa would produce better and more ‘human-like’ results. GPT-3 has been immensely powerful in understanding text and handling tasks. Alexa + GPT-3 may allow Alexa to understand and complete much more complex tasks and seem more human.

Even today, Alexa is technically ‘always listening’, since it is always listening for the wakeword before it begins recording and processing commands. Alexa just doesn’t start recording and sending speech to servers until it hears the wakeword (or so Amazon says). If Alexa had this new ‘always on speech recording and processing’ technology, would it actually be any different in terms of privacy? As long as Alexa isn’t constantly sending speech to Amazon’s servers and is only storing it in its local memory (until it hears the wakeword), is it any different from the Alexa smart speakers of today?

Would you be comfortable having an Amazon smart speaker with this ‘always on speech processing’ technology?

Thank you for reading!

If you enjoyed this, please consider subscribing and sharing!

If you have any feedback, please write a comment!
