Whisper and AR

“Voice is the new touch.” - Kopin CEO, John Fan

As noted in this Seeking Alpha article from 2016, the world is rapidly moving to Voice as the primary User Interface element we use to interact with our devices. In the last few years, adoption of Voice as the preferred input method has grown quickly.

Consider the breakout success of the Amazon Echo and Apple AirPods, and the proliferation of Voice engines from Apple, Google, and others.

Furthermore, Voice is seen by many as the optimal input method for AR headsets and smart glasses. Watch Kopin CEO John Fan speak about the importance of Voice for head-worn devices at MIT in January of 2019:

To Be Adopted By Consumers, Voice Must Be Flawless
While a fast typist can input data at a rate of perhaps 35 or 40 words per minute (WPM), humans speak at a rate of around 140 WPM. The advantages of Voice as a primary input method for our devices are clear. But there is a problem …

Current speech recognition accuracy is not meeting user expectations, and that is holding back adoption of Voice-based devices. Importantly, it's not the speech engines holding us back. Natural Language Processing (NLP) engines from Nuance, Apple, Microsoft, Google, Baidu, IBM, and others are very good at interpreting a clean speech signal and converting it to commands for execution.

The issue today is that the input speech signal is not “clean” or pure. It suffers from distortions. These distortions come either from ambient noise itself or from the attempts of traditional noise cancellation algorithms to remove that noise.

Paul Baker explains the challenges of ambient noise in this video from AR In Action at MIT:

Whisper - A Unique Solution To Our Noise Problem
From Kopin’s early days developing the Golden-i AR headset, it became clear that a hands-free UI would be required. Voice is a critical part of a good hands-free UI. That realization was the genesis of 10 years of development of key speech enhancement IP at Kopin. Key events along the R&D path include:

2010 - Kopin integrates Aurisound noise cancellation IP into the Golden-i OS
2012 - Kopin acquires an equity stake in NLP pioneer Ask Ziggy
2013 - Kopin acquires Aurisound’s IP and hires founder Dashen Fan, the inventor of key IP for Bluetooth headsets

In addition to the development work itself, from 2009 through 2019 Kopin filed over 30 patents related to its Voice Extraction Filter (VEF) IP, now known as Whisper. The culmination of this work is an NLP-agnostic System-on-Chip (SoC) containing the Whisper core and all the silicon customers need to integrate the IP into their devices.

The Whisper SoC enables device makers to dramatically improve speech recognition for all devices that encounter distortions such as high levels of ambient noise, the type of noise that exists in everyday life. See a demonstration of Whisper in action below for more:

Audio AR Sees Broad Consumer Adoption First
In 2018, Kopin filed a patent for a version of the Whisper SoC that contains Bluetooth integration:

Source: USPTO

This new version of Whisper is a perfect solution for True Wireless Stereo (TWS) headsets. You know these devices as AirPods or earbuds. Next, we will see such devices appear as sunglasses and everyday spectacles; the new Bose Frames Audio AR headset is a great example of just such an Audio AR transition.

The new and emerging category that is Audio AR is where we will see broad Consumer AR adoption first.

We expect that soon, for less than $200, Consumers will be able to purchase a smartphone accessory they can use for a hands-free, contextual, voice-based User Interface. This will become the preferred method to access information, interact with social media feeds, and generally manage your ecosystem of devices and apps.

Voice will serve as the primary UI for your life.

In that context, the Whisper SoC silicon could well become a required component inside all Voice-based devices. The Whisper chip currently costs around $5 in low volume. At that price, we expect developers of Voice-based devices may consider including Whisper in their designs as “insurance” against the negative user impact of distortions from the ambient noise of everyday life.

After all, if designing in a $5 component moves the needle from 90% to 98% or 99% Voice recognition accuracy, the benefit of including Whisper becomes obvious to a designer of Voice-based devices.
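
To see why a jump from 90% to 98% or 99% matters more than the raw percentages suggest, it helps to look at the error rate rather than the accuracy. The short Python sketch below does that arithmetic. It assumes the accuracy figures are word-level (the fraction of spoken words recognized correctly), which is our reading and not something the figures above specify; the function names are purely illustrative.

```python
def errors_per_100_words(accuracy: float) -> float:
    """Expected misrecognized words per 100 spoken words.

    Assumption: `accuracy` is word-level recognition accuracy in [0, 1].
    """
    return (1.0 - accuracy) * 100


def error_reduction_factor(before: float, after: float) -> float:
    """How many times fewer errors occur at `after` accuracy than at `before`."""
    return (1.0 - before) / (1.0 - after)


if __name__ == "__main__":
    baseline = 0.90
    for improved in (0.98, 0.99):
        factor = error_reduction_factor(baseline, improved)
        print(
            f"{baseline:.0%} -> {improved:.0%}: "
            f"{errors_per_100_words(baseline):.0f} vs "
            f"{errors_per_100_words(improved):.0f} errors per 100 words "
            f"(about {factor:.0f}x fewer)"
        )
```

Under that assumption, 90% accuracy means roughly 10 misheard words in every 100, while 98% or 99% means about 2 or 1. That is five to ten times fewer errors for the user to correct.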

With global sales of Voice-based devices reaching hundreds of millions of units annually, the financial benefit to Kopin from design wins for this unique $5 component is clear.

Author: Derrick Zierler - First Published: March 10, 2019
