The sensor consists of an off-the-shelf 850 nm VCSEL with an embedded photodiode, packaged with the company's proprietary ASIC, which processes the sound waves detected optically from the speaker's skin vibrations. Interviewed by EETimes Europe, Rammy Bahalul, Vice President of Sales and Business Development for VocalZoom, gave more details about the technology.
"When someone speaks, the sound propagates all over the skin too, and we can measure these vibrations by detecting the laser's reflection on the skin through an interferometer. The way the interferometer works is that any back reflections interfere with the stabilized laser wavelength in the cavity, and that impacts the laser power".
The ASIC monitors the laser power fluctuations read by the built-in photodiode and turns them into a noise-free “audio” signal that can then be fused with the real audio signal recorded by a microphone, either in an audio processor or in cloud software.
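As a rough illustration of the effect Bahalul describes, the sketch below uses the textbook weak-feedback approximation of self-mixing interferometry, in which the laser power varies with the cosine of the round-trip phase 4πd/λ. The modulation depth, the distances and the function names are assumptions chosen for illustration; nothing here reflects VocalZoom's actual ASIC processing.

```python
import math

WAVELENGTH_M = 850e-9  # VCSEL wavelength quoted in the article

# One full power fringe corresponds to half a wavelength of skin displacement.
FRINGE_M = WAVELENGTH_M / 2.0  # 425 nm


def self_mixing_power(distance_m, p0=1.0, modulation=0.01):
    """Relative laser power under weak optical feedback (textbook model).

    p0 and modulation are illustrative values, not measured ones.
    """
    phase = 4.0 * math.pi * distance_m / WAVELENGTH_M  # round-trip phase
    return p0 * (1.0 + modulation * math.cos(phase))


def power_trace(rest_distance_m, amplitude_m, vib_hz, sample_hz, n_samples):
    """Power samples a photodiode would see for sinusoidally vibrating skin."""
    trace = []
    for n in range(n_samples):
        t = n / sample_hz
        d = rest_distance_m + amplitude_m * math.sin(2.0 * math.pi * vib_hz * t)
        trace.append(self_mixing_power(d))
    return trace


# Hypothetical example: skin 5 cm away, vibrating by 100 nm at 200 Hz.
samples = power_trace(0.05, 100e-9, 200.0, 8000.0, 16)
```

In this simplified model a displacement of only a few hundred nanometres already sweeps the power through a full fringe, which is why such small skin vibrations are measurable at all.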
"It is similar to bone conduction, but without contact, we can measure vibrations up to 1.5kHz", continued Bahalul, "we are not reading lips but actual facial vibrations, these can be detected from the cheeks, all around the neck and even behind the ears."
The optical sensor can be placed anywhere from a few millimetres to a metre from the speaker, making it practical for headsets, wearables, smartphones and laptops, but also for automotive applications, where it could be mounted in the rear-view mirror, and for ATMs.
When tested with leading speech recognition providers, the startup claims, its HMC sensor makes all the difference in noisy environments, even in strong and complex noise, eliminating almost all errors and making speech recognition more widely usable. In a high-noise environment, the company says it can recover the original speech, raising the signal-to-noise ratio from -10 dB (voice inaudible under heavy noise) to 20 dB with VocalZoom enabled.
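To put those figures in perspective, the short calculation below converts the quoted decibel values into linear ratios. It assumes that both numbers are signal-to-noise ratios expressed as power ratios, which is an interpretation of the company's claim rather than something it has spelled out; the variable and function names are illustrative.

```python
def db_to_power_ratio(db):
    """Convert decibels to a linear power ratio."""
    return 10.0 ** (db / 10.0)


def db_to_amplitude_ratio(db):
    """Convert decibels to a linear amplitude ratio."""
    return 10.0 ** (db / 20.0)


snr_before_db = -10.0  # figure quoted without the optical sensor
snr_after_db = 20.0    # figure quoted with VocalZoom enabled

improvement_db = snr_after_db - snr_before_db         # 30 dB
power_gain = db_to_power_ratio(improvement_db)         # 1000x in power
amplitude_gain = db_to_amplitude_ratio(improvement_db) # about 31.6x in amplitude

# At -10 dB the speech carries one tenth of the noise power;
# at 20 dB it carries one hundred times the noise power.
speech_to_noise_before = db_to_power_ratio(snr_before_db)
speech_to_noise_after = db_to_power_ratio(snr_after_db)
```

Under that reading, the claimed improvement of 30 dB corresponds to a thousandfold gain in the ratio of speech power to noise power.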
As well as improving speech recognition, fusing the audio signals from the optical sensor and a microphone could enable many features currently served by discrete sensors. It could be used to perform more robust voice identification through multi-factor biometrics (each individual having a unique facial "sound signature"), but also serve as an accurate, low-power voice wake-up solution. The sensor is accurate enough to detect the speaker's heart rate from the skin, so it doubles as a liveness sensor: it can tell a loudspeaker from a live person.
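The article does not say how the heart rate is extracted. One generic way to do it, sketched below purely as an assumption, is to look for the strongest periodic component of the slow skin-displacement signal within a plausible pulse range. The function name, the 40 to 180 bpm search range and the synthetic test signal are all hypothetical.

```python
import math


def estimate_heart_rate_bpm(samples, sample_hz, lo_bpm=40, hi_bpm=180):
    """Return the candidate rate (in bpm) with the most spectral power.

    Evaluates a single-frequency DFT at each whole-bpm candidate; this is a
    generic technique, not the method used by the sensor's ASIC.
    """
    mean = sum(samples) / len(samples)
    centered = [s - mean for s in samples]  # remove the DC offset
    best_bpm, best_power = None, -1.0
    for bpm in range(lo_bpm, hi_bpm + 1):
        w = 2.0 * math.pi * (bpm / 60.0) / sample_hz  # radians per sample
        re = sum(x * math.cos(w * n) for n, x in enumerate(centered))
        im = sum(x * math.sin(w * n) for n, x in enumerate(centered))
        power = re * re + im * im
        if power > best_power:
            best_bpm, best_power = bpm, power
    return best_bpm


# Synthetic 30-second trace: a 72 bpm (1.2 Hz) pulse sampled at 50 Hz.
SAMPLE_HZ = 50.0
pulse = [math.sin(2.0 * math.pi * 1.2 * n / SAMPLE_HZ) for n in range(1500)]
estimated_bpm = estimate_heart_rate_bpm(pulse, SAMPLE_HZ)
```

A liveness check built on this idea would additionally need a threshold on how strongly the pulse stands out from the background, which this sketch leaves out.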