Voice-activated devices used to be the stuff of science fiction, but these days, everything from in-home lighting to motor vehicles can be controlled simply by speaking out loud.
Speech technologies have allowed for rudimentary voice control since at least the 1960s, when IBM created Shoebox, a machine that could listen for and recognize 16 spoken words.
Voice technology continued to be refined over the years, eventually resulting in products like Dragon Dictate, a voice recognition product that revolutionized consumer access to dictation software.
Today, Google and other companies are on the cutting edge of speech technologies. Voice recognition, speech processing and voice control are all features found in the latest smartphones, tablets, smartwatches, televisions and other household appliances and systems.
How Do Voice-Activated Devices Work?
Voice activation works through active listening. A voice-controlled device will usually include a microphone that is constantly active, allowing it to take in and then analyze all sounds around it.
When a particular word or phrase is “heard”, it is recognized through software analysis. Several processes are involved, but on the simplest level, recognition is done by examining waveforms.
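The always-listening behavior described above can be sketched as a loop that keeps only a short rolling buffer of audio and starts capturing once a detector fires. This is a minimal illustration, not how any particular product works: the function names `listen` and `loud_enough`, the frame layout, and the loudness threshold are all assumptions made for the example, and a real device would use a trained wake-phrase detector rather than a loudness check.

```python
from collections import deque
from typing import Callable, Deque, Iterable, List

Frame = List[float]  # one short chunk of audio samples


def listen(frames: Iterable[Frame],
           is_wake_phrase: Callable[[List[Frame]], bool],
           window: int = 3,
           capture: int = 2) -> List[Frame]:
    """Return only the frames that follow a detected wake phrase."""
    # Rolling buffer: once it holds `window` frames, the oldest is discarded.
    recent: Deque[Frame] = deque(maxlen=window)
    captured: List[Frame] = []
    remaining = 0  # how many more frames to capture after a detection
    for frame in frames:
        if remaining > 0:
            captured.append(frame)
            remaining -= 1
            continue
        recent.append(frame)
        if is_wake_phrase(list(recent)):
            remaining = capture
            recent.clear()
    return captured


def loud_enough(buffered: List[Frame], threshold: float = 0.5) -> bool:
    """Hypothetical stand-in detector: fires when the newest frame is loud."""
    newest = buffered[-1]
    energy = sum(s * s for s in newest) / len(newest)
    return energy > threshold


# Assumed toy stream: two quiet frames, one loud "wake" frame, then a command.
stream = [[0.0, 0.1], [0.1, 0.0], [0.9, 0.9], [0.2, 0.3], [0.4, 0.1], [0.0, 0.0]]
command = listen(stream, loud_enough)
print(command)
```

In this sketch the quiet frames only ever pass through the small buffer and are dropped, while the two frames after the loud one are kept as the command.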
While people think of sound as something that is heard, it is physically a pressure wave. A vibrating source displaces nearby air molecules, and the resulting changes in air pressure travel outward through the surrounding air.
These vibrations, graphed as sound waves, occur at different frequencies. Frequency determines the pitch of a sound, and analysis software can tell fairly accurately what a word is and who is speaking by matching the frequency content and shape of a waveform against either a dataset or a sample waveform.
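As a rough illustration of matching a waveform by frequency and shape, the sketch below finds the strongest frequency in a signal with a naive discrete Fourier transform and then compares the signal against stored sample waveforms. The sample rate, the pure-tone "templates" and the function names are assumptions made for the example. Production recognizers work on far richer features than a single pure tone.

```python
import math
from typing import Dict, List

SAMPLE_RATE = 8000  # samples per second (assumed for this example)


def tone(freq_hz: float, n: int = 400) -> List[float]:
    """Generate n samples of a pure sine wave at freq_hz."""
    return [math.sin(2 * math.pi * freq_hz * i / SAMPLE_RATE) for i in range(n)]


def dominant_frequency(samples: List[float]) -> float:
    """Naive DFT: return the frequency (Hz) of the strongest bin."""
    n = len(samples)
    best_bin, best_power = 0, 0.0
    for k in range(1, n // 2):
        re = sum(s * math.cos(2 * math.pi * k * i / n) for i, s in enumerate(samples))
        im = sum(s * math.sin(2 * math.pi * k * i / n) for i, s in enumerate(samples))
        power = re * re + im * im
        if power > best_power:
            best_bin, best_power = k, power
    return best_bin * SAMPLE_RATE / n


def similarity(a: List[float], b: List[float]) -> float:
    """Cosine similarity between two equal-length waveforms."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a) * sum(y * y for y in b))
    return dot / norm if norm else 0.0


def best_match(samples: List[float], templates: Dict[str, List[float]]) -> str:
    """Return the name of the stored waveform most similar to samples."""
    return max(templates, key=lambda name: similarity(samples, templates[name]))


# Hypothetical stored samples and a quieter incoming sound at 440 Hz.
templates = {"low": tone(220), "high": tone(440)}
heard = [0.5 * s for s in tone(440)]

print(dominant_frequency(heard))
print(best_match(heard, templates))
```

Cosine similarity ignores loudness, so the quieter copy still matches its template. It does assume the two waveforms are aligned in time, which real speech rarely is. That is one reason practical systems compare frequency-based features rather than raw samples.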
Neural network technology is also used in voice recognition and processing. A neural network is a layered model that takes in data, passes it through a series of weighted transformations learned from examples and then generates an output, such as the most likely word or a cleaned-up version of the input.
Neural networks can be used to filter out background noise and add clarity to sounds that are recorded with voice recognition technologies.
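To make the idea of a neural network more concrete, here is a minimal sketch of a single forward pass through a tiny two-layer network that scores whether an audio frame looks more like speech or background noise. Everything specific here is an assumption for illustration: the two input features, the function names and especially the weights, which are hand-picked rather than learned. A real network would learn its weights from large amounts of recorded audio.

```python
import math
from typing import List


def relu(x: float) -> float:
    """Pass positive values through and clamp negatives to zero."""
    return max(0.0, x)


def sigmoid(x: float) -> float:
    """Squash any number into the range 0..1."""
    return 1.0 / (1.0 + math.exp(-x))


def dense(inputs: List[float], weights: List[List[float]],
          biases: List[float]) -> List[float]:
    """One fully connected layer: a weighted sum plus bias per output unit."""
    return [sum(w * x for w, x in zip(row, inputs)) + b
            for row, b in zip(weights, biases)]


# Hypothetical hand-picked weights; a trained network would learn these.
W1 = [[1.0, -1.0], [-1.0, 1.0]]
B1 = [0.0, 0.0]
W2 = [[2.0, -2.0]]
B2 = [0.0]


def speech_score(features: List[float]) -> float:
    """features = [speech-band energy, noise-band energy] (assumed inputs)."""
    hidden = [relu(v) for v in dense(features, W1, B1)]
    return sigmoid(dense(hidden, W2, B2)[0])


print(speech_score([0.8, 0.2]))  # mostly speech-band energy
print(speech_score([0.2, 0.8]))  # mostly noise-band energy
```

A score above 0.5 leans toward speech and a score below 0.5 leans toward noise. One possible use of such a score is to turn down frames that look like noise before the recognizer sees them.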
Accuracy, Privacy And Ethics Of Voice Recognition Technologies
As powerful and convenient as voice control has become in the digital age, these technologies are not without their drawbacks or detractors. Privacy advocates point out that because voice-controlled smart devices have an always-on microphone and Internet connection, they create the potential for malfeasance and hacking.
Although companies like Google and Amazon state that their Google Home and Alexa devices, respectively, only begin recording when someone explicitly says a key phrase, some people are wary of having a microphone always monitoring the environment.
What happens to the recordings that voice technology companies capture is another matter of concern. Privacy advocates have raised questions about how voice recording files are stored and archived, as well as who has access to them.
Amazon’s Alexa digital assistant is used across the company’s range of services to allow customers to ask questions, order products from Amazon and control smart devices. In 2019, it was revealed that Amazon employed a small global army of reviewers who were tasked with manually listening to customer recordings.
The company stated that this was done for the purpose of improving the service. Having humans listen to customer recordings allowed Amazon to tag words to help its software learn to recognize phrases with greater accuracy.
Amazon, like other voice recognition service providers, does allow customers to opt out of programs that use voice recordings for purposes other than direct interaction with its services.
The Ethical Future Of Voice Technology
Although privacy concerns are one troubling aspect of voice recognition and processing by machines, ethical questions are a bit more worrisome. As machine learning and artificial intelligence continue to advance, some question the accuracy and ethics involved when machines are able to make decisions based on voice analysis.
A smart home device listening for someone to call out music playlist suggestions is one thing, but what happens when machines base important decisions on voice recognition?
Can you trust that your home security system will remain armed while you’re away or asleep based on voice recognition alone? Can a machine analyze things like gender or sexual orientation based on your voice?
Can a piece of software determine your mood or intent using only a recording of your voice? What about recognizing subtleties in speech patterns across all the languages of the world?
These questions have all been raised because companies are working on technologies that are purported to accomplish each of these tasks. Some voice service providers already have customer service tools in place that attempt to ascertain a caller’s mood during a customer service inquiry.
Other technologies used for sales try to figure out whether someone is male or female based on voice alone so that certain products or services can be marketed. If these technologies cannot achieve 100% accuracy, the potential exists for voice recognition to do more harm than good.