Just half a decade ago, the short story below would have seemed like science fiction, but most of the high-tech voice interface elements we mention are already in existence today.
Virtual personal assistant technology through voice interface navigation was merely a dream a couple of decades ago, but today it has already started to integrate with our daily lives. This suggests a public appetite for voice-controlled artificial intelligence and greater adoption in the coming years, but for now, let’s meet Sarah.
The streets are busy as Sarah walks briskly through the bustling city center. She’s running late for her 3 o’clock meeting, so she slips out her phone to send a text message to her colleague in a district a couple of miles away. She speaks into the glowing voice interface pad in her hand. “Lucy, I’m sorry, but I won’t get there until about half-past three!” A spinning icon indicates that the phone is recording, and a jingle confirms the message. Soon, an affirmation light flashes as an eager yet soothing voice says, “Message sent to Lucy… don’t rush too much, Sarah! You’ve got plenty of time.”
Passing her favorite local bookstore, she hears her phone speak through the earpiece. “Remember, it’s Mason’s birthday this Thursday, and the new Jackson novel is out. I think he’d like that.”
“You’re right – as always,” she muses, and dashes into the store across from the cab line she was heading for. She taps the automated sales desk and requests the book. It’s in her hands in moments: a physical copy, a paperback, still warm with that freshly printed smell. Her partner always enjoys old-school things.
The touch payment has debited her account before she’s left the store, and she’s on her way to a cab, then her meeting.
History of Voice Interface Technology
Voice interface technology and the seamless interaction between humans and machines have been present in popular culture for decades. They are almost impossible to avoid in science fiction books and TV shows, which have embedded the concept into Western culture as an inevitable evolution of how we control the technology around us. It can be difficult to assess whether the technology of the time influenced these stories, or whether tech leaders were inspired by Asimov, Roddenberry, and other great visionaries. However, it may be surprising to hear that voice activation technology came from humble beginnings around 100 years ago.
Radio Rex was an electromagnetically powered toy for children that used voice recognition technology to spring the toy dog out of his house on audio command. Another leap forward in voice interface technology was the IBM Shoebox, a voice-controlled calculator developed in the 1960s that could recognize 16 spoken words, including the digits 0 to 9. The Shoebox was presented at the 1962 Seattle World’s Fair to some intrigue, and people have been fascinated by a voice-activated world ever since.
There’s been an ongoing scramble to corner the voice interface navigation market, with Google being the first to launch such a feature for its app back in 2008. Apple followed in 2011 with the introduction of the now-ubiquitous Siri on the iPhone 4S. According to Statista, an estimated 3.25 billion digital voice assistants were being used in devices worldwide as of 2019.
The Future of Voice Interface Technology?
When researching this article, I reached out to Rob Moores, Voice User Experience (VUX) expert and founder of Ambiently, who creates voice-first technology solutions for the EdTech sector. Moores predicts that the voice user interface (VUI) will quickly become the new graphical user interface (GUI), so I asked him where the voice-controlled user experience is going in the next 5+ years.
“The main thrust of voice user experience is currently focused on smart speakers from Amazon and Google, with Siri and Samsung as the main aspiring contenders,” Moores claims. “However, this should perhaps better be considered a transitional technology. In very broad terms, it’s often easiest to ask for something using one’s voice, but easier to consume the response visually. In other words, the direction of travel for user interface design is ‘voice input, visual output,’ although this, of course, hides a thousand variations and exceptions.”
Will Voice Interface Navigation Change Everything?
As explained by Moores, key drivers in voice interface technology are a reduction in service costs and improved customer service through ease of access to information, resources, and transactions. Voice interfaces also provide hands-free interaction, which offers greater accessibility to people with visual or motor challenges, and to those who have difficulty using screens, keyboards, or touch surfaces.
All this sounds positive, and while it’s easy to become absorbed by voice-only interactions, we should remember that we are likely to always have a graphical user interface alongside the voice user interface. Vocal interaction may become the default access point to data and information, but we are a long way from routinely presenting and digesting information exclusively in audible form. Interfaces, complicated menus, and interaction methods cannot easily be memorized and then repeated during voice-only interactions.
“A pure voice-only interface doesn’t have the luxury of being able to present the user with large amounts of information or choices. So as systems designers, we can’t use the get-out-of-jail card of throwing everything possible onto the screen with a menu system, then leaving the user to find their way around it. Voice is the opposite – it presents a very ‘thin’ user interface where only one phoneme can be sent or received at a time. Where our eyes can quickly scan a menu of multiple items, our short-term memory can only manage 3-5 (on a good day!). As a result, voice is a challenge to systems designers who need to focus on understanding and anticipating the user’s explicit and latent needs and wants,” warns Moores.
According to Google, we are already seeing an explosive rise in voice searches, with voice search being the fastest growing online search method. So we need to optimize our online platforms and digital products for voice search and invest in this technology where we can, or risk being left behind.
The adoption of voice technology and digital assistants is a current trend, in which the least demanding – and in most cases, least consequential – tasks are delegated to voice interaction. Will we now see an increase in the adoption of voice control for more complex actions? As the technology becomes more widely used, and the systems that handle voice interface navigation improve, we can only guess how deeply it will embed itself, but we do need to take voice navigation seriously as we develop our software and product roadmaps.
What Software Testers Need to Know About Voice Personal Assistance
When designing products with a voice interface element, some variables need to be taken into account. While voice control brings some inherent accessibility gains, it can also be a double-edged sword. Regional accents and differences in speech ability may “confuse” the listening device and negatively impact the user experience. So, when testing your voice-controlled product, be as inclusive as possible with your test audience. Remember that not everybody will speak with the same accent as your development team, or find speaking as easy as the “typical” person does.
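One way to make this kind of inclusive testing measurable is to compare what each tester actually said with what the device transcribed, then average the error per speaker group. The following is a minimal sketch, not a prescribed method: it uses word error rate (a common speech recognition metric), and the function names, group labels, and sample phrases are all hypothetical. It assumes you already have reference phrases and the transcripts your product produced for them.

```python
from collections import defaultdict


def word_error_rate(reference: str, hypothesis: str) -> float:
    """Word-level edit distance between two phrases, divided by reference length."""
    ref = reference.lower().split()
    hyp = hypothesis.lower().split()
    # Classic dynamic-programming edit distance, computed over words.
    prev = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        curr = [i]
        for j, hyp_word in enumerate(hyp, start=1):
            cost = 0 if ref_word == hyp_word else 1
            curr.append(min(prev[j] + 1,          # word missing from transcript
                            curr[j - 1] + 1,      # extra word in transcript
                            prev[j - 1] + cost))  # match or substitution
        prev = curr
    return prev[-1] / max(len(ref), 1)


def wer_by_group(samples):
    """Average word error rate per speaker group.

    `samples` is an iterable of (group, reference, hypothesis) tuples;
    the group labels are whatever your test plan defines (hypothetical here).
    """
    scores = defaultdict(list)
    for group, reference, hypothesis in samples:
        scores[group].append(word_error_rate(reference, hypothesis))
    return {group: sum(vals) / len(vals) for group, vals in scores.items()}


if __name__ == "__main__":
    # Hypothetical test data: what was said vs. what the device heard.
    samples = [
        ("group_a", "turn on the light", "turn on the light"),
        ("group_b", "turn on the light", "turn on the night"),
    ]
    print(wer_by_group(samples))
```

A large gap between groups is a signal that the product works noticeably better for some speakers than for others, and is worth investigating before release.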
As discussed earlier, the amount of audible information a person can take in at any one time is finite, so limiting voice interface options is a good strategy. Where there are no visual cues, people have to rely on their own mental models of a process, which may soon fall apart under the weight of options. Fortunately, we can safely assume that most voice interface products will also have a visual display to accompany the audio response. We need to be aware of this and work to integrate the two into a seamless process and, consequently, a positive user experience.
So, will we see a fully voice-integrated world, where we can effortlessly talk to our computers and receive seamless feedback? Maybe. But we need to start planning for that world, as people are rapidly working to build it.
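As an illustration of limiting options, a long menu can be split into short spoken turns, each offering only a few choices plus a way to hear more. This is a rough sketch under stated assumptions: the function name, the prompt wording, and the default of three options per turn (taken from the low end of the 3-5 range Moores mentions) are all illustrative choices, not an established standard.

```python
def build_voice_prompts(options, max_per_turn=3):
    """Split a list of options into short spoken prompts.

    `max_per_turn` caps how many choices are read out in one turn;
    the default of 3 is an assumption, not a fixed rule.
    """
    if max_per_turn < 1:
        raise ValueError("max_per_turn must be at least 1")

    prompts = []
    for start in range(0, len(options), max_per_turn):
        chunk = options[start:start + max_per_turn]
        has_more = start + max_per_turn < len(options)

        # Read the chunk as "a, b or c" so it sounds natural when spoken.
        if len(chunk) == 1:
            spoken = chunk[0]
        else:
            spoken = ", ".join(chunk[:-1]) + " or " + chunk[-1]

        prompt = f"You can choose {spoken}."
        if has_more:
            prompt += " Or say 'more' to hear other options."
        prompts.append(prompt)
    return prompts


if __name__ == "__main__":
    menu = ["weather", "news", "music", "timers", "messages"]
    for turn in build_voice_prompts(menu):
        print(turn)
```

When a screen is available, the same list can be shown in full while the voice response reads only the first turn, which keeps the spoken part short without hiding anything.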