Here’s how your AI-enabled device understands your voice
Voice-activated home assistants like Alexa and Cortana know details about you that range from your guilty-pleasure playlists to your go-to food delivery preferences — even the type of toilet paper you prefer. But how well do you understand these AIs in return?
In the video above, part of Lenovo’s “Extreme I.T.” series, Lenovo Industrial Designer Ali Ent explains the basic technologies that fuel voice-command assistants. She does so while contending with a flesh-and-blood, command-activated entity: a ferocious Belgian Malinois named Voodoo. Be sure to watch the full video to get her (impressive, especially under the circumstances) explanation of concepts like analog-to-digital converters (ADCs) and phonemes.
To supplement Ent’s insights in the video, we also spoke with Rod Waltermann, distinguished engineer and chief architect for contextual and cloud computing at Lenovo, to shed light on voice-command platforms and what makes them tick.
“Every time you ask Alexa or Cortana about the weather or the score of a World Cup game, the platform files that information away, learning more and more about you.”
Three layers of language-processing technology
There are many technical nuts and bolts that go into carrying on a coherent conversation with an AI. Waltermann takes Ent’s explanation in the video to a more granular level, detailing the technological layers under the hood of an AI-driven speech platform.
“A typical speech platform has three high-level components,” he explains. The first is a Speech to Text (STT) or Automated Speech Recognition (ASR) engine. The job of this component is to convert the electrical impulses of human speech into text — a process aided by machine learning (ML). Today’s advanced ML systems can train these programs to understand different languages and pronunciations.
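To make that first layer concrete, here is a minimal sketch of an STT call using the open-source Python SpeechRecognition package. The library choice, the audio file name, and the Google Web Speech backend are illustrative assumptions; the article does not say which engines Alexa or Cortana actually use.

```python
# A minimal STT sketch, assuming the SpeechRecognition package
# (pip install SpeechRecognition). Not the engine any real
# voice assistant uses; purely for illustration.
import speech_recognition as sr

recognizer = sr.Recognizer()

# "weather_question.wav" is a hypothetical recording of a spoken query.
with sr.AudioFile("weather_question.wav") as source:
    audio = recognizer.record(source)  # read the whole file into memory

# Hand the digitized audio to a cloud ASR backend and get text back.
text = recognizer.recognize_google(audio)
print(text)  # e.g. "what's the weather like today"
```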
The next layer is called Natural Language Processing (NLP). This is the technology that helps machines effectively “understand” what users are saying and extract meaning from commands. NLP technology is typically built upon deep learning techniques, a subset of AI that mimics the neural pathways of the human brain. NLP is usually guided by a series of learned grammatical and logical rules.
“The above two components are generally considered one-half a voice-command platform,” explains Waltermann. “When you ask a voice-enabled AI like Alexa or the Google Assistant [about the weather], the STT will generate a word list of what you spoke.”
In other words, a query such as “What’s the weather like today?” will trigger the NLP to target words like “weather” and “today,” and deduce the user’s question and intent in context.
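As a toy illustration of that targeting step, the sketch below maps keywords in the transcribed text to an intent using hand-written rules. Real platforms use deep learning rather than a lookup table; the intent names and keyword sets here are invented for the example.

```python
# A toy, rule-based sketch of the NLP step: scan the transcribed
# word list for keywords and map them to an intent. All intent
# names and keyword sets below are illustrative assumptions.
INTENT_KEYWORDS = {
    "get_weather": {"weather", "forecast", "temperature"},
    "get_score":   {"score", "game", "match"},
}

def extract_intent(transcript: str) -> tuple[str, dict]:
    words = set(transcript.lower().split())
    for intent, keywords in INTENT_KEYWORDS.items():
        if words & keywords:
            # "today" acts as a time slot for the query, if present.
            slots = {"time": "today"} if "today" in words else {}
            return intent, slots
    return "unknown", {}

print(extract_intent("What's the weather like today"))
# ('get_weather', {'time': 'today'})
```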
The final component of a voice-command platform is its speech generation capability — sometimes called a chatbot. This is the element that takes discovered data (e.g. today’s weather report) and produces natural-sounding language to relay back to the user: “Today, it’s 95 and sunny,” for instance.
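A minimal sketch of that final layer might look like the following: a natural-language template filled in with the discovered data. The field names and the weather lookup result are hypothetical, not any platform’s real schema.

```python
# A minimal response-generation ("chatbot") sketch: fill a
# natural-language template with the discovered data. The field
# names below are illustrative, not any platform's real schema.
def generate_response(weather: dict) -> str:
    return f"Today, it's {weather['temp_f']} and {weather['condition']}."

report = {"temp_f": 95, "condition": "sunny"}  # hypothetical lookup result
print(generate_response(report))  # "Today, it's 95 and sunny."
```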
What’s in store for human/AI interactions?
Every time you ask Alexa or Cortana about the weather or the score of a World Cup game, the platform files that information away, learning more and more about you — and about the art of conversation — the more frequently you talk to it.
“[The tech is] making great strides as more and more people use voice-command solutions like the Google Assistant on the Lenovo Smart Display, or Alexa and Cortana on the Yoga and ThinkPad laptops,” says Waltermann.
He adds that, although the technologies have come a long way in the past few years, there are still limitations. “The solutions are like a toddler [in that] they have a limited set of vocabulary and sentence-construction skills,” he says. “But as more people interact, the AI is analyzing their questions and responses, which triggers additional training and understanding. These systems are one big feedback loop, with the output being curated (corrected or reinforced) and fed back into the system to make it smarter.”
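The feedback loop Waltermann describes can be sketched in a few lines: log each interaction, let curation correct or reinforce the predicted intent, and collect the curated pairs as new training data. Everything here, from the function name to the labels, is an illustrative assumption.

```python
# A toy sketch of the curation feedback loop: each interaction is
# logged, a human correction overrides the model's guess (or the
# guess is reinforced as-is), and the curated pairs become new
# training data. All names are illustrative assumptions.
training_data: list[tuple[str, str]] = []

def curate(transcript: str, predicted: str, correction: str | None) -> None:
    # A correction overrides the predicted intent; otherwise reinforce it.
    label = correction if correction is not None else predicted
    training_data.append((transcript, label))

curate("what's the weather like today", "get_weather", None)  # reinforced
curate("is it going to rain", "get_score", "get_weather")     # corrected
# Periodically, training_data would be fed back into model training.
print(training_data)
```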
While no system has yet passed the Turing test — a test of a machine’s ability to exhibit intelligence that’s indistinguishable from that of a human — Waltermann believes we’re getting closer by the year. Today, voice-command solutions are getting better at detecting when users are multitasking — talking about the weather while ordering a coffee, for example — although this is an area where the technology is still being tweaked.
“The other area where machines [need improvement] is making what we call idle chit-chat,” says Waltermann. “Although, now that I think about it, not every person is good at that,” he adds.
In the future, voice-command solutions and their underlying technology will likely become more streamlined as neural networks and deep learning techniques pave the way for advanced solutions to live in smaller packages. “Smaller, smarter devices that can do basic tasks for us and interact with us is something we see coming,” says Waltermann.
He adds that voice-activated technologies will likely become an even more deeply ingrained part of smart home ecosystems in the coming months and years. “In the near future, you can imagine having a smart vacuum cleaner in your house that is capable of hearing you ask to clean up a spot, detecting which spot you mean, then moving over to it and cleaning up,” Waltermann posits.
In general, Waltermann is excited for what the future holds in this rapidly developing realm of technology. Even robotic pets — hopefully ones that are equally obedient but significantly less scary than Voodoo — may be on the horizon.
“The children of today are growing up in a world where you talk to machines and ask or tell them what you want in a natural conversation format,” he says, noting that even though this technology is still in its nascent years, people are already starting to take assistants like Alexa for granted.
“It’s exciting to think about what [the next generation] will take for granted,” he muses.
Don’t be “tech basic” — to see more Extreme I.T. madness and find the answers to your biggest tech questions, check out the full video series.