Google takes voice recognition to public places
Google is investing heavily in conversation design. Last year, the company published a guide to good practices for developing this technology.
Inside the company, voice is seen as another potential interface. They do not say that voice will be the only one, but it will be an important tool.
When you use Google Assistant, you can see in the phone app whether Ok Google is active, and that seems to be the basis for launching new tools in future projects.
The phone notifies you every time something has been recorded, and the recordings can be permanently deleted, Google claims. They are also working on technologies that will make voice recognition happen locally on your phone, without sending anything to the cloud, they say.
In January, the company announced interpreter mode, which will allow you to use the phone as a real-time translator for several languages.
Another thing the engineers at Google feel particularly proud of is the use of voice in accessibility and inclusive design. They say it is for the benefit of those who have difficulties moving around the house or the office.
Those who may benefit most from this technology are those who have mobility, vision or expression problems, a Google press release claims.
This new technology has generated a lot of enthusiasm. A study released by National Public Radio says that half of the time people spend using smart speakers is in the company of others, which ensures that more voices will be recorded and stored.
Of course, Google presents that reality in a positive light. They say it creates a sense of community. As of today, 41% of our lives happens in front of a screen, so being able to talk to devices during almost half of your life will provide Google with plenty of recorded audio about people’s private lives.
“You can throw a quick question while you are at the table eating, which is much less cumbersome than starting to look for something on your mobile,” say Google representatives. “And so everyone hears the question and the answer, so that the conversation is not interrupted.” Voice recognition can somewhat relieve our addiction to screens, but in the process, it opens the door for unwanted, illegal snooping by Google and third parties.
It is very common now to talk to a speaker
Now, smart speakers are mostly seen in homes, but soon they will also be in stores, restaurants, at work and in public places.
When people are in public they don’t like to speak loudly to their phones, so Google intends to work on a technology called silent speech, which has been prototyped at the MIT Media Lab under the name of Alter Ego.
The device has jaw sensors that pick up pre-speech signals. Before speaking, people produce micro signals that Alter Ego captures to anticipate possible words and sentences.
The idea is to capture and encode those micro signals so that people can communicate with their devices without uttering a word. This technology will trigger the use of voice recognition on a mass scale in private and public places.
“We have localization teams to help us understand things that in one culture may sound different than in another,” says Cathy Pearl, who has worked on improving Google’s voice recognition for 20 years.
First of all, she says, you no longer need to have a microphone right in front of you for it to pick you up well. The accuracy of speech recognition is very high. And the understanding of natural language has improved a lot, she explains.
One of the biggest limitations right now, even for Google, is what is called discoverability. Let’s say you have a smart speaker: how do you know what you can do?
Surely you’d be able to do thousands of things, but how do you know exactly what to say to make it work? Sometimes, it becomes a kind of guessing game, and that can be frustrating for the user. Companies like Google and Apple have been working on this by pre-installing a series of trigger sentences or words that activate their AI voice recognition systems.
Another thing that is very limited, according to Pearl, has to do with the understanding of natural language, that is, with the understanding of context. Computers don’t have common sense. Things that can be tremendously obvious to a person are not captured by the system. This is why technology companies keep pressing ahead with secretly recording people’s private interactions at home, so that their speech recognition algorithms are better trained to, eventually, recognize everything we say.
“It can be difficult to have multiturn conversations in which the computer really realizes the context of what has been said and how that influences what needs to be done next,” says Pearl.
There are projects that try to guess the user’s mood by their tone of voice. That is the next step that Google wants to take to gain total control of what we say and to open the way to delve into what we think.