Why Today’s Voice Recognition Will Never Work
There is a fundamental difference between the way that ExoTech translates spoken commands into computer actions, and the way that everyone else does it.
Now – pay attention, children, because this is one of the things that sets us completely apart from everyone else! If you can get this one across to the people you talk to, there will be no issues with AI, or with what anybody else is doing!
Everyone else, and I do mean everyone else by the way, is taking the words that are spoken and using software that will recognize key words it has been taught. Their systems then perform specific actions based on those words. So, if you have voice recognition on your computer set up to do some helpful actions, you might be able to say, “Print this page”, and the computer will recognize the word “Print” and initiate the printing of the current page that is on the screen.
Big deal! You could have done that with a mouse click or a keyboard key combination. So, all you have done here is to replace the mouse click with a verbal command.
But what if there is no document on the screen at present? And you want to print the document you just saved and closed. Well, I’m sorry, but you can’t do that with today’s voice recognition systems, because “Print the document I just closed” is too complex a command for the computer to follow. It does not understand what you mean and doesn’t know where that document is anymore.
But what if you had a personal assistant and you asked him or her to print the last document that was saved? No problem at all.
And what if you said to your personal assistant: “Please print the last 4 pages of the document I wrote yesterday about apples and email it to Bill in London – but include a copy of our catalogue.” Your personal assistant would have no trouble doing all that. But there is no way that today’s voice recognition systems will ever be able to do that sort of thing. Why? Because they are not built to understand what is being communicated. They are only built to respond to specific words in a very specific way for each of those words.
This is a very key point. What is this difference I am making here? Today’s voice recognition systems are listening for key words and, when they hear one, they act on it in a specific way. “Print this document” can send the current document to the default printer. But it does not understand what is being asked of it. It is simply responding to a specific signal (the word “print” in this case) and acting on that in a way that is programmed into it. Like I said, it is responding to the verbal command in exactly the same way as it would if you clicked the print button.
By comparison, when you give instructions to a personal assistant, you can do it in many different ways, all of which should result in the same thing happening:
- “Print that document.”
- “Send that to the printer.”
- “Get me a printed copy of that.”
- “Let’s print that and see what it looks like.”
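To make the contrast concrete, here is a toy sketch (the phrase list and intent name are invented for this example) of what mapping many phrasings to one underlying idea looks like when you try to do it by enumeration:

```python
# Toy illustration: many surface phrasings, one underlying intent.
# The phrase list and the intent label are made up for this example.

PRINT_PHRASES = {
    "print that document",
    "send that to the printer",
    "get me a printed copy of that",
    "let's print that and see what it looks like",
}

def intent_of(utterance):
    """Map a phrasing to the idea behind it (here, by brute enumeration)."""
    normalized = utterance.lower().strip().rstrip(".")
    if normalized in PRINT_PHRASES:
        return "PRINT_CURRENT_DOCUMENT"
    return "UNKNOWN"

print(intent_of("Send that to the printer."))  # PRINT_CURRENT_DOCUMENT
print(intent_of("Could you run off a copy?"))  # UNKNOWN
```

The catch, of course, is that you can never enumerate every phrasing in advance – which is exactly why a system that works only with words, rather than the ideas behind them, falls short.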
But talking to a so-called artificial intelligence voice recognition device is like talking to a dog. If your dog knows the word “walk”, he may well respond to it. But if you say to him, “Hey, let’s take a walk down by the lake where we can toss the ball around,” all the dog hears is “Blah, blah, blah, WALK, blah, blah, blah, blah.” Or “woof, woof, WALK, woof, woof”, maybe. He gets excited because he knows that “walk” means going out with his master, but he doesn’t understand any of the rest of it.
And that’s what today’s voice recognition systems are like: they recognize a word in the midst of many other words and simply match that word against the list of words they have been taught. That is how they know what to do next.
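A minimal sketch of that keyword-matching approach might look like the following. This is purely hypothetical – the function names and the command table are illustrative, not any vendor’s actual code:

```python
# Hypothetical sketch of keyword-based command handling.
# The action table and return strings are invented for illustration.

ACTIONS = {
    "print": lambda: "sent current document to default printer",
    "save":  lambda: "saved current document",
}

def handle_utterance(utterance):
    """Scan the spoken words for the first known keyword and fire its action."""
    for word in utterance.lower().split():
        if word in ACTIONS:
            return ACTIONS[word]()
    return "command not recognized"

# Everything around the keyword is ignored entirely:
print(handle_utterance("Print this page"))
print(handle_utterance("Print the document I just closed"))  # same action fires
```

Note that both utterances trigger the identical action: the system keys off “print” and throws away the rest, which is exactly the dog-hearing-“walk” behavior described above.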
Let’s consider what happens when you give such an order to your personal assistant.
- The first thing is that you get your thoughts and ideas sorted out; what is it that you want to communicate here?
- And when you have done that, you use words to express those ideas.
- The words go floating through the air and your personal assistant receives them.
- He or she then translates those words into the same ideas that you had in the first place; or we hope they get the same ideas. Sometimes people get the wrong ideas when you communicate something, but it usually gets clarified pretty fast.
So, it went:
- Ideas
- Convert the ideas to words.
- The transmission of those words to another person.
- The receipt of those words by the other person.
- And, finally, translation back into the ideas that were transmitted.
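The five steps above can be sketched as a toy round trip (all names here are invented for illustration):

```python
# A toy round trip of the five communication steps above.
# The idea label and wording are invented for this example.

IDEA_TO_WORDS = {"PRINT_LAST_SAVED": "print the document I just saved"}
WORDS_TO_IDEA = {words: idea for idea, words in IDEA_TO_WORDS.items()}

def communicate(idea):
    words = IDEA_TO_WORDS[idea]      # steps 1-2: the idea is converted to words
    received = words                 # steps 3-4: transmitted and received intact
    return WORDS_TO_IDEA[received]   # step 5: translated back into the idea

print(communicate("PRINT_LAST_SAVED"))  # the original idea comes back out
```

The whole point of the chain is that the idea at the end should equal the idea at the start; the words in the middle are only a carrier.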
The point is that what you wanted to happen is perfectly expressed in your thoughts and ideas. The way you communicate those ideas to other entities is by using words, or drawings, or hand gestures, or Morse code, whatever. It does not matter what medium you are using to get your ideas across to others; what matters is the ideas themselves. That’s what’s important!
But the way all voice recognition systems (today’s so-called “AI”) are built, they are designed only to recognize specific words that are spoken. They are not, then, translating those words into ideas. They are translating those words directly into actions.
But wait a minute, most words have multiple meanings, so working only with words leaves us open to lots of potential errors. The word “fax”, for instance, can mean:
- Fax: The action of sending a message using a fax machine.
- Fax: The piece of paper that is to be run through the fax machine to send it.
- Fax: The machine used to send fax messages.
- Fax: The place where the fax machine is located; “Go over to the fax.”
There are also words with almost identical pronunciations that mean totally different things. Consider “fax” and “facts”. That’s fax; F A X. And facts; F A C T S. Fax – facts. Or “you” and “ewe”. That’s Y O U and E W E (the female sheep).
Now, today’s voice recognition systems will never be able to sort those out, because those systems are only dealing with the words, not the meanings.
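To put the ambiguity problem in concrete terms, here is a purely illustrative sketch – the sense list is taken from the example above, and the function is invented for this sketch:

```python
# Illustration only: the single token "fax" carries several distinct senses,
# and sound-alike words such as "facts" add more ambiguity on top of that.

FAX_SENSES = [
    "the act of sending a fax",
    "the paper being faxed",
    "the fax machine itself",
    "the place where the machine sits",
]

def word_only_lookup(word):
    """A word-matching system sees only the token, never the meaning."""
    if word == "fax":
        return FAX_SENSES  # all four senses are equally plausible to it
    return []

# "Check the fax" vs. "check the facts": a system matching words by sound
# alone can confuse even the token, never mind picking the right sense.
print(word_only_lookup("fax"))
```

A system working at the level of words is stuck here: nothing in the token itself says which sense was meant – only the surrounding ideas can settle that.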
Have you noticed, by the way, that I am not using the terms “AI”, or “artificial intelligence” very much? That’s what their manufacturers call them, but although they may well be artificial, there is no way they have any intelligence in them.
I mean, I had one of my clients call me up a few years ago on my cell phone. Her name is spelt X I M E N A, which she pronounces “ex-a-mina”. When she left a voice message for me, the cell phone’s voice recognition system turned it into a text message so I could read it immediately.
Do you know what the text message read? “Hi Neil. This is sexy Nina calling…” Well, she was the wife of the CEO of one of my most important clients, so there is no way I was going to tell her what my system thought she said. And I hope that she never gets to see the transcript of this little talk.
But look, how intelligent is that? No, the so-called “AI” systems like Siri, Alexa, and others, are not intelligent at all.
But, what about the ExoBrain? How does it manage verbal instructions or commands? Well, I can’t tell you the actual way it does that, as that is the secret that makes ExoTech so special. But I can say that, whereas today’s voice recognition systems react directly to words that are said, the ExoBrain will be hearing those words and converting them to the unique, precise, ideas that underlie those words. That’s the difference!
But, more importantly, they will be the exact ideas that were intended by the person who originated them. And that’s because the ExoBrain will actually be sorting out which of the several meanings of each word was intended by the person saying it, just like another person would. So, when the ExoBrain acts, it is acting on the precise meaning that was intended.
Now, that’s what I call intelligent! And that’s why today’s voice recognition systems will never catch up to us.