THE IMPORTANCE OF CONTEXT IN PRACTICAL AI APPLICATIONS

By looking at a typical AI application, Dr John Yardley, CEO of Threads Software, discusses how AI processes must take account of human behaviour if they are to replace humans.

 

Almost every business is influenced by human sentiment, and despite its embrace of digitisation, the finance industry is no exception. Share prices, currency movements and investment choices are driven not just by economics but by human emotion and the processes the human brain uses to make decisions. If we are going to replace humans with machines, we must not cherry-pick the bits of human thinking that we can most easily replicate.

The perception of Artificial Intelligence has changed somewhat since Alan Turing proposed his famous test in the 1950s. Turing suggested that if we cannot distinguish a machine’s behaviour from that of a human, then the machine can be said to be intelligent. Nowadays, we seem to define AI as computer programs that emulate the human brain rather than mimic human behaviour. Neural networks, for example, are frequently touted as the pinnacle of AI, but if the neural network in your self-driving car causes you to jump a red light, we would not describe that as intelligent, no matter how sophisticated the algorithm. If the machine is not fooling the human, not only is it failing to do the intended job, it may also be damaging the human’s view of it.

 


A practical example – Automatic Speech Recognition

Let’s take the application of ASR (automatic speech recognition, often wrongly described as voice recognition). ASR can loosely be described as getting a computer to transcribe acoustic human speech into digital text. Few would dispute that this is an AI task, since what we are seeking to do is replace one of the two humans involved in a dialogue. If this can be done without alerting the remaining human to the fact that he or she is talking to a machine, then it would certainly meet Turing’s criterion for intelligence and, more importantly, provide potentially enormous benefit.

However, while some parts of the human process for understanding speech can be emulated using ASR, we must accept that the human listener may be using far more information than we are giving the machine. In a face-to-face conversation, humans exchange gestures, looks and body language, not to mention drawing on prior familiarity with the topic of conversation, the accent and the words being used. Presenting a machine with only the raw acoustic signal deprives it of a large proportion of the information available to the human. Even in a telephone conversation, humans will have significantly more knowledge than machines.

Many would be surprised just how good computers are at recognising random words and how bad humans are at articulating meaningful sentences. I have shown people ASR transcriptions of their own speech and been met with incredulity. Yet when listening back to the recording, the speaker is often forced to admit that the computer gets far more correct than he or she would give it credit for. What the speaker and listener forget is how much interpretation they apply to filter out the “ahs”, “ums” and “rights”, the repeated words, the hesitations, the mumblings and so on, and how much use they make of prior knowledge about each other and the topic under discussion. Listeners frequently perceive words that they do not actually hear. If the same utterances were spoken with the words in random order (ie meaningless) and transcribed by both human and computer, the computer would likely do better.
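To make that unconscious filtering concrete, here is a minimal sketch in Python of the kind of tidying a listener applies to a word-accurate transcript. The filler-word list and the sample sentence are invented purely for illustration; real disfluency handling in ASR post-processing is far more sophisticated.

import re

# Illustrative only: a tiny, assumed list of filler words.
FILLERS = {"ah", "um", "uh", "er", "right"}

def tidy_transcript(raw):
    """Drop simple fillers and immediate word repetitions from a raw transcript."""
    words = re.findall(r"[a-z']+", raw.lower())
    cleaned = []
    for word in words:
        if word in FILLERS:
            continue                      # skip filler words
        if cleaned and cleaned[-1] == word:
            continue                      # skip immediate repetitions ("the the")
        cleaned.append(word)
    return " ".join(cleaned)

raw = "um so the the balance is ah roughly five thousand right"
print(tidy_transcript(raw))  # -> so the balance is roughly five thousand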

 

Number crunching is not the solution

The problem is that we cannot keep improving the understanding of speech simply by improving the recognition of individual words. It is like trying to make a car with flat tyres go faster by fitting a larger engine. The engine is not the critical path: it is cheaper and more effective to pump up the tyres than to improve the engine. So it is with speech. In order to behave and understand like a human, the machine needs more information, not better algorithms or more computing power for word recognition.

Many banks would argue that it doesn’t matter if the customer has to repeat an account number 10 times during a telephone banking transaction, because it costs the bank no more than if it were said once. But here again, the human factors are all-important. It is no consolation to the customer that repeating something 10 times might ultimately bring down their bank charges; eventually, customers will vote with their feet.

 

… but adding information is

So what is the solution? The remedy is to apply AI to the problem as a whole, not just to isolated parts. Taking ASR as an example again, by using readily available information contained in email correspondence, speech recognition performance can be improved far more than by improving the ASR algorithm or running it on a bigger computer. The emails can be used to train the ASR system on the types of words being exchanged and the subject matter being discussed. In addition, text-based messages can give valuable clues to the grammar being used: the sequences of words, the likely combinations of words and so on. In short, the context of the discussion. Sharing email and voice traffic is already possible but not yet widely applied, and it could dramatically benefit both financial institutions and their customers by helping a computer better understand the context of a conversation.
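As a rough illustration of the principle, and not a description of how any production ASR system works, the sketch below builds a toy bigram model from invented email snippets and uses it to choose between two acoustically similar transcription hypotheses. The email text, the hypotheses and the scoring function are all assumptions made for the example.

from collections import Counter
from itertools import pairwise
import math

def bigram_counts(texts):
    """Count word bigrams across a collection of messages (a toy language model)."""
    counts = Counter()
    for text in texts:
        counts.update(pairwise(text.lower().split()))
    return counts

def context_score(hypothesis, counts):
    """Crude contextual score: sum of log(1 + bigram count) over the hypothesis."""
    return sum(math.log1p(counts[bigram])
               for bigram in pairwise(hypothesis.lower().split()))

# Invented email snippets standing in for real correspondence.
emails = [
    "please confirm the standing order on your savings account",
    "the standing order was set up on your savings account last week",
]
counts = bigram_counts(emails)

# Two acoustically similar ASR hypotheses; the email context favours the first.
hypotheses = ["cancel the standing order", "cancel the standing odour"]
print(max(hypotheses, key=lambda h: context_score(h, counts)))
# -> cancel the standing order

A real system would more likely fold this kind of context into the recogniser’s language model rather than rescoring finished transcripts, but the principle is the same: the emails supply context that the audio alone cannot.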

Speech recognition is just one example of an AI process that often falls short of expectations. There are many more applications of AI that can be improved by taking a holistic view, not just the bits we like. AI is all about emulating humans, not number crunching, and to do this we need to understand as much as we can about the human process we wish to automate.

Looking at how humans process information can yield benefits in many areas of IT. For example, some of the largest advances in video data compression came from an understanding of what the human eye can perceive rather than from the mathematics of information theory.

In summary, AI is not about building ever more powerful neural networks; it is about convincing a human that the computer is doing a job as well as, or better than, another human would. To achieve this, we must tap as many of the information sources available to the human as we can, which with some lateral thinking are often available to the machine too. If that information is not present, we cannot compensate by continuously improving just some parts of the process. We must either find more context or rethink the solution. Until this happens, ASR may be subject to the law of diminishing returns.

 
