Introduction
The purpose of this document is to give readers, whether technically trained or not, enough background in computer science and the history of computing to understand the need for ExoTech’s solution.
I have tried to explain the concepts in terms that are familiar to any reader and have included definitions and diagrams to assist in the reader’s understanding of the material.
Algorithms
This word plays an important part in what follows, so we will spend some time on it.
An algorithm can be defined as follows: “a precise series of instructions which specify a sequence of operations to achieve a desired result.”
The word algorithm is derived from the name of the 9th-century Muslim mathematician and astronomer Muhammad ibn Musa al-Khwarizmi.
While I was at Harvard, I was faced with the task of teaching an introductory undergraduate course about computers to 150 non-science majors. I had taken over the course from an extremely well-liked, tenured professor and realized I had to do something to win over these students or else it was going to be a long haul.
So, early in the course, when it came time for the lecture “What is an algorithm?,” I wrote the above definition on the blackboard and said that I would demonstrate what this was all about. I wrote an algorithm for diapering a baby on the blackboard and pulled out a large Raggedy Ann doll and a box of paper diapers.
Here is the algorithm from my 1974 lecture notes:
- Put the baby on a flat surface
- Unfold the diaper
- Put diaper under baby, tabs to back
- Fold diaper
- Attach two back tabs to front of diaper
I then told the students that I would show them what a machine would do with this. I first dropped the doll on the floor, then restarted and held the doll by the neck against the wall, then tried putting the baby face down on the table. I kept adding adjectives to the first line of the algorithm and restarting until the machine got it right.
Then, I unfolded the diaper, and unfolded it, and unfolded it until there was a mess of paper on the floor. Then, I diapered one of the doll’s legs. Then, I diapered the doll’s head. Then I diapered the doll correctly, but with the paper side facing outward.
Finally, I did it correctly. By this time, the students were in hysterics, but they got the point – you had better be precise when writing algorithms.
Many other examples of algorithms can be found. A recipe is an algorithm, instructions for filling out a tax form constitute an algorithm, and even the choreography for a ballet could be considered an algorithm which gets an aesthetic result. In fact, a book was even written about how algorithms can be applied to everyday life – Algorithms to Live By: The Computer Science of Human Decisions.
Computers
Simply put, computers execute algorithms. And, when an algorithm is stored in a computer it is called a program.
There are three abilities which make computers powerful when executing algorithms:
1. The ability to choose what to do next based upon data.
For example, a program for computing income taxes would decide which tax table to use based on the filing status (single, married, etc.) of the person for whom the tax is being computed.
2. The ability to execute the same instructions repeatedly until a specific condition is true.
For example, when alphabetizing a list of names, the program would keep executing instructions to rearrange the names until it determined that the names were in alphabetical order.
3. The ability to encode and represent any kind of information.
Internally, computers represent information in terms of zeros and ones.
For example, consider this string of sixteen zeros and ones:
0100100001000001
Divided into two parts of 8 digits each, this can represent two letters (H and A) in ASCII, the standard encoding used for letters and numbers.
It could also represent two of 256 colors, again by dividing the string into two parts of 8 digits each.
Or all sixteen digits could be used to represent the number 18,497 in binary, the number system that computers use.
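These three readings of the same string can be checked with a short sketch. Python is used purely for illustration and the variable names are assumptions. The letters are decoded as ASCII; the colors are shown only as two palette positions, since which colors they select depends on a palette that is not specified here.

```python
# A minimal sketch: one 16-digit string read three different ways.
bits = "0100100001000001"

# Split the string into two groups of 8 digits each.
first, second = bits[:8], bits[8:]

# Reading 1: two letters, assuming the ASCII encoding.
letters = chr(int(first, 2)) + chr(int(second, 2))

# Reading 2: two positions in a 256-entry color palette
# (the actual colors depend on the palette, which is not shown).
color_indexes = (int(first, 2), int(second, 2))

# Reading 3: all sixteen digits as one number written in binary.
number = int(bits, 2)

print(letters)        # HA
print(color_indexes)  # (72, 65)
print(number)         # 18497
```

The zeros and ones never change; only the interpretation applied to them does.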
Looking at the above, there is an additional factor about algorithms that was not emphasized in the baby diapering lecture: algorithms operate on data.
Both are needed for a computer to do anything at all.
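The first two abilities listed above (choosing based on data, and repeating until a condition is true) can be sketched in a few lines. This is an illustration only: the tax table names are hypothetical placeholders rather than real tax tables, and the sorting loop is a simple swap-the-neighbors method chosen because it mirrors the "keep rearranging until in order" description, not because real programs sort this way.

```python
# Ability 1: choose what to do next based on data.
# The table names are hypothetical placeholders.
def choose_tax_table(filing_status):
    if filing_status == "single":
        return "table_single"
    elif filing_status == "married":
        return "table_married"
    else:
        return "table_other"


# Ability 2: repeat the same instructions until a condition is true.
def alphabetize(names):
    names = list(names)  # work on a copy of the data
    in_order = False
    while not in_order:
        in_order = True
        for i in range(len(names) - 1):
            if names[i] > names[i + 1]:
                # Swap two neighbors that are out of order.
                names[i], names[i + 1] = names[i + 1], names[i]
                in_order = False
    return names


print(choose_tax_table("married"))
# table_married
print(alphabetize(["Turing", "Kuhn", "Weizenbaum", "Harvey"]))
# ['Harvey', 'Kuhn', 'Turing', 'Weizenbaum']
```

Note that both functions are useless without something to work on: the filing status and the list of names are the data.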
Programs and Data
In Webster’s 1828 dictionary, the word computer is defined as, “One who computes; a reckoner; a calculator.”
The first computers that were built did the same thing – they did calculations. They just did them a lot faster than their human counterparts.
The first of these computers (the Harvard Mark One) was completed in 1944 and was used in the war effort during the last part of World War II. The second computer was the ENIAC, which was built at the University of Pennsylvania. It was used for over 10 years by the US Army for artillery firing tables.
On the Mark One, programs were placed onto paper tape (with holes in the tape used to encode the program). For the ENIAC, programs were wired into the machine using cables that often took days to set up.
For the Mark One, data was stored in the same way it was stored in old-fashioned electro-mechanical desk calculators. For the ENIAC, data was stored using vacuum tubes; the machine contained about 18,000 of them and used so much electricity that supposedly the lights in Philadelphia dimmed whenever it was turned on.
In short, programs were considered distinct from data and different mediums were used for each.
But, all of this was changed by mathematician John von Neumann, who envisioned a computer in which both a program and its data were stored electronically in the same place, as shown in the diagram on the right.
This design avoided the complex and cumbersome programming involved with the Mark One and ENIAC.
But, as soon as computers were built using this model, another difficulty was encountered. The computer was much faster than the devices used to input information into it and get information out of it. So, the computer spent most of its time waiting with nothing to do.
As computers were extremely expensive, this was unacceptable. So, a more sophisticated approach was taken by the mid-’60s, where the computer could have multiple pairs of program-data stored in it simultaneously and whenever one program-data pair was doing input or output, the computer could work on another program, as shown in this picture:
Such computers were surrounded by walls of reel-to-reel tape drives that were used to get the program-data pairs in and out of the computer as quickly as possible. Computer users would submit their programs and data on punched cards which would then be put on tapes to be submitted to the computer. Working on a new program was a frustrating and time-consuming experience which involved submitting a “job” and waiting hours (or a day) to get the output, figuring out what went wrong (as usually something was wrong) and then repeating the cycle again.
As programming had become the bottleneck to production, the computer evolved again, and time-sharing computers were built.
With such a computer, many users could connect directly to it concurrently and the computer would store all their program-data pairs and give a little bit of computer time to each of them.
Since the computer terminals of the day were extremely slow (printing 10 characters a second on reels of paper) a single computer could manage to serve many users at one time:
And because memory was expensive, users who were using the same program would share the program (which was not modifiable) and only their data was kept separate, as shown in the picture below for a “simple word processor.”
But as computers became cheaper, it became possible to give each user his own computer, which could contain a single program-data pair:
And, finally, as computers became even cheaper and faster, it was possible to have each of these computers contain multiple program-data pairs, where a single program could be used with separate instances of data (as in a time-sharing system), but now it was all being used by a single person:
And thus, we have arrived at current-day Windows® and Apple® computers.
But despite all this evolution, the program-data pair model of computing as used today can be traced back to the first computers that were built 75 years ago, and the model has essentially remained unchanged since then.
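The shared-program arrangement described in this section can be illustrated with a small sketch. It is only an analogy under stated assumptions: the function `append_text` stands in for a shared, unmodifiable "simple word processor" program, each user's document is a separate piece of data, and the loop hands out one small turn of computer time per request. The user names and text are hypothetical.

```python
from collections import deque

# The shared "program": one copy of the instructions, never modified.
def append_text(document, text):
    return document + text

# Separate data for each user (hypothetical users).
documents = {"alice": "", "bob": ""}

# Requests from the users, waiting for a turn on the computer.
pending = deque([
    ("alice", "Dear "),
    ("bob", "Memo: "),
    ("alice", "Sir,"),
    ("bob", "budget"),
])

# Give a little bit of computer time to one request at a time.
while pending:
    user, text = pending.popleft()
    documents[user] = append_text(documents[user], text)

print(documents)
# {'alice': 'Dear Sir,', 'bob': 'Memo: budget'}
```

Whether the two documents belong to two people on a time-sharing system or to one person on a desktop computer, the structure is the same: a program on one side, its data on the other.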
Paradigms
Paradigm: a worldview underlying the theories and methodology of a particular scientific subject. (Google Dictionary™)
Given the history of the program-data model of computing, it certainly fits the definition of a paradigm – it is the paradigm of how computing is done.
But paradigms can change over time. When a paradigm changes it is called a paradigm shift:
Paradigm shift: a fundamental change in approach or underlying assumptions. (Google Dictionary)
While the term “paradigm” originated in the 15th century, the concept of a “paradigm shift” is much more recent. It was coined by Berkeley History and Philosophy of Science Professor Thomas S. Kuhn in his 1962 book The Structure of Scientific Revolutions.
As the title implies, old paradigms do not die easily. Things can get messy.
Some classic examples are as follows:
- There was a paradigm shift from the viewpoint that the sun revolves around the earth (Ptolemy, 100-170 AD) to the viewpoint that the earth revolves around the sun (Copernicus, 1473-1543). In 1609, using his newly built telescope, Galileo made observations which supported Copernicus’s work, but he was subsequently suppressed by the Catholic Church, which viewed the Ptolemaic viewpoint as Church dogma.
- There was a paradigm shift from the viewpoint that the body is controlled by several different liquids (Galen, 129-210 AD) to the viewpoint that blood circulates through the body via action of the heart (William Harvey, 1578-1657). Many scholars stated that they would rather “err with Galen” than support Harvey’s new conclusions, perhaps because they had a vested interest in keeping the status quo.
Paradigm shifts can also apply to technology, e.g. vinyl records to 8-track tapes to cassettes to CDs to MP3 downloads onto cell phones. Such shifts are caused by disruptive technologies:
“Disruptive technology is an innovation that significantly alters the way that consumers, industries, or businesses operate. A disruptive technology sweeps away the systems or habits it replaces because it has attributes that are recognizably superior.
“Recent disruptive technology examples include e-commerce, online news sites, ride-sharing apps, and GPS systems.
“In their own times, the automobile, electricity service, and television were disruptive technologies.”
Having covered Algorithms, Computers, Program-Data pairs, and Paradigm shifts, we have one more topic to cover before we get to the key part of this presentation.
Artificial Intelligence
Alan Turing (1912-1954) was an English mathematician.
In recent years, Turing has become widely known because of the 2014 movie The Imitation Game starring Benedict Cumberbatch and Keira Knightley. It describes Turing’s work during World War II to crack the Nazis’ encryption technology. Estimates vary, but Turing’s work is said to have shortened the war by one to two years and saved up to 14 million lives.
Turing is also known as “the father of artificial intelligence” because of an article he wrote in 1950 entitled “Computing Machinery and Intelligence.”
In this article he defined “the imitation game,” which became known as “Turing’s Test,” in which a person (the “interrogator”) has two computer terminals in front of him. One terminal is attached to a computer and the other is attached to a terminal operated by another person.
The interrogator asks questions using both terminals and gets answers from each one. If, based on the answers, the interrogator cannot determine which terminal is attached to the computer and which is attached to the other person, then Turing said that the computer could be considered to be intelligent.
But humans are easily fooled.
In the mid-’60s, MIT Professor Joseph Weizenbaum wrote a program called ELIZA that had little knowledge but could converse with the user. The name Eliza was chosen based on the character Eliza Doolittle in “My Fair Lady,” who is passed off as a duchess by Professor Higgins, even though she is a poor flower girl. The point being made by Weizenbaum was that the computer did not know very much (just like Eliza Doolittle) but could be made to appear as if it were in communication with the user.
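To give a feel for how little such a program needs to "know," here is a minimal sketch in the spirit of ELIZA. It is not Weizenbaum's program or script: the patterns and replies are invented for illustration, and the approach shown (spot a keyword pattern, echo part of the user's own words back) is a much-simplified assumption about how such a program can work.

```python
import re

# Invented (pattern, reply template) rules; not Weizenbaum's script.
RULES = [
    (re.compile(r"\bI am (.+)", re.IGNORECASE), "Why do you say you are {0}?"),
    (re.compile(r"\bI feel (.+)", re.IGNORECASE), "How long have you felt {0}?"),
    (re.compile(r"\bmy (\w+)", re.IGNORECASE), "Tell me more about your {0}."),
]

def reply(sentence):
    # Strip trailing punctuation so it is not echoed back.
    sentence = sentence.strip().rstrip(".!?")
    for pattern, template in RULES:
        match = pattern.search(sentence)
        if match:
            # Echo the user's own words back inside a canned reply.
            return template.format(match.group(1))
    # No rule matched: fall back to a content-free prompt.
    return "Please go on."

print(reply("I am tired."))              # Why do you say you are tired?
print(reply("My mother called today."))  # Tell me more about your mother.
print(reply("It rained."))               # Please go on.
```

The program understands nothing about tiredness, mothers, or rain, yet its replies can read like attentive conversation.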
Per Wikipedia: “Weizenbaum’s own secretary reportedly asked Weizenbaum to leave the room so that she and ELIZA could have a real conversation. Weizenbaum was surprised by this, later writing: ‘I had not realized … that extremely short exposures to a relatively simple computer program could induce powerful delusional thinking in quite normal people.’”
So, perhaps Turing’s test is unworkable because it is subject to the delusions of normal people.
But, if we discard Turing’s test, what do we use? Peter Warren, CEO of ExoTech®, provided the answer with a different test, which he calls “The Warren Do Test.”
The Warren Do Test
Per Peter: “what the human and the computer DO in response to an order should be interchangeable.” In other words, the target is not to make the computer think, but to make the computer get products equivalent to those produced by a human.
This is best explained by a diagram:
A boss wants to get a certain product out of the computer. There are two ways he can do this. He can give an order to his secretary, who performs a sequence of actions (S-Actions) with a computer to get the product, or he can give the same order to an AI, which will perform a similar sequence of actions (AI-Actions) using the computer and produce the same product.
In mathematical terms, if the secretary and the AI obtain the same product the diagram is said to “commute,” meaning that either of the two blue paths produces the same result as shown by the red line.
And, per Peter’s test, the AI would be considered to be intelligent in this case.
But let us consider the situation with the current program-data paradigm and the secretary and the AI using a Windows computer with Word, Excel®, Outlook®, etc.
The secretary would be able to produce the product because she is completely external to the computer system and can move information between systems, use pencil and paper as needed, and re-input information as needed to get the product wanted by the boss.
But the AI is not going to get the product. There is no way to easily get randomly chosen information out of diverse systems in unforeseen ways. Existing systems are not built to allow this to occur because all of their possible connections are rigidly predefined. All the systems would somehow need to be modified to integrate the AI, leading to an unworkable mess, if it could be built at all.
Here is an example of something I am doing daily which illustrates this: I assist a family member by tracking her daughter’s spending on her Amex® card so I can let her daughter know each night how much she can charge until her next allowance and how much this amounts to on a daily basis.
But Amex does not provide a total for “pending charges for individual cardmember.” For some reason, it is Amex’s viewpoint that this number should not be provided, although other credit card companies do provide that number.
So, to do the calculation, I must get the total posted charges from the website and write that down on paper.
Then, I copy/paste the pending charges into a text file.
Then, I run a program that I wrote and input the spending limit, days left in the period, posted charges and the name of the text file with the pending charges.
The output is a text file containing a text message addressed to my niece with the total amount available till the end of the period and amount available per day. I then copy/paste this into Google’s “messages for web” program to send it as a text message to her.
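A calculation of this kind can be sketched in a few lines, which sharpens the point: the arithmetic is trivial, and the real work lies in moving the numbers between systems by hand. The sketch below is an illustration, not the program described above. It assumes (hypothetically) that the pending charges are pasted one amount per line and that the daily figure is an even split over the remaining days; all figures are made up.

```python
def parse_pending(text):
    """Read pending charges from pasted text, assuming one amount per line."""
    amounts = []
    for line in text.splitlines():
        line = line.strip().replace("$", "").replace(",", "")
        if line:
            amounts.append(float(line))
    return amounts


def remaining_allowance(spending_limit, days_left, posted_charges, pending_charges):
    """Return (total still available, amount available per day)."""
    spent = posted_charges + sum(pending_charges)
    available = spending_limit - spent
    # Avoid dividing by zero on the last day of the period.
    per_day = available / days_left if days_left > 0 else available
    return round(available, 2), round(per_day, 2)


# Made-up figures for illustration only.
pending = parse_pending("$12.50\n$7.25\n")
available, per_day = remaining_allowance(
    spending_limit=300.00,
    days_left=10,
    posted_charges=180.25,
    pending_charges=pending,
)
print(f"You have ${available:.2f} left, or ${per_day:.2f} per day.")
# You have $100.00 left, or $10.00 per day.
```

Every input to `remaining_allowance` still has to be read off a website or pasted from one by a person, and the output still has to be carried by hand into a messaging program.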
There is simply no way to get a computer program to try to accomplish all of the above with existing programs by coordinating the efforts of the different programs involved. The programs are simply not built in a way that would allow this to occur.
In other words, the diagram is not going to “commute.”
Conclusion: The existing program-data paradigm is unlikely to produce an AI which can get products using existing software systems because it requires obtaining randomly chosen information out of diverse existing systems in unforeseen ways. A paradigm shift will be required.
Turing, Again
One could dispute the conclusion of the last section and argue that the problem is not the program-data paradigm but some inherent limitation in computer hardware that would prevent such an AI from being constructed.
Well, that would be a nice try, but unfortunately it isn’t true. We owe the response to this argument to Alan Turing.
In 1937, before either the Mark One or the ENIAC was built, Turing invented a theoretical computer that became known as a “Turing Machine.”
The computer is dead simple in its operation. It has a tape (potentially infinite in both directions) which is divided up into squares. Each square can hold a symbol (such as a letter or number). The tape constitutes the “data.”
There is a read/write head positioned at one square. The instructions contained in the head control the operation of the computer.
An instruction looks at the symbol at the tape head, and based on the symbol, it decides what symbol to write on the tape (replacing what is in the square), whether to move the tape left or right, and what instruction to perform next. Or it can choose to halt the machine because it has completed its work.
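The machine just described can be simulated in a few lines. The sketch below is one possible rendering, with several assumptions: the tape is kept in a dictionary so that it can grow in either direction, blank squares hold the symbol "_", the head is moved rather than the tape (the two are equivalent), and the sample instruction table, which flips every 0 and 1 and halts at the first blank square, is invented for illustration.

```python
def run_turing_machine(rules, tape_text, state="start", max_steps=1000):
    # The tape: square number -> symbol. Missing squares are blank ("_").
    tape = {i: symbol for i, symbol in enumerate(tape_text)}
    position = 0
    for _ in range(max_steps):
        if state == "halt":
            break
        symbol = tape.get(position, "_")
        # Look up what to write, which way to move, and what to do next.
        write, move, state = rules[(state, symbol)]
        tape[position] = write
        position += 1 if move == "R" else -1
    # Read the tape back from the leftmost to the rightmost used square.
    squares = range(min(tape), max(tape) + 1)
    return "".join(tape.get(i, "_") for i in squares).strip("_")


# Invented instruction table: flip each digit, halt at the first blank.
FLIP_BITS = {
    ("start", "0"): ("1", "R", "start"),
    ("start", "1"): ("0", "R", "start"),
    ("start", "_"): ("_", "R", "halt"),
}

print(run_turing_machine(FLIP_BITS, "0100"))  # 1011
```

The `max_steps` limit is a practical safeguard added for the sketch; a true Turing Machine has no such limit and may run forever.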
While Turing was working on his theoretical machine, other mathematicians were working on alternate and possibly more powerful ways to express such computations. It was a case of attempting “Anything you can do, (I can do better)” but with no rifles involved.
But what occurred was quite astounding – all of these alternate ways turned out to be equivalent in the sense that none was more powerful than the other. In other words, anything one could do all the others could do as well.
This result led researchers to believe that a more powerful way to express computations could never be found and they named this hypothesis the “Church-Turing Hypothesis” (more commonly called the Church-Turing thesis).
Any modern computer, given enough memory, can be used to simulate a Turing Machine. Thus, per the hypothesis, any modern computer can compute anything that can be computed at all.
Although one can still conjecture that such a more powerful way could be found, it is unlikely because of the Turing Machine’s ability to imitate any other method of computation. In other words, if you can express it as an algorithm in some notation, the Turing Machine can be programmed to compute it.
Summary
Here are the key takeaways from this presentation:
- An algorithm is a precise series of instructions that specifies a sequence of operations to achieve a desired result.
- Computer programs are algorithms that operate on data within a computer.
- The paradigm of a program that is separate from but operates on data was introduced in the ’40s and has been sliced and diced various ways but remains essentially unchanged 75 years later.
- Turing’s test for AI relies on a human observer who can be deceived.
- Peter Warren’s “Do Test” asks whether an AI can get a product as opposed to whether it can think.
- The probability is almost non-existent that an AI can be built which will interact with existing software in order to get products equivalent to what a human can obtain with existing software.
- A paradigm shift will be needed to allow an AI to pass the “Warren Do Test.”
- There is no inherent limitation in existing computer hardware which would prevent such a paradigm shift from occurring.
Dr. Charles J. Prenner
Special Projects Executive, ExoTech Ltd
ExoTech and ExoBrain are trademarks owned by ExoTech Ltd, Bermuda. All other trademarks are the property of their respective owners. Windows, Excel and Outlook are registered trademarks of Microsoft® Corporation. Google Dictionary is a trademark of Google LLC. Apple is a registered trademark of Apple Inc. Amex is a registered trademark of American Express Company.