Thomas Kuhn, in his book The Structure of Scientific Revolutions, provides a framework for modeling the progression of science. Opposing the prevailing view of scientific progress as a steady accumulation of accepted facts and theories, Kuhn argued that science takes a more episodic path, in which periods of normal science are interrupted by periods of revolutionary science.
According to Kuhn, when enough anomalies have accrued against the current scientific consensus (some level of error is always inevitable), the field is thrown into a state of crisis, in which new ideas are tried, eventually leading to a paradigm shift. Time and money pour in as the new paradigm proves successful at solving old and new problems. Eventually, the new paradigm may run into intractable problems of its own, and the cycle repeats.
It is with this framework in mind that we will dive into the history of Artificial Intelligence. It’s a history littered with so-called “AI Summers” and “AI Winters”, where new ways of thinking spark rampant enthusiasm, followed by rampant pessimism when the lofty promises aren’t kept. It’s the boom and bust cycle that shows up again and again throughout human history.
Birth of Artificial Intelligence
From the Greek myths of Hephaestus’ golden robots, to Rabbi Judah Loew’s Golem, to Mary Shelley’s Frankenstein, the idea of artificial intelligence has been floating around the ether for a while now. However, the actual founding of artificial intelligence as an academic discipline would not occur until the summer of 1956, when a group of scientists came together at Dartmouth College with the goal of describing every aspect of learning so precisely that a machine could be made to simulate it.
More specifically, artificial intelligence aimed to teach machines six things:
- Reasoning: solving word problems, playing chess, etc.
- Knowledge Representation: ability to model information from the real world.
- Planning: navigate the world efficiently.
- Natural Language Processing: how to understand and communicate human language.
- Perception: sight, sound, touch, smell, taste.
- Generalized Intelligence: all aspects of human behavior, including emotions.
The Dartmouth researchers, armed with their punch-card computers and a scientific framework of goals, got to work. The first paradigm of research began.
The Golden Years (1956-1974)
Optimism was strong. H. A. Simon argued that “machines will be capable, within twenty years, of doing any work a man can do.” Marvin Minsky proclaimed that “within a generation the problem of creating artificial intelligence will substantially be solved,” and later that “in from three to eight years we will have a machine with the general intelligence of an average human being.”
The Cold War proved a motivating factor in the development of AI systems. Unsurprisingly, translation systems were a top priority. In the mid-1950s, researchers at Georgetown and IBM worked on one such system, which could translate Russian sentences into English. Using a demo of 60 Russian sentences, the researchers successfully demonstrated that an accurate system could be devised. Funding poured in, and new startups quickly popped up with the aim of turning that demo into a fully functioning general translation system.
This proved harder than expected, as computers completely failed at capturing the meaning, or semantics, of a sentence. Computers were good at word-for-word translations, but they would fail at anything more. For example, “The spirit is willing, but the flesh is weak.” would come back as “The whisky is strong, but the meat is rotten.” It was obvious to any outside observer that something was being lost in translation.
Another attempt at AI took a search approach: think of trying out all possible paths in a maze and backtracking when you hit a dead end. This quickly ran into computational limits, as the number of possible paths can grow exponentially. Heuristics were developed to prune paths from the calculation, but this reduced the system’s ability to generalize.
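The search-and-backtrack idea can be sketched in a few lines of modern code. This is only an illustration of the technique, not any particular system from the era; the maze layout and function name are invented for the example.

```python
# Depth-first search through a small grid maze, backtracking at dead ends.
# 0 = open cell, 1 = wall.

def solve_maze(maze, start, goal):
    """Return a path from start to goal as a list of (row, col), or None."""
    rows, cols = len(maze), len(maze[0])
    path, visited = [], set()

    def explore(cell):
        r, c = cell
        if not (0 <= r < rows and 0 <= c < cols):   # off the grid
            return False
        if maze[r][c] == 1 or cell in visited:       # wall, or already tried
            return False
        visited.add(cell)
        path.append(cell)
        if cell == goal:
            return True
        for step in [(r + 1, c), (r - 1, c), (r, c + 1), (r, c - 1)]:
            if explore(step):
                return True
        path.pop()                                   # dead end: backtrack
        return False

    return path if explore(start) else None

maze = [
    [0, 1, 0],
    [0, 1, 0],
    [0, 0, 0],
]
print(solve_maze(maze, (0, 0), (0, 2)))
```

Even on this tiny grid the search tries, abandons, and retries routes; on a real problem the number of candidate paths explodes, which is exactly the wall the early researchers hit.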
A micro-world approach was taken at MIT: think of using toy blocks to represent a city. Systems were developed that could see and stack blocks in this context. For example, we could tell the system to put a red block on a blue block, and it would. Like the other approaches, these systems fell apart when taken out of their controlled environment, for example when an unknown object was added to the scene.
Optimism began to turn into pessimism. Researchers became discouraged as their lofty goals proved unreachable. In 1966, the Automatic Language Processing Advisory Committee (ALPAC) in the US reported that progress was so disappointing that machine translation funding was cut off for a decade. In 1973, the UK’s Lighthill report proclaimed the utter failure of AI to achieve its grandiose objectives. Companies based on the current paradigms failed, funding dried up, and the first AI winter began.
The Second Boom (1980-1987)
It wouldn’t be until the 1980s that a new paradigm would rise up to get people excited again. The idea of teaching machines from the bottom up had failed, so researchers decided to try teaching machines from the top down. Instead of teaching machines the way we teach kids, maybe we could do the opposite and program a computer with something incredibly sophisticated from the start. This approach was called “expert systems”.
These expert systems restricted themselves to a small domain of specific knowledge and ignored the problem of generalization. The first such system, Dendral, was developed in 1965 to identify chemical compounds from spectrometer readings. In 1972, this work was built upon by MYCIN, which diagnosed infectious blood diseases. Finally, AI seemed to be producing something useful. If we could just bottle up the expertise of a thousand or so professions, we would be on our way to combining them into a general AI.
This idea didn’t fully take off until the 1980s, when new companies began to pop up around the technology. The early successes meant that money was eager to flow back into the space. Governments around the world announced new research grants. One of the more famous startups at the time was Symbolics, which manufactured the specialized hardware needed for building these expert systems.
Unfortunately, this paradigm ran into a wall of its own. Each expert system required a long process of finding an expert, figuring out what they do, programming the set of rules, and then starting over from scratch for the next domain. Any work you did on one expert system didn’t help you build out the next. Furthermore, domain knowledge isn’t static. These programs would almost immediately be obsolete and were expensive to maintain.
In 1987, the market for specialized AI hardware collapsed, taking Symbolics’ business, and the excitement around the technology, down with it. Another boom and bust cycle had occurred, and another paradigm had failed.
Moore’s Law (1993-2001)
This time around, advances in AI had less to do with a revolutionary new paradigm and more to do with improvements in raw computing power. In May 1997, Deep Blue became the first computer chess-playing system to beat a reigning world chess champion. It did so by returning to the original idea of search-based AI, but with the added benefit of being able to evaluate 200,000,000 positions per second.
Machine learning, a subset of artificial intelligence, began to take off. Combining computing power and data, machine learning aims to find patterns that can be used for prediction. Previous paradigms were concerned with programming the rules a system would need to follow to act intelligently. The machine learning paradigm flipped that: just give the system the raw data and let it figure out the rules on its own. This paradigm is still going strong today, with the most startling results coming from a subset of machine learning known as deep learning with neural networks.
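The rules-from-data idea can be made concrete with a toy example. Instead of hand-coding a rule like “flag the message if its score exceeds X,” we let the program choose the threshold X that best separates labeled examples. The data and function name below are invented for illustration; real machine learning fits far richer models, but the principle is the same.

```python
# Learn a classification rule (a single threshold) from labeled data
# rather than programming it by hand.

def learn_threshold(examples):
    """examples: list of (value, label) pairs, label in {0, 1}.
    Return the threshold that classifies the training data best."""
    best_t, best_correct = None, -1
    for t in sorted(v for v, _ in examples):
        # Rule under consideration: predict 1 whenever value > t.
        correct = sum((v > t) == bool(label) for v, label in examples)
        if correct > best_correct:
            best_t, best_correct = t, correct
    return best_t

data = [(1, 0), (2, 0), (3, 0), (7, 1), (8, 1), (9, 1)]
print(learn_threshold(data))  # → 3, which separates the two classes
```

No rule was programmed in; the “knowledge” (the cutoff at 3) was recovered from the examples themselves, which is the paradigm flip the paragraph above describes.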
Deep Learning (2001-present)
Two factors laid the groundwork for our current paradigm: 1) extremely fast and cheap GPUs (thanks, gamers) and 2) extremely large data sets (thanks, death of privacy). The idea of neural networks has been around since the 1940s, but it wasn’t until the above factors came into play that we started to see large successes. The basic idea came from the human brain. We have neurons in the brain that either fire or don’t based on some activation threshold. Similarly, we can have nodes in a computer system that fire or don’t based on the weights applied to their inputs and an activation threshold. A neural network.
We won’t get into the full history of neural network research, and will instead skip ahead to 2012, when Google combined these methods with its vast stores of data. Ten million YouTube videos were fed into a thousand computers and run through a neural network algorithm for a week. The system learned to recognize over a thousand objects, including cats. More recently, the team at DeepMind used a deep neural network, along with an ensemble of other machine learning techniques, to beat the world champion at the game of Go.
How successful has this paradigm been? We have already reached near- or better-than-human performance at playing certain games, diagnosing certain diseases, driving through a city, translating languages, recognizing objects, and more. This is not to say that we are anywhere near the science fiction world of AI, where computers have a will of their own. It is to say that computers are getting really good at pattern recognition, given enough data.
History plus humanity seems to equal boom and bust cycles. This post came at that general theme from the angle of scientific advancement: a new paradigm arises, early successes give undue confidence, later failures give undue doubt, and somewhere in the middle, progress happens.
It looks like we are currently in an up-cycle of confidence in AI advancement. As far as anyone can tell, real breakthroughs are occurring on decades old problems. Money is pouring in, and the biggest companies in the world are making it a priority for their future. However, we should remember the lessons of history before we get too caught up in the party: Progress never goes in a straight line.