Suppose we want to predict the chance that it rains tomorrow using the information available to us today. To keep it simple, let’s say the weather can be in one of two states: raining or sunny. If it is sunny today, the chance of it being sunny tomorrow is higher, and if it is raining today, the chance of it raining tomorrow is higher. This seems like a reasonable, though simplified, way to model reality. When tomorrow comes, we would just repeat the process.
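This two-state weather model can be sketched in a few lines of Python. The transition probabilities below are made-up numbers chosen only for illustration:

```python
import random

# Hypothetical transition probabilities: P(tomorrow's weather | today's weather).
TRANSITIONS = {
    "sunny": {"sunny": 0.8, "rainy": 0.2},
    "rainy": {"sunny": 0.4, "rainy": 0.6},
}

def next_state(today, rng=random):
    """Sample tomorrow's weather using only today's weather."""
    r = rng.random()
    cumulative = 0.0
    for state, p in TRANSITIONS[today].items():
        cumulative += p
        if r < cumulative:
            return state
    return state  # guard against floating-point shortfall

def forecast(start, days, rng=random):
    """Repeat the one-step prediction day after day."""
    weather = [start]
    for _ in range(days):
        weather.append(next_state(weather[-1], rng=rng))
    return weather
```

Each day's prediction looks only at the previous day, which is exactly the simplification described above.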
Similarly, suppose we want to predict the next word that will be typed on a cell phone. We could look at every word that has ever been typed on that phone and find the most frequent one, or we could just look at the most recently typed word and see what word usually comes after it.
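The "most recently typed word" approach is just a bigram counter. Here is a minimal sketch (the training sentence is invented for the example):

```python
from collections import Counter, defaultdict

def train_bigrams(words):
    """Count, for each word, which words have followed it."""
    following = defaultdict(Counter)
    for prev, nxt in zip(words, words[1:]):
        following[prev][nxt] += 1
    return following

def predict_next(following, last_word):
    """Predict using only the most recently typed word."""
    if last_word not in following:
        return None
    return following[last_word].most_common(1)[0][0]
```

For example, after training on `"the cat sat on the mat the cat ran"`, the model predicts that "cat" follows "the", because that pairing occurred most often.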
In both cases, we are using the most recent information available to us to predict the future and ignoring everything that came before. When the future comes, we transition and repeat the process. This way of modeling the world is known as a Markov Chain and it has some powerful applications.
To understand the origins of Markov Chains, we need to go back to about 400 BCE when Plato was trying to make sense of the world. Plato believed that everything in the world had a hidden perfect form that it aspired to. For example, nature may never produce a perfectly spherical orange, but it was clear to Plato that nature was following a blueprint that could only be expressed in terms of philosophy and mathematics.
The platonic idea of abstract pure forms carried weight up until the 16th and 17th centuries, when new thinkers began to embrace the chaotic nature of reality. Jacob Bernoulli, among the most famous of them, formulated a way to describe the world through probability distributions. Bernoulli proved that the hidden probability of an event could be discovered through experiment. For example, if you flip an unfair coin many times, the observed ratio of heads to tails will converge to the coin's true bias as the number of trials increases (the law of large numbers). Bernoulli held that everything in the world is governed by such precise ratios. From this line of work we get the binomial distribution and, eventually, the central limit theorem.
Philosophers didn’t like this idea of predetermined statistical fate and challenged Bernoulli’s findings. Pavel Nekrasov, a theologian and mathematician, argued that independence was a necessary condition for the law of large numbers to hold. In other words, he was saying that Bernoulli’s ideas only hold up in an unrealistic world where events don’t affect other events, and should thus be limited to the world of dice and children’s games, not the real world.
Andrey Markov, a Russian mathematician, publicly denounced Nekrasov’s claim. Through a simple construction, Markov showed that the law of large numbers can apply to dependent outcomes as well as independent ones. To understand this, imagine that we have two coins. One coin (state 1) has a 50/50 chance of landing on heads. The other coin (state 2) has an 80/20 chance of landing on heads. If we are at state 1 and flip heads, we move to state 2. If we are at state 2 and flip tails, we move back to state 1. Otherwise we loop back to the same state.
Even though the probability of each flip depends on the outcome of the previous flip, given enough trials, this system will still converge on a predictable distribution. Markov intended this as a theoretical thought experiment, but the idea of modeling random events in the world by using states and transitions between those states took on a life of its own. The idea grew and spread to other fields and came to be known as a Markov Chain.
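A quick simulation makes Markov's point concrete. The sketch below follows the two-coin rules just described: heads sends us to state 2, tails sends us back to state 1. For these particular probabilities the long-run fraction of heads settles near 5/7 ≈ 0.714, no matter where we start:

```python
import random

def heads_fraction(flips, seed=0):
    """Simulate the two-coin chain and return the overall fraction of heads."""
    rng = random.Random(seed)
    state, heads = 1, 0
    for _ in range(flips):
        p_heads = 0.5 if state == 1 else 0.8  # which coin we're holding
        if rng.random() < p_heads:
            heads += 1
            state = 2  # heads -> move to (or stay in) state 2
        else:
            state = 1  # tails -> move back to (or stay in) state 1
    return heads / flips
```

The flips are clearly dependent, yet the running average still converges, which is exactly what Markov wanted to demonstrate.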
In the simplest terms, a Markov process is any process where the future depends only on the present. For example, if we are currently at move 5 in a chess game, what happened in moves 1–4 won’t help us predict the best move 6. The current state of the board is all that we would need to know to work out the future.
For systems that follow this process, we can use something called a Markov Decision Process to solve for the best policy or strategy given our present state. All we would need to do is add action-reward pairs to our Markov chain and then maximize their values. This is the foundation of reinforcement learning.
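To make "add action-reward pairs and maximize their values" concrete, here is a minimal value-iteration sketch on a made-up two-state MDP. The state names, actions, transition probabilities, and rewards are all invented for illustration:

```python
STATES = ["low", "high"]
ACTIONS = ["wait", "work"]
# P[(state, action)] = list of (next_state, probability, reward) - toy numbers
P = {
    ("low", "wait"):  [("low", 1.0, 0.0)],
    ("low", "work"):  [("high", 0.7, 1.0), ("low", 0.3, 0.0)],
    ("high", "wait"): [("high", 0.9, 2.0), ("low", 0.1, 0.0)],
    ("high", "work"): [("high", 1.0, 1.5)],
}

def value_iteration(gamma=0.9, iters=500):
    """Compute the value of each state under the best possible behavior."""
    V = {s: 0.0 for s in STATES}
    for _ in range(iters):
        V = {
            s: max(
                sum(p * (r + gamma * V[s2]) for s2, p, r in P[(s, a)])
                for a in ACTIONS
            )
            for s in STATES
        }
    return V

def greedy_policy(V, gamma=0.9):
    """In each state, pick the action with the highest expected value."""
    return {
        s: max(
            ACTIONS,
            key=lambda a: sum(p * (r + gamma * V[s2]) for s2, p, r in P[(s, a)]),
        )
        for s in STATES
    }
```

The Markov property is what makes this tractable: the value of a state depends only on the state itself, not on how we got there.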
Markov Chains can be found in a variety of fields from physics to economics. For example, the original PageRank algorithm used by Google was based on a Markov chain in which internet pages are states, and the links to other pages are transitions. If we were randomly assigned to a page on the internet, we could calculate the likelihood of us landing on any other page based solely on the present page. Google used this Markov Chain to rank pages based on the number of inbound links they receive across the internet.
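The core of this idea can be sketched as a "random surfer" iterating over a link graph. The code below is a simplified illustration, not Google's actual implementation; the damping factor of 0.85 is the value commonly cited for the original algorithm:

```python
def pagerank(links, damping=0.85, iters=100):
    """Rank pages by the stationary distribution of a random surfer.

    links maps each page to the list of pages it links to.
    """
    pages = list(links)
    n = len(pages)
    rank = {p: 1.0 / n for p in pages}
    for _ in range(iters):
        new = {p: (1.0 - damping) / n for p in pages}
        for page, outlinks in links.items():
            if not outlinks:  # dangling page: spread its rank evenly
                for p in pages:
                    new[p] += damping * rank[page] / n
            else:
                for target in outlinks:
                    new[target] += damping * rank[page] / len(outlinks)
        rank = new
    return rank
```

On a tiny made-up graph where pages a and b both link to c, pages with more inbound links end up with more rank, just as the paragraph above describes.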
In reinforcement learning, Markov chains have been used to solve a variety of games from Chess to Go. Q-learning is one version of this, where an action-value function is learned through trial and error that gives the expected reward of an action given the current state of a board. This same idea has been transferred to programming self-driving cars and to designing trading algorithms.
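As an illustration of the action-value idea, here is a tabular Q-learning sketch on a made-up four-state corridor environment; real applications like Go use far richer state representations, but the trial-and-error update rule is the same:

```python
import random

def q_learning(episodes=2000, alpha=0.1, gamma=0.9, epsilon=0.1, seed=0):
    """Tabular Q-learning on a toy 4-state corridor (invented environment).

    The agent starts at state 0; action 0 moves left, action 1 moves
    right; reaching state 3 yields reward 1 and ends the episode.
    """
    rng = random.Random(seed)
    n_states, n_actions, goal = 4, 2, 3
    Q = [[0.0] * n_actions for _ in range(n_states)]
    for _ in range(episodes):
        s = 0
        while s != goal:
            # epsilon-greedy: usually exploit, occasionally explore
            if rng.random() < epsilon:
                a = rng.randrange(n_actions)
            else:
                a = max(range(n_actions), key=lambda x: Q[s][x])
            s2 = max(0, s - 1) if a == 0 else min(goal, s + 1)
            r = 1.0 if s2 == goal else 0.0
            # nudge the estimate toward reward + discounted best future value
            Q[s][a] += alpha * (r + gamma * max(Q[s2]) - Q[s][a])
            s = s2
    return Q
```

After enough episodes, the learned Q-values prefer "right" in every non-goal state, which is the optimal policy for this toy corridor.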
The details and applications of Markov chains run deep, and this post has only touched the surface. What you can take away is that by chaining together simple two-period models, we can create complex multi-period models of the real world. This concept lets us overcome the huge computational hurdle of remembering all past states and variables.