Programming #13: Recurrent Neural Networks

Last time, we explored how a Convolutional Neural Network could be trained to recognize and classify patterns in an image. With a slight modification, a CNN could also be trained to generate new images. But what if we were given a series of frames in an animation and wanted our CNN to predict the next frame? We could feed it a set of two-frame pairs and see whether it learns that frame ‘a’ is usually followed by frame ‘b’, but this wouldn’t work very well, because each pair says nothing about the frames that came before it.

What we really need is a neural network that is able to learn from longer sequences of data. For example, if all the previous frames show a ball flying in an arc, the neural network might be able to learn how quickly the ball is moving in each subsequent time period and predict the next frame based on that. This is where Recurrent Neural Networks (RNNs) come in.
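
To make the idea of learning from a sequence concrete, here is a minimal sketch of the recurrence at the heart of an RNN. It is not the code from the post: the function names, the scalar weights (`w_x`, `w_h`, `b`) and the ball heights are all hypothetical, and the weights are hand-picked rather than trained. The point is only that each step combines the current input with a hidden state carried over from the previous steps.

```python
import math

def rnn_step(x, h_prev, w_x, w_h, b):
    """One recurrent step: mix the current input with the previous hidden state."""
    return math.tanh(w_x * x + w_h * h_prev + b)

def run_rnn(sequence, w_x=0.5, w_h=0.8, b=0.0):
    """Feed a sequence through the cell, carrying the hidden state forward."""
    h = 0.0  # initial hidden state: nothing has been seen yet
    states = []
    for x in sequence:
        h = rnn_step(x, h, w_x, w_h, b)
        states.append(h)
    return states

# Hypothetical ball heights sampled over four frames (illustrative values only)
heights = [0.0, 0.4, 0.7, 0.9]
states = run_rnn(heights)
```

Because `h` is fed back in at every step, the same input can produce a different state depending on what preceded it, which is exactly what a model trained only on isolated frame pairs cannot do.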

Today, we’ll be conceptualizing and exploring RNNs by building a deep neural network that functions as part of an end-to-end machine translation pipeline. Our completed pipeline will accept English text as input and return the French translation as output. You can follow along with the code here.


Programming #12: Convolutional Neural Networks

Last time, we explored how a simple MLP neural network could be used to classify the MNIST dataset. Today, we will work on a messier problem. We will use a modified version of the Stanford Dogs dataset to train a neural network that can classify dog breeds. Since the differences between breeds are small, and an obscure detail could be the deciding factor, we will need a model that can capture more detail. This is where convolutional neural networks (CNNs) come in.

As always, we will start by explaining some of the high-level concepts. You can follow along with the code here.

Programming #11: Deep Neural Networks

In this post, we will attempt to conceptualize Deep Neural Networks (DNNs) and apply one to a common problem. We’ll train a type of DNN called a Multilayer Perceptron (or vanilla network) to classify images from the MNIST database. The MNIST database contains 70,000 images of handwritten digits from 0 to 9 and is one of the most famous datasets in machine learning. If all this sounds confusing so far, don’t worry; we’ll start at the beginning.

If you want to follow along with the code, the notebook can be found here.

History #17: Artificial Intelligence

Thomas Kuhn, in his book The Structure of Scientific Revolutions, provides us with a framework for modeling the historical progression of science. Opposing the prevailing view of scientific progress as a steady accumulation of accepted facts and theories, Kuhn argued that science takes a more episodic path, in which periods of normal science are interrupted by periods of revolutionary science.

According to Kuhn, when enough anomalies have accrued against the current scientific consensus (some level of error is always inevitable), the field is thrown into a state of crisis, in which new ideas are tried, eventually leading to a paradigm shift. Investment of time and money pours in as the new paradigm proves successful in solving old and new problems. Eventually, this new paradigm may run into intractable problems of its own, and the cycle repeats.

It is with this framework in mind that we will dive into the history of Artificial Intelligence. It’s a history littered with so-called “AI Summers” and “AI Winters”, where new ways of thinking spark rampant enthusiasm, followed by rampant pessimism when the lofty promises aren’t kept. It’s the same boom-and-bust cycle that shows up again and again throughout human history.
