Introduction
Remember the Fisher Price “My First…” toys from the 70s and 80s? They were super simple versions of common toys or household objects.
Let’s build a Fisher Price My First Neural Net, which is the simplest possible piece of software that qualifies as a full-fledged neural net. Even this ultra-basic neural net will be capable of actual classification work, just like an industrial-grade neural net.
Although no experience building neural nets is required, this project will make a lot more sense if you have some understanding of the basic concepts of neural nets:
- nodes
- weighted connections between nodes
- hidden layers
- output layers
- prediction through forward-propagation
- training through back-propagation
If you need an introduction to or a refresher on any of these, see the first section of Wikipedia’s Artificial Neural Network entry.
You should also be familiar with the basics of Python, and with writing and running Python scripts in an editor and the command line, in IPython, or in a Jupyter notebook.
This post leads you through three tasks:
- Set up your development environment
- Build the simplest neural net you possibly can
- Make the neural net more accurate by tweaking it in very minor ways
The code for making our Fisher Price My First Neural Net is spread throughout this post, but it’s also presented at the end of Steps 2 and 3 for easy copying into your own environment.
Step 1: Set up your development environment
I assume that you’re starting from a reasonably clean Linux or macOS machine. I haven’t tested these steps on Windows, though they will probably work with only slight modification.
This step just collects the lego blocks that can be snapped together to make a neural net. We won’t actually assemble those lego blocks until Step 2 below.
a. Install Miniconda
Thanks to the magic of the Miniconda package manager for Python, setting up your development environment is trivial.
Use the official instructions to install Miniconda.
You could probably use Miniconda’s more full-featured cousin Anaconda instead, but I’m not as familiar with that tool.
b. Make a new Python environment
In a terminal, use Miniconda to make a new Python environment to play around in, so you don’t corrupt the rest of your system.
conda create -n fisher-price
conda activate fisher-price
c. Install the Keras neural net library
We’ll build our neural net using the Python-based Keras library, which is a user-friendly wrapper on top of Google’s TensorFlow library. As of early 2019, Keras is probably the most accessible neural net package for newbies. It’s mature, robust, and has reasonable documentation. Install it with a single command in the terminal.
conda install keras
Answer the prompts, watch Miniconda install 20 or so dependencies, and you’re done!
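If you want to make sure the install worked before moving on, a one-line check from a Python prompt inside the new environment is enough.
# confirm that Keras imports cleanly and report which version was installed
import keras
print(keras.__version__)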
Step 2: Build the simplest neural net you possibly can
First, an important note on the accuracy rates you’ll see in this post. When you create a brand new neural net, all of its weights are set to random values. As you train it, those weights change and ultimately converge on values that give the neural net its predictive power. But because each new neural net instance is initialized with different random weights, two nets trained on exactly the same data will end up with slightly different final weights and slightly different predictive accuracy. That’s just the nature of the beast when it comes to neural nets.
This means that if you run this code in your own environment, you can expect similar, but not identical accuracy.
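If you want your own runs to be more repeatable, you can pin the random seeds before building the model. This is optional and isn’t part of the script below; it assumes the TensorFlow backend, and the exact seed-setting call differs between TensorFlow versions, so treat it as a sketch rather than gospel.
# optional: pin the random seeds so repeated runs start from the same weights
# (results can still vary slightly from run to run)
import random
import numpy as np
import tensorflow as tf
random.seed(42)
np.random.seed(42)
tf.set_random_seed(42)  # in TensorFlow 2.x this call is tf.random.set_seed(42)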
Now, let’s see how quickly we can make a real neural net with a Python script. Thanks to Keras, this takes remarkably little code. Let’s walk through each line.
At the very top of a new Python script, load Keras.
import keras
Next, load some data for our neural net to train on. The most common “hello world” dataset for learning about neural nets is MNIST, which comprises grayscale images of handwritten digits. A neural net can be trained to categorize each picture as a handwritten 0, 1, 2, or whatever. The MNIST dataset gives us 60,000 images to train the neural net, and 10,000 images to test the neural net’s accuracy.
Let’s try something slightly different by using the Fashion MNIST dataset instead. This is exactly the same as MNIST in format (same number of pictures, same size of pictures, same grayscale), but consists of articles of clothing instead of digits. A neural net trained on Fashion MNIST learns to identify what category of clothing (t-shirt, dress, handbag, etc.) each picture belongs to.
Keras provides a helper method for importing the Fashion MNIST training data. We’re actually importing four separate sets of data with one command:
- images to train the neural net on
- category labels for those training images (“t-shirt”, “dress”, etc.)
- images to test the neural net with once it’s been trained
- category labels for those test images
(train_images, train_labels), (test_images, test_labels) = keras.datasets.fashion_mnist.load_data()
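If you’re curious what load_data() just handed back, printing the shapes of the four pieces is a quick sanity check (the sizes below are what the Fashion MNIST documentation promises).
print(train_images.shape)  # (60000, 28, 28): 60,000 images of 28 x 28 pixels
print(train_labels.shape)  # (60000,): one category label per training image
print(test_images.shape)   # (10000, 28, 28)
print(test_labels.shape)   # (10000,)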
Keras loads the training data as a three-dimensional array of size 60,000 x 28 x 28. To feed this data to our neural net, we first have to convert it into a two-dimensional array of size 60,000 x 784 (note that 784 = 28 x 28). The technical reasons we need to “reshape” the data are unimportant for this post; it’s just easier for Keras to process that way.
We need to do exactly the same thing for the 10,000 images that we’ll use for testing the trained neural net.
train_images = train_images.reshape(60000, 784)
test_images = test_images.reshape(10000, 784)
Now let’s create the model. This is the neural net itself, with nodes arranged in layers and weighted interconnections between various nodes.
We’ll use Keras to make a Sequential model, which is the simplest kind of neural net. In a Sequential model, signals flow from input nodes, then to one or more layers of hidden nodes, and finally to output nodes. This model doesn’t include any fancy bells or whistles: it’s a plain vanilla neural net architecture.
neural_net = keras.models.Sequential()
Time to add some layers of nodes to our network.
First, let’s add a hidden layer of 100 nodes.
neural_net.add(keras.layers.Dense(100, input_dim=784))
There’s a lot going on in this line, so let’s step through it.
Dense is a type of layer in which every node it contains is connected to every node (or input value) in the previous layer. This is the simplest and probably most common kind of layer. It’s the basic building block of most neural nets.
100 specifies the number of nodes in this layer. I picked this number more or less at random; we’ll experiment with tweaking it later.
input_dim specifies how many pieces of data will feed into the neural net. Generally, only the first layer in a neural net uses this parameter. The images that serve as input to our neural net are 28 pixels by 28 pixels; we’ve already flattened each one into a one-dimensional list of 28 x 28, or 784, numbers with reshape, and input_dim=784 tells Keras to expect that many values per image, which it feeds to each of the 100 nodes in this layer. Each of these 784 numbers ranges from 0-255, representing a grayscale value.
neural_net.add(keras.layers.Dense(10, activation='softmax'))
This is the second and last layer in our network, so it will serve as its output layer. Output layers are typically Dense as well, so that every output node can draw on the value of every node in the hidden layer.
This layer has 10 nodes, each representing one category label.
The activation parameter applies the softmax function, which scales the outputs of the nodes in this layer so they all add up to 1. This means that if the neural net thinks that a given image is a dress, the output node that represents “dress” might have an activation of 0.9, whereas the output nodes that represent “handbag” and “t-shirt” might each have an activation of 0.05.
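To see what softmax does, here’s a quick illustration that has nothing to do with the neural net itself: it simply turns an arbitrary list of scores into values between 0 and 1 that sum to 1.
# softmax: exponentiate each score, then divide by the sum of the exponentials
import numpy as np
scores = np.array([2.0, 1.0, 0.1])
softmax = np.exp(scores) / np.sum(np.exp(scores))
print(softmax)        # roughly [0.66, 0.24, 0.10]
print(softmax.sum())  # 1.0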
Next we “compile” the network, which you can think of as converting it from a blueprint of a network into a runnable network. We also pass a few “hyperparameters” to the network, which control how it operates.
neural_net.compile(optimizer='adam',
loss='sparse_categorical_crossentropy',
metrics=['accuracy'])
The required optimizer parameter tells the network what algorithm to use when training. “Adam” is a general-purpose algorithm that’s usually a good place to start.
The required loss parameter tells the network how to measure how accurate it is, which not only makes it possible for us to understand how successful the training has been, but also drives the training process itself. The exact details are unimportant, but “sparse categorical cross-entropy” works well for this type of classification task.
The final parameter, metrics, is optional. The way we’re using it here gives us ongoing reports of how the network’s accuracy improves as it is trained.
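If you’d like to see what Keras has built at this point, the summary() method prints each layer along with its number of trainable weights.
# optional: inspect the layers and their parameter counts
# hidden layer: 100 nodes x (784 inputs + 1 bias) = 78,500 parameters
# output layer: 10 nodes x (100 inputs + 1 bias) = 1,010 parameters
neural_net.summary()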
Now that we’ve specified and compiled the network, we need to train it using the Fashion MNIST training data. Once again, Keras makes this simple.
Although a modern CPU should be able to run this in less than a minute, this step could take much longer (hours or even days) if we had a more complicated neural net or vastly more training data.
neural_net.fit(train_images, train_labels)
Now it’s time for the big payoff! Let’s feed test data to the trained neural net and see how accurately it classifies fashion images it’s never seen before.
print(neural_net.evaluate(test_images, test_labels))
10000/10000 [==============================] - 0s 25us/step
[14.50628568725586, 0.1]
That output is cryptic, but the important part is the second number in the last line of output: 0.1. That means that our trained neural net correctly identified… 10% of the clothing images in the test set.
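As an aside, evaluate() returns the loss and the accuracy (because we asked for the accuracy metric when compiling), so you can unpack the result yourself for a friendlier readout.
# unpack the loss and accuracy that evaluate() returns
loss, accuracy = neural_net.evaluate(test_images, test_labels)
print("test accuracy: {:.1%}".format(accuracy))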
The bad news: that’s pathetic.
The good news: there are lots of simple ways to modify our Fisher Price neural net to improve its accuracy. Let’s see how high we can get the accuracy with some simple tweaks.
But before we try to improve our accuracy, let’s catch our breath and take a look at all the code we have so far, all in one place.
# load the neural net library
import keras
# load the images and category labels we'll use to train the neural net, and then to test its accuracy
(train_images, train_labels), (test_images, test_labels) = keras.datasets.fashion_mnist.load_data()
# flatten the images from two-dimensional arrays into one-dimensional arrays
train_images = train_images.reshape(60000, 784)
test_images = test_images.reshape(10000, 784)
# specify the neural net's architecture: one hidden layer and one output layer
neural_net = keras.models.Sequential()
neural_net.add(keras.layers.Dense(100, input_dim=784))
neural_net.add(keras.layers.Dense(10, activation='softmax'))
# convert the neural net blueprint into a runnable neural net
neural_net.compile(optimizer='adam', loss='sparse_categorical_crossentropy', metrics=['accuracy'])
# using our dataset of training images, train the neural net by adjusting the weights between its connections
neural_net.fit(train_images, train_labels)
# using our dataset of test images, check our neural net's accuracy
print(neural_net.evaluate(test_images, test_labels))
Step 3: Make the neural net more accurate by tweaking it in very minor ways
For technical reasons that are unimportant for our purposes, neural nets often improve dramatically when each layer has something called an activation function, which modifies the activation level of each node.
Remember that our neural net has two layers. We’ve already specified the softmax activation function for the output layer, but we didn’t specify an activation function for the hidden layer. So let’s add one to the hidden layer and see if that gives us better results. We’ll start with tanh, which is a common activation function. The changed line is the one that adds the hidden layer.
import keras
(train_images, train_labels), (test_images, test_labels) = keras.datasets.fashion_mnist.load_data()
train_images = train_images.reshape(60000, 784)
test_images = test_images.reshape(10000, 784)
neural_net = keras.models.Sequential()
neural_net.add(keras.layers.Dense(100, input_dim=784, activation='tanh'))
neural_net.add(keras.layers.Dense(10, activation='softmax'))
neural_net.compile(optimizer='adam', loss='sparse_categorical_crossentropy', metrics=['accuracy'])
neural_net.fit(train_images, train_labels)
print(neural_net.evaluate(test_images, test_labels))
Epoch 1/1
60000/60000 [==============================] - 5s 78us/step - loss: 1.0321 - acc: 0.6291
10000/10000 [==============================] - 0s 28us/step
[0.9260841958999634, 0.6585]
Wow. Just adding the tanh activation function to the hidden layer catapulted accuracy to 66%!
Here’s something else to try: neural nets work best when each piece of input data (in this case, the grayscale value of each pixel in an image) ranges from 0 to 1.0. Currently our image data ranges from 0 to 255. Let’s scale our data (for both the training and test images) so it fits in the 0 – 1.0 range, and see how that affects our accuracy.
import keras
(train_images, train_labels), (test_images, test_labels) = keras.datasets.fashion_mnist.load_data()
train_images = train_images / 255.0
test_images = test_images / 255.0
train_images = train_images.reshape(60000, 784)
test_images = test_images.reshape(10000, 784)
neural_net = keras.models.Sequential()
neural_net.add(keras.layers.Dense(100, input_dim=784, activation='tanh'))
neural_net.add(keras.layers.Dense(10, activation='softmax'))
neural_net.compile(optimizer='adam', loss='sparse_categorical_crossentropy', metrics=['accuracy'])
neural_net.fit(train_images, train_labels)
print(neural_net.evaluate(test_images, test_labels))
Epoch 1/1
60000/60000 [==============================] - 5s 82us/step - loss: 0.4803 - acc: 0.8281
10000/10000 [==============================] - 0s 31us/step
[0.4426993363380432, 0.8401]
Better still: we’re up to 84%!
Next, let’s see what happens if we train the system on the training images not just once, but multiple times. Since training adjusts the weights of the connections only a little bit with each batch of training data, maybe it will continue to improve its accuracy if we just let it take several passes at the same set of training data.
We do this by using the epochs parameter during training. One epoch is a single pass through all the training data, so specifying five epochs means we’ll train the neural net on the same training data five times.
import keras
(train_images, train_labels), (test_images, test_labels) = keras.datasets.fashion_mnist.load_data()
train_images = train_images / 255.0
test_images = test_images / 255.0
train_images = train_images.reshape(60000, 784)
test_images = test_images.reshape(10000, 784)
neural_net = keras.models.Sequential()
neural_net.add(keras.layers.Dense(100, input_dim=784, activation='tanh'))
neural_net.add(keras.layers.Dense(10, activation='softmax'))
neural_net.compile(optimizer='adam', loss='sparse_categorical_crossentropy', metrics=['accuracy'])
neural_net.fit(train_images, train_labels, epochs=5)
print(neural_net.evaluate(test_images, test_labels))
Epoch 1/5
60000/60000 [==============================] - 5s 83us/step - loss: 0.4766 - acc: 0.8290
Epoch 2/5
60000/60000 [==============================] - 5s 79us/step - loss: 0.3699 - acc: 0.8661
Epoch 3/5
60000/60000 [==============================] - 5s 80us/step - loss: 0.3374 - acc: 0.8765
Epoch 4/5
60000/60000 [==============================] - 5s 81us/step - loss: 0.3138 - acc: 0.8854
Epoch 5/5
60000/60000 [==============================] - 5s 80us/step - loss: 0.2966 - acc: 0.8910
10000/10000 [==============================] - 0s 33us/step
[0.36925499482154844, 0.8619]
More improvement: we’re at 86%!
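Before we wrap up, it’s worth watching the trained net classify a single image. predict() returns the ten softmax activations for each input image, and the index of the largest activation is the predicted category. Here’s a minimal sketch; the label names come from the Fashion MNIST documentation.
import numpy as np
# Fashion MNIST's category names, in the order of its 0-9 label codes
class_names = ['T-shirt/top', 'Trouser', 'Pullover', 'Dress', 'Coat',
               'Sandal', 'Shirt', 'Sneaker', 'Bag', 'Ankle boot']
# predict() expects a batch of images, so pass a one-image slice
activations = neural_net.predict(test_images[:1])
predicted = activations[0].argmax()
print('predicted:', class_names[predicted])
print('actual:   ', class_names[test_labels[0]])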
Note that the higher your accuracy gets, the harder it becomes to eke out even better accuracy, and the more important even tiny gains become. That’s why a 2% improvement, like our jump from 84% to 86%, is nothing to sneeze at.
Let’s declare victory and stop here.
In the interest of science, I did try a few other tweaks, but none of them improved the accuracy of our Fisher Price neural net above what we’ve already achieved:
- more nodes in the hidden layer
- more hidden layers
- different activation functions
- more training epochs
- different training batch sizes, where batch size is the number of images we feed through the neural net during training before we adjust the weights of the neural net’s connections; it just takes many batches to make our way through a whole epoch (see the sketch after this list)
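For instance, batch size is just another argument to fit(). Here’s a minimal sketch of that last tweak; 128 is only an illustrative value (Keras defaults to 32).
# update the weights after every 128 images instead of Keras's default of 32
neural_net.fit(train_images, train_labels, epochs=5, batch_size=128)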
I suspect we could get even better accuracy with a more complicated neural net architecture, but that’s a topic for another blog post.
Conclusion
Think about what you just did: with a dozen lines of code you created a neural net that categorizes pictures of clothing with 86% accuracy. This would have been unthinkable just 10 years ago.
Although none of the concepts involved with neural nets are all that difficult, there are a bewildering number of ways you can build and tune a neural net to perform optimally for your particular categorization or prediction task. One unusual aspect of neural net engineering is that it’s as much art as it is science. In many cases, we don’t fully understand why certain neural net architectures or tunings perform better than others. The standard neural net development workflow consists of starting with a good general-purpose architecture and set of hyperparameters, and then experimenting with variations as you watch the system’s accuracy move up and down. Once you hit on a combination that gives you the accuracy you need, you’re done. It’s hard to think of another branch of computer science that works in exactly this way (although maybe performance optimization comes close).
If you want to explore further, I recommend the official Keras documentation and tutorials, or the excellent book Deep Learning with Python by Francois Chollet, the lead designer of Keras.