Lab: Neural Networks
CSC261 - Artificial Intelligence - Weinman
- Summary:
- Exploring network topologies and training hyperparameters,
we compete to find the best neural network for digit recognition by
using the backpropagation algorithm to learn neural network parameters.
Preparation
- Change to the directory containing the program and data for this
lab
-
$ cd ~weinman/courses/CSC261/code/nnc
Background
The Semeion
Handwritten Digit Data Set consists of 1,593 instances of handwritten
digits 0-9. These images are scaled to 16×16 and converted
to binary (boolean), so the inputs are 256 dimensional feature vectors,
while the output is a "one-hot" vector of length 10, indicating
which digit the input corresponds to.
These data have been broken into a train, validation, and test set.
You can use the train and validation sets. You don't get to see the
test set until the end. The files are as follows:
- train-data.txt
- The file containing the 256 inputs representing
each of the 398 training images.
- train-labels.txt
- The file containing the 10 target outputs
for each of the 398 training labels.
- validate-data.txt
- The file containing the 256 inputs
representing each of the 398 validation images.
- validate-labels.txt
- The file containing the 10 target
outputs for each of the 398 validation labels.
Note that since the entire data set is cut in half, and then in half
again, to produce your training set, you only have about 400 training
examples. That is not very much data, so you will need to be
very careful to avoid overfitting.
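If you want to sanity-check these files before training, a quick
inspection might look like the following (the expected counts assume
one example per line with whitespace-separated values; the actual
layout could differ):
-
wc -l train-data.txt train-labels.txt    # expect 398 lines in each
head -n 1 train-data.txt | wc -w         # expect 256 input features
head -n 1 train-labels.txt | wc -w       # expect 10 one-hot targets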
You have until the last few minutes of class to compete to produce
the best scoring network. Things to explore:
- Learning rate
- Convergence threshold
- Network topology
- Number of hidden layers (zero up to ...)
- Size of each hidden layer (one up to ...)
- Schedule for learning rate
May the best net win!
The next sections tell you how to train, test, continue training,
and submit your network(s).
Tips
- Vary exponentially.
- If you're exploring hidden layer sizes,
make dramatic changes to see performance differences. (I like powers
of two.)
- Keep organized!
- You might explore lots of possibilities. You
need a good naming scheme and strategy for tracking what you have
tried and how it worked.
- Submit often.
- The leaderboard will update regularly. You'll
want to know whether you're "running with the pack" or might need
to change strategy.
- Consider cross-validation.
- Both the nominal train and validation
sets are the same size. You could swap their train/validation roles
to calculate a more stable average of performance (though doing so
will take more time of course).
- Parallelize search.
- Your host CPUs have four cores, which means
you can run four training sessions simultaneously.
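As a concrete illustration of the last few tips, here is a minimal
sketch (assuming a POSIX shell and the train and test programs
described in the next sections; the file and log names are just one
possible organizing scheme):
-
#!/bin/sh
# Search single-hidden-layer sizes in parallel, in powers of two.
# Names like ~/net-h64.dat and ~/net-h64.log record what was tried.
for h in 32 64 128 256; do
    ./train train-data.txt train-labels.txt ~/net-h$h.dat .001 .001 $h \
        > ~/net-h$h.log 2>&1 &   # '&' puts each session in the background
done
wait                             # four jobs, one per core
for h in 32 64 128 256; do       # compare held-out performance
    echo "hidden=$h"
    ./test validate-data.txt validate-labels.txt ~/net-h$h.dat
done

For the cross-validation tip, you could run the same loop a second
time with the validate-* files as training input and the train-*
files as test input, then average the two error rates for each size.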
Training
The first training program creates, trains (using batch gradient descent),
and saves a network trained using the parameters you specify:
-
./train inputs-file targets-file network-file step-size tolerance [dim1 ... dimN]
Here is what these parameters mean.
- inputs-file
- The path to the file containing the input
feature vectors for all the examples to be trained on
- targets-file
- The path to the file containing the target
output vectors (i.e., the training labels) for all the examples to
be trained on
- network-file
- The file name to save the learned network
in. (Any existing file by the same name will be overwritten.)
- step-size
- The size of the step to take in the batch
gradient descent algorithm. "alpha" in the text.
- tolerance
- The gradient descent algorithm will stop when
the L2 loss function being minimized differs by less than this
amount (on average per example) after one step of weight updates.
- dim1
- (Optional) The number of units in the first hidden
layer
- ...
-
- dimN
- (Optional) The number of units in the last hidden
layer (just before the outputs)
All layers are fully connected using sigmoid activation functions.
Providing no dim arguments would construct a network with
no hidden layers (i.e., logistic regression).
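To make the stopping rule for tolerance concrete (this is an
interpretation of the description above, not a quote from the
program's source), training stops at the first epoch t for which
-
| L(t-1) - L(t) | / N < tolerance

where L(t) is the total L2 loss after epoch t and N is the number of
training examples.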
Examples
To begin training a logistic regression model on the digits training
set, one might use the command
-
./train train-data.txt train-labels.txt ~/net-lr.dat .001 .001
This reports some information about the training data and network
structure before reporting the epoch and loss every so often. It concludes
with the per-example average loss and a classification error rate
as measured by the arg max of the output nodes.
To begin training a model with a single-hidden layer of 128 units
on the digits training set, one might use the command
-
./train train-data.txt train-labels.txt ~/net-128.dat .001 .001 128
There are far more parameters in this network, so of course it will
take longer to run each epoch.
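To see why (assuming the usual bias weight per unit): logistic
regression has 256×10 + 10 = 2,570 weights, while this network has
256×128 + 128 = 32,896 weights into the hidden layer plus
128×10 + 10 = 1,290 into the output layer, about 34,000 in all.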
To begin training a model with two hidden layers, one of size 64 and
the next of size 32, one might use the command
-
./train train-data.txt train-labels.txt ~/net-64-32.dat .01 .0001 64 32
Testing
While the training program reports the average loss and classification
error rate (a number between zero and one) on the training data, it's
usually nice to know how your resulting net performs on a held-out
validation set. The testing program allows you to do this:
-
./test inputs-file targets-file network-file
The parameters are the same as above, except this time the net stored
in network-file is loaded and used for subsequent
testing rounds (forward propagation only).
Note that test reports the total loss, rather than the average
loss, along with the classification error rate. (Divide the total by
the number of examples to compare with the training program's average.)
Examples
To test the examples above on the held-out validation data, one might
run the following commands:
-
./test validate-data.txt validate-labels.txt ~/net-lr.dat
./test validate-data.txt validate-labels.txt ~/net-128.dat
./test validate-data.txt validate-labels.txt ~/net-64-32.dat
Continued Training
Rather than start training from scratch with random weights, sometimes
one may want to continue training a previously trained network. We
might want to use a different step size or continue with the same
step size but a lower convergence tolerance so that training will
continue for longer. The following program allows you to do this continued
training:
-
./train2 inputs-file targets-file network-start-file step-size tolerance network-out-file
Note that this program requires no network dimensions. That's because
it gets them from the network-start-file. When the program
is done, it saves the results in network-out-file.
Examples
To continue training the logistic regression model above with a finer
convergence threshold:
-
./train2 train-data.txt train-labels.txt ~/net-lr.dat .001 1e-5 ~/net-lr-m5.dat
Note that typing the command line argument 1e-5 means 1×10^-5
and is equivalent to typing 0.00001 (not to mention it's
much easier to interpret).
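One way to realize the "schedule for learning rate" idea from the
exploration list (a sketch only; the step sizes and tolerances here
are arbitrary) is to chain train2 runs with decreasing step sizes:
-
./train train-data.txt train-labels.txt ~/net-128.dat .01 .001 128
./train2 train-data.txt train-labels.txt ~/net-128.dat .003 .001 ~/net-128b.dat
./train2 train-data.txt train-labels.txt ~/net-128b.dat .001 1e-5 ~/net-128c.dat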
Submitting Your Network
You may repeatedly submit whatever network you want. A repeat submission
will replace your prior submission. To submit your network for the
competition, run the following command:
-
./submit net-file
This command will write a world-readable copy of your network into
a submission directory corresponding to your username.
Example
To submit the logistic regression network above, you'd type
-
./submit ~/net-lr.dat
Leaderboard
Once per hour, the leaderboard file leaders.txt in the lab
directory will be updated. It will not show the actual score
on the test data set (that will be revealed at the end), but the user
names of the top five networks are shown (in order). The baseline
entry weinman is/was the poorly trained logistic regression
model net-lr.dat above.
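To check the current standings from the lab directory, for example:
-
cat leaders.txt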
Copyright © 2011, 2013, 2015, 2018, 2020 Jerod
Weinman.
This work is licensed under a Creative
Commons Attribution-Noncommercial-Share Alike 4.0 International License.