Lab: Neural Networks
CSC261 - Artificial Intelligence - Weinman
- Summary:
- Exploring network topologies and training hyperparameters,
we compete to find the best neural network for digit recognition by
using the backpropagation algorithm to learn neural network parameters.
Preparation
- Change to the directory containing the program and data for this
lab
-
$ cd ~weinman/courses/CSC261/code/nnc
Background
The Semeion
Handwritten Digit Data Set consists of 1,593 instances of handwritten
digits 0-9. These images are scaled to 16×16 and converted
to binary (boolean), so the inputs are 256 dimensional feature vectors,
while the output is a "one-hot" vector of length 10, indicating
which digit the input corresponds to.
These data have been broken into a train, validation, and test set.
You can use the train and validation sets. You don't get to see the
test set until the end. The files are as follows:
- train-data.txt
- The file containing the 256 inputs representing
each of the 398 training images.
- train-labels.txt
- The file containing the 10 target outputs
for each of the 398 training labels.
- validate-data.txt
- The file containing the 256 inputs
representing each of the 398 validation images.
- validate-labels.txt
- The file containing the 10 target
outputs for each of the 398 validation labels.
Note that since the entire data set is cut in half, and then in half
again, to produce your training set, you only have about 400 training
examples. That is not very much data, so you will need to be
very careful to avoid overfitting.
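If you want to sanity-check these files before training, a quick
inspection might look like the following (the expected counts assume
one example per line with whitespace-separated values; the actual
layout could differ):
-
wc -l train-data.txt train-labels.txt    # expect 398 lines in each
head -n 1 train-data.txt | wc -w         # expect 256 input features
head -n 1 train-labels.txt | wc -w       # expect 10 one-hot targets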
You have until the last few minutes of class to compete to produce
the best scoring network. Things to explore:
- Learning rate
- Convergence threshold
- Network topology
- Number of hidden layers (zero up to ...)
- Size of each hidden layer (one up to ...)
- Schedule for learning rate
May the best net win!
The next sections tell you how to train, test, continue training,
and submit your network(s).
Tips
- Vary exponentially.
- If you're exploring hidden layer sizes,
make dramatic changes to see performance differences. (I like powers
of two.)
- Keep organized!
- You might explore lots of possibilities. You
need a good naming scheme and strategy for tracking what you have
tried and how it worked.
- Submit often.
- The leaderboard will update regularly. You'll
want to know whether you're "running with the pack" or might need
to change strategy.
- Consider cross-validation.
- Both the nominal train and validation
sets are the same size. You could swap their train/validation roles
to calculate a more stable average of performance (though doing so
will take more time of course).
- Parallelize search.
- Your host CPUs have four cores, which means
you can run four training sessions simultaneously.
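As a concrete illustration of the last few tips, here is a minimal
sketch (assuming a POSIX shell and the train and test programs
described in the next sections; the file and log names are just one
possible organizing scheme):
-
#!/bin/sh
# Search single-hidden-layer sizes in parallel, in powers of two.
# Names like ~/net-h64.dat and ~/net-h64.log record what was tried.
for h in 32 64 128 256; do
    ./train train-data.txt train-labels.txt ~/net-h$h.dat .001 .001 $h \
        > ~/net-h$h.log 2>&1 &   # '&' puts each session in the background
done
wait                             # four jobs, one per core
for h in 32 64 128 256; do       # compare held-out performance
    echo "hidden=$h"
    ./test validate-data.txt validate-labels.txt ~/net-h$h.dat
done

For the cross-validation tip, you could run the same loop a second
time with the validate-* files as training input and the train-*
files as test input, then average the two error rates for each size.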
Training
The first training program creates, trains (using batch gradient descent),
and saves a network trained using the parameters you specify:
-
./train inputs-file targets-file network-file step-size tolerance [dim1 ... dimN]
Here is what these parameters mean.
- inputs-file
- The path to the file containing the input
feature vectors for all the examples to be trained on
- targets-file
- The path to the file containing the target
output vectors (i.e., the training labels) for all the examples to
be trained on
- network-file
- The file name to save the learned network
in. (Any existing file by the same name will be overwritten.)
- step-size
- The size of the step to take in the batch
gradient descent algorithm. "alpha" in the text.
- tolerance
- The gradient descent algorithm will stop when
the L2 loss function being minimized differs by less than this
amount (on average per example) after one step of weight updates.
- dim1
- (Optional) The number of units in the first hidden
layer
- ...
-
- dimN
- (Optional) The number of units in the last hidden
layer (just before the outputs)
All layers are fully connected using sigmoid activation functions.
Providing no dim arguments would construct a network with
no hidden layers (i.e., logistic regression).
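To make the stopping rule for tolerance concrete (this is an
interpretation of the description above, not a quote from the
program's source), training stops at the first epoch t for which
-
| L(t-1) - L(t) | / N < tolerance

where L(t) is the total L2 loss after epoch t and N is the number of
training examples.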
Examples
To begin training a logistic regression model on the digits training
set, one might use the command
-
./train train-data.txt train-labels.txt ~/net-lr.dat .001 .001
This reports some information about the training data and network
structure before reporting the epoch and loss every so often. It concludes
with the per-example average loss and a classification error rate
as measured by the arg max of the output nodes.
To begin training a model with a single-hidden layer of 128 units
on the digits training set, one might use the command
-
./train train-data.txt train-labels.txt ~/net-128.dat .001 .001 128
There are far more parameters in this network, so of course it will
take longer to run each epoch.
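To see why (assuming the usual bias weight per unit): logistic
regression has 256×10 + 10 = 2,570 weights, while this network has
256×128 + 128 = 32,896 weights into the hidden layer plus
128×10 + 10 = 1,290 into the output layer, about 34,000 in all.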
To begin training a model with two hidden layers, one of size 64 and
the next of size 32, one might use the command
-
./train train-data.txt train-labels.txt ~/net-64-32.dat .01 .0001 64 32
Testing
While the training program reports the average loss and classification
error rate (a number between zero and one) on the training data, it's
usually nice to know how your resulting net performs on a held-out
validation set. The testing program allows you to do this:
-
./test inputs-file targets-file network-file
The parameters are the same as above, except this time the net stored
in network-file is loaded and used for subsequent
testing rounds (forward propagation only).
Note that test reports the total loss, rather than the average
loss, along with the classification error rate. (Divide the total by
the number of examples to compare with the training program's average.)
Examples
To test the examples above on the held-out validation data, one might
run the following commands:
-
./test validate-data.txt validate-labels.txt ~/net-lr.dat
./test validate-data.txt validate-labels.txt ~/net-128.dat
./test validate-data.txt validate-labels.txt ~/net-64-32.dat
Continued Training
Rather than start training from scratch with random weights, sometimes
one may want to continue training a previously trained network. We
might want to use a different step size or continue with the same
step size but a lower convergence tolerance so that training will
continue for longer. The following program allows you to do this continued
training:
-
./train2 inputs-file targets-file network-start-file step-size tolerance network-out-file
Note that this program requires no network dimensions. That's because
it gets them from the network-start-file. When the program
is done, it saves the results in network-out-file.
Examples
To continue training the logistic regression model above with a finer
convergence threshold:
-
./train2 train-data.txt train-labels.txt ~/net-lr.dat .001 1e-5 ~/net-lr-m5.dat
Note that typing the command line argument 1e-5 means 1×10^-5
and is equivalent to typing 0.00001 (not to mention it's
much easier to interpret).
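One way to realize the "schedule for learning rate" idea from the
exploration list (a sketch only; the step sizes and tolerances here
are arbitrary) is to chain train2 runs with decreasing step sizes:
-
./train train-data.txt train-labels.txt ~/net-128.dat .01 .001 128
./train2 train-data.txt train-labels.txt ~/net-128.dat .003 .001 ~/net-128b.dat
./train2 train-data.txt train-labels.txt ~/net-128b.dat .001 1e-5 ~/net-128c.dat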
Submitting Your Network
You may repeatedly submit whatever network you want. A repeat submission
will replace your prior submission. To submit your network for the
competition, run the following command:
-
./submit net-file
This command will write a world-readable copy of your network into
a submission directory corresponding to your username.
Example
To submit the logistic regression network above, you'd type
-
./submit ~/net-lr.dat
Leaderboard
Once per hour, the leaderboard file leaders.txt in the lab
directory will be updated. It will not show the actual score
on the test data set (that will be revealed at the end), but the user
names of the top five networks are shown (in order). The baseline
entry weinman is/was the poorly trained logistic regression
model net-lr.dat above.
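To check the current standings from the lab directory, for example:
-
cat leaders.txt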
Copyright © 2011, 2013, 2015, 2018, 2020 Jerod
Weinman.
This work is licensed under a Creative
Commons Attribution-Noncommercial-Share Alike 4.0 International License.