Readings: Convolutional Neural Networks
CSC 262 - Computer Vision - Weinman
Summary: Outline of the preparatory readings on learning and
convolutional neural networks.
The reading for this unit comes from Foundations of Computer Vision
by Torralba, Isola, and Freeman (2024).
Although this reading assignment is not substantially longer than
many prior readings (at 28 pages), it spans several chapters in the
textbook on its way toward hitting five key ideas:
- Computer vision tasks can be solved by "fitting" functions to
example data: learning.
- When the function and its loss are differentiable, we learn by
  iteratively taking small steps toward improvement: gradient descent
  (sketched in the first example after this list).
- Multiple "layers" (compositions) of linear and non-linear functions
  can approximate any function, and hence learn any task: neural networks.
- With gradient descent, the chain rule of calculus enables efficient
learning in neural networks: backpropagation.
- Layer types that specifically leverage image properties excel at visual
  tasks: ConvNets (see the convolution sketch after this list).
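To make the first two ideas concrete, here is a minimal sketch, not
from the textbook (the toy data, learning rate, and step count are
illustrative choices), that fits a line to noisy examples by gradient
descent on a squared-error loss:

```python
import numpy as np

# Toy data: noisy samples of y = 2x + 1 (the "examples" we fit to).
rng = np.random.default_rng(0)
x = rng.uniform(-1, 1, size=50)
y = 2.0 * x + 1.0 + 0.1 * rng.normal(size=50)

# Model: f(x) = w*x + b, with squared-error loss L = mean((f(x) - y)^2).
w, b = 0.0, 0.0
lr = 0.1  # step size (learning rate)

for step in range(200):
    err = (w * x + b) - y
    # Gradients of the mean squared error with respect to w and b.
    grad_w = 2.0 * np.mean(err * x)
    grad_b = 2.0 * np.mean(err)
    # Take a small step "downhill," against the gradient.
    w -= lr * grad_w
    b -= lr * grad_b

print(f"learned w = {w:.2f}, b = {b:.2f}")  # should approach w = 2, b = 1
```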
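And for the last idea, a bare-bones sketch of the operation at the
heart of a convolutional layer. The function name conv2d and the
example filter are my own illustrative choices; real layers add
channels, padding, and learned weights:

```python
import numpy as np

def conv2d(image, kernel):
    """Valid cross-correlation of a 2D image with a 2D kernel, as used
    in convolutional layers (no kernel flipping, no padding)."""
    H, W = image.shape
    kH, kW = kernel.shape
    out = np.zeros((H - kH + 1, W - kW + 1))
    for i in range(out.shape[0]):
        for j in range(out.shape[1]):
            # The same small set of weights is applied at every location;
            # this sharing is how the layer leverages image structure.
            out[i, j] = np.sum(image[i:i+kH, j:j+kW] * kernel)
    return out

image = np.arange(25.0).reshape(5, 5)
edge = np.array([[1.0, -1.0]])  # horizontal difference filter
print(conv2d(image, edge))      # responds to horizontal intensity changes
```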
Keeping these key ideas in mind, you should be able to move somewhat
quickly through many sections of the reading that help flesh out the
connections among them. Focus on the text and the figures, rather
than getting too hung up on the equations. (Many illustrations help
to provide good intuitions for the equations; these are worth
studying.)
| Sections | Pages | Topic |
|---|---|---|
| 9.1-9.3 | 137-140 | Learning with examples |
| 10.1-10.3 | 151-152 | Basic gradient descent |
| 12.1-12.5.0 | 175-180 | Multi-layer perceptron (neural network) |
| 14.1-14.4 | 199-204 | Backpropagation |
| 24.1-24.2.1 | 403-407 | Convolutional layers |
| 24.2.5-24.4.4 | 413-417 | Other layers: sampling, pooling, and normalization |
The section on backpropagation is perhaps the one most worth
investing time in to understand the mathematics. The ideas are
relatively simple on the surface, but mastering the nuances will
solidify your intuitions.
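If a concrete trace helps, the following minimal numpy sketch (not
drawn from the textbook; the XOR task, sigmoid activations, and
hyperparameters are all illustrative choices) hand-codes the chain
rule for a tiny two-layer network, mirroring what the reading derives
more generally:

```python
import numpy as np

# Tiny two-layer network trained on XOR, with the chain rule coded by hand.
rng = np.random.default_rng(1)
X = np.array([[0., 0.], [0., 1.], [1., 0.], [1., 1.]])
y = np.array([[0.], [1.], [1.], [0.]])

W1 = rng.normal(size=(2, 4)); b1 = np.zeros(4)   # hidden layer parameters
W2 = rng.normal(size=(4, 1)); b2 = np.zeros(1)   # output layer parameters
lr = 1.0

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

for step in range(10000):
    # Forward pass: a composition of linear maps and non-linearities.
    h = sigmoid(X @ W1 + b1)           # hidden activations
    p = sigmoid(h @ W2 + b2)           # network output
    loss = np.mean((p - y) ** 2)

    # Backward pass: chain rule applied layer by layer, output to input.
    dp  = 2.0 * (p - y) / len(X)       # dL/dp
    dz2 = dp * p * (1 - p)             # through the output sigmoid
    dW2 = h.T @ dz2; db2 = dz2.sum(0)  # layer-2 parameter gradients
    dh  = dz2 @ W2.T                   # propagate error to the hidden layer
    dz1 = dh * h * (1 - h)             # through the hidden sigmoid
    dW1 = X.T @ dz1; db1 = dz1.sum(0)  # layer-1 parameter gradients

    # One gradient-descent step on every parameter.
    W1 -= lr * dW1; b1 -= lr * db1
    W2 -= lr * dW2; b2 -= lr * db2

print(np.round(p.ravel(), 2))  # typically approaches [0, 1, 1, 0]
```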