- Can make the parameter vector the output of a function.

- e.g. weight sharing with a Y connector (tie the values of W together)

- weights are forced to be equal; basis of a lot of ideas

- gradients are summed during backpropagation (see the sketch below)
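A minimal PyTorch sketch of this (the values are arbitrary): one parameter is used in two places, and autograd sums the gradient contributions from both uses.

```python
import torch

# One shared weight used twice ("Y connector"): the tied uses force
# equality, and backprop sums the gradients from each use.
w = torch.tensor(2.0, requires_grad=True)
x1, x2 = torch.tensor(3.0), torch.tensor(5.0)

y = w * x1 + w * x2   # the same w feeds two branches
y.backward()

print(w.grad)         # tensor(8.) = 3 + 5: the two gradients are summed
```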

- Hypernetwork: weights of one network are computed as the outputs of another network. (will come back in a few weeks)
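A toy sketch of the idea (shapes and names here are assumptions, not the lecture's example): a small network `g` emits the weights that another map then applies to the input.

```python
import torch
import torch.nn as nn

g = nn.Linear(4, 2 * 3)       # hypernetwork: emits 6 numbers
z = torch.randn(1, 4)         # conditioning input to g
W = g(z).view(2, 3)           # reshape them into a 2x3 weight matrix

x = torch.randn(3)
y = W @ x                     # the "main" map runs with generated weights
print(y.shape)                # torch.Size([2])
```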

- Detect a motif anywhere in an input

- detect whether a motif occurs in a speech signal

- a detector can slide over the input, with every position sharing the same weights

- the outputs go into a max function

- similarly, a template can be slid across an image to detect motifs

- Shift invariance – the output is unchanged when the input shifts

- Shift equivariance – the output shifts correspondingly when the input shifts
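A quick check of equivariance (the setup is an assumption): shifting the input of a convolution shifts its output by the same amount, away from the boundary.

```python
import torch
import torch.nn.functional as F

x = torch.randn(1, 1, 10)     # (batch, channels, length)
w = torch.randn(1, 1, 3)      # one kernel of size 3

y = F.conv1d(x, w)
y_shift = F.conv1d(torch.roll(x, 1, dims=2), w)

# Away from the wrap-around, the shifted input gives a shifted output
print(torch.allclose(y[..., :-1], y_shift[..., 1:]))  # True
```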

- Convolution: $y_i = \sum_j w_j \, x_{i-j}$

- the input index runs backwards through the window while the weight index runs forwards

- in 2D: $y_{ij} = \sum_{k,l} w_{kl} \, x_{i+k,\, j+l}$ (with $+$ indices as written, this is the cross-correlation form that deep learning frameworks typically implement)

- Cross correlation

- Index and weights go forward together

- w is the convolution kernel

- stride: move the window forward by more than 1 step at a time

- boundary effects are generally handled by padding the input
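A short sketch of these definitions in PyTorch (the tensor sizes are arbitrary). Note that `conv1d` actually computes cross-correlation; flipping the kernel recovers the convolution form.

```python
import torch
import torch.nn.functional as F

x = torch.randn(1, 1, 8)
w = torch.randn(1, 1, 3)

cross_corr = F.conv1d(x, w)            # indices run forward together
conv = F.conv1d(x, w.flip(-1))         # flipped kernel ~ y_i = sum_j w_j x_{i-j}

strided = F.conv1d(x, w, stride=2)     # window jumps 2 steps at a time
padded = F.conv1d(x, w, padding=1)     # padding keeps the output length at 8
print(cross_corr.shape, strided.shape, padded.shape)
# torch.Size([1, 1, 6]) torch.Size([1, 1, 3]) torch.Size([1, 1, 8])
```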

- Inspired by biology

- the brain recognizes objects in ~100 ms

- neurons can be highly specialized yet invariant to irrelevant transformations

- simple cells detect local features

- complex cells pool outputs of simple cells

- a complex cell responds to a combination of all the simple cells it pools

- Architecture

- Filter bank / non-linearity / pooling

- Modern architecture

- Normalization

- Filter bank

- Non-linearity

- Feature pooling (generally max pooling)

- max; $L_p$ pooling (the $p$-th root of the sum of $p$-th powers); probability pooling

- essentially functions that return the same value no matter where the feature sits within the window (see the sketch below)
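A minimal sketch of two pooling choices (the `lp_pool` helper is hypothetical, written for illustration): max pooling, and $L_p$ pooling as the $p$-th root of the sum of $p$-th powers, which approaches max pooling of the magnitudes as $p \to \infty$.

```python
import torch
import torch.nn.functional as F

def lp_pool(x, p, window):
    # p-th root of the sum of p-th powers over non-overlapping windows
    return F.avg_pool1d(x.abs().pow(p), window).mul(window).pow(1.0 / p)

x = torch.randn(1, 1, 8)
print(F.max_pool1d(x, 2))          # max pooling
print(lp_pool(x, p=4, window=2))   # L_4 pooling (of magnitudes)
```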

- (Repeat above)

- Classifier

- Fully connected layers

- can be viewed as 1×1 convolutions (see the check below)
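A small check of this view (the sizes are arbitrary): copying a fully connected layer's weights into a 1×1 convolution gives the same outputs.

```python
import torch
import torch.nn as nn

fc = nn.Linear(64, 10)
conv1x1 = nn.Conv2d(64, 10, kernel_size=1)

# Reuse the FC weights as 1x1 convolution weights
conv1x1.weight.data = fc.weight.data.view(10, 64, 1, 1).clone()
conv1x1.bias.data = fc.bias.data.clone()

x = torch.randn(1, 64, 1, 1)                 # one spatial position
print(torch.allclose(fc(x.flatten(1)),
                     conv1x1(x).flatten(1), atol=1e-6))  # True
```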

- LATER: read LeCun 1998 and implement it with PyTorch; also implement it with my own code

- github.com/activatedgeek/LeNet-5

- Multiple character recognition

- Sweep the convnet over the input, shifting it across every position

- allows recognizing individual characters wherever they appear

- recomputing the network at every position becomes extremely wasteful

- Instead

- Take a large input and keep convolving

- Get multiple outputs with a convolutional layer

- much cheaper than recomputing at every location

- Another approach was finding the character at the middle of the convnet's input window
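A hedged sketch of the "keep convolving" idea (the layer sizes are made up): with a 1×1-conv classifier head, the same net applied to a wider input produces one prediction per window position, instead of re-running a cropped net at every shift.

```python
import torch
import torch.nn as nn

net = nn.Sequential(
    nn.Conv2d(1, 8, 5), nn.ReLU(), nn.MaxPool2d(2),
    nn.Conv2d(8, 16, 5), nn.ReLU(),
    nn.Conv2d(16, 10, 1),          # 1x1 conv head, 10 classes
)

print(net(torch.randn(1, 1, 14, 14)).shape)  # [1, 10, 1, 1]: one prediction
print(net(torch.randn(1, 1, 14, 32)).shape)  # [1, 10, 1, 10]: one per position
```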

- Convnets are good for

- shift invariant, distortion invariant

- sizes of the objects change a lot

- smaller objects can be detected by running the same convnet on inputs rescaled to progressively smaller sizes

- multi-dimensional array signals

- strong local correlations between values

- values become less correlated with distance

- features can appear anywhere, which is why weights can be shared

- an FC net doesn't care about permutations of its input

- Practicum

- signals can be represented as vectors – waveform

- words are one-hot vectors – language has those kinds of properties

- Receptive field = how many neurons of the previous layer a given neuron sees

- sparsity is justified only because the data exhibits locality

- stationarity – patterns appear again and again => parameters can be shared

- parameter sharing leads to

- faster convergence

- better generalization

- not constrained by the input size (the kernel can keep shifting)

- kernel independence => high parallelization

- kernels for 1D data

- 1d data uses 3 kernels

- odd-sized kernels, so the window is distributed evenly on both sides of the center (see the sketch below)
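A small sketch for these practicum points (the channel counts are assumptions): an odd-sized kernel with matching padding stays centered and preserves the length, and the same layer accepts any input length.

```python
import torch
import torch.nn as nn

conv = nn.Conv1d(in_channels=2, out_channels=16, kernel_size=3, padding=1)

for length in (50, 100):                # no constraint on input size
    x = torch.randn(1, 2, length)
    print(conv(x).shape)                # (1, 16, 50) then (1, 16, 100)
```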

- Standard spatial CNN

- multiple layers of

- conv

- non-linearities

- pooling

- batch normalization

- Residual bypass connection

- logits at the end for classification
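A minimal sketch of this stack (all sizes are assumptions): conv / batch-norm / non-linearity / pooling blocks, a residual bypass connection, and logits at the end.

```python
import torch
import torch.nn as nn

class Block(nn.Module):
    def __init__(self, ch):
        super().__init__()
        self.conv = nn.Conv2d(ch, ch, 3, padding=1)
        self.bn = nn.BatchNorm2d(ch)

    def forward(self, x):
        return x + torch.relu(self.bn(self.conv(x)))  # residual bypass

net = nn.Sequential(
    nn.Conv2d(3, 16, 3, padding=1),
    Block(16), nn.MaxPool2d(2),
    Block(16), nn.MaxPool2d(2),
    nn.Flatten(),
    nn.Linear(16 * 8 * 8, 10),           # logits for 10 classes (32x32 input)
)

print(net(torch.randn(1, 3, 32, 32)).shape)  # torch.Size([1, 10])
```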

- Geoff Hinton – capsule networks