# Deep Learning NYU / Week 3

• Can make the parameter vector the output of a function.
• e.g. weight sharing with a Y connector (tie the values of W together)
• weights are forced to be equal; basis of a lot of ideas
• gradients are summed up while backpropagating
• Hypernetwork: weights of one network are computed as the outputs of another network. (will come back in a few weeks)
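A minimal PyTorch sketch (my own illustration, not from the lecture) of the weight-sharing point above: when the same parameter is reused in two branches, autograd sums the gradients from both branches during backpropagation.

```python
import torch

# Illustration: one weight vector w reused in two branches (a "Y connector").
# Because the same Parameter appears twice in the graph, autograd sums the
# gradients from both branches into w.grad.
w = torch.nn.Parameter(torch.randn(3))
x1 = torch.randn(3)
x2 = torch.randn(3)

out = (w @ x1) + (w @ x2)   # both branches share the same w
out.backward()

# d(out)/dw is the sum of the two branch gradients: x1 + x2
print(torch.allclose(w.grad, x1 + x2))  # True
```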
• Detect motif anywhere on an input
• e.g. detect whether a given sound occurs somewhere in a speech signal
• a detector can slide over the input; all positions share the same weights
• the outputs at all positions go into a max function
• similarly, a template can be slid over an image to detect motifs
• Shift invariance – output unchanged with shift in input
• Shift equivariance – output shifts correspondingly when the input shifts
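A small sketch, assuming PyTorch, of the sliding-detector idea above: the convolution's feature map shifts when the input shifts (equivariance), and taking the max over all positions makes the detection shift-invariant. The kernel, motif, and shift amount are made up for illustration.

```python
import torch
import torch.nn.functional as F

torch.manual_seed(0)
w = torch.randn(1, 1, 5)                      # one 1-D kernel of width 5
x = torch.zeros(1, 1, 100)
x[0, 0, 20:25] = 1.0                          # a "motif" at position 20

x_shifted = torch.roll(x, shifts=7, dims=2)   # same motif, shifted by 7

y = F.conv1d(x, w, padding=2)
y_shifted = F.conv1d(x_shifted, w, padding=2)

# Equivariance: away from the borders, the feature map shifts by the same 7 steps.
print(torch.allclose(y[..., 10:80], y_shifted[..., 17:87]))   # True
# Invariance: the max over all positions is unchanged by the shift.
print(torch.isclose(y.max(), y_shifted.max()))                # True
```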
• Convolution: $y_i = \sum_j w_j\, x_{i-j}$
• the index runs backwards over the input window while it runs forwards over the weights
• 2D: $y_{ij} = \sum_{k,l} w_{kl}\, x_{i+k,\, j+l}$ (note the plus signs: this is the cross-correlation form that deep learning libraries actually compute)
• Cross-correlation
• the index runs forwards over both the input and the weights
• w is the convolution kernel
• also called a filter
• stride: move the window forward by more than 1 step at a time
• the shrinking of the output at the borders is generally handled by padding the input (see the sketch below)
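A rough sketch of the formulas above, assuming PyTorch: hand-rolled 1D convolution and cross-correlation, a check that a framework "convolution" (`F.conv1d`) really computes cross-correlation, and a stride/padding example. The signal and kernel are arbitrary.

```python
import torch
import torch.nn.functional as F

def convolution_1d(x, w):
    """y_i = sum_j w_j * x_{i-j}: the input window is traversed backwards."""
    n, k = len(x), len(w)
    y = torch.zeros(n - k + 1)
    for i in range(n - k + 1):
        y[i] = sum(w[j] * x[i + (k - 1) - j] for j in range(k))
    return y

def cross_correlation_1d(x, w):
    """y_i = sum_j w_j * x_{i+j}: index runs forwards over input and weights."""
    n, k = len(x), len(w)
    y = torch.zeros(n - k + 1)
    for i in range(n - k + 1):
        y[i] = sum(w[j] * x[i + j] for j in range(k))
    return y

x = torch.randn(10)
w = torch.randn(3)

# Convolution is cross-correlation with a flipped kernel; deep learning
# "convolution" layers actually compute cross-correlation.
assert torch.allclose(convolution_1d(x, w), cross_correlation_1d(x, w.flip(0)), atol=1e-6)
assert torch.allclose(cross_correlation_1d(x, w),
                      F.conv1d(x.view(1, 1, -1), w.view(1, 1, -1)).view(-1), atol=1e-6)

# Stride moves the window by more than one step; padding keeps the output
# from shrinking at the borders.
y = F.conv1d(x.view(1, 1, -1), w.view(1, 1, -1), stride=2, padding=1)
print(y.shape)  # torch.Size([1, 1, 5])
```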
• Inspired by biology
• the brain recognizes objects in about 100 ms
• neurons can be highly specialized, yet invariant to irrelevant transformations
• simple cells detect local features
• complex cells pool outputs of simple cells
• complex cell relies on a combination of all the subcells
• Architecture
• Filter bank / non-linearity / pooling
• Modern arch
• Normalization
• Filter bank
• Non-linearity
• Feature pooling (generally max pooling)
• max pooling; Lp pooling (the p-th root of the sum of the p-th powers); log-prob pooling
• basically, pooling functions return (nearly) the same value regardless of where the feature sits within the pooling window
• (repeat the above stack of layers several times)
• Classifier
• Fully connected layers
• can be viewed as 1×1 convolutions
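A quick numerical check (my own sketch, assuming PyTorch) that a fully connected layer applied at every spatial position computes the same thing as a 1×1 convolution with the same weights.

```python
import torch
import torch.nn as nn

fc = nn.Linear(16, 10)                       # 16 input features -> 10 classes
conv1x1 = nn.Conv2d(16, 10, kernel_size=1)
conv1x1.weight.data = fc.weight.data.view(10, 16, 1, 1).clone()
conv1x1.bias.data = fc.bias.data.clone()

x = torch.randn(1, 16, 8, 8)                 # a 16-channel 8x8 feature map

out_conv = conv1x1(x)                                        # (1, 10, 8, 8)
out_fc = fc(x.permute(0, 2, 3, 1)).permute(0, 3, 1, 2)       # same computation
print(torch.allclose(out_conv, out_fc, atol=1e-6))           # True
```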
• LATER Read LeCun 1998 and implement it with PyTorch; implement it with my own code
• github.com/activatedgeek/LeNet-5
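A rough PyTorch sketch of the LeNet-5 architecture from LeCun et al. 1998, modernized with ReLU and max pooling in place of the original tanh units, trainable subsampling, and RBF output layer; the layer sizes follow the paper (6 and 16 feature maps, 120/84-unit classifier).

```python
import torch
import torch.nn as nn

class LeNet5(nn.Module):
    """Sketch of LeNet-5 (LeCun et al., 1998), with modern ReLU/max-pool layers."""
    def __init__(self, num_classes: int = 10):
        super().__init__()
        self.features = nn.Sequential(
            nn.Conv2d(1, 6, kernel_size=5),    # 32x32 -> 28x28
            nn.ReLU(),
            nn.MaxPool2d(2),                   # 28x28 -> 14x14
            nn.Conv2d(6, 16, kernel_size=5),   # 14x14 -> 10x10
            nn.ReLU(),
            nn.MaxPool2d(2),                   # 10x10 -> 5x5
        )
        self.classifier = nn.Sequential(
            nn.Flatten(),
            nn.Linear(16 * 5 * 5, 120),
            nn.ReLU(),
            nn.Linear(120, 84),
            nn.ReLU(),
            nn.Linear(84, num_classes),
        )

    def forward(self, x):
        return self.classifier(self.features(x))

# Expects 32x32 inputs (pad MNIST's 28x28 digits with 2 pixels on each side).
print(LeNet5()(torch.randn(1, 1, 32, 32)).shape)  # torch.Size([1, 10])
```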
• Multiple character recognition
• slide the convnet over the input, shifting it across every position
• this makes it possible to read out several characters from one image
• recomputing the whole network at every position is extremely wasteful
• instead, take a larger input and keep convolving: apply every layer convolutionally over the whole image
• a single convolutional pass then yields multiple outputs, one per location
• much cheaper than recomputing at every location, because overlapping windows share computation (see the sketch below)
• another approach: train the convnet to output the character located at the center of its input window
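A sketch of the "keep convolving" trick, assuming PyTorch: when the classifier itself is a convolution, the same network applied to a wider input yields one prediction per location in a single pass. The layer sizes are made up for illustration.

```python
import torch
import torch.nn as nn

# Every layer, including the classifier, is convolutional, so the network can
# be applied to inputs larger than the training size and produces a map of
# outputs, sharing computation between overlapping windows.
net = nn.Sequential(
    nn.Conv2d(1, 8, kernel_size=5), nn.ReLU(), nn.MaxPool2d(2),
    nn.Conv2d(8, 16, kernel_size=5), nn.ReLU(), nn.MaxPool2d(2),
    nn.Conv2d(16, 10, kernel_size=5),   # plays the role of the "fully connected" classifier
)

small = torch.randn(1, 1, 32, 32)       # training-size input: one output
large = torch.randn(1, 1, 32, 128)      # wider input, e.g. several characters

print(net(small).shape)   # torch.Size([1, 10, 1, 1])   -> a single prediction
print(net(large).shape)   # torch.Size([1, 10, 1, 25])  -> one prediction per location
```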
• Convnets are good for
• signals where shift invariance and distortion invariance are useful
• objects whose sizes can change a lot
• smaller objects can be detected by training the same convnet on progressively smaller input sizes
• multi-dimensional array signals
• with strong local correlations between values
• and similarity that falls off with distance
• features can appear anywhere, which is why the weights can be shared
• a fully connected net doesn't care about input permutations, so it can't exploit this locality
• Practicum
• signals can be represented as vectors, e.g. an audio waveform
• words can be represented as one-hot vectors; language exhibits the same kinds of properties
• Receptive field = how many neurons of the previous layer a given neuron sees
• sparse connectivity is justified only because the data exhibits locality
• stationarity – the same patterns appear again and again => parameters can be shared
• parameter sharing leads to
• faster convergence
• better generalization
• not constrained on the input size (can keep shifting)
• kernel independence => high parallelization
• kernels on 1D data
• e.g. 3 kernels slid over a 1D signal produce 3 feature maps (see the sketch below)
• odd-sized kernels, so the window extends evenly on both sides of the center
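A small sketch of the practicum's 1D setting, assuming PyTorch: 3 odd-sized kernels slide over a one-channel signal and produce 3 feature maps, and the same layer accepts inputs of any length.

```python
import torch
import torch.nn as nn

# 3 kernels of odd size 3 over a 1-channel signal -> 3 feature maps.
# Each output position sees one sample on each side (receptive field of 3).
conv = nn.Conv1d(in_channels=1, out_channels=3, kernel_size=3, padding=1)

x = torch.randn(1, 1, 100)                 # signal of length 100
print(conv(x).shape)                       # torch.Size([1, 3, 100])

# The same layer works on any input length: nothing ties it to 100 samples.
print(conv(torch.randn(1, 1, 57)).shape)   # torch.Size([1, 3, 57])
```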
• Standard spatial CNN
• multiple layers of
• conv
• non-linearities
• pooling
• batch normalization
• Residual bypass connection
• logits at the end for classification (see the sketch below)
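A hedged sketch, assuming PyTorch, of the standard spatial CNN described above: repeated conv / non-linearity / pooling / batch-norm stages, residual bypass connections, and logits at the end. The channel counts and depth are arbitrary.

```python
import torch
import torch.nn as nn

class ResidualBlock(nn.Module):
    """Conv -> BN -> ReLU -> Conv -> BN, plus an identity bypass connection."""
    def __init__(self, channels: int):
        super().__init__()
        self.body = nn.Sequential(
            nn.Conv2d(channels, channels, 3, padding=1), nn.BatchNorm2d(channels), nn.ReLU(),
            nn.Conv2d(channels, channels, 3, padding=1), nn.BatchNorm2d(channels),
        )

    def forward(self, x):
        return torch.relu(x + self.body(x))   # residual bypass

net = nn.Sequential(
    nn.Conv2d(3, 16, 3, padding=1), nn.BatchNorm2d(16), nn.ReLU(), nn.MaxPool2d(2),
    ResidualBlock(16),
    nn.Conv2d(16, 32, 3, padding=1), nn.BatchNorm2d(32), nn.ReLU(), nn.MaxPool2d(2),
    ResidualBlock(32),
    nn.AdaptiveAvgPool2d(1), nn.Flatten(),
    nn.Linear(32, 10),                         # logits for 10 classes
)

print(net(torch.randn(1, 3, 32, 32)).shape)    # torch.Size([1, 10])
```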
• Geoff Hinton – capsule networks