Friday, April 3, 2015

An Introduction to CNNs: Carrying Machine Learning on Their Shoulders

Traditional neural network layers use a matrix multiplication to describe the interaction between each input unit and each output unit. This means every output unit interacts with every input unit. Convolutional networks, however, typically have sparse interactions (also called sparse connectivity). This is accomplished by making the kernel smaller than the input, so that each output unit depends only on a small patch of the input, and by sliding that same kernel across the whole image.
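As a rough illustration of how much sparse interactions save, the sketch below counts connections for a one-dimensional layer. The sizes m and k are assumed values picked only for this example; they do not come from any particular network.

```python
# Assumed sizes, chosen only for illustration.
m = 1000          # number of input units
k = 3             # kernel width
n = m - k + 1     # number of output units when the kernel slides with no padding

# Matrix multiplication: every output unit is connected to every input unit.
dense_connections = m * n

# Sparse interactions: every output unit is connected to only k input units.
sparse_connections = k * n

print(dense_connections, sparse_connections)  # 998000 2994
```

With these assumed sizes the convolutional layer needs a few thousand connections where the fully connected layer needs nearly a million.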

CNNs also rely on parameter sharing, which means using the same parameter for more than one function in a model. In a convolutional neural net, each member of the kernel is used at every position of the input. Because of the parameter sharing built into the convolution operation, we learn only one set of parameters rather than a separate set for every location. A network that shares parameters in this way is also said to have tied weights. This is a different property from sparse connectivity, which describes how few inputs each output unit depends on.
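Parameter sharing can be sketched in a few lines of NumPy. The helper name conv1d_shared and the sample values are hypothetical, and the kernel is applied without flipping (cross-correlation), which is an assumption made here for simplicity.

```python
import numpy as np

def conv1d_shared(x, kernel):
    """Slide one shared kernel over x (no padding, kernel not flipped)."""
    k = len(kernel)
    out = np.empty(len(x) - k + 1)
    for i in range(len(out)):
        # The same kernel values are reused at every position i.
        out[i] = np.dot(x[i:i + k], kernel)
    return out

x = np.array([1.0, 2.0, 4.0, 7.0, 11.0])   # sample input (assumed)
kernel = np.array([-1.0, 1.0])             # simple difference filter (assumed)

y = conv1d_shared(x, kernel)
print(y)  # [1. 2. 3. 4.]
```

Only the two kernel values would be learned here, however long the input is. A layer with a separate set of weights per location would instead need two weights for each of the four output positions.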


If the function that a layer needs to learn is indeed a local, translation invariant function, then the layer will be dramatically more efficient if it uses convolution rather than matrix multiplication. If the necessary function does not have these properties, then using a convolutional layer will cause the model to have high training error.


A typical convolutional layer consists of three stages. In the first stage, the layer performs several convolutions in parallel to produce a set of presynaptic activations. In the second stage, each presynaptic activation is run through a nonlinear activation function, such as the rectified linear activation function. This stage is sometimes called the detector stage. In the third stage, we use a pooling function. A pooling function replaces the output of the net at a certain location with a summary statistic of the nearby outputs. For example, the max pooling operation reports the maximum output within a rectangular neighborhood. Pooling helps make the representation invariant to small translations of the input.
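The three stages can be sketched with plain NumPy. This is a minimal illustration, not an efficient implementation: the function names, the random 6x6 input, the two 3x3 kernels and the 2x2 pooling window are all assumptions made for the example, and the kernel is applied without flipping.

```python
import numpy as np

def conv2d_valid(image, kernel):
    """Slide the kernel over the image with no padding (kernel not flipped)."""
    kh, kw = kernel.shape
    oh = image.shape[0] - kh + 1
    ow = image.shape[1] - kw + 1
    out = np.empty((oh, ow))
    for i in range(oh):
        for j in range(ow):
            out[i, j] = np.sum(image[i:i + kh, j:j + kw] * kernel)
    return out

def relu(x):
    """Rectified linear activation: negative values become zero."""
    return np.maximum(x, 0.0)

def max_pool2d(x, size):
    """Non-overlapping max pooling; rows/columns that do not fill a window are dropped."""
    h = (x.shape[0] // size) * size
    w = (x.shape[1] // size) * size
    blocks = x[:h, :w].reshape(h // size, size, w // size, size)
    return blocks.max(axis=(1, 3))

rng = np.random.default_rng(0)
image = rng.standard_normal((6, 6))        # assumed input
kernels = rng.standard_normal((2, 3, 3))   # two assumed kernels

# Stage 1: several convolutions produce the presynaptic activations.
pre = np.stack([conv2d_valid(image, k) for k in kernels])
# Stage 2 (detector stage): elementwise nonlinearity.
detected = relu(pre)
# Stage 3: max pooling summarizes each 2x2 neighborhood.
pooled = np.stack([max_pool2d(d, 2) for d in detected])

print(pre.shape, detected.shape, pooled.shape)  # (2, 4, 4) (2, 4, 4) (2, 2, 2)
```

Each pooled value is the maximum of a 2x2 block of detector outputs, so shifting a strong response by one pixel inside a block leaves the pooled value unchanged.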


One zero-padding setting adds just enough zeros to keep the size of the output equal to the size of the input; this is called same convolution. The other extreme is full convolution, in which enough zeros are added for every pixel to be visited k times in each direction, where k is the kernel width. For an input of width m, full convolution gives an output of width m + k - 1.
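The padding settings can be tried out in one dimension with NumPy's np.convolve, whose mode argument accepts "same" and "full". The input and kernel values are assumptions made for the example, and the unpadded "valid" mode is included only as a baseline for comparison.

```python
import numpy as np

x = np.arange(1.0, 8.0)             # input of width m = 7 (assumed)
kernel = np.array([1.0, 1.0, 1.0])  # kernel of width k = 3 (assumed)

valid = np.convolve(x, kernel, mode="valid")  # no zero-padding
same = np.convolve(x, kernel, mode="same")    # output width equals input width
full = np.convolve(x, kernel, mode="full")    # output width is m + k - 1

print(len(valid), len(same), len(full))  # 5 7 9

# With an all-ones kernel, each input value is added into exactly k outputs
# under full convolution, so the outputs sum to k times the inputs.
print(full.sum() == len(kernel) * x.sum())  # True
```

The final check is one way to see that full convolution visits every input element k times, including the elements at the borders.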

An analysis of CNN behavior will follow in further articles from the R&D team at SiliconMentor, which works in Computer Vision, Biomedical Signal Analysis, VLSI and their associated domains.