Convolutional Neural Networks

Go to [[Week 2 - Introduction]] or back to the [[Main AI Page]]

Part of the page on [[Neural Networks]]

tl;dr: A CNN breaks an image down by filtering and downsampling it over and over, picking out bigger and bigger macro-features along the way, and then classifies the image from the features it is most certain it has seen.

For example, if a convolution layer strips an image right down and its filters can still pick out a ‘smile’ macro-feature, it passes that on. If another layer does the same and is fairly sure it sees an ‘eye’ macro-feature, it passes that on. If a fully-connected layer then gets a bunch of signals saying ‘smile’, ‘eye’, ‘hair’, ‘ears’, etc., it’ll probably come out with ‘face’ as the image’s classification.

A graphical representation of AlexNet 2012, a CNN

Convolutional neural networks, or CNNs, are multilayer neural networks that take inspiration from the animal visual cortex. CNNs are useful in applications such as image processing, video recognition, and natural language processing. A convolution is a mathematical operation in which one function (the filter, or kernel) is slid across another (the input); at each position the overlapping values are multiplied together and summed, so the result is a blend of the two functions.
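To make the "mixture of two functions" idea concrete, here is a minimal sketch of a 1D discrete convolution in plain Python. The names `convolve_1d`, `signal`, and `kernel` are made up for illustration, and the smoothing kernel is just an assumed example, not something a real CNN would necessarily learn.

```python
def convolve_1d(f, g):
    """Full discrete convolution: out[n] = sum over k of f[k] * g[n - k]."""
    out = [0.0] * (len(f) + len(g) - 1)
    for n in range(len(out)):
        for k in range(len(f)):
            if 0 <= n - k < len(g):
                out[n] += f[k] * g[n - k]
    return out


signal = [0.0, 0.0, 1.0, 1.0, 1.0, 0.0, 0.0]  # a simple "box" signal
kernel = [0.25, 0.5, 0.25]                    # small smoothing kernel (illustrative)

blended = convolve_1d(signal, kernel)
print(blended)  # [0.0, 0.0, 0.25, 0.75, 1.0, 0.75, 0.25, 0.0, 0.0]
```

The sharp box comes out with softened edges: the output has the overall shape of the signal but has taken on the smoothness of the kernel.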

Convolutions are good at detecting simple structures in an image, and putting those simple features together to construct more complex features.
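As a rough sketch of "detecting a simple structure", the snippet below slides a hand-written 3x3 vertical-edge filter over a tiny image. The filter values, the image, and the helper name `conv2d_valid` are all assumptions for illustration; in a real CNN the filter values are learned. Like most deep learning libraries, this computes the sliding multiply-and-sum without flipping the kernel (strictly speaking a cross-correlation).

```python
def conv2d_valid(image, kernel):
    """Slide kernel over image with no padding and stride 1; return the feature map."""
    kh, kw = len(kernel), len(kernel[0])
    out_h = len(image) - kh + 1
    out_w = len(image[0]) - kw + 1
    out = [[0.0] * out_w for _ in range(out_h)]
    for i in range(out_h):
        for j in range(out_w):
            # Multiply the overlapping patch by the kernel and sum the products.
            out[i][j] = sum(
                image[i + di][j + dj] * kernel[di][dj]
                for di in range(kh)
                for dj in range(kw)
            )
    return out


# 6x6 image: dark (0) on the left half, bright (1) on the right half.
image = [[0.0, 0.0, 0.0, 1.0, 1.0, 1.0] for _ in range(6)]

# Responds where brightness increases from left to right.
vertical_edge = [
    [-1.0, 0.0, 1.0],
    [-1.0, 0.0, 1.0],
    [-1.0, 0.0, 1.0],
]

feature_map = conv2d_valid(image, vertical_edge)
for row in feature_map:
    print(row)  # each row is [0.0, 3.0, 3.0, 0.0]
```

The feature map is zero over the flat regions and large only where the window straddles the dark-to-bright boundary, which is the sense in which the filter "detects" the edge.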

In a convolutional network, this process occurs over a series of layers, each of which conducts a convolution on the output of the previous layer.
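A small sketch of that stacking, assuming two 3x3 filters with a ReLU in between (the filters `K1` and `K2`, the helper names, and the test image are all hypothetical). It shows the map shrinking at each layer, and that one output value of the second layer only "sees" a 5x5 patch of the original image.

```python
def conv2d_valid(image, kernel):
    """Sliding multiply-and-sum with no padding and stride 1."""
    kh, kw = len(kernel), len(kernel[0])
    out_h = len(image) - kh + 1
    out_w = len(image[0]) - kw + 1
    return [
        [
            sum(image[i + di][j + dj] * kernel[di][dj]
                for di in range(kh) for dj in range(kw))
            for j in range(out_w)
        ]
        for i in range(out_h)
    ]


def relu(feature_map):
    """Keep positive responses, zero out negative ones."""
    return [[max(0.0, v) for v in row] for row in feature_map]


# Illustrative, hand-picked filters (a trained network would learn these).
K1 = [[-1.0, 0.0, 1.0], [-1.0, 0.0, 1.0], [-1.0, 0.0, 1.0]]  # vertical edges
K2 = [[-1.0, -1.0, -1.0], [0.0, 0.0, 0.0], [1.0, 1.0, 1.0]]  # horizontal edges


def two_layer_features(image):
    layer1 = relu(conv2d_valid(image, K1))   # convolves the raw image
    layer2 = relu(conv2d_valid(layer1, K2))  # convolves layer 1's output
    return layer1, layer2


image = [[float((r * c) % 4) for c in range(7)] for r in range(7)]
layer1, layer2 = two_layer_features(image)
print(len(layer1), len(layer1[0]))  # 5 5
print(len(layer2), len(layer2[0]))  # 3 3

# layer2[0][0] depends only on image rows 0-4 and columns 0-4,
# so changing the far corner pixel leaves it untouched.
poked = [row[:] for row in image]
poked[6][6] += 10.0
_, layer2_poked = two_layer_features(poked)
print(layer2_poked[0][0] == layer2[0][0])  # True
```

Each extra layer widens the patch of the original image that a single output value depends on, which is how deeper layers get to respond to larger structures.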

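CNNs are adept at building complex features from simpler ones: early layers tend to respond to edges and patches of colour, while deeper layers combine these into textures, object parts, and eventually whole objects.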

A CNN is composed of several kinds of layers:

Convolutional layer - creates a feature map by applying a filter that scans the whole image, a few pixels at a time, and records how strongly each region matches the feature the filter looks for.
Pooling layer (downsampling) - reduces the amount of information the convolutional layer generated for each feature while keeping the most essential information (the convolutional and pooling layers usually repeat several times).
Fully connected input layer - “flattens” the outputs generated by previous layers to turn them into a single vector that can be used as an input for the next layer.
Fully connected layer - applies weights to the input generated by the feature analysis to work out which label fits best.
Fully connected output layer - generates the final probabilities to determine a class for the image.
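The layer types above can be strung together in a toy forward pass, sketched here in plain Python. Everything in it is an assumption for illustration: the 8x8 random "image", the two random filters, the layer sizes, and the three output classes. The weights are random rather than trained, so the prediction is meaningless; the point is only how data flows from one layer to the next.

```python
import math
import random

rng = random.Random(0)  # fixed seed; weights are random, NOT trained


def conv2d_valid(image, kernel):
    """Convolutional layer: sliding multiply-and-sum, no padding, stride 1."""
    kh, kw = len(kernel), len(kernel[0])
    return [
        [
            sum(image[i + di][j + dj] * kernel[di][dj]
                for di in range(kh) for dj in range(kw))
            for j in range(len(image[0]) - kw + 1)
        ]
        for i in range(len(image) - kh + 1)
    ]


def relu_map(feature_map):
    return [[max(0.0, v) for v in row] for row in feature_map]


def max_pool(feature_map, size=2):
    """Pooling layer: keep the largest value in each size x size block."""
    out_h, out_w = len(feature_map) // size, len(feature_map[0]) // size
    return [
        [
            max(feature_map[i * size + di][j * size + dj]
                for di in range(size) for dj in range(size))
            for j in range(out_w)
        ]
        for i in range(out_h)
    ]


def dense(vector, weights, biases):
    """Fully connected layer: one weighted sum per output unit."""
    return [sum(w * x for w, x in zip(row, vector)) + b
            for row, b in zip(weights, biases)]


def softmax(scores):
    """Turn raw scores into probabilities that sum to 1."""
    top = max(scores)
    exps = [math.exp(s - top) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def random_matrix(rows, cols):
    return [[rng.uniform(-1.0, 1.0) for _ in range(cols)] for _ in range(rows)]


image = [[rng.random() for _ in range(8)] for _ in range(8)]
kernels = [random_matrix(3, 3) for _ in range(2)]  # two filters

# Convolution + pooling: 8x8 image -> 6x6 feature map -> 3x3 pooled map, per filter.
pooled = [max_pool(relu_map(conv2d_valid(image, k))) for k in kernels]

# Fully connected input layer: flatten everything into a single vector.
flat = [v for fm in pooled for row in fm for v in row]

# Fully connected layer, then fully connected output layer with softmax.
hidden = [max(0.0, h) for h in dense(flat, random_matrix(10, len(flat)), [0.0] * 10)]
probs = softmax(dense(hidden, random_matrix(3, 10), [0.0] * 3))
predicted_class = probs.index(max(probs))

print(len(flat))  # 18
print(probs, predicted_class)
```

In practice the convolution and pooling steps are repeated several times before flattening, and all of the filters and weights are learned from labelled images rather than drawn at random.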
