The hidden layers of a CNN typically consist of convolutional layers, pooling layers, and fully connected layers.
Convolutional layers apply a convolution operation to the input, passing the result to the next layer. The convolution emulates the response of an individual neuron to visual stimuli.[7]
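To illustrate the operation, the following is a minimal sketch of a 2D convolution in Python with NumPy. The image size, kernel values, and function name are illustrative assumptions, not part of the article; note that deep-learning frameworks usually implement the cross-correlation form shown here and still call it convolution.

```python
import numpy as np

def conv2d_valid(image, kernel):
    """Slide a small kernel over the image and compute a weighted sum at
    every position ('valid' padding, stride 1)."""
    kh, kw = kernel.shape
    ih, iw = image.shape
    out = np.zeros((ih - kh + 1, iw - kw + 1))
    for y in range(out.shape[0]):
        for x in range(out.shape[1]):
            patch = image[y:y + kh, x:x + kw]   # the neuron's receptive field
            out[y, x] = np.sum(patch * kernel)  # one output activation
    return out

# Example: a 3x3 vertical-edge detector applied to a random 8x8 "image".
image = np.random.rand(8, 8)
kernel = np.array([[1., 0., -1.],
                   [1., 0., -1.],
                   [1., 0., -1.]])
feature_map = conv2d_valid(image, kernel)
print(feature_map.shape)  # (6, 6): one response per receptive-field position
```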
Each convolutional neuron processes data only for its receptive field. Tiling the same weights across these receptive fields allows CNNs to tolerate translation of the input image, though tiling alone does not provide tolerance to other distortions such as rotation or perspective change.
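The translation tolerance can be made concrete with a short sketch, here assuming SciPy's scipy.signal.correlate2d (which computes the same sliding-window sum as above); the image and kernel values are illustrative. Because every receptive field shares the same weights, shifting the input simply shifts the resulting feature map.

```python
import numpy as np
from scipy.signal import correlate2d

kernel = np.random.rand(3, 3)

image = np.zeros((8, 8))
image[2, 2] = 1.0                        # a single bright pixel

shifted = np.zeros((8, 8))
shifted[2, 4] = 1.0                      # the same pixel moved two columns right

a = correlate2d(image, kernel, mode='valid')
b = correlate2d(shifted, kernel, mode='valid')

# The response pattern is unchanged, just relocated two columns to the right.
print(np.allclose(a[:, :-2], b[:, 2:]))  # True
```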
Although fully connected feedforward neural networks can be used to learn features and classify data, this architecture is impractical for images: because each neuron would be connected to every pixel, even a shallow network (the opposite of a deep one) would require a very large number of weights. The convolution operation solves this problem by sharing weights across receptive fields, which reduces the number of free parameters and allows the network to be deeper with fewer parameters.[8] For example, a 5×5 tiling region with shared weights requires only 25 learnable parameters, regardless of image size. In this way, it also addresses the vanishing and exploding gradient problems encountered when training traditional multi-layer neural networks with many layers by backpropagation.
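The parameter-count gap can be checked with back-of-the-envelope arithmetic. The sketch below assumes a 224×224 grayscale input and a fully connected hidden layer with one unit per pixel; the exact figures are illustrative assumptions, and the point is only the scale of the difference between dense and shared-weight layers.

```python
# Assumed input: a 224x224 grayscale image.
height, width = 224, 224
inputs = height * width                 # 50,176 input pixels

# Fully connected: every hidden unit sees every pixel.
fc_params = inputs * inputs             # ~2.5 billion weights (plus biases)

# Convolutional: every hidden unit shares one 5x5 kernel.
kernel_params = 5 * 5                   # 25 weights, regardless of image size

print(f"fully connected: {fc_params:,} weights")
print(f"5x5 convolution: {kernel_params} shared weights")
```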