
How does the pooling layer help reduce dimensions in convolutional neural networks?

One of the most important layers in a CNN architecture is the pooling layer. In this article we will look at what a pooling layer is, what it does, and how it works. We will also look at different types of pooling, such as max pooling, average pooling, global average pooling and global max pooling.

Photo by Andras Kerekes

What is a pooling layer and what does it do?

In simple words, pooling is used for dimensionality reduction in a CNN: it shrinks the height and width of the feature maps. Why reduce the dimensions? To decrease the amount of computation required to process the data. But pooling is not only about reducing dimensions; it also helps extract the dominant features, such as edges, in the image.

How does a pooling layer work?

So, now we know that pooling is used for dimensionality reduction, but how does pooling reduce the dimensions? Pooling works in a similar way to filters. Consider the image below, where we are using a filter of size 2 X 2. With convolution filters, we multiply the filter values with the input element-wise and calculate the sum. With pooling, we still have the filter window, but instead of multiplying we apply an operation such as taking the max or the average of the values within the window.

For example, in the image below we use a 2 X 2 filter with stride 2 and take the max of each filter window as the output value. Consider the green area: the max of its 4 values is 7, so 7 becomes the output. The filter then moves to the next position, the red area, and again takes the maximum value, which is 10. Similarly, the filter moves to the orange and blue areas and takes the max value from each of them.

Max Pooling
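The sliding-window idea above can be sketched in a few lines of NumPy. This is only a minimal illustration, not how deep learning libraries implement pooling. The function name `max_pool2d` and the input values are assumptions made for this example; the values are chosen so that the top-left window has a max of 7 and the top-right window has a max of 10, as in the description above, while the bottom two windows are made up.

```python
import numpy as np

def max_pool2d(x, size=2, stride=2):
    """Max pooling over a 2D array with a square window and no padding."""
    h, w = x.shape
    out_h = (h - size) // stride + 1
    out_w = (w - size) // stride + 1
    out = np.empty((out_h, out_w), dtype=x.dtype)
    for i in range(out_h):
        for j in range(out_w):
            # Slice out the current window and keep only its largest value.
            window = x[i * stride:i * stride + size,
                       j * stride:j * stride + size]
            out[i, j] = window.max()
    return out

# Hypothetical 4 X 4 input (not the exact values from the figure).
x = np.array([
    [1, 7, 3, 10],
    [4, 2, 6, 5],
    [8, 0, 2, 1],
    [3, 9, 4, 6],
])

pooled = max_pool2d(x, size=2, stride=2)
print(pooled)        # [[ 7 10]
                     #  [ 9  6]]
print(pooled.shape)  # (2, 2)
```

With no padding, the output height is (input height - filter size) // stride + 1, and the same holds for the width, which is why a 4 X 4 input with a 2 X 2 filter and stride 2 gives a 2 X 2 output.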

Now we can see that the dimensions have been reduced from a 4 X 4 input to a 2 X 2 output. But how does pooling extract the dominant features?

In the above image we took the max value of each filter window. If the preceding filter detected a feature somewhere in that window, the maximum value represents the strongest detection, so we keep only that value and throw away the redundant information around it.

Now let’s see how pooling works with a 3D input image.

Consider the image below, a 3D input image with RGB channels. When we apply pooling to a 3D image, it works independently on each channel. The filter first goes through the red channel and takes the max value of each window, and then does the same for the green and blue channels. So, for a 3D input we get a 3D output with reduced height and width and the same number of channels.

Max pooling on a 3D image
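Here is a rough sketch of the per-channel behaviour, assuming the image is stored as a NumPy array with shape (height, width, channels). That layout, the function name `max_pool_channels` and the random input are assumptions for illustration only.

```python
import numpy as np

def max_pool_channels(image, size=2, stride=2):
    """Apply max pooling to each channel of an (H, W, C) array independently."""
    h, w, c = image.shape
    out_h = (h - size) // stride + 1
    out_w = (w - size) // stride + 1
    out = np.empty((out_h, out_w, c), dtype=image.dtype)
    for ch in range(c):              # red, green, blue are handled one by one
        for i in range(out_h):
            for j in range(out_w):
                window = image[i * stride:i * stride + size,
                               j * stride:j * stride + size,
                               ch]
                out[i, j, ch] = window.max()
    return out

# Hypothetical 4 X 4 RGB image filled with random pixel values.
rng = np.random.default_rng(0)
image = rng.integers(0, 256, size=(4, 4, 3))

pooled_image = max_pool_channels(image)
print(image.shape, "->", pooled_image.shape)  # (4, 4, 3) -> (2, 2, 3)
```

The height and width are halved, while the number of channels stays at 3 because no values are mixed across channels.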

Let’s take a look at the different types of pooling.

Max pooling - We have already discussed this type of pooling, where we take the maximum value of the filter window.

Average pooling - It works the same way as max pooling. The only difference is that instead of taking the maximum of the values, we take the average of the values in the window.

Different types of pooling
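To compare the two side by side, the sketch below uses one pooling function and swaps only the operation applied to each window. The function name `pool2d`, its `reduce_fn` parameter and the input values are assumptions made for this example.

```python
import numpy as np

def pool2d(x, size=2, stride=2, reduce_fn=np.max):
    """Pool a 2D array, applying reduce_fn (e.g. np.max or np.mean) to each window."""
    h, w = x.shape
    out_h = (h - size) // stride + 1
    out_w = (w - size) // stride + 1
    out = np.empty((out_h, out_w), dtype=float)
    for i in range(out_h):
        for j in range(out_w):
            window = x[i * stride:i * stride + size,
                       j * stride:j * stride + size]
            out[i, j] = reduce_fn(window)
    return out

# Hypothetical 4 X 4 input.
x = np.array([
    [1, 7, 3, 10],
    [4, 2, 6, 5],
    [8, 0, 2, 1],
    [3, 9, 4, 6],
])

max_out = pool2d(x, reduce_fn=np.max)
avg_out = pool2d(x, reduce_fn=np.mean)
print(max_out)  # [[ 7. 10.]
                #  [ 9.  6.]]
print(avg_out)  # [[3.5  6.  ]
                #  [5.   3.25]]
```

Both outputs have the same 2 X 2 shape. Max pooling keeps the strongest value in each window, while average pooling smooths all four values into one.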

Global average pooling

Earlier we saw that a 3D input image gives a 3D output. In global average pooling, however, a 3D input gives a 1D output with one value per channel. This is used when we want to connect the feature extraction part of the CNN to the fully connected part.

In global average pooling the filter size is equal to the size of the entire image. So instead of the 2 X 2 filter that we used in max pooling and average pooling, the filter size for a 4 X 4 image will be 4 X 4. It takes the average of each entire channel. For example, it will first take the average of the entire red channel, then the green channel and then the blue channel, and combine the three averages into a 1D vector.

Global average pooling
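Because the window covers the whole channel, global average pooling reduces to a mean over the height and width axes. A minimal NumPy sketch, assuming an (H, W, C) layout and a made-up input:

```python
import numpy as np

# Hypothetical 4 X 4 input with 3 channels, filled with the values 0..47.
feature_map = np.arange(4 * 4 * 3, dtype=float).reshape(4, 4, 3)

# Average over height (axis 0) and width (axis 1); one value per channel remains.
gap = feature_map.mean(axis=(0, 1))

print(feature_map.shape, "->", gap.shape)  # (4, 4, 3) -> (3,)
print(gap)                                 # [22.5 23.5 24.5]
```

The resulting vector has as many entries as there are channels, regardless of the input height and width, which makes it convenient to pass to a fully connected layer.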

Global max pooling

It is similar to global average pooling. The only difference is that instead of taking the average of each channel, we take its maximum value.

Global max pooling
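In the same spirit, global max pooling can be sketched as a max over the height and width axes. The (H, W, C) layout and the random input are again assumptions for illustration.

```python
import numpy as np

# Hypothetical 4 X 4 input with 3 channels and random values in [0, 1).
rng = np.random.default_rng(0)
feature_map = rng.random((4, 4, 3))

# Largest value of each channel versus the average of each channel.
global_max = feature_map.max(axis=(0, 1))
global_avg = feature_map.mean(axis=(0, 1))

print(global_max.shape)  # (3,)
print(global_max)
print(global_avg)
```

For every channel the global max is at least as large as the global average, since the maximum of a set of values can never be smaller than their mean.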


So that is it about pooling. I hope that after reading this article you have a good understanding of the pooling layer in a CNN. Check out my articles on the basics of convolutional neural networks, filters, stride and padding.

