Extract Captcha Text Using a CNN in Python (Captcha Solver)


A captcha solver extracts the text from a captcha image. This can be done with OCR (optical character recognition) tools such as ‘Tesseract’, but building your own custom captcha solver is a good way to understand computer vision more deeply. So let’s see how you can build one with the help of OpenCV and Keras.

Building Captcha Solver

To detect the text in a captcha, we will build a CNN model trained on separate images of the captcha’s letters. To build the model, we need to cut each letter out of the captcha and write it to disk as a training image. After training, we can pass an unseen captcha image to the model and it will predict the captcha text.

Training Data

Before moving forward, make sure you have the OpenCV and Keras libraries installed.

Preparing Training Data

Our first step is to prepare the training data. For this we will separate out each letter present in the captcha and write the letters as separate images. Our data contains captcha images with their text as the image name (look at the captcha image shown above). We will start by importing the necessary libraries and then reading the image using OpenCV. We will also grab the text of the image name using the splitext function for later use.

Now we will add a small padding of 2 pixels around the captcha so that no text touches the border. After that we will apply a threshold to separate the text pixels from the captcha background. I am applying ‘THRESH_BINARY_INV’ here, but you can change the values or apply other thresholds according to your captcha images.

In order to separate each letter, we will find contours using the cv2.findContours function. After that we will loop over all the contours and grab the coordinates of each bounding rectangle. In a captcha, two letters can be very close to each other or joined together. To counter this problem, we will check whether a contour is too wide; if it is, we will split it in half. We will store all of these coordinates in a list.

Now it’s time to write the separated letter images to disk with their correct names. Remember that the captcha text is the image name, so the first letter image has to be paired with the first letter of the image name, and so on. To match letter images and letter names correctly, we will sort our coordinates list by x coordinate and grab the letters from left to right. After this we will extract each letter from the original image with a 2-pixel margin around it.
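A small sketch of the sorting and cropping logic, using NumPy slicing only. The helper `crop_letters` is hypothetical, and skipping a captcha whose box count does not match its label length is an assumption about how to handle failed segmentation.

```python
import numpy as np


def crop_letters(gray, boxes, label, margin=2):
    """Pair each box (left to right) with its character and crop it out."""
    if len(boxes) != len(label):
        # Segmentation found the wrong number of letters; skip this captcha.
        return []
    pairs = []
    ordered = sorted(boxes, key=lambda box: box[0])  # sort by x coordinate
    for (x, y, w, h), char in zip(ordered, label):
        # Clamp at 0 so a negative index never wraps around the array.
        top, left = max(y - margin, 0), max(x - margin, 0)
        crop = gray[top:y + h + margin, left:x + w + margin]
        pairs.append((char, crop))
    return pairs


# Demo: boxes given out of order are matched to "ABC" from left to right.
gray = np.zeros((30, 60), dtype=np.uint8)
boxes = [(40, 5, 15, 20), (5, 5, 10, 20), (25, 5, 15, 20)]
pairs = crop_letters(gray, boxes, "ABC")
```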

Next we will set the path so that the extracted letter images are saved in separate folders based on the letter names. The full code uses a for loop to extract the letters from all the captcha images present in a folder.

Now that our training images are ready, it’s time to prepare the training data for our CNN model. We will read the extracted images and convert them to grayscale. The extracted images are not all the same size, so we will resize them to one fixed size (x, y). A grayscale image has only two dimensions, but Keras needs three, so we add a channel dimension using NumPy. We will store each image array in the list ‘data’ and each image label, which is the image name, in the list ‘labels’.

For better training we will rescale our pixel values to the range 0 to 1. Next we will split the data into training and validation sets. After that we will encode our target labels as one-hot vectors using LabelBinarizer.
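A sketch of these three steps on random stand-in data. The 25% validation share, the stratified split, and the four-letter alphabet are assumptions made only so the example is small and reproducible.

```python
import numpy as np
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import LabelBinarizer

# Stand-in data: 40 random "letter images" across 4 classes.
rng = np.random.default_rng(0)
data = rng.integers(0, 256, size=(40, 30, 30, 1), dtype=np.uint8)
labels = np.array(list("ABCD") * 10)

# Rescale pixel values from 0..255 to 0..1.
X = data.astype("float32") / 255.0

# Hold out 25% for validation; stratify keeps every letter in both sets.
X_train, X_val, y_train, y_val = train_test_split(
    X, labels, test_size=0.25, random_state=0, stratify=labels
)

# Fit the encoder on the training labels, then one-hot encode both sets.
lb = LabelBinarizer().fit(y_train)
y_train = lb.transform(y_train)
y_val = lb.transform(y_val)
```

Keep the fitted `lb` object around, since it is needed later to turn predictions back into letters.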

Building and Training CNN model

Now we have everything ready, so let’s build our CNN model using the Keras Sequential model.
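As an illustration, a small Sequential CNN for single-letter classification could be defined like this. The layer sizes, the 30x30x1 input shape, and the default of 32 classes are all assumptions and not necessarily the architecture used in the repository; the import assumes the Keras bundled with TensorFlow.

```python
from tensorflow.keras import layers, models


def build_model(input_shape=(30, 30, 1), num_classes=32):
    """Two conv/pool stages followed by a dense classifier (assumed layout)."""
    model = models.Sequential([
        layers.Input(shape=input_shape),
        layers.Conv2D(20, (5, 5), padding="same", activation="relu"),
        layers.MaxPooling2D(pool_size=(2, 2)),
        layers.Conv2D(50, (5, 5), padding="same", activation="relu"),
        layers.MaxPooling2D(pool_size=(2, 2)),
        layers.Flatten(),
        layers.Dense(500, activation="relu"),
        # One output per character class; softmax gives class probabilities.
        layers.Dense(num_classes, activation="softmax"),
    ])
    model.compile(
        loss="categorical_crossentropy", optimizer="adam", metrics=["accuracy"]
    )
    return model


model = build_model(num_classes=4)
```

Set `num_classes` to the number of distinct characters in your data, for example `len(lb.classes_)` if you used a LabelBinarizer.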

Let’s train our model. We will use early stopping to avoid overfitting.
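Training with early stopping can be sketched as below. The tiny model and random data are stand-ins so the snippet runs quickly, and the choices of `patience=2`, monitoring `val_loss`, and restoring the best weights are assumptions.

```python
import numpy as np
from tensorflow.keras import layers, models
from tensorflow.keras.callbacks import EarlyStopping

# Random stand-in data: 4 classes of 30x30 grayscale "letters".
rng = np.random.default_rng(0)
X_train = rng.random((64, 30, 30, 1)).astype("float32")
X_val = rng.random((16, 30, 30, 1)).astype("float32")
y_train = np.eye(4, dtype="float32")[rng.integers(0, 4, size=64)]
y_val = np.eye(4, dtype="float32")[rng.integers(0, 4, size=16)]

# Deliberately tiny stand-in model; use your real CNN here.
model = models.Sequential([
    layers.Input(shape=(30, 30, 1)),
    layers.Conv2D(8, (3, 3), activation="relu"),
    layers.MaxPooling2D((2, 2)),
    layers.Flatten(),
    layers.Dense(4, activation="softmax"),
])
model.compile(loss="categorical_crossentropy", optimizer="adam", metrics=["accuracy"])

# Stop once validation loss has not improved for 2 epochs in a row,
# and roll back to the weights from the best epoch.
early_stop = EarlyStopping(monitor="val_loss", patience=2, restore_best_weights=True)

history = model.fit(
    X_train, y_train,
    validation_data=(X_val, y_val),
    epochs=10, batch_size=16,
    callbacks=[early_stop], verbose=0,
)
epochs_run = len(history.history["val_loss"])
```

`epochs_run` shows how many epochs actually ran, which will be lower than `epochs` whenever the callback stopped training early.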

Our CNN model is trained, so let’s test it on an unseen captcha image. Before testing on the unseen image, we have to extract each letter of the captcha as we did with the training images.
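Once the letters are extracted, turning the model's output back into text is a short step. In the sketch below, the hard-coded `probabilities` array is a stand-in for what `model.predict` would return (one row of class probabilities per letter, in left-to-right order), and the four-character alphabet is made up.

```python
import numpy as np
from sklearn.preprocessing import LabelBinarizer

# The same encoder that was fitted on the training labels.
lb = LabelBinarizer().fit(list("2A5R"))  # classes_ is sorted: 2, 5, A, R

# Stand-in for model.predict(letter_batch): one probability row per letter.
probabilities = np.array([
    [0.10, 0.10, 0.70, 0.10],  # most likely "A"
    [0.05, 0.80, 0.10, 0.05],  # most likely "5"
    [0.10, 0.10, 0.10, 0.70],  # most likely "R"
    [0.90, 0.03, 0.03, 0.04],  # most likely "2"
])

# inverse_transform picks the highest-scoring class in each row.
letters = lb.inverse_transform(probabilities)
captcha_text = "".join(letters)
```

The letter images must be stacked in left-to-right order before prediction, otherwise the characters come out scrambled.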

Prediction on unseen data


This method worked perfectly for this type of captcha, but it will not work for all captchas, as there are different types of captchas with different levels of complexity. You can still build your own captcha solver by understanding the type of captcha and working out how to separate the letters or how to clean the captcha image before extracting the text.

That’s all for this article. You can grab the code for this project from GitHub here: https://github.com/chauhan01/Captcha-text-extraction

