
Extract Captcha Text using CNN in Python (Captcha Solver)

Photo by Janik Fischer on Unsplash

Captcha solving, or captcha text extraction, is the process of extracting the text from a captcha image. This can be done with OCR (optical character recognition) tools like 'Tesseract', but to understand computer vision more deeply you can build your own custom captcha solver. So let's see how you can build one with the help of OpenCV and Keras.

Building Captcha Solver

To detect the text in a captcha, we will build a CNN model trained on separate images of the captcha's letters. To build this training set, we need to cut each letter out of the captcha and save it as its own image. After training, we can split an unseen captcha into letters the same way and let the model predict each letter, which gives us the captcha text.

Training Data

Before moving forward, make sure you have the OpenCV and Keras libraries installed.

Preparing Training Data

Our first step is to prepare the training data. For this we will separate out each letter present in the captcha and save it as a separate image. Our data contains captcha images whose file name is the captcha text (look at the captcha image shown above). We will start by importing the necessary libraries and then reading the image using OpenCV. We will also grab the captcha text from the file name using the splitext function for later use.

Now we will add a small padding of 2 pixels around the captcha so that no text touches the border. After that we will apply a threshold to separate the text pixels from the captcha background. I am applying 'THRESH_BINARY_INV' here, which (for dark text on a light background) turns the text white and the background black, but you can change the values or apply other thresholds according to your captcha images.

To separate each letter, we will find contours using the cv2.findContours function. Then we loop over all contours and grab the coordinates of each bounding rectangle. In a captcha, two letters can be very close to each other or even joined together, in which case they form a single contour. To counter this problem, we check whether a contour is too wide; if it is, we split it in half and store the coordinates of both halves in our list of letter regions.

Now it's time to save the separated letter images with their correct names. Remember that the captcha text is the image name, which means the first letter image must be paired with the first character of the image name, and so on. To match letter images and letter names correctly, we sort the coordinates list by x coordinate so the letters are taken from left to right. After this we extract each letter from the original image with a 2-pixel padding around it.
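A sketch of the sorting and cropping, assuming `gray` is the padded grayscale captcha the boxes were computed on. `extract_letters` is a hypothetical name, and skipping captchas whose box count does not match the text length is an assumption that keeps mislabeled letters out of the training set.

```python
def extract_letters(gray, boxes, text, pad=2):
    """Pair each letter crop with its character from the captcha text.

    Boxes are sorted left to right so the i-th crop matches text[i].
    Returns an empty list when the number of boxes and characters differ,
    on the assumption that such captchas are better skipped than mislabeled.
    """
    if len(boxes) != len(text):
        return []
    letters = []
    # Sort by the x coordinate so letters are read from left to right
    for (x, y, w, h), char in zip(sorted(boxes, key=lambda b: b[0]), text):
        # Crop with a small margin, clamped at the top/left image edge
        # (NumPy slicing already clamps at the bottom/right edge)
        crop = gray[max(y - pad, 0):y + h + pad, max(x - pad, 0):x + w + pad]
        letters.append((char, crop))
    return letters
```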

Next we set the path for saving the extracted letter images, using a separate folder for each letter. The full code extracts the letters from every captcha image in a folder by wrapping these steps in a for loop.

Now that our training images are ready, it's time to prepare the training data for our CNN model. We read the extracted images and convert them to grayscale. The extracted images are not all the same size, so we resize each one to a fixed size (x, y). Because the images are grayscale they have only two dimensions, but Keras convolutional layers expect three (height, width, channels), so we add a channel dimension using NumPy. We store the image arrays in a list called 'data' and the corresponding labels, i.e. the letter each image shows, in a list called 'labels'.

For better training we rescale the pixel values to the range 0 to 1. Next we split the data into training and validation sets. After that we one-hot encode the target variable using LabelBinarizer.
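The scaling, split and encoding can be sketched with scikit-learn as below. The 25% validation share, the random seed and the helper name `prepare_training_data` are assumptions.

```python
import numpy as np
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import LabelBinarizer


def prepare_training_data(data, labels, test_size=0.25, seed=42):
    """Scale pixels to [0, 1], split off a validation set, one-hot encode labels."""
    # uint8 pixels 0..255 -> floats 0.0..1.0
    X = np.asarray(data, dtype="float32") / 255.0
    X_train, X_val, y_train, y_val = train_test_split(
        X, labels, test_size=test_size, random_state=seed)
    # Fit on all labels so every class gets a column even if absent from one split
    lb = LabelBinarizer().fit(labels)
    return X_train, X_val, lb.transform(y_train), lb.transform(y_val), lb
```

One caveat: with exactly two classes `LabelBinarizer` produces a single column rather than two, but a captcha alphabet normally has far more classes than that.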

Building and Training CNN model

Now that everything is ready, let's build our CNN model using the Keras Sequential model.
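A small LeNet-style network along these lines might look like the sketch below. The layer sizes, the 20 x 20 x 1 input shape and the `build_model` name are assumptions for illustration rather than the article's exact architecture; it uses `tensorflow.keras`.

```python
from tensorflow import keras
from tensorflow.keras import layers


def build_model(num_classes, input_shape=(20, 20, 1)):
    """A small LeNet-style CNN for single-letter classification.

    Layer sizes are assumptions for illustration.
    """
    model = keras.Sequential([
        keras.Input(shape=input_shape),
        # First conv block: 20 filters of 5x5, then halve the spatial size
        layers.Conv2D(20, (5, 5), padding="same", activation="relu"),
        layers.MaxPooling2D(pool_size=(2, 2)),
        # Second conv block with more filters
        layers.Conv2D(50, (5, 5), padding="same", activation="relu"),
        layers.MaxPooling2D(pool_size=(2, 2)),
        layers.Flatten(),
        layers.Dense(500, activation="relu"),
        # One output per character class, as probabilities
        layers.Dense(num_classes, activation="softmax"),
    ])
    # categorical_crossentropy matches one-hot targets from LabelBinarizer
    model.compile(loss="categorical_crossentropy", optimizer="adam",
                  metrics=["accuracy"])
    return model
```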

Let's train our model. We will use early stopping to avoid overfitting.
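Early stopping in Keras is handled by the `keras.callbacks.EarlyStopping` callback. The sketch below assumes the one-hot arrays from the data-preparation step; `train_with_early_stopping` is a hypothetical name, and the patience, batch size and epoch cap are assumed values to tune.

```python
from tensorflow import keras


def train_with_early_stopping(model, X_train, Y_train, X_val, Y_val,
                              epochs=50, batch_size=32, patience=3):
    """Fit the model, stopping once validation loss stops improving."""
    early_stop = keras.callbacks.EarlyStopping(
        monitor="val_loss",         # watch loss on the held-out set
        patience=patience,          # epochs to wait without improvement
        restore_best_weights=True,  # roll back to the best epoch at the end
    )
    return model.fit(X_train, Y_train, validation_data=(X_val, Y_val),
                     batch_size=batch_size, epochs=epochs,
                     callbacks=[early_stop], verbose=0)
```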

Our CNN model is trained, so let's test it on an unseen captcha image. Before predicting, we have to extract each letter of the captcha exactly as we did with the training images.

Prediction on unseen data


This method worked perfectly for this type of captcha, but it will not work for every captcha, because captchas come in many designs and levels of complexity. Still, you can build your own captcha solver by studying the type of captcha you are dealing with and figuring out how to separate its letters or how to clean the image before extracting the text.

That's all for this article. You can grab the code for this project from GitHub here:

Watch this video to learn how the pooling layer works in a CNN.

Check out my article on neural networks, where I explain how a neural network works in a very simple way without any complex math, or learn about convolutional neural networks through this article.
