Software Secret Weapons™

Cracking WaterCap CAPTCHA In 24 Hours
by Pavel Simakov on 2007-05-10 01:29:54 under Spam & Bots
I recently posted WaterCap, a new, simple and strong CAPTCHA image generator in PHP. At the same time I mentioned that I had no idea how strong it really is. The notes people made while assessing its strength are collected here...

12 Hours Passed
In just 12 hours someone took a crack at breaking my CAPTCHA. Here is what was said:

The noise doesn't help a bit if it's easy to filter out. The letters are very predictable and easy to crack. It would only help if you're one of the few that uses this code, but if it's accepted as a patch I'm sure it will be cracked in no time. Random fonts, font size, rotation, character position, contrast, color, backgrounds, blur etc, those would really make a "better captcha".

To prove the point two pictures were posted with the noise removed and the edges detected. Here:

WaterCap Crack: characters a..z, noise removed

WaterCap Crack: characters 0..1, edges detected

Sad news... I don't have such tools, however, so I can't try anything myself. If I had them, I could try to improve the design. I should ask around for tools and try to reproduce these results.

24 Hours Passed
Following the hints posted elsewhere, I found some tools that can do digital image processing effectively. And, well, it's true! The WaterCap backgrounds can be removed and the edges can be detected quite easily.

With the image processing tools in hand, it's possible to compare WaterCap with other CAPTCHAs. And what a surprise... It looks like WaterCap is no better or worse than any other CAPTCHA that I have collected so far. In all of the CAPTCHAs the backgrounds can be removed and the edges can be detected using an identical 4-5 step image processing sequence!

And I spent just minutes creating the digital filters. Much better results are possible if more time is put into selecting filters, adjusting brightness, etc. Here you can see the results of my experiments with background removal and edge detection on various CAPTCHAs.

WaterCap Crack: characters a..z, original

WaterCap Crack: characters a..z, noise removed, edges detected, method 1

WaterCap Crack: characters a..z, noise removed, method 2

Other CAPTCHAs (some defeated): original

Other CAPTCHAs (some defeated): noise removed, edges detected, method 1

Other CAPTCHAs (some defeated): noise removed, edges detected, method 2
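
The kind of short filter sequence described above can be sketched in a few lines of code. This is only an illustration in Python, not the filters used to produce the images above. It assumes the image is already a grid of grayscale values (0 is black, 255 is white), that the characters are darker than the background, and that a fixed cutoff of 128 separates the two. The function names threshold, median_filter and remove_noise are made up for this sketch.

```python
from statistics import median


def threshold(img, cutoff=128):
    """Binarize: pixels darker than cutoff become 0 (ink), the rest 255 (paper)."""
    return [[0 if p < cutoff else 255 for p in row] for row in img]


def median_filter(img):
    """Replace each pixel with the median of its 3x3 neighbourhood.

    Coordinates outside the image are clamped to the nearest border pixel.
    """
    h, w = len(img), len(img[0])
    out = []
    for y in range(h):
        row = []
        for x in range(w):
            window = [
                img[min(max(y + dy, 0), h - 1)][min(max(x + dx, 0), w - 1)]
                for dy in (-1, 0, 1)
                for dx in (-1, 0, 1)
            ]
            row.append(int(median(window)))
        out.append(row)
    return out


def remove_noise(img, cutoff=128):
    """Two-step clean-up: flatten the background, then drop isolated speckles."""
    return median_filter(threshold(img, cutoff))
```

The threshold step flattens a textured background into plain white, and the median step removes isolated dark dots that survive it. On a real CAPTCHA the cutoff would have to be tuned per image, and a 3x3 median will also eat strokes that are only one pixel wide.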

Having the tools allows me to run some quick and promising experiments. However, I am limited in my ability to manipulate these tools. The tools I found are full-blown end-user graphical applications (think Microsoft Office) with very limited scripting ability. I have to perform many manual operations: open files, use menus, click here and there, wasting a lot of time on each experiment. To have a good CAPTCHA testing/cracking platform, the same image processing methods need to be available in a programming language such as PHP or Java. Since WaterCap was written in PHP, I had better find a digital image processing library for PHP and learn how to use it...

36 Hours Passed
Well, you only need PHP 5 with the standard GD library to get the backgrounds removed and the edges detected. In just 10-15 lines of PHP I can clean up the original WaterCap challenge image, fully removing the noise and revealing the features of the individual characters.

Below you can see the results of my home-grown WaterCap noise removal and edge detection in PHP.

WaterCap Crack: characters a..z, original

WaterCap Crack: characters a..z, home-grown noise removal/edge detection
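
For illustration, here is a rough sketch of the edge-detection half of such a clean-up. It is written in Python rather than PHP and is not the actual script behind the images above. It assumes a grayscale grid of values from 0 to 255 and uses a common 3x3 Laplacian kernel; the names LAPLACIAN, convolve3x3 and detect_edges are invented for the example. In PHP, GD's imagefilter() with IMG_FILTER_EDGEDETECT and imageconvolution() cover similar ground, though which calls the original script relied on is not stated here.

```python
# A common 3x3 Laplacian kernel: flat areas sum to zero, brightness jumps do not.
LAPLACIAN = [
    [-1, -1, -1],
    [-1,  8, -1],
    [-1, -1, -1],
]


def convolve3x3(img, kernel):
    """Apply a 3x3 kernel to a grayscale image.

    Coordinates outside the image are clamped to the nearest border pixel,
    and each result is clipped to the 0..255 range.
    """
    h, w = len(img), len(img[0])
    out = [[0] * w for _ in range(h)]
    for y in range(h):
        for x in range(w):
            acc = 0
            for ky in range(3):
                for kx in range(3):
                    sy = min(max(y + ky - 1, 0), h - 1)
                    sx = min(max(x + kx - 1, 0), w - 1)
                    acc += kernel[ky][kx] * img[sy][sx]
            out[y][x] = min(max(acc, 0), 255)
    return out


def detect_edges(img):
    """Return an image that is bright only where brightness changes sharply."""
    return convolve3x3(img, LAPLACIAN)
```

Because negative responses are clipped to zero, the outline appears on the lighter side of each boundary: a dark character on a white background comes out as a white contour just outside its strokes, on black.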

48 Hours Passed
Some dust has settled and new opinions are in. Highway of Life fully understood the intentions and the design behind the WaterCap! He saw more than just YACA - "Yet Another CaptchA". Here is the quote:

I know that NO CAPTCHA is perfect, but it's the uniqueness of each one that will defeat the bots. The trick in yours I think is the fact that it appears as an optical illusion, it's the shadows that are creating the characters, this is good, since they are not directly outlined. Sure, it could be defeated, but as long as it makes it more difficult to defeat, you are winning.

Here is an example of an optical illusion. What letters do you see?

(the key: there are yellow letters inside the black letters)

Comments (4)

  • Comment by Aang — January 11, 2008 @ 2:16 pm

    It is a very good idea to use an optical illusion.
    Another CAPTCHA that I like a lot is used on the Brazilian domain registration system –>
    The CAPTCHA asks questions about the letters and numbers (colors, position, etc.) in the image.

    Best regards!

  • Comment by Lee Anne — March 6, 2008 @ 10:20 pm

    You guys are over analyzing the problem. Just use bitmap comparison on blue tinted pixels. Make a font and don’t even worry about segmentation. Bitmap comparison is so fast (unless you guys are using PHP) that you can try each pixel for each of the 26 letters and throw away pixels with low confidence. It isn’t hard.

    Here’s some software which could probably help

  • Comment by Pavel Simakov — August 30, 2011 @ 4:18 pm

    Sad, but true. Any information out there can be used for the good or the evil.

  • Comment by david — September 30, 2011 @ 1:44 pm

    Very interesting results from your investigation. Thank you for publishing the images with the noise removed.
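
The bitmap comparison suggested in the comments above can be sketched roughly as follows. This is a toy illustration in Python, not the software linked in that comment. It assumes the image has already been reduced to a grid of 0/1 pixels (1 is ink) and that a small bitmap of each letter is available; match_score, best_matches and the min_confidence cutoff are hypothetical names chosen for this example.

```python
def match_score(img, template, top, left):
    """Fraction of template pixels that equal the image pixels beneath them."""
    th, tw = len(template), len(template[0])
    hits = sum(
        1
        for y in range(th)
        for x in range(tw)
        if img[top + y][left + x] == template[y][x]
    )
    return hits / (th * tw)


def best_matches(img, templates, min_confidence=0.9):
    """Slide every template over every position and keep the confident hits.

    Returns (score, label, top, left) tuples, best score first.
    No segmentation is done: each position is simply tried for each letter.
    """
    h, w = len(img), len(img[0])
    found = []
    for label, tpl in templates.items():
        th, tw = len(tpl), len(tpl[0])
        for top in range(h - th + 1):
            for left in range(w - tw + 1):
                score = match_score(img, tpl, top, left)
                if score >= min_confidence:
                    found.append((score, label, top, left))
    return sorted(found, reverse=True)
```

Sorting the surviving hits by their left coordinate would give the characters in reading order. Rotation, scaling or warping of the letters would all lower the scores, which is one reason the earlier quote recommends randomizing them.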


  Copyright © 2004-2015 by Pavel Simakov
Any conclusions, recommendations, ideas, thoughts or source code presented on this site are my own and do not reflect an official opinion of my current or past employers, partners or clients.