Software Secret Weapons™

 
WaterCap Strong PHP CAPTCHA With Negative Spaces And Shadows
by Pavel Simakov on 2007-05-09 22:07:15 under Spam & Bots
 

Introduction

Most Internet users these days have seen a CAPTCHA. A CAPTCHA is a challenge-response test used on many web sites to determine whether or not the user is human. It is the most widely used mechanism for protecting specific content against software bots while still letting human users in. You have probably faced a CAPTCHA already, especially if you use hosted email, have a web site, are involved in e-commerce or provide services to others over the Internet.

Here I present WaterCap, a new, simple and strong CAPTCHA image generator. Written in under 50 lines of PHP code, WaterCap was specifically designed to withstand commonly used CAPTCHA-breaking algorithms.

The problem

I am involved in the development of several large web sites, many of which rely heavily on CAPTCHA. CAPTCHAs seem to be working well, except in the phpBB forum. phpBB forum software version 2.0.2x uses a very weak CAPTCHA that software bots regularly defeat. As a result, I now get all kinds of porn, Viagra and other fun stuff, in addition to serving thousands of web pages to the dozens of non-human members registering daily!

If you follow the news on the topic, this might not surprise you, but it was a huge surprise to me. Before I discovered this problem in my phpBB forum, I didn't even think CAPTCHAs could be defeated. Apparently, there are numerous articles {1-5} with examples of software (some of it open-source) that instantly breaks CAPTCHAs, some reporting a success rate of over 90%! So, as with many other things in life, CAPTCHA is a chase! Us against them, good against evil, with a lot of time, money and humanity burned in the process...

The solution

After some quick research I found several CAPTCHA image generators for PHP, but I liked none of them. They all seemed to be variations on the same theme, and they all looked easy to defeat. So I decided to read more about the software that breaks CAPTCHAs, hoping to build a CAPTCHA image generator that is difficult for these tools to defeat.

The CAPTCHA breaking software {1-5} works by processing the challenge image in several stages, including some of these steps:

  1. background noise elimination - fetch the same challenge several times, hoping that it always has different random noise but the same challenge text; if so, all images can be "added up" and the noise can be subtracted out
  2. pixel convolution (grouping) - roughly, if the only white pixel in a 3x3 neighborhood is the center one and all other pixels are black, turn this white pixel black
  3. border detection - detecting a bounding box for each character
  4. foreground enhancement - within a bounding box
  5. character search - brute-force matching of each extracted character image against a database of character images for well-known fonts
  6. word validation - used when the challenge is known to be a valid word rather than a random combination of symbols
  7. character outlining
  8. line thinning
  9. endpoint finding
  10. feature vector search
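To make steps 1 and 2 a bit more concrete, here is a minimal PHP sketch of both, assuming the challenge has already been reduced to a binary image stored as an array of rows (1 = white pixel, 0 = black pixel). The function names combine_fetches() and remove_isolated_pixels() are hypothetical, pixels outside the image are treated as black, and real breakers {1-5} are far more elaborate than this.

```php
<?php
/*
 * Minimal sketch of two attack steps, on binary images stored as arrays of
 * rows (1 = white pixel, 0 = black pixel). Function names are hypothetical.
 */

/* Step 1: "add up" several fetches of the same challenge and keep a pixel
   white only if it is white in more than half of them. */
function combine_fetches(array $fetches) {
    $n = count($fetches);
    $height = count($fetches[0]);
    $width = count($fetches[0][0]);
    $result = array();
    for ($y = 0; $y < $height; $y++) {
        for ($x = 0; $x < $width; $x++) {
            $sum = 0;
            foreach ($fetches as $image) {
                $sum += $image[$y][$x];
            }
            $result[$y][$x] = ($sum * 2 > $n) ? 1 : 0;
        }
    }
    return $result;
}

/* Step 2: turn a white pixel black when all 8 neighbours in its 3x3
   neighbourhood are black (pixels outside the image count as black). */
function remove_isolated_pixels(array $image) {
    $height = count($image);
    $width = count($image[0]);
    $out = $image; // PHP copies arrays on write, so $image stays unchanged
    for ($y = 0; $y < $height; $y++) {
        for ($x = 0; $x < $width; $x++) {
            if ($image[$y][$x] !== 1) {
                continue;
            }
            $white_neighbours = 0;
            for ($dy = -1; $dy <= 1; $dy++) {
                for ($dx = -1; $dx <= 1; $dx++) {
                    $ny = $y + $dy;
                    $nx = $x + $dx;
                    if (($dx !== 0 || $dy !== 0)
                        && $ny >= 0 && $ny < $height && $nx >= 0 && $nx < $width) {
                        $white_neighbours += $image[$ny][$nx];
                    }
                }
            }
            if ($white_neighbours === 0) {
                $out[$y][$x] = 0;
            }
        }
    }
    return $out;
}
```

Note that step 1 only helps the attacker when the noise changes between fetches; if the same challenge always comes with the same noise, adding up the fetches gains nothing.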

I have collected and inspected many examples of CAPTCHA images, most of which have already been defeated with over 90% accuracy. What makes them all so easy to defeat? How can I generate challenge images in a way that makes the techniques above useless? How can I complicate "border detection" and "character outlining"? Here is why none of these work.

Take a closer look at these images. They all share a common trait: a distinct text color. The letters are distorted in a variety of ways (turned, fogged, shadowed, squished and stretched) and noise is added, but one thing remains the same: all characters are drawn in the same color. This is the main weakness!
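To see why a single text color is such a weakness, consider this hypothetical attacker-side sketch in PHP. It assumes the pixels are available as an array of rows of 0xRRGGBB integers; guess_text_color() and text_mask() are illustrative names, not part of any real tool. Once the most frequent non-background color is known, foreground extraction is a single comparison per pixel.

```php
<?php
/*
 * Hypothetical attacker-side sketch: pixels are stored as arrays of rows of
 * 0xRRGGBB integers. When all characters share one colour, that colour alone
 * separates the text from everything else.
 */

/* Guess the text colour as the most frequent colour other than the background. */
function guess_text_color(array $pixels, $background) {
    $counts = array();
    foreach ($pixels as $row) {
        foreach ($row as $color) {
            if ($color !== $background) {
                $counts[$color] = isset($counts[$color]) ? $counts[$color] + 1 : 1;
            }
        }
    }
    if (empty($counts)) {
        return null; // nothing but background
    }
    arsort($counts); // most frequent colour first
    $colors = array_keys($counts);
    return $colors[0];
}

/* Foreground extraction: 1 where the pixel has the text colour, 0 elsewhere. */
function text_mask(array $pixels, $text_color) {
    $mask = array();
    foreach ($pixels as $y => $row) {
        foreach ($row as $x => $color) {
            $mask[$y][$x] = ($color === $text_color) ? 1 : 0;
        }
    }
    return $mask;
}
```

Noise drawn in a different color simply falls outside the mask, which is why the single text color matters more than the distortions.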

The WaterCap CAPTCHA image generator described here is designed to eliminate this single-text-color weakness and to make several steps of the automatic image recognition process especially difficult. With WaterCap, pixel convolution becomes useless, and border detection and foreground enhancement become much harder. All of this is achieved with one simple technique: imprinting the text with negative space and shadows, by using the background color as the text color.
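The layering can be illustrated without GD. The toy PHP sketch below works under simplifying assumptions: a character grid stands in for the image ('.' for the background color, 'N' for the noise color, 'T' for the text color), a hand-made 'L' replaces a TrueType glyph, and stamp() and negative_space_demo() are hypothetical helpers. It stamps the glyph three times with the same one-pixel shift ($d = -1) used in the WaterCap code, ending with the background color.

```php
<?php
/*
 * Toy illustration of the negative-space idea on a character grid instead of
 * a GD image: '.' = background colour, 'N' = noise colour, 'T' = text colour.
 * The glyph and the helper names are made up for this example.
 */

/* Copy every '#' cell of $glyph onto $canvas at offset ($ox, $oy) using $color. */
function stamp(array &$canvas, array $glyph, $ox, $oy, $color) {
    foreach ($glyph as $row => $line) {
        for ($col = 0; $col < strlen($line); $col++) {
            if ($line[$col] === '#') {
                $canvas[$oy + $row][$ox + $col] = $color;
            }
        }
    }
}

function negative_space_demo(array $glyph, $size) {
    $canvas = array_fill(0, $size, array_fill(0, $size, '.'));
    $d = -1; // same one-pixel shift as in the WaterCap code
    $x = 2;
    $y = 2;
    stamp($canvas, $glyph, $x, $y, 'T');                   // text colour first
    stamp($canvas, $glyph, $x + $d, $y + $d, 'N');         // shadow in the noise colour
    stamp($canvas, $glyph, $x + 2 * $d, $y + 2 * $d, '.'); // letter punched out in the background colour
    return $canvas;
}

$glyph = array(
    '#....',
    '#....',
    '#....',
    '#....',
    '#####',
);
$canvas = negative_space_demo($glyph, 8);
foreach ($canvas as $cells) {
    echo implode('', $cells), "\n";
}
```

In the printed grid, every cell covered by the last stamp is '.', so the letter has no pixels of its own color; it is visible only through the 'N' and 'T' shadow to its lower right.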

The more I think about this, the more I suspect I know why other CAPTCHA engines draw the text in one specific color: drawing text in more than one color is complex. As far as I know, a typical drawText() function in the Java, .NET, Delphi, PHP or Perl drawing APIs just can't do it. Can it really be this simple...

I have no proof yet that WaterCap is a better CAPTCHA image generator than the others. (Edited on 070508: There is more information about evaluating this CAPTCHA's strength below.) But it seems so to me, because WaterCap doesn't use any additional color for the text; it uses the background color itself. The noise is placed on top of and around the text, so it resembles the shadow of the letters, but without a continuous boundary around each character. This is what I think will make WaterCap difficult for a software program to defeat. And the beauty is in the simplicity: only 50 lines of PHP code are needed to create the image! Here are several examples:

WaterCap Example 1: Characters 0..1

WaterCap Example 2: Characters a..z

WaterCap Example 3: Characters A..Z

The implementation

The complete PHP implementation of WaterCap is presented below. Since I am very new to PHP, I started from Simon Jarvis's original code to avoid learning the PHP drawing API. The WaterCap image is obtained by drawing the same challenge text several times in three different colors (the text color, the noise color and finally the background color), shifting it slightly up and to the left each time. A small-angle rotation then adds slight fuzziness. I also made sure that the noise is always the same for the same challenge code: the random number generator is seeded with a checksum of the code, so fetching the same challenge repeatedly yields identical images.
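The effect of this seeding can be checked without GD. This minimal sketch assumes a hypothetical helper, noise_points(), that mirrors the class's mt_srand(crc32($code)) call and mt_rand() usage; it generates the first few noise coordinates for a challenge code twice and compares them.

```php
<?php
/* Hypothetical helper: the first $n noise coordinates for $code, seeded the
   same way as the WaterCap constructor. */
function noise_points($code, $n, $width = 250, $height = 60) {
    mt_srand(crc32($code));
    $points = array();
    for ($i = 0; $i < $n; $i++) {
        $points[] = array(mt_rand(0, $width - 1), mt_rand(0, $height - 1));
    }
    return $points;
}

/* two "fetches" of the same challenge produce the same noise */
var_dump(noise_points('a7k3p', 5) === noise_points('a7k3p', 5)); // bool(true)
```

Since identical fetches add up to the same image, background noise elimination (step 1 above) gains the attacker nothing. Keep in mind that mt_srand() seeds PHP's global generator, so any later mt_rand() calls in the same request also follow the sequence derived from the challenge code.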


/*
*
* Name: WaterCap CAPTCHA Image Generator 
* Author: Pavel Simakov
* Copyright: 2007 Pavel Simakov
* Version: 0.9
* Requirements: PHP 5 or later with GD and FreeType libraries
* Link: http://www.softwaresecretweapons.com/jspwiki/Wiki.jsp?page=WaterCap_Strong_PHP_CAPTCHA_With_Negative_Spaces_And_Shadows
*
* Based on prior work of: Simon Jarvis
* Link: http://www.white-hat-web-design.co.uk/articles/php-captcha.php
* 
* This program is free software; you can redistribute it and/or 
* modify it under the terms of the GNU General Public License 
* as published by the Free Software Foundation; either version 2 
* of the License, or (at your option) any later version.
* 
* This program is distributed in the hope that it will be useful, 
* but WITHOUT ANY WARRANTY; without even the implied warranty of 
* MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the 
* GNU General Public License for more details: 
* http://www.gnu.org/licenses/gpl.html
*
*/

class WaterCap {

   /* TrueType font used for the challenge text */
   public $font = '../res/monofont.ttf';

   /* draws $code as a JPEG CAPTCHA and sends it to the output */
   function __construct($code, $width = 250, $height = 60) {

      /* seed random number gen to produce the same noise pattern time after time */
      mt_srand(crc32($code));

      /* init image */
      $font_size = $height * 0.85;
      $image = @imagecreate($width, $height) or die('Cannot initialize new GD image stream');

      /* set the colours; the first colour allocated becomes the background of a palette image */
      $background_color = imagecolorallocate($image, 255, 255, 255);
      $text_color = imagecolorallocate($image, 20, 40, 100);
      $noise_color = imagecolorallocate($image, 100, 120, 180);

      /* measure the text and centre it; elements 4 and 5 are the upper-right corner (y is negative) */
      $textbox = imagettfbbox($font_size, 0, $this->font, $code) or die('Error in imagettfbbox function');
      $x = (int) (($width - $textbox[4]) / 2);
      $y = (int) (($height - $textbox[5]) / 2);
      $d = -1;

      /* layer 1: the text in the text colour */
      imagettftext($image, $font_size, 0, $x, $y, $text_color, $this->font, $code) or die('Error in imagettftext function');

      /* layer 2: shifted one pixel up and left, in the noise colour; the second call
         lands at the same offset (2 * $d + 1 == $d), so it repeats this layer */
      imagettftext(
         $image, $font_size, 0, $x + $d, $y + $d, $noise_color, $this->font, $code
      ) or die('Error in imagettftext function');
      imagettftext(
         $image, $font_size, 0, $x + 2 * $d + 1, $y + 2 * $d + 1, $noise_color, $this->font, $code
      ) or die('Error in imagettftext function');

      /* layer 3: punch the letters out in the background colour, two pixels up and left;
         the glyphs become negative space and the earlier layers remain as their shadow */
      imagettftext(
         $image, $font_size, 0, $x + 2 * $d, $y + 2 * $d, $background_color, $this->font, $code
      ) or die('Error in imagettftext function');

      /* mix in single-pixel background dots */
      for ($i = 0; $i < ($width * $height) / 10; $i++) {
         imagesetpixel($image, mt_rand(0, $width - 1), mt_rand(0, $height - 1), $background_color);
      }

      /* mix in single-pixel text and noise dots */
      for ($i = 0; $i < ($width * $height) / 25; $i++) {
         imagesetpixel($image, mt_rand(0, $width - 1), mt_rand(0, $height - 1), $noise_color);
         imagesetpixel($image, mt_rand(0, $width - 1), mt_rand(0, $height - 1), $text_color);
      }

      /* rotate a bit to add fuzziness; free the unrotated image */
      $rotated = imagerotate($image, 1, $background_color);
      imagedestroy($image);

      /* output */
      imagejpeg($rotated);
      imagedestroy($rotated);
   }
}


Here is an example of using WaterCap in phpBB. Open the usercp_confirm.php file and add the WaterCap class definition near the top, inside the existing PHP block. Then insert three new lines just before $_png = define_filtered_pngs(); as shown below. That's it! Nothing else needs to change.

...
...

header('Content-Type: image/jpeg');
$captcha = new WaterCap($code);
exit;

...
...
// We can we will generate a single filtered png 
// Thanks to DavidMJ for emulating zlib within the code :)
$_png = define_filtered_pngs();
...
...

Cracking attempts!

Cracking attempts + readers' feedback

Final word

Don't be afraid of the software bots! A software bot is just a program written by a human, by a software engineer dude just like you. It can be defeated quickly once you put some thought into the defense. Don't just trust the tools (CAPTCHA or otherwise) and forget about the forces behind the games you play. Software engineering is all about continuous change, so keep your eye on the ball.

The WaterCap CAPTCHA and the ideas from this article are yours to use as you see fit in your own projects. I have no proof yet that WaterCap works well, but I am investigating its strength and will report back if it is confirmed. No doubt, even if it works well today, it is unlikely to work as well tomorrow. But we will talk about what to do when that time comes...

Resources

  1. Breaking a Visual CAPTCHA
  2. Overture CAPTCHA recognizer
  3. PWNtcha - captcha decoder
  4. Breaking the phpBB CAPTCHA
  5. XRumer - automatically posts your messages to forums, guestbooks, bulletin boards and catalogs of the links
  6. Using AI to beat CAPTCHA and post comment spam

Comments (7)

  • Comment by Botmaster — June 20, 2008 @ 11:32 am

    Very interesting, thankt you! ;)

  • Comment by frgfg — April 7, 2009 @ 3:40 am

    dsd

  • Comment by Horace — June 21, 2009 @ 6:22 pm

    Why is the $text_color and $noise_color not the same in the code below ?:

    $text_color = imagecolorallocate($img, 20, 40, 100);
    $noise_color = imagecolorallocate($img, 100, 120, 180);

    I thought that if the noise color was indistinguishable from text color, then it would make their separation more difficult…

  • Comment by Horace — June 21, 2009 @ 6:24 pm

    BTW: I have added font variations to your Watercap script.
    If you want it, just reply where you want me to send it.

  • Comment by Abram Hindle — July 7, 2009 @ 11:15 am

    http://doi.ieeecomputersociety.org/10.1109/WCRE.2008.35

    In this paper I describe modifying the watercap captcha source code in order to break it with 99% accuracy.

  • Comment by carte sd — December 14, 2009 @ 7:22 am

    Most of you are aware of the weak CAPTCHA that is used on phpBB2, and the basic version on phpBB3. Recently, phpBB3 revamped their advanced CAPTCHA, giving the user more options with the x/y axis of noise levels. The problem is, the CAPTCHA can be fairly un-readable.

  • Comment by dhruba — July 29, 2013 @ 5:36 pm

    Implementing some random rotation and watery effect
    will justify your work.




  Copyright © 2004-2014 by Pavel Simakov
Any conclusions, recommendations, ideas, thoughts or source code presented on this site are my own and do not reflect the official opinion of my current or past employers, partners or clients.