CAPTCHAs and Their Fake Security

published on January 17, 2011 in technical

We all use CAPTCHAs almost every day. For an average user, even the name is scary, not to mention its functionality. Even I, as a developer and a heavy internet user, have skipped registering on many sites because they used bad CAPTCHAs. In this article, I shall talk a bit about CAPTCHAs, their fake security and the alternatives.

1. What is a CAPTCHA?

CAPTCHA stands for Completely Automated Public Turing test to Tell Computers and Humans Apart. Scary, isn't it?

We are all familiar with registration forms and with comment forms at the bottom of articles or posts. In their simplest form, none of these is secure. Internet bots, which are basically programs, can fill your database with spam registrations and garbage comments. In the end, this may cause a free-space problem on your server and high system load due to intensive MySQL work.

This is a situation that we do not need. At all.

As a workaround for the problem mentioned above, we have the CAPTCHA. It was created to make our websites more secure by protecting the input forms from robots. But in practice, this protection is not as effective as it should be. In fact, 88-100% of the existing CAPTCHAs can be “guessed” with a little OCR knowledge.

2. What should we use instead?

There are a few initiatives that answer this question.

One of them is a W3C article from 2005, where the author writes in detail about the fake security of CAPTCHAs. He also suggests a few replacements:

  • logic puzzles – our main goal is to make the user think, because bots cannot think. Anything that fulfills this rule will do. 
  • sound output – instead of using pictures, we use a sound: the user hears some audio and must identify the phrase he heard.
  • background checks – there are two kinds of these. One is classical spam protection, where spam-filtering logic tells the humans apart. The other uses heuristics: heavy background statistics, like the amount of input data, access times and so forth, which help us decide whether we are dealing with a human or not.
  • services – PASSPORT services, public-key infrastructure solutions and biometrics fit in this category
  • other possibilities – these are based on credit card numbers or other similar IDs, like the Social Security number in the US.
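The first idea in the list, the logic puzzle, can be as simple as a server-side arithmetic question shown next to the form. A minimal Python sketch, assuming the generated answer is kept in the user's session (the function names are my own):

```python
import random


def make_puzzle():
    """Generate a simple arithmetic question and its answer."""
    a, b = random.randint(1, 9), random.randint(1, 9)
    return f"What is {a} + {b}?", a + b


def check_answer(submitted, expected):
    """Accept the submission only if the puzzle was answered correctly."""
    try:
        return int(submitted) == expected
    except (TypeError, ValueError):
        return False


# Usage: store `answer` in the session, render `question` next to the form,
# then call check_answer() with the posted value on submit.
question, answer = make_puzzle()
```

Of course, a fixed pool of questions is easy for a dedicated bot author to harvest, so this only stops generic spam bots, which is usually the point.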

There is also a project called reCAPTCHA. People usually spend around ten seconds solving a CAPTCHA, and that time is basically wasted, especially from the user's point of view. This project tries to make the lost time a bit more useful: it uses the strings in the CAPTCHAs to help digitize printed books. That is useful for humanity, but not necessarily for the user.

3. What would I do if I needed some kind of spam protection?

As a first step, I would check the size of the site I am working on. If we are talking about a big portal with lots of users, then I shall need some serious protection, which might well be a professional CAPTCHA.

But if the site is not that big, let's say a smaller blog, then I would try getting the users used to slightly uncommon ways of protection. If that proves too little, I still have the possibility of using a CAPTCHA later.

If I assume that most bots do not understand JavaScript or CSS, then I already have a bunch of great protection possibilities. Phil Haack calls one of these a Honeypot Captcha. The method is very simple: bots are hardworking little beasts, and they fill in every field they find on a form. All we need is an invisible field in the form that must be left empty. We check it on the server side and act accordingly. Ned Batchelder describes something similar in his article: Stop Spambots with Hashes and Honeypots.
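The honeypot check fits in a few lines. A sketch in Python, with a hypothetical field name and CSS class; the form would contain an extra input that real users never see:

```python
# The form ships with an extra field hidden from humans by CSS, e.g.:
#   <input type="text" name="website" class="hp-field">
# with a stylesheet rule like: .hp-field { display: none; }


def is_probably_bot(form_data):
    """A bot fills every field it finds; a human leaves the hidden one empty."""
    return bool(form_data.get("website", "").strip())


# Usage: reject (or silently discard) the submission when the check fires.
submission = {"name": "Alice", "comment": "Nice post!", "website": ""}
if is_probably_bot(submission):
    pass  # drop the request
```

Naming the honeypot something tempting like "website" or "url" helps, since bots are especially eager to put links everywhere.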

Another simple protection, mentioned by the same Phil Haack, is the Lightweight Invisible CAPTCHA Validator Control. Relying on the same lack of CSS and JavaScript knowledge on the part of the spam bots, it fills in a hidden field using JavaScript. The value is an expected value that the server-side code knows about. We check it there and drop the request if needed, or, even better, we teach our code to recognize the IP of the request and never accept submits from that IP again (in reality, this is a bit more complicated).
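This is not Haack's actual control, but the server-side half of the idea can be sketched in Python: the server derives a token, client-side JavaScript copies it into a hidden field, and the submit is only accepted if the two match. The secret and the function names are my own assumptions:

```python
import hashlib
import hmac

SECRET = b"keep-this-on-the-server"  # hypothetical application secret


def make_token(form_id):
    """Value that the page's JavaScript writes into the hidden field."""
    return hmac.new(SECRET, form_id.encode(), hashlib.sha256).hexdigest()


def is_valid_submission(form_id, submitted_value):
    """Bots that never executed the JavaScript submit an empty or wrong value."""
    return hmac.compare_digest(make_token(form_id), submitted_value)
```

Using an HMAC instead of a fixed string means the expected value differs per form, so a bot cannot simply replay one harvested value everywhere.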

Both ways have good examples on the subkismet.com site.

If I needed something more intelligent and complex, I would implement a Bayesian spam protection algorithm. Explaining this algorithm is beyond the scope of this article; maybe I shall write an entire article solely on it.
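To give at least a taste of the approach, here is a deliberately tiny naive Bayes sketch in Python: count words in known spam and known ham, then score new text by the log-probability ratio. A real filter would need tokenization, persistence and far more training data:

```python
import math
from collections import Counter


class BayesFilter:
    """Minimal word-count naive Bayes spam filter with Laplace smoothing."""

    def __init__(self):
        self.counts = {"spam": Counter(), "ham": Counter()}
        self.totals = {"spam": 0, "ham": 0}

    def train(self, text, label):
        for word in text.lower().split():
            self.counts[label][word] += 1
            self.totals[label] += 1

    def spam_score(self, text):
        # Sum of per-word log-probability ratios; positive means "more like spam".
        score = 0.0
        for word in text.lower().split():
            p_spam = (self.counts["spam"][word] + 1) / (self.totals["spam"] + 2)
            p_ham = (self.counts["ham"][word] + 1) / (self.totals["ham"] + 2)
            score += math.log(p_spam / p_ham)
        return score


f = BayesFilter()
f.train("buy cheap pills now", "spam")
f.train("great article thanks for sharing", "ham")
```

The strength of this approach is that it learns from the actual spam your site receives, instead of relying on one fixed trick the bots can adapt to.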

There are also spam protection services. One of the most famous is Akismet, which is free for personal sites and cheap for businesses.

4. Conclusion

The most important thing is that we must keep the best interests of our users in mind. They do not care that our portal is at stake, or under constant attack. What they want is to use the services we are providing. And if we think it over a bit, that is also what we really want: if our users are happy, they will come back and use our site again and again. Also, if we need protection, we should try to make the users lose as little time as possible. As we all know, time is money, and on the internet this is a lot more true than in a classical market.