New Attacks on CAPTCHAs

Abstract: We report a novel attack on two CAPTCHAs that have been widely deployed on the Internet, one being Google’s home design and the other acquired by Google (i.e. reCAPTCHA). With a minor change, our attack program also works well on the latest ReCAPTCHA version, which uses a new defence mechanism that was unknown to us when we designed our attack. This suggests that our attack works in a fundamental level. Our attack appears to be applicable to a whole family of text CAPTCHAs that build on top of the popular segmentation-resistant mechanism of “crowding character together” for security. Next, we propose a novel framework that guides the application of our well-tested security engineering methodology for evaluating CAPTCHA robustness, and we propose a new general principle for CAPTCHA design.

Tags: academic papers, captchas, Google

Posted on October 12, 2011 at 6:57 AM • 24 Comments

Comments

Brian Raaen • October 12, 2011 1:44 PM

This is a rather fascinating approach that uses a simple divide and conquer method. Thanks for the great read.

Glenn Fleishman • October 12, 2011 1:45 PM

This explains why ReCAPTCHA has become more difficult to read in the last few weeks.

Nick P • October 12, 2011 1:55 PM

@ Glenn Fleishman

Yes. And what’s worse: ReCAPTCHA is now officially easier for machines to read than humans. Quite the opposite of the intended goal, eh?

aikimark • October 12, 2011 1:56 PM

elegant approach. Thanks for posting, Bruce.

Captain Obvious • October 12, 2011 2:25 PM

@Glenn

It’s been that way for me for years, with google being the worst. When I click the audio to get help, all I hear is aliens talking.

I need a plugin that can prove I’m human.

NobodySpecial • October 12, 2011 2:46 PM

@Captain Obvious – an inverse turing test !

godel • October 12, 2011 3:56 PM

To some extent reCAPTCHA has lost its other original purpose, namely to decode text that then-current OCR programs were incapable of handling.

It sounds as if the new CAPTCHA decoders do better than humans at that task.

RH • October 12, 2011 4:42 PM

What has me interested in CAPTCHAs is that, eventually, they will fail. Eventually someone will understand the human cognition model responsible for the written word. They will then write a system which uses the same model as humans, so anything which prevents it from working will also yield a result that is unreadable for humans.

We will learn a lot about how a “successful” alphabet works, and then get spammed into oblivion.

Dirk Praet • October 12, 2011 5:37 PM

Fascinating read. Hat off for the boffins who worked on this research project. Now move on, Google et al.

kingsnake • October 12, 2011 6:12 PM

I’m no expert, but Bayesian filtering seems to make a lot of sense, at least for email. No idea if that would also work for form spam. (I imagine most blogs would not generate the volume necessary to properly teach the filter.) As far as blogs / forums go, there really is no substitute for an actively engaged editor …

http://www.paulgraham.com/spam.html

christopher • October 12, 2011 6:58 PM

that is one heck of a subtle troll, kingsnake.

Seiran • October 12, 2011 7:05 PM

“Quite the opposite of the intended goal, eh?”

One goal of reCaptcha was to stop spam, but the elegance of reCaptcha was that, if and when it became broken, we would all benefit either way.

If the captcha has not yet been broken, then we have an effective way to prove human authenticity (and stop spam).

If the captcha has been broken, then there should exist an algorithm which can decode pictures of smudged words to a reasonably good guess, and the field of machine vision has therefore evolved to a point that makes scanning books much easier.

When the reCaptcha service is working, it “scans” books. When the reCaptcha service becomes ineffective, take the tool that is cracking the captcha and use that to scan the books.

Now all we need to do is find a problem that shares similar properties. Too bad that computers have gotten surprisingly good at telling cats from dogs.

https://research.microsoft.com/en-us/um/redmond/projects/asirra/
http://crypto.stanford.edu/~pgolle/papers/dogcat.html

I’ve used Asirra on a website before, and while no form-spam robot has ever tried to solve it, humans don’t seem to like using it either.

Winter • October 13, 2011 1:04 AM

@Seiran
CAPTCHAs as crowd-sourcing OCR problems?

That is a brilliant concept.

Natanael L • October 13, 2011 2:04 AM

I’ve seen some sites that use puzzles. They use a wide variety.
One site uses resistors that you read (but that one’s easy, you just match the colors).

Another uses game-ish puzzles.
There HAVE to be some problems that our brains can excel at, but that computers just can’t solve.

As a matter of fact, the game Folding is quite interesting (although I guess the average internet noob won’t beat a computer at that).

Then there’s the unfortunate situation that sites like Amazon Mechanical Turk are abused by spammers to hire people in poor countries for cents per solved CAPTCHA, and there are plenty of sites dedicated to that too.

So then there’s another possible solution: Context-dependent/cultural CAPTCHAs!
https://krebsonsecurity.com/2011/09/cultural-captchas/
Example: “What is the name of Schneier’s next book?”
Another one: “Security by obscurity ____” (isn’t)
Or this one: “What is this CAPTCHA supposed to stop?” (spam 🙂

That might be easy for a low-paid “turk” to find, but if we combine a few of these, then we can be sure a computer can’t get them all correct, while solving them will just cost too much for the spammers.
Another problem could be to figure out enough of them to keep spammers away. Also, we can’t match the answers to a fixed text string.

It might keep some readers from posting, but a heavily spammed comment field is probably a worse deterrent against posting (“will anybody see it or even bother to read it?”).

What other variants do you know of?

Earl Killian • October 13, 2011 2:11 AM

So is someone going to make an easy to use Firefox plug-in of this so I can enter CAPTCHAs correctly without having to go through 3 iterations?

renoX • October 13, 2011 3:52 AM

@Seiran the problem is that to break a Captcha you just need (for example) a 5% success rate, then you try 20 times: eventually the computer will register, but to have a good OCR you need a much higher success rate.
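The arithmetic behind this point can be sketched in a few lines of Python. This is only an illustration, assuming each attempt succeeds independently with the same probability; the function names are made up for this example and do not come from the paper or the comment.

```python
import math


def p_at_least_one(p: float, n: int) -> float:
    """Chance that at least one of n independent attempts succeeds."""
    return 1.0 - (1.0 - p) ** n


def attempts_needed(p: float, target: float) -> int:
    """Smallest number of attempts whose overall success chance reaches target."""
    return math.ceil(math.log(1.0 - target) / math.log(1.0 - p))


if __name__ == "__main__":
    # A 5% solver trying 20 times gets through somewhat more often than not.
    print(round(p_at_least_one(0.05, 20), 4))
    # Attempts needed for a 99% chance of at least one success.
    print(attempts_needed(0.05, 0.99))
```

Under that independence assumption, 20 tries at 5% give roughly a 64% chance of getting through at least once, and about 90 tries give 99%. Retries are cheap for a bot, whereas an OCR engine that misreads 95% of words is of no use for scanning books.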

kingsnake • October 13, 2011 8:00 AM

“that is one heck of a subtle troll, kingsnake.

Posted by: christopher at October 12, 2011 6:58 PM”

I’m curious why you think that. If you think I was referring to this blog, I was not. It is obvious Bruce has an active editor: We’ve all seen him step in when he thought it was necessary. But it is a common misconception among business owners that when they contract to have a site developed for them, that when the site goes live the owner’s work is done. It is not. It requires daily upkeep — especially if a blog/bulletin board/etc. is involved. And no amount of automatic filtering will negate that necessity.

Ron • October 13, 2011 8:31 AM

I’ve started to use context-sensitive questions for a CAPTCHA. For example, on our hackerspace’s site, the CAPTCHA is simply, “Which day of the week is our weekly meeting?” – it’s something that somebody who belongs would know (or be able to find out fairly easily), but one that a machine would totally fail on.

It’s not the solution to every CAPTCHA problem, but it’s working great for us!

karrde • October 13, 2011 9:46 AM

@Natanael L:

I saw a math-teacher’s blog that required solving elementary arithmetic problems.

I’ve seen some blogs which require the user to answer a question, even though the answer is after the word ‘hint’ in the question-text.

Something like: which day of the week is after Tuesday? (Hint: Wednesday).

gingerbreadman • October 13, 2011 11:17 AM

Interesting paper.

My favorite question is from a forum of a Linux distro. This is required to set up an account. 🙂

What is the output of "date -u +%W$(uname)|sha256sum|sed 's/\W//g'"? (Required)
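For readers without a shell at hand, here is a rough Python equivalent of that pipeline. It is a sketch under assumptions: the function name captcha_answer is invented for this example, GNU date and sed behaviour is assumed, and platform.system() is assumed to return the same string as uname (it does on Linux, where both give "Linux").

```python
import hashlib
import platform
import time


def captcha_answer(week: str, kernel: str) -> str:
    # `date -u +%W$(uname)` prints the week number followed by the kernel
    # name; the trailing newline is part of what sha256sum hashes.
    data = f"{week}{kernel}\n".encode()
    # sha256sum prints "<hex>  -"; the sed step strips the non-word
    # characters (spaces and the dash), leaving only the hex digest.
    return hashlib.sha256(data).hexdigest()


if __name__ == "__main__":
    # %W is the week of the year with Monday as the first day, in UTC here.
    week = time.strftime("%W", time.gmtime())
    print(captcha_answer(week, platform.system()))
```

The answer changes every week and depends on the operating system, so a fixed string scraped from the page once would stop working after a few days.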

Jonathan Wilson • October 14, 2011 1:39 AM

Some sites I have seen use the simple math problem (e.g. “what is 2 + 5”). The domain-specific CAPTCHA seems like a good idea for forums with a specialized audience.

Paeniteo • October 14, 2011 8:11 AM

@Jonathan Wilson: “simple math problem”

Yes. Small sites can get away with largely trivial CAPTCHAs.

In fact, I totally stopped spambots in my personal mail contact form by including two radioboxes analogous to these:

(0) I am a spambot. (this was selected by default)
( ) I am a human.
An advanced version then used Javascript to change the value and hide the element. This way, only non-Javascript visitors were bothered with the protection at all.
For fun, I logged all rejected attempts and there were quite some – but no spambot bothered to change the radiobox.
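A server-side check for this kind of default-to-spambot field might look like the sketch below. The field name visitor_type and its two values are hypothetical; the comment does not say how the actual form was implemented.

```python
def looks_human(form: dict) -> bool:
    # The radio button defaults to "spambot"; a person (or the page's own
    # JavaScript) has to switch it to "human" before submitting.
    # A missing field is treated the same as the default.
    return form.get("visitor_type", "spambot") == "human"


def filter_submissions(submissions: list) -> tuple:
    """Split submissions into accepted and rejected lists, keeping rejects for logging."""
    accepted = [s for s in submissions if looks_human(s)]
    rejected = [s for s in submissions if not looks_human(s)]
    return accepted, rejected


if __name__ == "__main__":
    posts = [
        {"visitor_type": "human", "message": "Hello"},
        {"visitor_type": "spambot", "message": "Buy now"},
        {"message": "No radio field at all"},
    ]
    ok, dropped = filter_submissions(posts)
    print(len(ok), len(dropped))
```

This only stops bots that submit the form without changing its defaults, which is the point made above: it works as long as nobody bothers to target the site specifically.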

So, to sum up, it’s always the question whether a site is valuable enough to warrant even the tiniest bit of attention from the spambot programmer. 😉

David Harmon • October 19, 2011 4:06 PM

“Yes. Small sites can get away with largely trivial CAPTCHAs.”

Not anymore they can’t. I’ve seen at least two sites which do use reCAPTCHAs overrun. One has enough active users to flag and nuke them. One… doesn’t.

Malcolm • October 24, 2011 8:34 AM

Interesting paper that utilizes fundamental but simple ideas to break the text into characters.

Most likely ReCAPTCHA will simply increase its difficulty as a countermeasure … usability will become a major issue with CAPTCHAs.
