Breaking Semantic Image CAPTCHAs

Interesting research: Suphannee Sivakorn, Iasonas Polakis and Angelos D. Keromytis, “I Am Robot: (Deep) Learning to Break Semantic Image CAPTCHAs”:

Abstract: Since their inception, captchas have been widely used for preventing fraudsters from performing illicit actions. Nevertheless, economic incentives have resulted in an arms race, where fraudsters develop automated solvers and, in turn, captcha services tweak their design to break the solvers. Recent work, however, presented a generic attack that can be applied to any text-based captcha scheme. Fittingly, Google recently unveiled the latest version of reCaptcha. The goal of their new system is twofold; to minimize the effort for legitimate users, while requiring tasks that are more challenging to computers than text recognition. ReCaptcha is driven by an “advanced risk analysis system” that evaluates requests and selects the difficulty of the captcha that will be returned. Users may be required to click in a checkbox, or solve a challenge by identifying images with similar content.

In this paper, we conduct a comprehensive study of reCaptcha, and explore how the risk analysis process is influenced by each aspect of the request. Through extensive experimentation, we identify flaws that allow adversaries to effortlessly influence the risk analysis, bypass restrictions, and deploy large-scale attacks. Subsequently, we design a novel low-cost attack that leverages deep learning technologies for the semantic annotation of images. Our system is extremely effective, automatically solving 70.78% of the image reCaptcha challenges, while requiring only 19 seconds per challenge. We also apply our attack to the Facebook image captcha and achieve an accuracy of 83.5%. Based on our experimental findings, we propose a series of safeguards and modifications for impacting the scalability and accuracy of our attacks. Overall, while our study focuses on reCaptcha, our findings have wide implications; as the semantic information conveyed via images is increasingly within the realm of automated reasoning, the future of captchas relies on the exploration of novel directions.
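To put the abstract's two headline numbers in perspective, here is a rough back-of-the-envelope calculation. It assumes a single solver working through challenges one after another with no other delays or rate limits; that assumption is mine, not the paper's, and the function name is made up for illustration.

```python
# Figures quoted in the abstract.
SECONDS_PER_CHALLENGE = 19
SUCCESS_RATE = 0.7078  # 70.78% of image reCaptcha challenges solved


def solved_per_hour(seconds_per_challenge: float, success_rate: float) -> float:
    """Expected solved challenges per hour for one sequential solver.

    Assumption (not from the paper): challenges are attempted back to back,
    with no waiting, throttling, or blocking in between.
    """
    attempts_per_hour = 3600 / seconds_per_challenge
    return attempts_per_hour * success_rate


if __name__ == "__main__":
    rate = solved_per_hour(SECONDS_PER_CHALLENGE, SUCCESS_RATE)
    print(f"about {rate:.0f} solved challenges per hour per solver")
```

Under those assumptions a single solver clears on the order of 130 challenges an hour; running several in parallel would scale that roughly linearly, which is why the paper's proposed safeguards target scalability as well as accuracy.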

News articles.

Tags: academic papers, captchas, risk assessment

Posted on April 8, 2016 at 6:39 AM • 15 Comments

Comments

Trebla • April 8, 2016 8:18 AM

Great, I could use that. It is nearly impossible even for me (that is almost human) to get those image captchas right.

Dr. I. Needtob Athe • April 8, 2016 9:05 AM

If AI technology advances enough then maybe we’ll reach the point where trying to differentiate between a human and a robot is considered discrimination against robots.

Andrew • April 8, 2016 9:42 AM

Interesting, still more accurate to embed that captcha in a porn site page to unlock a higher level. Those visitors are really good at solving them!

David Leppik • April 8, 2016 9:57 AM

Good thing Bruce isn’t using reCaptcha. The bots will never figure out his security questions!

Jacob • April 8, 2016 10:02 AM

Just yesterday I gave up on entering a site with this image matching reCAPTCHA: On the first challenge, I was asked to select all pictures with a river in them. They were a bit dark, some showing a lake(?) and I was wrong in my set construction. I asked for 5 more challenges – all had similar issues. I just gave up. Maybe I am a robot, or more probably Google developers are very bright individuals with very bright screens.

Andrew • April 8, 2016 10:12 AM

@Jacob
You need to wait for a while, both before checking pictures and after.

ReCAPTCHA My Tor Signature • April 8, 2016 12:12 PM

With Google’s No CAPTCHA reCAPTCHA taking over the market, you should suspect this is really a profiling exercise, and not just about defeating spambots.

Sure enough, when you click that ‘you are not a robot’ button, what actually happens?

http://www.businessinsider.com/google-no-captcha-adtruth-privacy-research-2015-2

The No CAPTCHA reCAPTCHA then drops its own cookie from Google into your browser. It then takes a pixel-by-pixel fingerprint of the user’s browser window at that time, pulling information such as:

– Screen size and resolution, date, language, browser plug-ins, and all Javascript objects
– IP address
– CSS information from the page you are on
– A count of mouse and touch events

In addition, Google’s new CAPTCHA will also make use of any cookies that have been set by other Google properties — like Gmail, Search, Analytics, and so on — in the last six months. The belief is that humans use Google’s services in certain “human” ways, whereas bots do not, and those patterns can be detected.

All of this personally identifiable information gets encrypted and sent back to Google.

Perona told us: “The use of Google.com’s domain for the CAPTCHA is completely intentional, as that means Google can drop long-lived cookies in any device that comes into contact with the CAPTCHA, bypassing third-party cookie restrictions [like ad blockers] as long as the device has previously used any service hosted on Google.com.”

He added: “The mix of a fingerprint and first-party cookies is pervasive as Google can give a very high level of entropy when it comes to distinguishing an individual person.”

The way the new CAPTCHA works also seems to support this theory, as there appears to be at least three main CAPTCHA types, according to AdTruth’s research:

– If Google cookies are present, and your fingerprint is obtained, you will often see the checkbox that asks you to prove whether you are a human.
– If you delete all your Google cookies, the CAPTCHA will likely ask you to fill in a two-word CAPTCHA.
– If you are using a form of anti-fingerprinting plugin, Google will likely ask you to fill in a two-word CAPTCHA, regardless of your cookies.
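
The three cases above can be sketched as a small decision function. This only encodes the behaviour AdTruth reports observing ("often", "likely"), not Google's actual risk analysis, which is not public; the class, field, and function names are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class Request:
    """Hypothetical summary of what the CAPTCHA service can see."""
    has_google_cookies: bool
    fingerprint_obtained: bool  # False if an anti-fingerprinting plugin blocks it


def likely_challenge(req: Request) -> str:
    """Return the challenge type the article says a visitor will likely see."""
    if not req.fingerprint_obtained:
        # Anti-fingerprinting plugin: two-word CAPTCHA regardless of cookies.
        return "two-word"
    if not req.has_google_cookies:
        # Cookies deleted: two-word CAPTCHA.
        return "two-word"
    # Cookies present and fingerprint obtained: the simple checkbox.
    return "checkbox"


if __name__ == "__main__":
    print(likely_challenge(Request(has_google_cookies=True, fingerprint_obtained=True)))
    print(likely_challenge(Request(has_google_cookies=False, fingerprint_obtained=True)))
    print(likely_challenge(Request(has_google_cookies=True, fingerprint_obtained=False)))
```

The point of the sketch is that the easy path is only reachable when both identifying signals are available, which is what the privacy concern rests on.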

The implication is that Google isn’t just looking to identify whether you’re a human with its No CAPTCHA, but potentially exactly which human you are. The combination of first-party cookies and a browser fingerprint can be tied back to an individual — and most individuals simply clicking “I’m not a robot” won’t know this is happening behind the scenes.

Well, I’m shocked….. shocked that an upstanding American PRISM partner with a history of aggressive surveillance would screw people in the ass while simply trying to navigate the net. Truly unthinkable. /sarc

Michael Sierchio • April 8, 2016 2:27 PM

“Google can give a very high level of entropy” is meaningless drivel (from the Business Insider article).

Mark • April 8, 2016 3:49 PM

I don’t suppose they plan to make their CAPTCHA-solver available to the general public, so that us mere mortals can access protected content?

Do No Evil • April 8, 2016 5:14 PM

I have blocked the .google. domains as I’m sick of their corporate Spying-as-a-Service (SaaS).

Since about the time they started working with the NSA, they stopped using their slogan “Do No Evil”. I thought for years that was suspect, then after Snowden it became clear.

Do As Much Evil As Possible • April 8, 2016 8:45 PM

Am I to be surprised…

Don • April 9, 2016 12:50 AM

Now that you mention it, they did change from google to alphabet as well, better to fit in with the alphabet agencies perhaps?

Gavin B • April 9, 2016 10:41 AM

Perhaps it is time to reverse the Captcha logic as suggested here:
Practical application of visual illusions: errare humanum est

ACM article (mirror).

Abstract:

As a failing peculiar to animate visual systems, visual illusions might be used to distinguish humans from “computer bots”, or any other artificial intelligence empowered with a visual capacity. Any such entity is unlikely to suffer the same illusions as our own, unless, of course, it has been specifically engineered to do so. This approach inverts, and complements, the logic of the Turing test: not requiring evidence of an intelligent capacity equivalent to that of human beings, but rather that of a characteristic human failing.

Gert-Jan • April 9, 2016 11:02 AM

This is a problem that is a long time coming. A captcha has always been a band aid.

I would like to have legislation that requires every AI, bot or what have you, to answer the question “are you human” with “no”.

This could go a long way in fighting spam.

In the long run, I don’t think there is an alternative, because soon machines will outsmart most humans.

TRX • April 10, 2016 10:05 AM

I hope the developers make the captcha-cracker open source. I could really use one.

Some of the images are so “noisy” that they take me more than one attempt to guess their text. And since I’m color blind, the colored images stop me cold.
