Schneier on Security
A blog covering security and security technology.
March 15, 2010
"Measuring the Perpetrators and Funders of Typosquatting," by Tyler Moore and Benjamin Edelman:
Abstract. We describe a method for identifying "typosquatting", the intentional registration of misspellings of popular website addresses. We estimate that at least 938 000 typosquatting domains target the top 3 264 .com sites, and we crawl more than 285 000 of these domains to analyze their revenue sources. We find that 80% are supported by pay-per-click ads often advertising the correctly spelled domain and its competitors. Another 20% include static redirection to other sites. We present an automated technique that uncovered 75 otherwise legitimate websites which benefited from direct links from thousands of misspellings of competing websites. Using regression analysis, we find that websites in categories with higher pay-per-click ad prices face more typosquatting registrations, indicating that ad platforms such as Google AdWords exacerbate typosquatting. However, our investigations also confirm the feasibility of significantly reducing typosquatting. We find that typosquatting is highly concentrated: Of typo domains showing Google ads, 63% use one of five advertising IDs, and some large name servers host typosquatting domains as much as four times as often as the web as a whole.
The paper appeared at the Financial Cryptography conference this year.
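The abstract does not say how a "misspelling" is decided. One common way to frame it, assumed here purely for illustration and not taken from the paper, is edit distance in which swapping two adjacent characters counts as a single edit. The function names and the sample labels below are hypothetical.

```python
def osa_distance(a: str, b: str) -> int:
    """Optimal string alignment distance: insertions, deletions,
    substitutions, and swaps of two adjacent characters each cost 1."""
    rows, cols = len(a) + 1, len(b) + 1
    d = [[0] * cols for _ in range(rows)]
    for i in range(rows):
        d[i][0] = i  # delete all i characters of a
    for j in range(cols):
        d[0][j] = j  # insert all j characters of b
    for i in range(1, rows):
        for j in range(1, cols):
            cost = 0 if a[i - 1] == b[j - 1] else 1
            d[i][j] = min(
                d[i - 1][j] + 1,         # deletion
                d[i][j - 1] + 1,         # insertion
                d[i - 1][j - 1] + cost,  # substitution or match
            )
            if i > 1 and j > 1 and a[i - 1] == b[j - 2] and a[i - 2] == b[j - 1]:
                # adjacent transposition, e.g. "ma" typed as "am"
                d[i][j] = min(d[i][j], d[i - 2][j - 2] + 1)
    return d[-1][-1]


def looks_like_typo(candidate: str, popular: str) -> bool:
    """True if candidate is exactly one edit away from popular."""
    return osa_distance(candidate.lower(), popular.lower()) == 1
```

Under this definition, looks_like_typo("exmaple", "example") is true because the only difference is one adjacent swap. A real study would also need a list of popular sites and a way to tell deliberate registrations from coincidentally similar names, which this sketch does not attempt.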
Posted on March 15, 2010 at 6:13 AM
Do no evil? It seems that such revenue could easily be blocked by the ad provider.
Why is this considered "illegal"?
I know typosquatters are very, very, often involved in illegal activities. But to what extent is typosquatting, or should it be, illegal?
Does it break trademark law?
Can someone please explain to me what is wrong with typosquatting? I honestly don't get it, especially when the paragraph states that most of the typosquatters are linking to the domain the user wanted anyway.
In other news, land prices near Wall Street are higher than those in rural Kansas, and businesses located next to Wall Street make more money than those in Kansas.
It seems fraudulent to me, especially in cases where the false site does not link to the main site. The typosquatters are preying on those who mistype the URLs, and make money off their mistakes.
I recall an issue several years ago between two major phone providers--one offering 1-800-OPERATOR, and the other using 1-800-0PERAT0R without ever making it clear that they were not the other service. (There was a lawsuit, and the copycat lost.)
@posted asking "what's wrong with typosquatting"
Some of it is addressed in the Anticybersquatting Consumer Protection Act (ACPA), 15 U.S.C. § 1125(d). This is a 1999 law, so the problem is not new.
In a nutshell, it provides some protection against misleading information. Based on my understanding, it would be legal for me to create a site called "www.schneir.com" (a common misspelling of the name of this site) and run information refuting this site, as long as I didn't lead someone to believe this was actually Bruce Schneier's site. An example of this is the www.falwell.com vs. www.fallwell.com case.
What cannot be done is to make a common typo and lead the person to believe they are at another site for gain. I cannot sell my own books by leading people to believe they are buying them from schneier.com, or mimicking a site to get people to submit information.
Imagine someone setting up a booth at a mall called SSA - the Social Security Assistants. This leads people to give them private information, thinking it is the Social Security Administration (SSA). This would be illegal for obvious reasons, so that was extended to the Internet world.
Basically, it boils down to intent, gain, and disclosure (you can't pretend to be someone else when a hapless visitor stumbles in).
I wouldn't be surprised if it was determined to be in violation of trademark law; if a URL is trademarked (or represents a trademarked word or phrase), then a slightly-altered URL, intended to confuse/replace/supplant the official URL, would be Substantively Similar and Intended to Defraud.
I didn't see any mention in the abstract of the word illegal
"I didn't see any mention in the abstract of the word illegal"
The wording of the abstract does indeed refrain from calling it "illegal". But the whole air is that of criminal doings.
A quote like "...otherwise legitimate websites..." does lead us to the conclusion that the authors look at "typosquatting" as not legitimate.
My question remains: Why?
I could catch misspellings and redirect them to the correct site. This service could be remunerated by serving ads. So both the original misspellers and the target site would benefit.
I am a bit "meh" about this.
As I see it there are two types of typosquatting.
1 - the type mentioned by HJohn where a person registers a typo-domain and then either passes themselves off as the original or tries to discredit the original in the eyes of unsuspecting visitors.
This is wrong and should, IMHO, be addressed.
2 - the type sort of alluded to in the abstract where someone registers a typo-domain, fills it full of adwords and hopes to get some profit from the clicks of the unwary.
Personally, I see nothing wrong with this and if (for example) Google are upset that their adwords are being shown on www.example.com then they can take action themselves.
"I see nothing wrong with this and if (for example) Google are upset that their adwords are being shown on www.example.com then they can take action themselves."
I think there's significant precedent in copyright/trademark law that already provides grounds for prosecution, once such sites are found, if the legitimate-site-holder chooses to prosecute. Were this not the case, every injured party with grounds for complaint (on copyright/trademark grounds) would have to set precedent every time, and that's not the case.
In the UK, advertisers have made capital out of confusion.
ComparetheMarket has its own spoof website called "ComparetheMeerkat.com" as well as its own.
Since I have my homepage set to Google, I just type in whatever site I want to go to in Google's search bar and it presents me with a list of possibilities.
If I want to save the site location, I bookmark it.
I haven't hit a typo-squatting site since I started that.
Ah, but it is a big world. What is trademarked in the USA isn't in many other countries so pick a registrar and hosting provider in a country which doesn't recognise the trademark you are interested in and nobody can do anything about it. Contrary to common mis-belief US law doesn't rule everywhere.
There is a similar situation with copyright law. In much of the Middle East there is no enforcement of copyright law in respect to music and video. How much notice do you think the authorities will take of complex arguments relating to misspellings of common words and names?
@Andy Fletcher: "Contrary to common mis-belief US law doesn't rule everywhere."
US citizens don't believe US law rules everywhere. Cheap shots like that get tiresome.
Cheap shot, hardly. Just look at some of the recent attempts at imposing US law on non-US entities in regard to domains and the like. I'd like ICANN, IANA and the like to come under the ITU-T. That would add a treacle factor to everything but at least it would be justifiably International.
With respect to questions of US law, IIRC Ben Edelman is a Harvard-trained lawyer. Something tells me that the words used in the paper were carefully (and properly) chosen. Of course, IANAL.
I'd argue this is wrong for two reasons: 1) typosquatting is often used in combination with phishing. 2) even in cases where it only redirects to the intended website, it fails to properly punish bad spelling with an NXDOMAIN. ;)
Now should it be actually illegal? I'm inclined to say no, but that is probably some of my personal bias against the formation of new laws creeping in.
Does anyone know how often typosquatting is used to execute man-in-the middle attacks or phishing scams?
My ISP, Comcast, does automated typosquatting with their new "DomainHelper" service. Yes, they actually call it that. When you enter a domain that doesn't exist, you are redirected to their ad page. They have a cumbersome way to opt-out which doesn't always work. I eventually had to call customer service to complete the opt-out process. Now when I type a domain that doesn't exist, I get a proper error message from my browser.
As a terrible typist I frequently visit microsfot or ggogle. I find it funny that it happens often enough that someone registered the domain.
I'm not sure that's technically typosquatting, since instead of actually registering misspellings of domain names, they are breaking RFC2308, a far worse offense in my opinion.
Curiously enough, comcast does not do that in my area, though it seems Verizon dsl has begun to do so.
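For anyone who wants to check whether their own resolver answers for names that should not exist, here is a minimal sketch. It assumes that a random label under ".invalid" (a top-level domain reserved so that nothing real lives under it) should always fail to resolve. The function names are made up for this example, and a resolver that special-cases ".invalid" would slip past the check.

```python
import socket
import uuid


def random_probe_name(tld: str = "invalid") -> str:
    """Build a hostname that should not exist anywhere."""
    return f"{uuid.uuid4().hex}.{tld}"


def resolver_rewrites_nxdomain(resolve=socket.gethostbyname, probes: int = 3) -> bool:
    """Return True if any name that should not exist resolves to an address.

    `resolve` is injectable so the logic can be exercised without a network.
    """
    for _ in range(probes):
        try:
            resolve(random_probe_name())
        except OSError:
            continue  # lookup failed, which is the correct behavior
        return True   # got an address back for a nonexistent name
    return False
```

Calling resolver_rewrites_nxdomain() with no arguments uses the system resolver. A True result suggests something between you and the authoritative servers is substituting its own answer.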
Let me preface this with the IANAL* disclaimer.
I worked for a travel agency in a previous life. We had a web site called (for example) www.travelagency.com
We learned that there was a site called www.tarvelagency.com that sold links to a bunch of travel services (none of them ours). When we contacted them, they offered to sell it to us for a large fee.
We also had a case where a company exploited a common typo of our name to serve pornography. That doesn't take business away, but it does hurt goodwill and reputation because our clients ended up there looking to do business with us and got a porno site instead.
Are these examples illegal? No*. Are they scummy exploitations of honest mistakes? Yes.
What if every wrong number you dialed cost you $50.00? What if every person who called your number in error cost you $50.00?
I'm sure some folks out there will say "It does now" but the point is that typosquatters are profiting off of an honest mistake. Not illegal, but I think most would agree that it's not exactly honest either.
Google and setting favorites work for tech savvy people who can recognize when they get a phoney, and for established visitors. But you advertise a new website and within hours a bunch of typosquatters have registered similar names. That certainly dilutes the power of your advertising.
*I am not a lawyer
In re: DomainHelper
I use Comcast at home, but did manage to opt out of DomainHelper. Can't help but be amused, though. According to most historical accounts, Almon P. Strowger invented what became the dial telephone because he was annoyed that the operator in his town, who was married to one of his competitors, would connect his callers to her husband's business. By replacing the corruptible human with an "incorruptible" machine, he hoped to end such chicanery. Fast forward 100+ years and lo, machines are not so incorruptible after all.
Why not buy out the typosquatters if they are causing problems for the business? Why not preemptively register the typosquatting names when putting the business online?
It's like starting a garden and not putting a fence around it to keep out the critters, and then getting upset that some crops are being eaten. Do you ask the police to try and trap the critters, or do you just put up a fence?
It's unlikely that anything will be done about this, since the issue has been around for over a decade. But it does show that IT professionals need to lead their clients into registering permutations of their name to avoid such issues and thereby remove the opportunity from the table. It's for this reason that I have all of my nonprofit clients also register the .com version of their domain name. The small cost of additional names and parking is trivial versus the possible damage to reputation and lost business relationships.
1) Typosquatters will often ask for absurd amounts of money in exchange for the domains. Money that a business may not be able to afford.
2) You can't possibly register every possible typo people will make, or even most of them.
(also yes, most people call animal control when they have an animal problem. Just like you can't expect every homeowner to know how to handle wild animals, you can't expect every website owner to be savvy to this sort of thing)
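To put a rough number on point 2, here is a back-of-the-envelope count of how many names sit one keystroke error away from a given name. It is a sketch under stated assumptions: a 37-character alphabet (a-z, 0-9, hyphen), only single-edit typos, and only one top-level domain. It is an upper bound, since some edits produce the same string or an invalid label. The function name is hypothetical.

```python
def max_single_edit_typos(length: int, alphabet_size: int = 37) -> int:
    """Upper bound on strings one edit away from a label of the given length."""
    deletions = length                             # drop any one character
    substitutions = length * (alphabet_size - 1)   # replace one character
    insertions = (length + 1) * alphabet_size      # add one character anywhere
    transpositions = max(length - 1, 0)            # swap two adjacent characters
    return deletions + substitutions + insertions + transpositions
```

For a seven-character name this gives 561 candidates, before counting two-error typos or other top-level domains. At even a few dollars per name per year, that adds up quickly for a small business.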
(still very "meh" about it but like a good debate :-) )
@Scott K - I totally agree where it is an infringement of copyright but surely holding the copyright / trademark on "Example.com" shouldn't extend to include rights over "Exmaple.com" otherwise it feels like a very slippery slope indeed.
@First Timer - here I sort of agree. If your travel agency is Example.com and Exmaple.com is set up to get typo-traffic it can be annoying but should it be something that an external agency deals with?
The costs you mention don't really apply here. It costs the owners of Example.com nothing (*) each time someone mistakenly types in Exmaple.com. It costs the visitor nothing (unless they decide to hand over CC details for the porn or whatever) to visit the wrong site then either give up or search again.
Typosquatters may well be profiting off an honest mistake but that is not really worse than when I do some research and hit a site that is irrelevant (for whatever reason).
Typosquatting only works when people type in URLs so for most users who seem to use Google to visit Google this shouldn't be as much of a problem.
(*) Yes, I can see that there is a risk of lost business but I have no idea how to quantify this without simply making numbers up.
Google AdSense has a specific policy against domain parkers, and a specific service FOR domain parking:
"Domains submitted for the AdSense for domains program may not violate any trademark (and related rights), copyright, trade secret, patent or other intellectual property right of any third party."
"Publishers may not deceptively drive traffic to pages participating in AdSense for domains. "
Of course if Google popped them offline, they'd move to another advertiser.
I take a more middle of the road approach on something like this.
If someone is leveraging a misspelling to get traffic, but does not misrepresent themselves or their intent, then I'm not for outlawing it. I personally think it is unethical at times, but I don't believe every practice I disagree with should be outlawed. It is a free country.
However, if someone leverages a common misspelling in order to dupe people who make mistakes into thinking they are something they are not, that is an entirely different story. It's one thing for me to make a site called schneir.com to get traffic but to make it clear I am not schneier, it is quite another to make it look like it is his site in order to peddle my opinions or sell my books by using his reputation. That, I believe, should be illegal.
Take the fallwell.com case I mentioned above, which was litigated. I, personally, think it was inappropriate of the site owner. However, I would not advocate outlawing it because the site holder simply presented an alternate view to traffic without misrepresenting the site (or peddling his positions using Falwell's reputation).
No direct costs true. And quantifying it is a messy subject, but there is no doubt that typosquatters diluted the big budget web site launch.
Where we did have a direct cost was our clients went to the porno site. We got enough phone calls to make a significant bump in our monthly call stats. And I read somewhere that for every customer that complains, ten don't. That's lost good will, extra costs to deal with increased call volumes, and customers that got turned off from the experience and stopped dealing with us thinking it was our fault.
The first case (typosquatting for a similar service) is hard to quantify. But the second case, we could quantify the costs, though that was only a portion.
I don't know that an external agency is the place to resolve this, or if this is even resolvable. But it is an issue, and maybe that is just the cost of doing business. Put grain in a bin and the mice and rats will come.
Does anyone remember in 2003 when Verisign redirected all unregistered URLs to a page they set up? It caused an uproar in technical circles. That was, in essence, typosquatting en masse. The difference here is, I think, merely a matter of scale.
@ First Timer
Totally agree. There are a raft of unquantifiable issues that a typosquatter can cause a business, and certainly hitting a porn site rather than (in this example) a travel agent is likely to make you go elsewhere.
I think in this instance, I fall on the side of it being an unresolvable cost of doing business. Unless, of course, the world manages to create some internationally enforceable law that anyone who registers "example.com" also has the implied rights to similar domain names they haven't registered.
As I find that unlikely, I think just accepting it is the better option.
For me the risk (and cost) analysis would be based on how many people will get to your domain via typing in the URL - is there a lot of non-internet advertising for example.
From previous posts here (and elsewhere) it seems that more and more people get to a website via Google (et al). While this strikes me as a bit crazy, it massively downgrades the risk posed by typosquatters.
From a business protection point, I think that there are greater risks - such as a site keyword stuffing so it draws search engine traffic away from you.
The whole thing changes totally when the site copies trademarked / copyrighted material or purports to be the original to misdirect / gain sales.
A simple URL typo (especially in a different market segment - eg porn vs travel agency.... although I can see an overlap...) seems almost trivial by comparison.
One sort of typosquat attack I'm certain has to have been done at times would be to take a common typo of a financial institution, such as a bank or credit card, run some kind of malware (such as a keystroke logger) then automatically forward them to the intended site.
For example, someone types www.samlpebank.com, it runs its magic and forwards them to www.samplebank.com. Then, they proceed to log in to the legitimate site... meanwhile, the user name and password are sent to the fraudsters.
Could also be done for email, PII, a business, anything.
Of course, one would hope their window would be short lived, but that's the pesky little thing about credit card numbers, log in information, and PII... once it is disclosed, it is impossible to undisclose.
There's a lot of issues here and to answer them all fully would take quite an answer.
There's a big difference between typosquatting and using similar names. There's a long history of brand names that sound like generic words. If I had a range of animal toys and registered Zooz as a trademark for my toys, very few people would froth at the mouth claiming that I was typosquatting the English word "Zoos". What makes my blood boil is when the owner of Zooz turns around and claims that they now have a right to Zoos.com but that's another story.
For its revenue stream typosquatting relies on the surfer attempting to get to a known website and mistyping the name. What they receive when they do this varies, but is typically one of an advertising only site showing ads related to the real site, a bogus clone of the real site diverting sales or a copy (including redirects) of the site of a competitor. The typosquat typically gives no real additional value to the web and the intent of the squat is purely to profit from the brand name the user intended to type. This is different from two non-competing businesses that just happen to have similar names. In my experience the behaviour of typosquatters means that there are very few ambiguous cases and where there are traditional trademark laws apply and the trademark lawyers can argue it out.
In my opinion any campaign against typosquatting needs to attack the revenue streams, both in terms of advertising and traffic. There's already plenty of ways to shut down trademark infringements and (again this needs a whole article of its own) any attempt to make it easier for "legitimate owners" to "recover" typosquats is going to be misused by making it easier for large companies to attack small companies with similar but legitimate names. Sorting out these issues is one for the civil courts who have a long history of balancing competing claims, not for little ad-hoc tribunals.
Typosquatting should be purely a trademark issue, related to diverting revenue from the targeted site. If typosquats are used to spread malware (I've yet to personally encounter one that does), the problem is really with the browser and the user just as with any other site set up to spread malware; the malware distributors are just using the name as one more attack vector among many to get the surfer onto their site. I can't see that forcibly closing typosquat sites will help much, the answer is to cut off the traffic and make them uneconomic to maintain.
@Rob: there's an easy way to get out of stupid ISP "helper" DNS ad servers - get a free OpenDNS account and set that up on your router as your DNS.
I had so many DNS problems when I was using OpenDNS (for my own network, not my employer's) that I switched to Google DNS and all of my DNS problems vanished.
I've wondered for a while now whether all of these domain name squatting problems simply mean that we're abusing DNS to serve as an end-user site access mechanism—i.e., the fact that we use DNS hostnames as the short names that users type into address bars to reach websites, and that businesses therefore covet and advertise.
Perhaps a technical solution is to move away from the hostname-as-access-keyword model to another where we simply use search engines or online directories as the access mechanism. This would be a bit like the old "AOL keywords" mechanism, except that we would have many such keyword directories freely competing with each other, and users would be able to use any one they wanted at any time. When you look at it from the perspective of this proposal, the problem with DNS is that it forces the respectable keywords sellers to honor the keywords sold by their less respectable colleagues.
Note that this is not a proposal to get rid of DNS—DNS is still a useful mechanism that isolates applications from IP addresses, and therefore allows for things like changing the IP address of a site without breaking hyperlinks to it. But hostnames would become a relatively low-level detail of internet addressing, not the primary content keyword mechanism.
This isn't really the place for me to get into these sorts of rants, but it bugs me to no end when people suggest using OpenDNS. OpenDNS messes with NXDOMAIN too, unless you register which is needlessly complicated. There are plenty of free and standards conforming DNS servers out there, sometimes I feel the only reason so many people use OpenDNS is because they associate the "Open" part of its name with "good".
Typosquatting also shades over into phishing attacks when the name of the phishing site contains the name of the legitimate one, or is off by an easily-mistaken letter. But the mechanism for getting to phishing sites is generally clicking on a link rather than typing in a browser. (I'm trying to think of the last time I typed a full url into a browser. Everything goes by links or autocompletion...)
I published a tool that does a lot of this stuff years ago.
urlcrazy analyzes domain name typos. It generates typos, checks whether they are in use and displays popularity. It's written in Ruby for Linux.
You can download it from http://code.google.com/p/urlcrazy/
It's interesting to run it on the list of top sites from Alexa. You can collect a lot of the same data as these guys did by using whatweb, i.e. redirects, google-analytics accounts, etc.
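For readers who have not seen this kind of tool, the typo-generation step can be sketched in a few lines. This is not urlcrazy's actual algorithm, just an illustration covering three common slips (a dropped character, a doubled character, and two adjacent characters swapped). The function name is hypothetical, and checking whether each variant is registered is left out because it needs network access.

```python
def typo_variants(label: str) -> set[str]:
    """Generate simple one-slip misspellings of a domain label."""
    variants = set()
    for i in range(len(label)):
        # dropped character: "google" -> "gogle"
        variants.add(label[:i] + label[i + 1:])
        # doubled character: "google" -> "ggoogle"
        variants.add(label[:i] + label[i] + label[i:])
        if i + 1 < len(label):
            # adjacent swap: "microsoft" -> "microsfot"
            variants.add(label[:i] + label[i + 1] + label[i] + label[i + 2:])
    variants.discard(label)  # swapping identical neighbors changes nothing
    variants.discard("")     # a one-character label minus its character
    return variants
```

Running typo_variants("microsoft") includes "microsfot", one of the slips mentioned earlier in this thread. Substitutions based on keyboard layout are not covered and would need a neighbor map.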
This is yet another example of the free rider problem with regards to the internet. These web domains are not like real estate, which has an intrinsic value because of real world location, but are only valuable because the people who registered the useful websites spent time/money/energy promoting their site in a way that makes people want to visit it.
The typosquatters are free loading off of this effort, just like the spammers are freeloading off of the people who build servers to handle legitimate traffic. This may not be illegal, but no one can argue that it is ethical.
The real problem is that web advertisers (including Google) haven't become sophisticated enough to stop people from employing this kind of chicanery.
Here is a related paper discussing combined typosquatting and phishing:
The paper conducts a comprehensive study of SSL certificates for legitimate popular domains, as opposed to those used for typosquatting and phishing. Drawing from extensive measurements, it builds a classifier that detects malicious domains with high accuracy (validated through experiments).
The approach is orthogonal to existing mitigation techniques and can be integrated with other available solutions... in other words, it shows that, besides its intended benefits of confidentiality and authenticity, the use of HTTPS can help mitigate web-fraud.
I'm in a curious situation where another business picked a business name just one character away from mine, and spelled in an unusual fashion where his customers are constantly emailing me instead. My site is Quirkz, his site is Qirkz, and apparently the average English-speaker is incapable of writing Q without putting a U right after it. I'm constantly getting email asking for reservations, tickets, directions, etc. Essentially, he picked a name that guarantees I'm typosquatting him in reverse.
As he's a music club in Australia, and I'm Stateside and not connected at all to music, there's no business overlap, so the resulting stream of email is really obnoxious. I have little to gain from getting the misdirected email -- sometimes I try to give a little company info in my replies, but his customers mostly don't want what I have to offer.
Based on the frequency of the messages I get for one little music club, I'm going to guess that the number of people who typo a site name is actually pretty significant, especially for major sites. You'd think between bookmarks, browser history, and search engines it'd be easy to land in the right place, but an astounding number of people apparently type (and mistype) URLs (or in my case, email addresses) from memory.
In my case, I'm a pretty nice guy, so I tend to reply with a correction and point people in the right direction. If I wanted to be a jerk, in this case it'd be pretty easy to start trashing another business' reputation, either by passively ignoring requests, or by actively being inappropriate in my replies.
@Quirkz: "In my case, I'm a pretty nice guy, so I tend to reply with a correction and point people in the right direction."
Must be frustrating, especially since no ill-intent appears to be present.
There are a couple things that may make things easier for you:
1) You may consider an autoreply. Something like "Thank you for contacting Quirkz. If you are looking for the Australian music club Qirkz, it is at a similar site www.qirkz.com."
2) You may also be able to automatically delete any emails that come from Australian domains. They get the auto reply directing them to the correct site, you don't get the email.
Maybe these won't work for you, just some thoughts.
@Quirkz and at HJohn at March 23, 2010 12:04 PM
I meant to add also "Your message will be replied to in the order received."
@ Joel "The real problem is that web advertisers (including Google) haven't become sophisticated enough to stop people from employing this kind of chicanery."
Joel, you don't grow a company into a multi-billion-dollar enterprise in a few short years without some sophistication!
The truth is that Google IS PART OF THE PROBLEM here. As the largest seller of advertising on the Internet and as one of the few companies responsible for stocking parked typo pages with ads, they are in fact responsible and are profiting handsomely from the act of typosquatting. If Google were to filter typo traffic from their network, their revenues and thus stock price would suffer, and as a result they have moved very, very slowly to address this issue. I believe they may get there, but they have to move slowly so as not to shock shareholder value. Remember, the ads you see on typo pages are the same ads that appear on Google and Yahoo search results under their "sponsored listings". Advertisers are the ones lacking sophistication. They are for the most part unaware of just where their ads are located. Most believe their ads are triggered by search engine queries, but Google and Yahoo have found new avenues to pump clicks to their advertisers by placing their ads elsewhere, like on typo pages; see www.temberland.com for example. PS
I own temberland so please click on all the links and click often, me and google need some new shoes