Unicode URL Hack

A long time ago I wrote about the security risks of Unicode. This is an example of the problem.

Here’s a demo: it’s a Web page that appears to be www.paypal.com but is not PayPal. Everything from the address bar to the hover-over status on the link says www.paypal.com.

It works by substituting a Unicode character for the second “a” in PayPal. That Unicode character happens to look like an English “a,” but it’s not an “a.” The attack works even under SSL.

Here’s the source code of the link: http://www.p&#1072;ypal.com/
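The `&#1072;` in that link is an HTML character reference for decimal 1072, i.e. U+0430 CYRILLIC SMALL LETTER A. Here is a minimal Python sketch of what happens to such an href. It assumes Python’s built-in `idna` codec, which implements the 2003 IDNA rules, as a stand-in for the browser’s own conversion:

```python
import html
import unicodedata
from urllib.parse import urlsplit

href = "http://www.p&#1072;ypal.com/"

# The HTML parser replaces the character reference with a real code point.
url = html.unescape(href)
host = urlsplit(url).hostname  # "www.pаypal.com", with a Cyrillic second letter

# The character after "www.p" is not LATIN SMALL LETTER A.
print(unicodedata.name(host[5]))  # CYRILLIC SMALL LETTER A

# IDNA turns the Unicode host into the ASCII-compatible name that is sent to DNS.
print(host.encode("idna").decode("ascii"))  # www.xn--pypal-4ve.com
```

The `xn--` form is an ordinary ASCII DNS label that anyone can register. The Unicode form is only what gets displayed.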

Secunia has some information on how to fix this vulnerability. So does BoingBoing.

Posted on February 16, 2005 at 9:17 AM • 28 Comments


David Pashley February 16, 2005 10:10 AM

The problem is that it is fundamentally unfixable. As long as people want to use glyphs beyond ASCII’s 128 code points, we will have this problem.

A suggested solution would be to have a symbol showing that the URL contains non-ASCII characters.
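One way to drive such a symbol, sketched in Python under the assumption that the check looks only at the host part of the URL. The helper name `host_has_non_ascii` is hypothetical. It also treats an already-encoded `xn--` label as non-ASCII, since that is how an IDN looks on the wire:

```python
from urllib.parse import urlsplit

def host_has_non_ascii(url: str) -> bool:
    """True if the URL's host is an internationalized name (hypothetical helper)."""
    host = urlsplit(url).hostname or ""  # .hostname is already lower-cased
    # Unicode form: any code point above 127.
    if not host.isascii():
        return True
    # ASCII-compatible (punycode) form: such labels carry the "xn--" prefix.
    return any(label.startswith("xn--") for label in host.split("."))

for u in ("http://www.paypal.com/",
          "http://www.p\u0430ypal.com/",
          "http://www.xn--pypal-4ve.com/"):
    print(u, host_has_non_ascii(u))
```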

Rory Alsop February 16, 2005 10:19 AM

The two points of note with this vulnerability are:
1) Internet Explorer is by default immune to this, as IDN support is not enabled.
2) The Firefox developers have been proactive in sorting the issue, and in future, IDN will be disabled on install (and enabling it will prompt with a security warning.)
Of course, the media has hyped this out of all proportion, claiming a major bug in Firefox when it is just a feature that can be misused, like many others. Standard advice to protect against phishing still applies: type the URL yourself and don’t trust links implicitly.

Bill Godfrey February 16, 2005 10:59 AM

What if (say) a Russian organisation legitimately wants to use the Cyrillic ‘a’? All these proposed fixes would stop that working, or at least unfairly taint this legitimate Russian name.

Perhaps warning where components (between dots) have mixed languages would be better?

[all english].com is fine.
[all cyrillic].com is also fine.
[mixed].com is not.
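Bill’s per-component rule can be sketched in Python. Python’s `unicodedata` module does not expose the Unicode Script property, so this assumes a rough stand-in: the first word of each character’s Unicode name (LATIN, CYRILLIC, GREEK and so on). Digits and the hyphen are treated as script-neutral. Both function names are hypothetical:

```python
import unicodedata

def rough_script(ch: str) -> str:
    """Approximate a character's script from its Unicode name (heuristic)."""
    if ch.isdigit() or ch == "-":
        return "COMMON"
    return unicodedata.name(ch, "UNKNOWN").split()[0]

def mixed_script_labels(host: str) -> list:
    """Return the dot-separated components of a Unicode host that mix scripts."""
    flagged = []
    for label in host.split("."):
        scripts = {rough_script(c) for c in label} - {"COMMON"}
        if len(scripts) > 1:
            flagged.append(label)
    return flagged

print(mixed_script_labels("www.paypal.com"))            # [] (all Latin)
print(mixed_script_labels("www.p\u0430ypal.com"))       # one flagged label
print(mixed_script_labels("\u043f\u0430\u0439.com"))    # [] (Cyrillic label, Latin TLD)
```

The host must be in Unicode form. An `xn--` label would need decoding first, which Python’s `bytes.decode("idna")` can do.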

I don’t know if there are legitimate cases where mixed languages are needed. Anyone?

Anyway, the standard advice to never click links in email, and instead type in your own address or use a trusted bookmark still applies.

Kevin February 16, 2005 11:44 AM

The fix that the Shmoo Group links to (turning off IDN support in compreg.dat) works on Firefox, although note that Firefox automatically re-creates that file when you install or uninstall an extension. However, the extension linked for removing IDN support did not work for me, and the Trust Bar can’t verify non-HTTPS links.

Tim February 16, 2005 12:02 PM

A visual cue would be rather easy here. Firefox already color codes the URL depending on whether or not SSL is in use. It seems to me color coding characters above 127 would be an aid here.

However, you’re still fighting education. So many users would still be content to “look at the pretty colors”.
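Tim’s colour-coding idea needs a way to pick out the characters above 127. A plain-text sketch in Python, where the function name and the `[U+XXXX]` marker are hypothetical stand-ins for an actual colour:

```python
def mark_non_ascii(host: str) -> str:
    """Replace every code point above 127 with a visible [U+XXXX] marker."""
    return "".join(c if ord(c) < 128 else f"[U+{ord(c):04X}]" for c in host)

print(mark_non_ascii("www.paypal.com"))       # www.paypal.com
print(mark_non_ascii("www.p\u0430ypal.com"))  # www.p[U+0430]ypal.com
```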

grey February 16, 2005 12:20 PM

If you’re using Firefox, there is an extension called SpoofStick, which just released a new version that helps with the IDN/Unicode issues by distinguishing between what a site’s URL appears to be and what characters it actually contains.

I had been using it before this, as it’s generally handy for making sure that the site you’re browsing is really the one you think you’re browsing, and with Unicode that’s all the more true.

SpoofStick’s site is here: http://www.corestreet.com/spoofstick/

Anonymous February 16, 2005 1:00 PM

How about if UTF-8 was used for domain names and everyone would simply reject encodings that are longer than necessary? Wouldn’t this guarantee a unique representation for each Unicode character?
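Python’s UTF-8 decoder already behaves the way this comment proposes: it rejects overlong forms, so each code point has exactly one accepted byte sequence. A small sketch, where the `strict_decode` wrapper is hypothetical:

```python
from typing import Optional

def strict_decode(raw: bytes) -> Optional[str]:
    """Decode UTF-8, returning None instead of raising on malformed input."""
    try:
        return raw.decode("utf-8")
    except UnicodeDecodeError:
        return None

print(strict_decode(b"\x61"))          # a  (the shortest form of U+0061)
print(strict_decode(b"\xc1\xa1"))      # None  (overlong two-byte form of U+0061)
print(strict_decode(b"\xe0\x81\xa1"))  # None  (overlong three-byte form of U+0061)
print(strict_decode(b"\xd0\xb0") == "\u0430")  # True: Cyrillic a has its own encoding
```

Uniqueness holds per code point, not per glyph: U+0061 and U+0430 each get one canonical byte sequence, and they still look alike.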

Aqualung February 16, 2005 1:54 PM

Is this really a security exploit? I mean, yeah, it allows for slightly more convincing URI spoofing, but the Unicode encoding characters (ampersand, pound) are NOT valid characters for a domain name, meaning that the domain in the href will never resolve. It may be possible to get around this with some social engineering or a JavaScript exploit, but it would seem that the JavaScript exploit would work fine with a non-Unicode domain name as well.

Dean Harding February 16, 2005 4:08 PM

“A visual cue would be rather easy here. Firefox already color codes the URL depending on whether or not SSL is in use. It seems to me color coding characters above 127 would be an aid here.”

That’s an English-centric solution. What if you spoof a valid Cyrillic URL with a half-Cyrillic/half-Latin URL? Both would be colour-coded (since both contain non-ASCII characters), so the colour-coding would be pretty useless.

You could colour-code (or disallow) mixed-script domains (except where mixing scripts is normal, like Kanji/Katakana, etc.), but there are still times when mixed scripts are nice. For example: http://www.Ελλας-fans.org
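Dean’s refinement (flag mixed-script labels except for combinations that are normal within one writing system) could look like this in Python. The `ALLOWED_MIXES` list is an illustrative assumption, and script detection again uses a rough heuristic, the first word of the Unicode character name, since `unicodedata` has no Script property:

```python
import unicodedata

# Illustrative assumption: script combinations that are ordinary within one label.
ALLOWED_MIXES = [
    {"CJK", "HIRAGANA", "KATAKANA"},  # Japanese mixes kanji and kana
]

def rough_script(ch: str) -> str:
    """Approximate a character's script from its Unicode name (heuristic)."""
    if ch.isdigit() or ch == "-":
        return "COMMON"
    return unicodedata.name(ch, "UNKNOWN").split()[0]

def label_ok(label: str) -> bool:
    """Accept single-script labels and allowlisted script combinations."""
    scripts = {rough_script(c) for c in label} - {"COMMON"}
    return len(scripts) <= 1 or any(scripts <= mix for mix in ALLOWED_MIXES)

print(label_ok("\u65e5\u672c\u30c6\u30ec\u30d3"))       # True: kanji + katakana
print(label_ok("p\u0430ypal"))                         # False: Latin + Cyrillic
print(label_ok("\u0395\u03bb\u03bb\u03b1\u03c2-fans"))  # False, though legitimate
```

The last case is exactly Dean’s point: a simple rule still rejects a legitimate mixed name, so any allowlist is a policy decision rather than a complete fix.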

Anyway, I think what Firefox and Mozilla are doing (disabling support until we can think of a more complete solution) is probably not a bad idea.

Bill Sharrock February 16, 2005 6:47 PM

Would it be worthwhile to store approved domains as a hash and then, as you hover over a link, have the browser examine the URL and compare the domain against the approved list? If the browser determines that the link goes to the proper domain, it could provide a visual cue to the user that the link is “good.” paypal.com would be a lot different from pаypal.com.

I’m aware that links can redirect a user to another site so the idea probably isn’t trivial to implement but is it feasible?
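A minimal Python sketch of Bill’s approved-domain check, assuming the list is kept as SHA-256 digests of each host’s IDNA (ASCII) form, so a look-alike Unicode host hashes to something different. `APPROVED` and both function names are hypothetical, and the sketch does not follow redirects, which is the complication raised above:

```python
import hashlib
from urllib.parse import urlsplit

def host_digest(host: str) -> str:
    """SHA-256 of the host's lower-case IDNA (ASCII-compatible) form."""
    ace = host.lower().encode("idna")  # a spoof becomes b"www.xn--pypal-4ve.com"
    return hashlib.sha256(ace).hexdigest()

# Hypothetical allowlist, stored only as digests.
APPROVED = {host_digest("paypal.com"), host_digest("www.paypal.com")}

def link_is_approved(url: str) -> bool:
    """True only if the link's host exactly matches an approved domain."""
    host = urlsplit(url).hostname
    if not host:
        return False
    try:
        return host_digest(host) in APPROVED
    except UnicodeError:  # host that cannot be converted to IDNA form
        return False

print(link_is_approved("https://www.paypal.com/login"))  # True
print(link_is_approved("http://www.p\u0430ypal.com/"))   # False
```

Hashing only hides the list’s contents. A plain set of normalized host strings would perform the same check.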

Dido Sevilla February 16, 2005 9:23 PM

“How about if UTF-8 was used for domain names and everyone would simply reject encodings that are longer than necessary? Wouldn’t this guarantee a unique representation for each Unicode character?”

Yes, but it won’t fix the problem. The main thing here is that a number of Unicode characters, known as homographs, look visually the same (an ASCII ‘C’ looks like the Cyrillic ‘C’, for instance), so the attack still works even without resorting to devious encodings.
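Catching homographs means comparing what names look like rather than their code points. A minimal Python sketch, loosely modeled on the “skeleton” idea in Unicode Technical Standard #39. The `CONFUSABLES` table here is a tiny hand-picked assumption; real confusables data covers thousands of characters:

```python
# Hypothetical, hand-picked subset of a confusables table:
# each entry maps a character to the ASCII letter it is easily mistaken for.
CONFUSABLES = {
    "\u0430": "a",  # CYRILLIC SMALL LETTER A
    "\u0441": "c",  # CYRILLIC SMALL LETTER ES
    "\u0435": "e",  # CYRILLIC SMALL LETTER IE
    "\u043e": "o",  # CYRILLIC SMALL LETTER O
    "\u0440": "p",  # CYRILLIC SMALL LETTER ER
    "1": "l",
    "0": "o",
}

def skeleton(name: str) -> str:
    """Case-fold, then replace each character with its look-alike representative."""
    return "".join(CONFUSABLES.get(c, c) for c in name.casefold())

def looks_like(a: str, b: str) -> bool:
    """True if two different strings would probably be confused by eye."""
    return a != b and skeleton(a) == skeleton(b)

print(looks_like("C", "\u0421"))                    # True: Latin C vs Cyrillic ES
print(looks_like("paypal.com", "p\u0430ypal.com"))  # True
print(looks_like("paypal.com", "paypa1.com"))       # True: digit one vs letter l
print(looks_like("paypal.com", "ebay.com"))         # False
```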

The only solution I can think of here would be to signal to users that they are visiting a domain that uses IDN.

Dido Sevilla February 16, 2005 9:46 PM

Apparently, the IDN working group was aware of this problem when they released RFC3490. The security considerations they give in section 12 state:

“To help prevent confusion between characters that are visually similar, it is suggested that implementations provide visual indications where a domain name contains multiple scripts. Such mechanisms can also be used to show when a name contains a mixture of simplified and traditional Chinese characters, or to distinguish zero and one from O and l. DNS zone adminstrators may impose restrictions (subject to the limitations in section 2) that try to minimize homographs.”

Now if only people engaged in making IDN-aware applications would take these considerations to heart, or if they can figure out a better solution than the IETF IDN working group suggests…


Chris Becke February 17, 2005 3:57 AM

The whole trust-by-URL system is flawed. In the general case, how are you supposed to trust that any URL actually maps to the corporation or institution you think you are dealing with?

Why do SSL, and thus e-commerce security, rely on the DNS system being unhacked and providing “good” company identifiers, when it was never designed to perform that service?

stew February 17, 2005 8:29 AM

IMHO, the scariest thing is that the SSL certificate looks valid and is signed by a CA accepted by all browsers. Only a close look at the certificate’s details shows that it doesn’t belong to “paypal.com”. The rest of it looks perfectly legitimate. I doubt the likely phishing victims would notice.

If there were something like ~/.ssh/known_hosts for SSL certificates, checked automatically by your browser, you would instantly know that the shmoo-paypal isn’t the paypal you visited before.
(Of course, a too-simple implementation would break if a web site uses several servers with different certificates.) Why isn’t that done?
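Stew’s known_hosts idea amounts to trust on first use. A minimal Python sketch, assuming certificates arrive as DER bytes and that the store maps each host to a set of fingerprints (a set per host covers the several-servers case). The function names and the in-memory `store` dict are hypothetical:

```python
import hashlib

def fingerprint(cert_der: bytes) -> str:
    """SHA-256 fingerprint of a DER-encoded certificate."""
    return hashlib.sha256(cert_der).hexdigest()

def check_known_host(store: dict, host: str, cert_der: bytes) -> str:
    """Return "new" for a never-seen host (and remember its certificate),
    "known" if this certificate was seen before for the host,
    or "changed" if the host presents an unfamiliar certificate."""
    fp = fingerprint(cert_der)
    if host not in store:
        store[host] = {fp}
        return "new"
    return "known" if fp in store[host] else "changed"

store = {}
print(check_known_host(store, "www.paypal.com", b"cert-A"))         # new
print(check_known_host(store, "www.paypal.com", b"cert-A"))         # known
print(check_known_host(store, "www.paypal.com", b"cert-B"))         # changed
print(check_known_host(store, "www.xn--pypal-4ve.com", b"cert-C"))  # new
```

The look-alike site shows up as “new” rather than “changed”, because its name really is different. The useful signal is being asked to trust, for the first time, a site you believed you had visited before. Real certificate bytes could come from `ssl.get_server_certificate((host, 443))` followed by `ssl.PEM_cert_to_DER_cert`.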

Ray February 17, 2005 2:07 PM

Use the Source, Luke.

If you know even the most basic HTML, this, like every phishing attack I’ve ever seen, is obvious the second you look at the source. I know that attempts so far to get the average user even to check certificates have not worked, but perhaps it is time we started teaching users enough basic HTML to protect themselves.

Yeah. You’re right. Ain’t gonna happen.

Ah well. We can always dream.

Matt February 17, 2005 2:24 PM

This is far more of a problem with the way the domain registry works than with IDN itself, or the browsers. Verisign could easily implement the IDN language tables and prevent registrants from mixing languages in their domain names.

It doesn’t completely fix the problem, but then the problem isn’t unique to IDNs anyway. Unless you’ve got a font that distinguishes well, it’s just as hard to tell the difference between paypal.com and paypa1.com.
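Matt’s registry-side check could be sketched in Python like this. The two `LANGUAGE_TABLES` entries are drastically simplified assumptions (real registry language tables are longer and more nuanced). A label is registrable only if every character comes from the single table the registrant declares:

```python
# Simplified, hypothetical language tables: letters plus digits and hyphen.
DIGITS_HYPHEN = set("0123456789-")
LANGUAGE_TABLES = {
    "en": set("abcdefghijklmnopqrstuvwxyz") | DIGITS_HYPHEN,
    # U+0430..U+044F are the basic Cyrillic small letters; U+0451 is yo.
    "ru": {chr(cp) for cp in range(0x0430, 0x0450)} | {"\u0451"} | DIGITS_HYPHEN,
}

def registrable(label: str, language: str) -> bool:
    """Accept a label only if all its characters belong to one declared table."""
    table = LANGUAGE_TABLES.get(language)
    return table is not None and all(c in table for c in label.lower())

print(registrable("paypal", "en"))              # True
print(registrable("p\u0430ypal", "en"))         # False: Cyrillic a is not in "en"
print(registrable("p\u0430ypal", "ru"))         # False: Latin p, y, l are not in "ru"
print(registrable("\u043f\u0430\u0439", "ru"))  # True
```

As the comment says, this does nothing for same-script look-alikes: `registrable("paypa1", "en")` is still True.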

Stephen Norris February 17, 2005 6:31 PM

On my desktop machine anyway, the font is visibly different (the “a” is about 75% as big as a normal a). Also, while the page was loading, Firefox displayed the undecoded unicode, so it was full of some strange bytes…

ed nixon February 18, 2005 7:29 AM

I hover over your “Web page” link and the address displayed at the bottom of my Mozilla browser says:


I understand the theory of what you’re saying, but I’m not seeing it borne out, either here or in the numerous spoofs I receive pretending to be PayPal security alerts.

I imagine there are a goodly number of people who don’t bother to double check in this way, however.


Kevin Kirkpatrick February 18, 2005 12:17 PM

Chris Becke is the only one who’s nailed it so far. Security through “eyeballing the URL” is not, and cannot ever be, highly reliable. With respect to IDN, all the “fixes” for this amount to: “Let’s make URLs for non-English-speaking, non-ASCII-using people ugly, or disable them entirely.”

Not only is this discriminatory, but the bigger picture is missed as well: these solutions only fix the problem for the small fraction of the population who understand what the problem is – and they do not address the larger problem of URL spoofing. They will not protect my parents from spoofing techniques. Additionally, they will not protect my parents, or anyone else for that matter, from clicking on a link to:

The only true fix is education – in fact, one simple rule:
For any site you plan to give private information to, type the domain in the first time you go there, bookmark the page, and use that bookmark to access the page thereafter.

Or, in short: clicked links = something stinks

Chung Leong February 18, 2005 2:48 PM

Banning mixed-script domain names wouldn’t really solve the problem. The full Roman alphabet is present in the CJK code space, for instance, as full-width characters. And then there are diacritics and combining characters. A little tick mark on a letter is easy to miss when you’re not expecting it. And don’t forget the compound Latin characters for Serbo-Croatian transliteration.
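Some of these cases are already folded together by normalization: IDNA 2003’s nameprep step applies NFKC, which maps full-width Latin letters to ordinary ones and composes a letter plus combining accent into one code point. A short check using Python’s `idna` codec, an implementation of IDNA 2003 assumed here to stand in for a browser:

```python
import unicodedata

fullwidth = "\uff50\uff41\uff59\uff50\uff41\uff4c.com"  # full-width "paypal" + ".com"
print(unicodedata.normalize("NFKC", fullwidth))  # paypal.com
print(fullwidth.encode("idna"))                  # b'paypal.com'

precomposed = "p\u00e1ypal.com"  # a-acute as one code point
combining = "pa\u0301ypal.com"   # a followed by COMBINING ACUTE ACCENT
print(precomposed.encode("idna") == combining.encode("idna"))  # True
print(precomposed.encode("idna").startswith(b"xn--"))          # True: still a distinct name
```

The accented name survives as its own `xn--` label, which is exactly the easy-to-miss tick mark described above.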

Another thing: I wonder whether you can register a name with invisible characters like ZERO WIDTH NON-JOINER and ZERO WIDTH NO-BREAK SPACE. There are a lot of these in Unicode.
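Under IDNA 2003, nameprep maps a list of invisible characters to nothing (RFC 3454, table B.1), and both characters named above are on that list. Python’s `idna` codec implements those rules, so a quick check is possible. Note that the later IDNA 2008 rules treat ZERO WIDTH NON-JOINER differently, permitting it only in specific contexts:

```python
zwnj = "pay\u200cpal.com"    # ZERO WIDTH NON-JOINER inside the label
zwnbsp = "pay\ufeffpal.com"  # ZERO WIDTH NO-BREAK SPACE inside the label

# Nameprep drops both characters, so each collapses to the plain ASCII name.
print(zwnj.encode("idna"))    # b'paypal.com'
print(zwnbsp.encode("idna"))  # b'paypal.com'
```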

At this point, IDN should not be implemented at all. There are so many issues yet to be resolved, and we’re only talking about phishing with Roman-character domain names. What about phishing in other scripts?

In Arabic, for example, the kashida can be used to lengthen a word. You also have the presentation forms to worry about.

In Devanagari, the vowel i is written before a consonant even though phonetically it comes after it. Thus [i][ka] looks the same as [ka][i].

In Chinese, a character can have many variants. They are slightly different in appearance but are understood to be the same character. You also have different characters that sound the same and mean more or less the same thing. On top of that, you have different characters that mean different things but look very similar (the “wood” radical can easily be mistaken for the “hand” radical, for example). One could almost say that it’s impossible to stop homographic attacks against Chinese domain names.

Chris Becke February 18, 2005 5:44 PM

Well, if the information from the certificate were made first-class information, i.e. displayed prominently to the user (via a browser toolbar plugin), I see no reason why clicking on links should not be as safe as typing them in.

It would be nice if the user could somehow mark (when initially creating an account) whether they personally trust a certificate as associated with that account. Then, if one does end up looking at a shimmed p@ypal cert, it will be shown as untrusted and the scam will be exposed.
