Faking Domain Names with Unicode Characters

It's things like this that make phishing attacks easier.

News article.

Posted on April 25, 2017 at 5:25 AM • 31 Comments


Who? • April 25, 2017 5:57 AM

First of all, I understand the value of multiculturalism. However, I would say that the only fix to this problem is allowing only US-ASCII in the domain name system and other Internet core services. Allowing Unicode host and domain names not only allows problems like this one to happen but also makes it difficult to reach these addresses from foreign computers. Moreover, not all computers and operating systems are Unicode-ready.

It does not fix the problem with dangerous hyperlinks, but that is a different issue.

Victor Wagner • April 25, 2017 6:12 AM

We should assign a color to each script available in Unicode.

For instance, draw Latin characters in black, Cyrillic in green, Greek in red, etc.

Thus domain names in mixed scripts would be visually alarming, and names in another script would at least be distinguishable.
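
A rough sketch of this per-script coloring idea, in Python. It guesses each character's script from the first word of its Unicode character name, which only approximates the real Unicode Script property. The palette is the hypothetical one from the comment above, and the function names are made up for illustration.

```python
import unicodedata

# Hypothetical palette from the suggestion above; other scripts fall back to gray.
SCRIPT_COLORS = {"LATIN": "black", "CYRILLIC": "green", "GREEK": "red"}

def script_of(ch):
    """Approximate a character's script by the first word of its Unicode name."""
    try:
        return unicodedata.name(ch).split()[0]
    except ValueError:  # the character has no name in the database
        return "UNKNOWN"

def colorize(label):
    """Return (character, color) pairs for one domain label."""
    return [(ch, SCRIPT_COLORS.get(script_of(ch), "gray")) for ch in label]

def is_mixed_script(label):
    """True if the letters in the label come from more than one script."""
    scripts = {script_of(ch) for ch in label if ch.isalpha()}
    return len(scripts) > 1

# "p\u0430ypal" swaps the Latin "a" for the Cyrillic letter U+0430.
print(colorize("p\u0430ypal"))
print(is_mixed_script("p\u0430ypal"))  # True
```

A label written entirely in one non-Latin script would come out in a single color here, so this kind of coloring separates scripts but does not by itself flag a whole-script lookalike.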

Kai • April 25, 2017 6:42 AM

This is a thorny problem - and one that was predicted right back when IDN first became a thing. Quite frankly, I'm surprised it's taken this long for someone to register a domain similar to any of the big, well-known domains.

We cannot restrict domain names to US-ASCII. That isn't fair on the other 90% of the world that doesn't speak English.

I don't have any workable ideas to solve this problem. There are things that can be put in place to mitigate the problem - like having a tooltip on the domain name that shows the raw representation of the string, but they require user-intervention and awareness.

The lack of an EV SSL certificate isn't a good enough warning. If I were to go to a domain that looked like https://apple.com and it didn't have an EV SSL cert, I'm not entirely sure I'd notice, particularly if the forgery was good enough that the rest of the site looked legit.

Using a different font isn't going to be enough for most casual users to notice. Changing the text colour may work in some cases, but what about people using accessibility features or a different colour scheme on their computer (or a mono screen)?

Even having a human verify every domain registration would be difficult. In this case, what list do they verify them against?

keiner • April 25, 2017 7:03 AM

The browser should throw an appropriate warning, as it does nowadays for logins without HTTPS. All safe...

Wendy M. Grossman • April 25, 2017 7:36 AM

Kai: yes, I remember writing a piece about the risks of IDNs back when they were first mooted. I also remember DNS people saying, yes, but it's really important to support multicultural etc.

I was shocked to see that all this time - 20 years! - later it was still an open vulnerability.


Daniel • April 25, 2017 7:44 AM

In Firefox it is easy to avoid a homograph attack.
Type about:config and set network.IDN_show_punycode to true.

Tatütata • April 25, 2017 8:00 AM

This phenomenon is really a form of cybersquatting (ICANN defines this as "bad faith registration of another's trademark in a domain name"), and part of the solution is legal. The legitimate domain owner could file a UDRP ("Uniform Domain Name Dispute Resolution Policy") complaint, although this wouldn't happen overnight, and it would possibly require the existence of an underlying trademark which is properly registered.

The defendant could always argue that their domain name is bona fide, and not a fraudulent transliteration. But this would require them to come out into the open, which I very much doubt they would do if they are actually scammers.

The existing anti-phishing mechanism provided by many browsers could be used to warn users. The database could be updated by an algorithm that scans newly registered domain names and compares them with existing ones. There is already a sort of legal mafia operation that spews out cease-and-desist letters to legitimate domain name owners...
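
One way such a scan could work is to reduce every name to a "skeleton" in which lookalike characters are replaced by the letters they imitate, and then compare skeletons. The sketch below assumes a tiny hand-picked table of Cyrillic lookalikes; a real scanner would need a far larger table, and the function names are hypothetical.

```python
# Tiny hand-picked map from Cyrillic lookalikes to Latin letters (an assumption
# for illustration only; real confusables data is much larger).
CONFUSABLES = {
    "\u0430": "a",  # CYRILLIC SMALL LETTER A
    "\u0435": "e",  # CYRILLIC SMALL LETTER IE
    "\u043e": "o",  # CYRILLIC SMALL LETTER O
    "\u0440": "p",  # CYRILLIC SMALL LETTER ER
    "\u0441": "c",  # CYRILLIC SMALL LETTER ES
    "\u04cf": "l",  # CYRILLIC SMALL LETTER PALOCHKA
}

def skeleton(label):
    """Replace each known lookalike with the Latin letter it resembles."""
    return "".join(CONFUSABLES.get(ch, ch) for ch in label.lower())

def lookalikes(new_label, protected_labels):
    """Return the protected labels that a newly registered label imitates."""
    target = skeleton(new_label)
    return [p for p in protected_labels
            if p != new_label and skeleton(p) == target]

# An all-Cyrillic label that renders much like "apple" in many fonts.
print(lookalikes("\u0430\u0440\u0440\u04cf\u0435", ["apple", "google"]))  # ['apple']
```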

Dr. I. Needtob Athe • April 25, 2017 8:00 AM

Yes Daniel, we know from the article. It's even easier with Chrome. When I clicked "help" and "about", it immediately updated to version 58, almost as if it had said, "Oops! I'd better fix this right away!"

Gabriel • April 25, 2017 8:55 AM

A possible solution would be to display non-ASCII characters in, say, red.

Thom • April 25, 2017 9:06 AM

I don't agree with "or navigate to sites via a search engine when in doubt," because there have been incidents in the past where some of the top results in, for example, Google were paid hits which actually linked to a phishing site.

(This was the case with several banks among others)

So, just type the actual link manually - I'd be inclined to say use your bookmarks if you have them, but I'm worried that even those might be edited somehow by a malicious script or virus. In the same way, if you already have a virus, that virus might be causing your browser to redirect to the malicious site (which looks the same) whenever you type the "correct" URL.

Oh the paranoia...

Chelloveck • April 25, 2017 9:12 AM

@Daniel: The same about:config change also works in Thunderbird. I don't know about anyone else, but I encounter these trick links more often in email than when browsing. I'm looking at *you*, www.paypaI.com! And that one doesn't even have to resort to punycode. Anyone know of good serif and sans-serif font families which have distinct glyphs for uppercase-I (0x49) and lowercase-L (0x6c)?
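
Because hostnames are matched case-insensitively, one cheap check for this particular trick is to lowercase the name before displaying or comparing it. A minimal sketch, with illustrative function names:

```python
def normalize_host(host):
    """Lowercase a hostname; DNS names are matched case-insensitively."""
    return host.lower()

def same_host(displayed, expected):
    """Compare two hostnames after case normalization."""
    return normalize_host(displayed) == normalize_host(expected)

# The capital "I" stops looking like a lowercase "l" once the name is lowercased.
print(normalize_host("www.paypaI.com"))                # www.paypai.com
print(same_host("www.paypaI.com", "www.paypal.com"))   # False
```

This only covers upper/lower case tricks. It does nothing for a digit "1" standing in for a lowercase "l", which still comes down to the font.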

Peter A. • April 25, 2017 10:06 AM

Some national registries limit the allowed set of Unicode code points to those used by languages spoken there. It limits the problem to an extent. However, it is not clear what to do with general-purpose TLDs.

Would it be too restrictive to allow only ASCII in .org, .com, .net etc.? Personally, I am inclined to say no. Sites trying to reach a global audience have to register ASCII-only domain names anyway (in whatever TLDs they choose) or otherwise many people would not be able to type them in. Language-specific sites can use domain names spelled out with non-ASCII characters within their national domains (or in many cases in a domain of another nation where the language is spoken). They're going to have local reach anyway unless they ALSO register a pure ASCII spelling in some TLD.

One counter-example could be a .name domain where everyone may want to register a personal domain in its true spelling. But "unisquatting" is of very limited use there.

George • April 25, 2017 10:30 AM

@Thom -- yes, type it. With predictive URLs, that basically means using a search engine. So turn that OFF. But once you've got your bank's URL, bookmark it.

OR take the Apple route: Use apps, not websites. NOT arguing in favor of that, necessarily. Apps have been, and certainly can be, faked, but at least there's some kind of mediation.

Cegfault McIrish • April 25, 2017 10:41 AM

From the quoted article: "A simple way to limit the damage from bugs such as this is to always use a password manager"

There will always be sleight-of-hand tricks and phishing attempts which fool users. A lot of the comments above talk about changing colors of characters. (a) This is not a solution for those who are color-blind, (b) this is not a solution for people who won't notice near-colors (like gray instead of black), and (c) this doesn't solve the problem for one-letter-off attacks (I think of people changing .onion sites to .onion.to - no reason someone couldn't have "google.com.lskdjgldksfjhldsfkjhsdflhj.mysite.com", and it would probably fool a lot of people out there).

Changing colors may *help*, but at the end of the day we should encourage a mixture of person-verifiable solutions (changing colors) and machine-level protections (like password managers).

My password manager creates large, random passwords for sites and locks that form to that specific domain. Seems like the simplest solution to me. It also solves other problems (viz. weak passwords).
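
The domain-locking behavior can be sketched as an exact-match lookup. This is an assumption about how such a tool might behave, not a description of any particular password manager (real ones have their own matching rules); the vault contents and function name are placeholders.

```python
# Hypothetical vault: credentials are keyed by the exact hostname they were saved for.
VAULT = {"google.com": ("alice", "long-random-password-placeholder")}

def credentials_for(host):
    """Return saved credentials only when the hostname matches exactly."""
    return VAULT.get(host.lower().rstrip("."))

print(credentials_for("google.com") is not None)                            # True
print(credentials_for("google.com.lskdjgldksfjhldsfkjhsdflhj.mysite.com"))  # None
print(credentials_for("xn--80ak6aa92e.com"))                                # None
```

The lookup is done on the name the machine sees, so a lookalike that fools the eye simply has no saved entry to fill in.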

Patriot COMSEC • April 25, 2017 10:52 AM

I saw this exact thing, which I did not understand at the time, while examining certificates in Nanjing a few days ago. I was using Fedora 25 and the latest Firefox. The site that was bad was https://www.telegraph.co.uk. At first, that surprised me because I thought it was an uncommon site for anyone to target in Jiangsu Province, China.

"Visually, the two domains are indistinguishable due to the font used by Chrome and Firefox. As a result, it becomes impossible to identify the site as fraudulent without carefully inspecting the site's URL or SSL certificate."

Anon • April 25, 2017 11:37 AM

I can only repeat other comments:

1) Took long enough to be exploited

2) Browsers should highlight use of characters outside of ASCII a-z, etc.

If you're visiting a website that's important (e.g. Banking) then type the URL. Search engines have been compromised in the past.

My Info • April 25, 2017 12:35 PM


"2) Browsers should highlight use of characters outside of ASCII a-z, etc."

That is the blue-ribbon prize idea. Now to implement it... Any takers?

Alex • April 25, 2017 12:36 PM

No free lunch on this one; simple fixes like turning off Unicode just won't fly. That said, we could smarten up the browser a little. For starters, look at organic URLs from legitimate domains. How many stay within the code set of a single language/group? How many legitimate mixed domains are out there?

Keep in mind that this is intended to trick a human user; to your computer the difference is clear as day. So the problem is to make it clear to the user, with conditional logic, what the risk of a link appears to be. I'd recommend something more than color highlighting for two reasons: 1) the site creator has control of coloring using CSS, etc., and 2) color blindness is a thing, as are monochrome displays.

Thinking in terms of a spectrum of trustworthy to shady.

Well Known Domains
Single language code set URLs
Shortened URLs
URLs whose display doesn't match their destination
Mixed code set URLs
Obfuscated URLs
Known attack sites

So you configure the browser to check the URL, check where it points, look it up, resolve its shortened/permalink location and score it.

If the URL looks shady enough then I'd treat it like a risky plugin and make it default to click to run. Show the user where they're actually going before you send them there.

If I could figure out where to put the hook I'd do the same for redirects as well.
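
The spectrum above could be turned into a score along these lines. This is only a sketch: the field names, the numeric levels, and the click-to-run threshold are all assumptions, and gathering the facts (resolving short links, checking blocklists) is left out entirely.

```python
from dataclasses import dataclass

@dataclass
class LinkFacts:
    """Facts a browser would have gathered about a link (hypothetical fields)."""
    well_known: bool = False
    script_count: int = 1          # number of distinct scripts in the hostname
    shortened: bool = False
    display_mismatch: bool = False  # shown text differs from the real destination
    obfuscated: bool = False
    known_attack: bool = False

def risk_level(facts):
    """Map a link to 0 (trustworthy) .. 6 (shady), worst matching rule first."""
    if facts.known_attack:
        return 6
    if facts.obfuscated:
        return 5
    if facts.script_count > 1:
        return 4
    if facts.display_mismatch:
        return 3
    if facts.shortened:
        return 2
    if facts.well_known:
        return 0
    return 1  # unknown domain written in a single script

CLICK_TO_RUN_THRESHOLD = 3  # arbitrary cut-off chosen for this sketch

def needs_confirmation(facts):
    """True if the browser should show the destination and ask before loading."""
    return risk_level(facts) >= CLICK_TO_RUN_THRESHOLD

print(risk_level(LinkFacts(well_known=True)))        # 0
print(needs_confirmation(LinkFacts(script_count=2)))  # True
```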

k15 • April 25, 2017 2:02 PM

Why do some security certificates' owner names get displayed in the address bar, while for others you have to go to Developer Tools to view the certificate owner's name? Is there a way to know *which* sites should be displaying the certificate owner's name next to the lock icon in the address bar?

David Leppik • April 25, 2017 3:02 PM

On my Mac, in Firefox it looks like "app1e.com" with a one instead of lowercase L. In Chrome the "l" is the same height as the "e". In both cases the real Apple website includes "Apple [US]" next to the HTTPS lock. These are visibly different but still subtle enough for phishing attacks.

I like @Peter A's idea of limiting the alphabet by TLD. Perhaps for .com, the browser should name the alphabet next to the URL whenever non-ASCII characters are used.

The reason this hasn't been exploited until now is that it's not the main phishing vector. Most people don't look at the URL bar anyway, and there are plenty of psychological tricks to keep your attention focused elsewhere.

David Leppik • April 25, 2017 3:05 PM

I missed this comment about Apple's solution (at https://www.chromium.org/developers/design-documents/idn-in-google-chrome )

Safari has a whitelist of scripts that do not contain confusable characters, and only shows the IDN form for whitelisted scripts. The whitelist does not include Cyrillic and Greek (they are confusable with Latin characters), so Safari will always show punycode for Russian and Greek URLs.

So no Russian or Greek for you!
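
A sketch of that display rule in Python: decode each "xn--" label and show the Unicode form only if every character belongs to an allowed script, otherwise keep the punycode. The allowlist below is invented for illustration and is not Safari's actual list; scripts are guessed from the first word of each character's Unicode name, which is an approximation.

```python
import unicodedata

# Hypothetical allowlist of script-name prefixes; NOT Safari's real list.
ALLOWED_SCRIPTS = {"CJK", "HIRAGANA", "KATAKANA", "HANGUL"}

def script_of(ch):
    """Approximate a character's script by the first word of its Unicode name."""
    try:
        return unicodedata.name(ch).split()[0]
    except ValueError:
        return "UNKNOWN"

def display_label(label):
    """Show the Unicode form of an IDN label only if all its scripts are allowed."""
    if not label.lower().startswith("xn--"):
        return label
    try:
        decoded = label[4:].encode("ascii").decode("punycode")
    except UnicodeError:
        return label  # malformed label: leave it as typed
    if all(script_of(ch) in ALLOWED_SCRIPTS for ch in decoded):
        return decoded
    return label  # contains a script outside the allowlist: keep the punycode

def display_host(host):
    """Apply the rule to every label of a hostname."""
    return ".".join(display_label(label) for label in host.split("."))

print(display_host("xn--80ak6aa92e.com"))  # xn--80ak6aa92e.com (Cyrillic stays encoded)
```

This version is deliberately strict: a label that mixes an allowed script with plain ASCII letters or digits also stays in punycode.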

Darryl Daugherty • April 25, 2017 8:29 PM

It would probably be better to change the background color of the address bar to a uniform red in cases of mixed IDN/non-IDN than to change the color of particular characters based on their code page.

And throw an alert box showing both the fully qualified domain name as requested and its de-obfuscated equivalent, with an "OK to proceed?" prompt.

What one would do when the TLD is non-ASCII is an exercise left to others.

Drone • April 25, 2017 9:20 PM

@Victor Wagner said,

"We should assign a color to each script available in unicode."

According to most people in the Great Fascist-Socialist State of California, @Victor Wagner is Racist and must be dealt with quickly and violently. Resistance is Futile!

CR Drost • April 26, 2017 8:57 AM

The recommendations of "turn off the punycode domains" open up users to the same attack in a different form; all punycode domains in un-rendered form look alike...

A partial fix to this issue is to always render the IDNs both as ASCII source and Unicode... Still suffers from "these two non-ASCII domains are homographs" but there may be a way to render the ASCII source as an image which will look substantially different between the two, ghost it over the page with a message "this domain corresponds to this image, do not trust it if it looks unfamiliar to you."

Nacnud • April 26, 2017 9:16 PM

I think I've made a similar comment to this in the past, but here I go again.

Users do not understand domain names and URLs. It's a mistake to expect them to. (Here when I say "users" I mean the general public, not computer scientists or people that read this blog. I mean your grandma, your banker, your lawyer, and your doctor.)

Therefore it's a mistake to expect a user to make security decisions (should I type in my username and password?) based on what they see in the browser address bar.

If you are with me so far, then read on...

The way the web works today, the browser checks that the X.509 cert matches the domain name that the user navigated to. If so, the connection is considered secure, and displayed as such to the user. The user can then rely on the domain name in the URL to know which entity they are talking to. Now, if you accepted what I wrote above, you must agree this is not a good solution, because it's expecting users to understand and make decisions based on domain names and URLs.

For example, even with the punycode fix in place, when I go to the https://www.xn--80ak6aa92e.com/ site, Firefox shows me a little padlock in the address bar, and tells me I have a "Secure Connection". And in fact it *is* a secure connection - but to whom? If that page content looked like Apple's home page, many users would be taken in, even with the punycode fix in place. After all, they have been told it's a secure connection, and it looks like Apple's page. Why is the hostname in the URL garbled? I guarandamntee you, most users out there would not know what to make of that, and would probably just shrug it off - they don't know what a hostname or a URL is anyway.

So here's how I think it SHOULD work. Every browser should prominently display a "Who am I talking to?" field to the user. This should be clearly separated from the page content. If using HTTP, or HTTPS but with an invalid cert, the field should clearly indicate to the user that they could be communicating with ANYONE (a big red warning sign). If there's a valid cert, then the user should see identity information from the certificate (NOT the URL). This should be something the user can easily recognize and understand and make a judgement on. For example, an individual or company name such as "Apple", "Google", or "Bank of America". The most likely source of this information is the common name (CN) within the Subject DN, but other elements of the DN or other fields might also be used.

Of course, that brings us right back to the problem that, if we allow Unicode characters in the subject of the X.509 cert, exactly the same trick could be used to create misleading CNs. So to make this work, we have to rely on CAs to check the identity information in the certificate signing request (CSR) and make sure it's valid, an actual identity (NOT a hostname as is used in the POC site), and not playing homograph tricks. This is where the rubber has to meet the road. If someone asks a CA for a certificate with a common name of "xn--80ak6aa92e", it would be up to the CA to detect that a) this is a homograph for apple, and b) this name is not actually associated with the identity of the entity asking for the cert.

I think this puts the responsibilities in the right place:

The browser is responsible for checking that it's receiving a valid certificate that matches the domain name being used

The user is responsible for looking at the identity that the browser is presenting to them from the certificate, and making sure that it is the entity that they really want to be communicating with

Certificate Authorities (not domain registrars) are responsible for validating the identity of entities to whom they are issuing certificates (and they need to do a much better job of this than they do today) and making sure the identities are not homographs.

Sorry this was super long, and all just personal opinion. If you've made it this far, thanks for reading. Would be interested to hear what others think of it. Am I missing anything important? Do I understand the technology correctly?


mostly harmful • April 27, 2017 11:32 PM


You propose:

However I would say that the only fix to this problem is allowing only US-ASCII in the domain name system and other Internet core services.

The authors of RFC 3490 - Internationalizing Domain Names in Applications (IDNA) itself were way ahead of you. Your proposed "fix" was incorporated from the start!

See the first paragraph of the introduction:


IDNA works by allowing applications to use certain ASCII name labels (beginning with a special prefix) to represent non-ASCII name labels. Lower-layer protocols need not be aware of this; therefore IDNA does not depend on changes to any infrastructure. In particular, IDNA does not depend on any changes to DNS servers, resolvers, or protocol elements, because the ASCII name service provided by the existing DNS is entirely sufficient for IDNA.
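
To make the quoted paragraph concrete, here is a small Python sketch of the ASCII form that DNS actually sees. It uses the standard library's punycode codec and adds the "xn--" prefix by hand; it skips the Nameprep normalization that full IDNA performs, and the helper names are made up for this example.

```python
ACE_PREFIX = "xn--"  # the "special prefix" that marks an encoded label

def to_ace(label):
    """Encode one label to its ASCII form (no Nameprep or other normalization)."""
    if label.isascii():
        return label
    return ACE_PREFIX + label.encode("punycode").decode("ascii")

def from_ace(label):
    """Decode one "xn--" label back to Unicode; other labels pass through."""
    if label.lower().startswith(ACE_PREFIX):
        return label[len(ACE_PREFIX):].encode("ascii").decode("punycode")
    return label

spoof = "\u0430\u0440\u0440\u04cf\u0435"  # all-Cyrillic lookalike of "apple"
print(to_ace(spoof))     # xn--80ak6aa92e
print(to_ace("apple"))   # apple
print(from_ace("xn--80ak6aa92e") == spoof)  # True
```

The two names are entirely different strings at the DNS level; the confusion only arises when an application converts the ASCII form back to Unicode for display.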



Schneier on Security is a personal website. Opinions expressed are not necessarily those of IBM Resilient.