Bypassing Intel's ASLR

Researchers discover a clever attack that bypasses the address space layout randomization (ASLR) on Intel’s CPUs.

Here’s the paper. It discusses several possible mitigation techniques.

Tags: academic papers, Intel, malware, mitigation, side-channel attacks

Posted on October 19, 2016 at 2:19 PM • 15 Comments

Comments

Daniel • October 19, 2016 6:23 PM

I would like to point out that while the paper discusses mitigations, they are not ones the end user can deploy. They require significant effort on the part of software developers, kernel developers, or the hardware vendor. So don’t expect a fix to arrive in an update tomorrow.

Clive Robinson • October 19, 2016 8:03 PM

I’ll need to ponder more on it but I suspect that there are further ways to improve on this attack, such that some of the mitigations given will not be as effective.

ab praeceptis • October 19, 2016 8:56 PM

Clive Robinson

…I suspect that there are further ways to improve on this attack…

I agree without so much as a closer look at the matter.

Simple reason: One of the few things everyone seriously engaged in or concerned with the field of security should have learned is that security can’t be created from afterthoughts that lead to hodgepodge, grafted-on fixes.

Security requires a profound understanding of all relevant factors and implications a priori.

One usually can extend security measures, iff the above requirement is met, i.e. if there is a solid basis.

But I see the same sins in hw that I see in software. “How do we get functionality X or performance Y?” (almost always) are sales or marketing driven approaches. I don’t attack them; companies, after all, need happy customers.

The conditio sine qua non of security (or a solid product anyway) in our highly complex field is fully understanding the problem matter and properly, fully and formally defining it and then the solution approach.

I want to mention though, for the sake of fairness, that I happen to know that Intel did understand and learn a lot in the meantime; they have some quite high-class people around (I happened to gain quite some insight through one of them).
In a way Intel is as f*cked by x86 as we are. With billions of boards and chips out there, their room to maneuver is very limited.

So, my point is not beating on Intel but rather a simple question: Did we really learn our lesson from what we’ve seen happen with x86? Do we really properly define the premises and our approaches? Did we then really fully model them (and I don’t mean “will it work?” modelling but “what could be potential problems?” modelling)? I’d love to see a major chip corp that worked well enough never to end up in Intel’s x86 situation, but I’m not holding my breath.

You want a concrete example? OK. Remember the x thousand cores chip recently? Almost everyone danced in circles and seemed to see mainly one thing: “rrrraaw speed!”. I hinted that I doubted the impressive numbers for their buses and also wondered what little bandwidth was left for a single core. But hey, I didn’t want to ruin the fun.

Let’s look again: a thousand cores or even x thousand cores? How about DOS attacking that thing? If China happened to have strategic military systems or core infrastructure running on those chips and we happened to become enemies, I’d certainly look into cardiac arresting their infrastructure by DOSing their miracle speed demon processors.

So: Did they model their buses looking at more than “will it somehow work?”? I strongly doubt that. DOS works against boxen, it works against software, betcha it works against processors, too.
And how does that bus system behave under brutal stress? Has that been modelled? How does it behave when someone introduces spikes? Will it spill its guts, timings, and other sensitive information? Etc, etc.

Math is our friend. We should ignore him a lot less and rather gladly take his friendly, outstretched, helpful hand.

Nicola • October 20, 2016 5:39 AM

Very good Clive, well said.
I didn’t have the time to read the paper properly, but do you think this specific attack will only work on Haswell, or can it be tuned for other modern architectures (Skylake, Kaby Lake, Zen)?

r • October 20, 2016 7:09 AM

To anyone in the comments: the paper is fairly short. It hints that all branch prediction implementations are potentially vulnerable in this respect, essentially through cache-style attacks. One limitation is that the guesstimation attack has to be performed on the same core as the victim, which is also a potential mitigation for people who use virtualization.

Clive is right, there may be more problems in this area – the paper references several other papers performing cache and branch prediction style attacks.

ATS • October 20, 2016 12:27 PM

The attack described is pretty much guaranteed to work on any processor that uses a BTB (branch target buffer), which is basically any processor you’d actually want to use (basically everything that ARM, Intel, AMD, IBM, or Oracle will sell you). It is also unlikely that a hardware solution will be viable, because not only is a BTB performance critical, its indexing is performance critical, and it generally sits on a critical path.
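To make the BTB point concrete, here is a toy model in Python. It is a sketch only, not the paper's code. It assumes a buffer indexed by only the low address bits, so a probe at one address can land in the same slot as a victim branch at a different full address. `INDEX_BITS`, `KNOWN_OFFSET`, and the `ToyBTB` class are all hypothetical, and `probe` stands in for the timing measurement a real attack would need.

```python
PAGE_BITS = 12        # 4 KiB pages: page-aligned bases leave these bits zero
INDEX_BITS = 20       # assumption: the toy BTB indexes on the low 20 bits
KNOWN_OFFSET = 0x4A0  # hypothetical offset of the victim branch in its image
INDEX_MASK = (1 << INDEX_BITS) - 1


class ToyBTB:
    """Per-core buffer that remembers which slots hold a branch entry."""

    def __init__(self):
        self.slots = set()

    def victim_branch(self, address):
        # The victim executes a taken branch; only the low bits pick the slot.
        self.slots.add(address & INDEX_MASK)

    def probe(self, address):
        # Stand-in for timing the attacker's own branch at this address:
        # True models "a collision with an existing entry was observed".
        return (address & INDEX_MASK) in self.slots


def recover_partial_base(btb, offset):
    """Return every page-aligned partial base whose probe collides."""
    candidates = range(0, 1 << INDEX_BITS, 1 << PAGE_BITS)
    return [base for base in candidates if btb.probe(base + offset)]


if __name__ == "__main__":
    secret_base = 0x7F3A5C2D1000  # made-up randomized load address
    btb = ToyBTB()
    btb.victim_branch(secret_base + KNOWN_OFFSET)
    found = recover_partial_base(btb, KNOWN_OFFSET)
    print([hex(b) for b in found], hex(secret_base & INDEX_MASK))
```

In this toy, 256 probes are enough to recover the randomized bits that fall inside the index; the bits above `INDEX_BITS` stay unknown. Because the buffer is modelled as per-core state, the probing code would have to share a core with the victim.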

Realistically, the root of the problem is ASLR. ASLR is basically a bandaid trying to cover other software sins.

AJWM • October 20, 2016 12:47 PM

Realistically, the root of the problem is ASLR. ASLR is basically a bandaid trying to cover other software sins.

This. A thousand times, this.

The problem stems from treating data and code interchangeably. Computers don’t have to do that. Tagged-word architecture (where hardware tag bits define whether the word is code or data, and if the latter, what kind of data) goes back nearly 50 years to the Burroughs 5000 series. Discrete address spaces (where a signal line indicates whether the CPU is fetching an instruction or fetching data) were present in even some early microprocessors.

Now, to be sure, at some level the machine needs to be able to somehow convert that which was data to that which is code (the Burroughs systems extended this distinction to the filesystem, where a file was typed as code or data, only a designated compiler could convert data to code, and only the OS could so-designate a compiler, although inevitably there was a way around that involved manipulating backup tapes on non-Burroughs hardware), but if you’re paranoid enough that will require manual intervention.

As long as processors permit op-code fetches from arbitrarily writable memory, the problem won’t go away.
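A minimal Python sketch of the tagged-word idea, under loudly stated assumptions: the `TaggedMemory` class, its method names, and its rules are invented for illustration and do not describe the actual Burroughs design. Every word carries a tag, ordinary stores can only create data words, and an instruction fetch from a data word is refused.

```python
from dataclasses import dataclass
from enum import Enum


class Tag(Enum):
    CODE = "code"
    DATA = "data"


@dataclass(frozen=True)
class Word:
    tag: Tag
    value: int


class TagViolation(Exception):
    """Raised when an access does not match the word's tag."""


class TaggedMemory:
    def __init__(self, size):
        # Every word starts out as a zeroed DATA word.
        self._words = [Word(Tag.DATA, 0)] * size

    def load_program(self, start, opcodes):
        # Hypothetical privileged loader: the only path that creates CODE.
        for i, opcode in enumerate(opcodes):
            self._words[start + i] = Word(Tag.CODE, opcode)

    def store(self, address, value):
        # Ordinary store: always writes DATA and may not overwrite CODE.
        if self._words[address].tag is Tag.CODE:
            raise TagViolation(f"store to code word at {address}")
        self._words[address] = Word(Tag.DATA, value)

    def fetch_instruction(self, address):
        # Instruction fetch: refuses anything not tagged CODE.
        word = self._words[address]
        if word.tag is not Tag.CODE:
            raise TagViolation(f"instruction fetch from data word at {address}")
        return word.value
```

With this model, bytes written through `store` can never be fetched as instructions; only `load_program` (the stand-in for a trusted compiler or loader) can turn values into code.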

(And that’s why my downloads directory is on a partition mounted non-executable, although that’s hardly perfect.)

Clive Robinson • October 20, 2016 5:02 PM

@ AJWM,

The problem stems from treating data and code interchangeably. Computers don’t have to do that.

Actually, with most OSes they have to.

It was once thought that the Harvard architecture would prevent the “treating data as code” problem but it only does it at one level.

Assume that your code is actually an interpreter which is Turing complete and its “tape” is the data, and you can see what the problem is. All you’ve done is move the problem up a fraction in the computing stack.
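A small Python sketch of that point, with an invented instruction set (`set`, `add`, `dec`, `jnz`) and a hypothetical `run` function. The interpreter itself is fixed code; the "tape" is an ordinary list that would sit in data memory, yet it decides what gets computed, loops and conditionals included.

```python
def run(tape, max_steps=10_000):
    """Interpret a list of tuples held purely as data; return the registers."""
    registers = {}
    pc = 0
    steps = 0
    while pc < len(tape):
        steps += 1
        if steps > max_steps:
            raise RuntimeError("step limit exceeded")
        op, *args = tape[pc]
        pc += 1
        if op == "set":      # set register to a constant
            registers[args[0]] = args[1]
        elif op == "add":    # dst += src
            registers[args[0]] += registers[args[1]]
        elif op == "dec":    # register -= 1
            registers[args[0]] -= 1
        elif op == "jnz":    # jump to target if register is non-zero
            if registers[args[0]] != 0:
                pc = args[1]
        else:
            raise ValueError(f"unknown op {op!r}")
    return registers


# This "program" is only data as far as the hardware is concerned.
multiply_tape = [
    ("set", "acc", 0),
    ("set", "n", 4),
    ("set", "k", 3),
    ("add", "acc", "k"),
    ("dec", "n"),
    ("jnz", "n", 3),
]
```

Running `run(multiply_tape)` leaves 12 in `acc`. Whoever controls the list controls the computation, regardless of any code/data separation enforced one level down.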

But…you can also move it downwards below the CPU ISA level. It has been found that due to the complexity of the IAx86 bus architecture, a “phantom” Turing engine exists in the bus control logic…

Solving these problems is extremely costly, not just in terms of hardware but also in technical debt on prevention down the line… which to all intents and purposes can only increase with time.

As has been observed on a number of occasions ‘Nobody said it’s easy…’.

r • October 20, 2016 9:05 PM

@All,

What @AJWM said is the same gist I got from the paper, specifically the ARM assertion he made. This particular paper is limited to rdtsc (0x0f 0x31), making it platform dependent. But the timing attack and guesstimation should work on any platform where fine-grained time can be inferred or measured. They make a specific point of illustrating that their payload is ‘heavily’ optimized (for size), as a previous paper attacked a different cache that worked across cores and used much more complex code to smash through the obfuscation of ASLR.

It’s funny to me that we’ve been sitting on rdtsc measurement blocks for years (10+) and nobody ever publicly noticed this. It’s very, very small, and until this paper came out, as far as I’m concerned, this would’ve been a very benign-looking piece of software.
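For readers who have not seen one, a measurement block is roughly the following shape. This is a Python sketch with a swapped-in timer: Python cannot issue rdtsc directly, so `time.perf_counter_ns` is used instead, which is far coarser than a cycle counter. The function names and the threshold are assumptions for illustration.

```python
import statistics
import time


def time_once(operation):
    """Time a single call to operation, in nanoseconds."""
    start = time.perf_counter_ns()
    operation()
    return time.perf_counter_ns() - start


def sample(operation, repeats=101):
    """Collect repeated timings so that one noisy run does not decide."""
    return [time_once(operation) for _ in range(repeats)]


def looks_slow(samples, threshold_ns):
    """Classify by median, which is less sensitive to outliers than the mean."""
    return statistics.median(samples) > threshold_ns
```

The whole thing is a start timestamp, the operation, an end timestamp, and a comparison, which is why such code looks harmless in isolation.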

Curious • October 21, 2016 1:12 AM

I am ofc no expert on this subject matter, but I hope ASLR wasn’t some gimmick with no deeper meaning, as if it could have been more for testing purposes, or as some half thought out security measure. I am assuming ofc that ASLR was intended as an effective security feature.

.Clive Robinson • October 21, 2016 6:53 AM

@ Curious,

I am assuming ofc that ASLR was intended as an effective security feature.

That depends on your viewpoint…

ASLR was in effect a limited response to a specific type of problem, Return Oriented Programming (ROP) techniques: you change a call return (RTN) in memory to a jump to another address (JMP 0x… ) to carry out a return-to-libc attack, etc. Changing the RTN can be done in a number of ways, like a buffer overflow or busting the stack.

As a result ASLR was a lot of effort for limited results, and requires other supporting techniques, such as Data Execution Prevention (DEP). DEP prevents certain memory sectors, like the stack, from being treated as executable code…

The idea behind ASLR was that, because malware authors know the “fixed” addresses in process memory of data structures (like a buffer in the heap or the stack), library calls, etc., they can use them to their advantage. To limit this advantage you stop using “fixed” addresses and randomise the address of the calls etc. for each process, so that they are at “random” addresses that are, “in theory”, unknown to a malware author…
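The idea can be sketched as a small Python simulation. It is a toy under stated assumptions: the base address `0x400000`, the offset `0x1234`, and the 8 bits of entropy used in the note below are made-up values, and real loaders are more involved. An exploit with a hardcoded address always hits a fixed layout, and hits a randomized one only by luck.

```python
import random

PAGE_SIZE = 4096
FIXED_BASE = 0x400000   # made-up default load address
TARGET_OFFSET = 0x1234  # hypothetical offset of the function the attacker wants


def load_base(randomize, entropy_bits, rng):
    """Pick the load address for one process start."""
    if not randomize:
        return FIXED_BASE
    # Slide the image by a random number of pages.
    return FIXED_BASE + rng.getrandbits(entropy_bits) * PAGE_SIZE


def exploit_success_rate(randomize, entropy_bits, trials, seed=0):
    """Fraction of process starts where a hardcoded address is correct."""
    rng = random.Random(seed)
    hardcoded = FIXED_BASE + TARGET_OFFSET  # address baked into the exploit
    hits = 0
    for _ in range(trials):
        actual = load_base(randomize, entropy_bits, rng) + TARGET_OFFSET
        if actual == hardcoded:
            hits += 1
    return hits / trials
```

With randomization off the rate is 1.0; with 8 bits of entropy a blind guess is right about once in 256 starts. That is the "in theory" part: it holds only while the attacker has no other way to learn the address.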

There are a number of assumptions behind the theory, which do not translate well or at all in practice or cause other issues such as significantly increasing the Virtual Memory space for each process.

Thus malware writers can and do use other or additional techniques to get at stack, heap and thus libc etc.

Have a read of either (or both),

http://security.stackexchange.com/questions/18556/how-do-aslr-and-dep-work

https://en.m.wikipedia.org/wiki/Address_space_layout_randomization

for a more in-depth description.

May • October 21, 2016 10:16 AM

Wow I like this “.Clive” (dotClive) even more than regular Clive 🙂

Howard Chu • October 22, 2016 6:55 AM

A better solution to the problem ASLR tried to address would be a new ABI that uses 2 separate stacks, one for function parameters and a separate one for return addresses. Then a data overrun/stack-smashing attempt can’t affect the call-return stack.
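A toy Python comparison of the two layouts, assuming a deliberately simplified model: one buffer, one return address, and an unchecked copy. The function names are hypothetical and no real ABI is being described.

```python
def unchecked_copy(memory, start, values):
    """Copy with no check against the buffer size, only the memory size."""
    for offset, value in enumerate(values):
        if start + offset < len(memory):
            memory[start + offset] = value


def call_shared_stack(return_address, buffer_size, user_input):
    """Conventional layout: [buffer ...][return address] on one stack."""
    stack = [0] * buffer_size + [return_address]
    unchecked_copy(stack, 0, user_input)
    return stack[buffer_size]  # where the function will "return" to


def call_split_stacks(return_address, buffer_size, user_input):
    """Split layout: data stack holds the buffer; return stack is separate."""
    data_stack = [0] * buffer_size + [0]  # buffer plus one caller local
    return_stack = [return_address]
    unchecked_copy(data_stack, 0, user_input)  # can only reach data_stack
    return return_stack[-1]
```

Feeding both a payload of `[0x41] * 4 + [0xBAD]` with a 4-word buffer redirects the shared-stack call to `0xBAD`, while the split-stack call still returns to the original address. The overrun still clobbers the caller's local on the data stack, so data corruption is not prevented.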

Clive Robinson • October 22, 2016 11:00 AM

@ Howard Chu,

A better solution to the problem ASLR tried to address would be a new ABI that uses 2 separate stacks

It would also help keep things considerably more efficient, but I’m not sure how much more secure it would be.

As Nick P points out from time to time, strongly typed languages would help as well.

The simple fact is there are too many programmers NOT doing the things that would reduce the attack surface. So much so, in fact, that the option to not be secure should be taken away from them at all levels…

TJ • October 24, 2016 9:37 PM

NX, ASLR, Heap Cookies, RET/stack cookies, MPX, SGX were all defeated without side channels within weeks and months. Write-back hashing is the only undefeated protection and it’s only in a couple embedded systems like the xbox 360 and requires handler modification else it blocks everything from stack execution to page table glitching.

They all sell like anti-virus subscriptions, though, even though they are obviously limited.
