Exploiting Spectre Over the Internet

Google has demonstrated exploiting the Spectre CPU attack remotely over the web:

Today, we’re sharing proof-of-concept (PoC) code that confirms the practicality of Spectre exploits against JavaScript engines. We use Google Chrome to demonstrate our attack, but these issues are not specific to Chrome, and we expect that other modern browsers are similarly vulnerable to this exploitation vector. We have developed an interactive demonstration of the attack available at https://leaky.page/; the code and a more detailed writeup are published on GitHub here.

The demonstration website can leak data at a speed of 1kB/s when running on Chrome 88 on an Intel Skylake CPU. Note that the code will likely require minor modifications to apply to other CPUs or browser versions; however, in our tests the attack was successful on several other processors, including the Apple M1 ARM CPU, without any major changes.

Tags: exploits, Google, vulnerabilities

Posted on March 18, 2021 at 6:17 AM • 19 Comments

Comments

metaschima • March 18, 2021 10:02 AM

It seems to work on my Intel i5 laptop, but fails on my Android phone. I suppose ARM processors are more resistant to this attack? Although I see the Apple M1 ARM is susceptible.

Clive Robinson • March 18, 2021 4:47 PM

@ ALL,

Am I the only one thinking,

“Why has it taken so long to come up with an exploit”?

Given that all time-based side channels can be exploited in one way or another to leak either secrets or the identity of the system or user, you would have thought that would have encouraged students and similar to develop PoC code and publish a paper.

As a timeline, you would think the first big example, using the cache to leak the AES key across the network just a couple of weeks after the finalist was finalised, would have woken people up a bit.

Then the research involving identifying that different host interfaces in a system by TCP timing and the system clock drift. Again you would have thought would also wake people up.

We’ve been “Sleepwalking” into this problem and to be honest I get the feeling the industry has become almost entirely apathetic about it.

After all, it’s not a problem that can be solved by an application, because the root cause is way further down the computing stack than the CPU layer, let alone its ISA layer.

Like all “bubbling up attacks” it cannot be solved by top-down coding; it needs appropriate hardware in place. But Intel have made it brutally clear they care not a jot for real secure hardware design, because it would make those “oh so important specs look bad”.

As I’ve indicated for more than a decade or so, these hardware security issues and their solutions have been known for many decades, and they were not that difficult to implement even back in the 1960s.

Weather • March 18, 2021 5:09 PM

One program I wrote on x86 used an asm instruction to read the CPU’s clock tick counter. Maybe if you read the clock, did a ror 24, then read the clock again, it would tell you what point an AES operation was at; if it was quick, you would know there weren’t many LSBs. But people don’t have the skills, Clive.

SpaceLifeForm • March 18, 2021 5:15 PM

@ metaschima, kai, Clive

Silicon Turtles

READ. LEARN. UNDERSTAND.

Note that the code will likely require minor modifications to apply to other CPUs or browser versions

Weather • March 19, 2021 1:27 AM

Windows allocates the seq/ack numbers in the TCP header using the clock instruction; maybe that is a leakage.

Weather • March 19, 2021 2:09 AM

Unsigned int vedx,veax
Asm(“rdtsc”)
Asm(“mov vedx edx”)
Asm(“mov veax eax”)
Asm(“ror esi 24”)
Asm(“rdtsc”)
Asm(“mov vedx1 edx”)
Asm(“mov veax1 eax”)
Printf(“%8x%8x”,vedx1-vedx,veax1-veax)

SpaceLifeForm • March 19, 2021 3:05 AM

@ Weather

Better than leakage. MITM. Watch for 429.

https://www.thegeekstuff.com/2012/01/tcp-sequence-number-attacks/

xcv • March 19, 2021 3:07 AM

@ Weather • March 19, 2021 2:09 AM

Unsigned int vedx,veax
Asm(“rdtsc”)
Asm(“mov vedx edx”)
Asm(“mov veax eax”)
Asm(“ror esi 24”)
Asm(“rdtsc”)
Asm(“mov vedx1 edx”)
Asm(“mov veax1 eax”)
Printf(“%8x%8x”,vedx1-vedx,veax1-veax)

Read Time Stamp Counter.
Move the results out to memory.
Do a rotate instruction on an unrelated (?) register.
Read Time Stamp Counter again.
Move the results out to memory, assuming variables vedx1 and veax1 have been declared.
Subtract and print out the number of clock cycles required to execute one rdtsc instruction, two double word writes to memory, and one ror instruction.

Of course the moves to memory, (assuming Intel notation,) do not have to be completed in order for the rdtsc instruction to be executed for the second time.

The moves to memory only have to be completed in time for a subsequent fetch to compute the arguments in preparation for the call to printf; if not, the processor’s pipeline will stall, and there will be a few clock cycles’ delay; if there is a memory cache miss, there will be a much longer delay until the cache is flushed and written out to main memory; but that only delays the call to printf, not the computation of the time delay between the two executions of rdtsc.
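
A minimal sketch of the measurement walked through above, assuming GCC or Clang and the `__rdtsc()` intrinsic from `<x86intrin.h>` on x86; on other targets it falls back to C11 `timespec_get`, which counts nanoseconds rather than cycles. The names `read_ticks` and `time_rotate_once` are made up for illustration. It subtracts the two readings as single 64-bit values, because subtracting the edx and eax halves separately, as the quoted snippet does, loses the borrow whenever the low half wraps between the two reads.

```c
#include <stdint.h>
#include <time.h>

#if defined(__x86_64__) || defined(__i386__)
#include <x86intrin.h> /* GCC/Clang header that provides __rdtsc() */
/* Read the 64-bit time stamp counter (edx:eax already combined). */
static inline uint64_t read_ticks(void) { return __rdtsc(); }
#else
/* Assumed fallback for non-x86 targets: wall-clock nanoseconds. */
static inline uint64_t read_ticks(void) {
    struct timespec ts;
    timespec_get(&ts, TIME_UTC);
    return (uint64_t)ts.tv_sec * 1000000000ull + (uint64_t)ts.tv_nsec;
}
#endif

/* Time one rotate-right-by-24 between two counter reads.
 * The rotated value is written to *rotated; the tick difference is returned. */
uint64_t time_rotate_once(uint32_t value, uint32_t *rotated) {
    /* volatile stops the compiler from deleting the load, rotate, and store;
     * it does not stop the CPU from executing them out of order. */
    volatile uint32_t sink = value;
    uint64_t start = read_ticks();
    uint32_t v = sink;
    sink = (v >> 24) | (v << 8); /* ror 24 on a 32-bit value */
    uint64_t end = read_ticks();
    *rotated = sink;
    return end - start; /* one 64-bit subtraction, so the borrow is handled */
}
```

Calling `time_rotate_once(0x12345678u, &r)` leaves `r` at `0x34567812`. The returned count includes the cost of the second counter read, and since `rdtsc` is not a serializing instruction the CPU may still reorder work around it, which is the point made above about the stores not having to complete first.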

SpaceLifeForm • March 19, 2021 3:36 AM

@ xcv, Weather

“if not, the processor’s pipeline will stall, and there will be a few clock cycles’ delay; if there is a memory cache miss, there will be a much longer delay until the cache is flushed and written out to main memory; ”

You just do what Weather coded, twice, back to back. Everything should be in cache on the second pass.

It is extremely doubtful that would not be the case. Extremely doubtful.

I seriously can not envision a scenario where I could bury a core and interrupt it enough to cause cache misses every time.

In theory, yes, I could with a carefully planned workload. As an attacker? They would not.

Clive Robinson • March 19, 2021 5:27 AM

@ ALL,

With regard to my earlier comment of,

“Then the research involving identifying that different host interfaces in a system by TCP timing and the system clock drift.”

This first came up on this blog back in early March 2005,

https://www.schneier.com/blog/archives/2005/03/remote_physical.html/#comment-3050

Then at the beginning of September 2006, a year and a half later, Steven J. Murdoch over at the Cambridge Computer Labs came up with,

“Hot or Not: Revealing Hidden Services by their Clock Skew”,

https://www.lightbluetouchpaper.org/2006/09/04/hot-or-not-revealing-hidden-services-by-their-clock-skew/

Whilst Steven used it to demonstrate finding services hidden behind Tor, I’d been thinking about what was, to me, a more interesting attack.

If you are a high-end researcher into zero-day vulnerabilities you will know how valuable they can be in terms of the time and other resources invested in finding new ones. You will also know that one of the ways such zero-day attacks became “Known and Negated” back then was “Honey Pots”.

So as an attacker how do you find out if the IP address range you are just about to go and “tap up” is actually real individual machines, or just one machine pretending to be a network of machines and is in fact a “Honey Pot” part of the “Honeynet Project” aimed at capturing things like freshly minted zero-days?

Well the problem for a “honey pot” is no matter how many hosts it fakes it still has only one master CPU clock… Thus measuring the clock drift is one good way to do it.
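
As a rough illustration of the clock-drift comparison described above: if each probe yields a pair (local receive time, remote clock reading), the remote clock’s skew is the slope of remote against local, minus one. The sketch below estimates that slope with an ordinary least-squares fit, which is an assumption made for brevity and not necessarily the estimator used in the research mentioned here. `skew_ppm`, `same_clock`, and the tolerance parameter are hypothetical names.

```c
#include <stddef.h>

/* Estimate remote clock skew in parts per million from n samples.
 * local_s[i]  : local time (seconds) when probe i was answered
 * remote_s[i] : remote clock reading (seconds) carried in that answer
 * Returns 0.0 if there are too few samples or no spread in local time. */
double skew_ppm(const double *local_s, const double *remote_s, size_t n) {
    if (n < 2) return 0.0;

    double mean_x = 0.0, mean_y = 0.0;
    for (size_t i = 0; i < n; i++) {
        mean_x += local_s[i];
        mean_y += remote_s[i];
    }
    mean_x /= (double)n;
    mean_y /= (double)n;

    double sxx = 0.0, sxy = 0.0;
    for (size_t i = 0; i < n; i++) {
        double dx = local_s[i] - mean_x;
        sxx += dx * dx;
        sxy += dx * (remote_s[i] - mean_y);
    }
    if (sxx == 0.0) return 0.0;

    /* Least-squares slope is sxy / sxx; a perfect clock has slope 1. */
    return (sxy / sxx - 1.0) * 1e6;
}

/* Treat two hosts as sharing one physical clock if their skews agree
 * within tolerance_ppm (a value the caller has to choose). */
int same_clock(double skew_a_ppm, double skew_b_ppm, double tolerance_ppm) {
    double diff = skew_a_ppm - skew_b_ppm;
    if (diff < 0.0) diff = -diff;
    return diff <= tolerance_ppm;
}
```

Two addresses whose estimated skews agree within measurement noise are candidates for sharing one master clock. A sensible tolerance depends on probe count and network jitter, so it is left as a parameter.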

However, Steve Murdoch’s original system was both very noisy and inefficient. He used many, many probes across a given time period, which would show up in the Honey Pot logs. Also, as any comms engineer who knows what an “eye diagram” is will tell you, all the timing information is in the clock edges, which translates to when the TCP time stamp rolls over in the attack. So any probes not at that time are wasted, as they provide no useful information (outside of initial syncing) and fill the logs up.

So using a two-stage approach, first “halve the difference” to obtain sync, then just a few probes across the time of the expected TCP time stamp transition, will give you an effective tracking loop (as used in many Spread Spectrum and data receivers).

This still will leave traces in the Honey Pot log though… How to deal with that?

Well, you cannot stop them getting logged, so make them hide in amongst “expected traffic” and make the Honey Pot operators’ assumptions work against them. In short, a little bit of “social engineering”. The way I chose to prototype this was to make the attack look like “A brain dead Script Kiddy Attack”, which were very numerous back then.

The result is that the chances were the Honey Pot logs, like today’s IDS log-watching processes, would “note, classify, filter and ignore”. Thus as an attacker with a nice shiny new zero-day you would find the Honey Pot and thus ignore it, so your zero-day would be like the butterfly that escapes the lepidopterist’s net, formalin jar, and being pinned to a cork on public display.

But there is another use for the time stamp attack, which brings us back to cache attacks.

Mad as it might first appear from a security perspective, people share computers in data centers to reduce the costs of running an Internet-connected service. Most such shared services are just like the Honey Pot machines, but rented out from server farms and some Cloud Servers…

Many cache attacks require a probe on the machine from which they are trying to detect secrets, such as AES keys, leaking via time-based side channels.

Using a TCP Time Stamp attack would give information about what other services share the machine of interest, thus increasing your chances of finding a way to mount a probe.

All of this was either known or easily worked out more than a decade and a half ago.

In times past I used to find that attacks I thought up would take about eight years to be found in the wild…

So “Why So Long?” this time…

Is it because the entire ICTsec industry has slowed down? Become ineffectual? Lacks investment? Has become filled with snake oil salesmen?

I have commented on a number of occasions that in ICTsec “We fail to learn from our history” with attack methods and even attacks getting “re-cycled” every few years, so that “What was old is new again” like skinny leg jeans.

I’ve also noted that due to the failures of the big corporates our consumer-grade hardware, OSs and apps have more holes than “moth eaten string underpants”, and in the process of letting everyone down they make the entire LAN, WAN, and Internet-connected structure a very, very “target rich environment” for attackers.

Thus is it just that modern attackers are so few in comparison that they are in effect “living off the fat of the land” with low hanging fruit and windfalls, and thus have no real need to strive? To innovate? To learn, grow and evolve?

Weather • March 19, 2021 5:40 AM

@Xcv slf
The esi was an example of another core doing an instruction, and the CPU not doing the other core’s instruction again if it is the same.
The mov plus ror is the number of cycles the CPU takes to process it. It should be 3, I think?

SpaceLifeForm • March 19, 2021 5:46 PM

@ Weather

I do not see another core in use. Another ALU yes.

As to the cycles between the clock reads, it may be near 1.

The two MOVs possibly may be done together in one cycle.

https://www.agner.org/optimize/#manual_instr_tab

The number of cycles to read the clock is the heavy load.

Weather • March 19, 2021 6:24 PM

@slf
Assuming the compiler doesn’t change it to movd etc.
If you have two programs that share the L2 cache on the CPU, say one program is just a loop of rotates and the second program runs the same instruction that happens to match the same value, whether the program is running or not… It doesn’t matter how long rdtsc takes, and you only get one arithmetic unit per core, as far as I remember?

Weather • March 19, 2021 7:10 PM

@xcv slf
When my computer is free I’ll write a PoC, as I’m having trouble explaining it.

SpaceLifeForm • March 19, 2021 7:41 PM

@ Weather

ALUs per core vary. Newer processors may have more than two. But your program is stuck in one core.

Just as an experiment, try doing your clock reads and saves THREE times in a row.
Then FOUR times, and ignore the first result. printf the results after finishing all of the clock reads. See if you notice anything. Next, duplicate the number of MOVs at each step. I.E., make them redundant. Try tripling the redundant MOVs. Maybe more. Keep them alternating between eax and edx.

With enough effort, you will be able to discern a lot as to the cycles per instruction.

You should be able to figure out the cycles for the MOVs, and the RDTSC instruction for the machine that you are testing on. Very accurately with enough testing.

https://stackoverflow.com/questions/58893726/do-single-threaded-programs-execute-in-parallel-in-a-cpu
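
The experiment suggested above can be sketched roughly as follows, again assuming GCC or Clang and `__rdtsc()` on x86, with a C11 `timespec_get` fallback elsewhere. `record_gaps` and `min_warm_gap` are hypothetical names. The sketch takes several back-to-back clock reads, stores the results instead of printing inside the loop, skips the first (cold) pass, and reports the smallest remaining gap. Taking the minimum is one common way to filter out interrupts and other noise; it is an assumption here, not something the thread prescribes, and the redundant MOVs are left out.

```c
#include <stddef.h>
#include <stdint.h>
#include <time.h>

#if defined(__x86_64__) || defined(__i386__)
#include <x86intrin.h> /* GCC/Clang header that provides __rdtsc() */
static inline uint64_t read_ticks(void) { return __rdtsc(); }
#else
/* Assumed fallback for non-x86 targets: wall-clock nanoseconds. */
static inline uint64_t read_ticks(void) {
    struct timespec ts;
    timespec_get(&ts, TIME_UTC);
    return (uint64_t)ts.tv_sec * 1000000000ull + (uint64_t)ts.tv_nsec;
}
#endif

/* Fill gaps[0..runs-1] with the tick difference of two back-to-back reads.
 * Nothing is printed here, so I/O does not disturb the measurements. */
void record_gaps(uint64_t *gaps, size_t runs) {
    for (size_t i = 0; i < runs; i++) {
        uint64_t start = read_ticks();
        uint64_t end = read_ticks();
        gaps[i] = end - start;
    }
}

/* Smallest recorded gap, skipping gaps[0] because the first pass may run
 * with cold caches. Returns 0 if fewer than two runs were recorded. */
uint64_t min_warm_gap(const uint64_t *gaps, size_t runs) {
    if (runs < 2) return 0;
    uint64_t best = gaps[1];
    for (size_t i = 2; i < runs; i++) {
        if (gaps[i] < best) best = gaps[i];
    }
    return best;
}
```

A caller would run `record_gaps(g, 8)`, then print every entry of `g` alongside `min_warm_gap(g, 8)`. Repeating this with extra instructions placed between the two reads gives a difference from which the cost of those instructions can be estimated.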

TRX • March 20, 2021 11:45 AM