Another Spectre-Like CPU Vulnerability

Google and Microsoft researchers have disclosed another Spectre-like CPU side-channel vulnerability, called "Speculative Store Bypass." Like the others, the fix will slow the CPU down.

The German tech site Heise reports that more are coming.

I'm not surprised. Writing about Spectre and Meltdown in January, I predicted that we'll be seeing a lot more of these sorts of vulnerabilities.

Spectre and Meltdown are pretty catastrophic vulnerabilities, but they only affect the confidentiality of data. Now that they -- and the research into the Intel ME vulnerability -- have shown researchers where to look, more is coming -- and what they'll find will be worse than either Spectre or Meltdown.

I still predict that we'll be seeing lots more of these in the coming months and years, as we learn more about this class of vulnerabilities.

Posted on May 22, 2018 at 9:38 AM • 53 Comments

Comments

Lisa • May 22, 2018 10:45 AM

The question is, when if ever will it be possible to purchase x64 processors for PCs, which do not have any pipelining, speculative execution, or lack of cache & stack isolation?

Some of us would be willing to accept a significant slowdown in CPU performance in exchange for better security.

Intel typically prevents users from even installing microcode patches directly from them, forcing users to go through their OEM vendors, which do not all make those patches available.

Worse yet, we may be 1-2 years away from Intel starting to release hardware fixes, in its Ice Lake and newer lines.

SomethingRandom • May 22, 2018 10:59 AM

There will probably always be side channel attacks for just about any conventional hardware.

What this actually highlights is the lack of encryption of application data. One company I worked at many moons ago thought that XOR-ing passwords (to mainframes!) was good practice. Talk about an easy known-plaintext attack! In the published paper for the exploit, it shows passwords in plain text. If good practices were followed, then the passwords would be encrypted.

A.H. • May 22, 2018 11:39 AM

@Lisa, if you want a processor without pipelining, speculative execution, or cache, you should be looking at a different class of processors entirely. And I don't know what you mean by "significant slowdown", but I doubt you'd be willing to accept the less than 1% of current performance you'd get from such a processor. On the bright side, it would also consume significantly less power. You won't be able to run an e-mail client, but you will be unable to do so while consuming an astonishingly small amount of energy.

Who? • May 22, 2018 12:00 PM

It is known that there are more of these vulnerabilities coming. CVE-2018-3639 and CVE-2018-3640 are just the first of a set of eight new vulnerabilities that will be announced in the coming months. The worst one, to be announced in August, will make it possible to exploit a virtual machine to attack its host system.

These eight Spectre variants were reported to Intel in February, so it took only two months to discover them once the first ones were announced. Intel may have more Spectre-like vulnerabilities under investigation right now.

In our new "Dickensian with a nerd touch" world, we will be haunted by the Ghost of Christmas Past quite a few more times.

Both Intel and AMD recommend leaving SSBD disabled. They should stop saying their customers' security is a priority! No way!

Why not do the right thing and let the users decide if they want speculative execution completely disabled?

We will see how much performance will suffer once the eight new Spectre variants are fixed on our computers (if we are lucky enough to have them fixed at all!). My bet is we will see a performance hit greater than 50%.

Right now we have:

CVE ID        Description              Vector
CVE-2017-5753 Bounds Check Bypass      AV:L/AC:H/PR:L/UI:N/S:C/C:H/I:N/A:N
CVE-2017-5715 Branch Target Injection  AV:L/AC:H/PR:L/UI:N/S:C/C:H/I:N/A:N
CVE-2017-5754 Rogue Data Cache Load    AV:L/AC:H/PR:L/UI:N/S:C/C:H/I:N/A:N
CVE-2018-3640 Rogue System Reg. Read   AV:L/AC:L/PR:N/UI:N/S:C/C:L/I:N/A:N
CVE-2018-3639 Speculative Store Bypass AV:L/AC:L/PR:N/UI:N/S:C/C:L/I:N/A:N

Pretty serious hardware bugs!

Of the eight new Spectre variants, four are classified as "medium risk" (including the one announced a few hours ago) while the other four are "high risk".

Lisa • May 22, 2018 12:08 PM

@A.H., I would personally be willing to accept an 80% reduction in performance for an x64 processor in a laptop, as long as it could still run OpenBSD with a Windows VM guest (for my work software).

For super secret stuff, I do use HSMs (Hardware Security Modules) and Smart Cards, but software development for these still requires a PC.

The reality is that Windows software on a PC with x64 architecture is what many employers and customers require, so running it in a VM is the best option I have for now.

Unfortunately this leaves out open hardware options, including those running on FPGAs.

OldTimer • May 22, 2018 12:20 PM

@ SomethingRandom:

Once upon a time, a little over a quarter of a century ago, my boss forgot his password into our "state of the art" CASE (Computer Aided Software Engineering) system sold with high-tech encryption. So, being stuck, he asked me to fix this, to figure out his password (or a workaround) for him. Turned out their high tech encryption system consisted of XOR'ing every single byte written with the exact same 8-bit value. This wasn't just their data files, but also their otherwise plaintext password file. And most of the values in that file were NULL (0x00). (As in 0x00 XOR $FOO = $FOO.) Oh, and my boss reused the same password for his Sun accounts, and not just the desktop sparcs but the big iron down in the machine room.

I developed quite the positive reputation from that little incident. Everyone was very impressed. Although I was looking at it all and going, "It's not me. Everyone else involved here is just far more incompetent than I would ever have dreamed possible." Boss thought I was being modest, which didn't hurt either...

Somedays...

Who? • May 22, 2018 12:41 PM

Lisa, it seems to me you are too clever. Intel never listens to people like you because you are not representing a reasonable market share.

Our only hope (if there is one) is that big cloud providers will request secure, simple, auditable processors at a reasonable price. Then there will be a small chance we will get low-power, secure processors that meet our computing needs. However, cloud providers are as performance-hungry as other users, so I doubt they will request these processors.

I am not as optimistic as Clive Robinson about the possibility of seeing simple, low-power, secure processors in the coming years.

In fact, I doubt we will even see processors that allow disabling dangerous features like speculative execution (even if enabled by default) at the BIOS level. Intel and AMD did nothing for those of us who thought management technologies were dangerous, so why would they give us the tools we need this time? Intel and AMD say they care about their customers' security, but it sounds more like marketing hype.

Each day I see security more as "binary," in the sense that now it seems the smallest breach into our computers is catastrophic.

You are making the right decisions. OpenBSD is lightweight and reasonably secure (when compared to mainstream operating systems), and SmartCards are probably the best technology to avoid your credentials being stolen on a "Spectre-ready (but fast...) processor". At most your PIN will be stolen (if not using a PINpad), and it is useless without the physical card.

I started using SmartCards in OpenBSD a few years ago, after following Thoth's advice, and must say it has been one of the best decisions I have made.

The only way I see to protect ourselves now is strict compartmentalization of data: having as few computers connected to the Internet as possible (in most cases just one computer), while filtering both ingress and egress traffic, on a computer that has no user accounts or passwords shared with other systems (or, even better, only uses SmartCard authentication via OpenSSH/OpenSC).

Who? • May 22, 2018 1:03 PM

Obviously, where I wrote "Intel never listens to people like you because you are not representing a reasonable market share" I should have written "Intel never listens to people like you because you are not representing a healthy market share".

Sadly, big corporations do not listen to people who represent small market shares, even if they are right. We must accept it: security is not important to most people. What would we expect from people who give their personal information to corporations like Google and Facebook for free?

neill • May 22, 2018 1:48 PM

Everything "in the cloud" is unsafe. We don't know what CPU we're using, which VMs are running on the same hardware, or what OS or other vendors (NAS, LAN, WAN, etc.) are involved... and since that's where much of our data (willingly or not) (and metadata as well) lives, we must assume that we're all screwed.

SAD!

echo • May 22, 2018 2:31 PM

I agree being able to switch mitigations on and off has benefits. I wonder if this could work on a core-by-core basis, with outward-facing functionality running on a mitigated core versus full performance for local-only stuff? As for performance issues: given that so much software adds very little value compared to equivalents running on what would now be vintage systems, maybe the whole issue of performance could be reconsidered for some use cases? Welcome back RISC OS, your time is now!

Oh, this is awkward. Thankfully my needs are fairly limited, and apart from my browsers (and email client) I don't believe I have any stupidly obvious security issues.

Oh what no!!! Nethammer? Oh, that is so not fair.

uh, Mike • May 22, 2018 2:37 PM

Now we have a new channel, bearing microcode updates, to attack.
Same lessons, focused on another layer.

A.H. • May 22, 2018 2:40 PM

@Lisa, I'm afraid we're talking significantly more than 80% performance reduction. I'm not kidding when I say that you're looking at less than 1% of performance. There are actually processors without caches, speculative execution and without pipelining, they are commonly known as microcontrollers. Doing things a little bit "smarter" (focused on a consumer market) you might get 4-10x the performance of one of those (maybe). There are good reasons to do things this way.

echo • May 22, 2018 3:00 PM

Oh, dear. The circle of shame widens...

http://www.osnews.com/story/30384/C_is_not_a_low-level_language

In the wake of the recent Meltdown and Spectre vulnerabilities, it's worth spending some time looking at root causes. Both of these vulnerabilities involved processors speculatively executing instructions past some kind of access check and allowing the attacker to observe the results via a side channel. The features that led to these vulnerabilities, along with several others, were added to let C programmers continue to believe they were programming in a low-level language, when this hasn't been the case for decades.

Processor vendors are not alone in this. Those of us working on C/C++ compilers have also participated.

https://queue.acm.org/detail.cfm?id=3212479

Clive Robinson • May 22, 2018 3:23 PM

Pardon me folks whilst I walk around with a big grin on my face for "The Xmas gift that just keeps giving"

And yes, it's beginning to make me feel Grinch-like :-S

Seriously there is a lot more yet to come out of this mess, and realistically if you are looking to get some security back with these CPU's you are looking at more than a 50% slowdown in quite a few areas.

My advice would be two-fold... Firstly, just accept the fact that for the next half decade or so you are going to be using insecure computing systems that are vulnerable below the "ISA level" in the computing stack. Worse, for some vulnerabilities there will be no security fixes, and in all likelihood "drive-by attacks" will almost certainly be developed...

So secondly, most will need to consider their privacy / security in a different way. In the past I've talked about "gapping" two computer systems, with one system being the "offline and private / secure" system whilst the other system becomes the "online insecure" system... You will also require some "gap crossing" technology to allow information to cross from one system to another.

I've mentioned much of this in quite some detail in the past, I'm guessing some will need to go hunting then implementing...

Jesse Thompson • May 22, 2018 3:24 PM

@SomethingRandom

In the published paper for the exploit, it shows passwords in plain text. If good practices were followed, then the passwords would be encrypted.

Well, the vulnerability we are discussing involves eavesdropping on RAM. Are you suggesting that the passwords in RAM need to stay encrypted?

Because when RAM's not a safe place to decrypt them to, where is?

Bear in mind that encryption cannot solve all problems. Most notably, it can enable communication through an untrusted *shared medium*, but there always has to exist an endpoint with a safe oasis of working space to anchor that encryption to... and that safe working space cannot benefit from being further encrypted without just kicking the can down the road to a still more secure endpoint.

@A.H.

I don't think it's fair to classify complete lack of cache as a 100x slowdown overall. Because while a cache miss might be ~60 times slower than data in L1, your entire computing experience doesn't run out of L1 either, and some percentage of the time you will be dipping into slower RAM anyway.

Nor can a cache speed up your computing much more than the software you run is built to cooperate with it, and a lot of the software we run is only performance-optimized until it is "good enough". Software that isn't cache-optimized would miss that benefit less were the cache absent, just like 32-bit software wouldn't run any slower if CPUs fell back to 32-bit architectures.

Also, most microcontrollers have less transistor density and larger feature size than top of the line mass produced CPUs do: they trade speed for cost of production as much as they lose speed for lacking memory caches.
---

Ultimately, I feel like it's an interesting design challenge for academics to figure out brand new branch prediction and caching strategies in hardware that resist these classes of data leakage.

I'm reminded of the CS engineering vs. academic debates around variable storage, where the early engineers abused globals while the academics spent half a century demanding immutable variables and perfectly black-boxed lambdas so that it would be easier to derive deductive proofs about the properties of the software.

Some similar academic-led approach to provably leak-proof designs (implementation being its own circus) for branch prediction and caching may allow us to keep much of the speed that messy design can offer, but also ultimately save designers from themselves by pruning the possibility space to fewer options that have more consistency, thus preventing them from constantly stepping on their own toes, just as it does in software.

stan • May 22, 2018 3:49 PM

Lisa, see arch/x86/kernel/cpu/common.c in Linux for a list of Intel CPUs without speculation: all of FAM5 and some Atoms in FAM6. If hyperthreading exists you may want to disable it. It should be possible for a kernel to flush the cache on each context switch or to fully disable it. (Similarly, one can disable speculation even on CPUs that have it: "Processors are free to fetch and cache data speculatively from regions of system memory that use the WB, WC, and WT memory types", according to Intel.)

RockLobster • May 22, 2018 4:04 PM

The simple fix with no performance hit is to take your computers offline and use a throwaway device for the internet.

A.H. • May 22, 2018 4:14 PM

@Jesse Thompson "I don't think it's fair to classify complete lack of cache as a 100x slowdown overall"

Read again. The lack of cache was only one of the three (the other two being speculation and pipelining). But yes, remove caches and ALL memory accesses go from one cycle (amortised) to over 100. Still, the major performance drop will come from going from OoO back to multicycle execution (not even simple pipelining).

"Also, most microcontrollers have less transistor density and larger feature size than top of the line mass produced CPUs do: they trade speed for cost of production as much as they lose speed for lacking memory caches."

Microcontrollers trade performance for energy consumption. They remove all the fancy features (such as out-of-order execution, deep pipelines, caches and even memory protection) in order to massively reduce power consumption.

Remove that (minus the memory protection, which I guess you want anyway) and the first thing you'll notice is that your core will be completely stalled for a hundred nanoseconds or more while it waits for every single memory access (which comprises a significant amount of the total number of instructions; I've seen as high as 1 memory access every 4 instructions).

Also, since you go from OoO execution to multicycle (not even pipelined), instead of blurting out 4 to 8 instructions per cycle at 3GHz, each single instruction will take several cycles at 10s of MHz (because there's no point in cranking up the frequency of your processor to GHz speeds if that means waiting over 100 cycles for memory instructions).

The last time we had desktop processors without OoO execution was back with the old Pentium processors (Pentium I), and even those were pipelined. Sure, this is a serious issue, but I'm positive we can do better than going back to the early 80s (when we saw the last Intel processors without instruction pipelining).

Jesse Thompson • May 22, 2018 5:07 PM

@A.H.

Yeah, we'd probably benefit from dedicated secure co-processors instead then, and all sensitive work can be done on the smaller, slower core that not only has its own dedicated L1 cache and no code re-ordering or prediction but no hardware cache assistance either. Each thread gets its chunk of slow RAM and chunk of fast L1 on the dedicated processor, and can directly address both so must handle its own fetching. Then how to fetch in a side-channel-resistant way gets relegated to a software problem that's easier to fix after the fact.

That can't protect data we'd need to share with the user by flashing up onto their screen or anything, but the most sensitive things like encryption keys could do their job in the tight bunker at least. :)

John Smith • May 22, 2018 7:51 PM

from Lisa:

"The question is, when if ever will it be possible to purchase x64 processors for PCs, which do not have any pipelining, speculative execution, or lack of cache & stack isolation?..."

Raspberry pi clusters are looking more and more attractive.

It's too bad that Transmeta is no longer in business, to develop compatible but more secure processors.

Intel does have people working on formal hardware verification. John Harrison was a former hire (http://www.cl.cam.ac.uk/~jrh13/).

The problem with big corporates, however, is that they get captured by their best-selling products. Any alternative, disruptive approach gets only weak support, because it threatens the status quo. (IBM and its lack of vision for the PC is a classic example.)

If Intel can keep selling flawed products, how strong is its incentive to develop formally verified ones?

Dave • May 22, 2018 9:23 PM

@Lisa: The question is, when if ever will it be possible to purchase x64 processors for PCs, which do not have any pipelining, speculative execution, or lack of cache & stack isolation?

These already exist, they're called the 80386 and 80486. You do however have to accept a slight reduction in performance in exchange for the enhanced immunity to these types of attacks.

Dave • May 22, 2018 9:25 PM

@John Smith: how strong is its incentive to develop formally verified ones?

There has been at least one formally verified processor in the past, the Viper. How many Viper-based systems have you ever seen?

Looking back at what happened with the Viper, I'd say anyone's interest in developing a formally-verified processor for commercial use would be zero.

Nick P • May 22, 2018 10:52 PM

@ Lisa

You can buy embedded CPUs like that but not x86. The market for x86 didn't want secure or lean stuff. So, nobody is producing it *far as I'm aware.* So, Who? is right about that. The closest was Centaur's processors they make for VIA, which avoided *some* problem areas just to suit their market better. You might be safer buying their stuff just because few would be attacking it. Most don't even know they exist. This is even true of hackers, as far as I can tell. Which is sad since they were the first x86 to be energy efficient, accelerate crypto, and have an onboard TRNG all at the same time. The VIA Artigos used to be super-cheap, too. Their processors are more complex now.

It's also possible that a Loongson doing x86 emulation might be helpful. I neither have experience with those nor know anyone with experience with those. Can't say anything about them.

@ A.H.

" I'm not kidding when I say that you're looking at less than 1% of performance. "

You really are without numbers to back it. A single-core, 5-7-stage, in-order CPU doesn't have 1% of the performance of the same CPU done with some extra stages and speculative execution on average. I could imagine a huge speedup for some specific algorithms that the optimizations benefit, though. That's actually why they're there.

"There are actually processors without caches, speculative execution and without pipelining, they are commonly known as microcontrollers. "

That's misleading again. Comparing what a *CPU* can do to an *MCU* is already highly misleading by itself since it's so apples-to-oranges: Lisa is talking about desktop chips and you counter with what's in stuff like its keyboard or mouse. Most microcontrollers are made to be cheap and flexible (esp. w/ IO) on older nodes whose equipment has paid itself off. Most CPUs keep pushing to new nodes with a bigger focus on performance, even for the simpler ones. Looking at recent bookmarks, the top RISC-V chip in existence right now claims 1.7 DMIPS/MHz at 1.5GHz on 28nm HPC. That's with the fancy stuff. That company's 32-bit microcontroller with a 5-6-stage, in-order pipeline that came before it claims 1.61 DMIPS/MHz at 1.5GHz on 28nm HPC. They cite worst-case as 900MHz. The 64-bit MCU is 1.7 DMIPS/MHz at 1.5GHz (900MHz worst) on 28nm HPC. And most designs like these are standard-cell designs rather than the full-custom Intel/AMD/IBM can do. Since these weren't full-custom, they're not the upper limit.

And note with all this that I used to do my programming, hacking, gaming, etc on a 200MHz Pentium II with 64MB of RAM. My 1GHz PPC Mac Notebook I bought for a project even runs YouTube videos. I think we can do a lot more than you think without the boosts from vulnerability-increasing techniques. It would be a niche market but it surely can do more than 1% of current hardware.

@ Dave

Viper wasn't fully, formally verified. The CLI stack was the first with FM9001 being a big deal. The VAMP was verified for the Verisoft project. Far as no commercial work, you'd be wrong. That one also has triplicated registers with voting and noise mitigation for some fault-tolerance. Targeted at embedded, its datasheet says it runs at least 100MHz on about 500 milliwatts on a 180nm process node. It also has separation kernel built-in.

@ All

There's a lot of techniques for dealing with this stuff. Aside from making fewer assumptions, the hardware can also use things like partitioned caches and/or OS's that put possibly-malicious stuff on different sockets. That's right: we need to bring back multi-sockets with individual caches on CPU cores. Anything shared has to implement an isolation model or trust what it shares with. That's just the way it is. The stuff that's untrustworthy shares as little as possible.

Far as techniques to analyze these things, there are many. I made a list of examples here when the Intel attacks were published. There's probably more out now. The state of the art as of a few years ago was proving information flow down to the gates. It's definitely a solvable problem to have a chip with good performance. You can't hit where Intel, AMD, and IBM are right now. Not with the share-everything ISA's. Good that many researchers doing alternative CPU designs like SiFive, OpenPITON, and Epiphany-V are multi-core by default. :)

Clive Robinson • May 23, 2018 2:46 AM

@ All

One of the reasons for all this junk around the inner CPU core is "memory issues". Put simply, the speed of light is something you cannot beat, so we have various levels and types of caching.

Also with memory there are the issues of task switching, for which you need the so-called Virtual Memory (VM) given by the Memory Management Unit (MMU). One side effect of the MMU is that memory on the CPU side of it, such as caches, needs clearing out, whilst memory on the other side gets its addresses changed via page tables, which is faster than clearing it. The problem with the MMU is that it is effectively another CPU core in its own right which shares system memory with the real CPU (which is why RowHammer, amongst other attacks, works).

Task switching can be grossly inefficient in CPU cycles and, if done securely by clearing caches etc., very power hungry, in ways that affect system design all the way back to the inductors in the switch-mode power supplies and all conductors in between, which then need to be treated as transmission lines. Thus the cost of task switching securely is very high.

A question that could and should be asked is "What savings can be made by not task switching?"

The answers are very many, but firstly consider not having to use cache memory as cache. That is, use it correctly as "local memory": you reduce complexity, which reduces power consumption, and the local memory can run almost as fast as register memory.

The swap from slow system memory to large registers was a consideration in supercomputers, and we are using those same tricks these days in graphics processing units (GPUs).

The other thing getting rid of task switching does is get rid of anything up to 19/20ths of the non-core CPU logic that runs at core CPU speed, which, due to "heat death", is a major reason why decreasing the size of transistors is giving diminishing returns.

Thus for the same chip die area you could have vastly more cores running in parallel, and at three to ten times the speed of current big CPU cores, without needing any of the gubbins that have been tucked in over the years.

Thinking correctly about parallel processing would significantly reduce the need for task switching. But programmers tend not to like parallel programming because it's not the way many can think in effectively.

As I've said a few times before the future is parallel at the core it's just a question of how we get there, but ditching the wasted investment in x86 and similar devices is going to be the hardest cul-de-sac to reverse out of.

Clive Robinson • May 23, 2018 4:26 AM

@ Lisa,

Some of us would be willing to accept a significant slowdown in CPU performance in exchange for better security.

You don't need to actually have a decrease in "your performance" in fact you could increase it...

I mentioned before, https://www.schneier.com/blog/archives/2018/01/spectre_and_mel_1.html#c6767321 that an early version of BSD runs quite happily on a 1USD chip, with the performance of a four user microvax.

The thing is we do not need all the power in our desktop computers; they mainly sit there doing nothing, waiting for a key press or mouse movement, or some other user-generated input.

Then vast amounts of CPU cycles are wasted in redrawing the graphical screen as fast as possible, because humans do not like fractional second delays in response...

Go back to a genuine Command Line Interface (CLI) and some $1 microcontroller MCU chips will outperform the Intel and AMD x86/64 architectures in getting answers to questions and most users real work being done.

Back in what seems an eternity ago I had a 486SX --without maths co-pro-- CPU running at 50MHz with 16MB of RAM and a 100MB hard drive, with a version of Unix and an eight-port serial line concentrator, running an entire accounts department of six users and a manager on terminals with DOS-like screens and applications, quite productively...

Around 1993 I had another 486 box running Unix with "DOSmerge", with four serial ports driving Motorola CPU In-Circuit Emulators (ICE) and two driving terminals, with one set up for IEEE-bus test instruments, so two hardware design engineers could work on developing communications products comfortably at their desks and would not have to squeeze in together in a tiny, airless and very hot screened room. As the version of Unix was a full port of Unix System Laboratories SysVR4 with full BSD extensions, it was a nice environment to work in. It also had "virtual screens", which meant you could have eight full-screen applications running at the same time, switching between them with a function key code. It made an engineer's life way, way easier, thus causing a degree of jealousy in a lab of twenty engineers where the others still had to do it the hard way ;-)

The point many have not realised is just what a terrible time sink Windows computer screens really are, not just for the computers but for those who are operating them...

Almost year by year, base office productivity drops, but we don't get to hear this because of the faux increase in productivity that marketing and corporate image consultants claim. All aided and abetted by "data collectors" who want data entered in their preferred application / file format so they can provide "efficiency reports" to those way up the command chain, who have no interest in anything of that sort other than that magic efficiency figure by which their pay rise / stock option goes up...

When you actually get down to "brass tacks", humans actually "talk to each other", not "paint to each other"; that is, we are comfortable hearing 1.2-2.5 words a second (72-150 words a minute), with the average word being the equivalent of six characters in length. Most people can read comfortably at a rate of two to five times that (but can do much better by reducing eye movement). One of the reasons we have very short words is error correction, and in turn the reason we have longer words is to increase information flow rate.

Whilst pictures might set mood, they are not effective at transferring specific information; it's why we have the written / typed word. Thus all that "corporate image" stuff is like giving a pet dog a makeover: it might make you feel all warm and cuddly, but is the dog any different in the way it behaves...

A.H. • May 23, 2018 5:03 AM

@Jesse:

"Yeah, we'd probably benefit from dedicated secure co-processors instead then, and all sensitive work can be done on the smaller, slower core that not only has its own dedicated L1 cache and no code re-ordering or prediction but no hardware cache assistance either."

That, for instance, is a much better solution (and one being considered). And it's also compatible with other solutions, like having context identifiers for cache lines, TLB entries (already there) and branch predictor entries, together with a more aggressive flushing policy. This class of side-channel attacks is rooted in the fact that those microarchitectural components are shared between different privilege levels; we can fix that. Take a hit but don't sink the ship.

@Nick P

Those processors have both caches and instruction pipelining. They take the massive hit of not having OoO execution (and a massive energy reduction) but they don't go all the way back to multicycle, which is what was being suggested. In-order cores have their place and will still have it for years to come, as do caches and OoO processors.

And while I might not have the numbers (nobody would be insane enough to actually try to use such a processor for desktop applications) I do have a good enough understanding of computer architecture and knowledge of computing history to predict the range of performance it would have. I can't give you exact numbers but I can give you an estimate within one order of magnitude.

echoMay 23, 2018 5:15 AM

I like how everyone said everything I said with more words. How do you do this?

As nice and shiny as modern IT is, it is very busy and demanding. Languages and development tools have become bloated and tortuous, like a bureaucracy. I'm sure this doesn't help workplace stress. I think we can do better than 80-column green screens, but I don't believe the status anxiety caused by "high end" applications is worth the bother outside of a tiny niche.

BeOS (succeeded by the open source Haiku) introduced the idea of file filters being objects which encapsulated the functionality of an application, which meant all you needed was the file filter to access a document. RISC OS also introduced the idea of applications as objects, which meant you could derive a new more sophisticated application from something like a painting or document processing widget.

Near the end of the lifecycle of VCRs as a product category manufacturers manipulated the market to dumb down features of the standard product while reintroducing previous standard features as "high end" which effectively forced the average price up. My memory is hazy but I believe one of the major manufacturers was on the receiving end of regulatory action for this.

ThothMay 23, 2018 5:16 AM

@all, Clive Robinson

I guess we are better off moving away to 32 bit and 16 bit CPUs in tiny clusters. Simpler architectures that have been around for ages (i.e. 16 and 32 bit CPUs) are a better choice.

neillMay 23, 2018 5:44 AM

@thoth

sometimes you just need high single thread performance, and there's no easy way around it ...

wonder how Cell or itanium would be affected by all this. IMHO itanium is a marvel and amazing it works at all, but it's extreme VLIW overwhelmed its designers ...

BTW if i recall correctly one POWER generation came w/o OoO, maybe IBM had some insights before anyone else?

neillMay 23, 2018 6:48 AM

@thoth

PS the specs for the POWER 8 (6 core, 22nm) say

"8-wide in-order instruction dispatch"

but of course it has lots of L1, L2 etc cache on-and-off-chip

A.H.May 23, 2018 7:30 AM

@neill, there are many reason to have in-order pipelines. For instance, for massively multiprogrammed workloads you don't need OoO to keep your pipeline busy and you can save a lot of power and free precious transistor by eliminating the whole OoO engine. GPUs, for instance, are in-order, as far as I know. The EPIC architecture (Itanium) left the exploitation of ILP for the compiler, you didn't really need an OoO core when the fine grain parallelism is exposed like that.

Still, just because the way we currently implement OoO is broken it doesn't mean it's unfixable. It's not. It will take some transistors and energy, but it will be fixed.

Name (required)May 23, 2018 8:58 AM

@A.H.

>You won't be able to run an e-mail client, but you will be unable to do so consuming an astonishing small amount of energy.

My Pentium 1 used to handle an e-mail client and more just fine. It's not right that performance and capacity have increased by multiple orders of magnitude yet there's barely any improvement in responsiveness, as software is systematically bloated with forced features and efficiency is an afterthought at best. Web pages require astonishing amounts of CPU and memory compared to NN years ago, yet the increase in usability or features is questionable (I'm not saying there's none, I'm just saying the cost of the improvements is unreasonable)

I personally think we need to rethink PC (and probably mobile) architecture and ecosystem from scratch. Use development methodologies adapted for auditability, ease of inspection and maintenance, maybe based around reliable user controlled per-app per-service per-resource permissions engine.

CallMeLateForSupperMay 23, 2018 10:38 AM

@SomethingRandom
"One company I worked at many moons ago thought that XOR-ing passwords (to mainframes!) was good practice."

In the 1980's, the IBM mainframe operating system "VM" stored the administrator PW in clear text. My, how times change.
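The XOR practice quoted above fails in a very concrete way: XOR with a fixed key is its own inverse, so a single known plaintext/ciphertext pair hands the attacker the key, which then unlocks every other password "protected" the same way. A minimal sketch (the key and passwords are of course made up):

```python
# Why XOR with a fixed key is not encryption: one known
# plaintext/ciphertext pair recovers the keystream, which then
# decrypts every other password "protected" the same way.

def xor_bytes(data: bytes, key: bytes) -> bytes:
    return bytes(b ^ key[i % len(key)] for i, b in enumerate(data))

key = b"\x5a\xa5"                      # hypothetical fixed "secret"
ct1 = xor_bytes(b"hunter2", key)       # attacker's own stored password
ct2 = xor_bytes(b"letmein", key)       # victim's stored password

# The attacker knows their own password, so XOR-ing it against its
# stored form yields the repeating keystream:
recovered = bytes(a ^ b for a, b in zip(b"hunter2", ct1))
# ...which decrypts the victim's password too:
print(xor_bytes(ct2, recovered[:2]))   # b'letmein'
```

This is the classic known-plaintext attack mentioned in the earlier comment: no cryptanalysis needed, just one account you control.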

CallMeLateForSupperMay 23, 2018 11:43 AM

@All
Clive wrote: "The thing is we do not need all the power in our desktop computers [...]"

That assumes that those 'puters are running command-line OSes instead of GUIs. Today the "practical" user uses GUI, and today's GUIs are the poster child for code bloat. They consume a lot of cycles to get from "click" to "done". Accepting GUI as an unavoidable prerequisite to computing (it isn't) ties one to the higher-power computers.

(Just yesterday I was thinking about my first preemptive, GUI OS. The full install consumed 35MB of hard drive. That's MB, not GB. While I was reminiscing, a 42MB **browser** update was downloading.)

echoMay 23, 2018 12:23 PM

I read through Bruce's Twofish paper yesterday. He said he wanted to design a cypher whose design could be held in your head all at the same time, and to explain everything well enough that things followed on in a sensible and discoverable way. To some degree this was once true of operating systems and aircraft, and I'm sure other things. I wonder if this could be used as a quality metric?

RealFakeNewsMay 23, 2018 1:14 PM

As I predicted over 20 years ago, computers became faster, with no appreciable increase in performance.

Back in 2004, I re-built a 486DX4 100 MHz with math co-processor and stuck Windows 95 on it.

Next to it was a Pentium III 600 MHz running Windows XP.

The Win 95 machine left it in the dust.

One item I found curious was the sheer time it took Windows to draw windows. Win 95 seemed nearly instantaneous, whereas XP took about 2 seconds (it's almost imperceptible, but there is a tiny animation for every window create/destroy on XP).

I did some more digging, and discovered that window message processing, as well as thread processing, is suspended for the duration of the window animation.

This time delay increased on Vista, and increased yet further to ca. 3 seconds, on Win 7 and later.

To say this raised an eye-brow is an understatement.

Clicking as fast as possible is pointless, as the window will not respond until it is FULLY drawn. Actions associated with a button click are not executed until the window has fully closed.

If you consider that this 6-second delay applies to the full process of opening and closing applications, that is a lot of time wasted sat waiting for things to happen.

It really is time that we threw out the bloat and returned to simpler systems that get the things we need to do, done.

For too long application and systems developers have relied on ever faster computers to take up the slack in shoddy programming, and it is now starting to hurt far more than we thought possible.

bttbMay 23, 2018 2:41 PM

I skimmed the comments above
My interest here is somewhat off-topic regarding AMT, ME, and so on.

From posts above:
"And note with all this that I used to do my programming, hacking, gaming, etc on a 200MHz Pentium II with 64MB of RAM. My 1GHz PPC Mac Notebook I bought for a project even runs YouTube videos. I think we can do a lot more than you think without the boosts from vulnerability-increasing techniques. It would be a niche market but it surely can do more than 1% of current hardware."
&
"Back in what seams an eternaty ago I had a 486SX --without maths co-pro-- CPU runing at 50Mhz with 16Mbyte RAM 100MegaByte Hard Drive with a version of Unix and an eight port serial line concentrator running an entire accounts Dept of six users and a manager on terminals with DOS like screens and applications quite productively..."
&
"I guess we are better off moving away to 32 bit and 16 bit CPUs in tiny clusters. Simpler architectures that have been around for ages (i.e. 16 and 32 bit CPUs) are a better choice."
&
"(Just yesterday I was thinking about my first preemptive, GUI OS. The full install consumed 35MB of hard drive. That's MB, not GB. While I was reminiscing, a 42MB **browser** update was downloading.)"


In storage I have a grave-yard of older computers (PC and Macintosh) (roughly 1995 to 2013).

1) Are there any rules-of-thumb for what older laptop and desktop PC and MacIntosh computers to consider keeping?

1b) How about a Pentium III IBM small server from about 2001 (currently with about 256 MB of error-correcting RAM and a 30 GB SCSI drive)?

2) How to decide what computers to keep? Keeping them all seems sub-optimal.

3)

4) For keepers, I assume I should consider maxing out the RAM, or at least increasing it, in general.

Similarly I have a grave-yard of monitors and printers (roughly 1984 to 2014).

5) Are there any rules-of-thumb for what older printers and monitors to consider keeping? Regarding printers, any ink/toner purchase refill/purchase issues to consider?

6) Alternatively, maybe just keep accumulating stuff in a storage shed for now; and see what info the future brings. I assume humidity doesn't need to be controlled. Any relevant heat and humidity storage specs?

In summary, relevant links would be appreciated, for example, in order to not re-invent the wheel. Or, perhaps, I am trying to justify "recycling" some of this stuff.

7) Is Rowhammer DDR3 and later?

8) Anything else?

Sauce For The Goose!May 23, 2018 4:24 PM

For Linux folk:

# cat /proc/cpuinfo

and scroll down to: "bugs"

####THIS#IS#FUN!####

# cat /sys/devices/system/cpu/vulnerabilities/*

and/or:

# head /sys/devices/system/cpu/vulnerabilities/*

It doesn't fix anything, but it does show vulnerabilities and what not.
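The same check can be scripted. A small Python sketch that reads the sysfs files the commands above print, pairing each vulnerability name with the kernel's status string (the directory only exists on kernels new enough to report it):

```python
# Read each file under /sys/devices/system/cpu/vulnerabilities and
# pair the vulnerability name with the kernel's status string.
import glob
import os

def read_vulnerabilities(base="/sys/devices/system/cpu/vulnerabilities"):
    status = {}
    for path in sorted(glob.glob(os.path.join(base, "*"))):
        with open(path) as f:
            status[os.path.basename(path)] = f.read().strip()
    return status

if __name__ == "__main__":
    vulns = read_vulnerabilities()
    if not vulns:
        print("no vulnerability files found (kernel too old?)")
    for name, state in vulns.items():
        print(f"{name}: {state}")
```

Typical entries are names like `meltdown` or `spectre_v2` with values such as "Mitigation: ..." or "Vulnerable"; as the comment says, it shows the state rather than fixing anything.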

Z.LozinskiMay 24, 2018 5:15 AM

@Neil

"BTW if i recall correctly one POWER generation came w/o OoO, maybe IBM had some insights before anyone else?"

Maybe, if 1955 counts as before anyone else.

There are three good candidates for the first machine with out-of-order execution.

IBM 7030 "Stretch" (shipped in 1961) has partial out-of order execution for memory operations. This is based on Gene Amdahl and John Backus' asynchronous non-sequential (ANS) control design from 1955.

CDC 6600 (shipped in 1964) implemented register scoreboarding, to allow multiple floating point arithmetic instructions to be executed in parallel, which implies out-of-order completion, because multiplication is usually slower than addition.

IBM 360/91 (shipped in 1968) implemented Tomasulo's 1967 algorithm which defined how to implement out-of-order execution in hardware.
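Tomasulo's key idea, issuing an instruction when its operands are ready rather than in program order, can be sketched with a toy latency model. This is only an illustration of the scheduling difference: the latencies are made up, and it ignores issue width, structural hazards, and register renaming entirely.

```python
# Toy comparison of strictly serialized vs dataflow (Tomasulo-style)
# scheduling. Each instruction is (name, latency, dependencies).
# A sketch of the idea only: infinite execution units, no hazards.

def finish_times(program, in_order):
    done = {}          # name -> completion cycle
    prev_finish = 0    # completion of the previous instruction
    for name, latency, deps in program:
        ready = max((done[d] for d in deps), default=0)
        if in_order:
            # Serialized model: cannot start before the previous
            # instruction has finished.
            ready = max(ready, prev_finish)
        done[name] = ready + latency
        prev_finish = done[name]
    return max(done.values())

prog = [
    ("load", 10, []),        # slow memory access
    ("mul",   3, ["load"]),  # depends on the load
    ("add",   1, []),        # independent: can overlap the load
]
print(finish_times(prog, in_order=True))   # 14: add waits behind mul
print(finish_times(prog, in_order=False))  # 13: add overlaps the load
```

The gap widens dramatically with realistic memory latencies, which is exactly why the technique was worth its hardware cost on the 360/91's floating point unit.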

The technology of out-of-order execution was not well known until the 1980s. No-one who grew up on minicomputers or microprocessors encountered it. I think out-of-order execution was so complex to implement that even people who had read Tomasulo's algorithm, which appeared in university textbooks, assumed it was only suitable for supercomputers. Within IBM it was used in high-end mainframe processor designs, including the 3033. The best general description was in Peter Kogge's 1981 book "The Architecture of Pipelined Computers", which I'm pretty sure was based on the 3033.

The processor in the first IBM RS/6000 (variously known as POWER-1, RIOS or AMERICA) implemented out-of-order execution. That's when the idea became more widespread in the industry.

The rest of the 1990s saw lots of out-of-order machines, including for x86 the Intel Pentium Pro and AMD K5, probably helped by the fact that Hennessy and Patterson started describing micro-architectures that implement out-of-order execution in their "Computer Architecture" and "Computer Organization" series of books.

I think the recent challenge and the reason for SPECTRE and related vulnerabilities was that the focus on program correctness was lost somewhere along the way. Discuss ..

de la BoetieMay 24, 2018 7:14 AM

While comments that the UI absorbs/requires the power are true, the business model for the UI is not for the benefit of the consumers.

What the corporates want most of all is to have you by the eyeballs, so the desktop power is all about ensuring that happens.

My best computing experience was doing research with overnight batch jobs - bliss.

There are no technical impediments to recovering that situation (where the computers are our agents and slaves rather than vice versa). The impediments are that the business model is about pushing addictive web-browser glitz at us.

Security and privacy could be greatly improved by having rather more modular and simpler CPUs which performed headless communications and parsing of structured and semi-structured messaging - leaving the UI rendering to a processor that had no internet access.

echoMay 24, 2018 8:24 AM

@Z.Lozinski

Computer science theory developed for prioritising tasks and throughput in mainframes could be used in healthcare and much of the state sector too. I perceive this may make better use of resources and prevent some of the more obvious whoops oh sorry too late disasters. I expect similar equivalent security issues arise.

Z.LozinskiMay 24, 2018 10:19 AM

@echo,

One of the very important lessons we learned was about the importance of "straight through processing". In a complex computer-mediated workflow it is essential that most transactions flow through all the process steps without error. If you get errors, for example because one of the pre-requisites is missing, you get process "drop-out". Then you have to put the failed transaction in a bin for later remediation, which is a manual (expensive, error-prone) process.

The interaction of this with healthcare and security, can be seen in a recent example from the UK.

A wheelchair-bound woman was denied her disability payments, because she had not attended a mandatory visit to an assessment centre. The mandatory visit was presumably required to prevent benefits fraud. However no-one required the assessment centres to be wheelchair accessible. The fact that disability assessment centres were not accessible had been reported multiple times in Private Eye (a UK magazine), but neither the outsourced assessment company nor the Government department took any action.

This is the logical follow-on from Larry Lessig's principle that "code is law". However what is missing is the discipline to ensure that there are no gaps in our process definitions. These are the gaps that a) cause problems for citizens, and b) give criminals a way to commit fraud. For "process drop-out" in the law, there is a well established method to resolve the matter using the courts. Typically this applies to complex legal issues, involving companies, and weeks of work in the High Court. But there is no easily accessible way to resolve this class of problem when it affects ordinary people. As more government processes are digitised, this will only get worse.

TRXMay 24, 2018 12:04 PM

> Some of us would be willing to accept a significant slowdown
> in CPU performance in exchange for better security.

Not just "hell yes!", but I'm very glad I didn't get around to disposing of some of those older PCs in the closet...

I've put some older off-lease Core2 machines at a client's site after topping them off with RAM and installing an SSD. Most of the users didn't even notice the difference. No ME backdoor, no UEFI, internet access through a second NIC just in case. BSD on the servers, moving to Linux on the desktops. There are probably undiscovered security flaws in their old CPUs, but it's the best I know to do until someone ramps up x86-compatible production again, or the ARM people ramp their clock speeds up.

echoMay 24, 2018 4:24 PM

@Z.Lozinski

I was discussing strictly prioritising and throughput (helping with creating better quality outcomes for less money) though I did recognise as I wrote it there must be protections against fraud but didn't click about drop outs. Your explanation explained the critical issues very well.

The UK state sector system is terrible. You can't get any of these people to admit dropping a pin without filling in a million complaint forms. "Box ticky" doesn't begin to explain how bad the system is. I was told by one state sector "manager" that "we are not solutions people". Yes, really. The Guardian is currently beginning a campaign to ask whether people have been asked to pay for healthcare. Without putting too fine a point on it, when standards of care were being discussed I witnessed one member of staff interjecting with the issue of staff cuts. This is the same department where exactly the same clinical work is done in the private sector by one third the number of staff, because the protocols are not top heavy with job titles or makework/cost-cutting criteria. I have also witnessed a lawyer who, having been paid a six figure sum for work he had not actually done, began demanding who was going to fund a case when the funding was covered by human rights criteria and work was still pending on a strategic legal funding application. Work distribution within healthcare is so badly managed that I have personal knowledge of one person who twiddles their thumbs, and was told of somebody's friend who is working unpaid overtime because if they didn't carry the workload the system processing patients would collapse.

Pretend to SpeculateMay 24, 2018 10:40 PM

with apologies to C. Hynde

https://m.youtube.com/watch?v=CK3uf5V0pDA

Back on the cmd line

Found a speculative read, OoO OoO
That hijacked my process last night
To a place in the cache
Nothing was flushed out of, OoO OoO
Now the stack has been smashed
And my system has crashed
I’m back on the cmd line

Predicted a branch beyond my control, OoO OoO
A circle of wait in an endless whirlpool
The register held a pointer from hell, OoO OoO
Switched x and the y and the recursion died
Ctrl-Shift-Ampersand
OoO, back on the cmd line

...

neillMay 25, 2018 2:43 AM

@Z.Lozinski

it's not really about in-order/out-of-order but more a combination of speculative execution, plus OoO, plus dozens (hundreds at times) of chip errata, plus exceptions (hard or soft) at a bad moment ...

all this complexity seems to have overwhelmed developers, who do not know all the intended (or in the case of errata, unintended) functionality of the chips ... i understand they all have $$$ pressures and deadlines ...

reading through several intel errata docs however was stomach wrenching and at the same time made me realize what a miracle it really is to have anything working at those speeds!

going forward i hope they learn from this and have 'cleaner designs' and not so much microcode (and machine code) trickery to make things work anyways!

echoMay 25, 2018 6:32 AM

@neill

I am constantly surprised this stuff works. Multi-dimensional house of cards is underestimating its flakiness. I am so glad I am not an engineer who needs to deal with this stuff.

Z.LozinskiMay 25, 2018 8:18 AM

@neill,

Yes. Complex processor implementations are driven by the difference in access time between off-chip DRAM and pretty much everything on the chip. There are orders of magnitude involved, so anything you can do to avoid stalling instruction execution wins in performance terms. Hennessy and Patterson do a nice job of explaining this. That is what causes all the "microcode and trickery".

Would radical simplification help? Maybe. That was what gave rise to RISC in the first place. I see a lot of interest in RISC-V.

I think you have to start with radical simplification of the software. Looking at the Mac on which I'm writing this, Firefox has 88 threads for 4 tabs. Word has 16 threads and Excel has 22. 22 threads for one web-page, why? Interestingly the application I use the most, TextWrangler, which is also the most responsive, has only 2 threads. There's a lesson there, somewhere.
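Thread counts like these are easy to check for yourself. On Linux the per-process count is the "Threads:" field of /proc/&lt;pid&gt;/status (macOS needs Activity Monitor or `ps -M` instead); a small sketch:

```python
# List the most thread-hungry processes on a Linux box by parsing
# the "Threads:" field of /proc/<pid>/status.
import os
import re

def thread_count(status_text):
    """Parse the Threads: field out of a /proc/<pid>/status blob."""
    m = re.search(r"^Threads:\s*(\d+)", status_text, re.MULTILINE)
    return int(m.group(1)) if m else None

def all_thread_counts():
    counts = {}
    for pid in filter(str.isdigit, os.listdir("/proc")):
        try:
            with open(f"/proc/{pid}/status") as f:
                text = f.read()
        except OSError:
            continue  # process exited while we were scanning
        name = re.search(r"^Name:\s*(\S+)", text, re.MULTILINE)
        counts[(pid, name.group(1) if name else "?")] = thread_count(text)
    return counts

if __name__ == "__main__":
    top = sorted(all_thread_counts().items(),
                 key=lambda kv: kv[1] or 0, reverse=True)[:10]
    for (pid, name), n in top:
        print(f"{name} (pid {pid}): {n} threads")
```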

Once you remove some of the software bloat, maybe then you can reduce the hardware requirements.

Complexity is also the enemy of security, especially as you cross the boundary between instruction set architecture and how it is implemented. I think this is the real lesson of SPECTRE and friends, the ISA and the implementation are now so complex, the engineers didn't even realize there were problems.

Clive RobinsonMay 25, 2018 6:04 PM

@ echo,

Computer science theory developed for prioritising tasks and throughput in mainframes could be used in healthcare and much of the state sector too.

Yet another coincidence... Not so long ago I was in hospital, occupying a bed that I would rather not have. In the bed next to me was a quite senior member of a University administration, and the problems with prioritizing queues came up. I explained the basics and he looked bemused. So I had to borrow a pack of cards to explain the simpler cases. He was very surprised that nobody had thought to transfer the knowledge / process across.

As I pointed out to him, the problem is, when it comes to "natural systems" where the entities being processed have a degree of free will, it can go horribly wrong. It's why in most cases nature only goes for around 63% efficiency overall, not the 85-95% certain political incompetents think must be possible by decree...
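The intuition behind keeping utilization well below 100% shows up in even the simplest queueing model. In an M/M/1 queue the mean time a job spends in the system is W = 1/(μ − λ), which blows up as utilization ρ = λ/μ approaches 1; the ~63% figure is roughly 1 − 1/e, though the exact "comfortable" point depends on the model and on arrival variance, so treat the numbers below as illustrative only:

```python
# Why pushing utilization toward 100% backfires: in an M/M/1 queue
# the mean time in system is W = 1 / (mu - lambda), which explodes
# as utilization rho = lambda/mu approaches 1.
import math

def mm1_time_in_system(arrival_rate, service_rate):
    if arrival_rate >= service_rate:
        return math.inf  # queue grows without bound
    return 1.0 / (service_rate - arrival_rate)

mu = 1.0  # service capacity: one job per time unit
for rho in (0.50, 0.63, 0.85, 0.95, 0.99):
    w = mm1_time_in_system(rho * mu, mu)
    print(f"utilization {rho:.0%}: time in system = {w:5.1f}x service time")
```

At 63% utilization a job spends under 3 service times in the system; at 95% it spends 20, which is why decreeing higher utilization mostly manufactures waiting.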

echoMay 26, 2018 3:03 PM

@Clive Robinson

Oh, Clive! You don't want to know about the coincidences we've had. I had to furiously re-read one of your comments and double check my diary to make sure we hadn't met that week. The Russian beard coincidence was a bit too much. You are much taller than the relevant suspect so ruled yourself out.

Yes, I have noticed a lack of domain knowledge being shared. Communication between specialities (and professions, and various stakeholders) can be atrocious even for basic tasks. I have American medical and legal citations around this area which prove people can have more expertise than the experts, and that paying attention to the client/patient can be very fruitful. I explained as much to a coder in another department when I was doing an admin job (I was actually the one out of the pair of us who was professionally qualified) and he threw a wobbly, until his systems analysis course taught him exactly this kind of thing. He had the grace to apologise later, which was nice of him. I really don't get it. Why are people so aggressive when you try and explain something, or so jumpy, like they're scared of being attacked, when apologising? Information theory law is a thing too, and apart from one barrister in the South East I seem to be the only person in the UK who has heard of it or considered its application.

The problem with wearing a skirt is too many men especially think you're dim and a pushover. The number of times I wished I could hire you because nobody listens to me.

Leave a comment

Allowed HTML: <a href="URL"> • <em> <cite> <i> • <strong> <b> • <sub> <sup> • <ul> <ol> <li> • <blockquote> <pre>

Photo of Bruce Schneier by Per Ervland.

Schneier on Security is a personal website. Opinions expressed are not necessarily those of IBM Resilient.