Research in Secure Chips

Unsurprisingly, the U.S. military is funding research in this.

Posted on July 5, 2011 at 6:14 AM • 44 Comments

Comments

Clive Robinson • July 5, 2011 6:40 AM

Yup, I would expect them to, after all the noise they have recently been making about “China APT”.

The question then arises: will it be like NASA and Apollo, feeding the results outwards and downwards, or like previous NSA research, destined for the bottom of a dusty basement somewhere…

Anony • July 5, 2011 7:15 AM

I have mentioned the possibility of IC corruption/ subversion to CESG here in the UK.
I was met with a response that ‘We can’t do anything about that’

Maybe CESG will follow where the NSA lead?

Anon for this one!

Clive Robinson • July 5, 2011 7:21 AM

One “Oh dear, they don’t get it” moment:

“IARPA is also interested in hearing ideas on chip obfuscation. The idea is to hide the “intent of digital and analog functions and their associated building blocks” of an integrated circuit in the FEOL manufacturing stage. If potential adversaries can’t reverse-engineer or understand how a circuit works, it’ll be harder for them to modify it for malicious purposes.”

It does not matter whether they understand the internals or not; in most cases it’s what goes on at the inputs and outputs during verification testing that gives the game away to those who can think and are paid to do so.

As a very loose analogy, we don’t know what the human brain does or how it does it; however, the use of fast / functional MRI is enabling us to see in real time where brain activity happens when the brain is subjected to an external stimulus.

Also, obfuscation is difficult to get right; often it can have an “end run” around it or be subject to a “short cut”.

For instance a couple of decades ago people had problems with Power Analysis on chips used in smart cards and the like.

One solution proposed at the time was to use non-synchronous “self clocking” logic. In theory it sounded great, until it was pointed out that “transducers are bidirectional”.

That is, by putting special signals on various pins of the chips, they could [1], just like the colour burst oscillator in your television, be brought into synchronisation for sufficiently long to get out the details the general non-synchronous behavior had obfuscated.

It is possible to put circuit elements on chips that resonate at various frequencies, and thus act as “direct conversion receivers” [2] in that the envelope gets translated down to base band just like it does in your AM radio.

Thus it is possible to design a chip that can have signals injected into specific internal areas and signal paths remotely on an RF carrier, and that would pass all normal forms of testing with flying colours.

However, these sorts of circuits can also work in reverse: when illuminated by an RF carrier they will cross-modulate signals from inside the chip onto the carrier, where a strategically placed receiver can IQ demodulate the signal to recover the internal signal without overly affecting the operation of the chip.

As these methods are known in public literature you might expect that there are counter measures (and in most cases there are). However how many other ways are there to do similar things that may only be in the heads of a limited number of specialist chip designers?

[1] Google “loose locked oscillators”.
[2] Google “On frequency oscillator”, “zero IF”, “Direct conversion receivers” or “Self quenching receiver” or “Autodyne receiver”.

Danny Moules • July 5, 2011 8:29 AM

More to the point if the chips are obfuscated, it becomes that much harder to tell if they have actually been broken which was the initial intent in the first place.

Peter • July 5, 2011 9:48 AM

Why do they (or at least the article) make the assumption that American-made chips are safe? I know it is very hard to imagine, but U.S. companies have sometimes proven to be less than honest, to have less-than-perfect security, etc.

S • July 5, 2011 9:49 AM

So they’re saying they expect to be able to find added ‘features’ that enemy agents may have slipped in at the foundry, but that their own obfuscation of functionality is impenetrable?

Slight logical disconnect there, methinks.

Maybe they should just hire Clive, he’s a regular one-man IARPA.

RobertT • July 5, 2011 9:56 AM

Interesting, lots of old ideas but I didn’t see anything new and absolutely nothing to address SOC (system on a chip) level product exploits and the unique complexity related problems that SOC creates.

The chip Xray picture to tape-out data-base level verification is an interesting idea, because it potentially closes the loop, on one of the problems that I’ve talked about before. Basically there are mask making stages that are completely open-loop.

Counterfeit chips are an interesting problem, but it mainly affects low level functions like opamps and regulators, because copying a 60M transistor chip is just a little bit too complex for today’s tools.

Sam • July 5, 2011 10:34 AM

Well, it’s unsurprising that the military is funding this, because that’s not the news here.

The news is that it is the intel community (through IARPA), not DOD (=DARPA) that is driving this. Remember, it wasn’t that long ago that NSA ran their own fab.

See also some of the press coverage of the DARPA program, such as this May 2008 IEEE article:
http://spectrum.ieee.org/may08/6171

And Peter, they don’t make the assumption that US companies in general are safe. They want to split chip fab such that the final stage can happen at a specific secure, cleared, US facility.

It is that split which was not present in the original DARPA thinking, and represents the major new work here.

NobodySpecial • July 5, 2011 10:51 AM

American-made chips are safe?
I think the NSA can at least make the assumption that the NSA aren’t forcing American chip makers to introduce NSA spy functions into chips so that the NSA can spy on the NSA.

The NSA spying on the CIA, the CIA spying on the FBI, the FBI spying on the AAA etc … however.

JJ • July 5, 2011 11:03 AM

Of course there is that risk that they introduce “the need for trust” as a precept of “the need for involvement” or rather “the need to make sure the chips contain what we want them to contain”.

Aesop • July 5, 2011 12:07 PM

We’re worried because we’ve done it…

The Farewell Dossier – The New York Times [Safire]
http://www.nytimes.com/2004/02/02/opinion/the-farewell-dossier.html?pagewanted=all&src=pm
…”Why not help the Soviets with their shopping? Now that we know what they want, we can help them get it.” The catch: computer chips would be designed to pass Soviet quality tests and then to fail in operation. In our complex disinformation scheme, deliberately flawed designs for stealth technology and space defense sent Russian scientists down paths that wasted time and money. The technology topping the Soviets’ wish list was for computer control systems to automate the operation of the new trans-Siberian gas pipeline. When we turned down their overt purchase order, the K.G.B. sent a covert agent into a Canadian company to steal the software; tipped off by Farewell, we added what geeks call a ”Trojan Horse” to the pirated product. ”The pipeline software that was to run the pumps, turbines and valves was programmed to go haywire,” writes Reed, ”to reset pump speeds and valve settings to produce pressures far beyond those acceptable to the pipeline joints and welds. The result was the most monumental non-nuclear explosion and fire ever seen from space.”…

More here:
The Farewell Dossier – CIA [Weiss]
https://www.cia.gov/library/center-for-the-study-of-intelligence/kent-csi/vol39no5/pdf/v39i5a14p.pdf
…The Pentagon introduced misleading information pertinent to stealth aircraft, space defense, and tactical aircraft. The Soviet Space Shuttle was a rejected NASA design…

moo • July 5, 2011 1:25 PM

@NobodySpecial:
But surely the NSA can’t assume that foreign interests will never be able to influence or subvert an American chip maker’s products. That’s why they used to have their own fab.

Nick P • July 5, 2011 4:01 PM

I seem to be alone in thinking the best way to solve this problem is to just build a state-of-the-art fab over here. Sure they cost billions to build. But, the Defense budget is hundreds of billions of dollars. Money could be allocated within the Defense budget to build and operate the fab. We might even be able to get some of that advanced foreign technology if we offer a big one-time payment to one of the firms. Construction and operation would be supervised by DOD. All the technology would be American-made. With all the jobs and money flying around, it shouldn’t be too hard to get politicians to get onboard.

If that can’t pan out, then splitting the front-end and back-end processing is a nice idea. Verification technologies might be able to do the trick. Maybe, maybe not. With hardware, especially EMSEC, there’s so many different ways to leak information or modify functionality that it’s hard for me to predict what they can accomplish with this method. Having a fab here carefully built and run by vetted Americans is an option I have more confidence in.

While we’re at it, I’ll throw in another thought. I had recently been working out how to build a fab in a subversion free way that multiple nations could trust. Essentially, each would be having their most trusted people making components and monitoring each others’ activity during installation and configuration. After the production floor layout is done, each spot would be filled with a randomly chosen component. Trusted people from different nations would run and monitor the facility. During production, fabs would be randomly assigned to production machines. Aside from all the monitoring, there’s no way for a customer to know which devices made their chips and the devices themselves are built by mutually distrusting parties. Another plus is that the nations spending all this money will want trustworthy chips more than subverting others’ chips, so all parties involved would be motivated to not subvert their hardware.

Bryan Feir • July 5, 2011 4:32 PM

@Nick P:
Not so sure about the last line there. After all, if you can subvert the production in such a way that you can detect and work around it after the fact, that means that you can treat the ‘subverted’ chips as normal but still use them against others.

chris • July 5, 2011 4:45 PM

@Aesop: “The Soviet Space Shuttle was a rejected NASA design…”

How could we tell? Ours SHOULD have been rejected, too.

How many catastrophic failures did the Soviet shuttle have? Maybe we sent/kept the wrong designs….

Andy • July 5, 2011 5:12 PM

@Clive Robinson, just thinking about something I hope you can answer. With that theory about reading a CRT and jamming a security camera using the clock speed of the CMOS, could you modify that principle and remotely sniff the data on the FSB of the computer?
It would be interesting.
Cheers

And TEMPEST shouldn’t work, as it’s earthed, and with a high negative voltage plugged into earth plus rain and salt you could remotely sniff any TEMPEST network.

RobertT • July 5, 2011 7:01 PM

@NickP
Sounds like a fantastic idea but with one small snag, namely the fab yield would be zero. Now given that the intended customer is the DOD, that may or may not be a problem. Just another DOD SNAFU.

BTW, in terms of the possible vectors to subvert a chip’s function, modifying the processing equipment would be one of the most bizarre methods. Since you are not altering the projected image, I don’t really understand how the function is substantially changed. Sure, you could easily generate random shorts or opens and parametrically degrade the analog + reduce the digital speed, but that only really affects yield, not function.

To change function you must either change the masks, or add or delete polygons on the masks or on the chips themselves (FIB / Ebeam).

Richard Steven Hack • July 5, 2011 9:22 PM

Clive: “It does not matter whether they understand the internals or not; in most cases it’s what goes on at the inputs and outputs during verification testing that gives the game away to those who can think and are paid to do so.”

Agreed. As in my argument with Nick P., “where there’s a will, there’s a way”. Maybe not in absolute terms, but close enough for horseshoes – or government work.

This is why I believe such approaches – trying to construct something which is “high assurance” or “un-subvertible” – are a waste of time. Better to operate under the assumption that your enemies know your intentions, your plans, your vulnerabilities, and as Richard Marcinko said, “always assume Murphy is running the show.”

In other words, stop trying to control the universe. You don’t and you can’t. The only thing you can control – IF you’re trained to – is your response to what happens to you. As Masaaki Hatsumi, the head of Togakure-ryu ninjutsu has said, there are no guarantees. No matter what you do or how well trained you are, you can lose and you can die. Translating that to computer security terms, we end up with my meme.

Which is not to say that the US military shouldn’t be worried about compromised chips manufactured in other nations (and include Israel in that list, not just China). But they’d probably be better off just doing a decent inspection program than all this complicated “outwitting the other guy” stuff.

Nick P: I agree that the DoD definitely could fund their own fab. Hell, even the intel community “black budget” could fund their own fab without it being even in the official budget.

As for screwing other government’s programs, the CIA tried to screw Iran’s nuclear program by offering them plans for a nuclear trigger which had flaws in it. But the third party scientist they hired to pass the plans to the Iranians spotted the flaws and, worried that the Iranians would see them and realize the scam, he TOLD THEM about the flaws. The end result was that the CIA gave Iran the plans for a nuclear trigger that worked!

Except of course Iran has no nuclear weapons program (other than a research database which the military of any nation threatened by nuclear neighbors would have as a matter of due diligence), so that was a complete waste of everyone’s time.

RobertT • July 5, 2011 10:29 PM

I’ve had a chance to read through all the available information, and concluded it’s an interesting task, well worthy of my utmost concentration, especially if I were suitably remunerated.

Bottom line, however, it’s a Boondoggle, so it’s something that young aspiring engineers should distance themselves from.

Unfortunately, at a certain age one seems to lose the youthful indignation towards Boondoggles, and from that age onward the only distinctions worth considering are:
Is it my Boondoggle?
Or your Boondoggle?

For the British readers, convert the word Boondoggle to Quango.

Richard Steven Hack • July 5, 2011 10:42 PM

Ultimate proof that TSA security is a joke:

TSA Can Grope Dying Old Ladies; But Can’t Catch Guy Boarding Flight Illegally?
http://www.techdirt.com/articles/20110702/01460214949/tsa-can-grope-dying-old-ladies-cant-catch-guy-boarding-flight-illegally.shtml

Referenced CNN report:

FBI: Stowaway slips onto cross-country flight
http://articles.cnn.com/2011-06-30/travel/flight.stowaway_1_delta-flight-flight-crew-tsa?_s=PM:TRAVEL

Caught by his SMELL the SECOND time!

All this guff about “detecting weapons” and the guy walks on the plane!

The TSA does say: “TSA’s review of this matter indicates that the passenger went through screening. It is important to note that this passenger was subject to the same physical screening at the checkpoint as other passengers.”

So he had no weapons – but he also didn’t have any proper boarding documents! And are they SURE he went through the screening – they have video proof? (Not that he had any reason to evade the screening since he was relying on his phony documents.)

TSA says: “In an updated statement Thursday, the agency said its “initial review of this matter indicates the officer reviewing the passenger’s travel documents did not identify that the passenger was traveling with improper travel documents.”

Duh!

“A Virgin America spokeswoman said the airline “maintains security and other screening systems in place to prevent such an occurrence; however, in this case it appears staff may have missed an alert when the passenger presented a boarding pass from a prior flight.”

Duh! It’s that simple to evade their “screening systems in place to prevent such an occurrence”?

“After discovering that Noibi should not have been on the flight, the crew kept him “under surveillance, but at no time felt there was any threat to the security of the flight,” the statement said, adding that the man slept for most of the flight.”

I’m so happy no one is at risk from a sleeping terrorist…

“The man whose name was on the boarding pass told Hogg that his boarding pass had disappeared from his back pocket after he took the subway to the airport last Thursday, the day before the flight Noibi was on, according to the affidavit.”

That was easy.

“It was not clear how Noibi got to the gate for the flight at JFK.”

Exactly.

“He also said he spent the night at LAX in the secure portion of the airport, the affidavit said.”

Like those guys who were racing wheel chairs in a secure area of a Texas airport…

“Noibi claimed he was able to go through passenger screening by obtaining a seat pass and displaying his University of Michigan identification and a police report that his passport had been stolen.

Authorities found he had two boarding passes in his pocket and more than 10 in his two bags. “Noibi did not have any boarding passes in his own name,” the affidavit said.

FBI spokeswoman Eimiller said the FBI has not determined how he came into possession of the boarding passes.”

I’d say that is probably obvious – the same way he got the other guy’s – pick pocketing.

Idiots.

Richard Steven Hack • July 5, 2011 10:54 PM

I find this funny. I was reading about the Army’s $2.7 billion computer system for battlefield analysis which crashes all the time:

US Army spent $2.7 billion on a battlefield computer that doesn’t work
http://www.extremetech.com/extreme/89048-us-army-spent-2-7-billion-on-a-battlefield-computer-that-doesnt-work

…and a link to more about it which goes here:

https://secureweb2.hqda.pentagon.mil/VDAS_ArmyPostureStatement/2011/information_papers/PostedDocument.asp?id=151

Firefox gives me the “This connection is untrusted” dialog on that site!

Bwahahahahahaha!!!

I know the reason is Firefox hasn’t a clue about that site, but the circumstances are just…well, funny. 🙂

Clive Robinson • July 6, 2011 2:13 AM

Here is a thought for people to dwell on.

In most of history our security or defence has relied on individuals.

As (currently) it’s not possible to see inside our heads sufficiently well to tell if a person is trustworthy or not it means in reality that they cannot be trusted.

So when it comes to humans we have put systems in place to deal as best we can with the problem but still get work done.

Now the question arises: why should silicon be treated any differently?

That is, do we actually need chips that are provably correct, or can we build systems where they don’t need to be “trustworthy” in 100% of cases 100% of the time?

After a little thought you will realise that we don’t need provably correct chips; all you need is some way of determining when they are not working the way they are expected to, and the question changes to “what do we do when an error is detected?”

In an IEEE article linked to by @ Sam (10:34 AM), one of the people asked made a comment about using a 512-bit keyword as a “kill switch”.

Essentially, the base of their argument was that an adversary could place a circuit inside a chip such that, when a particular number was presented to it, the chip would stop working either entirely or the way it was expected to. Their argument then went on that there was not enough time to go through all 512-bit combinations to test that such a “kill switch” circuit did not exist within a chip.
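To put a rough number on the "not enough time" part of that argument, here is a back-of-envelope sketch in Python. The test rate of one billion candidate values per second is purely an assumption for illustration, as are all the names; none of it comes from the IEEE article.

```python
# Back-of-envelope estimate of exhaustively testing every 512-bit trigger value.
# ASSUMPTION: a tester that can present 10**9 candidate values per second.
TRIGGER_BITS = 512
TESTS_PER_SECOND = 10**9          # hypothetical test rate
SECONDS_PER_YEAR = 365 * 24 * 3600


def years_to_exhaust(bits, tests_per_second=TESTS_PER_SECOND):
    """Whole years needed to present every possible value of the given width."""
    return (2**bits) // (tests_per_second * SECONDS_PER_YEAR)


if __name__ == "__main__":
    years = years_to_exhaust(TRIGGER_BITS)
    # Print only the order of magnitude; the exact figure is not the point.
    print(f"roughly 10^{len(str(years)) - 1} years")
```

Even if the assumed test rate were many orders of magnitude faster, the result would stay far beyond any practical test budget, which is why the argument is sound as far as it goes.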

Whilst the argument is correct in and of itself, it is based on a set of assumptions that are not necessarily correct for any given end system.

That is it assumes a few things for the argument to hold three of which are,

1, The adversary can get the 512bit number into the chip when it’s in a system.
2, That a “single point of failure” will stop a system from working.
3, That only that chip and that chip alone is in use in a system.

Change any of these (and several other assumptions) and the argument does not hold.

The first assumption is actually very iffy and based on an argument that the adversary is omnipotent.

That is, when the chip is designed the adversary not only knows exactly what sort of system the chip is to be put in, but also “knows the system” sufficiently well that they can guarantee getting the 512-bit number to the correct pin as and when required.

Whilst this might be likely for front end devices in open communications systems, it is by no means universally true. Importantly, the further you get into a system the less likely it is to be true.

As a general case this sort of “omnipotence” only lies with the system designer, not with a sub-component designer. Thus the less of the system goes on any given chip, the less likely this attack is to work.

The second and third assumptions are well known to design engineers and have well known solutions that originated from trying to design reliable systems using unreliable components and suppliers.

One of the logical results that was seen was the use of “voting protocols” by NASA to get the level of reliability required.

Put simply, critical parts of the system were designed and built by three separate unconnected organisations. The three different parts were all used in parallel in the final system. Each part received the same inputs as the others, and the outputs were put into a voting system. As long as they all agreed, the action was taken. However, if one part disagreed with the others, the majority vote was used.
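As a minimal sketch of that voting idea (not NASA's actual implementation; the function name and the choice to report dissenters are assumptions made here for illustration), a majority voter over independently built units could look like this in Python:

```python
from collections import Counter


def majority_vote(outputs):
    """Return (winning value, indexes of units that disagreed with it).

    `outputs` holds one result per independently built unit, all computed
    from the same inputs. Raises ValueError when no strict majority exists.
    """
    value, votes = Counter(outputs).most_common(1)[0]
    if votes * 2 <= len(outputs):
        raise ValueError("no majority: units disagree too widely to act")
    dissenters = [i for i, out in enumerate(outputs) if out != value]
    return value, dissenters


if __name__ == "__main__":
    # Unit 1 misbehaves (for example, a kill switch fired); it is outvoted.
    print(majority_vote([42, 0, 42]))
```

Reporting the dissenting unit, rather than silently outvoting it, gives the surrounding system something to act on when an error is detected.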

So for this “Kill Switch” idea to work with a voting system sufficient parts have to have kill switches built in that the adversary knows about and can actually use.

If you think about it a little further, the voting idea can also work with a single device but using time displacement, and this is how we generally realise a human has become unreliable. That is, those around them notice that their behaviour in a given set of circumstances has changed, and this raises alarm bells.

Similar detection systems can be used around chips, and inside chips when using imported blocks from other designers.

RobertT • July 6, 2011 4:51 AM

@Clive R

In general a good analysis, except that we are already firmly in the age of Systems on chip (SOC). As such the inputs are RF / Analog signals and the output is the desired display (e.g. a GPS system displays location). There is only one chip doing the entire function.

Typically the ideal integration level is determined by yield, so the cheapest system occurs at the 90% to 95% yield point. Typically this is a chip of about 50mm2 die size. In today’s technology you can integrate 1M gates / mm2. The resultant die costs less than $5USD to produce.

For your reference an 8051 microcontroller requires about 10K gates, so a 50mm2 chip could contain 1000 full microcontrollers each executing their own independent program.

Today many chips are starting to look like several ADC’s and DAC’s feeding arrays of processor cores. The function might be a cell phone or a TV controller but the actual chip looks very similar. The chip function is determined by the code loaded onto all the micros.

There are two points I’m trying to make
1) The discrete single function chip market is slowly disappearing. (no more standalone DSP’s ADC’s PLL’s they are all integrated together)

2) Because of 1), supply of discrete functions is getting more difficult, so the DOD needs to look at changing from a model of assembling systems based on available discrete functions. The new model will involve finding alternate ways to utilize very high volume chips.
(e.g. a 3G baseband cell-phone SOC with RF + ADC’s + RX + Tx + uC’s + DSP + HID + Turbo, LDPC, IFFT/FFT… gets used with a different program, and maybe some different metal layers, to make a completely new military radio.) By doing this they leverage commercial technology, which moves much faster than military technology. Since the die cost is VERY low for this approach, the defense companies can afford to purchase 10 times the number of parts that they ever expect to need. They just store the rest at a wafer level.

We are not there yet, but it is clear that the old defense model is dead.

annn • July 6, 2011 6:04 AM

At the policy level, it is difficult to deal coherently with a country that is simultaneously friend and foe. We buy chips from China as a friend, but the chips are delivered potentially counterfeited or sabotaged as a foe.

Tackling conflicting, two-sided political relationships may be as tough as dealing with the technological issues of creating secure chips and by some standards not politically correct.

David • July 6, 2011 6:39 AM

One way to deal with the problem of counterfeiting is to reduce the performance timeline to a point where it is impossible to take a never-before-seen design (this could potentially include significantly altered designs) and deliver completed chips without “unexpected delays.”

By writing this into the delivery contracts, the “no fault, no harm” clause merrily kicks in.

Clive Robinson • July 6, 2011 8:36 AM

@ RobertT,

“… except that we are already firmly in the age of Systems on chip”

Yes and thereby hangs the problem, as you noted,

“… so a 50mm2 chip could contain 1000 full microcontrollers each executing their own independent program.”

It is not realistically possible to check all of these CPU equivalent cores on a chip, possibly only at the design stage, and that’s questionable these days (how many SOC designers actually do any gate level development these days? As I understand it the majority just use libraries of functional blocks).

Which means effectively we cannot get what IARPA is chasing after in the way described, hence why I think we should look at other ways.

As you note, breaking SOC systems down into separate functional blocks on separate chips is not likely to happen, not only for efficiency but also because some systems just won’t work if you try. However, from a security aspect it should be high on the list of considerations as a method.

So let’s make an assumption that we can do things the “masked programmable way”. That is, suppliers make available various functional blocks on chip but not connected up. The DOD or whoever puts out the spec for the same functional blocks to N independent suppliers and gets back N different basic chip wafers that the DOD then tracks up as required.

The question then arises: do they actually need to trust any of these functional blocks if they have some way of verifying their black box function in use?

I suspect that with some appropriate thought this could be done (with some loss of performance both in functional power and speed).

If you remember, some time ago Nick P and myself were having a conversation about methodology (Castles-v-Prisons). My viewpoint was to have a large number of low gate count CPU cores connected by a shared bus to various resources such as system memory. However, the important thing is that the connection was not direct; it was through a hardware interface, the likes of an MMU, that was controlled by a hypervisor. That is, the CPU was the Prisoner, the MMU interface was effectively the cell & door, and the Hypervisor the trusty or guard. The CPU had no knowledge of any external environment and was limited in any function it performed and any memory resources it was allocated. The hypervisor is responsible for loading the program into an arbitrary CPU resource area, and all comms would be via a write only + read only buffer mechanism controlled by the hypervisor. Importantly, the Hypervisor can halt any CPU core and examine its register state etc. at will, and the CPU core would be unaware of this. Likewise, when the core writes to a buffer the CPU core is automatically halted; the Hypervisor reads and acts on the buffer contents, puts the appropriate information into the read only buffer, and un-halts the core.

At first it appears overly complicated; however, the idea is to segregate the CPU core from other system activity and have a minimal function that can be easily verified by the use of other cores and direct examination of the state. Also, each function a CPU core could carry out has a signature in terms of CPU cycles/time, memory/register usage and I/O buffer behaviour. The signature is monitored by the hypervisor system.
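A toy sketch of that signature check, in Python, might look like the following. Everything here is an assumption made for illustration: the field names, the use of simple upper bounds as the "signature", and the idea that the hypervisor samples these three counters when it halts a core.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Signature:
    """Expected envelope for one function (hypothetical fields)."""
    max_cycles: int
    max_memory_words: int
    max_buffer_writes: int


@dataclass(frozen=True)
class Observation:
    """What the hypervisor measured for one run of that function."""
    cycles: int
    memory_words: int
    buffer_writes: int


def violations(expected, seen):
    """Return the names of every measured quantity outside its envelope."""
    problems = []
    if seen.cycles > expected.max_cycles:
        problems.append("cycles")
    if seen.memory_words > expected.max_memory_words:
        problems.append("memory_words")
    if seen.buffer_writes > expected.max_buffer_writes:
        problems.append("buffer_writes")
    return problems


if __name__ == "__main__":
    sig = Signature(max_cycles=1000, max_memory_words=64, max_buffer_writes=1)
    run = Observation(cycles=1500, memory_words=32, buffer_writes=3)
    # A non-empty list is the hypervisor's cue to keep the core halted.
    print(violations(sig, run))
```

A real monitor would probably also want lower bounds and timing distributions, since finishing suspiciously early is as much a change in behaviour as finishing late.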

The design assumption was that the CPU cores, although trusted in terms of gate layout etc., could not be trusted in the code they were running (i.e. it might be possible to get malware in). Likewise the trusty level hypervisors were only trusted a little further, and so on up a multilevel hypervisor stack to the governor hypervisor. However, the CPU core would never actually see or be able to influence the trusty layer, and so on up the stack.

Now the question becomes: can I extend the model to use untrusted CPU cores etc.? I think in some respects it can be done.

However, at some point some hardware needs to be trusted, and originally I assumed the hypervisor at some level was not a general purpose CPU but a limited functionality state machine with all states known.

So the question becomes: can I have untrusted CPU cores on chip and a series of uncommitted gates that I can wire up by the mask to provide a trusted state machine? Obviously the wire up of the mask in this area remains unknown to the chip supplier, as does anything other than the base functionality.

Richard Steven Hack • July 6, 2011 1:49 PM

Clive: I believe the voting model was brought into play by NASA after their previous method of having TWO systems check each other failed. I think it was a Venus probe or something like that. One of the CPUs failed causing it to believe that the OTHER CPU had failed and it shut it down and began to take the craft off course. NASA had to “trick” that CPU from the ground into believing that it had failed so it would relinquish control to the other correct CPU.

Switching to three systems made that sort of failure harder to occur. 🙂

Dirk Praet • July 6, 2011 5:55 PM

All other aspects aside, it’s really quite a laugh that a nation that spends 2 billion USD a year on airconditioning for the troops resorts to buying rigged Chinese cr*p to save a few pennies. I wonder if anybody is investigating whomever was in charge of procurement and to which lobbyist(s) the money trail leads.

Nick P • July 6, 2011 6:16 PM

@ RSH

“Switching to three systems made that sort of failure harder to occur. :-)”

I once read a statement from a guy who designed storage area networks that, if it wasn’t at least triple redundancy, it was going to fail. And he wasn’t even so confident in that. 😉

@ Clive Robinson

Yeah, three independently developed systems and a voting protocol. Remember last time you brought this up I pointed out that it’s developed into a new approach to COTS security called security through diversity. The idea is to reduce the effectiveness of exploits without really getting the developers to write good code. NASA’s scheme was the original one and I took it further in designs where the instruction sets and OS also differed, although they all had a POSIX interface.

The recent approaches have been automated source transformations and ways of converting a program into a custom, unique instruction set run through a VM. The original approach is proven in theory and in the field. I think security through diversity is a good idea and deserves more research funding, but I can’t say I feel too great about the recent developments. It will only stop low-level attacks on things like C++ programs, whereas application-level attacks still make it because the logic itself is flawed.

RobertT • July 6, 2011 7:21 PM

@Clive R
On a single-chip system, all global and local interconnect uses the same 6 or 7 metal layers. Generally cell-level interconnect is done on M1 and maybe M2. Usually global power supplies are on the upper 2 layers of metal (M6 and M7), but the middle layers (M2, M3, M4, M5) are used as routing resources at the disposal of the top-level place-and-route program. Generally there are some attempts made to keep the lower-layer metals (M2, M3) used locally, while M4 and M5 are more commonly used at a block / function interconnect level.

OK, so why the lesson on routing? Because it is important to understand what a complete mess top-level SOC chip routing is and, because of this, how easy it is for someone to compromise a chip at the routing level. Think about 50M gates (4 transistors / gate = 200M transistors); each transistor has at least 3 vias (connections between layers), so there are 600M vias on the chip.
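The back-of-envelope numbers above can be reproduced in a few lines of Python. The gate, transistor and via figures are the ones given in the comment; the one-via-per-second inspection rate is purely an assumption added to give a feel for the scale.

```python
SECONDS_PER_YEAR = 365.25 * 24 * 3600

gates = 50_000_000
transistors_per_gate = 4
vias_per_transistor = 3          # "at least 3", so this is a lower bound

transistors = gates * transistors_per_gate
vias = transistors * vias_per_transistor

# Assumption for illustration only: a reviewer checks one via per second, non-stop.
years_to_inspect = vias / SECONDS_PER_YEAR

print(f"{transistors:,} transistors, {vias:,} vias")
print(f"roughly {years_to_inspect:.0f} years to inspect by hand at one via per second")
```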

Now you can employ some brilliant digital guy who can calculate possible information flow along the available signals. He will tell you that Two wires can only have 4 possible states. However ask an analog guy and he’ll tell you that he can easily pack 8 bits of information onto each of those routing wires. What this means is that your formal verification analysis, of the possible information flow is complete BS for just two wires, so what’s possible with 20 million wires.

Good, you might say: I cannot make an analog comms channel because I cannot make an ADC / DAC out of digital gates. You would be almost correct: most IC engineers could not do this, but it is incorrect to suggest that no IC engineers know how to do this.

I know none of this is new to you, so I’m just trying to point out that even in the perfectly designed and specified system, where critical signals are checked and monitored by a hypervisor, it would be possible for two blocks to communicate using a built-in side channel.

Take a simple case of two digital signals (High > 0.7V, Low < 0.4V). I could maintain these signal levels at all times but still add, say, 3 bits of amplitude information (4 levels above 0.7V and 4 levels below 0.4V). These could be 100mV levels, which would be very easy to detect with normal-looking logic gates but with ratioed N/P gate sizes. The driver is a little more difficult. So this is a crude 3-bit ADC built out of gates.
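A minimal Python sketch of the amplitude trick on one wire. The specific millivolt levels and the function names are assumptions chosen for illustration; only the 0.7V / 0.4V thresholds and the 100mV spacing come from the comment. Read this way, the eight levels carry the visible logic bit plus two hidden bits, and a standard gate never sees anything unusual.

```python
# Assumed levels, 100 mV apart, all on the legal side of the logic thresholds.
HIGH_MV = [750, 850, 950, 1050]   # every one reads as logic 1 (> 700 mV)
LOW_MV = [350, 250, 150, 50]      # every one reads as logic 0 (< 400 mV)

def drive(logic_bit, hidden):
    """Pick the wire voltage for a visible logic bit and a hidden value 0..3."""
    return (HIGH_MV if logic_bit else LOW_MV)[hidden]

def standard_gate(mv):
    """What an ordinary digital receiver (and a digital verification tool) sees."""
    if mv > 700:
        return 1
    if mv < 400:
        return 0
    raise ValueError("voltage in the forbidden zone")

def covert_receiver(mv):
    """What a receiver with ratioed thresholds could recover: the nearest level index."""
    levels = HIGH_MV if mv > 700 else LOW_MV
    return min(range(len(levels)), key=lambda i: abs(levels[i] - mv))

for bit in (0, 1):
    for hidden in range(4):
        mv = drive(bit, hidden)
        print(bit, hidden, mv, standard_gate(mv), covert_receiver(mv))
```

The point of the sketch is that `standard_gate` and `covert_receiver` read the same voltage and both get valid answers, so a check that only models logic levels has nothing to flag.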

I could also communicate on the differential mode of the two signals (getting about 1/2 bit of extra information); this comms could happen easily with a 10mV difference between the two levels.

There is also the problem of signal crosstalk to consider. Layout / digital design tools only consider signal crosstalk to be a noise margin / signal integrity issue, so as long as the coupling is insufficient to cause erroneous bit flipping they decide there is no problem. This ignores that someone might intentionally route an analog signal close to a secure digital signal (never contacting it) but close enough to couple, say, 10mV of signal. For an analog person it is trivial to restore the 1V signal given 10mV of coupled signal. So again, even where the best formal verification digital tools would declare that there is a perfect match from function to logic to layout, there is a signal-coupling side channel that gets missed.

In these SOCs, complexity is the enemy of security: chip complexity has moved so far beyond anything that any human can ever hope to check by hand that we must believe what the tools tell us. However, as I’ve just tried to point out, the tools are very easily tricked.

Nick P • July 6, 2011 7:29 PM

@ RobertT

Maybe we just need to abandon current silicon strategies altogether for certain critical chips. Maybe microfluidics, alternative silicon strategies, micro-mechanical strategies. Could take another look at that declassified NSA file that showed all the radically-different computer designs they used over the years. Might give us some ideas. I figure I should just modernize the Babbage machine and be done with it. lol

Additionally, Robert, what do you think about homebrew CPU’s like Magic-1 made out of simpler or more easily verified parts? Any potential in that area?

Magic 1
http://www.homebrewcpu.com/

Some others, including PDP reimplementations
http://www.homebrewcpu.com/links.htm

Richard Steven Hack • July 6, 2011 9:05 PM

RobertT and Nick P: One word. Nanotech. Maybe Drexler’s rod-and-cone mechanical nanocomputers. 🙂

Then figure out how the brain works and dropkick all this stuff for real conceptual processing AI and learn to deal with a machine that hallucinates… 🙂

“What this means is that your formal verification analysis, of the possible information flow is complete BS for just two wires, so what’s possible with 20 million wires.”

Exactly what I said in the earlier thread about “high-assurance operating systems”. They can’t exist, not formally proven ones anyway.

Nick P • July 6, 2011 9:09 PM

@ RobertT on July 5

“BTW in terms of the possible vectors to subvert a chips function, modifying the processing equipment would be one of the most bizarre methods since you are not altering the projected image I don’t really understand how the function is substantially changed. Sure you could easily generate random shorts or opens and parametrically degrade the analog + reduce the digital speed but that only really effects yield, not function.”

Remember, I’m not a hardware guy and I know almost jack about building it. That’s why I made no assumptions about what could be subverted and to what end. I leave that to you hardware gurus. I figure it’s only going to be a certain segment of the whole processing chain that really needs subversion resistance.

Another thought came as I was reading the Wikipedia article on chip fabrication. I noticed that some processing features could strengthen the electrical properties of certain spots. Couldn’t a subversion of this equipment be used to cause EMSEC failures. For instance, an attacker knowing which part of a cryptocontroller was processing key material might increase that parts strength to leak the critical operations via EMF. Is something like this possible?

Again, I’m speculating based on very little information or knowledge. I just figure we’d take one of two approaches: assume things like this can’t or won’t be subverted (risk management); assume the highest subversion potential of every device and protect any that pose any risk (risk mitigation). The level of sophistication required for fab subversion means I’m leaning toward risk mitigation.

Again, though, I’m not qualified to assess the risk level of any part of the process except where the specs are given to the factory and fed into the components: that should require integrity/authenticity checking because, as you’ve pointed out, modifications are easiest there. As for the rest, domain experts determine the risk & requirements and I just come up with a safe way to design/build it.

RobertT • July 6, 2011 10:22 PM

@Nick_P
“Couldn’t a subversion of this equipment be used to cause EMSEC failures. For instance, an attacker knowing which part of a cryptocontroller was processing key material might increase that parts strength to leak the critical operations via EMF. Is something like this possible? ”

The EMF emanating from normal chip interconnect wiring is practically immeasurable, because in general antenna lengths are too short to be useful. The longest local interconnect within a CryptoCPU might be 100um long so it is only a useful RF antenna for THz radio waves, which are easily absorbed by the packaging materials.
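The THz figure can be sanity-checked with a quarter-wave estimate. This is a rough sketch that assumes free-space propagation and a simple quarter-wave monopole model; the dielectric around real on-chip wiring would lower the resonant frequency somewhat, but not out of the sub-THz to THz range.

```python
SPEED_OF_LIGHT_M_PER_S = 299_792_458.0

def quarter_wave_frequency_hz(length_m):
    """Frequency at which a wire of this length is a quarter wavelength (free space)."""
    return SPEED_OF_LIGHT_M_PER_S / (4.0 * length_m)

# 100 um is the longest local interconnect length quoted in the comment.
frequency_hz = quarter_wave_frequency_hz(100e-6)
print(f"{frequency_hz / 1e12:.2f} THz")
```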

Trying to increase a barely measurable EMF effect by mis-processing the chip is kinda beyond wishful thinking.

If you want to prevent subversion of the chip, you need to be focused on protecting the masks (up to 40 layers for modern 40nm devices) and the wafers / chips themselves during processing.
Additionally you need to protect the chip creation database, although once the chip is deployed you must assume that the attacker has extracted his own version of the chip, so the protected database is only useful if you can guarantee that attackers will never gain access to the actual chips. That’s a physical security problem.

I’ll give some thought to any possible processing equipment attack that could result in a chip functional or EMSEC change, but it seems unlikely to me. The only thing that comes to mind is a possible race condition exploit. For example, two identical signals are created and fed through a long series interconnect of Metal2-to-Metal3 steps for one and Metal3-to-Metal4 steps for the other. Assuming all the processing is perfect, the loadings of the two paths would be identical, so the two signals would arrive at a phase comparator at the same time. If, however, I mis-processed the M3/M4 via, I could substantially increase its resistance and thereby selectively slow this path. But this is a combination of weird digital self-timed logic with mis-processed chips to create a logic fail / selectively enabled switch.

RobertT • July 6, 2011 11:07 PM

@Nick_P
“Maybe we just need to abandon current silicon strategies altogether for certain critical chips.”

For exotic apps there are no problems finding and using expensive alternate strategies but for everything else there is Silicon.

Think of smart munitions, like an RPG round that costs only $5 extra but can be fired over a wall and look at the uniform of the guys on the other side of the wall before deciding to explode or not. Maybe it even has integrated Identify Friend or Foe systems which ping RFID tags built into the NATO uniforms. My point is all this is possible Today with a single $5 chip, if the military can find ways to piggy-back onto commercial high volume products.

Think of VERY cheap drones built from hobby level model airplane kits that use GPS to control flight but use 3G networks to send back the pictures, change the flight plan, and maybe even selectively fire weapons. This product is possible for under $10 extra if commercial cell phone chips are utilized.

At the moment the US military / industrial complex wants to sell the US govt the same concepts that I’m outlining, BUT wants to collect $1M per system. The US strategy only works until someone else (say the Russians or the Israelis) decides that 10K units of cheap, small, smart $100 drones beat the US $1M system and deliver equivalent capabilities.

Think about how game changing smart munitions are, especially in an Urban conflict environment. Now the US can choose to ignore the low cost way to do this, but will everyone else also ignore this opportunity?

Seafood Fanatic • July 7, 2011 8:04 AM

If they can’t also make secure fish, then I don’t want to hear about “secure chips”. What’s a secure Friday evening fast-foods meal, if only one part of it is “secure”?

Fire the whole kit-n-kaboodle of them. Let them eat secure cake – once they’ve secured it, of course!

GregW • July 13, 2011 8:31 AM

@Clive/@RSH/@NickP…

Regarding 3-way (2n+1) voting schemes to reduce/eliminate security errors (or faults a la NASA)… I suspect you might need 4-way (3n+1) due to the existence of “Byzantine faults” to resist subversion from a single source, unless your system design meets certain other properties to prevent tampering of the voting.

I noticed this possibility when reading a summary of one highly-regarded researcher’s fault-tolerance-related papers at http://research.microsoft.com/en-us/um/people/lamport/pubs/pubs.html#reaching (particularly the comments about that paper, “Reaching Agreement in the Presence of Faults”)
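To make the two counting rules concrete, here is a small Python sketch. The function names are hypothetical; the 2f+1 figure is the plain majority-vote requirement, and the 3m+1 figure is the bound from the cited paper for agreement with unsigned messages in the presence of m arbitrarily misbehaving nodes.

```python
def nodes_for_majority_vote(faults):
    """Units needed so that honest-but-broken units are always outvoted."""
    return 2 * faults + 1

def nodes_for_byzantine_agreement(traitors):
    """Units needed to reach agreement despite arbitrarily lying units (unsigned messages)."""
    return 3 * traitors + 1

for faults in (1, 2, 3):
    print(faults,
          nodes_for_majority_vote(faults),
          nodes_for_byzantine_agreement(faults))
```

For a single bad unit the rules give 3 and 4 units respectively, which is the 3-way versus 4-way distinction raised above.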

Clive Robinson • July 13, 2011 1:25 PM

@ Gregw,

“I suspect you might need 4-way (3n+1) due to the existence of “Byzantine faults” to resist subversion from a single source”

Yes, a 2n+1 system could be fritzed with one path owned, but the faults are not easy to generate because the other two systems should still agree with each other.

However, going to 3n+1 means that even if no pathways are owned, a minor fault may cause two paths to go one way and the other two the other way, so you end up with an even vote, and in a simple system you get deadlock (in a more complex system you can use voting history to see if one path is degrading, but this in turn has its own issues). The next obvious solution is 4n+1 to give a 3-of-5 majority vote, but this in turn has its own problems.

Voting protocols have advantages but… the original assumption is honest systems that are either functioning correctly or at fault and are thus switched out. Not that one system might be dishonest and therefore cheating.

One solution is using cheating protocols within the voting protocol where each system that makes a minority vote gets its vote value reduced. Another is to use two or more sets of voting systems but randomly flip the input values to the pathways, so that any subverted system can be detected because it does not know its inputs have been flipped but the voting system does.

Nick P • July 13, 2011 4:08 PM

@ GregW

I appreciate your contribution. So, 3m+1 it is if Byzantine failures are an issue. Clive’s counter should make 3-way hold in my case. The main use case I have for triple modular redundancy is increasing the trustworthiness of program verification, compilation, signing, encryption, etc. These can be encapsulated in a simple process whereby each unit receives the same input, does some stuff, and sends the output to the voter. They don’t share among each other, so if one is compromised then nothing happens.

This setup can still only handle one compromise. This is why my designs use the strongest, most minimal software possible for each unit. It would consist of an OS, a memory storage area, the executable program, a simple IO system, some crypto, and possibly an execution manager of sorts. I could employ three stripped DO-178B or EAL6 RTOS’s in certified configuration. The likelihood of even one compromise is small in this case.

If we wanted more, then we’d just add more platforms. Handle two failures? “3 out of 5” (Clive). Handle 3 failures? 4 out of 7. I’d also mix up the CPU, board types, software sources, protection mechanisms, and compilers used. The voter would be as robust as anything could get, reusing existing EAL6-7 hardware/software (AAMP7G processor + verified voter software).
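A minimal sketch of the voter side of such a setup, in Python. The unit names, the byte-string outputs and the `vote` function are all hypothetical; a real voter of the kind described would be verified software on dedicated hardware, not a script. The sketch only shows the quorum rule (3 of 5, 4 of 7) and that the voter can name the units that disagreed with the majority.

```python
from collections import Counter

def vote(outputs):
    """outputs maps a unit name to the bytes it produced for the same input.

    Returns (winner, dissenters). winner is None when no value reaches quorum,
    in which case every unit is reported as suspect.
    """
    quorum = len(outputs) // 2 + 1          # 3 of 5, 4 of 7, and so on
    value, count = Counter(outputs.values()).most_common(1)[0]
    if count < quorum:
        return None, sorted(outputs)
    dissenters = sorted(unit for unit, out in outputs.items() if out != value)
    return value, dissenters

# Five diverse units, two of which return something different.
results = {
    "unit_a": b"signed-image-1",
    "unit_b": b"signed-image-1",
    "unit_c": b"tampered",
    "unit_d": b"signed-image-1",
    "unit_e": b"garbage",
}
print(vote(results))
```

An even split, such as two against two in a four-unit system, falls below the quorum and returns no winner, which matches the deadlock case raised earlier in the thread.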

Here’s an example setup:

Integrity-178B on PowerPC
VxWorks MILS on different PowerChip
LynxSecure or LynxOS-178B on hardened x86
XtratuM on Leon SPARC
PikeOS on MIPS board
OKL4 Microvisor on ARM
VAMOS microkernel on VAMP (both formally verified)

Any system that can be modeled as a series of single steps transforming input into output could run on this system. The resulting system would require three compromises to subvert the output. Each of these systems has been thoroughly vetted, pentested and/or deployed in the field. The odds of taking down one is slim. The odds of taking down three is so slim that I’d classify the system as a whole “high assurance” so long as the controlling unit is the same.

The controlling unit would probably do IO, voting, randomized choice of message targets, and the ability to load (signed?) software onto any of the seven nodes. Again, we might reuse efforts in DO-178B or security certifications. I’ve noticed that these high assurance smart card platforms actually have a lot of this functionality in them. If designing the controller for high assurance, it might pay to reuse some of the components of MULTOS or Caernarvon, at least the high level, formal design. (Formal requirements, security properties, design & correspondence proofs are half the work, with implementation & testing the other half.)

RobertT • July 13, 2011 6:35 PM

@Greg,
I understand how triple redundancy is intended to work for something like an ABS braking system, because I’ve actually designed them, but it is a little less clear to me that this helps for secure chips.

If the intention is to reduce the failure space by creating 3 independent systems, then that makes sense, but with secure chips you are usually more concerned about critical keys leaking from the chips than the electrical failure of a particular chip. It seems to me that triple redundancy in this case just increases the attack surface area without adding any additional security.

Typically the triple redundant system will have three different software / firmware programs running in parallel, with a 2-of-3 voting system determining actuator actions. The system usually starts life as a formal definition of the problem given to 3 independent teams, whereby the resultant code (from each team) is formally verified against the requirement spec. In my mind triple redundancy in a crypto sense actually creates additional timing side channels. Assuming that all 3 are working correctly, unless they are clock-cycle locked, then one implementation might finish a cycle or two earlier than the other two (maybe because its authors were smart about some multiply algorithm). This voting system is now leaking timing information that would otherwise be difficult to discover.

Nick P • July 13, 2011 7:49 PM

@ RobertT

“In my mind triple redundancy in a crypto sense actually creates additional timing side channels. ”

I agree. It’s why my uses require a trusted administrator and I guess you could say they do batch processing.

RobertT • July 14, 2011 12:55 AM

@Nick P
I know it is probably just my paranoia showing, but don’t you ever ask yourself who is really behind the developments of these secure OS’s?

Take MULTOS and follow the money trail, it takes you to lots of interesting places….

Integrity: is not really targeted at embedded RTOS so it’s not on my radar.

L4…lots of interesting DNA went into that pool and got mixed to form NICTAL4, OKL4, Qualcomm’s kernel (forget the name), TUV.RTOS…

PikeOS …oops we’re back in Germany!

Andy • July 14, 2011 1:54 AM

@RobertT, “Integrity: is not really targeted at embedded RTOS so it’s not on my radar.”

with the added benefit of computers are never wrong…don’t pay for your trip or take a ride in a cop car.

Got to love digital ids 🙁 😉

Nick P • July 14, 2011 9:33 AM

@ RobertT

“Integrity: is not really targeted at embedded RTOS so it’s not on my radar.”

Surprising comment from you. Actually, the Integrity line was an embedded RTOS that was ONLY targeted at medical, aerospace, industrial, etc. until about five years ago. They even bill themselves as the “largest independent embedded software provider.” You should look at their web site sometime. Most of their products and partners are still targeted for embedded boards, with support for many good ones, but they’ve recently retargeted their platform to support phones, laptops, desktops, etc.
