Comments

echo July 2, 2021 7:32 PM

https://www.independent.co.uk/news/education/education-news/tiktok-fake-covid-positive-test-uk-b1876087.html

Videos circulating on TikTok that show British teenagers sharing tips and tricks for faking positive Covid tests to get out of school have been viewed millions of times.

The little blighters! That said, some of the more hardline scientists aren’t happy with how the UK government handled the pandemic or with children returning to school sans vaccinations. I’m undecided which way to take this one.

https://www.theguardian.com/world/2021/jul/02/mps-condemn-social-media-firms-chris-whitty-death-threats

Social media companies have been accused of “taking their eye off the ball” over online death threats against public servants such as Prof Chris Whitty after he became the focus of violent language in channels set up by anti-lockdown activists and members of the far right.

I’ve mentioned before that there are members of the Tory party who are far right and view the Tory party as a “soft touch”. They are not slow to use genocidal-type language or death threats. The wave of terrorist and extremist action taking place within the UK, Europe, and the US, as well as this more targeted action by far-right anti-vaxxers, fits this overall pattern.

Their description of themselves as “common law policemen” whiffs of Ye Olde England, Magna Carta mythology-worshipping brownshirt vigilantes.

echo July 2, 2021 7:45 PM

https://techcrunch.com/2021/07/01/twitter-considers-new-features-for-tweeting-only-to-friends-under-different-personas-and-more/

With the proposed Trusted Friends feature, users could tweet to a group of their own choosing. This could be a way to use Twitter with real-life friends, or some other small network of people you know more personally. Perhaps you could post a tweet that only your New York friends could see when you wanted to let them know you were in town. Or maybe you could post only to those who share your love of a particular TV show, sporting event or hobby.

[…]

The third, and perhaps most complicated, feature is something Twitter is calling “Facets.”

This is an early idea about tweeting from different personas from one account. The feature would make sense for those who often tweet about different aspects of their lives, including their work life, their side hustles, their personal life or family, their passions and more.

Unlike Trusted Friends, which would let you restrict some tweets to a more personal network, Facets would give other users the ability to choose whether they wanted to follow all your tweets, or only those about the “facet” they’re interested in. This way, you could follow someone’s tweets about tech, but ignore their stream of reactions they post when watching their favorite team play. Or you could follow your friend’s personal tweets, but ignore their work-related content. And so on.

This is an interesting idea, as Twitter users have always worried about alienating some of their followers by posting “off-topic” so to speak. But this also puts the problem of determining what tweets to show which users on the end user themselves. Users may be better served by the algorithmic timeline that understands which content they engage with, and which they tend to ignore. (Also: “facets‽”)

I don’t know how this will work in practice, and it isn’t an exact clone, but it isn’t unlike the scheme I proposed a few weeks ago of a network of trust and aliases with a common root. Twitter is managing it within their own platform, which comes with its own skews, and of course it’s really only a local system to them, not a global system capable of being used across the entire network. Twitter also hold the root and alias links, which they would otherwise not have access to.

SpaceLifeForm July 2, 2021 10:29 PM

Quantum Perception

The illusion of the Ames Window is understandable because there is movement.

Here are two still shots. There can be no movement. Right?

hxtps://twitter.com/AkiyoshiKitaoka/status/1409661706603679748/photo/1

hxtps://twitter.com/AkiyoshiKitaoka/status/1410234484343988226/photo/1

No movement, right?

Right?

David Rudling July 3, 2021 4:24 AM

And today’s supply chain attack is

ht tps://us-cert.cisa.gov/ncas/current-activity/2021/07/02/kaseya-vsa-supply-chain-ransomware-attack

(fractured url)

Clive Robinson July 3, 2021 8:11 AM

@ Fake,

No, really. They stole my logins and all my friends logins. Now? I am here.

+1 the way you put that has made me smile (not that easy when you are as old and dripping with cynicism as I am 😉)

But more seriously, if you read my comments yesterday about passwords and the large Silicon Valley Corps over in,

https://www.schneier.com/blog/archives/2021/07/more-russian-hacking.html

You can see why I’m not in the slightest surprised.

echo July 3, 2021 10:49 AM

https://theconversation.com/covid-19-kids-are-using-soft-drinks-to-fake-positive-tests-ive-worked-out-the-science-and-how-to-spot-it-163739

Children are always going to find cunning ways to bunk off school, and the latest trick is to fake a positive COVID-19 lateral flow test (LFT) using soft drinks. So how are fruit juices, cola and devious kids fooling the tests and is there a way to tell a fake positive result from a real one? I’ve tried to find out.

[…]

Is there then a way to spot a fake positive test? The antibodies (like most proteins) are capable of refolding and regaining their function when they are returned to more favourable conditions. So I tried washing a test that had been dripped with cola with buffer solution, and sure enough the immobilised antibodies at the T-line regained normal function and released the gold particles, revealing the true negative result on the test.

Children, I applaud your ingenuity, but now that I’ve found a way to uncover your trickery I suggest you use your cunning to devise a set of experiments and test my hypothesis. Then we can publish your results in a peer-reviewed journal.

Smug “certified professional” forgets that the fake was A.) Good enough to fool the test and B.) Discovered by children yapping on TikTok.

https://www.theguardian.com/society/2021/jul/03/allowing-people-in-england-with-covid-both-jabs-to-skip-quarantine-will-cause-resentment

Allowing those who have received two doses of a Covid vaccine to skip quarantine could breed resentment and result in mass non-compliance, a scientific adviser has warned.

Downing Street has confirmed it is looking at whether to drop all legal self-isolation measures for fully vaccinated people who come into contact with someone who is infected “as part of the post-step 4 world”.

It comes after the Times reported that a meeting of the Covid operations committee would take place on Monday at which ministers are expected to sign off a plan that will mean those who have been double-jabbed will be “advised”, after 19 July, to take daily tests but not be required to do so.

Robert West, a professor in health psychology at University College London’s Institute of Epidemiology and Health, told Times Radio he could “see the rationale” for the policy, but that there were significant problems that “outweigh potential benefits”.

[…]

However, Dr Bharat Pankhania, a senior clinical lecturer in communicable diseases at the University of Exeter’s medical school, said he thought it was “perfectly OK” for people who had received two doses of a coronavirus vaccine to be exempt from quarantine measures.

He told BBC Radio 4’s Today programme: “The gold standard would be to be cautious even if you have been immunised twice – in other words, fully immunised.

“However, as a measured action going forward I think it is OK and my reasons are as follows: an immunised person is less infectious and furthermore, the testing of people who are in quarantine isolating is pretty inaccurate. So balancing both, I think it is perfectly OK.”

[…]

West warned infection rates were “getting out of control”, and that the discussion was “a bit of a distraction” from rising case numbers.

“The hope is that of course that it won’t reach the levels that we saw in January, but it doesn’t have to, because the NHS has already been so badly hit, and the staff are so stressed now,” he said.

“We have to catch up on routine [treatments], so it wouldn’t take a lot for the NHS to be very severely affected.”

[…]The British Medical Association said that keeping some protective measures in place after 19 July was “crucial” to stop spiralling case numbers having a “devastating impact” on people’s health, the NHS, the economy and education.

Having put myself in near-complete lockdown since day one, and still maintaining “high assurance” levels of safety, I am very very deliberately not using the NHS test and trace application, for a number of reasons. Having had both vaccines, one of those reasons is that if I don’t use it, it won’t flag me as potentially infected just because I walk past someone who is. I’m not having my life turned upside down any more than it already is.

I stupidly brought the second jab forward by one week, from 12 weeks to 11, because the government was hustling people due to the young getting impatient; with the shorter gap, efficacy drops from 90% to 70%. I discovered both of these items a few days afterwards.

I would like everyone to note the word “advise” in this article very very closely. The word “advise” is not dropped in there by accident. This is the microscopic tip of an extremely large legal iceberg. What this means is A.) You can listen to whichever doctor you want or completely ignore them. B.) Doctors are a stack of unexamined policies and practices with varying degrees of expertise, sometimes informed by third parties, and they cannot always form a consensus of uncertain provenance among themselves, and C.) It’s on you.

The word “advise” isn’t just there to get the government off the hook for making a “sovereign decision” based on “informed consent”, but is also a legal device to get them off the hook for legal action should anything go wrong one way or another.

This gets more and more murky the deeper you dig. The government is legally insulating itself here, and has had a habit of legally insulating itself in lots of other unrelated policy areas for quite some time. It is also aggressively managing the news to maintain a higher perceived competence and consent, and to disguise disagreement and anger with below “gold standard” governance across a range of public policy areas. I have noted in previous comments the issue of far-right activity and also the throwing of disabled people under the bus. These are just two indicators of a failure of governance, a breaching of consensus, and a failure of government to act to fulfill its legal obligations with respect to the security of the state and the people. I have also posted links revealing how ministers evade accountability, although these only go so far.

I had another meeting with the lawyer yesterday. The meeting didn’t go as planned but it went better than I hoped. I cannot discuss details for obvious reasons but a range of law was discussed from building legal instruments to touching on public law. Having had previous experience of catching a minister lying to the House and being in the uncomfortable position of being caught by the Official Secrets Act, and experience of Whitehall mandarins acting as a buffer of deniability between ministers and institutionalised money laundering by the City I have some idea how tricky these things can be.

It’s interesting at times how blurred the lines between state and citizen are with respect to the roles of intelligence analyst and spy, and how ministers and agents of the state may be the cuckoo in the nest. But this gets into a philosophical discussion of what the “state” is.

It is notable that in law, in matters political, the people are sovereign.

lurker July 3, 2021 1:15 PM

@David Rudling: from Kaseya:

we are continuing to strongly recommend that our on-premises customers’ VSA servers remain offline

and then they spoil it by continuing

until further notice. 

For emphasis the latter message is repeated in all-caps

UNTIL FURTHER INSTRUCTIONS FROM KASEYA ABOUT WHEN IT IS SAFE TO RESTORE OPERATIONS.

Surely a number of parties here might suggest that is never…

lurker July 3, 2021 3:56 PM

Krebs has a story about more woes with WD NAS devices. I gave up on WD consumer grade drives some years ago: the MTBF was just too low.

vas pup July 3, 2021 4:57 PM

Israel’s new camouflage technology can make soldiers virtually ‘invisible’

https://news.yahoo.com/israels-camouflage-technology-soldiers-virtually-082656169.html

“•A new camouflage technology can make soldiers virtually “invisible,” according to reports.

•The Kit 300 is made of thermal visual concealment material that reduces the detectability of soldiers.

•The technology has been procured by the Israel Defense Forces, and is now being tested in the US.

The Israeli Ministry of Defense (MoD) and Polaris Solutions, an Israel-based survivability technology company, have unveiled a new camouflage technology that makes soldiers virtually “invisible,” The Jerusalem Post reported.

The Kit 300 is made of thermal visual concealment material that combines metals, microfibers, and polymers to reduce the detectability of soldiers.

The material, which can double up as a lightweight stretcher, makes it harder for those wearing it to be seen by both the human eye and thermal imaging equipment, according to the Polaris Solutions website.”

SpaceLifeForm July 3, 2021 5:30 PM

@ lurker

Last check, over 1000 orgs hit.

Note: this article is ancient in internet time (it’s from yesterday), but it has been updated.

Also note that which did not happen here in the last 24 hours. Coincidence?

hxtps://doublepulsar.com/kaseya-supply-chain-attack-delivers-mass-ransomware-event-to-us-companies-76e4ec6ec64b

To be clear, this means organisations that are not Kaseya’s customers were still encrypted.

As an example victim organisation who do not use Kaseya, Coop in Sweden have closed 800 stores indefinitely — their point of sale terminal supplier uses a Managed Service Provider who uses Kaseya

Apotek Hjärtat — also not a Kaseya customer — which runs over 390 pharmacies in Sweden have also confirmed they cannot take payments due to this incident. SJ, the government owned rail operator in Sweden which has a near monopoly on train services, also has no payment facilities on point of sale terminals on trains — they also confirmed they are not a Kaseya customer, but have been hit via supply chain.

[Cash is king. I would avoid POS terminals for a while, especially if it will not operate in chip mode]

SpaceLifeForm July 3, 2021 6:10 PM

@ Fake

Yep. The lesson here is: Do not outsource to anyone that requires internet access to your machines.

You manage your own kit.

You fail to manage, you manage to fail.

SpaceLifeForm July 3, 2021 6:50 PM

@ ALL

REvil is like the SolarWinds Orion attack. A platform to sell to attackers.

There are 4 keypairs.

I will not be surprised if there is a universal decryptor for REvil in the next 24 hours. The TOR site is already down, so no ransom can be paid.

Reverse Engineering evil.

https://threader.app/thread/1411281334870368260

Clive Robinson July 3, 2021 7:11 PM

@ Fake, SpaceLifeForm, ALL,

Wow that’s like a double supply chain attack, customers of customers of?

It’s the expectation of,

“The weakest link in the chain”

Now in real life chains have rather more than two links, and mostly the only links you have any control over are the ones within your range of touch, thus the ones you can reach/see.

The neo-con mantra is to use the lowest cost chain possible without looking at it, let alone testing it, as that’s “leaving money on the table”.

Any engineer that’s been around for a while in the real world will tell you not only do chains fail, they also fail at near enough the worst time. That is, they generally fail when they are under most stress, which usually happens because of the “leaving money on the table” mantra and thus ignoring “entropy”, which is about the most fundamental real law in the universe that we currently know.

Are there “engineering solutions” to chains failing? Yes, and we’ve known about them since the 1850s or so.

But until recently most failures were the result of “probability”, thus in large enough numbers they averaged out, hence why “fire insurance” is a reality that mainly works.

But computers with more holes than a second-hand string vest are very very very susceptible to the “army of one” issue, where just one individual could break all links of type XXY as near simultaneously as possible.

The only reason they don’t is that they see “profit” in not doing so. Hence Ransomware is directly attributable to the neo-con mantra…

Or as @SLF puts it,

You fail to manage, you manage to fail.

Which is what has happened in the West this past year or so. That is, COVID-19 has made the failings of a “free market” or “capitalist deregulated economy” so obvious they have been stacking the bodies up like cordwood.

Truth.is.stranger.than.fiction July 3, 2021 8:18 PM

Has anyone ever heard of Kaseya? It markets itself as a security and ransomware company.

Their tech staff is in India and Belarus.

They don’t even have a Wikipedia page.

I thought Russia and Belarus were friends?

They do have a few US Federal Government contracts. Strangely enough with the Consumer Product Safety Commission. But the biggest contracts they had were with Veterans Affairs.

Why is the US Government buying “security” software developed in Belarus?

There’s a Wikipedia page for this company that claims to be affiliated with Kaseya – https://en.m.wikipedia.org/wiki/Unitrends

They are a RANSOMWARE prevention company.

Unitrends only has 1 job opening. You would think ransomware prevention would be booming business now.

Google “Kaseya New York” and look at the reviews. It might not be a real company, because 1,056 people voted on its 1.5-star review, which says it is a fraud.

More of their federal contracts show clawbacks. Scary that the media does no fact checking.

Truth.is.stranger.than.fiction July 3, 2021 8:36 PM

Here’s what I think happens. MSPs use any software they want and never tell their customers. There’s a fine line between vaporware and malware.

Remember Theranos had over 900 employees at its peak and none knew it was a fraud. It was worth $9 billion.

Enron had 22,000 employees who went to work every day and none of them knew it was entirely fake. And then there’s Bernie Madoff, who controlled Wall St, and no one knew his company was also fake for decades.

It happens. But we need to figure out how to stop it.

We have the FDA that makes sure our food and medicine are safe. We need an FDA for technology.

We deserve to know which MSPs were involved in this. Shame and banishment is the only way to stop this.

Weather July 3, 2021 10:05 PM

@slf,Clive,john
I knew that POS were easily hacked, it took three papers, what happened to the first? Anyway, side rant: can you send me a 32< sha256 hash? I’ve updated the software, so it should be interesting. Don’t say you got your project from me.

Weather July 3, 2021 10:35 PM

@fake
Who said they are the enemy? Side note, I can’t understand their logic, maybe

SpaceLifeForm July 3, 2021 11:41 PM

@ Fake, Clive

Well, I have questions.

hxtps://www.eenewspower.com/news/infineon-sells-newport-fab-power-foundry-startup

lurker July 4, 2021 1:25 AM

@SpaceLifeForm: did you notice what language that 4yr old story was written in? Looked like weasel to me, as in there was an agenda still unfolding.

Meanwhile we’re 4 years further down the track where the West has priced itself out of the market; that is those who still have any incentive to actually make stuff…

SpaceLifeForm July 4, 2021 2:20 AM

“Smoke On The Water”

I call this intentional. Soon to be an insurance scam. The company is bleeding debt, over $110 billion in debt. What good does it do to pump ocean water onto ocean water? And why put the fire out? And why was there fire in the first place? And they say they put the fire out with nitrogen. Sorry, not buying.

I must say though, those fireboat captains were so brave. Look how close they got to the flames. Why bother? Pure theatre.

hxtps://twitter.com/i/status/1411059695503261697

hxtps://twitter.com/i/status/1411074599803142146

Clive Robinson July 4, 2021 3:43 AM

@ Fake,

The Newport Wafer Fab has a longish history; it was set up by UK defence company INMOS to mass produce what was the world’s first 16-bit chip set.

But politics intervened and, instead of pushing the chips out to the world, the UK Government, under influence from the US, made the chips “defence procurement only”, thus leaving an open field for US company Intel and Israeli company Motorola…

Almost every time a “Conservative Government” gets into power it starts selling off “the family silver”, and now there is virtually none left they are allowing short term investors to flee the UK with bags of cash.

Why would they want to do that? Well ask yourself a question.

Nexperia used to be NXP, who used to be part of Philips of Eindhoven, Holland. They became “Chinese” when the EU was not doing well economically. Now the shoe is on the other foot and the UK is doing economically very badly due to repeated political stupidity and cupidity, so investors want out…

What is being sold is a second line 200mm wafer plant that unlike most others is very highly integrated and has some of the fastest turn around times in the world. Thus there is a huge amount of Intellectual Property (IP) involved just as there was with ARM.

It’s hard to say what the real value of the Newport Wafer Fab is, but just for the site and buildings 100 million USD would be on the low side. Throw in the IP and you would be up nearer 1 billion USD… But now add in the current “good will” of one of the most responsive fabs in the world when there is a major upset in normal chip manufacturing, and that price should be higher, a lot lot higher.

So the Chinese legitimately buying up a billion dollars of IP for 10 cents on the dollar, plus a fab plant, plus customers for at least two years, is kind of a “no brainer”.

But now consider the rumours that the USD is going to crash and burn due to the massive massive debt the US Fed/Government has created by the “stimulus give away” to big business, and thus will probably soon cease to be the sole international trading currency. Driven by the fact that household property in the US is being purchased for well over the odds, “silly money”, it’s clear many want what they think are going to be “inflation proof assets” such as domestic property for rental and the “rent seeking income”.

Put simply, enough people want to get their money out of USD and into other investments that a tipping point may well occur fairly soon.

The Chinese have a lot of “US Paper” that they likewise want to get rid of whilst it still has some value.

So the Chinese have been getting rid of increasing amounts of USD by purchasing investments they understand, which prior to Trump they purchased in the US. The US is now forcing “fire sales” of Chinese owned assets in the US, so the Chinese have moved to the EU. Prior to Brexit they purchased in the UK at high prices because the world regarded the UK as the EU doorway… Well the UK is now desperately trying to hold onto the US coat tails and failing… all because of the political stupidity and cupidity of a handful of “Stale White and Male” and a couple of wannabe women (May/Patel) hangers on, who all did/do as instructed for not even forty pieces of silver. So the UK has “Fire Sale++” on the go…

The Chinese dump their unwanted US Paper into the EU for some choice assets; these then “launder” more unwanted US Paper into the UK, to mainly US based investors who want out of what they see as a sinking UK, and who will thus take the unwanted US Paper and turn it into other safer assets as fast as they can, glad to be getting the “criminal rate” of 10 cents on the dollar, as it’s better than what they think is coming a few months down the road…

Will the US Dollar be toast by year end? Honestly I’ve no idea, but some are thinking that way and their numbers are growing as they follow what they think is “the smart money”. But remember in phony markets such as the “finance industry”, “wish fulfillment” for disaster becomes quite real in their terms when enough people believe it…

Remember “inflation” has a ratchet effect: the finance industry takes out, creating inflation, and when a crash happens they do not put back. That falls on the “insurers of last resort”, the tax payers, who have seen nothing out of the inflation they already paid for via flat incomes or worse for two or more decades, the grabbing of any assets they owned by legal trickery, and being left with massive debt whilst the finance industry just grabbed the assets for next to nothing. This is what the MAGA confidence trick is all about, and don’t think because there is a different coloured stripe on the hill it’s going to stop. Nope, it will just get re-branded and carry on. You will get poorer, they will get richer, and you will pay three fold over as a minimum: once for the inflation, twice for the asset loss, thrice for the national debt… Oh, and a fourth for all the kick backs so politicians “can have a taste”, which those that give take back via the costs of political campaigns…

Clive Robinson July 4, 2021 4:19 AM

@ Fake, SpaceLifeForm,

You should also read,

https://www.eenewspower.com/news/us-restricts-silica-and-polysilicon-imports

Then ask whose investments were threatened by “solar panels”…

Some “kick-backs” start with just a spider wobble across a dotted line… The best ones are covered by more than one layer of FUD.

In this case the Orwellian “bash the Chinese” and the liberal vapours of faux SJW… All whilst cheap clothes and other sweatshop products flood in from India and its environs.

Politically what is there not to like about this give away to the Coal and worse Power generating industry, that is a nice inflation hedge investment for the fraction of a fraction that own the US…

Am I being cynical?

Honestly no, I’ve lived long enough to have seen all this crap several times before. It’s a well worked out “hoodwink the proles” policy that gets played over and over ad nauseam, and if you challenge it, it has that built-in “think of the children” FUD for an ad hominem attack to stop you in your tracks by making you look evil.

ADFGVX July 4, 2021 10:28 AM

@ Fake

“Health insurers are threatening not to cover some patients’ ER bills”

If the bills are fraudulent, as most ER bills are, for mental health and other non-emergencies, insurers are neither obligated nor encouraged to pay them.

Of course it’s not good for the patients who are stuck with the bills, gun rights are revoked, the debt collectors start coming late at night to tamper with cars, smash headlights, break windows, break kneecaps.

You’re never going to pay those bills off as a patient with a mental health record, but do realize that even though you are not allowed to own weapons or use force to defend yourself, they are legally permitted to judiciously use force, damage or confiscate your property, and injure, maim, or mutilate you to collect a legal debt.

Blank Verse July 4, 2021 10:59 AM

https://ourworldindata.org/does-the-news-reflect-what-we-die-from

@bruce. You haven’t written about terrorism recently but there is some interesting new data out as part of a larger study. It compares how often people die from terrorism vs how often people Google about terrorism vs how often the media reports on terrorism. Unsurprisingly, the data shows terrorism to be radically overhyped by the media compared to the threat it poses.

JF July 4, 2021 11:04 AM

@Clive Robinson

“and Israeli company Motorola…”

Motorola Mobility is owned by Lenovo, a Chinese firm, and prior to that, was owned by Google.

Motorola Solutions is still a US corporation.

Clive Robinson July 4, 2021 12:23 PM

@ JF,

Motorola Mobility is owned by Lenovo, a Chinese firm, and prior to that, was owned by Google.

Go back and read what I wrote again.

I’ll give you a hint as to what you missed: neither Google nor Lenovo existed then.

Clive Robinson July 4, 2021 12:56 PM

@ Blank Verse,

It compares how often people die from terrorism vs…

First define what you mean by “die from terrorism”.

Some say that the increasing deaths on the road were “due to terrorism”.

Some healthcare professionals can show increased physical illnesses and deaths due to poor mental health care for the increased psychiatric and psychological disorders caused not just by 9/11 itself but by the ongoing “War on Terror”.

Others point to the increasing deaths in society from “lost opportunity costs” due to the diversion of money into basically pointless posturing, “guard labour”, and its toys.

And speaking of “guard Labour” and their toys, how many deaths are related to all the nonsense of turning “Blue2Black” of Cop-to-SWAT or other military look alike that has gathered significant pace in the “War on Terror”?

Whatever you say, you are going to get attacked from one side or the other, for what are basically two underlying reasons,

1, Political posturing.
2, Political Funding.

On the second point, if you “follow the money” you will find “certain people” scream for both “smaller government” and “increased anti-XXX spending”, where XXX is any one of the fifty or so “Wars on XXX” the US currently has running. What those “certain people” are really saying is “You should put all the tax money in my pocket”… And every year hundreds of billions at the very least does indeed flow into their pockets, but it is never enough for them. Nor will it ever be enough, because they “know they are entitled by birthright and you are not” and only their opinions are valid…

You can see where the argument will go and it is only going to end in acrimony and political mud slinging and not produce any agreement that is of use.

echo July 4, 2021 3:30 PM

https://www.independent.co.uk/news/morrisons-takeover-fortress-investment-group-b1877517.html

Morrisons has agreed to a takeover offer from the global investment manager Fortress Investment Group, which values the British supermarket group at £6.3bn.

Fortress Investment group is a front for Softbank and Koch Industries. Needless to say after the ARM debacle and a lot of US far right political dark money interfering in UK and EU politics I’m not happy about this. They’re taking the ****.

I do not like being directly or indirectly caught up in the games of billionaires and shady financiers and callow politicians. I hope they get what is coming to them.

SpaceLifeForm July 4, 2021 4:25 PM

@ Curious, ALL

The original tweet had this:

The Internet has a serious fundamental flaw: the transmission control protocol/internet protocol (TCP/IP)—the primary engine underpinning the Internet—is less secure. Is #blockchain the solution we need to eliminate this flaw?

When I first saw this (while original tweet still existed), I had two questions:

  1. less secure than what?
  2. What is the flaw?

Neither question was addressed in the blog post that the tweet referenced. You can read between the lines. It does have valid points, yet seems to want to reach a predetermined conclusion without fleshing out the details.

You can find a copy of that here:

hxtps://www.rsaconference.com/library/blog/understanding-blockchain-security

SpaceLifeForm July 4, 2021 4:38 PM

@ Curious, ALL

They disappeared the original post. I did improper C+P.
Here is the link I meant to use above.

hxtps://web.archive.org/web/20210703220948/https://www.rsaconference.com/library/blog/understanding-blockchain-security

Clive Robinson July 4, 2021 4:49 PM

@ SpaceLifeForm, Weather, ALL,

First of all I hope your 4th is going well.

But a little present for you…

You might or might not know there has been an argument in theoretical physics about the “down wind car”.

In essence the argument is that you can make a cart with a propeller geared to the wheels that will actually move downwind faster than the wind speed…

Well it kind of has a perpetual motion machine feel about it, only the physics theory says it should be possible, though probably impractical…

Well I won’t spoil the fun just watch the video,

https://m.youtube.com/watch?v=VUgajGv4Aok

And remember she is just “a gal with a home workshop” so anyone should be able to reproduce the experiments.

echo July 4, 2021 5:37 PM

I think blockchain is a neat little psychological hack of people’s minds. Blockchain is to computer “enthusiasts” what “computers” are to rote-learned pen pushers. It’s got this gee-whizz technobabble quality to it and it has technical jargon propping it up, so it must be good, right? It’s like the men’s equivalent of beauty product adverts. It’s complex enough to sound funky but simple enough that all the world and his dog can bikeshed.

Makeup look design can get very technical for lots of reasons I won’t go into. Why ruin the magic? But looking past the branding and marketing, the key thing is the active ingredient, e.g. the new wave of super strong fixer sprays which retail for £20+ are nothing more than glorified rosewater and glycerine with about 0.5-1% PVA glue.

I’ve been trying a new eyeshadow look this week. My old look did its job and was quick and easy but really wasn’t there. The new look achieves something similar but is more sophisticated. It’s also more complex as the base outline is more curvy and has more elements to it and is using at least four different colours including eyeliner. Every man on the planet will scream “What has this to do with security?” Social engineering covers this adequately. I’ve also shifted over to using waterproof mascara and eyeliner. Mostly this is for convenience so if I get tears in my eyes or it rains I won’t look like a racoon. It also means I can have a shower or go swimming and my face won’t fall off. In a roundabout way this brings us back to my interest in fixing spray. That 0.5-1.0% PVA is the active ingredient in fixing spray first developed for stage actors now commoditised for the mass market. It’s not going to budge under hot lights or physical exertion.

I’ve examined making my own makeup. Partly for fun, partly for saving money, partly for a theoretical post-apocalypse situation. Yes, and survival shelters and food too!

So back to “blockchain”. Blockchain utterly bores me and as for “cryptocurrencies” I missed the boat on generating one bitcoin every three days on low end hardware years ago so my interest in this domain can be described as near zero. I just don’t see the use for it like men don’t see the use for makeup although, yes, makeup can have tactical uses.

In a funny sort of way “blockchain” is online makeup for men. It is the remote keyboard warrior equivalent of the fake Rolex watch and made to measure suit. Muttering “blockchain” automatically makes you sound clever and savvy and on point. It confers status and membership of a tribe and the entry point is low.

I never go out without makeup even for a bottle of milk, and for men on the internet they seem never to have a website or discussion without “blockchain”. Now you can have your Chanel using “cryptocurrency” types or your drugstore knock off “blockchain” types but makeup is makeup.

So what is makeup? Fundamentally makeup is a clever neuro-psycho-social trick.

SpaceLifeForm July 4, 2021 5:40 PM

Strange. It’s almost like REvil knew the patch was coming.

A leaker? Or comms intercepted? Or just pure coincidence?

hxtps://www.bleepingcomputer.com/news/security/kaseya-was-fixing-zero-day-just-as-revil-ransomware-sprung-their-attack/

hxtps://www.kaseya.com/potential-attack-on-kaseya-vsa/

Weather July 4, 2021 5:47 PM

@clive, all
If the car is rolling down hill, I think you are taking a jab. I did find out what pom meant, prison of majesty of England, but the English person was still pissed off 😉 Looked at that idea with Tegs.

Clive Robinson July 4, 2021 5:48 PM

@ Curious, SpaceLifeForm, ALL,

Re RSA “editing for neutrality”

Well it’s a complete “load”, but then the comment about “little table with nuggets and chips” does kind of portray the correct way of thinking about the proponents.

But what did make me laugh was the claimed “thirty years experience” of the comment’s author.

Did J.K. Rowling give “The Philosopher’s Stone” blockchain, or just good old fashioned “endless gold creation” along with immortality, back before it was published in June 97, just under a quarter of a century ago?..

The first blockchain was proposed by an “entity” –person or persons– using the name “Satoshi Nakamoto” in 2008, just 13 years ago…

Arguably the first vague conceptual design for a blockchain-like protocol was by Dr David Chaum in his 1982 PhD thesis “Computer Systems Established, Maintained, and Trusted by Mutually Suspicious Groups.”

So I’m guessing that the comment poster was not exactly being honest to put it politely…

What this does kinda show yet again is that the ICTsec industry is not learning its own history…

Mind you, it could also learn by reading the likes of Charles Dickens, or more general history, and find out what a Ponzi scheme and similar are,

https://simple.m.wikipedia.org/wiki/Ponzi_scheme

No name July 4, 2021 5:55 PM

Coop, the Swedish supermarket chain breached via Kaseya – the link below is from their MSP. The reason I am posting this is that MSPs may deploy software that their clients are unaware of. Also, this MSP announcement about Coop details another software product which performs data scraping of their IT operations, but that software is owned by the MSP. Any corp doing business with this MSP should have a third party scan for the presence of Kaseya or the other software mentioned. I don’t like that this other software’s website says that it is used in financial services and utilities. Do they know that?

https://www.tcs.com/coop-sweden-partners-tcs-accelerate-digital-transformation-program#section_1

https://www.tcs.com/coop-sweden-tcs-expands-strategic-partnership-long-term-transformation-goals

Look at the description of what this MSP’s AI software installed at Coop does.
https://www.linkedin.com/company/igniobydigitate

Clive Robinson July 4, 2021 6:01 PM

@ Weather,

If the car is rolling down hill

You’ve clearly not watched the video have you?

No Name July 4, 2021 6:22 PM

@SpaceLifeForm

What if all of these attacks aren’t breaches, but insiders pretending they are breaches? Heck they wouldn’t even lose their contract while they get enormous sums for the restoration work too. Brilliant, no?

Whatever the case may be, this continues to prove that no one knows how to perform ample vendor risk assessments. Government and the private sector buy the scariest software. Due to graft.

And it always involves subcontracting too. Subcontractors should never be permitted in technology contracts. Because what you don’t know CAN hurt you.

Weather July 4, 2021 6:28 PM

@clive
Viewed it. The drag on the wheels, which would count as a loss, in this case adds.

Clive Robinson July 4, 2021 6:43 PM

@ Weather,

I did find out what pom meant, prison of majesty of England

Entirely wrong

First off, it’s always “His Majesty” or “Her Majesty”, so it would be “POHM” not “POM”.

Secondly, the usage was Pommy or Pommie, not Pom, for some considerable time before Pom got used in the expression “Five Pound POM”, which was the price of a ticket to emigrate.

Thirdly, there is some evidence that it’s short for Pomegranate, said in a way to rhyme with immigrant.

What you probably do not know is that British sailors were named Limeys from being required to drink about an egg cup full of lime juice to prevent scurvy. Originally it was lemon juice, but due to the vagaries of European warfare lemons became unavailable.

More palatable, but less astringent by far than lime or lemon juice, is pomegranate juice, used in the Gimlet cocktail. If you look at the route that would have been taken by the ships carrying immigrants to Australia you will see where the pomegranates would have been brought on board as part of the “fresh” provisions.

Apparently the joke about pomegranates was that the colour of the juice changed with time. Just like the colour puce[1] and how it changes with time and fresh air. Much as many pale-skinned British immigrants would have changed.

Sometimes when people become “socially embarrassed” by a “faux pas” they have made, it is said they are “looking a little puce about the ears”…

[1] Puce is the French word for flea, and the colour the result you get if you squash one recently after it has fed on blood. What is never clear is if someone means the dark red it starts off as or the brown it eventually becomes.

lurker July 4, 2021 7:07 PM

@SpaceLifeForm, @No Name

Why do people use MSPs? If they have been convinced that the MSP can provide the service as well but cheaper than they can with in-house personnel, then this is just another stage of the race to the bottom.

Can they trust the MSP more than their own hires that they write the contracts for? If you can’t trust yourself to read, write, or understand the code, can you trust yourself with the hiring of staff, or MSP, to do it for you?

If security is all about trust, who can you trust?

Weather July 5, 2021 12:41 AM

@clive
It was meant as a joke, like saying will or sheep shackers; it’s factual so not an insult.
That YouTube video (don’t post that) was quite good, what I posted before about the rubber bands on the wheels.
Why don’t you post a sha256 hash of 32< ?

Weather July 5, 2021 12:49 AM

@bruce
You said release it like most PhDs do, but what would it take for you to make a post on your site? One upper says send it with a note; the rest, cert, well their comms didn’t amount to much, so who am I talking to?

Clive Robinson July 5, 2021 2:28 AM

@ Weather,

It was meant as a joke…
…it’s factual so not an insult.

As I pointed out it’s not factual. There is no evidence that is where the term came from.

However, the evidence of how scurvy and malaria brought it about is actually very interesting, for both historical and anthropological reasons.

The fact sailors mixed their anti-scurvy med with their daily rum rations to make “grog”, and how officers later mixed it with gin to make a “Gimlet” (measure of lime, measure of gin, sugar syrup to taste), says a lot about the desperate need of some to have “status indicators” over and above just uniforms, servants, accommodation etc.

How the “Gimlet” advanced to be mixed with soda water that had quinine in it to stop malaria. How the lime got augmented with pomegranate juice, again as a “status indicator” (extracting the juice is a labour-intensive process), and how others then used it as a method of derision in response against the idiocy of mindless status indicators.

And quite a lot more besides about trading routes and similar and early industrialisation driven by the need for “status indicators”.

You don’t find that interesting?

Ahh well…

As for,

Why don’t you post a sha256 hash of 32< ?

To what purpose?

Veritas July 5, 2021 3:18 AM

Bruce,

what I would like to highlight and bring up are the connections that DuckDuckGo has with Bing… like how the ad clicks (if you click on them) are routed through Bing servers. And many of the images in their image search results are hosted on Bing servers.

Maybe DuckDuckGo is not what it had made itself look like.

Curious July 5, 2021 4:12 AM

Moto’s comment above is probably just spam, and includes a link which I wouldn’t want to click on.

Curious July 5, 2021 8:39 AM

I couldn’t find any ‘Cryptographer’s panel’ video for 2021 (RSA Conference) on youtube.

Maybe there wasn’t any in 2021 because of covid-19.

No Name July 5, 2021 9:10 AM

@lurker

The Swiss recently summed up what this is about. They speak about a ransomware event too. Doesn’t sound cheaper to me. This shows it was $3.3 million per person. And the loss of a data center and offices. https://www.finews.com/news/english-news/46514-ubs-cognizant-technology-outsourcing-mike-dargan

The Swiss only identified 1-in-3 people were worth keeping. That’s a lot of attrition in sensitive roles.

As this article explains, a little-known secret is that during the last financial crisis the offshore MSPs purchased their clients’ offshore data centers and development teams. How do accountants treat this? Is it written off as a loss? What’s worse, these regulated institutions entirely lost control and insight into their IT operations. Not only banks do this. It is common among international businesses. Doesn’t matter if they are defence contractors or even Big Tech. None of them know who works in their outsourced IT. It’s not an extension of your team. It is a black hole. No information is shared.

During the last financial crisis the MSPs purchased offshore data centers as some type of new profit model. Now there’s a new crisis hitting them, so what is their new profit model? No one is hiring more contractors, no one is hiring white collar anywhere. Also MSPs are losing their software development business. Software is now primarily SaaS. Also regulators are paying more attention to ITSM and third party risk. A bank was just fined $400M because an offshore WFH contractor miskeyed $900M in payments. The bank was in the midst of migrating off that contractor too. Then there’s another bank that was fined $60M recently because their contractors stole client data not once, but repeatedly.

If a doctor repeatedly hurts patients they lose their license. And the US Gov routinely recalls food or medicine that is dangerous. So why do the same MSPs have their identities hidden by the US Gov when they are repeatedly involved in massive and extremely impactful, expensive breaches? It is the same 5 over and over.

Every time you hire someone into IT operations you are giving them the keys to your kingdom. They shouldn’t be workers you don’t know. You need these sensitive employees to have a sense of ownership and dedication. That’s earned. Contractors never have this. I speak from experience.

Weather July 5, 2021 12:21 PM

@clive
I find it interesting in the sense that trips to Mars and beyond will take long times, like back then. I’m not too fussed about status symbols, everyone has them, fighting over others.

If I can attack a sha hash that I didn’t pick, some people might listen.

And the last two days I’ve been under the weather, so I probably didn’t explain correctly what I was trying to say.

lurker July 5, 2021 1:24 PM

@No Name: UBS get it. But there will be a lot more blood and tears before enough others get it…

Freezing_in_Brazil July 5, 2021 1:26 PM

@ All

It’s not my intention to disrupt any ongoing discussion with this.

Dissatisfied [no reason to be, except paranoia] with passwords generated using date, I’m playing with the idea of using sensor data [CPU temperature, pixel values of a camera frame, etc] to try to get better performance – via bash script or command line, for example, changing the classic pipeline.

date +%s | sha256sum | base64 | head -c 32 ; echo

into something like

<sensor-data> | sha256sum | base64 | head -c 32 ; echo

A web search returns no result about it, so maybe it is stupid [but I’d like to confirm]
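
For instance, on a Linux box something like the following (just a sketch; the thermal sysfs path is an assumption and varies by machine, and I mix in /dev/urandom rather than trusting the sensor alone):

  # sketch: mix a CPU temperature reading with kernel randomness, then hash
  # /sys/class/thermal/thermal_zone0/temp is an assumed path; adjust for your hardware
  { cat /sys/class/thermal/thermal_zone0/temp; head -c 64 /dev/urandom; } | sha256sum | base64 | head -c 32 ; echo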

Clive Robinson July 5, 2021 2:10 PM

@ Freezing_in_Brazil,

Dissatisfied [no reason to be, except paranoia] with passwords generated using date

Err “date” is not a good source of entropy, so “paranoia” should be “prudence”…

You don’t mention which *nix you are using, but they all have a fairly decent RNG these days backed by a TRNG, and that should be available from the command line.

I suggest you think about using “dd” as a universal “get me data” tool; bung that into other tools including awk/grep and you should get something you want.

Don’t forget the built-in crypto functions in many *nixes; that way you don’t need to use sha256 and can make your “input source” more or less what you want.

With a bit of skill you can build a shell script password manager, but just remember that any “command line arguments” can be seen with an appropriately timed “ps”.

This is a decade old but not much has changed when you use /dev/urandom

https://www.cyberciti.biz/faq/bash-shell-script-generating-random-numbers/

More up to date, and more ways,

https://www.howtogeek.com/howto/30184/10-ways-to-generate-a-random-password-from-the-command-line/
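
For example, something along these lines (just a sketch, nothing clever; the point is the secret never appears as a command line argument, so a well timed “ps” has nothing useful to see):

  # 32 characters of A-Za-z0-9 straight from the kernel RNG
  tr -dc 'A-Za-z0-9' < /dev/urandom | head -c 32 ; echo

  # or using dd as the universal "get me data" tool
  dd if=/dev/urandom bs=32 count=1 2>/dev/null | base64 | head -c 32 ; echo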

Bob Paddock July 5, 2021 2:14 PM

It has taken far longer than I expected for the Government to start mandating safer software beyond the FDA, FAA, et al. type stuff that we have now.
That day has arrived with White House Executive Order 13873.

This is about TikTok. However, they make no distinction about “connected software” from embedded devices, such as their “(d) end-point device”, which could cover IoT etc.

https://www.whitehouse.gov/briefing-room/presidential-actions/2021/06/09/executive-order-on-protecting-americans-sensitive-data-from-foreign-adversaries/

“… In evaluating the risks of a connected software application, several factors should be considered. Consistent with the criteria established in Executive Order 13873, and in addition to the criteria set forth in implementing regulations, potential indicators of risk relating to connected software applications include: ownership, control, or management by persons that support a foreign adversary’s military, intelligence, or proliferation activities; use of the connected software application to conduct surveillance that enables espionage, including through a foreign adversary’s access to sensitive or confidential government or business information, or sensitive personal data; ownership, control, or management of connected software applications by persons subject to coercion or cooption by a foreign adversary; ownership, control, or management of connected software applications by persons involved in malicious cyber activities; a lack of thorough and reliable third-party auditing of connected software applications; the scope and sensitivity of the data collected; the number and sensitivity of the users of the connected software application; and the extent to which identified risks have been or can be addressed by independently verifiable measures. …

(d) The Secretary of Commerce shall evaluate on a continuing basis transactions involving connected software applications that may pose an undue risk of sabotage or subversion of the design, integrity, manufacturing, production, distribution, installation, operation, or maintenance of information and communications technology or services in the United States; pose an undue risk of catastrophic effects on the security or resiliency of the critical infrastructure or digital economy of the United States; or otherwise pose an unacceptable risk to the national security of the United States or the security and safety of United States persons. …

Sec. 3. Definitions. For purposes of this order: (a) the term “connected software application” means software, a software program, or a group of software programs, that is designed to be used on an end-point computing device and includes as an integral functionality, the ability to collect, process, or transmit data via the Internet; …”

If this was already covered on the Blog here I missed it, sorry.

CrystallographicQuantumPercolation July 5, 2021 2:56 PM

@vaspup: Future news post: “Experts have doubts the discovery that hundreds of invisibility suits are missing and presumably sold on the black market had any connection with recent surge in daytime break-ins across the country. The experts had no comment about homeowner videos showing car doors opening and then closing by themselves, after which their autos were driven away with nobody at the wheel.”

Weather July 5, 2021 3:14 PM

@frezzing

I tried ‘cat /dev/urandom | head -c 32 > key.text’; you can base64 it if you want to do web stuff.

You helped answer the question I asked

SpaceLifeForm July 5, 2021 3:41 PM

@ lurker, No Name

Why do people use MSPs?

Convenience.

The MSP is providing a managed Service.

Let’s say you are a business operation. But, as a convenience to your customers, you bring in an outside MSP to provide them the convenience of credit card, debit card, even an ATM machine.

When you made that decision, you just created dependencies. Your customers eventually come to expect that those features are available.

When those expected features are not available, it is a disruption.

It’s similar to what a software developer encounters when trying to rebuild software using a newer version of a library. The feature that the library provided to your code as a Service no longer functions as expected.

Actually, it is not similar, it is exactly the same. Welcome to dependency hell.

Cash is king. Fewer dependencies.

lurker July 5, 2021 3:58 PM

@SpaceLifeForm: Let’s say you are a business operation.

As a convenience to my customers I bring in my bank to provide credit/debit cards, ATM, &c. The dependency is at least a (so far) known trusted party. One less cog in the machine on the KISS principle. But maybe that’s why I am not a business operation…

SpaceLifeForm July 5, 2021 4:21 PM

@ Freezing_in_Brazil, Clive

I’d throw in some amount of dice rolls appended to output from /dev/random and /dev/urandom. Grab some from each, preferably, even though /dev/random may be low on entropy.

Grab say 32 bytes from /dev/random, append 35 bytes from /dev/urandom (an odd amount, preferably a prime offset), then append your random amount (preferably prime) of dice rolls.

Then hash that.
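
In shell terms, a literal sketch of that recipe might look like this (the dice values shown are placeholders to be replaced by real rolls; on older kernels /dev/random may block until enough entropy is available):

  # 32 bytes from /dev/random, 35 from /dev/urandom, then hand-entered dice rolls, all hashed
  {
    head -c 32 /dev/random
    head -c 35 /dev/urandom
    printf '%s' "3 1 4 1 5 6 2 6 5"   # placeholder: type your own dice rolls here
  } | sha256sum | cut -d' ' -f1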

Freezing_in_Brazil July 5, 2021 4:23 PM

@ Clive Robinson

Thanks for the feedback. Much appreciated.

You know… the plague…

I’m in the country [an Isaac Newton] for a brief winter break [ie dry season] with social distancing. My big *nix is an old Dell running my small Puppy Linux with its limited set of commands, and I keep wishing that even this basic distribution could count on a more robust feature set.

I know urandom, but I’d like to rely on a native command to accomplish this task. I remember reading about controversies around urandom and how it behaves on Linux systems, leading to long, heated discussions about how randomness should be obtained in Python programs. I confess that I did not dwell on urandom and I am also unaware of its sources of entropy.

I’m looking for a homemade solution [I know the stupidity of rolling “my own crypto”, but it can’t be worse than using date for the same use case], combining the entropy sources available on any system.

A more secure output can be obtained with the same pipe by replacing the date command with a random [pun intended] file in your documents.

Regards

SpaceLifeForm July 5, 2021 5:25 PM

@ lurker

I understand what you are referring to.

It’s still, at least, two dependencies outside of your control. The network connection, and the backend bank side.

If either of them get attacked, you are down. Your business is down. Or at least crippled.

Cash is king.

Then you can still function.

SpaceLifeForm July 5, 2021 5:58 PM

@ JonKnowsNothing, Winter, Clive

The global petri dish is alive and kicking. Delta is flying. Bad Down Under. Now spreading in Vietnam.

The anti-vaxxers are in for an awakening.

Darwin is watching.

SpaceLifeForm July 5, 2021 6:25 PM

@ lurker, Clive

I was not clear. It’s actually possible there are additional dependencies on the bank backend side. And additional internet Service providers that you do not know of, and other Service providers that provide DNS and BGP Services.

That you can not easily see.

It may actually involve banks talking to other banks to complete the transaction, and possibly going thru other network providers. And thru DNS and BGP.

So, Welcome to Dependency Hell.

The Show That Never Ends

JonKnowsNothing July 5, 2021 7:28 PM

@ SpaceLifeForm, Winter, Clive

re: The global petri dish

In this dirt dry part of California, Delta is ticking but the local public health folks are 3-4 weeks behind in reporting numbers and status at their periodic-weekly PR-Media reports.

  1. They do little sequencing.
  2. They do not get the sequencing results until 2-3 weeks later
  3. The PR says:
    We got it, and we got MORE of it … but we cannot tell you HOW MUCH MORE.
  4. 100% hindsight with cold leftovers

Statewide the numbers are a tad more accurate

  • In May 2021 you had 58% chance of getting Alpha B.1.1.7
  • In June 2021 you had 35% chance each for Alpha B.1.1.7 OR Delta B.1.617.2
  • For July 2021 expectations will be A Delta Wave

Also of interest or as they say “a concern”…

Reports of Delta outbreaks here and about, quietly mention that some folks have already gotten 1 or 2 jabs and are getting Delta anyway.

In the outbreaks in the Care Homes in Australia, 4 of 5 residents who are positive had had 2 jabs. The 5th was a new resident sans jab, because the jab crew has not returned to jab all the new folks in the home. No one knows why.

So the intermittent reports of Jabees getting COVID-DeltaMut are seeping out but not in great numbers.

Some of this is seen with Lambda and with several vaccines where folks are getting “booster jabs” of all sorts and mixes. Nothing scientific here. If your PCP/GP has a jab, they take it and they are not worrying about which jab it is or if you need 1 or 2 or 3 or 4…

To be determined…

  • If the death rate is lower generally among the Jabees than last waves, a good portion will become sick enough to be in hospital and NOT DIE.
  • During the last waves in the USA, 600,000 died, becoming an overflow relief valve for the Health Care Systems (TRIAGE or Other)
  • If the Delta Wave magnitude is similar to the previous waves, some portion of those 600,000 will NOT die but remain in hospital along with the 30,000,000 (USA) sick-recovered folks.
  • This implies that there may be 30,000,000 + (percentage of 600,000) = Case Load
  • Pick a percent.

Countries with higher jab rates are not doing all that well either. Israel is rolling out extra jabs all over. They have 1,000,000 jabs about to expire at the end of July 2021, which they are trying to swap for 1,000,000 with a longer shelf life. Unfortunately, not that many countries have the ability to dispense 1,000,000 jabs in 20 days or 500,000 double jabs in 20 days.

Vaccine hoarding with a directly quantifiable effect: 500,000-1,000,000 people at risk. Do not ask why…

World Odometer 07 04 2021:

+621,000 07 04 2021 USA Dead
+34,588,000 07 04 2021 USA Cases
+8,000 07 04 2021 USA New Cases
+1,866/M 07 04 2021 USA Deaths/Million (187/100K)

ADFGVX July 5, 2021 8:30 PM

@ Fake, EEL

What set me back was an unsupervised visit, almost stone aged me… But if you teach a man to fish…

Woops

Moderate me if I’m wrong.

Makes me think now that I recall … the term “true fish” is really properly speaking only applied to the “bony fishes” or Osteichthyes and not to the Chondrichthyes or other classes of “cartilaginous fishes” such as sharks and eels, some of which have been classified as Agnatha or “jawless vertebrates” rather than Gnathostomata or “jaw-boned vertebrates” (*) along with amphibians, reptiles, birds, and mammals.

But now there’s a government conspiracy because there is no longer any scientific classification for true or bony fishes.

https://www.itis.gov/servlet/SingleRpt/SingleRpt?search_topic=TSN&search_value=914179#null

The Osteichthyes or true or bony fishes were apparently split into two superclasses, the Actinopterygii or spiny “ray-finned fishes” and the Sarcopterygii which include cœlacanths and lungfishes with fins vaguely resembling primitive limbs or wings.

(*) There’s a “jawbone of an ass” with a certain odd religious-like zeal for the pig-Latin classifications of Charles Linnæus.

SpaceLifeForm July 5, 2021 9:32 PM

@ Freezing_in_Brazil, Clive

Rolling the dice.

Eventually, you roll a seven when you do not want to roll a seven. You get on a hot streak, and think you can just keep rolling numbers.

There have been a lot of players playing with house money.

hxtps://fcced.com/bolsonaro-allegedly-involve-corrupt-scheme-57212223/

Weather July 5, 2021 9:44 PM

@slf
Yeah it might be interest, but why attack, the others how were working out a hard hash, thanks. Bgp is broken, I could do a red team into a uni, that was 5 hops away from the backbone to USA, and no redhat would block it. But you are strensh the grey matter.

Clive Robinson July 6, 2021 12:36 AM

@ Freezing_in_Brazil,

My big *nix is an old Dell running my small Puppy Linux with its limited set of commands, and I wish that even this basic distribution could count on a more robust feature set.

It’s been years since I used Puppy Linux, but I did remember the “BusyBox” issue I had with it…

BusyBox replaces many standard, though stripped down, *nix command line utilities with a single executable. It can save a lot of space and can sometimes make maintenance easier…

However, to get the utilities you want means making “symbolic links” to the executable using the name of the utility you actually want. Symbolic links do take up inodes amongst other things, so the distro bods pared things down to the minimum (you see the same mentality on Android).

It would appear Puppy Linux still uses busybox,

http://wikka.puppylinux.com/busybox

Thus you might just need to make the required symbolic links.
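A rough Python sketch of that last step (assuming the busybox binary lives at /bin/busybox and that the applets you want were compiled in):

import os, subprocess

BUSYBOX = "/bin/busybox"          # assumed path; adjust for your install

def link_applets(wanted, bindir="/usr/local/bin"):
    # "busybox --list" prints the applet names built into this binary
    applets = subprocess.run([BUSYBOX, "--list"], capture_output=True,
                             text=True).stdout.split()
    for name in wanted:
        target = os.path.join(bindir, name)
        if name in applets and not os.path.exists(target):
            os.symlink(BUSYBOX, target)   # applet is chosen via argv[0]

link_applets(["od", "seq", "hexdump"])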

Clive Robinson July 6, 2021 12:52 AM

@ lurker, No Name, SpaceLifeForm,

Why do people use MSPs?

It’s not just the “convenience” of writing out a cheque once a month etc; it cuts other things like overheads from head count that you would normally get from “economies of scale” (it’s why you get layoffs when two organisations merge).

But there is more to “economies of scale” than just cost reduction. There is “redundancy of effort” and “skills”. An MSP might have a hundred or so organisations under their control. Thus it makes sense for them to invest in “domain specialists” rather than the “generalists” smaller organisations would have.

Thus when things go wrong, you’ve not got a generalist desperately scrabbling around looking for highly specific information that might be hard to find. You have a specialist who has probably seen the same or a similar issue in the past, thus can “jump in” a day or so ahead of a generalist and come up with a “fast fix” and then “more general mitigation”, whereas the generalist will come up at some point with a “specific fix”.

SpaceLifeForm July 6, 2021 1:12 AM

@ Clive, ALL

it’s why you get layoffs when two organisations merge

Which is why one should plan on leaving near term. Especially if you can figure out if you are on the bought side. It’s not always clear in a merger who is really acquiring whom. But someone is behind the scenes trying to extract money. That is a given.

In other news, I expect that Kaseya is going to pay about $25 Million in bitcoin, and then the universal decryptor will come out. Just a hunch.

Winter July 6, 2021 2:39 AM

This was in the local news yesterday:
Dutch team was a day away from saving Kaseya when hackers struck; Ransomware demand hits $70 million
ht tps://nltimes.nl/2021/07/05/dutch-team-day-away-saving-kaseya-hackers-struck-ransomware-demand-hits-70-million

Within a few days, the Dutch team was working with Kaseya’s top technical officer, Vrij Nederland wrote. They intended to release a software update to close that vulnerability on Saturday, but they were just too late. On Friday evening, Boonstra received a message from Kaseya that an attack was in progress, after which vulnerable customers were hastily warned to turn off their systems.

Freezing_in_Brazil July 6, 2021 8:21 AM

@ Weather, SpaceLIfeForm

Thanks for the comments guys. I’ll be experimenting with some ideas I have in mind and will be back to the issue.

@ Clive

Yes, BusyBox is definitely a feature of Puppy.

SPL, as to Bolsonaro, he always belonged to the rotten band of the Brazilian congress. He tries to sell the image of an independent politician, but he is a man who has been using and abusing the political system for 30 years now. His adventures in the company of corrupt politicians in Brazil are very well known.

The problem in Brazil [maybe elsewhere too] is the huge functionally illiterate mass, people who get their information from popular TV networks and social media [Social media deserve a separate analysis. I see the social fabric being ‘blown to smithereens’ by the polarization of networks. The need for a complete overhaul of its functioning is obvious – but I have no idea where to start].

At the moment Bolsonaro bleeds and loses popularity by the day. But he still has 1 1/2 years in power. He will be dangerous and unpredictable. There is an atmosphere of tension in the air.

Freezing_in_Brazil July 6, 2021 9:53 AM

@ SpaceLifeForm

Grab say 32 bytes from /dev/random, append 35 bytes from /dev/urandom (an odd, preferably prime, number of bytes), then append a random (preferably prime) number of dice rolls.

Then hash that.

Yes, my friend, that would be about the core of my reasoning. 🙂

*All this fuss about it, on my part, is because I had never really thought with care about the quality [or lack thereof] of entropy sources in a standard environment. I developed an enlightening curiosity recently. Searching for references I stumbled upon this – quite in line with my questionings:

htps://unix.stackexchange.com/questions/209901/sources-of-entropy-for-linux

There are still many opportunities open for improvement.

ADFGVX July 6, 2021 10:19 AM

6×6 pair transposition cipher. Arrangement is the key.

5 6 7 8 9 0
4 M N O P Q
3 L C D E R
2 K B A F S
1 J I H G T
Z Y X W V U

For each pair of plaintext letters or digits, substitute the other two corners of the rectangle.

73->5C, JR->TL for vertical axis mirroring.

C5, LT for horizontal axis mirroring.

Invent special rules for doubles or pairs in the same row or column.
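A small Python sketch of the rectangle rule as I read it (the grid strings below are the same arrangement as above; the same-row, same-column and doubled cases are left as pass-through, since those rules are up to the key holder):

GRID = ["567890",
        "4MNOPQ",
        "3LCDER",
        "2KBAFS",
        "1JIHGT",
        "ZYXWVU"]
POS = {ch: (r, c) for r, row in enumerate(GRID) for c, ch in enumerate(row)}

def encipher_pair(a, b):
    (ra, ca), (rb, cb) = POS[a], POS[b]
    if ra == rb or ca == cb:
        return a + b          # doubles / same row or column: invent your own rule
    return GRID[ra][cb] + GRID[rb][ca]

assert encipher_pair("7", "3") == "5C"
assert encipher_pair("J", "R") == "TL"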

Weather July 6, 2021 10:31 AM

@freezing
Base64 might not be a good idea, a 5 byte password is 20 after base64 which some website limit to 8-20

About entropy a long movie with a screen capture, pixel x/y and rgb values.

Clive Robinson July 6, 2021 2:13 PM

@ Freezing_in_Brazil,

Searching for references I stumbled upon this – quite in line with my questionings:

Like many such postings on “entropy” it should carry a “Health Warning”.

I could go through it statement by statement, but then it would be a very very long and still probably not very helpful reply[1].

The easiest way to generate a “True Random Password” is with a pair of dice and a six by six grid on a piece of paper. Fill the grid with the Alphabet set A{a…z,0..9}[2].

Put two differently coloured dice in a large glass, like a 1 pint beer glass, and tape a beer mat or similar across the top. This makes your quicker-to-use “TRNG”[3].

Shake it and, always reading the dice in the same order (say red first, blue second), use these to provide the X/Y grid coordinates to a character in the set, again keeping the same order. Keep throwing and writing down until you have the desired length of password[4].

Each throw gives you ~5.17 bits of data (two dice give 36 combinations, and log2(36) ≈ 5.17). Not all of it is “true entropy”, there will be some “bias” and some “noise”, but mostly it is. For passwords just assume 5 bits for each final output character, as the string length is too short to detect “bias” in your “TRNG” (and likewise too short to detect the “noise”).

So decide how “strong”, which is to say how long, you want your password to be and generate it.

Every additional two characters in the output length makes the password a little over a thousand times stronger against just a “guessing” or “brute force” attack.

Once you have such an output “string” you could then use it to search the *nix spelling dictionary to get the XKCD “horse, battery, staple…” type pass phrase. But whatever you do, “DO NOT REORDER THE WORDS” to make it easier to remember, as that loses you bits of entropy.

That tells you about all you need to know practically for generating even “secret master keys”.

If you think having dice just lying around the place might be suspicious, especially if you keep them in a locked safe, then consider buying one of those “Home Casino” sets with the green baize felt and chips etc. If somebody asks “why do you keep it in the safe?” just say you don’t want people “stealing the chips or putting loaded dice in, do you? You know some people can be over-competitive…”
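Before the footnotes, here is a throwaway Python sketch of the grid lookup described above (the fill order and the example throws are arbitrary):

import string

GRID = string.ascii_lowercase + string.digits     # 36 characters, a 6x6 grid

def char_from_throw(red, blue):
    # red and blue are 1..6; red picks the row, blue the column
    return GRID[(red - 1) * 6 + (blue - 1)]

throws = [(3, 5), (1, 1), (6, 2), (4, 4), (2, 6), (5, 3)]   # written down by hand
print("".join(char_from_throw(r, b) for r, b in throws))    # ~5.17 bits per throw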

[1] Unless you want to get into the maths, maybe a 20,000ft fly-by will give you a sufficient understanding. Firstly though, remember your actual aim is to,

A, Produce a short binary string of bits,

where

B, Each bit is fully independent of each of the preceding and each of the following bits,

and

C, Ideally would be “balanced” in the number of ones to zeros or other sub-string ratios in all measurable dimensions.

There are three main issues you have to worry about in a “True Physical” system,

1, Bias
2, Noise
3, Coupling

Bias : would be where the output “tends” towards an “offset”, that is you might get 5 times as many zeros as ones in a simple case, or more of one bit pattern than another. There are statistical tests for these, but over a short run of bits for a password only the grosser biases need be worried about (look up von Neumann de-biasing if you want to see how you go about getting rid of bias; a small sketch appears at the end of this comment).

Noise : is supposed to be “random”; it’s not, it’s actually mostly “chaotic” in a true physical system. All mechanical processes generate frequencies in one way or another, and these have harmonics. You shaking your arm will be at a frequency, and harmonics of this will affect when the dice hit the glass and at what angle. Thus the dice’s trajectory is mathematically chaotic, not random. Does it make a difference? Unlikely to, for a password, plus as you tire the frequencies involved will change. But it can make a difference in a coin flipping machine, as some researchers found out…

Coupling : This is a harder one to explain simply, but all mechanical systems have “memory” of one form or another, so the actual output bits are not “fully independent” of each other and coupling can look like “noise” for the same reasons of “chaos”. Over longer terms machines drift due to wear and thus it also shows up in bias. Is it going to make much of a difference for a password, unlikely.

Whilst Bias, Noise and Coupling can be detected, they need many many measurements to spot, but it is the reason you should always monitor your entropy source with statistical tests BEFORE it goes into any de-biasing or hashing that chip manufacturers use to hide the defects of their systems (yes, Intel are major culprits in this game)… But even if you can detect them, it takes many many more multiples of bits to be able to use these deficiencies predictively. Passwords are at best some very very small fraction of that number of bits, unless you are doing something wrong.

[2] For practical purposes just fill it in in the order that’s easiest to check for mistakes. From a slightly deeper view, if you change the positions fairly frequently and you do have bias in your dice, the pattern will get chopped up and thus be harder to detect.

[3] Getting differently coloured dice of the same physical size can sometimes be hard to do. In which case buy them with “white spots” and colour them in with two different marker pens.

[4] For the slightly more “careful”: make maybe 10-20% more throws. Then, making more random throws, use the output to “gap your way” along the string, striking out characters until you get down to the required length. This again helps make any defects in the system way way harder to spot.
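And the small sketch promised in footnote [1], a plain Python version of von Neumann de-biasing (it removes bias from independent bits at the cost of throwing most of them away):

def von_neumann(bits):
    # take non-overlapping pairs: "01" -> 0, "10" -> 1, "00"/"11" -> dropped
    return [a for a, b in zip(bits[0::2], bits[1::2]) if a != b]

biased = [1, 1, 1, 0, 1, 1, 0, 1, 1, 1, 0, 0, 1, 0]   # roughly 70% ones
print(von_neumann(biased))                            # unbiased, but much shorter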

Clive Robinson July 6, 2021 3:14 PM

@ All the usual suspects,

Saw this and it reminded me of the stuff I used to get up to in the 1980’s

https://donjon.ledger.com/compact-em/

Note they only did their successful tests in around three days starting from scratch, so think how much better you could do with a little longer 😉

It can be vastly improved by using a sense loop to ascertain when the software on the target is at a critical point in execution, to trigger the pulse at the right time.

Thus the success rate climbs dramatically.

However the “injection source” they used is also a “14lb lump hammer” in comparison to the EM Fault Injection techniques I developed back in the 1970’s to successfully attack pocket gambling machines and electronic wallets.

I’d dig out the Email correspondence with Ross J. Anderson back in the mid 90’s –where I pointed out the dangers of “self clocking logic” with respect to “injection locking” for the likes of Smart Cards, which had major major security issues back then– to get the actual dates, but I don’t fancy a rummage in the garage right now.

Needless to say you can take an RF carrier and Amplitude Modulate it with a pulse train that in effect causes the bias points of logic gates to get changed thus cause them to function incorrectly in your favour (you can also use phase modulation, as well as multiple sources to get constructive or destructive patterns in your generated field).

More interestingly you can use microwave signal sources, 10GHz / X-Band etc, with centimetric wavelengths that will use the ventilation slots and/or gaps at the edges of metal boxes as “resonant antennas”, and thus couple a lot of RF energy into the case. If this centimetric source is modulated at say 100MHz it will get picked up on the internal wiring and PCB traces, which get terminated in “semiconductors”… The semiconductors, being basically “square law devices” or diodes, will “envelope detect” the 10GHz and radiate the 100MHz around the inside of the case. If you then modulate the 100MHz signal with your “fault injection signal” then you’ve got it “past” the protection the “box bashers” in the TEMPEST brigade think they’ve achieved… Oops 😉

ADFGVX July 6, 2021 3:15 PM

@ Clive Robinson

Like many such postings on “entropy” it should carry a “Health Warning”.

So let’s talk about the Infraorder Gnathostomata in the Linnæus classification system.

There’s a “jawbone of an ass” and it’s unnecessary.

A dentist with laughing gas came up with that one.

Technically, in an evolutionary sense, the primitive jaw arches including jawbones and the hyoid bone are nothing but specially adapted ribs, assuming all vertebrates have ribs or at least rib arches to cover their hearts and lungs and enclose their digestive tract. Perhaps such animals should be classified as chordates but not vertebrates, because each and every vertebra of any vertebrate animal has a primitive jaw arch or rib arch structure aligned with it — even among the so-called Agnatha. Or else snakes should not be classified as Tetrapods, because they lack the requisite limbs.

ferritecore July 6, 2021 3:32 PM

@ADFGVX
Re:

Technically, in an evolutionary sense, the primitive jaw arches including jawbones and the hyoid bone are nothing but specially adapted ribs, assuming all vertebrates have ribs or at least rib arches…

As I seem to recall, although I am not quite that old, those features — and more — can be traced back to segments of segmented worms. No backbone required.

Weather July 6, 2021 3:40 PM

@clive
Your replied to @freezing was sarchtic ,sometimes meaning is body language.

ADFGVX July 6, 2021 4:22 PM

As I seem to recall, although I am not quite that old, those features — and more — can be traced back to segments of segmented worms. No backbone required.

And what’s special about the Tetrapods versus insects and spiders with 6 or 8 legs?

Or other crustaceans such as shrimp, lobsters, and crabs?

Or the Superphylum “Deuterostomia,” to call the anus a second mouth, if it isn’t mistaken for one of the books of the Holy Bible?

SpaceLifeForm July 6, 2021 4:40 PM

@ ALL

I have recently smelled an IOC.

JPMORGAN CHASE BANK IT folk need to be searching.

Kaseya -> Synnex -> JP MORGAN

I may be wrong, but they should research.

There are hints that Revil has actually been attacking for over a month.

Clive Robinson July 6, 2021 6:00 PM

@ Weather,

sarchtic

As is often the case, what you are trying to say gets lost for one of two reasons,

1, You post way way too short sentences, so not just meaning but context is not apparent.

2, You throw out words that are not found in an English or other European language dictionary.

Now I don’t know why the second happens, be it the use of a different language, or a different-language keyboard creating typos, or just typing way too quickly and then not proofreading.

But with your overly short sentences, the usual “protection measures” inherent in most languages as apparent redundancy are entirely unavailable, so your sentence has no meaning or ability to infer one.

You asked the other day why I was not posting hashes for you and I asked why I should. Your brief reply conveyed no information as to what you are doing or why. As I’m not “all seeing” I’ve no ability to work out what it is you are doing or to what purpose, so again why?

As far as I’m concerned it’s easier for you to generate such hashes in a deterministic manner and not take up blog space.

There are standard tools on most *nix boxes from the automated Zen like MOTD generator to the spelling dictionaries that will enable you to “auto-build” plain texts to make hashes, in a standard and repeatable way that others can easily verify.

You really need to consider just how your comments come across… Broken English is nothing new on this blog, we generally can work around it. Especially if you use longer sentences and the odd paragraph, most will be able to at least work out some kind of context and thus piece things together.

Also try using longer words; the Latin or Greek roots make them at least recognisable in most European languages.

No name July 6, 2021 6:02 PM

@SpaceLifeForm

Synnex is Chinese

Do you think that JPMorgan did business with Kaseya and Synnex or are they attacked by them?

SpaceLifeForm July 6, 2021 6:31 PM

@ No name

Do you think that JPMorgan did business with Kaseya and Synnex or are they attacked by them?

Yes.

Don’t ask non-mutually exclusive questions.

You know better.

No Name July 6, 2021 7:37 PM

@SPL

Maybe you should contact Huntress. They seem to be organizing. I’m not concerned.

What I am concerned about is that the US Congress thinks that this crap software problem can be solved by offering non technical folk a training class. Announced today: https://www.peters.senate.gov/newsroom/press-releases/peters-and-johnson-introduce-bipartisan-bill-to-help-secure-federal-information-technology-supply-chains-against-threats

The reason we have crap software out there is because non-technical people are making technical decisions. That should be illegal for the same reason we don’t let lawyers perform brain surgery.

All the same I am grateful that Congress recognizes we have crap software and vendors out there.

Which leads me back to JPMorgan – I could be wrong, but I don’t think they suffer from this. The Pentagon cancelled their Microsoft cloud contract today because of the tech. So let’s just look at this as an opportunity to invest more into cloud R&D and regulations before any additional migration takes place.

DHS cancelled their failed Eagle II cloud migration last year too – after realizing that Zero Trust is impossible. They let this project expire after this RFI probably turned up no response. The big question is why didn’t the GSA or DHS communicate this to the DoD? https://www.datacenterdynamics.com/en/news/department-homeland-security-plans-cloud-shift-data-center-consolidation/

We need a collective time out.

Sorry for referring to it as crap. I don’t know what else to call it at this point.

SpaceLifeForm July 6, 2021 8:25 PM

@ No Name, Clive, ALL

Synnex dots

hxtps://www.bloomberg.com/news/articles/2021-07-06/russian-state-hackers-breached-republican-national-committee?sref=ylv224K8

“We immediately blocked all access from Synnex accounts to our cloud environment,” he said. “Our team worked with Microsoft to conduct a review of our systems and after a thorough investigation, no RNC data was accessed. We will continue to work with Microsoft, as well as federal law enforcement officials, on this matter.”

SpaceLifeForm July 6, 2021 9:50 PM

@ JonKnowsNothing, Winter, Clive

Vax up. Stay safe. Don’t fly.

hxtps://johnpavlovitz.com/2021/07/06/our-family-got-vaccinated-then-we-all-got-covid/

Freezing_in_Brazil July 6, 2021 9:58 PM

@ Clive Robinson, ADFGVX, All

I won’t pretend to be on par with you [nor that I get your sophisticated humor without a visual cue]. I marvel at your ability to weave long comments with cadence and precision. I just have to thank you for your time [especially Clive for his post – Almost midnight local time, I’ll read it calmly in the morning]

I just want to point out that I mean bare metal more than software; the ability to sample a certain random state [something better than eg date] in the machine.

*I see that we have an entire thread dedicated to the subject today. 🙂

@ Weather

re ‘the whole movie as sample’, good one.

@ SpaceLifeForm

My comment about Bolsonaro is in response to your post

Regards

lurker July 7, 2021 12:36 AM

@No Name

They also think that Physical Security is IT Security. That is problematic for a tech company.

Why problematic? The position described is an on site, hands on job, where the bad guys are walking in and doing stuff to your gear. Theft from a warehouse might seem a simple job for the local cops. But if they bring the boxes back next day with different chips inside, will the local cops know or care?

Physical Security has to be part of IT Security. Edward Snowden types in their command bunkers are on the back foot when the security cameras are playing back a recording of yesterday’s movements with today’s timestamp.

I liked the bit about languages too. Fluency in another language gives you a window into the thinking of the other culture.

SpaceLifeForm July 7, 2021 12:57 AM

@ No Name, Clive

Yeah, not good. Something leads to Concentrix.

Why these points?

“Ten years of progressive law enforcement experience, civil or military, on the street and as a detective”

“25-40% travel is needed to visit our warehouses regularly, often multi-weeks at a time; this role typically can involve longer working hours as required”

“fluency in Spanish or Mandarin is a strong preference”

Just the travel is a huge red flag. Warehouses?

But, the webpage NEVER says anything about real IT skills.

What are they really looking for?

And this:

Disclaimer for US Recruitment

Please note SYNNEX is not able to hire in the following states: Alaska, Alabama, District of Columbia (DC), Hawaii, Iowa, Idaho, Kansas, Louisiana, Michigan, Montana, North Dakota, New Mexico, South Dakota, Vermont, Washington, Wyoming

No Name July 7, 2021 1:00 PM

@SpaceLifeForm

The reason why this company is banned in so many States is due to violating the labor laws. They don’t pay people and they allegedly operate a WFH scam.

The Federal Gov has over 300 active contracts with a company that seeks Mandarin-speaking employees and has active lawsuits against them (under numerous D/B/A’s). There are class actions too.

Plus there’s other jobs posted which point to them having issues with accounting fraud.

Another thing I check when scrutinizing vendors are their financial statements. Do the numbers make sense?

I also look at their Cybersecurity and IT staff background. Do they hire qualified staff? Do they even have the roles and corporate structure they need to qualify for Government contracting?

The US needs a way to define domicile more than just where the company says they are. This company claims to be American. Using this designation opened the door to very sensitive contracts and enables them to buy up other sensitive companies.

They are a Microsoft reseller. Today their press releases claim they aren’t an MSP, yet they sell and manage O365.

Do I think Russia would attack a Chinese vendor and just go for the RNC’s data? Nope. That makes no sense.

7/28/21 https://sdvsolutions.us/sdv-solutions-synnex-partnership/

They also just announced they are purchasing Tech Data. Wonder if they have US Gov contracts too? They are supposedly ranked 83 in the global Fortune 500.

How bad do we want to lose our life and Liberty?

https://www.insider.com/disenchanted-chinese-youth-join-a-mass-movement-to-lie-flat-2021-6

Read the article. Grass ain’t greener.

Weather July 7, 2021 1:44 PM

@clive
I’m dyslexic I can’t spell, if it wasn’t for the spell checker no one would understand what I wrote.

About sha256 the bit strength is about 35/255 a large drop, I just don’t know how to release the program, I though that if someone gives me a hash and in return I say these a probably the input characters then it should show that the program works if only to the people on this blog.

No Name July 7, 2021 2:11 PM

@Lurker

In companies of this size, with lots of facilities – physical security is its own department. There are laws pertaining to health and safety in facilities. So there are compliance and environmental roles too.

IT Security is under IT. Plus if someone is looking for a physical security job, they likely wouldn’t use “IT Security” as a search term, and they may not even open up a job with this title. But former police and military might be enticed by the opportunity to transition into IT.

What’s also curious is that the advertisement shows that they don’t even interview the person live. The person interviews by recording a video. Is that because they don’t pay people? So there’s no record or proof of who hired them?

Wage theft is a crisis in the USA. And it is quite common in the security field. The biggest global companies do it. https://abcnews.go.com/Business/wireStory/companies-rip-off-poor-employees-77477685

This is why they are banned in 16 states. I researched.

One key attribute is getting the employee to travel near their hire date. Before they have corporate cards or are set up in payroll. It’s quite common in IT contracting too. But they never pay. It usually involves the Fortune 500 and relocation.

Yes – @SpaceLifeForm

Very curious they are looking for retired Military and former law enforcement who need to travel. My sense is they make new hires travel on their own credit card and then don’t pay them and then the employee is blackmailed into doing something they wouldn’t normally do.

Clive Robinson July 7, 2021 3:58 PM

@ Weather,

I’m dyslexic I can’t spell, if it wasn’t for the spell checker no one would understand what I wrote.

You are not the first, nor will you be the last with dyslexia on this blog (I get shot at occasionally for the same reason).

There are three basic solutions,

1, Use a spelling checker.
2, Learn the four hundred or so rules of spelling English words.
3, Think up a mitigation.

The use of a spell checker may not be possible on all systems: some, but by no means all, web browsers and smart phones have them; others do not.

Unless you are planning on writing a rules-based spell checker, these days I’d keep away from “the rules”; even the likes of “I before E except after C” have more exceptions than make the rule useful.

Thinking up a mitigation that suits you is probably the best way.

As I indicated, if you use longer sentences then yes there will be more incorrectly spelt words, but… The inherent “error correction” goes up, as does the “context data”. Also the use of longer Latin-based words works because “the rules” of Latin are way more like rules than mere hints.

The other thing is if you’ve got “predictive spelling” on your system if you know you have trouble with certain word types you may recognise the word.

For instance, “mere” is a word I have problems with; I want to spell it differently. However, if I type “mee” or “mea” or similar, it tends to pull “mere” up in the list.

As a mitigation it works for me because there is an imbalance between writing and reading… It’s also why I usually go back and read through what I’ve typed before hitting submit… However I still have trouble near the end of a line and its wrap-around as a continuance on a new line; why, I don’t know, let’s just call it a work in progress 😉

(just spotted I’d got hitting wrong above)

But I’d definitely advise longer sentences with longer words as a start point.

So onwards and upwards.

About sha256 the bit strength is about 35/255 a large drop, I just don’t know how to release the program, I though that if someone gives me a hash and in return I say these a probably the input characters then it should show that the program works if only to the people on this blog.

OK, you’ve made an observation, have you tried to “walk it through” to see why you might be observing what you see?

Have you come up with a “test generator” and run tests on two or more SHA-256 implementations to cross-check there is not something broken in the version you are using?

Have you checked to see if it’s an artifact of the input string length?

Let’s put it this way: if I put in 26 strings of just each alphabet letter in turn, as that is such a very very small fraction of the 2^256 strings possible, I would be surprised if they had uniform statistics in the output. As the strings increased in length, and thus in number, I would expect the statistics to become more uniform. But realistically I’d suspect that anything less than about 2^128 strings would show some signs of bias.
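For what it’s worth, a quick sketch of that sort of eyeball test in Python (hashlib’s SHA-256 here; swap in a second implementation to cross-check):

import hashlib

for ch in "abcdefghijklmnopqrstuvwxyz":
    digest = hashlib.sha256(ch.encode()).digest()
    ones = sum(bin(byte).count("1") for byte in digest)
    print(ch, ones, "one-bits of 256")   # expect values scattered around 128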

Usually the first step you have to do after making a hypothesis from observations is to try to come up with ways to disprove the hypothesis. If you do in part disprove the hypothesis, you go back to making observations from a slightly different point of view.

But eventually you are going to have to get “down and dirty” with the algebraic and logical structure of sha256 and show where the bias is originating. If you know how to pull linear operations away from nonlinear operations then there are quite a few tools you can use to analyze the linear structure.

It’s not an easy path to go on your own especially if you do not have the computing power to run sufficient tests.

SpaceLifeForm July 7, 2021 5:39 PM

@ No Name

Weird that @kaseya does not want to pony up the petty cash, yet the Ever Given is now free for a magnitude more.

MarkH July 7, 2021 6:14 PM

@Freezing:

Remember that hash functions are your friend, especially those used in cryptography[1].

For example, suppose you want 256 unpredictable bits, and your data source is 99% predictable. Putting it a little simplistically, this can be thought of as 99 predictable bits (no entropy) for each bit of maximum entropy.

If you collect 25 K bytes from such a source — almost all of which are predictable — and compute the SHA1 hash of that file, its output will have 256 bits of entropy. In other words, the result will have the maximum possible of one bit of entropy per bit of output. The 25,344 predictable bits don’t detract from that!

Unless you need a large volume of entropy, low-entropy sources can be practically used in this manner, and physical transducers measuring local conditions can serve this purpose — however, the characteristics of such a source (and the way you sample it) must be studied with care when estimating how much entropy you can get from it.

For a more sophisticated (and practical) perspective on working with limited sources of entropy, take a look at the Fortuna algorithm designed by Bruce Schneier and his colleague Niels Ferguson:

https://www.schneier.com/academic/fortuna/

[1] For the purpose of entropy concentration, it doesn’t matter at all whether the hash function is secure against forgery, so “broken” functions like MD5 are fine.

MarkH July 7, 2021 6:21 PM

@Freezing:

In the example, I mistakenly wrote SHA1, which makes a digest of only 160 bits. SHA256 would be appropriate to the example I offered.

SpaceLifeForm July 7, 2021 6:57 PM

@ MarkH, Freezing, Clive

If you collect 25 K bytes from such a source — almost all of which are predictable — and compute the [REDACTED] hash of that file, its output will have 256 bits of entropy. In other words, the result will have the maximum possible of one bit of entropy per bit of output. The 25,344 predictable bits don’t detract from that!

Sorry, but I must respectfully disagree.

You can not create entropy out of thin air.

Good luck pulling 25K of bytes from /dev/random

Roll Dice

SpaceLifeForm July 7, 2021 7:34 PM

@ Freezing_in_Brazil

I’ve been watching for half a century.

There are no surprises.

hxtps://apnews.com/article/climate-climate-change-science-environment-and-nature-935be069af34aad472074d42097af85e

“But you ain’t seen nothing yet,” he added. “It’s going to get a lot worse.”

JonKnowsNothing July 7, 2021 8:02 PM

@ Clive Robinson @ Weather

re: 2, Learn the four hundred or so rules of spelling English words.

I always feel sorry for people forced to learn English (UK or AM), it’s really a nasty language to navigate. It may be my first language but I am no more literate in it than someone just learning “the rules”.

Spell checkers work until they don’t, but the wiki-dictionary/thesaurus helps lots. You can get a handle on other languages too, in case you need to quote passages of Cicero in the original…

I’ve done stints in a few other languages and I think French gets my next vote for awful. Sounds great, writes like merde. Everything in French has a rule. The rules are very complete. The only problem is that every aspect of the language is an exception to the rule.

Between not being able to spell in my own language, and not being able to spell in other languages and not being able to keep the syntax straight in the hundreds of computer languages and scripts I’ve had to work with, I can say I am very grateful for Compiler and Syntax Checking Editors.

Now if they could just fix all the massive offset typos with DWIM rules, I’d be set.

===

ht tps://en.wikipedia.org/wiki/Cicero

ht tps://en.wikipedia.org/wiki/Writings_of_Cicero

(url fractured to prevent autorun)

Fake July 7, 2021 8:38 PM

@slf,

You could attach /dev/random to a boggle pod with a camera and an actuator.

Maybe you could seal it w silicone and use a vacuum pump.

/Roll Dice

MarkH July 7, 2021 10:36 PM

@SpaceLifeForm:

As I read it, the concept initially put forth by our friend in Brazil was to use “sensor data” — not a bit generator packaged with the OS.

Maybe you wrote with some intended irony? From thin air — or whatever atmosphere may be available — one might measure thermal impingement on a tiny diaphragm; temperature; ambient sounds; humidity; barometric pressure; concentrations of trace chemicals; and so on.

In general, such parameters have a degree of predictability. To some extent, depending on the sensor, sampling technique and local conditions, they also exhibit variations beyond feasible prediction, especially if you aren’t operating an embassy in Moscow.

When the entropy of sources is not dense, you can concentrate the entropy you have. With some patience and conservative estimation, you can attain maximum entropy in enough bits to generate keys for individual use.

And if you want, you can modify your /dev/random to use your sensor data … at max entropy, so no post-processing is needed.

Winter July 8, 2021 12:29 AM

@SLF
“You can not create entropy out of thin air.”

MarkH forgot to stress the last step. You get a 256 bit hash out of SHA256.

A hash function concentrates all the entropy in the input into the output. If the hash algorithm is good, 25.6k bits with 256 bit of entropy result in a 256 bit SHA256 hash with 256 bit of entropy.

No entropy created.

Although, I agree with MarkH that thin air could be a source of good entropy. I would go for sound, as sound waves (all frequencies) are the principal means energy is distributed in matter.

Put up a microphone in the wind (or a ventilator or busy street), followed by lossless compression (FLAC) and SHA512 should do the trick. Provided you take long enough recordings.
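As a rough sketch of that recipe in Python (assuming you already have a recording saved as noise.wav, a hypothetical file; the FLAC step is skipped for brevity, so this only illustrates the hashing end of it):

import hashlib, wave

with wave.open("noise.wav", "rb") as w:          # hypothetical recording
    samples = w.readframes(w.getnframes())

print(hashlib.sha512(samples).hexdigest())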

Winter July 8, 2021 12:49 AM

@SLF
“You can not create entropy out of thin air.”

PS, here is a theoretical (not practical) treatise about how this actually can work:

Le Bot A. 2017
Entropy in sound and vibration: towards a new paradigm.
Proc. R. Soc. A 473: 20160602.
ht tps://royalsocietypublishing.org/doi/pdf/10.1098/rspa.2016.0602

This paper describes a discussion on the method and the status of a statistical theory of sound and vibration, called statistical energy analysis (SEA). SEA is a simple theory of sound and vibration in elastic structures that applies when the vibrational energy is diffusely distributed. We show that SEA is a thermodynamical theory of sound and vibration, based on a law of exchange of energy analogous to the Clausius principle. We further investigate the notion of entropy in this context and discuss its meaning. We show that entropy is a measure of information lost in the passage from the classical theory of sound and vibration to SEA, its thermodynamical counterpart.

echo July 8, 2021 12:55 AM

@JonKnowsNothing

I always feel sorry for people forced to learn English (UK or AM), it’s really a nasty language to navigate. It may be my first language but I am no more literate in it than someone just learning “the rules”.

English has the same grammatical structure as Japanese apart from one difference: verb order. English words are built from a relatively small number of building blocks, and Japanese isn’t dissimilar, although English is more fluidly adapted than Japanese, which tends to be more rigid in its use and interpretation. Almost all people who say a language is hard to learn (whether English or Japanese) are saying this because they do not get the grammatical differences between language groups and because everyone else is saying it. Once you get this difference, learning a new language is easier, as you are not trying to force your preconceptions into a model which does not fit. Mandarin or Cantonese, among others, are tonal languages, which is another layer.

Another similarity between England and Japan is both are almost unique in being island nations and having cuisine with ingredients which remain separate on the plate.

Learning has a psychological element and a neurological element. The belief that something is hard, and tripping on beginner mistakes, puts a lot of people off, so their learning process is sub-optimal. Studies suggest this impacts women more than men in subjects such as maths, but women tend to self-report it more, so this is likely inaccurate insofar as outcomes are concerned. Older people tend to learn languages more slowly and it doesn’t stick so well, but even learning a language when older comes with significant additional mental health benefits.

Actors may be proficient in rote learned scripts and convey the impression of being fluent speakers with no detectable accent.

Language may also be physical or presentational which does have an impact on communication. Also different people and communities and cultures and places place different weight and interpretations on the same words.

There are some interesting studies on ScienceAlert this week discussing language in animals and dogs’ emotional attachment to people they know versus strangers. Both pose interesting questions on the energy demands and utility of language, and a counter-intuitive understanding of degrees of aggression versus how easily one is bought off.

I have found you can get by in a foreign language, so you don’t get lost or die of starvation, within a few weeks. Six months thrown in the deep end in a fully immersive environment is enough time to begin not sounding like an idiot. The key is immersion.

Immersion is a very powerful tool which can be used or abused.

Clive Robinson July 8, 2021 2:42 AM

@ SpaceLifeForm,

Sorry, but I must respectfully disagree.

So must I,

@ MarkH,

If you collect 25 K bytes from such a source — almost all of which are predictable — and compute the [SHA256] hash of that file, its output will have 256 bits of entropy. In other words, the result will have the maximum possible of one bit of entropy per bit of output. The 25,344 predictable bits don’t detract from that!

No it will not.

Whilst the file contains some small measure of entropy the hash function in no way changes that.

What happens at the output of each use of the hash is totally dependent on the input to the hash; that is, it is fully deterministic[1]. There is no argument to be had: the hash does not in any way create entropy.

However, for each true bit of entropy in the input, the hash algorithm’s Avalanche Effect[1] will change ~50% of its output bits. So in effect 1 bit of entropy gets spread across 256/2 or 128 bits, giving each one in effect 1/128th of a bit of entropy.

What happens next is very dependent on how you apply the hash. If you do not “chain it” in some way then the spread of the entropy stops at that hash output block.

If you chain it, it depends on if you “bitwise add” (vector add without carry) or “wordwise add” (traditional addition with carry).

If you bitwise add, then the next input into the hash would be the hash output XORed with the next 256 bits of the file. Each one of those ~50% changed bits will likewise affect ~50% of the bits; however the result will be that ~50% of the previously changed bits will go back to what they would have been if the original 1 bit of entropy had not changed them, thus undoing changes, and ~50% of the bits not originally changed will now be changed. But it’s still only 1 bit of entropy getting spread across around ~50% of the hash output bits.

You can show similar logic for a traditional wordwise add across the 256 bits, but there are also the non-linear effects of carries to consider, which tend to make the upper bits have more non-linear changes than the lower bits. This distorts the way the individual bits of entropy are spread across the hash output.

Thus each bit of entropy in the file changes ~50% of the bits after the first hash. But after two hashes you still only have one bit of entropy, though ~50% of the bits that were not changed with the first hash have now changed, and ~50% of the bits that were changed have now been effectively changed back. That will continue for as long as you apply the hash. The same logic applies with the second bit of entropy, only some of the ~50% of bits it changes will be bits also changed by the first bit of entropy.

When you do the reasoning you will realise that only ~50% of the bits at the hash output will have changed no matter how many bits of entropy are actually in the file.

Further, you will realise for the bitwise addition that the entropy becomes in effect a cyclic cellular automaton feedback –similar to an LFSR– superimposed on each bit that goes into the hash.

[1] In effect a hash is nothing more than a very large mapping function where the complexity of the algorithm tries to give it certain characteristics.

1, Pseudo One Way Function.
2, Each input bit change causes ~50% of output bits to change in the Avalanche Effect.

Clive Robinson July 8, 2021 3:06 AM

@ JonKnowsNothing, Weather,

in case you need to quote passages of Cicero in the original…

Yes he had some timeless observations about how we humans do not learn despite history,

Six mistakes mankind keeps making century after century:

Believing that personal gain is made by crushing others;

Worrying about things that cannot be changed or corrected;

Insisting that a thing is impossible because we cannot accomplish it;

Refusing to set aside trivial preferences;

Neglecting development and refinement of the mind;

Attempting to compel others to believe and live as we do.

In their way they are all “deadly sins”, and each of us has at some point committed one if not all of them, sometimes as much with love as hate or the banality of indifference.

JonKnowsNothing July 8, 2021 3:19 AM

@echo

re: Six months thrown in the deep end in a fully immersive environment…

I dunno, I’ve been in the Deep End of American English for decades and it hasn’t improved my ability to spell my own language…

Literacy is a big issue globally, a good number of folks can speak their first languages but they may not be able to read or write. Many reasons can impact this aspect: gender, culture, health, opportunity and more.

Basic communications is not quite enough for full literacy. Idioms are a most interesting aspect and wars have been started over incorrect translations.

tl;dr

Today I learned that showing the bottom of your shoe can be a serious insult in some cultures and the one that got thrown at a US President had special meaning.

And here I had thought that it was the only thing security hadn’t removed before the reporter was allowed in the room and a shoe carries farther than a necktie.

Cause then they started making you take your shoes off to get on a plane, and your belt … it’s a wonder there were not more eye-popping moments while shuffling in socks holding your pants up while trying to make sure no one made off with your electronics at the other end of the conveyor belt and getting there before they landed on the floor…

While I cannot say others have had the same experience, rarely did a previous work position using X, Y or Z computer language translate to a new position.

New positions generally required new software, new compilers, new systems and new coding methods, and of course a whole new design and deliverable. The key was to get past the one dude(1) who was either The Big Dog or The Big Stop-Sign who inevitably challenged everyone to code something on-the-fly just to prove how stupid you were and how unworthy you were to be in the same office building with such a brilliant personage as themselves.

Silicon Valley was full of such persons.

I remain functionally illiterate in many languages ….

===

1, In the day, it was always a dude. Today it could be anyone. I don’t envy anyone having to face a phalanx of such persons.

ht tps://en.wikipedia.org/wiki/Functional_illiteracy

  • Functional illiteracy consists of reading and writing skills that are inadequate “to manage daily living and employment tasks that require reading skills beyond a basic level”.[1] People who can read and write only in a language other than the predominant language of where they live may also be considered functionally illiterate.[2] Functional illiteracy is contrasted with illiteracy in the strict sense, meaning the inability to read or write simple sentences in any language.

ht tps://www.theguardian.com/world/2021/jul/08/toppling-saddam-hussein-statue-iraq-us-victory-myth

They would all crowd around [a toppled statue] Saddam and start hitting [the statue] with their shoes. (Shoes are considered dirty in the Middle East: it is rude to show someone the soles of your shoes, and a terrible insult to hit them with a shoe. In 2008, an Iraqi journalist would make international news when he threw a shoe at George W Bush.)

(url fractured to prevent autorun)

Winter July 8, 2021 3:22 AM

@Clive
“In effect a hash is nothing more than a very large mapping function”

As I understand it:
An X bit perfect hash function maps any possible string of input bits onto a completely unpredictable string of X output bits such that every individual input string will always be mapped to the same output string.

If I have a string of 25k bits that has only 256 bits of entropy, there are only 2^256 possible unique such strings. I can, theoretically, compress every possible such string of 25k bits to a unique string of 256 bits that uniquely identifies every possible such string of 25k bits.

If the hash function is perfect, it will compress that 25k bit input string to a unique 256 bit output string that does contain exactly the same entropy/information as the 256 bit compressed string that fully describes the input string.

If that is not true, then the hash function is not perfect. The question is then, in what way is SHA256/SHA512 imperfect in that it loses (so much) entropy?

And is there a hash function that can indeed preserve the entropy in the original? If not, why not?

Clive Robinson July 8, 2021 3:46 AM

@ Winter,

And is there a hash function that can indeed preserve the entropy in the original? If not, why not?

We’ve been down this road before with the One Time Pad encrypting another One Time Pad.

The answer, after you shuffle the parts around, is the same.

MarkH July 8, 2021 4:28 AM

@Clive, Freezing, et al:

Looks like a communication breakdown!

I don’t think I wrote anything about creating entropy by hashing … did I?

Hash functions are an excellent tool for concentrating entropy from a data source with low “density” of entropy.

If the input data has zero entropy, so will the hash.

The part about chaining, lost me completely. When I’ve collected data with n bits of entropy, and evaluate its n-bit hash, I’ve gotten (for practical purposes) one bit of entropy per output bit. What on Earth is there to chain???????

I suspect that our trains of thought went down different railways.

========================

It’s easy to understand entropy concentration by a thought experiment.

Imagine a data source generating a 2^20-bit (128 Kbyte) file, almost all of which is fixed. Only the 389,211th bit is unpredictable, equally likely to be one or zero. The file is a million bits long, but has only one bit of entropy.

The SHA256 digest of that file will assume either of two values. The digest is 256 bits long, but has only one bit of entropy.

The hash output is much shorter than the input file, but contains the same entropy. The entropy per bit of the hash is much greater than that of the input file.

Now suppose that the 861,903rd bit is likewise unpredictable; this leads to four possible digest values, all equally probable. Both the file and its hash have two bits of entropy.

There is no obstacle to iterating this process 254 more times, after which the megabit file has 256 bits of entropy, and the hash has very nearly as much. [In practice, some collisions are to be expected, so there will be less than 2^256 distinct hash values. But because it’s a good hash function, the probability of collision is low, and the hash will have more than 255 bits of entropy.]
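A toy Python check of the single-bit case (the file size and bit position follow the thought experiment above):

import hashlib

NBYTES = 1 << 17                  # 2^20 bits = 131,072 bytes
BIT = 389_211                     # the one unpredictable bit

digests = set()
for value in (0, 1):
    data = bytearray(NBYTES)
    if value:
        data[BIT // 8] |= 1 << (BIT % 8)
    digests.add(hashlib.sha256(bytes(data)).hexdigest())

print(len(digests))               # 2: the 256-bit digest carries that one bit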

The hash function is clearly serving to concentrate the scattered entropy in the input file.

In real life, the input probably won’t be a mixture of fixed (zero-entropy) and completely unpredictable (maximum-entropy) bits, but rather consist of bit sequences with less than one bit of entropy per bit of data.

But the same principle applies, and the hash function will concentrate the diffuse entropy of the data source.

Winter July 8, 2021 4:32 AM

@Clive
“We’ve been down this road before with the One Time Pad encrypting another One Time Pad.”

That claims that it is impossible to extract any meaningful part of the entropy from a bit string.

It is clear that we cannot exactly determine (or extract) the entropy of a bit string, its Kolmogorov complexity, because it is incomputable. However, Kolmogorov complexity does show that we can indeed define unequivocally the complexity/entropy of a bit string and can implement approximations of this complexity/entropy. That is what compression is all about. The FLAC version of an audio recording with noise is much longer than the FLAC version of an audio recording with silence (quantization noise).

So, it must be possible to approximate the entropy of a bit string to some precision. And by the same method, it must be possible to extract a certain fraction of the entropy in a bit string in what could only be described as a hash function.

Clive Robinson July 8, 2021 5:25 AM

@ MarkH,

Looks like a communication breakdown!

I don’t think I wrote anything about creating entropy by hashing … did I?

Explicitly or implicitly?

You claimed,

and compute the [SHA256] hash of that file, its output will have 256 bits of entropy. In other words, the result will have the maximum possible of one bit of entropy per bit of output.

That is you say 256bits of entropy at the hash output, I’ve shown it will be ~50% of that.

So there is a ~100% increase in your claim to account for…

Winter July 8, 2021 5:55 AM

@Clive
“That is you say 256bits of entropy at the hash output, I’ve shown it will be ~50% of that.”

If that were true, it could be solved by using SHA512 instead of SHA256.

Clive Robinson July 8, 2021 6:14 AM

@ MarkH,

Hash functions are an excellent tool for concentrating entropy from a data source with low “density” of entropy.

As I’ve explained in the past, hash and other crypto functions do not “concentrate” or “multiply” entropy, or any other of the claims people make that drop them into “magic pixie dust thinking”.

From the perspective we are looking at such deterministic algorithms have two purposes,

1, They approximate a “one way function”.

2, They are designed to have an Avalanche criteria of ~50% at the output for each and every input bit.

So under 2, if you have two inputs that differ by a single bit, then the output will have ~50% of the bits changed in a deterministic way that, under 1, is practically not invertible.

Nothing has been “concentrated” or “multiplied” all that has happened is that 1bit of change at the input has caused ~50% of bits to change.

However if you change 2bits, you expect to get ~50% change for each of them, which you do. However the changes occur in a finite field, thus ~50% of the changes from the first bit will get changed back by the second bit (ie ~25% of the total bits). Likewise ~50% of the bits that were not changed by the first bit will be changed by the second bit (ie ~25% of the total bits). The overall effect is that only ~50% of the bits change (ie ~25% + ~25%).

It does not matter how many bits you change at the input, you would only expect a ~50% change of bits at the output. It’s actually a crucial requirement to remove bias and thus prevent “discriminators” that would allow someone who can only see the output to determine the state of any of the input bits.
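The ~50% figure is easy to see empirically; a minimal Python sketch (my own, just to illustrate the avalanche criterion being described):

import hashlib, os

msg = bytearray(os.urandom(64))
h1 = hashlib.sha256(bytes(msg)).digest()
msg[10] ^= 0x01                   # flip a single input bit
h2 = hashlib.sha256(bytes(msg)).digest()

changed = sum(bin(a ^ b).count("1") for a, b in zip(h1, h2))
print(changed, "of 256 output bits changed")   # ~128 on average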

The part about chaining, lost me completely.

You have a file of data that far exceeds the bit width of the hash.

You have three choices,

1, Just arbitrarily select one block from the file and hash that.
2, Output one hash for each block of data from the file.
3, Chain either the file or hash output to reduce down the file to the hash size.

The first achieves very little for the usual use of crypto hashes, which is detecting illicit changes, as it will only cover one small part of the file.

The second obviously will detect all illicit changes and localise them to a given block in the file. The downside is that the combined hash output is as large as, if not larger than, the file itself.

The third makes the output of the final hash dependent on the current hash and the preceding hash, and so on back to the first block in the file. Therefore it will detect any illicit change in the file, but only needs the final hash output to do so.

There are two basic ways you can chain,

1, Chain the file contents before hashing.
2, Chain the hash outputs whilst hashing.

Of the two the second is generally considered more secure.
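
A minimal Python sketch of the second way (chaining the hash outputs); this is an illustration of the idea only, not a standardised construction:

    import hashlib

    def chained_digest(blocks):
        state = bytes(32)                          # initial chaining value
        for block in blocks:
            state = hashlib.sha256(state + block).digest()
        return state                               # depends on every block

    print(chained_digest([b"block-1", b"block-2", b"block-3"]).hex())
    print(chained_digest([b"block-1", b"block-X", b"block-3"]).hex())  # differs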

Finally,

But the same principle applies, and the hash function will concentrate the diffuse entropy of the data source.

As I’ve already said, the hash or crypto function does not “concentrate”; it just changes ~50% of the output bits for each bit change at the input, in a way that is hopefully non-reversible to an observer who can only see the output of the hash.

Clive Robinson July 8, 2021 6:31 AM

@ Winter,

That claims that it is impossible to extract any meaningful part of the entropy from a bit string.

Not true,

There are two ways of extracting information or entropy from a bit string.

1, Compare it to a known string.
2, Compare it to a test string.

Both can be done on a bit by bit basis or in a statistical way on multiple bits.

When I said,

“We’ve been down this road before with the One Time Pad encrypting another One Time Pad.”

I was simply making a reference to a previous conversation about a “One Time Pad encrypting another One Time Pad”.

As for,

If that were true, it could be solved by using SHA512 instead of SHA256.

Don’t be daft, it would only double the number of bits.

It would not in any way change the fact that the ratio of changed bits at ~50% would be the same, not the 100% MarkH has claimed.

Winter July 8, 2021 6:51 AM

@Clive
“As I’ve already said, the hash or crypto function does not “concentrate”; it just changes ~50% of the output bits for each bit change at the input, in a way that is hopefully non-reversible to an observer who can only see the output of the hash.”

A good (perfect?) hash function distributes the entropy of the input string uniformly over all bits of the output string. If the entropy of the input string is larger than what fits in the output string, that maxes out. But if the entropy of the input string is smaller than that, the output string should, again, in the perfect hash function, contain all the entropy of the input string.

I do not really see how this “it changes 50% of the bits” has any bearing on the entropy of the output string.

The ideal hash function, i.e., its mathematical model, is a Random Oracle model which has no internal workings.
https://en.wikipedia.org/wiki/Random_oracle

If the input string has 256 bits of entropy, the ROM can only output 2^256 different output strings. Hence, the ROM will capture all the entropy as long as it has 256 bits or more in the output string. If the ROM outputs more than 256 bits, it will look like it increases the entropy, but we can enumerate all the 2^256 possible input strings and tabulate the corresponding 2^256 hash output strings, so nothing goes wrong here.

If you want to argue that “real” hash algorithms are not ideal like a ROM, I will not deny that.

But your argument seems to imply that SHA-1, SHA-2 and SHA-3, at all their output lengths (224, 256, 384 and 512 bits), are all the same and all capture only 50% of the entropy of the input string, irrespective of the length of their output.

I am not yet willing to buy that.

Winter July 8, 2021 6:56 AM

@Clive
“It would not in any way change the fact that the ratio of changed bits at ~50% would be the same, not the 100% MarkH has claimed.”

Now you lost me completely.

If you change all the bits, that is the legendary bitwise-NOT “cryptography”, which changes all of the bits, whereas other cryptographic systems change only ~50% of the bits.

Fake July 8, 2021 6:59 AM

Clive, you make it sound as if a shaking, mixing or stirring function is an additive XOR.

😅

I’m not sure I would use microphone-based data; surely you guys have heard static or hum in recordings and playbacks.

Short sporadic samples could be triggered by when and how much data was needed, but a microphone would likely be extremely vulnerable to at least two types of gaming:

Unshielded pickup and input saturation.

Fake July 8, 2021 7:07 AM

A fan with tinsel, a disco ball and a camera should give a fairly random photograph. Cameras need light shone into them to blind the sensor; are they still vulnerable to the same generic blinding that an analogue low-powered microphone would be?

I was wondering if you guys didn’t just have a diagram for a serial port wire + crystal + diode or something simple.

The way it used to be, certain countries had to scavenge electronics per se, and the likelihood of still having a serial or LPT port would be pretty good….

A short length of wire, an appropriate crystal, and what were those diodes you used to talk about?

echo July 8, 2021 7:09 AM

https://www.eesc.europa.eu/en/our-work/opinions-information-reports/opinions/action-plan-synergies-between-civil-defence-and-space-industries

Action Plan on synergies between civil, defence and space industries

On 22 February 2021, the European Commission (EC) presented an Action Plan on Synergies between civil, defence and space industries (COM(2021) 70 final) to further enhance Europe’s technological edge and support its industrial base. This timely and strategic Action Plan is designed to reinforce European innovation by exploring and exploiting the disruptive potential of technologies at the interface between defence, space and civil uses, such as cloud, processors, cyber, quantum and artificial intelligence.

I don’t personally like the use of the word “disruptive” nor the focus on military applications, as it tends to have a narrowing effect on discussion and broader considerations. As for “timely”, arguably Europe is 10-20 or more years behind where it should be. Nonetheless a response was required and Europe is taking it seriously.

https://techcrunch.com/2021/07/07/youtubes-recommender-ai-still-a-horrorshow-finds-major-crowdsourced-study/

New research published today by Mozilla backs that notion up, suggesting YouTube’s AI continues to puff up piles of “bottom-feeding”/low-grade/divisive/disinforming content — stuff that tries to grab eyeballs by triggering people’s sense of outrage, sewing division/polarization or spreading baseless/harmful disinformation — which in turn implies that YouTube’s problem with recommending terrible stuff is indeed systemic; a side effect of the platform’s rapacious appetite to harvest views to serve ads.

That YouTube’s AI is still — per Mozilla’s study — behaving so badly also suggests Google has been pretty successful at fuzzing criticism with superficial claims of reform.

The mainstay of its deflective success here is likely the primary protection mechanism of keeping the recommender engine’s algorithmic workings (and associated data) hidden from public view and external oversight — via the convenient shield of “commercial secrecy.”

But regulation that could help crack open proprietary AI blackboxes is now on the cards — at least in Europe.

[…]

Mozilla says the crowdsourced research uncovered “numerous examples” of reported content that would likely or actually breach YouTube’s community guidelines — such as hate speech or debunked political and scientific misinformation.

But it also says the reports flagged a lot of what YouTube “may” consider “borderline content.” Aka, stuff that’s harder to categorize — junk/low-quality videos that perhaps toe the acceptability line and may therefore be trickier for the platform’s algorithmic moderation systems to respond to (and thus content that may also survive the risk of a takedown for longer).

[…]

A particular stark metric is that reported regrets acquired a full 70% more views per day than other videos watched by the volunteers on the platform — lending weight to the argument that YouTube’s engagement-optimising algorithms disproportionately select for triggering/misinforming content more often than quality (thoughtful/informing) stuff simply because it brings in the clicks.

While that might be great for Google’s ad business, it’s clearly a net negative for democratic societies that value truthful information over nonsense; genuine public debate over artificial/amplified binaries; and constructive civic cohesion over divisive tribalism.

But without legally enforced transparency requirements on ad platforms — and, most likely, regulatory oversight and enforcement that features audit powers — these tech giants are going to continue to be incentivized to turn a blind eye and cash in at society’s expense.

Anyone with half a clue can see the EU acting on this at some point. The methods of the far right and their networks, and how they interact with and are fuelled by social media including YouTube, are now very well documented. While YouTube may hide behind US freedom of speech and other legislative protections, these don’t work in Europe, which has substantial human rights protections.

Returning to the regulation point, an EU proposal — the Digital Services Act — is set to introduce some transparency requirements on large digital platforms, as part of a wider package of accountability measures. And asked about this Geurkink described the DSA as “a promising avenue for greater transparency.”

But she suggested the legislation needs to go further to tackle recommender systems like the YouTube AI.

“I think that transparency around recommender systems specifically and also people having control over the input of their own data and then the output of recommendations is really important — and is a place where the DSA is currently a bit sparse, so I think that’s where we really need to dig in,” she told us.

One idea she voiced support for is having a “data access framework” baked into the law — to enable vetted researchers to get more of the information they need to study powerful AI technologies — i.e., rather than the law trying to come up with “a laundry list of all of the different pieces of transparency and information that should be applicable,” as she put it.

This kind of thing is required at some level to get within the OODA loop of YouTube et al. Normally it is the courts which provide a venue for this level of scrutiny after the event, and only then after a lot of struggle, and sometimes not even then.

Whether from within or without, both acute and cumulative impact must be considered, along with experts who understand the ideas behind constructive dismissal and complex frauds, so that events and attitudes which would normally result in opportunistic content, or content which slides under the radar, are more readily identified and actioned. That expertise should almost certainly be informed by scientific, legal and community experience and opinion.

This is obviously a developing area but ultimately individuals cannot shirk accountability by hiding behind algorithms or any other form of bureaucracy.

Clive Robinson July 8, 2021 7:56 AM

@ Winter,

A good (perfect?) hash function distributes the entropy of the input string uniformly over all bits of the output string.

Do you actually understand that statement or are you just parroting it?

Because it’s incomplete. The statement only applies if the hash is iteratively applied out to some distant point, and it may never get there for all bits (though approximately 2N iterations, where N is the bit width, is usually sufficient).

As I’ve pointed out, there is a ~50% change with each iteration. Which means that approximately half the bits that were not changed on the previous hash have now been changed (i.e. ~25%). Further, approximately half the bits that were changed have been changed again, or flipped back to their value prior to the previous hash (i.e. ~25%).

As, in theory, the bits that change are selected randomly, over time the changes will average out.

That means that only approximately half the bits are different, no matter how many iterations of the hash you apply.

What do you not understand about that?

Winter July 8, 2021 8:05 AM

@Clive
“Because it’s incomplete. The statement only applies if the hash is iteratively applied out to some distant point, and it may never get there for all bits (though approximately 2N iterations, where N is the bit width, is usually sufficient).”

If you use a Random Oracle Model, that will most certainly hold for the whole output (but will indeed be meaningless on bit level).

If you do use a ROM, your explanation becomes meaningless as a ROM has no internals.

Clive Robinson July 8, 2021 8:22 AM

@ Fake,

Clive, you make it sound as if a shaking, mixing or stirring function is an additive XOR.

There is a reason the XOR function is called “the half adder” or “add without carry” or most importantly in this case “Vector add over GF(2)”.

Assume your hash output is a “vector of bits” that are ideally “independent of each other”; the vector defines a single point in an infinite N-dimensional space.

But each infinite space can, using an appropriate mapping function, be mapped onto a finite space. When such a finite space has certain properties, the normal rules of addition, subtraction, inversion and division apply; that is, it forms a “field”. Now, due to the nature of integers, specifically primes under modular rules, a “field of reals” will map onto a “field of integers of prime order”.

The smallest prime is 2, therefore the smallest such field is of order 2, which is by happy happenstance representable by a “binary bit”.

Thus a binary number of N bits in size is also a “vector” in N-dimensional space.

You get told a field has two operators, (+) and (x), which are the analogues of addition + and multiplication x. XOR is (+) and AND is (x). Interestingly, neither of them generates a carry.

So yes, (+) does a lot of work in crypto, in part because (x) leaks information about its inputs.

Oh, one last point to remember: “a point in any space carries no meaning”; to have meaning it has to be taken with reference to another point in the same space. Often one point is picked as a reference point or origin, and it’s frequently set to the additive identity value in each dimension, as this makes vector addition considerably simpler.
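
A tiny Python illustration of (+) and (x) over GF(2) on bit vectors (the example values are arbitrary):

    # Each bit position is its own dimension; addition is carry-free, and every
    # element is its own additive inverse, so adding the same vector twice
    # returns you to where you started.
    a = 0b10110100
    b = 0b01110010

    s = a ^ b                        # (+) : add without carry
    p = a & b                        # (x) : multiply, also without carry
    print(format(s, "08b"))          # 11000110
    print(format(p, "08b"))          # 00110000
    print(format(s ^ b, "08b"))      # 10110100 : adding b again recovers a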

Clive Robinson July 8, 2021 8:36 AM

@ Fake,

I was wondering if you guys didn’t just have a diagram for a serial port wire + crystal + diode or something simple.

I’ve many and they are all relatively trivial to build.

However they suffer from all sorts of issues that make their use for “high quality work” problematic.

I’ve mentioned on this blog before some of the work you need to do (a lot) to protect them, and how easily (very) they can be attacked remotely by an adversary, as well as the work you need to do (a lot) to spot if the system has broken or is under attack.

Whilst I used to design such systems back last century, they really are not worth the effort these days, nor do people like paying the money for what is to them “a noise diode, amplifier and slicer in an expensive box” (which they actually are not), especially when they appear to be “given away for free” on CPU chips (remember, there is no free lunch; you pay one way or another).

Clive Robinson July 8, 2021 8:48 AM

@ Winter,

If you do use a ROM, your explanation becomes meaningless as a ROM has no internals.

What does the term “iteratively” mean and what is the consequence of that?

MarkH July 8, 2021 9:37 AM

@Clive:

There seems to be a radical disconnect between what I’m discussing and what you’re discussing, and I haven’t yet worked out where the crevasse lies …

You wrote,

You have a file of data that far exceeds the bit width of the hash. You have three choices,

1, Just arbitrarily select one block from the file and hash that.
2, Output one hash for each block of data from the file.
3, Chain either the file or hash output to reduce down the file to the hash size.

Well, the correct way to use a hash function for Freezing_in_Brazil’s application is option 4:

4, Hash the entire file to reduce it down to the hash size.

A “file of data that far exceeds the bit width of the hash” is the normal use case for a hash function.

Surely we can agree on that!

Some hash functions have no defined limit on input file size; the FIPS for SHA256 specifies a limit of more than two million terabytes.

To obtain the digest of a few hundred bytes, or a few thousand, I’ve no need to think about “blocks.” I hash the entire data set. That’s exactly what hash functions are designed to do.
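
For example, a short Python sketch of hashing an entire file in one pass (the file name is a placeholder):

    import hashlib

    def sha256_of_file(path, chunk_size=1 << 16):
        # Stream the whole file through one SHA256 computation, so the input
        # can be far larger than the 256-bit digest.
        h = hashlib.sha256()
        with open(path, "rb") as f:
            for chunk in iter(lambda: f.read(chunk_size), b""):
                h.update(chunk)
        return h.hexdigest()

    # print(sha256_of_file("pool_file.bin"))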

Clive Robinson July 8, 2021 10:13 AM

@ MarkH,

4, Hash the entire file to reduce it down to the hash size.

Is the same as 3, when you think about it.

Some hash functions have no defined limit on input file size; the FIPS for SHA256 specifies a limit of more than two million terabytes.

You are mistaking the size of the input to the hash function (256 bits) with the number of times you can iteratively apply the hash function before it inherently starts to form a recognisable cycle.

As I’ve said, the hash function is a mapping function that you THEN use in some kind of “mode”, that is, a “chain function”.

Think of AES in “Electronic Code Book” (ECB) mode: it is the same as a hash function hashing individual blocks of the file. It’s just a simple substitution cipher, with all the issues that carries, hence the need for a “chaining mode” around the hash function.

So instead think of AES in “Cipher Block Chaining” (CBC) mode, where the output of the mapping function is fed back and XORed with the next block of plaintext. If the plaintext does not change, the ciphertext certainly does, but it will eventually start to cycle. Importantly though, any change in the file gets propagated along the chain (except in a very, very unlikely event).

Replace AES with a hash function and you only need to use the last block to detect any changes in the file.

You can likewise chain just the file blocks and finally hash the last one, but that is not recommended for a number of reasons.

MarkH July 8, 2021 10:36 AM

@Clive:

The argument concerning how hash output bits change is an almost literal example of missing the forest for the trees.

The bits of a hash have no individual meaning. That is by design! They aren’t supposed to mean anything.

The appropriate way to think of hashes in this context, is as arbitrary codes.

Here’s an example, with hash values in hexadecimal. The hash of a 320-bit file is

97993a6abf5957b2b907afb4b8cd6b82.

When I toggle the bit at offset 1, the hash changes to

d8bd901ca174cabd0454528b9d284d8a.

Remember, these should be thought of as arbitrary codes. No particular bit, or group of bits, should be understood as conveying any meaning.

If the other 319 bits are fixed, and bit 1 is unpredictable, then the hash must be one of the two foregoing values.

Very many of the bits are different, as expected with a good hash function. But there is exactly one bit of entropy, corresponding to which of these two arbitrary codes was generated.

If I instead toggle the bit at offset 241, the hash is

052ac3c8f8d06aa6082054cec2122d9b

and if I toggle both bits 1 and 241, the hash is

846d6e08b76f5b46362c3d55cf89f51c.

If the other 318 bits of the file are fixed (completely predictable, and thus having no entropy), then the hash must be one of the four codes shown above.

It happens that the hash function is 128 bits wide, and that many bits change between the four codes. But none of that conveys any meaning.

If the bits at offsets 1 and 241 are completely unpredictable (all four combinations equally probable), the hash carries 2 bits of entropy.

There are four equally probable codes. Their size, format, bit patterns, etc etc mean NOTHING WHATSOEVER.

The hash result has 2 bits of entropy, not more and not less.

For the purpose of entropy concentration, the importance of the avalanche effect in such a hash is that it contributes to an extremely low probability of collision (different file contents yielding the same hash). Apart from that, what the individual hash bits are doing is irrelevant.
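
The counting argument can be checked directly; here is a Python sketch using SHA256 (so the digests differ from the 128-bit values quoted above, which is immaterial):

    import hashlib
    from itertools import product

    # With only two unpredictable input bits there are four equally probable
    # files, hence four possible digests and log2(4) = 2 bits of entropy,
    # regardless of how wide the digest is.
    digests = set()
    for b1, b241 in product((0, 1), repeat=2):
        f = bytearray(40)            # 320-bit file, everything else fixed at zero
        f[0] |= b1 << 1              # "bit offset 1"
        f[30] |= b241 << 1           # "bit offset 241" (byte 30)
        digests.add(hashlib.sha256(bytes(f)).hexdigest())

    print(len(digests))              # 4 possible codes, i.e. 2 bits of entropy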

Winter July 8, 2021 10:37 AM

@Clive
“As I’ve said, the hash function is a mapping function that you THEN use in some kind of “mode”, that is, a “chain function”.”

I do not understand why the chaining is relevant. What is important is that the probability of collisions is as expected and the outcome is not predictable.

The chaining can be a problem because Hash(Hash(X) XOR Y) is identical to Hash(X+Y) (concatenation) if the length of X is a multiple of the block size. However, that is easy to correct for. It is utterly irrelevant to whether Hash(X+Y) is a good hash mapping, as that is still unpredictable knowing X, Y, and Hash(X).

MarkH July 8, 2021 10:55 AM

@Clive:

I see we’re still on different planets …

If I want 256 unpredictable bits, I accumulate data in my “pool file” until I estimate that the total entropy is at least 256 bits.

I then execute hash function SHA256 exactly once.

One time.

Not zero times, not two times, not 1.5 times.

EXACTLY ONCE

Why would I want to iterate it?

Fake July 8, 2021 11:18 AM

Yeah I would avoid iterating as that might cause EM leaks.

Slow processes and certain operands might leak more EM too.

You sure you want to mix your entropy?

Separately from the process using it?

Clive Robinson July 8, 2021 11:19 AM

@ MarkH,

The argument concerning how hash output bits change is an almost literal example of missing the forest for the trees.

Sadly the boot is on the other foot.

The bits of a hash have no individual meaning.

Far from true, which is why,

The appropriate way to think of hashes in this context, is as arbitrary codes.

Is not true either.

You remember that picture of Tux the Linux Penguin that gets encrypted in ECB mode? Where you can clearly see Tux’s outlines?

Well it clearly makes a nonsense of,

Remember, these should be thought of as arbitrary codes. No particular bit, or group of bits, should be understood as conveying any meaning.

Well that’s just one very obvious reason of many as to why your thinking is wrong.

@ Winter,

I do not understand why the chaining is relevant. What is important is that the probability of collisions is as expected and the outcome is not predictable.

The chaining is relevant because,

1, The problem I mention with the Tux image arises if you use the hash function in that way (which many do with entropy pools, and to hide deficiencies in their TRNG sources).

2, If you do not chain when using a hash in its more normal mode, then changes in a given block will not propagate down to the final output; thus the purpose of the hash, which is to detect illicit changes anywhere in the file, will fail.

Do you get it now?

Winter July 8, 2021 12:05 PM

@Clive
“You remember that picture of Tux the Linux Penguin that gets encrypted in ECB mode?”

I do not see why ECB mode is relevant here. Did anyone here argue to use it to get a hash value?

MarkH July 8, 2021 12:06 PM

@Freezing_in_Brazil:

Unfortunately, the present discussion is likely to sow confusion.

The short version is, cryptosystems have used hash functions to concentrate entropy from partially-predictable sources for decades.

It works.

The one hazard is, that if something goes wrong with the source data and its entropy becomes much lower, the hash will obscure that. So it’s important to monitor the pre-hash data and ensure that the entropy sources are working as expected.

MarkH July 8, 2021 12:17 PM

@Clive:

With great respect, I have been wondering whether your recent comments conflated hash functions with block ciphers. That would make sense of this stuff about blocks and chaining.

The example of the famous Tux image seems to confirm my speculation. Had the image file been put through a hash function, it would have yielded a much shorter bit string, probably not resembling a penguin!

========================

Imagine that instead of a bit string, a more exotic form of “hash” deterministically outputs colors, tones of fixed pitch, or names of your family members as a function of the input file.

If the hash of a given variable input is sure to yield one of only four colors, or musical notes, or relatives … then that hash result has exactly two bits of entropy.

The format of the hash result doesn’t enter into the question of entropy. The number of equiprobable results (as determined by the input file) corresponds to the entropy. The essential requirement for the hash function is that it minimize the probability of output collisions.

Weather July 8, 2021 1:38 PM

@all, regarding hashes
If you have an input of 6 bytes and you then loop the output back to the input, it registers 32 bytes but the selection is still 33 bytes.
2^128 from SHA256 because, like Clive was trying to say, 0010 hashed with 25K bytes equals 0110; the 10 is still there, or it could be 0100.

Clive Robinson July 8, 2021 2:28 PM

@ Winter, MarkH,

I do not see why ECB mode is relevant here. Did anyone here argue to use it to get a hash value?

The problem is,

1, You have a “secret” entropy pool
2, Some users need thousands of bytes of entropy very quickly.

This is considered “normal behaviour” in most *nix machines, hence /dev/urandom and /dev/random.

If you “drain the pool” and it effectively becomes static, then one of two things has to happen,

1, The system crawls to a virtual stop whilst more entropy is built up.
2, You keep on running the output with no real entropy.

If it’s case 2, then during the likes of boot etc. that is kind of what you want to do.

If the hash mapping function is used ECB-style you are going to get the same value over and over. If however you use CBC-style chaining, it’s not obvious, because of the chaining.

At least your fallback is a CSPRNG rather than a crappy substitution cipher with the same very, very few substitutions.

You and @MarkH are both making some real rookie mistakes, which makes it clear you’ve “read a bit” but you’ve not actually designed, built and tested such systems.

But don’t worry, you are not alone; those with longer memories can remember Linus and his comments about random sources, and the apology he ended up making…

Anyway, enough; I keep batting your arguments away with solid answers and, to be honest, you are not paying me consulting rates…

MarkH July 8, 2021 4:21 PM

@Freezing_in_Brazil, Winter:

If you want to “get into the weeds,” the U.S. NIST has published (in 3 parts) a special publication 800-90 with recommendations for deterministic random number generators.

[Note: Freezing_in_Brazil was asking about non-deterministic generation, but the NIST publication has some relevant information.]

It endorses various FIPS hash functions (including SHA256) for the purpose of concentrating entropy — in their terminology, they call it a conditioning function.

They actually recommend twice as many bits of entropy as desired for the input. This is extremely conservative: the target on which this is based is 0.99999999999999999995 bits of entropy per bit of output!

If you’re generating keys for a block cipher, 0.99 bits of entropy per bit of output is plenty. If the mean cost of exhaustive search is reduced from 2^127 to 2^126 or even 2^125, that’s of no use to an attacker.
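
A sketch of that accounting in Python (an illustration only, not the NIST reference design; the per-sample entropy figure is an assumed placeholder that a real design would have to calibrate for its source):

    import hashlib

    EST_BITS_PER_SAMPLE = 0.5        # assumed; depends entirely on the source
    TARGET_BITS = 2 * 256            # collect twice the output width, per the guidance

    def condition(samples):
        # samples: an iterable yielding one raw reading (as bytes) at a time.
        pool, gathered = bytearray(), 0.0
        for s in samples:
            pool += s
            gathered += EST_BITS_PER_SAMPLE
            if gathered >= TARGET_BITS:
                break
        return hashlib.sha256(bytes(pool)).digest()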

Weather July 8, 2021 5:39 PM

@markh
You might not be talking to me, but with SHA256 an input of 1-10 chars produces a minimum group of 33 possible chars out of 255; sure, 20-32 produces 140 possible chars, but add to that the fact that 95% of the time, if it’s 10-20 chars long, you can rule out a 1-9 char input.
Telepathy had a SHA256 hash that was between 1-10; I’m running the program against it, 7 chars is doable on my HW.
And when you posted about a predictable source you didn’t mention the range; it could be two. The length has nothing to do with it; you can’t add entropy, only lower it from the step before.

Clive Robinson July 8, 2021 5:44 PM

@ Freezing_in_Brazil

The thing you need to be aware of is that there are “no tests for entropy”; the only tests there are really only find bad random generators, and even then they generally only find really, really bad generators…

But the NIST docs are not where I’d start (in fact I’d give them a miss unless you have a specific need to be familiar with certain parts).

The one I’d start with is from the German BSI; it’s a relatively short document that is easier to get into than the others, which are three to four times its length,

https://cosec.bit.uni-bonn.de/fileadmin/user_upload/teaching/15ss/15ss-taoc/01_AIS31_Functionality_classes_for_random_number_generators.pdf

Also you might want a look at ISO 18031; it’s currently in draft of its third revision,

https://www.iso.org/standard/81645.html

Then there are NIST’s three monumental, and some say sanctimonious, tomes, of which the third, NIST SP 800-90C, is probably the one of most interest, but it’s been stuck in draft for nearly a decade for some reason,

https://csrc.nist.gov/publications/detail/sp/800-90c/draft

From a practical perspective, our host has written a couple of papers and books that are quite easy to get into on the subject.

Also Ross J. Anderson’s Security Engineering book is a good read.

And, although it’s now more than two decades since its last edition, Menezes et al’s Handbook of Applied Cryptography can be downloaded from,

http://cacr.uwaterloo.ca/hac/

However, perhaps it’s the actual “physical sources” that are not quantum mechanical in nature you want to learn about. Of what has been written, much is from a semi-theoretical viewpoint that does not stand up in the real practical world for a multitude of reasons, or is now so far out of date it’s like reading up on beam engines to find out how jet engines work.

As a more apt example, if you read up on micro-fluctuations inside mechanical hard drives, it’s not going to be a lot of help with solid state drives… Sources of human entropy via the HCI are, since USB, not as much help as you would think (unless you happen to be a gamer who has USB delay etc. horror stories to tell).

PC internal hardware is becoming of less and less use for entropy gathering, and let’s just say many a chip-based TRNG is not worth the paper it was scratched out on.

SpaceLifeForm July 8, 2021 5:55 PM

@ Mark, Freezing_in_Brazil, Winter, Fake, Clive

As I have done before, I was not clear.

But it led to good discussion, which probably needs to happen every year when it comes to entropy and random.

When I said you cannot create entropy out of thin air, what I meant is that you cannot increase entropy via a hash function.

If that was possible, everyone would do that. Right?

But, you can certainly lose entropy.

That said, you can actually generate entropy from thin air. Ok, maybe not that thin.

What Fake described is similar to my design.

Light and turbulence.

A transparent enclosed device. Cameras watching, collecting hopefully random bits.

Lights projecting into the inside.

Inside, is a mister, a pump, and one or more circulating fans. The fans disrupt the flow of the droplets.

The photons reflecting or refracting via the droplets vary.

I think this can generate a lot of random very quickly. Way faster than a set of lava lamps.

If that can not generate random, then I can only conclude that the universe is predetermined.

Fake July 8, 2021 6:24 PM

@slf,

Mister, your mister is not too bad an idea.

You have an enclosed space; depending on how you sample the image, I’m not sure an enclosed space is honestly required. Two cameras cannot occupy the same point in space and time; an oscillating fan with tinsel facing a camera, with a disco ball behind it or above it, is the way I was thinking.

Audio in my mind would be very vulnerable; likely a good omnidirectional sensor, but possibly not a good entropy source due to interference possibilities.

Cameras can be gamed too, but it should be pretty straightforward, especially with ML-style ideas, to determine if one is being gamed somehow; we would just have to take care to sample the format correctly.

Also, some years ago there was a thread here about the uniqueness of cameras being like fingerprints, potentially giving camera-based entropy usage an additional layer of authentication. I’ve been considering using high-resolution pictures of stone aggregate for authentication purposes.

You guys would get bored quickly though verifying each new angle was the same 30 grey rocks.

I wouldn’t want your mister to clog, I wouldn’t want my fan to die… A low volume water sprinkler source against a hydrophobic surface is really not a bad idea… How to increase prism effects though… Fresnel could be dangerous but would encourage evap.

I’m a noob though guys/gals don’t take anything I say as evaluated.

SpaceLifeForm July 8, 2021 7:04 PM

@ Fake

Purely a thought experiment. I have not built this Rube Goldberg device.

Yes, the clogging of pump is an issue. Mechanical wear of moving parts also.

Your idea is better in that regard, as long as there are enough dancers (the fans) to disrupt the air flow.

And the ceiling is not too high.

MarkH July 8, 2021 7:08 PM

@SpaceLifeForm:

I’m dazzled by the warnings that hash functions can’t add or create entropy …

… when I didn’t see any such claim on this thread!

Reading comprehension would seem to be a lost art.

Freezing_in_Brazil made a reasonable and (as it seemed to me) ultra crystal clear proposal: accumulate data with a modest amount of entropy, and then “boil it down.”

As I observed, hash functions have served this purpose for decades. They don’t create entropy: they concentrate it, enabling an asymptotic approach to one bit of entropy per bit of hash.

Fake July 8, 2021 7:12 PM

The deniable version of such a Rube Goldberg device would be to keep colorful fish and train a camera on them near the pump/filter.

Fake July 8, 2021 7:36 PM

I still think concentration is the wrong word for that; one certainly has to concentrate to reverse a hash… which, as far as we know, is seemingly impossible, or at least outrageously difficult.

But I believe a hash function dilutes entropy; it mixes your ingredients. A single bit tacked onto or into an individual file is not something that makes me comfortable as an example… I don’t care if that one process ran 57 times internally before it output the resulting reinterpreted hash.

Say we just hashed file a, the computer will emit a perpendicular logic in the electromagnetic spectrum of your source how many times during a pass of which sbox? Then, you find a single bit that you trust… Add it into a file and perform the same operation 57 more times because that’s how computer logic and logs work until your data has been processed a second time.

I couldn’t care less how different the hash is; the logic was the same, and the entropy you just diluted into the main operation was only a single bit that, when multiplied, rotated, XORed and/or added into your file, resulted in the appearance of a different result. That’s great for static analysis; how does that stack up against a replay?

Are antennas that sensitive? 😁

I tried to stay out of this part of the conversation, but just because it’s fine and dandy for some doesn’t mean it’s fine and dandy for everyone.

We all have different threat models; for instance, it’s my belief that using an audio or video file as a key file is a bad idea.

Great care should be taken to ensure that one samples the data and not the structure.

A VBR file may be stronger than a consistently sampled one; also, compression formats affect the quality and sampling. I wanted to let the professionals talk about how using text files would be bad… ASCII… Unicode… they all have lower entropy than some people might assume.

These are just things I feel are important to include.

MarkH July 8, 2021 9:05 PM

@Fake:

Don’t like “concentration”?

How about:

“enrichment” (nuclear isotopes)

“distillation” (grain alcohol)

“refining” (petroleum fractions)

“boiling down”/”evaporation” (maple syrup)

Probably you can find many more.

They all have the same purpose: to increase the concentration of a desired ingredient in the process output.

SpaceLifeForm July 8, 2021 11:18 PM

@ MarkH, Winter, Clive, Freezing_in_Brazil, Fake

Fixing the hash you meant, my bold.

If you collect 25 K bytes from such a source — almost all of which are predictable — and compute the SHA256 hash of that file, its output will have 256 bits of entropy. In other words, the result will have the maximum possible of one bit of entropy per bit of output. The 25,344 predictable bits don’t detract from that!

I may have misread your point, so apologies.

Like I said, good luck pulling 25K bytes from /dev/random because you will be waiting a long time.

But, consider this scenario.

You pull 25K bytes from /dev/urandom (almost all of which are predictable), because it is not really random. And then you hash that.

The question is: Are we sure that CSPRNG and the various hash functions actually do not degrade horribly when it comes to random? That is, even if you believe that there really is 256 bits of entropy in the 25K bytes, are you really sure that the hash function has not wiped out that entropy?

How do you know that the CSPRNG algorithm and the hash algorithm are not conspiring against you?

If one wants true randomness, you had better roll your dice. Do not trust hardware or software.

Winter July 9, 2021 12:36 AM

@Clive
“1, You have a “secret” entropy pool
2, Some users need thousands of bytes of entropy very quickly.”

I do not think anyone here will contest this. 1) disqualifies any manual method.

If you do not have access to a quantum source, e.g., radioactive materials, you need to get another physical source with known entropy characteristics. /dev/random uses human behavior and other unpredictable interrupts in the computer.

Other physical sources of known entropy are:
– Shot noise
– Cosmic background radiation
– Video of leaves in a forest on a windy day
– Audio of busy traffic
– Audio of fan noise

All of these contain a lot of unpredictable variation embedded in predictable patterns. But that is also true of /dev/random. Using a hash function to concentrate this entropy is not different from adding it to a PRNG as /dev/random does.

Just like with /dev/random, it is critically important to keep track of how much entropy is actually available in your pool. But, in principle, any source of entropy can be fed into the pool, even the sound of a detuned radio set on an unused frequency.

And running a hash function on it is really not different from what /dev/random does.
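
A crude first-pass way of keeping track, sketched in Python (illustrative only; real designs use the health tests in AIS 31 or SP 800-90B, and this estimate over-counts badly if successive bytes are correlated):

    import math
    from collections import Counter

    def min_entropy_per_byte(sample: bytes) -> float:
        # Min-entropy estimate from the most common byte value in the sample.
        p_max = max(Counter(sample).values()) / len(sample)
        return -math.log2(p_max)

    # e.g. total_estimate_bits = min_entropy_per_byte(capture) * len(capture)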

MarkH July 9, 2021 2:28 AM

@SpaceLifeForm:

Freezing_in_Brazil wrote “sensor data”.

Not /dev/random, nor any other OS random or pseudo-random facility.

In terms of what Freezing proposed, and I responded to, reading /dev/random makes no sense. Neither does “chaining hashes”, whatever that meant!

I’m sure it’s human nature: when presented with information, we “see” 80% of our own imaginations, and 20% of what’s in front of us (on a good day!).

Again, it’s the parable of the blind men and the elephant.

If one wishes to use physical sensor data, the classic problem is that it is nearly always some mixture of predictable and unpredictable data.

I offered a tool for handling that problem.

Bong-Smoking Primitive Monkey-Brained Spook July 9, 2021 2:33 AM

Turing, Kurt Gödel, quantum mechanics, prime numbers, and a bunch of other topics discussed here are summarized and correlated in these amazing 36 minutes! And on the topic of entropy: another perspective and a continuation. I still struggle with imagining the cube spin at 9:30 — I’ve got to try it some day.

Clive Robinson July 9, 2021 2:38 AM

@ MarkH,

… when I didn’t see any such claim on this thread!

As you very well know, that is not what I objected to, so that’s a straw man.

Reading comprehension would seem to be a lost art.

So it would appear that in your case that is true.

Are you having another one of your fantasies about how others must be suffering from dementia or male menopause because they don’t agree with your disprovable notions?

What I objected to and you have gone around in circles trying to avoid answering is,

If you collect 25 K bytes from such a source — almost all of which are predictable — and compute the [SHA256] hash of that file, its output will have 256 bits of entropy. In other words, the result will have the maximum possible of one bit of entropy per bit of output. The 25,344 predictable bits don’t detract from that!

Both statements in the highlighted section are wrong, and not in some hand-waving, brush-it-out-of-the-way sense.

Just wrong, definitively wrong, and because of that your entire thinking about hashes and entropy gets derailed and you start talking about “concentrating” and suchlike, which is again wrong, as I’ve made clear.

I suggest you start by understanding what a bijective mapping is and why it can be split into two sub-mappings, one that gives you linear algebraic complexity and the other various forms of nonlinear complexity, and what the implications are of a random mapping with an unbiased flat distribution.

When you understand that, then you can move forward to, say, balanced surjective compression mappings and how you can build them from bijective mappings.

But until you get past that first step you are stuck not in an ivory tower but a tower of Babel of your own making, and trying to hand-wave it away will not help you.

Fake July 9, 2021 2:40 AM

It’s still good to have this open conversation so anyone new here can see the intricacies involved; personally I’m still curious about a small detuned RS-232 crystal and loop.

That level of sampling would avoid file-structure repetitions, should be pretty easy to verify with analysis, and it could potentially be left readily adjustable.

Winter July 9, 2021 3:42 AM

@Clive
“Just wrong, definitively wrong, and because of that your entire thinking about hashes and entropy gets derailed and you start talking about “concentrating” and suchlike, which is again wrong, as I’ve made clear.”

How is the scheme of MarkH different from how /dev/random works? That too gathers entropy from imperfect sources and then injects it into a seeded PRNG to generate “concentrated” randomness.

I cannot see why the scheme of MarkH is wrong but /dev/random is right.

Clive Robinson July 9, 2021 5:19 AM

@ Winter, Denton Scratch,

Other physical sources of known entropy are…

The basic definition for “True” in TRNG is that the entropy can only come from a physical source.

After that, all things start to get a bit complicated, as the implication is a Shannon channel and all that implies in terms of signal bandwidth, attenuation, distortion, added noise, etc., as well as the basic channel transducer characteristics of offset, dynamic range and nonlinearity.

Thus messy, messy, messy, and a lot of real nasties hide out in there, such as cross-modulation and other power-law issues, as well as all the effects of “sampling” and how they revolve around the sampling frequency.

Which brings us to,

All of these contain a lot of unpredictable variation embedded in predictable patterns.

Actually you need to think of it as more than just predictable or unpredictable. All you are really saying is we’ve got a point on a line: to the left is predictable, to the right is unpredictable. Which is a lazy way to think of it… The line is a spectrum, thus it changes from fully predictable to completely unpredictable with a large range between the two. Worse, these spectrums are “multi-dimensional” and have many domains. Thus you have simple “DC” or “AC RMS” measurements with a voltmeter that can sort of be considered fixed. Then you have “AC” measurements where you can see amplitude changes against time with an oscilloscope etc. But you also have the frequency domain, which you see almost like an AC RMS voltmeter against frequency on a spectrum analyser, with the equivalent of an oscilloscope against frequency in a “waterfall display”. Most engineers until fairly recently stopped at that. But those domains keep piling up in Hilbert space… Pick any two or more dimensions and you get a spectrum of some form, or a plane or manifold. One such is seen through the Walsh-Hadamard transform; it used to be called “sequency space” and it’s a fun place to play if you are dealing with analysing the complexity of linear logic.

These days people tend to talk of the multitude of domains involving “wavelets”; these too can be used to analyse not just logic but analogue signals as well, and can have some very desirable characteristics that neither the frequency nor sequency domains have.

The point being there is now a very rich tool set to analyse not just the output of the physical source, but also the digital output of the entire TRNG. Oddly though there is not currently much research in the open community in this area, though from the “recruiting” information put out by the Five Eyes SIGINT agencies it can be surmised they do have a significant interest in these areas.

Which is kind of a concern when you realise that the security of all privacy systems rests on this area.

Which brings us onto,

it is critically important to keep track of how much entropy is actually available in your pool. But, in principle, any source of entropy can be fed into the pool,

On the “noise spectrum” you have the purely “deterministic and predictable” through to the “non-deterministic and non-predictable”, the former being considered “interference”, the latter “entropy”. In between you have an area that is in effect “deterministic but non-predictable”, which is often called “chaotic”.

Thus the “entropy” in your second sentence should really be “noise”.

But it’s also important to realise you cannot measure “entropy” like you would a kilogram of sugar for a cake recipe; thus it’s impossible to say how much entropy you do or do not have in your pool.

As like as not you will have a lot less real entropy than you think, because what you actually have is a little bit of real entropy and one heck of a lot of chaos.

The thing about chaotic processes is that they are effectively deterministic and start out highly predictable, and over some time function become less and less predictable. That is a real problem, as they are often cyclic or have an otherwise highly predictable pattern.

Thus all an attacker needs to do is find some kind of discriminant that is moderately easy to find in one domain but not others, and they can synchronise to the chaotic process.

For instance, if you look at a signal on an oscilloscope you might see a complex waveform. But can you see a very small sinusoidal signal -80 dB down (1/10,000th) or even -140 dB down (1/10,000,000th) that is phase-reverse modulated? The simple answer is no, but with a spectrum analyser or well-designed receiver working in the frequency domain, yes you can. You can also detect that phase-reversing modulation and thus any data it might contain, which could be an accidental or deliberate discriminant, giving you a synchronising signal.

With that signal, we know from the 1990s and Differential Power Analysis (DPA) that all deterministic algorithms can be stripped away and the desired “secret data” revealed.

So a skilled attacker has a far better chance of seeing what you think of as entropy as merely chaotic, with a very high degree of predictability, which massively reduces any potential search space using “backwards and forwards” algorithms.

Most “ring oscillators” are very chaotic and have little if any real entropy. The trick of combining two ring oscillators of different frequencies using a digital “latch” might produce what looks, “close in” on an oscilloscope, like a very unpredictable signal. The reality could not be further from the truth. When you open out the oscilloscope time base on an analogue scope, you see the transitions form bunches that alternate smoothly from dense to sparse and back again on a very precise time scale. The reason is that the output from the latch is in fact a one-bit digitisation of the waveform you would get out of a doubly balanced analogue frequency mixer: in effect a very pure sine wave at the difference frequency between the two ring oscillator frequencies predominates. Highly predictable and easily removable.
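
A toy simulation of that effect in Python (the frequencies are assumed example values, purely illustrative):

    import math

    # Latch the sign of one "ring oscillator" on every edge of a second,
    # slightly faster one, and look at the resulting bit stream.
    f_osc, f_latch = 99.7e6, 101.3e6
    bits = []
    for n in range(200):
        t = n / f_latch                  # one sample per latch edge
        bits.append(1 if math.sin(2 * math.pi * f_osc * t) >= 0 else 0)

    print("".join(map(str, bits)))
    # Long alternating runs of 1s and 0s: a one-bit image of the ~1.6 MHz beat
    # between the two frequencies, not unpredictable noise.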

Thus, knowing about DPA and ring oscillators, you can start to understand why I’ve next to zero trust in the likes of the Intel on-chip TRNGs.

It’s also why it is important to use hash functions in their “chained” or “iterative” modes, not, as many programmers mistakenly do, in their “one shot” mode usually used for generating a random number.

Hope that gives you a little more insight into the murky swamps that surround not just entropy pools but many things associated with,

“The care and feeding of TRNG’s”

Which might make for a good paper title 😉

Winter July 9, 2021 6:03 AM

@Clive
“As like as not you will have a lot less real entropy than you think, because what you actually have is a little bit of real entropy and one heck of a lot of chaos.”

Air turbulence is chaos, but for all human purposes, it is entropy. Entropy is the result of missing state information; see Maxwell’s little demons. For physical systems, chaos or not, this state information cannot be measured nor reconstructed, so it is entropy, pure and simple.

Anyhow, you have just tried to argue that /dev/random does not work, because this is just how /dev/random works. In my opinion, if it is good enough for /dev/random, it is good enough for me.

Clive Robinson July 9, 2021 7:14 AM

@ Winter,

I’ve not argued about how /dev/random and /dev/urandom work, only that they appear on different *nixes and they are not all the same…

Entropy, like information, is what is politely called an “emergent property”, and what I’ve occasionally called “data shadows”.

A point in space conveys no information whatsoever, likewise a second point in space. What emerges, however, is a relationship between the points; both information and entropy are the relationship, not the points or objects that might define the relationship in an actual physical space.

It’s why I say “information is impressed or modulated on energy or matter, but is not the matter or energy”. Thus the question of whether forces or the speed of light apply to information, rather than to the matter/energy it might be impressed upon.

But the relationship becomes more peculiar, because unless the observer is at one of the two points the relationship is dependent on where it is observed from.

Now, as I’ve indicated, whilst a space can be considered to be multiple dimensions of infinite infinities, it can in effect be compressed by a transform to a field that has only two values, GF(2).

Further, each field dimension is a scalar in a vector.

Thus for any binary number there is a fixed number of scalar “fields” in the vector that is the binary number. This means that there is only a limited amount of entropy or information attainable with a binary number. Add another dimension (bit) and the potential information capacity doubles, as do the raw entropy possibilities.

Maxwell’s little demons are only a problem for an observer not colocated at the door; the behaviour of the demons is fully deterministic[1].

With regards,

Air turbulence is chaos, but for all human purposes, it is entropy.

No it’s not, and it has been shown not to be in a very real-world experiment that you can repeat at home if you are a little inventive.

Take an iron or other tube that easily conducts heat. Put it in a rig where it can be spun, and look through the tube. At ambient temperature you see no distortions; however, heat the tube up and you see distortions, due to the chaotic movement of air molecules changing density with thermal movement against gravity.

Allow the tube to cool down to ambient, then repeat, this time spinning the tube. You will find that the chaotic movement of the air now forms a very good lens you can use in a telescope.

Thus it can easily be seen that, in the short term, chaotic behaviour is very amenable to use. It’s only with what is in effect a time function that it becomes less amenable to use.

You say,

For physical systems, chaos or not, this state information cannot be measured nor reconstructed so it is entropy, pure and simple.

What evidence for that statement do you offer?

None.

However how about contra evidence,

There was that “coin tossing” machine that was designed to remove the chaotic physical attributes… The problem: the coin stopped showing random behaviour, by a very, very long way.

Thus we know, and have proof, that chaos in physical systems can be removed or reduced to a point that makes real entropy appear way too small to even arrive at statistically as an emergent property…

Likewise we know, and have proof, that chaos can be detected and synchronised to, thus its effects removed.

Care to state the logical implications of those last two pieces of information?

[1] The argument and proof is a little long to give here but Seth Lloyd gives it in his book if you want to look it up.

Winter July 9, 2021 8:08 AM

@Clive
“Entropy, like information, is what is politely called an “emergent property”, and what I’ve occasionally called “data shadows”.”

That is a different entropy than what is commonly used.

Entropy is a fundamental property of thermodynamic systems and the second law of thermodynamics, the one about entropy, is a fundamental law of nature. When you talk about the entropy of physical systems, this is a measurable fundamental property of that system.

The resolution of Maxwell’s demon paradox showed that information, in terms of bits, is as much a part of physical entropy as are “temperature” and “work” in physics.

Calling entropy “emergent” is not making it less of a fundamental property of nature.

Winter July 9, 2021 8:13 AM

@Clive
“What evidence for that statement do you offer?”

I would like to point you to introductory texts on statistical mechanics and thermodynamics.

Heated air in a tube is “chaotic” in the sense that the individual molecules and small patches of air will move chaotically. It is pure entropy as long as we are unable to record and describe all the state parameters of all the patches of air. The air in that tube will behave according to statistical mechanics and thermodynamics.

In practice, if you start recording the pressure waves (sound) that move around the tube, or count the infra-red photons coming out of the tube, you will have perfect thermal noise, which is as unpredictable as is possible.

Winter July 9, 2021 8:16 AM

@Clive
“Maxwell’s little demons are only a problem for an observer not colocated at the door; the behaviour of the demons is fully deterministic[1].”

Indeed, and it was shown to perfectly obey the laws of thermodynamics as soon as you wipe its memory of past events.

Clive Robinson July 9, 2021 9:57 AM

@ Winter,

Entropy is a fundamental property of thermodynamic systems and the second law of thermodynamics,

Err, that’s the historical placing of entropy.

You have von Neumann entropy, which is more fundamental than that, and you also have Shannon entropy, which is what we are talking about.

Von Neumann indicated that he thought Shannon entropy was a result of his entropy; others, however, think Shannon entropy is entirely independent of the various physical-system entropies.

But all of them measure an emergent property of multiple particles or possibilities using the natural logarithm.

As an emergent property it’s not something singular, or that you can reach out and put your hand around and say “this is what I hold”.

It’s why you cannot really say,

When you talk about the entropy of physical systems, this is a measurable fundamental property of that system.

As it’s a measurement of relationships, any measurement you make, as I’ve already indicated, is “relative”, and technically it’s not fundamental, as it’s a measure of something that is obviously dependent upon other things. But interestingly, what are you measuring? Well, call it a statistical average; that is, there is some mean, which again changes depending on where the point of the observer is. You could call it the RMS of the n-dimensional space points with respect to some arbitrary origin, which makes it a rather quaint measure.

But, as I’ve already noted, a chaotic process starts out deterministic, and as it follows a function that can be tied to time it becomes progressively less deterministic.

You indicate only part of that with,

Heated air in a tube is “chaotic” in the sense that the individual molecules and small patches of air will move chaotically. It is pure entropy as long as we are unable to record and describe all the state parameters of all the patches of air.

What you’ve failed to mention is that the spinning of the tube acts on the chaotic process whilst it is still deterministic, thus you get a deterministic outcome from the action. That is, the chaotic component does not emerge and certainly does not become dominant.

With regards,

Indeed, and it was shown to perfectly obey the laws of thermodynamics as soon as you wipe its memory of past events.

All you are saying is that it exhibits certain random statistics when you sample independently of all other samples. That is an artifact of the measurement process made by the observer, not the process being measured, which remains unchanged. It’s part of what underlies the “hidden information” argument, which is still something that gets people hot under the collar after a century.

In metrology there is a statement about statistical measures, “The square root of bugger all”, which gets aptly used from time to time. The implication being that it does not matter if what you measure averages zero, or is in fact zero all along; the outcome is the same, which is not the same as the inputs being the same. It’s just one reason you need to take care about “cause and effect” and even why “time’s arrow” exists, but that’s a long conversation for another day.

If @Wael is reading along he will probably confirm that it is something that has been discussed here some time in the past.

MarkH July 9, 2021 10:23 AM

@Freezing_in_Brazil, Clive, Winter:

Clive has properly reproached my failure to correct my error above.

While hash functions do in fact concentrate the entropy of an input file, the entropy in the hash cannot exceed that of the input file, and in practice falls short of the source entropy by a gap which increases as the source entropy increases.

It’s easy to see that a bit can convey at most one bit of entropy, so if the input file has much more entropy than the width of the hash, most of that entropy would be lost by a kind of “truncation”.

But even for much lower levels of source entropy, some will be lost in hashing because of collisions. For two unpredictably variable and equally probable states of the input file, that specific variability corresponds to one bit of entropy. If a hash function maps both of those states to the same output value, then that bit of entropy is lost from the hash.

The hash entropy can never equal one bit of entropy per bit of output, but can approach this asymptotically.

So, in the example of SHA256, the input file needs more than 256 bits of entropy to get nearly 256 bits of hash entropy.

As I wrote farther down, the NIST recommendation is to accumulate twice the width of the hash in order to “saturate” the entropy of the output.

Freezing_in_Brazil July 9, 2021 11:10 AM

@ MarkH [long time no see!], Clive Robinson, SpaceLifeForm, Weather et al

I’m out of town until Sunday, under adverse connection conditions, so I’m sorry for the delay in answering.

I am delighted with the discussion that was generated. We ended up messing with algorithms, which is natural. I wondered about some hypothetical physical component that could capture the entropy of sources inside the machine [notably temperature, but I think images and sounds could also participate in this pool] – since using the date to generate PRNs is so notoriously unsafe – and deliver that information in a custom, universal format. With a component like this I imagine that [with a new library and] a new command could be issued like:

cat /dev/entropy | sha256sum | base64 | head -c 32 ; echo

I’ll be commenting more, once I get a grip on it all. Thanks for all the fish.

Kind Regards

Winter July 9, 2021 12:29 PM

@Clive
“All you are saying is that exhibits certain random statistics when you sample independently of all other samples.”

That is statistical mechanics, and together with thermodynamics it governs all use of energy. If you design an engine or battery charger, that is what you use to predict its performance.

As for “emergent” systems, theoretical physicists consider gravity emergent, and they use von Neumann entropy to describe it.

SpaceLifeForm July 9, 2021 4:32 PM

@ MarkH, Winter, Clive, Freezing_in_Brazil, Fake

I posit this analogy, aka thought experiment.

You have a shaker, that you dump a ball out of, one at a time. The shaker is opaque, it is your entropy pool. It is a black box.

There are two players: the dumper, who shakes the shaker and dumps out balls, and the injector, who adds balls to the entropy pool as required.

The balls are red or green. Initially, we load the shaker with 50 red balls and 50 green balls.

The max entropy possible. You can shake the shaker as much as you want, but you will not change the entropy inside the blackbox.

The process is this. The dumper dumps out a ball (red/green, one/zero); you now have a random bit. It has been observed.

You also just changed the entropy inside the shaker, aka blackbox, aka entropy pool.

The entropy just decreased. The pool is now biased one way or another as there is an extra ball of the opposite color inside the shaker.

Here’s where it gets interesting. The injector puts a replacement ball inside the shaker, outside of the view of the dumper. The dumper does not know what color ball the injector added to the pool.

Maybe the injector put in a ball of the same color as the one just observed. But maybe not.

You keep repeating this process. Shake, dump, inject.

After some iterations, can you estimate how good the entropy is inside the shaker?

If the injector has a bias, then you know the entropy will decline.

If the injector is fair, then the entropy will remain good.

The question is: Is the injector fair?

Are your sources of entropy fair?

The only way to detect an unfair injector is to observe the dumping process and measure the results against what is expected. That is, you must observe what comes out of the blackbox; and because you observed the dumping process, you also changed the entropy.

How long (how many bits) do you observe?

What if the injector knows you are watching for some time so appears to be fair, but then when you turn your back, the injector goes unfair?

(Think: bias when computer is idle)
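A rough sketch of the shake/dump/inject loop in Python (the pool size, the injector bias parameter and the per-ball Shannon entropy measure are illustrative choices of mine, not anything specified in the thought experiment above):

import math
import random

def pool_entropy(reds, greens):
    # Shannon entropy, in bits, of drawing one ball from the pool
    total = reds + greens
    h = 0.0
    for count in (reds, greens):
        if count:
            p = count / total
            h -= p * math.log2(p)
    return h

def run(iterations, injector_red_bias=0.5, seed=0):
    # injector_red_bias = 0.5 models a fair injector; anything else is biased
    rng = random.Random(seed)
    pool = ["R"] * 50 + ["G"] * 50           # start at maximum entropy
    for _ in range(iterations):
        rng.shuffle(pool)                     # shake
        pool.pop()                            # dump (observe) one ball
        colour = "R" if rng.random() < injector_red_bias else "G"
        pool.append(colour)                   # inject a replacement
    return pool_entropy(pool.count("R"), pool.count("G"))

print("fair injector  :", round(run(10_000, 0.5), 4), "bits per ball")
print("biased injector:", round(run(10_000, 0.8), 4), "bits per ball")

With a fair injector the pool stays close to 1 bit per ball; with a biased one the pool drifts and the per-ball entropy falls, which is exactly the “is the injector fair?” question above.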

Weather July 9, 2021 5:37 PM

@slf
Does the dumper have to remove a ball before the injector can put one in?
If not, a slight bias wouldn’t be a problem if the injector put two balls in, based on a longer time.

MarkH July 9, 2021 5:44 PM

@SpaceLifeForm:

The analogy you offer is a pretty fair depiction of Clive’s worries about Intel’s on-chip hardware random generator, except that the Intel case is even worse!

Software can’t obtain the raw random stream, only a “whitened” version which might conceal a multitude of weaknesses.

Our kind friend Freezing — who after all raised the topic in this thread — has proposed to “roll his own” entropy source, which offers two advantages:

• control of the source

• the ability to directly monitor it

NIST 800-90B specifies continuous health testing — with the issuing of error messages as appropriate — for entropy sources.

For critical security applications, failure to incorporate such continuous testing is engineering malpractice.
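For illustration only, here is a minimal sketch of one of the SP 800-90B continuous health tests, the Repetition Count Test. The cutoff formula is the one given in the standard for that test; the class, its names, and the stuck-source demo are my own simplification rather than any reference implementation:

import math

class RepetitionCountTest:
    # h_min: claimed min-entropy per sample (bits); alpha: acceptable false-alarm probability
    def __init__(self, h_min, alpha=2.0 ** -20):
        # SP 800-90B cutoff: C = 1 + ceil(-log2(alpha) / H)
        self.cutoff = 1 + math.ceil(-math.log2(alpha) / h_min)
        self.last = None
        self.run_length = 0

    def feed(self, sample):
        if sample == self.last:
            self.run_length += 1
        else:
            self.last, self.run_length = sample, 1
        if self.run_length >= self.cutoff:
            raise RuntimeError("RCT failure: entropy source may be stuck")

rct = RepetitionCountTest(h_min=0.5)      # e.g. claim 0.5 bits of min-entropy per byte
try:
    for s in b"\x37" * 100:               # a stuck source trips the test
        rct.feed(s)
except RuntimeError as err:
    print(err)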

Fake July 9, 2021 6:22 PM

@MarkH,

Thank you, I hadn’t followed the parallels and am trying to stay on the Kurt side of our mayor.

That also makes AES even more curious.

@Weather,

I’m pretty sure that bias from the pool will be a problem over any short term measurement. I’m more concerned with giving the supplier my orders of anything less than 2500 red and 2500 green.

If we do micro transactions we’re giving up our results.

ADFGVX July 9, 2021 6:24 PM

@ MarkH

The analogy you offer is a pretty fair depiction of Clive’s worries about Intel’s on-chip hardware random generator, except that the Intel case is even worse!

Software can’t obtain the raw random stream, only a “whitened” version which might conceal a multitude of weaknesses.

That’s a general issue with CISC architectures implemented in microcode by an internal RISC processor with a Harvard architecture on which the raw microcode registers are marked restricted “for microcode use only” and not made available to consumer-level CISC instructions loaded from general purpose cache or main memory.

The “lean mean efficient” internal architecture of any modern microprocessor is always RISC of course, but the instructions from the manufacturer’s booklet are implemented in firmware microcode and “served” as a CISC architecture for consumer level compiler and assembler use, whereas access to execution of raw RISC code is denied or subverted for intellectual property protection and various other purposes.

SpaceLifeForm July 9, 2021 6:45 PM

@ Weather

Does the dumper have to remove a ball before the injector can put one in?

Yes, in the scenario I outlined.

If not a slight bias wouldn’t be a problem if the injector put two balls in, based on a longer time.

Thanks a lot for making my already complex scenario even more difficult!

Take one ball out. Injector puts two balls in. Even if you whiten it, can you still trust the injector to be fair?

Weather July 9, 2021 7:18 PM

@slf all
There’s a strategy used in high frequency trading where, if you pick up that it’s correct, you flip and go down; if it’s wrong, you go up again, fixed to 1 sec accuracy ±0.0001.
If the dumper picks red…

Wael July 9, 2021 8:55 PM

@Clive Robinson, …

If @[…] is reading along he will probably confirm that it is something that has been discussed here some time in the past.

Affirmative! The stonner shared a few interesting links on the topic.

Clive Robinson July 10, 2021 4:45 AM

@ SpaceLifeForm, ALL,

Thanks a lot for making my already complex scenario even more difficult!

It’s actually a bit more complicated than that…

Firstly, the dumper takes out all the balls and observes them all before putting them back unchanged, whenever they wish to do so.

Secondly, the injector, whenever they wish, randomly[1] takes out ~50% of the balls and will,

1, Either put them back unchanged.
2, Change the colour of each ball whilst putting them back.

(That’s what the Avalanche Criteria does for you)

The choice of 1 or 2 is made by some nondeterministic process.

Importantly, though, the behaviours of the dumper and the injector remain independent of each other.

What you see is that each nondeterministic bit the injector acts on affects ~50% of the balls; thus each bit of entropy diffuses across the set of balls each time the injector acts.

There is a probability that a given bit of entropy never diffuses across all the balls, because the injector randomly draws that ~50% to change or not each time.

But importantly there is a big difference between “affecting” a ball and “changing” a ball. That is, a ball selected may not be changed (~50%), or when selected a subsequent time may be “changed”, which in effect means “changed back”. So even though a ball might be “affected” by the injector multiple times or not at all, the ball may not have changed state when observed by the dumper.

In theory the effect each entropy bit has will be to affect ~50% of the balls, randomly selected[1], but you would also expect about half of those balls to have been randomly selected previously, so ~25% of the balls were affected by the previous entropy bit, and so on.

Arguably once in 1/(2^N) occurrences one entropy bit will exactly cancel the effects of some previous entropy bit. Until then the effects keep “rattling around”, but as the dumper you will never see more than ~50% of the balls “changed” by any individual bit of entropy the injector adds.

[1] It’s actually not “random” but a fully deterministic process using an unaltering, effectively “randomly generated” map as a “one way function”.

Winter July 10, 2021 6:58 AM

@Clive
“[1] It’s actually not “random” but a fully deterministic process using an unaltering, effectively “randomly generated” map as a “one way function”.”

I think this explains the controversy and the misunderstanding in this discussion.

Entropy has ZERO connection with “randomness” and “probability”. Entropy exists when everything is deterministic and, in principle, knowable. It is theoretically possible to know the state of every particle in a thermodynamical system and then to be able to predict all future evolution. Entropy exists for the observer that does not have all the information. That is why it is also called “missing” or “negative” information.

That is also the reason the second law has fundamental status. Over time, my ignorance of a physical system, and thus its entropy, can only grow, never decrease.

The makers of /dev/random were correct in calling the input an entropy pool. They know very well that the input of this pool is not “random”, but fully deterministic. But they also know that it is, generally, unobserved and unknown: Hence they call it Entropy.

Every Entropy Pool can, in principle, be observed in detail and have the entropy reduced to Zero for that observer. But this holds for ALL classical systems, dice and shuffled cards included.

Clive Robinson July 10, 2021 1:13 PM

@ Winter,

As far as Shannon entropy is concerned, entropy is a measure of “possibility”.

My son understood this years ago when he was around seven. He not only knew but could say what entropy was.

That is, it was the difference between a pile of loose Lego bricks and the same bricks in a finished model.

Whilst the pile did not give “endless wonders/possibilities”, you could make hundreds if not thousands of models. Thus the “potential information” vs the “actual information”.

Randomness, like entropy, is an “emergent property”. As I’ve said, chaos is a deterministic process with divergence; the degree of divergence increases with a function that can be related to time or, if you prefer, cycles through the system.

There are three basic possibilities for such a function,

1, The system has “slop” or hysteresis.

2, The system has “wear”.

3, The observation system is deficient.

Slop/hysteresis is the interesting one, as it necessarily involves the environment the system is being used in. It frequently appears to be “random” in the short term, but can be identified and synchronised to if the environment can be controlled.

Wear is a consequence of use and the environment; it tends to be slow and predictable, which is why “planned maintenance” works, as does “tool offsetting”.

There are so many ways a measurement system can be deficient, where would you start the list…

With regards,

They know very well that the input of this pool is not “random”, but fully deterministic.

Err no, they knew the “input process” (effectively measurement) was deterministic, but they also knew that the “signal” being “measured” was a product of “the environment”. Thus there were at least two degrees of “uncertainty” with each addition to the “entropy pool” from each “input process”.

How you describe those two or more “uncertainties” is a question not just of semantics but semantics within a context…

With regards,

Over time, my ignorance of a physical system, and thus its entropy, can only grow, never decrease.

Not true. Think of it as “drift” in a boat at sea: you start off with a measurement that gives you an accurate position. Over time uncertainty increases, that is true, but when you make your next measurement that uncertainty collapses down to another known position. You can use the delta over time to predict drift more accurately, or you can increase the frequency of position measurements, or both. You can analyse the various points and work out the state of the tide etc and take its cyclic nature into account in a synchronized way. Sailors call the two processes “dead reckoning” and “taking a fix” and have been doing it for centuries.

Which brings us to the inverse problem,

Every Entropy Pool can, in principle, be observed in detail and have the entropy reduced to Zero for that observer.

Err not to zero… As I’ve pointed out, the measurement process is at best deficient and, as importantly, all measurements are “relative”. Thus what you see and when is different to what I see and when, and the same goes for any observer. Thus you can not precisely see what the machine sees.

If you look back over this blog over the years you will find me saying there are always at least N+1 truths: the Points of View of the N known observers, as well as the actual event they are observing (which nobody actually observes when you think about it).

I know this sounds kind of mad / ultra hypothetical but it really is not; it’s probably happening within a couple of feet of you right now with your mobile phone. To both communications and space engineers this stuff is their daily bread and butter.

MarkH July 10, 2021 1:56 PM

@Winter:

I think your argument goes a step too far.

The information theory version of entropy applies to situations in which nothing is random (though random processes are part of the mathematical model). So entropy and randomness are not necessarily linked.

However, Shannon’s definition of entropy (along with alternative definitions) is a function of probabilities.

ADFGVX July 10, 2021 2:25 PM

@ MarkH

The information theory version of entropy applies to situations in which nothing is random

Information theoretic entropy is calculated from any discrete probability distribution

H = –∑ p log p

where we assume that the probabilities p are mutually exclusive and exhaustive:

∑ p = 1

Information theoretic entropy may also be calculated on a continuous distribution, but then its definition is sensitive to a specific choice of unit or measure.

H = –∫_{−∞}^{∞} f(x) log f(x) dx

Where ∫_{−∞}^{∞} x dx = 1.
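For the discrete case, a few lines of Python make the formula concrete; log base 2 is used here so the answer comes out in bits, and the example distributions are arbitrary:

import math

def shannon_entropy(probs):
    # H = -sum(p * log2(p)); the probabilities must sum to 1
    assert abs(sum(probs) - 1.0) < 1e-9
    return -sum(p * math.log2(p) for p in probs if p > 0)

print(shannon_entropy([0.5, 0.5]))        # 1.0 bit    (fair coin)
print(shannon_entropy([1 / 6] * 6))       # ~2.585 bits (fair die)
print(shannon_entropy([0.9, 0.1]))        # ~0.469 bits (heavily biased coin)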

SpaceLifeForm July 10, 2021 4:23 PM

@ Clive

Secondly, the injector, whenever they wish, randomly[1] takes out ~50% of the balls and will,

1, Either put them back unchanged.
2, Change the colour of each ball whilst putting them back.

I think you meant to say:

2, RANDOMLY Change the colour of each ball whilst putting them back.

Let’s say, in my scenario, there are 100 balls in the entropy pool, at nearly a 50-50 distribution.

If the injector pulls 50 out, but puts them back in the shaker unchanged, then the entropy did not change.

If the injector pulls 50 out, but flips ALL 50 of the bits by changing the color, then the entropy should not significantly change either. It should remain near 50-50.

This assumes that the starting point the injector is dealing with in the entropy pool is already near 50-50.

It will definitely change the entropy, but it should not degrade the entropy by 25%.

If the starting point (the entropy) is already biased, then yes, it is not good.

The blackbox must maintain near 50-50 at all times.

If there is an unfair injector, then my point here is moot.

Your entropy (and therefore your Random) is only as good as your strongest entropy source.

If the injector, even if bit-flipping at random, has a bias, then entropy will be lost.

ADFGVX July 10, 2021 6:12 PM

CORRECTION: Where ∫_{−∞}^{∞} f(x) dx = 1.

NOTE: The HTML tag <sup>…</sup> is not working for text superscripts.

SpaceLifeForm July 10, 2021 6:33 PM

@ ADFGVX

NOTE: The HTML tag <sup>…</sup> is not working for text superscripts.

No surprise there.

How the Markup and Markdown interact is a challenge. It is a Second Order Differential Equation.

You made me work on the problem, just to post this. I was not able to C+P what you posted. I had to do the amper semi stuff.

Clive Robinson July 10, 2021 8:08 PM

@ SpaceLifeForm,

I think you meant to say:

I was saying that the Avalanche rules per bit of change at the input were written in an expanded form to show the implication, as,

“For each bit changed at the input ~50% of the balls will change state.”

Is a bit terse, so I wrote it as two rules with ~50% of balls in each.

However… I changed that bit later when re-writing to add the ‘random[1]’ above it (to explain it was not random at the time of selection but the selection was by a now fixed random mapping).

Moral: copy read twice 🙁

But the point about ~25% of the balls changing back to what they were prior to the first change is a consequence of ~50% of ~50% for the second change under a second bit (the feedback of the hash output mixing with the next input to “chain” the effect of each bit, so it is there in the final hash and “any change is detected”). Equally obviously, though I should have said it, ~50% of the ~50% of balls that were not changed the first time get changed the second time. So in total the ~50% of “all” balls changed remains an average constant.

So the number of balls “affected” by any bit of entropy goes up each time around by ~50% on average; that is true enough and expected. So ~50% of ~50% more, so it approaches but may never reach 100% of the balls. However, on each turn around the hash a previously “changed” ball may be changed again, that is actually negated; the time after that it may or may not be changed back, depending on the random mapping function.

If you think about it, each ball is in effect a parity bit of the “changes” made to it by the “mapping function”: not the “actual state” of the entropy bits, just the “changes in state”, and each time it could be any one of the new input bits that has changed. So in theory you can not trace it back, thus giving you the One Way Function (OWF) requirement.

Hopefully this time I’ve done a Santa and “checked it twice” as I don’t want to “keep making a hash of it” 😉

Winter July 10, 2021 8:48 PM

@MarkH
“However, Shannon’s definition of entropy (along with alternative definitions) is a function of probabilities.”

But no one ever complains when the message is not random. I suspect Shannon entropy is mostly used when messages are not random.

One reason is that “probability” is used in several ways. One of them is about “missing information”, ie, the process is deterministic and in principle knowable, but the observer is ignorant and can only use probabilities of events. Entropy works perfectly in such cases.

MarkH July 11, 2021 9:07 PM

.
Whose Entropy Is It Anyway?

@Winter:

But no one ever complains when the message is not random. I suspect Shannon entropy is mostly used when messages are not random.

Didn’t understand the part about complaining …

Your recent comments point the way to very important distinctions in how the concept of entropy is used. It seems to me that a lot of confusion results from not understanding that its meaning depends on its application.

When Shannon first applied “entropy” to information, he wrote that mathematical expressions like his formula for entropy can serve “as measures of information, choice and uncertainty.”

Those are radically different categories.

If you’re working on getting information from point A to point B, Shannon entropy as a measure of information content is often a useful tool.

If you’re creating secrets, Shannon entropy as measure of uncertainty is typically the relevant meaning.

The interrelation between choice and information is easy to see — the greater the domain of choices in forming a structure, the more information that structure is able to convey.

At the time of that paper, Shannon’s previously classified work on information security was not yet published, and his seminal paper on information theory says nothing about secrecy.

In application to information security — and especially the generation of secret numbers such as keys — “entropy” most often means uncertainty: it measures a state of knowledge (specifically, how much is not known).

When a structure is created according to set rules and statistical distributions, its entropy in the sense of choice and information is objective.

But entropy in the sense of uncertainty depends on which party’s knowledge is considered. It is, as Winter has been pointing out, fundamentally observer-dependent.

ADFGVX July 11, 2021 9:41 PM

@ MarkH, Winter

Boltzmann’s constant times the natural logarithm of two embodies the entropy of one unbiased bit of information as a physical constant.

kB log 2 =~
9.5699262898 × 10^-24
kg • m^2 • s^-2 • K^-1

At a given absolute background temperature in Kelvin, Landauer’s principle dictates that this is the absolute minimum amount of waste heat that must be dissipated by every computational operation that irreversibly erases one bit of information, or causes two possible paths of computation to merge imperceptibly into one.

The entropy of one unbiased bit is the irreversible entropy of the doubling of the state space of the universe — into those states where the bit in question is set to one as well as all possible states where it is cleared to zero.
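A quick numerical check of that constant, and of the corresponding Landauer limit at an assumed room temperature of 298 K (the temperature is purely an illustrative choice):

import math

K_B = 1.380649e-23                      # Boltzmann constant, J/K

entropy_per_bit = K_B * math.log(2)     # ~9.57e-24 J/K, the figure quoted above
print(entropy_per_bit)

T = 298.0                               # kelvin, illustrative room temperature
print(entropy_per_bit * T)              # minimum heat per erased bit, ~2.9e-21 J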

Clive Robinson July 11, 2021 10:38 PM

@ SpaceLifeForm,

How the Markup and Markdown interact is a challenge

Even when not interacting, what comes out of “Preview” and what appears on the page from “Submit” are often wildly at variance. It’s why I don’t use it, as there is little point.

Then there is “blockquote” still apparently randomly blowing up on people as can be seen in thread listings…

So minimum input to try and ensure clarity in the face of adversity / perversity.

MarkH July 13, 2021 4:46 PM

.
Hash Entropy as a Function of Input Entropy

This is a deep subject, touched on by a variety of academic papers. So far, I haven’t seen definitive results which would make sense to readers who haven’t had a fairly deep education in statistics (which I assuredly have not).

I claimed above that if a bit sequence has n bits of entropy — even if sequence length is 100•n bits — then the n-bit hash of that sequence has n bits of entropy. That’s incorrect, though a fair approximation.

Clive responded with a chain of reasoning, leading to a conclusion that the entropy of the hash has approximately n/2 bits of entropy. As I read that claim, it’s incorrect by a wide margin. [I offer no critique of Clive’s derivation, which I failed to understand.]

My intent here is to estimate the hash entropy using basic math many of us learned before college age.

[1] Suppose that in a bit sequence x of length l, m of those bits are equally likely to be 1 or 0, all other bits being fixed. The entropy¹ of sequence x is clearly m bits.

[2] If some function F(x) is one-to-one (injective, such that every argument value has a unique function value), then the distribution of F(x) would exactly mirror the distribution of variability in x, so F(x) would also have m bits of entropy.

[3] Hash functions are non-injective: many inputs can result in the same function value. Of course, this is a necessary property for the use of hash functions to create message digests for cryptography. Therefore, in the general case the distribution of a hash function H(x) has less variability, and accordingly less entropy than x. Note well that the reduction of entropy in the hash is related to the number of collisions over the distribution of inputs.

[4] Suppose H_n(x) to be an ideal hash function with range Z_(2^n) (outputs n bits long).

For an ideal hash function evaluated over k distinct inputs, the expected (average) number of collisions is C(k, 2) / 2^n where C(a, b) counts the number of possible combinations when choosing b elements from a set of a elements.

Substituting the formula for combinations, the number of expected collisions is k (k-1) / 2^(n+1)

[5] When H_n(x) is applied to an input with entropy equal to the hash width (that is, m=n), then the number of distinct inputs (variants of x) is 2^m = 2^n, so k = 2^n. For large n, k (k – 1) is approximately k^2, so we have 2^(2n) / 2^(n+1) = 2^(n-1). In other words, very nearly half of the input sequences will result in collisions with other variants.

One consequence of this is that at least 1/4 of the possible hash output values don’t appear in the distribution of hashes. If all of the hash collisions were 2-way collisions (two inputs map to the same output), then the collisions would make up 1/4 of the set of resulting hash values; there will also be 3-way, 4-way, etc. collisions, reducing the set of outputs.

Without trying to estimate the distribution of collisions, I assume that multi-way collisions reduce the number of collision hash values to 1/6 of the hash range. [I suppose this to be conservative, in the sense of resulting in an estimate of entropy probably lower than the actual case.] By this assumption, the distribution of hash outputs makes up 2/3 of the function’s range; the other 1/3 of potential hash outputs never appear.

[6] In many cryptographic applications, such as the choosing of keys, the most relevant measure of entropy is called “guessing entropy.” It’s based on measuring the average number of guesses required to find the secret, assuming that the guesser starts with the most probable value and proceeds in decreasing order of probability.

When a hash function output as described above is used as a key, the optimal guessing strategy would be to start with high-collision hashes (for example, 12-way or whatever the maximum might be) and work downward, ending with the non-collision hash values. Obviously, this would be less work than a simple exhaustive search trying every n-bit integer which would form a valid key.

[It’s an interesting question whether an attacker could compute which hash outputs are more likely to result from collisions in less time than the extra cost of a simple exhaustive search.]

Using the example of n=256 for an n-bit symmetric cipher key chosen completely at random (with 256 bits of Shannon entropy), exhaustive search would require an average of 2^256/2 guesses.

With an n=m hash-value key, there’s a 50% probability that the key is one of the collision cases. Using optimal guessing strategy (which exhausts collision hashes first), the median number of guesses is 2^256/6 (based on my 1/6 assumption). Probably the mean is not very far from the median: there’s better chances near the start of the search, but if it turns out not to be a collision hash, there’s extra work plowing through the non-collision half of the hash range.

Accordingly, the ratio of exhaustive search work is reasonably estimated as 6/2 = 3; log2 3 ~ 1.6, so 1.6 bits of guessing entropy were lost in the hashing process. The 256 bits of entropy in x correspond to about 254 bits of entropy in H_256(x).

========================

There’s surely much more that could be said on this topic, but this comment is long enough!

  1. Here, entropy may be understood as either Shannon entropy or guessing entropy, because the measures are equal for the given bit sequence. 
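The collision estimate in [4] and [5] is easy to check empirically at toy sizes. The sketch below is not part of the argument above: it uses a truncated SHA-256 as a stand-in for an ideal 16-bit hash (the width and the input construction are my own choices, picked purely for tractability), compares observed colliding pairs with the formula, and reports the Shannon entropy of the output distribution:

import hashlib
import math
from collections import Counter

n = 16                   # toy hash width in bits
m = 16                   # bits of input entropy, so k = 2^m distinct inputs
k = 2 ** m

def toy_hash(i):
    # top n bits of SHA-256, standing in for an ideal n-bit hash
    digest = hashlib.sha256(i.to_bytes(8, "big")).digest()
    return int.from_bytes(digest, "big") >> (256 - n)

counts = Counter(toy_hash(i) for i in range(k))

pairs = sum(c * (c - 1) // 2 for c in counts.values())
expected = k * (k - 1) / 2 ** (n + 1)          # C(k, 2) / 2^n from point [4]
print(f"colliding pairs: observed {pairs}, expected ~{expected:.0f}")

# Shannon entropy of the hash output for a uniform input over the k variants
h = -sum((c / k) * math.log2(c / k) for c in counts.values())
print(f"output entropy ~{h:.2f} bits of a possible {n}")

As expected, the output entropy falls somewhat short of the full 16 bits, in line with the general point that collisions cost the hash a little entropy.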

Clive Robinson July 13, 2021 11:56 PM

@ MarkH,

Clive responded with a chain of reasoning, leading to a conclusion that the entropy of the hash has approximately n/2 bits of entropy. As I read that claim, it’s incorrect by a wide margin.

No, I said ~50% of bits would “change” at the output of the actual deterministic hash map for each change of a bit at the input to the map. That is the Avalanche criterion.

It’s an essential point to grasp that “bits changed” is not “bits affected”, so I shall go through it again and word it slightly differently.

I explained that to propagate that “1 bit” of entropy onwards –which is usually but not always the purpose of a hash function– the hash map output would have to be “chained” back to the hash map input in some way (something @Winter had problems getting to grips with). And again on the second pass through you would “expect on average a ~50% change” of the bits at the output of the hash map.

I pointed out that of the ~50% of bits changed in the first pass through the map, you would only expect ~50% of those to be changed again, but you would also expect ~50% of the bits unchanged originally to be now changed. That is, ~75% of output bits have been “affected” after the 2nd time through, but only ~50% are changed. That is, you get a kind of “parity effect” on the “changes”: the bit flips from changed to effectively unchanged and back again for each bit selected for change by the hash map.

So whilst ~50% of bits “change” back, ~50% of bits don’t get changed. Because ~50% of the ~50% of bits not originally changed on the first pass are now “changed”, but also approximately ~50% of the ~50% originally changed have changed back again. Thus the approximately ~50% of changes is maintained on each turn around through the hash map.

Importantly, though, output bits “changed” and output bits “affected” are different measures.

That is, whilst the “change” remains around ~50% each time through, on a subsequent pass through the hash map approximately ~50% of the bits remaining “unaffected” by previous changes now become “affected”.

So ~50%, ~75%, ~87.5%, ~93.75% and so on, BUT because “this is on average”, sometimes even after 2N passes through an N-bit-wide map some bits will remain “unaffected”. If the hash map is a true “random map”, as you would expect from a “random oracle”, then some bits might not change in even longer cycles.

But you would also expect at some point all the bits “changed” by that 1 bit of entropy to cancel out, as the hash map and chain feedback form a “Discrete System”…

Because even with an ideal gas in a sealed container in a water bath you would expect to see “entropy” apparently “go into reverse” at some point in time. It’s called the “Poincaré recurrence theorem” and it basically says,

“In physics, mathematics, and statistical mechanics, the theorem states that dynamical systems with certain properties, after a sufficiently long but finite time, will return to their initial state for discrete state systems, or to one arbitrarily close to the initial state for continuous state systems.”

Think of a glass of water in the corner of a sealed room held at a constant “room temperature”. We know the water will evaporate into the air; we also know it will condense back out. Therefore at some point in time it will all condense back into the glass. Not once, but over and over on a cycle.

The only real question is “When will that happen?”. It’s called the “Poincaré recurrence time” and whilst it can be wildly different for apparently similar systems, arguably on average in our macro world it is a lot longer than the time we expect the universe has. However, the smaller a discrete state system is, the sooner it will happen, as the number of states is reduced.

Which is exactly what you see and would expect to see happen with an N bit hash map suitably chained to make a hash function. It’s why we give the hash maps such a large bit width.
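The ~50% “changed bits” figure for a single-bit input difference is easy to see with any off-the-shelf hash. A minimal check with SHA-256 (the input contents and the position of the flipped bit are arbitrary):

import hashlib

a = bytearray(b"\x00" * 1024)          # 1 KiB of zero bytes
b = bytearray(a)
b[100] ^= 0x01                          # flip exactly one input bit

ha = int.from_bytes(hashlib.sha256(a).digest(), "big")
hb = int.from_bytes(hashlib.sha256(b).digest(), "big")

print(bin(ha ^ hb).count("1"), "of 256 output bits changed")   # typically ~128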

MarkH July 14, 2021 12:45 AM

@Clive,

In a July 8 comment addressed to me, you wrote:

… you say 256 bits of entropy at the hash output, I’ve shown it will be ~50% of that

I interpreted these words to mean that the SHA256 hash of an input with 256 bits of entropy would have roughly 128 bits of entropy. Did I misconstrue that?

========================

If the input entropy and hash entropy be denoted by e and h (both in bits), and n be the hash width in bits, we have:

h ≤ e, h ≤ n, and in general h < min(e, n), which it approaches only asymptotically.

MarkH July 14, 2021 1:13 AM

.
A Further Speculation/Question About Hashes to Extract Entropy

I wrote above about the NIST recommendation to collect twice as many bits of entropy as the function range width prior to hashing, and that this is supposed to result in 0.99999999999999999995 bits of entropy per bit of output.

With respect to the guessing entropy argument I made above, it’s hard to imagine any practical difference between that and (for example) 0.9999 bits of entropy per bit of output.

So what is the motivation for that guidance? Here is my speculation (people who know more, please chime in!).

In my above analysis of input entropy equal to the hash width, I estimated that something like 30% of the possible hash values might never appear.

Intuitively, this causes some statistical anomalies in the set of hash outputs. For some cryptographic applications, only the guessing entropy matters, and such statistical anomalies are of no importance.

In other cases however, it’s important that secret numbers be drawn from a distribution statistically indistinguishable from a random distribution.

Interested readers may wish to look up the “leftover hash lemma.” If I understand it correctly (probably I don’t), it derives how many bits can be extracted from data with partial entropy while achieving statistical properties close to a random distribution. [Corrections welcome!]

I speculate that the NIST recommendation is based not on squeezing in some ultra-ultra-microscopic amount of entropy per output bit, but rather attaining statistical properties so near to random that no feasible test can distinguish them from a random distribution.

Clive Robinson July 14, 2021 2:29 AM

@ MarkH,

In a July 8 comment addressed to me, you wrote:

It’s always handy to add the link especially if a search is going to fail.

But to repeat what was said by me,

“You claimed,

“and compute the [SHA256] hash of that file, its output will have 256 bits of entropy. In other words, the result will have the maximum possible of one bit of entropy per bit of output.”

That is, you say 256 bits of entropy at the hash output; I’ve shown it will be ~50% of that.

So there is a ~100% increase in your claim to account for…”

That is the original conversation (with the exception that you originally used an even shorter hash by mistake).

It’s because of the confusion I started talking about “changed” and “affected” when talking about the output bits of the hash map.

The Avalanche criterion/effect will on average change ~50% of the hash map output bits, which is all you should be able to see (the pool being secret by functional requirement).

Now is that “real entropy” from some physical source or “observed entropy” you see at the output?

In theory that ~50% change “on average” applies to any changes at the input, be it just a single bit change or nearly all 256 bits changing (look up Bent Functions, which are non-linear mapping functions combined with linear Boolean functions, for a mind-numbingly detailed explanation, but as I’ve mentioned previously brush up on Walsh–Hadamard functions first).

As an observer you see on average ~50% of the bits change at the output but you do not see how many bits change at the input which could be just 1 bit at the input to the hash map…

Thus the output bits from the hash map do not directly relate to the “real entropy” at the input. As the observer of the output all you see is bit changes.

Is that any clearer?

MarkH July 14, 2021 3:43 AM

@Clive:

I’ve asserted that a 256-bit hash of an input with 256 bits of entropy has ~256 bits of entropy.

It seemed to me that you asserted that a 256-bit hash of an input with 256 bits of entropy has ~128 bits of entropy … I’m still confused as to whether or not that’s the meaning you intended.

If that is what you meant, then we have conflicting assertions.

Either both are false, or only one is true.

Clive Robinson July 14, 2021 6:50 AM

@ MarkH,

What you actually said to @Freezing on 7th July, after ill-defining a source as having 1% entropy, is,

If you collect 25 K bytes from such a source — almost all of which are predictable — and compute the SHA1 hash of that file, its output will have 256 bits of entropy. In other words, the result will have the maximum possible of one bit of entropy per bit of output. The 25,344 predictable bits don’t detract from that!

1, 25 K bytes = 200 K bits.
2, 1% of 200 K = 2 K.
3, SHA-1 is given as 160 bits wide.
4, 25,344 predictable bits

Are, I assume, all errors on your part?

I’m guessing that what you are trying to say is that you have put 256 bits of “entropy” in, therefore you assume –incorrectly– that you will have 256 bits of entropy at the output of the hash function. Hence,

In other words, the result will have the maximum possible of one bit of entropy per bit of output

But from the technical aspect, as I’ve indicated repeatedly, it is not true. I’ve tried to tell you why with the simplest logical case, i.e. the Avalanche criterion, and I’ve mentioned you should consider the more than a century old “Poincaré recurrence theorem”. It tells you that the entropy will be cyclic in nature for a discrete state system (which a hash map in chained feedback implicitly is, as are Linear Feedback Shift Registers, likewise Non-Linear Feedback Shift Registers, as are counters, be they odometer style with carry or Chinese style with different prime counts in each column but no carry).

I’m not sure what it is you do not understand, but I suspect you do not understand what entropy actually is as an “emergent property of a discrete state system with cyclic behaviour”.

As I’ve said, it’s why I stopped talking about entropy at the output of the hash function and started talking about “affected” bits, which is entirely different to “changed” bits. The latter follow a “parity” effect on the number of times they have been changed.

I also explained how you would see the number of bits “affected” grow with each pass through the hash map and how it may never affect all hash map outputs.

For, amongst other reasons, the cycle point may be reached where the effect of one bit of entropy at the input to the hash map is entirely negated.

I suggest rather than trying to find explanation in heavy treatises on entropy, you start by understanding the notion of a bijective map where the mapping is randomly selected, then move on to an understanding of vector addition over a prime field {0,1} and work towards linear algebra via basic Boolean logic, and thus why in a feedback system the behaviour is overwhelmingly cyclic in nature (i.e. odometer counting etc).

But if you really must dig into entropy, get your head around Boltzmann’s entropy from a century and a half ago first,

S = kB log W

And what is really going on with W and why it reduces down for Shannon and other entropies.

Oh, and remember as far as things go with entropy it’s about “the statistics of particles”, so all systems are really discrete; it’s why W is a non-negative integer (a Natural number).

Fake July 14, 2021 6:28 PM

@Clive,

Prime based iterations? Has anyone tried to see if those are recoverable?

Could you give me a specific algo w/ a ‘Chinese primed’ ‘recycler’?

my argument about hashing my private seed is just that, i don’t want to broadcast it any more than i would already “have to”. an extra set of processing is an extra set of broadcast.

seems we should just yell our enigma rotor configuration across the room.

Fake July 14, 2021 6:38 PM

shortest path on an atx/matx/eatx board would be existing pins, so fan and audio pins probably retrofitted with something similar to the standard equipment but not.

the fans themselves are dc/brushless motors, being induction of various sorts, and probably not a good idea to treat those as purely random direct data. could be mirrored easily? the cd audio pins might be co-optable, as potentially could the caselock pin on the mobo itself. i could see caselock being output into a light sensing diode, audio pins being fed by a vibration sensor. you have 12v and 5v available within the case; there’s all sorts of stuff that you could do w/ existing pinouts and never have to move up into usb or some other microchip sampled data or protocol.

hashing a well of entropy iteratively is kicking a dead horse; randomize your samples with a small second source of entropy, maybe randomize your source selection even, and run with it. i suppose blinding something with senses is harder than tricking a single sense that they trust.

Clive Robinson July 14, 2021 8:46 PM

@ Fake,

Could you give me a specific algo w a ‘chinese primed’ ‘recycler’ ?

Consider a set of wheels of

1, 1..2
2, 1..3
3, 1..5

All set to 1; then as you “clock them” all one position at a time you get the following sequence,

111, 222, 331, 412, 521,
132, 211, 322, 431, 512,
121, 232, 311, 422, 531,
112, 221, 332, 411, 522,
131, 212, 321, 432, 511,
122, 231, 312, 421, 532,
111.

Which is 30 = 5×3×2 steps long before it cycles back to 111 again.
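A small sketch of those wheels in Python; the wheel sizes and starting values are as given above, and the digits are printed with the 5-wheel first so the output matches the listing:

from math import prod

sizes = (5, 3, 2)                  # wheel periods; every wheel starts at 1
state = [1, 1, 1]

seen = []
for _ in range(prod(sizes) + 1):   # 30 steps bring the state back to 111
    seen.append("".join(str(d) for d in state))
    state = [d % s + 1 for d, s in zip(state, sizes)]   # clock every wheel one position

print(", ".join(seen))             # 111, 222, 331, 412, 521, ..., 532, 111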

Clive Robinson July 14, 2021 9:20 PM

@ Fake,

Prime based iterations? has anyone tried to see if those are recoverable?

Any repeating sequence you can synchronize to can be removed simply by generating an inverse signal of the right amplitude. It becomes especially easy if you get either an implicit “sync signal” or a stable repetition frequency.

That is why you have to understand the difference between a “chaotic” signal and one that is “truly random”. Most PC motherboards are swamped in the former and distinctly lacking in the latter.

You also have to consider that nearly all “analog signals” into a PC are “sampled”, and this includes both the keyboard key press timing and mouse movement timing.

Without going into details of the Nyquist limit or y = sin(x)/x effects, you actually end up with quite a low bandwidth signal at best. Most of which is in effect “chaotic”, not “Truly Random”.

Speaking of sampling,

randomize your samples with a small second source of entropy,

You need to be careful… Ever hear of the “Wagon-wheel effect”, where in movies the wheels on the wagon appear to turn slowly in the wrong direction?

Well it’s a result of the real wheel signal at a high frequency being sampled by the “stroboscopic” effect of the “film gate” at some harmonic of the film gate frequency. The frequency difference between the real wheel frequency and the gate harmonic is what you see. It also illustrates the concept of positive and negative frequencies by the direction the wheel appears to turn in.

Obviously, seeing the wheel turn rate enables you to “lock up” or sync to it and thus strip it out…

And so on.

MarkH July 14, 2021 10:11 PM

@Clive:

The perils of sleep-deprived commenting!

For the record, my reference to the wrong function is corrected in the immediately following comment posted 7 minutes afterward; and my brain-dead writing of K bytes instead of K bits doesn’t affect either the logic of the assertion or the broader message that low-density entropy can be concentrated.

Fake July 14, 2021 10:36 PM

@Clive,

yeah good simplistic example, obviously only a single transformation but that’s where my eye is on such looped ‘obfuscation’. one person telling me it’s okay to md5 a 2kb file of random data and share the md5, but by hashing it i might be sharing the whole file, and that’s excluding any pre-existing rainbow tables.

by 2kb random file i don’t mean 2048 bit key, but that should put it into perspective and the same being for 4096… what are the undisclosed safety margins involved here or anywhere else?

i still don’t want to handle my keyfile more than i have to, that’s why we have hardware based key-escrow.

known unknowns are hard

afternote,
i see you went with the tumbler example, probably apt due to the hash conversation above and its relation to some of the newer ‘light-weight’ ‘speed-balanced’ symmetric stuff. i wasn’t sure if there was a different meaning behind primes being used other than for ‘unique’ looped transformations.

Clive Robinson July 15, 2021 3:56 AM

@ MarkH,

doesn’t affect either the logic of the assertion, nor the broader message that low-density entropy can be concentrated.

Actually, no, you can not “concentrate” entropy, sorry; that’s not what “entropy pools” are all about anyway.

What you are actually doing by the use of Shannon “confusion” and “diffusion” is spreading 1 bit of entropy across multiple output bits of the hash map, nothing more.

If the hash map is also used with chained feedback of some type, all you then do is further spread out that 1 bit of entropy with time such that it affects more bits at the output of the hash map, but on average the changed bits will be ~50%.

But if you wind the process back, as you can with a bijective map and linear vector addition in the chained feedback, all you get back to is your single bit of entropy.

That’s it.

MarkH July 15, 2021 10:46 AM

.
Fun Facts About Entropy

[1] Consider the integer 31,556,926.

If it was derived as a random bit sequence (say, by low-bias coin tosses), the most probable number of generated bits is 26. If there were indeed 26 generated bits (coin tosses or the like), then it has very nearly 26 bits of entropy.

When entropy is interpreted as uncertainty, it means a state of knowledge. To an observer I’ll nostalgically call Oscar:

• if the number was composed from 32 bits chosen without bias in a manner Oscar could neither predict nor infer, and remains secret from him, then it has 32 bits of entropy

• if Oscar has discovered the number, it has 0 bits of entropy

• if Oscar knows that the number has some astronomical significance, it may have no more than a dozen or so bits of guessing entropy (see section [6] of my July 13 comment)

[2] What is the entropy of zero? Well, it depends!

If a selection from the integers modulo 2^256 — made in a manner that is unpredictable, and is no more likely to choose one of those integers than any other — happens to yield zero, then zero has 256 bits of entropy.

That may feel counter-intuitive. How can a number as simple and formless as zero have any entropy at all?? And anyway, what are the chances that 256 consecutive tosses of a hypothetical “fair coin” will all have the same outcome???

Well, we know the chances: the probability of that particular outcome is (pretty nearly) .00000000000000000000000000000000000000000000000000000000000000000000000000000864

That’s the same probability as the outcome being hexadecimal 5A3122F29D912222382260C25D606EC8B3B8330C29A8441302C8AEC037EF5E8C, or any other integer not exceeding 256 bits.

In this context, 5A3122F29D912222382260C25D606EC8B3B8330C29A8441302C8AEC037EF5E8C has exactly the same entropy as zero. Its entropy is determined by the probability of its selection, not its value nor any aspect of how that value is encoded or represented.

In contrast, if zero is the outcome of a single random bit generation, it has exactly one bit of entropy.

[3] In computer science, we’re used to thinking of randomized choices as yielding numbers. It’s more practical to do, and more convenient for applications.

However, numbers have no special status with respect to entropy. Entropy can be computed for the selection of an element of any set.

Consider 6-sided dice with negligible bias. Their faces are conventionally marked with patterns of dots, which are conventionally interpreted as numbers (though they don’t have to be!)

Conventionally, each roll of such a die corresponds to the randomized choice of an element of the set {1, 2, 3, 4, 5, 6} and its outcome has log2 6 (~2.585) bits of entropy.

If the faces of 6-sided fair dice are instead marked with distinct colors, or recognizable images of the faces of different famous persons, the outcome of each roll has exactly the same entropy as the roll of a number-marked die.

========================

All of the examples above illustrate what is crystal-clear in Shannon’s “A Mathematical Theory of Communication” — entropy is defined in terms of the probability distribution of distinct outcomes, each outcome corresponding to one element of some set of possible outcomes.

Entropy is NEVER determined by the type, name, internal structure, symbolic representation, coding, etc. etc. etc. ad infinitum of the elements of that set … entropy is based only on the distribution of probabilities among them.

It doesn’t matter what the elements of that set “look” like; any non-empty set will do.

========================

Attempts to infer the entropy of zero, or of 5A3122F29D912222382260C25D606EC8B3B8330C29A8441302C8AEC037EF5E8C, or any other element of any set by analysis of internal structure are doomed to failure, because they ask the wrong question.

One might as well enquire, “how many bicycles are needed to make water wet?”

Clive Robinson July 15, 2021 4:20 PM

@ MarkH,

A simple experiment, based on your proposed idea.

Generate a file A of data that consists of M blocks of N bits. The first M/2 blocks are semi-random data of approximately uniform numbers of set or cleared bits. The second M/2 blocks are all set to zero or some invariant block pattern.

Copy file A to file B.

Then in file B go to the last block of semi-random data and change just a single bit’s state.

Now using a bijective hash map N bits wide run file A through it and save as file A’, likewise file B saved as file B’.

Compare files A’ and B’: how many bits are different, and where?

Also what do you notice about the second half of the files A’ and B’?

Now change the use of the hash map so it is now in an output feedback mode. Run file A through and save as file A” and run file B through and save as file B”.

Now compare files A” and B”. Where do you see differences start? Where do they stop? And how many bits are different?

Now build from the bijective hash map H an inverse hash map H’, run the files A’ and B’ through it, and compare the output to files A and B respectively. What differences do you see?

Now for the feedback chain C build an inverse feedback chain C’ around the inverse hash map H’ and run the files A” and B” through it and compare them to files A and B. What differences do you see?

The point is that there is only 1 bit of difference between files A and B, and after going through the hash map H only one block of N bits is changed, and that goes back to 1 bit when put through the inverse hash map H’. So the real entropy in file B is 1 bit, and likewise in file B’, even though to the observer more bits have changed by Shannon confusion and diffusion. No entropy has been “concentrated”; in fact the opposite, now 1 bit of entropy has been spread across a single N bit block with ~50% of bits changing.

Now when you repeat this with files A” and B” and the inverse chain C’ and inverse hash map H’, you still have only 1 bit of entropy in the resulting output, even though just over half the blocks in file B” were different to file A”, with on average ~50% of the bits in each one of those changed blocks being different.

That gives you proof that neither the hash map nor the chaining mode is doing anything to change the entropy.
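A toy-scale sketch of that experiment, with 8-bit blocks, a fixed random byte permutation standing in for the bijective hash map H, and a simple forward-feedback chain standing in for C; all of those choices are mine, purely for illustration:

import random

random.seed(1)
H = list(range(256))
random.shuffle(H)                         # a fixed bijective 8-bit "hash map"
H_inv = [0] * 256
for x, y in enumerate(H):
    H_inv[y] = x                          # its inverse map H'

M = 16                                    # blocks per file, N = 8 bits per block
file_a = [random.randrange(256) for _ in range(M // 2)] + [0] * (M // 2)
file_b = list(file_a)
file_b[M // 2 - 1] ^= 0x01                # a single-bit difference in the last "random" block

def chain(blocks):                        # feed each map output forward into the next input
    out, fb = [], 0
    for blk in blocks:
        fb = H[blk ^ fb]
        out.append(fb)
    return out

def unchain(blocks):                      # the inverse chain C' built around H'
    out, fb = [], 0
    for c in blocks:
        out.append(H_inv[c] ^ fb)
        fb = c
    return out

a2, b2 = chain(file_a), chain(file_b)
diff_blocks = sum(x != y for x, y in zip(a2, b2))
diff_bits = sum(bin(x ^ y).count("1") for x, y in zip(a2, b2))
print(f"{diff_blocks} of {M} blocks differ, {diff_bits} bits in all")

# winding the process back recovers the original files, and with them the single-bit difference
assert unchain(a2) == file_a and unchain(b2) == file_b

Just over half the blocks differ, with roughly half the bits flipped inside each differing block, yet inverting the map and the chain gets you back to exactly the original one-bit difference.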

Why you appear to still have issues with this I really do not know…

The hash map H does not create entropy, nor does its inverse H’ destroy entropy; the same goes for the chaining mode C and its inverse C’.

It’s why for years now I’ve called the use of crypto algorithms on TRNGs “magic pixie dust thinking” because like the legendary “fairy gold” it just as quickly turns back into the same base it started from.

From a security aspect, as an observer who can only see the output of the total hash and chain function you actually have no idea if there is any entropy in the system at all. It could just be a stream generator based on a simple counter or LFSR feeding a block cipher like AES, so AES-CTR with a fixed key known only to the chip manufacturer and their chosen confidants…

Even if as the chip manufacturer I add “chaos” to the counter, if I arrange for a sync signal I can easily strip it off again, as can my confidants.

These are the hard facts Linus came up against when he made his ill-advised comments about basing /dev/urandom and /dev/random in Linux just on Intel’s CPU ‘alleged’ TRNG.

Fake July 15, 2021 4:35 PM

@moderator,

actually, seeing as clive’s last post is referencing a timeframe that we know includes an article intentionally tagged ‘linux, openbsd’ by the author can i request a feature?

blog articles include ‘tags’, but the search box is just generic ddg.

@all,

other than me including generic search criteria for ‘openbsd’ and ‘trng’ from an external source, does anyone have a recommendation for using the /included/ tags to search this sequentially authored content from an external site?

SpaceLifeForm July 15, 2021 5:38 PM

@ Fake

You can try adding ‘inurl:schneier.com’ to your search.

There may be a distribution of probabilities among your results.

Fake July 15, 2021 6:32 PM

@slf,

i think i’m going to open curl and wget on this one [to limit depth], my memory is pretty foggy as to when my earliest post was but i remember the dual source talk i just don’t remember if it was here or somewhere else.

funny, in your link:

“This is a hard problem. We don’t have any technical controls that protect users from the authors of their software.

And the current state of software makes the problem even harder: Modern apps chatter endlessly on the Internet, providing noise and cover for covert communications. Feature bloat provides a greater “attack surface” for anyone wanting to install a backdoor.”

circa 2013 obviously.

@slf, ian,
guard pages and signaling, $10musd bounty 07/15/21

chicken or the egg?

MarkH July 16, 2021 10:33 PM

@Clive et al.:

I think I get what you’re driving at, by “opening the covers” and looking at the machinery by which hash functions work.

My perspective is different: good hashes are designed to approximate an ideal hash function, and their internal mechanization is the means by which the designers work toward that goal.

For the purposes of entropy extraction, what’s important about a hash function is that (a) with respect to input differences — especially small differentials — the hash maximizes distinctness of outputs, and (b) the distributional effect of input changes is substantially independent of position (first bit, last bit, or anywhere between).

If a hash function achieves those desirable characteristics, the means by which it does so may be of absorbing interest; for this application, the “black box” model is sufficient.

========================

In trying to illuminate how the process works, I have chosen an extremely contrived scenario in which most input bits are fixed, with a minority of unpredictable bits at certain locations.

It’s not a realistic scenario for random number generation! But it’s simple to reason about, and the analysis of more typical situations with entropy broadly distributed across the input leads in the direction of math I don’t know; and even if I learned and applied it correctly (who knows?), perhaps it would shed faint light for the majority of our readers.

I started with the example of a long input containing one unpredictable bit not because any rational person would apply a hash that way, but in order to show that:

• the hash preserves the entropy: the tiny drop of entropy in the input is guaranteed to be present in the output (not added or created, but preserved, mostly or entirely)

• the density of entropy (entropy per bit) is much greater in the hash than in the input bit sequence

To say the one bit of entropy has not been “concentrated” but rather spread across the bits of the hash is (in a sense) literally true, but does not contradict the facts above.

========================

My use of the verb “concentrate” has provoked objection not only from Clive, but at least two other commenters as well.

What about “collect”, or “gather”?

If the long input has 100 scattered bits which are unpredictable and unbiased, SHA256 will collect all of them into a “bucket” with much more entropy per bit1.

We could propose, “why not just pick out and concatenate those 100 bits? Then you’ve got pure entropy!”

But perhaps — if the location of those bits in the input were unknown — it might be conceded that applying the hash function has usefully increased the entropy per bit.

The hash function has collected the entropy scattered throughout its input. Plain truth, as it seems to me.

  1. In that ratio, the likelihood of even a single collision is extremely low, so the hash is practically certain to have exactly 100 bits of entropy. 
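
A minimal sketch of this scenario (the 4096-byte length, the bit positions, and the use of rand() as a stand-in entropy source are all arbitrary choices for illustration; SHA256() is OpenSSL's one-shot call):

~~~
/* A long, mostly-fixed input with 100 unpredictable bits at fixed positions,
 * condensed by SHA-256 into a 256-bit digest.
 * Build: gcc collect.c -lcrypto -o collect
 */
#include <stdio.h>
#include <stdlib.h>
#include <openssl/sha.h>

#define INPUT_BYTES 4096   /* long, mostly predictable input            */
#define SECRET_BITS 100    /* the only unpredictable bits inside it     */

int main(void) {
    unsigned char input[INPUT_BYTES] = {0};
    unsigned char digest[SHA256_DIGEST_LENGTH];

    /* scatter 100 unpredictable bits at fixed, publicly known positions */
    for (int i = 0; i < SECRET_BITS; i++) {
        int bitpos = i * 41;                  /* arbitrary fixed positions */
        if (rand() & 1)                       /* stand-in for the entropy source */
            input[bitpos / 8] |= 1u << (bitpos % 8);
    }

    SHA256(input, sizeof input, digest);

    printf("entropy in:  %d bits spread over %d bits (%.4f bits/bit)\n",
           SECRET_BITS, INPUT_BYTES * 8, (double)SECRET_BITS / (INPUT_BYTES * 8));
    printf("entropy out: ~%d bits in 256 bits (~%.2f bits/bit)\n",
           SECRET_BITS, SECRET_BITS / 256.0);
    return 0;
}
~~~

The digest cannot contain more than the ~100 bits that went in, but those bits now occupy 256 output bits instead of 32,768 input bits.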

Clive Robinson July 17, 2021 4:14 AM

@ MarkH,

For the purposes of entropy extraction, what’s important about a hash function

A hash function does not "extract" entropy; it cannot.

Entropy "extraction", if you insist on calling it that, is done (if at all, and it often is not) at an earlier stage much closer to the TRNG. Look on it as the inverse of a signal-to-noise enhancing circuit.

The aim is to remove the deterministic signals and as much of the chaotic signals as possible, leave just the "unpredictable / random" noise, and treat that as entropy (even though entropy is actually only a small part of it).

Look at it this way: if you listen to the output of a Geiger counter you get a series of clicks. Over a short period of time they appear random in time. However, over a longer period of time a clear underlying decay signature becomes apparent. As this is quite predictable, you can either remove it or mitigate it so its predictability is no longer a factor.

When you think about it what you are looking for from a TRNG in crypto use is not even entropy in the information theoretic perspective but,

“Unpredictability by an adversary.”

Hold that thought as foremost in your mind.

Unpredictability goes up with the number of degrees of freedom your input has into the hash map function; after that, as the hash is entirely deterministic, all that changes is the Shannon confusion and diffusion. The confusion changes with every new input through the hash map, which is what the feedback chaining is supposed to give (it's why having irregular clocking / sampling is seen as a way of increasing unpredictability).

With regards,

the density of entropy (entropy per bit) is much greater in the hash than in the input bit sequence

That makes no sense whatsoever: the hash map is fully deterministic and without storage. It is a mapping function like a very large ROM or lookup table; you present the same input, you always get the same output.

Both your links are effectively broken so it makes further discussion moot at this point.

MarkH July 17, 2021 1:25 PM

@Clive:

The usual term in cryptographic literature is “randomness extraction” rather than “entropy extraction.” Hash functions are one recognized tool for randomness extraction. I suggest that in the context of generating secret numbers, entropy is a suitable measure of the quantity of randomness.

Your words

…what you are looking for from a TRNG in crypto use is not even entropy in the information theoretic perspective but, “Unpredictability by an adversary.”

seem to imply a distinction where I see no difference. How is a quantity of information unpredictable to Oscar different from its entropy (in the semantics of entropy as uncertainty)?

You wrote that it “makes no sense” to say that

the density of entropy (entropy per bit) is much greater in the hash than in the input bit sequence

Shannon entropy has a clear definition, and the Shannon entropy of the outcomes of a set of “trials” (distinct process completions generating outcomes) can be measured to as great a precision as one pleases.

If an input sequence has a mean entropy of 0.1 per bit, and the hash of that input has mean entropy of 0.9 per bit, why does it make no sense to say the entropy per bit of the output is much greater than that of the input?

To my understanding, that X “makes no sense” means that X has no meaning (is gibberish, for example), is too ambiguous for useful interpretation, or is logically erroneous.
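
For what it is worth, "entropy per bit" figures like those are straightforward to compute for a simple biased source: a bit that is 1 with probability ~0.0128 carries about 0.1 bit of Shannon entropy, while an unbiased bit carries 1.0. A small numeric sketch (the probabilities chosen are arbitrary examples):

~~~
/* Shannon entropy of a single biased bit: H(p) = -p*log2(p) - (1-p)*log2(1-p).
 * Build: gcc hperbit.c -lm -o hperbit
 */
#include <stdio.h>
#include <math.h>

static double h(double p) {
    if (p <= 0.0 || p >= 1.0) return 0.0;
    return -p * log2(p) - (1.0 - p) * log2(1.0 - p);
}

int main(void) {
    double ps[] = {0.5, 0.25, 0.1, 0.0128};   /* 0.0128 gives roughly 0.1 bit */
    for (int i = 0; i < 4; i++)
        printf("P(bit=1) = %.4f  ->  %.3f bits of entropy per bit\n", ps[i], h(ps[i]));
    return 0;
}
~~~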

========================

I know that your time is valuable; if you would be so generous, will you kindly review my above estimate for the entropy of a hash?

Do you think it’s incorrect? If so, where do you see it going wrong?

JonKnowsNothing July 17, 2021 2:33 PM

@MarkH, @Clive @All

re:

  • @MarkH I think I get what you’re driving at, by “opening the covers” and looking at the machinery by which hash functions work.
    My perspective is different: good hashes are designed to approximate an ideal hash function, and their internal mechanization is the means by which the designers work toward that goal.
  • @MarkH If a hash function achieves those desirable characteristics, the means by which it does so may be of absorbing interest; for this application, the “black box” model is sufficient.

Given the topic of discussion it seems to me that the very premise of any black box model, is suspect. Any aspect that is not fully evaluated from top to bottom, implementation through coding including updates or hardware changes, can lead to the assumption that things are “good enough”, when they are not.

Therefore “statistical anomalies are of no importance” is actually of extreme importance.

Even if your application does not require a more robust model, knowing that your model is ROBUST or is FAULTY can alter what sorts of information to trust to that model.

tl;dr

While not of such in depth review, these topics arise in MMORPG combat games. These video versions are patterned on the paper D&D type games with “dice throws” that govern all game events and combat outcomes.

Online versions do the “dice throws” automatically and the calculations are done on-the-fly. When the topic of “random outcomes” arises, some very heated exchanges happen because RNG and PRNG are not the same.

They may not be sending encrypted high level information of global importance but they do define if your character survives an encounter with a more powerful adversary and if you get a chance at the rare loot.

Within most games are “manual random generated” dice rolls like “/roll 100” with the winner getting the rare loots.

But these rolls are no more “random” than any other generated version. The algorithms are guarded by the game manufactures. They are “black box” versions.

For games the outcomes are just Pixels on a Screen but for other applications they can be of grave importance.

===

ht tps://www.schneier.com/blog/archives/2021/07/friday-squid-blogging-best-squid-related-headline.html/#comment-383687

ht tps://www.schneier.com/blog/archives/2021/07/friday-squid-blogging-best-squid-related-headline.html/#comment-383960

(url fractured to prevent autorun)

Clive Robinson July 17, 2021 4:06 PM

@ MarkH,

The usual term in cryptographic literature is “randomness extraction” rather than “entropy extraction.”

For very good reason randomness and entropy are not the same by a very long way.

As I pointed out to you,

“Unpredictability by an adversary.”

Hold that thought as foremost in your mind.

You are otherwise running around a twisty little maze of passages of your own devising, but without a map on how to get back.

As I've also pointed out, both the bijective hash map H and the vector-addition chain function C are fully deterministic; you can build inverses for them, H' and C', and wind them backwards.

This means that an attacker with sufficient computing power/memory can successfully attack the overall hash function, which has already happened to SHA1.

Thus from the algebraic logic side there are no "degrees of freedom" other than what you choose to do with the inputs to the overall hash function…

The "randomness" thus occurs before the hash function, and the hash does not add anything to either the randomness or the entropy.

It therefore does not matter what colour of box you choose to put the hash function in; it does not change that.

Something you really really do not appear to be getting a grip on.

You are not the only one which is why I’ve said for years “Magic Pixie Dust Thinking”.

All the use of any crypto function does after the TRNG is hide from easy observation just how crappy an individual TRNG is, that’s it.

I’ve already proved this to you with simple factual arguments and proofs.

I really do not know what it is you think you are trying to prove, but you sure are not going about it in any way that might aid your cause.

MarkH July 17, 2021 4:38 PM

@JonKnowsNothing:

The widely used cryptographic hash functions have extremely good statistical properties. They’re designed from the ground up to have them — that’s the entire motivation for their designs. And then they are tested intensively.

They also have qualities such as preimage resistance which are of zero importance for the application I’ve been discussing.

An insecure crypto hash — with feasible second preimage attacks — might have measurably more collisions against randomized inputs than an ideal hash function. If it does, that means that a little sliver of one bit of entropy is lost at the output.

For the purpose of entropy collection, crypto hashes are miles better than is strictly necessary. It’s rather like making a salad fork from the special steel used in uranium enrichment centrifuges.

========================

Preimage attacks (against the use of hash functions for digital signatures) have indeed been based on analysis of hash internals. They're just not relevant to entropy collection, and anyway nobody has yet found such a weakness in SHA256.

========================

I wrote about statistical anomalies (nice out-of-context quote, Jon) in reference to secret random numbers. The hash collisions reduce the guessing entropy (as I explained in some detail); otherwise the statistical anomalies corresponding to “missing codes” and reduced min-entropy just don’t matter for that application.

Sometimes random numbers for cryptography are not secret, a famous example being the random numbers exchanged in Diffie-Hellman key negotiation. For such an application — if you’re doing DH thousands of times a second, as some operations do — the “missing codes” could (at least theoretically) leak some information to an attacker.

Whether that leads to any practical attack, I’ve no idea. But if your crypto randoms will not be kept secret — and you want to be very conservative — provide extra bits of entropy at the hash input.

Per the NIST guidance, providing a hash input with twice as many bits of entropy as the hash width makes the distribution of hash outputs indistinguishable from a completely random distribution.

MarkH July 17, 2021 5:04 PM

@Clive:

If you don’t use entropy as a measure, how do you quantify randomness? I wonder.

When you write about “attack” on a hash, I suppose you mean inversion in the sense of finding a preimage.

The input to the hash is used once and thrown away. The part of the input that’s predictable isn’t very useful to know, and (by definition) the part that is not predictable is very unlikely to recur when its entropy is hundreds of bits.

Someone really worried about this possibility can use the strongest hash functions available.

But honestly, anyone who expects that some attacker is willing to focus heavy resources into attacking their information security (a) has a whole host of other things to worry about and (b) would be well advised not to comment about security measures on public forums like this.

Fake July 17, 2021 5:37 PM

hardware escrow of seed data should probably be explained as explicitly data_at_rest.

data in its most restful state possible should not be available for either:

a) direct processing
b) extraneous processing

if direct processing is preferred b) is definitely important.

Fake July 17, 2021 5:53 PM

to kick a dead horse,

if TIA knew everything the NSA would not be an Agency but a Fellow

there are or may be ways to ‘come up’ and still post publicly and have a secure enough channel (think hashes and multiple transforms) and still require the mindset of strict privacy for implementation details being prepared for.

Clive Robinson July 18, 2021 2:54 AM

@ MarkH,

If you don’t use entropy as a measure, how do you quantify randomness? I wonder.

Well you can start with the English language,

1, Randomly selected from a set…
2, Entropy selected from a set…

You frequently hear the former but not the latter, thus common sense would tell you there must be a reason. Yes?

Surprise surprise, there is: "randomly" effectively means "unpredictably", whilst entropy is a measure of possibility.

When you write about “attack” on a hash, I suppose you mean inversion in the sense of finding a preimage.

No I do not; I mean any and all attacks, including inverting the bijective map function H into H', which effectively turns the hash map from being an encrypting block cipher into a decrypting block cipher.

The point is, @MarkH, that you originally made a statement that was wrong; you were told it was wrong, and you've been given both argument and informal proof that it's wrong.

So what do you do? You run around trying to suggest two things,

1, You don’t know very much.
2, But the person who told you you were wrong must be wrong.

It’s a silly game to play so stop whilst you still have some coinage left in your pocket.

MarkH July 18, 2021 10:12 AM

@Clive:

you’ve been given both argument and informal proof

I’ve seen an assertion that the 256-bit hash of a file with 256 bits of entropy will have approximately 128 bits of entropy.

I’ve seen some reasoning about avalanche effect.

I haven’t seen anything I understood as a proof of the quantity of entropy, even a very informal one.

========================

I prove theorems from time to time, it’s a little hobby. I start with a crisp statement of the theorem I’ve set out to prove, and then write mathematical and/or English language sentences which (I hope) are logically chained.

But it isn’t a proof unless the last step demonstrates the correctness of the theorem.

Suppose the theorem is:

If Hn is an ideal hash function n bits wide, and X is a bit sequence of total entropy n, then the entropy of Hn(X) ~ n/2.

The last step must be something equivalent to:

–∑ p log2 p ~ n/2

Otherwise, the theorem stands unsupported.

========================

Note well that in the case of hash function Hn, the set of outcomes is the integers modulo 2^n, so each possible outcome is an integer with radix 2 expression not longer than n.

The entropy of the hash is necessarily a function of the distribution of those integers.

If anyone offers a proof with a calculation of the amount of entropy, I shall study it with great attention and care.

MarkH July 18, 2021 11:03 AM

@All:

If you’re interested in my investigations, I’m starting to study a purported proof that an m=n=256 hash has expected entropy ~255.2 bits; my estimate above was ~254.4 bits.

The prediction of losing less than 1 bit is surprising to me, considering my estimate of collisions.

I’ll report back if/when I make sense of it …

Clive Robinson July 18, 2021 1:50 PM

@ MarkH,

I’ve seen an assertion that the 256-bit hash of a file with 256 bits of entropy will have approximately 128 bits of entropy.

Not by me.

My first comment to you on the subject is,

https://www.schneier.com/blog/archives/2021/07/friday-squid-blogging-best-squid-related-headline.html/#comment-383312

Since then all you've really done is muddle terms and make vague arm-waving arguments.

I haven’t seen anything I understood as a proof of the quantity of entropy, even a very informal one.

As you've been told, it is irrelevant: as the hash map H is fully deterministic it cannot create entropy. All it does is present a changed output for a given input due to deterministic behaviour. You've also been told that from the encoding hash map H an inverse map H' can be created that reverses the changes and gives the original input.

If you actually understood what was going on then you would understand why that would constitute sufficient proof.

But no, you do not appear to understand the terms, thus you try in your mind to convert "change" into "entropy", which it most definitely is not.

So what do you think that is going to do to your,

I prove theorems from time to time, it’s a little hobby. I start with a crisp statement of the theorem I’ve set out to prove, and then write mathematical and/or English language sentences which (I hope) are logically chained.

Well, the answer to your original statement has been given to you via several such proofs, and they all say that what you said was "untrue".

So if you think you can come up with a proof otherwise then you will be shifting the goal posts somewhere and that would be dishonest.

MarkH July 19, 2021 2:16 AM

@Clive:

In your comment just above, you linked to an older one (your first addressed to me on the subject). On that same day, less than three hours later, you wrote:

That is you say 256 bits of entropy at the hash output, I’ve shown it will be ~50% of that. So there is a ~100% increase in your claim to account for…

To my poor gray head, “~50% of that” means about 50 percent of my claimed 256 bits of entropy, which surely is ~128 bits of entropy.

And if an increase of about 100% is necessary to reach my claimed figure, that increase must be relative to ~128 bits of entropy.

You’ve written that you did not assert that the 256-bit hash of a file with 256 bits of entropy will have approximately 128 bits of entropy. Therefore, I must have misunderstood the words quoted above.

What did “~50% of that” mean in the quoted language?

MarkH July 19, 2021 7:22 PM

Hash Entropy Update:

I’ve spent a few hours taking a good look at the proof I mentioned above, with my usual head-scratching (and consequent splinters under the fingernails).

It’s here on crypto.stackexchange, in the first answer (under the bold headline Expected entropy in the output of a random oracle).

It actually doesn’t respond directly to the page’s original question, which is about hash truncation. But it does quantify how much entropy is lost when applying an n-bit hash to a source with n bits of entropy.

The author, known on stackexchange as “fgrieu”, is a frequent contributor on cryptography questions.

As its headline says, the proof is based on a random oracle, and applies to actual hashes only to the extent that they well approximate the distributional properties of a random oracle … which in my understanding all cryptographic hashes are supposed to do.

The proof itself is pretty straightforward — starting with Shannon’s formula for entropy — though I spent plenty of time to make sure that I followed the logic, and the algebra (in the interest of compactness, I presume) makes some jumps I needed to write out in order to verify.

Its conclusion that less than one bit of entropy is lost in such a hash is based on a formula for expected n-way collisions among random selections, which I haven't yet proved to myself (or found online).

Interestingly, the author used a pseudo-random number generator to tabulate experimental collision frequencies, which are presented below the proof. The results very accurately conform to the cited formula for expected n-way collisions. [No hash function was used in this tabulation, but to the extent a hash approximates an ideal hash function, similar results would be expected.]
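
For anyone who wants to repeat that kind of tabulation at toy scale, here is a minimal sketch (a 22-bit space and glibc's random() standing in for a random oracle; fgrieu's actual experiment used a proper PRNG at larger sizes). It counts how many outputs are hit exactly k times and compares with the N/(e·k!) prediction:

~~~
/* Throw N "inputs" uniformly at random into N "hash outputs" and count
 * how many outputs are hit exactly k times.  A random oracle at full width
 * should give about N/(e*k!) outputs with exactly k preimages.
 * Build: gcc oracle_sim.c -lm -o oracle_sim   (about 4 MB of counters)
 */
#include <stdio.h>
#include <stdlib.h>
#include <math.h>

#define WIDTH 22                      /* 22-bit toy "hash": 2^22 outputs */
#define N (1u << WIDTH)

int main(void) {
    static unsigned char hits[N];     /* times each output value occurred        */
    unsigned long freq[16] = {0};     /* freq[k] = outputs hit exactly k times   */

    srandom(1);
    for (unsigned i = 0; i < N; i++)
        hits[random() & (N - 1)]++;   /* stand-in for hashing input i */

    for (unsigned v = 0; v < N; v++)
        if (hits[v] < 16) freq[hits[v]]++;

    for (int k = 0; k < 8; k++)
        printf("k=%d  observed %8lu   expected %10.1f\n",
               k, freq[k], N / (exp(1.0) * tgamma(k + 1.0)));   /* tgamma(k+1) = k! */
    return 0;
}
~~~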

Those interested in this question might enjoy taking a look at fgrieu's work.

Clive Robinson July 20, 2021 4:40 AM

@ MarkH,

the proof is based on a random oracle, and applies to actual hashes only to the extent that they well approximate the distributional properties of a random oracle … which in my understanding all cryptographic hashes are supposed to do.

The random oracle model has a problem with a continuous distribution model in that it cannot go back in time, thus it has a "window effect" at the edges that decays after each full iteration.

To see why: as I've explained, the hash map function H is an N-bit to N-bit mapping that is bijective for good reason. Thus the first bit can only carry forward within the mapping and the last can only carry back into the N-bit mapping, whilst those in the middle have greater freedom.

But you then have to consider the chain function C, which is more complicated, especially when you consider it as a vector addition over GF(2) or as addition across GF(2^2N), as it affects what does or does not have carry forward.

You also have to consider how bits of entropy effectively cancel out as they carry down the chain.

Thus the probability curves change as you use the combined hash map H and chain addition/map function C: a fast-ish attack with slow-ish decay, and a flat-ish response for the rest of the time.

I suspect most are saying "but that's obvious" (and it should be, with hindsight 😉) but you'd be surprised how often it does not come through into the equations that get written down…

But the point still holds: the hash map H is a lookup table / ROM equivalent and it's fully deterministic. Whilst in theory it's a "random oracle", in practice they are not, to ensure the avalanche criteria hold as beneficially as possible when used "within the chain mode" C, which skews with time by carrying forward, effectively indefinitely, via the feedback function that takes two N-bit inputs and provides an N-bit input to H.

It's down to the designer whether they use vector "bitwise" addition, which does not carry, or block-wide addition that carries from the LSB to the MSB across the block, and what adjustments are made.

Then there is the question of other functions for "mixing" such as multiplication or X², BUT they all bring in "truncation" or "wrap around" issues/questions. Truncation provides a nice limit on carry with time, whereas "wrap around" with modular addition or similar does not…

But there are other fun issues, such as bent functions, which are the relatively new kid on the block and post-date the documentation mentioned.

But as I keep warning, "changes/confusion" and "diffusion" are not "entropy", and way too many people confuse them and thus get nonsense in their findings.

Oh, and as I also warned, avoid Shannon entropy as your model; it will lead you astray, as it carries quite a few implicit assumptions.

JonKnowsNothing July 23, 2021 12:27 AM

@Clive, MarkH, SpaceLifeForm, All

re: Universal Decryption Key: Consideration of methods

In one of the many recent stories of ransomware, malware, stateware, and cyberwarware, one company was able to get a Universal Master Decryptor for REvil ransomware.

What I find curious is that this is a “Master Key” purported to work across 1,500 company networks. That’s a big hole in someone’s OTP calculations.

So, how does the math work? It seems evident that they either deliberately hooked their own key or they botched up the encryption coding. It's precisely the condition no one who encrypts their code for security purposes wants to find out after the fact: that 1,500 other companies have the same decryption key.

===

ht tps://arstechnica.com/gadgets/2021/07/kaseya-gets-master-decryptor-to-help-customers-still-suffering-from-revil-attack/

  • Kaseya—the remote management software seller at the center of a ransomware operation that struck as many as 1,500 downstream networks—said it has obtained a decryptor that should successfully restore data encrypted during the Fourth of July weekend attack.
  • We obtained the decryptor yesterday from a trusted third party and have been using it successfully on affected customers …
    (url fractured to prevent autorun)

Clive Robinson July 23, 2021 5:14 AM

@ JonKnowsNothing, ALL,

What I find curious is that this is a “Master Key” purported to work across 1,500 company networks. That’s a big hole in someone’s OTP calculations.

Not of necessity… It rather depends on how the bit of the ransomware industry you come into contact with works…

In some countries paying ransom of any kind is a serious criminal offense. The reason being the politicians figure that if they stop the money supply then the ransom crimes will stop…

Which is typical political thinking; even the ancient Roman Cicero recognised that "politicians are not born, they are excreted", so what sort of thinking would you expect from a turd rather than a human?…

All that really happens with such pain-in-the-arse thinking is a faux marketplace of "middlemen" who double, triple or quadruple the price one way or another…

So your computer gets jacked and your files crypted / deep-sixed; tough, your government says pay and you go to jail for six to ten years of hard labour, sequestration of all assets, and the sale of your first born into slavery etc etc… Unless of course they don't like you for some reason, or it's a slow news day… in which case also expect to be flogged around town on a hurdle, hung by your neck until nearly dead, then eviscerated as you come to, to have your ripped-out entrails thrown on a brazier in front of you, and then be ripped asunder by horses whipped up to get the best screams out of you for the crowd to enjoy.

So what to do? Well, you visit a data recovery company that is primarily based abroad but has local offices. They kind of go through the motions of producing reports etc at your expense before sending the files off to head office in Usbeggeryouto or wherever, where a local lad has a contact who can sort out the encryption problem. But it costs about three times what it would have cost if you'd just paid the ransom…

That price hike is the tax your politicians have inflicted on you because the reality is the “local lad’s contact” is “one of the ransomware gang”…

But some ransomware guys are saying why should we lose out on "free money", so they set up their own data recovery firm with their own "local lad with a contact", thus not just getting the free loot but money laundering it as well through the data recovery firm…

So yeah, some of these data recovery firms know exactly what the encryption keys are, for two reasons,

1, They have read about cryptovirology, specifically kleptocryptography, and thus have a secret backdoor.

2, They know exactly who has been ransomed and for how much, and already know they can make the required crypto key.

The problem is, as with Dual EC DRBG, you cannot prove step 1, so you get stuffed one way or another.

It's a very profitable game and the politicians just love it, because it makes them look tough on crime when in fact they are just another turd getting flushed down the cloaca of history to an ignominious ending.

MarkH July 26, 2021 5:55 PM

Hash Entropy Update 2

Previously, I estimated that when an input bit sequence has 256 bits with maximum entropy (all other bits being known in advance), an ideal 256-bit hash of that sequence would lose 1.6 bits of guessing entropy as a consequence of collisions.

More recently, I cited a proof that the Shannon entropy for this case would be 0.83 bits less in the hash than in the input.

The estimate I made above for guessing entropy was based on a crude approximation for the distribution of collisions; since then, I’ve learned more about expected collision frequencies.

On that basis, I’ve computed that an optimal guessing strategy (if such were feasible) will go through 0.1982 of the hash output space in 50% of cases. In other words, that is the median guessing attack cost.

The mean cost is actually a little worse. For a 256-bit hash, 57-way (!) collisions are reasonably probable, but only a little more than 1/4 of possible inputs (for the 256-bit full-entropy case) map to triple (or greater) collisions; the mean guessing attack goes through 0.2381 of the output space.

Compared to 0.5 mean for simple exhaustive search, the guessing entropy loss is 1.07 bits.

The discrepancy between 0.83 bits and 1.07 bits is accounted for by the different definitions of Shannon and guessing entropy.
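
A small cross-check of two of the figures above, assuming the random-oracle collision model (the fraction of inputs landing in k-way collisions is k/(e·k!)); in particular the "little more than 1/4" figure for triple-or-greater collisions comes out as 1 − 2/e ≈ 0.264:

~~~
/* Fraction of inputs in k-way collisions under the random-oracle model.
 * Build: gcc kway.c -lm -o kway
 */
#include <stdio.h>
#include <math.h>

int main(void) {
    double e = exp(1.0);
    for (int k = 1; k <= 8; k++) {
        double frac = k / (e * tgamma(k + 1.0));   /* k/(e*k!) of inputs land in k-way outputs */
        printf("k=%d  fraction of inputs %.4f\n", k, frac);
    }
    printf("fraction of inputs in 3-way or larger collisions: 1 - 2/e = %.4f\n",
           1.0 - 2.0 / e);
    return 0;
}
~~~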

MarkH August 10, 2021 6:18 PM

@Freezing_in_Brazil et al:

In case anybody is still interested in the effect of hashing on entropy, I’m preparing to run some experimental measurements.

The straightforward approach — accessible to people like me, who lack sophistication in statistical analysis — is enumeration, which might also be called exhaustion.

Such experiments at full hash width are literally impossible for cryptographic hashes; the required computational resources don’t exist.

So my plan is to run experiments at small scales, which of course will not correspond to actual cryptography applications. However, I’m confident that the mathematical results can be extrapolated to real-world hash sizes.

Experiments can provide an opportunity to test opinion and theory against demonstrated fact.

Based on my analysis so far, storage capacity is more limiting than time of computation. I anticipate that 40 bits will be my practical upper limit (in other words, feeding inputs with up to 40 bits of entropy into a 40-bit hash function).

I’ve kindly been granted access to a rack-mount server (belonging to two young software developers) with capacities greatly exceeding those of my ancient computers.

I will post results here, as they come in.

Weather August 10, 2021 11:26 PM

@markh
I'll be interested in what you find; I'm kind of doing the same. I use 4^256 but 5^256 would make a signal stand out more.

Weather August 10, 2021 11:41 PM

@markh others
My program generates a very simple hash of a string of bytes (32). I've only had one collision so far, and that was an input of about the same length +5 with a spread of the first 118 chars of the ASCII chart; all the others were urandom or the built-in C rand function with srand(v).
SHA-2 finds it hard with a short input length, with about the same number (place 1+2+3+4+5 = number). The way it's designed not to duplicate input chars to output chars makes a signal, so strength becomes weakness.

JonKnowsNothing August 10, 2021 11:51 PM

@MarkH @All

re: The straightforward approach — accessible to people like me, who lack sophistication in statistical analysis — is enumeration, which might also be called exhaustion.

Cheers for Straightforward approaches!

Personally, I have trouble counting past my fingers… occasionally a knuckle count happens but not often anymore.

I will look forward to any and all of your finds and of course the commentary regarding the results.

Weather August 11, 2021 2:41 AM

@markh,john all
I can post the C code to this blog, but the mod / Bruce would have to OK it; it's probably longer than 32000. Plus, if I post it, I don't want it deleted.

Clive Robinson August 11, 2021 5:40 AM

@ Weather,

My program generates a very simple hash of a string of bytes(32)

Is that 32 8-bit values, giving 256 bits?

Or 32 7-bit ASCII chars, giving 224 bits?

You would expect one collision after roughly the square root of those output-space sizes: so around 2^128 and 2^112 random test samples respectively.
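
That square-root (birthday bound) rule is easy to check empirically at small scale; a sketch using a 24-bit output space and glibc's random() as a stand-in for hash outputs (all parameters here are arbitrary):

~~~
/* Mean number of draws until the first repeat from a 24-bit space;
 * expected around sqrt(pi/2 * 2^24) ~= 5130, i.e. roughly 2^(24/2).
 * Build: gcc birthday.c -lm -o birthday
 */
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <math.h>

#define BITS 24
#define SPACE (1u << BITS)

int main(void) {
    static unsigned char seen[SPACE / 8];     /* bitmap of values already drawn */
    unsigned long total = 0;
    int trials = 200;
    double pi = acos(-1.0);

    srandom(42);
    for (int t = 0; t < trials; t++) {
        memset(seen, 0, sizeof seen);
        unsigned long draws = 0;
        for (;;) {
            unsigned v = random() & (SPACE - 1);         /* stand-in for a 24-bit hash output */
            draws++;
            if (seen[v >> 3] & (1u << (v & 7))) break;   /* first repeat found */
            seen[v >> 3] |= 1u << (v & 7);
        }
        total += draws;
    }
    printf("mean draws to first collision: %.0f (sqrt(pi/2 * 2^%d) = %.0f)\n",
           (double)total / trials, BITS, sqrt(pi / 2.0 * SPACE));
    return 0;
}
~~~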

Clive Robinson August 11, 2021 7:14 AM

@ JonKnowsNothing,

Personally, I have trouble counting past my fingers… occasionally a knuckle count happens but not often anymore.

Make that two of us 😉

The old hands are decidedly worse on coming out of hospital; not sure why, but the rest of me aches like I've just run a half marathon, so I'm assuming it's "change in meds" related and will settle down.

The down side is of course it makes you quite nervous at the top of a flight of steps… The odd bump or three on the way down would be a painful nuisance. But the real worry is forming a messy pile at the bottom… It is looked upon by so many these days as either an annoyance to their day or, worse, a photo op with their mobile phone; I'm not sure which is worse…

I had it yesterday coming home from another hospital scan. At a railway station that's "original" and has been there for a century or so, the stairs are a bit narrow. So there I am clumping down on the sticks, and some rude people making loud comments and, worse, pushing past because they thought 30 secs was going to make some big difference in their life…

Not what you want when you do not have a firm grip on things physically…

If I was allowed to drive, I think I might go for a Chieftain tank or similar. A bit heavy on the gas, but people tend not to argue with it, as a friend who's got a couple of tanks reminds me from time to time; when they take them to shows and film sets, for some reason priority is not a problem 😉

MarkH August 22, 2021 5:20 PM

@Freezing_in_Brazil, JonKnowsNothing, et al:

I’ve been spending an unreasonable amount of time writing and testing software for hash entropy measurements, and anticipate posting some results in this coming week.

Inevitably, the project includes a variety of small problems, some of which may be of interest to programmers.

Approximately, the number of steps required is exponential in the number of bits of input entropy, and the storage required is exponential in the number of bits (width) of the hash.

Because the number of steps for larger cases would be fairly prodigious, I intended from the start to use threading to take advantage of the power of symmetric multiple processor machines.

There’s only one big data structure, an array of counts of how many times any particular hash value was generated — I call this the histogram. It is indexed by the hash output (or hash code, as I sometimes call it).

Although this situation is “dirt simple”, making it work in multi-processing is harder than I expected. I’ve tried a variety of schemes to enforce atomicity of histogram updates, and have not yet succeeded in efficiently harnessing more than 3 CPU cores, because histogram contention is so costly. [In practice, I used 4 cores, but the completion time improvement over 3 cores is very slight.]

The good news is that histogram updates are almost atomic1, so I can disable mutual exclusion and get the benefit of as many cores as are available.

I benchmarked a 35-bit test using 8 cores on the server in about eighteen minutes [for now, I’m testing at “full width” such that the numbers of input bits and hash bits are equal]. On the available hardware, with the histogram in RAM, my hash width limit will be 36 bits.

Putting the histogram on a filesystem would enable expansion to perhaps 40 bits, but I anticipate the execution time penalty with anxiety.

  1. I looked at the gcc assembly output, and the increment action itself seems to be atomic. I’ve noticed very slight discrepancies between runs (literally plus or minus one or two counts out of billions; these minute errors have no meaningful effect on entropy measures). Though I haven’t analyzed the cause, I guess that the atomic increment gets executed on the wrong index because the loading of the address register is of course a separate instruction. 
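
One standard way around that contention, offered only as a sketch and not as the code described above, is to give each thread a private histogram and merge once at the end. The toy 20-bit "hash" below is just an illustrative multiplicative mixer, and the approach only works while the histogram is small enough to replicate per thread (it would not fit at 36+ bits):

~~~
/* Per-thread histograms merged after the workers finish: the shared array
 * is never touched concurrently, so no locking is needed during counting.
 * Build: gcc histo.c -pthread -o histo
 */
#include <stdio.h>
#include <pthread.h>

#define WIDTH   20
#define BINS    (1u << WIDTH)
#define THREADS 4
#define PER_THREAD (BINS / THREADS)

static unsigned local_hist[THREADS][BINS];   /* one private histogram per thread */

static unsigned toy_hash(unsigned x) {       /* illustrative mixer, not a real hash */
    x *= 2654435761u;
    x ^= x >> 15;
    return x & (BINS - 1);
}

static void *worker(void *arg) {
    unsigned id = (unsigned)(size_t)arg;
    for (unsigned i = id * PER_THREAD; i < (id + 1) * PER_THREAD; i++)
        local_hist[id][toy_hash(i)]++;       /* no locking: private array */
    return NULL;
}

int main(void) {
    pthread_t th[THREADS];
    static unsigned long hist[BINS];

    for (size_t t = 0; t < THREADS; t++)
        pthread_create(&th[t], NULL, worker, (void *)t);
    for (size_t t = 0; t < THREADS; t++)
        pthread_join(th[t], NULL);

    for (unsigned v = 0; v < BINS; v++)      /* single-threaded merge at the end */
        for (int t = 0; t < THREADS; t++)
            hist[v] += local_hist[t][v];

    unsigned long used = 0;
    for (unsigned v = 0; v < BINS; v++)
        if (hist[v]) used++;
    printf("%lu of %u bins used\n", used, BINS);
    return 0;
}
~~~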

Clive Robinson August 22, 2021 5:44 PM

@ ALL,

Did anyone else get the joke behind the article title,

When an Eel Climbs a Ramp to Eat Squid From a Clamp, That’s a Moray.

If you're not sure, imagine Dean Martin singing it…

Or the line,

“When the moon hits your eye like a big pizza pie, that’s amore”

https://m.youtube.com/watch?v=69O4PXzAQ5Y

MarkH August 22, 2021 7:20 PM

@Clive:

I eventually got the joke, though it was probably a week or more after I first saw the headline.

You, Bruce and I are near contemporaries. My expectation is that younger folks (and those not well-versed in English language pop culture) were most unlikely to catch it.

Weather August 22, 2021 10:06 PM

@markh all
If you haven't done parallel programming, have a look at openmp.org; it's a header that takes care of for/do loop parallelism and sections.

MarkH August 23, 2021 7:54 PM

Some Meta on Hash Functions

When I first set out to do entropy experiments, I compiled a short list of possible hash functions to use, mainly because I was worried about test execution time.

As my development process has proceeded, the time consumed by hashing seems not to be such an important factor. However, for the sake of Germanic thoroughness, I decided to check out some alternatives.

Most of my testing so far has used MD5, which is (a) an actual cryptographic hash (though long deprecated for signing and authentication), and (b) pretty fast.

If you’ve been following along, you know I don’t intend to do experiments at more than 40 bits of hash width. So what do I do with a 128-bit hash function?

According to my studies, any chosen subset of bits from the outputs of an ideal hash function forms a smaller ideal hash function. Of course, neither MD5 nor any other function is an ideal hash function — but all crypto hash functions are designed to approximate ideal hash functions (I’ll be writing more about that later).

Although I haven’t been running conventional tests for randomness, I’ve been using a proxy which is perfect for my purposes. While working on predictions of what should be the entropy in the hash of an input with specified entropy, I needed to learn about hash collisions; I’ve written some about that above.

The more nearly the distribution of hash function outputs matches a random distribution, the more nearly the incidence of collisions will follow a simple formula … so I’m using agreement with that formula as a stand-in for “randomness” of the hash function’s output distribution.

As I expected, MD5 scores well by this standard; truncated to 32 bits, incidence of collisions matches the formula with less than 0.1% error up to 5-way collisions, and less than 1% error up to 9-way collisions.

What was educational for me, is that the non-crypto hashes I tested have completely different distributions. It makes sense now that I see it, but I never thought about it before (non-crypto hashes are used for allocating entries more evenly among indexed tables, and the like).

========================

Table hashes (as I will call them) are intended to maximize dispersion of outputs, whereas crypto hashes are intended to approximate a random distribution.

The distinction is a little subtle. An output pattern scattered as though at random should be widely dispersed, no?

But in fact, crypto hashes produce lots of collisions. At "full width" (literally, hashing the numbers 0 through 2^n − 1, where n is the hash width) more than 63% of inputs collide with other inputs (yielding the same hash output).

In contrast, the whole point of table hashes is to minimize collisions. They are not loaded down with the many security constraints which apply to crypto hashes.

In my tests, SuperFastHash (a 32-bit table hash) was remarkably “flat”, with less than 9% of inputs resulting in collisions at full-width, and output entropy about 31.9 bits.

However, I was astonished by XXH32, which at full width produced no collisions whatsoever, a perfect bijective mapping. It was strange to see my entropy measures at 32.000 bits …

========================

Another thing I learned along the way, is that a low-bits truncation of a maximum-dispersion hash looks like a random distribution hash, with excellent accuracy. Up to about 2/3 of the hash width, the hash entropy agrees with random distribution predictions to better than four decimal places.

It happens that XXH32 has a big brother, XXH64; so within my range of planned tests, a truncation of XXH64 will well approximate the statistical properties of a crypto hash.

XXH64 runs in less than 2/3 the time of MD5 in my tests.

MarkH August 24, 2021 10:11 PM

@Weather,

Thanks for the pointer to openmp.org

I first learned about them only a week or so ago, and made the guesses that (a) it would take me a while to figure out how to use their tech, and (b) it’s not very likely to help.

I’ve designed concurrent software since the 80s so I suppose that I know how to do it (famous last words, as we say). My judgment is that the resource contention problem is inherent in the computation I set out to do.

In a simpler context, I would just disable interrupts during the critical section, which probably is shorter than a single-gate propagation delay in the 7400 series TTL I used to design with. With modern CPUs and operating systems, disabling interrupts to enforce atomicity doesn’t seem to be an option.

Something I’d like to investigate is the use of a spin-lock rather than an OS mutex, because the overhead of process switching is supposed to be large. If it runs quicker, I don’t care that the cores are “lit up” busy-waiting: 100% CPU utilization is the intent.
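
A minimal sketch of that kind of spin-lock, using C11 stdatomic (the shared counter here just stands in for a histogram cell; in the real code the lock would wrap the histogram update):

~~~
/* Spin-lock built on atomic_flag: the waiting core busy-waits instead of
 * sleeping in the kernel, which is the intent when 100% CPU is acceptable
 * and the critical section is very short.
 * Build: gcc spin.c -pthread -o spin
 */
#include <stdio.h>
#include <stdatomic.h>
#include <pthread.h>

#define THREADS 4
#define PER_THREAD 1000000

static atomic_flag lock = ATOMIC_FLAG_INIT;
static unsigned long counter;          /* stands in for a shared histogram cell */

static void spin_lock(void)   { while (atomic_flag_test_and_set_explicit(&lock, memory_order_acquire)) ; }
static void spin_unlock(void) { atomic_flag_clear_explicit(&lock, memory_order_release); }

static void *worker(void *arg) {
    (void)arg;
    for (int i = 0; i < PER_THREAD; i++) {
        spin_lock();
        counter++;                     /* the short critical section */
        spin_unlock();
    }
    return NULL;
}

int main(void) {
    pthread_t th[THREADS];
    for (int t = 0; t < THREADS; t++) pthread_create(&th[t], NULL, worker, NULL);
    for (int t = 0; t < THREADS; t++) pthread_join(th[t], NULL);
    printf("counter = %lu (expected %d)\n", counter, THREADS * PER_THREAD);
    return 0;
}
~~~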

MarkH August 24, 2021 11:02 PM

Freezing_in_Brazil, JonKnowsNothing, et al:

Here’s an example of output from a 32-bit hash over a set of inputs with 32 bits of entropy. In this case, the inputs to the hash function are simply the integers 0 through 2^32 – 1.

It’s a scenario that would not make sense to do in any practical application, but it serves the purposes of:

• demonstration
• illustration
• establishing a basis for comparison

MD5, Least Significant Word:
freq count gamma ratio
0 1580028551 19.90
1 1580051649 16.17 1.0
2 790000617 15.74 2.0
3 263321382 13.92 3.0
4 65836283 15.25 4.0
5 13174250 10.81 5.0
6 2197846 9.35 6.0
7 312357 8.10 7.0
8 39522 6.87 7.9
9 4312 6.69 9.2
10 478 3.35 9.0
11 48 2.23 10.0
13 1 -1.56 48.0
Entropy in bits:
Shannon guessing(median) guessing(mean)
31.173 30.665 30.930

  1. The output of the MD5 hash function is 128 bits; the experiment used the least-significant 32-bit “word” of the hash output as a 32-bit hash function.
  2. I slightly modified MD5 for speed: for security of signing, the specification includes a step of appending the input bit count before making a final hash; because I’m using it as a conditioning function, this adds no value, and I omit it.
  3. Most of the data presented above is a “histogram of the histogram” showing hash function outputs by frequency of occurrence. For example, 1,580,051,649 distinct hash output values occurred exactly once, and there were 478 ten-way collisions (corresponding to 4,780 distinct inputs). All the measures of entropy are derived from this frequency table.
  4. The frequency = 0 case (first row of numbers) shows how many of the 2^32 possible outputs did not occur in the experiment.
  5. The “gamma” column is my made-up terminology for the goodness of fit to the formula predicting how many collisions would occur for a perfectly random distribution of outputs, measured in bits (log base 2). So the 19.9 value for frequency zero means that the number of “missed codes” agrees with the formula to almost 20 bits.
  6. The formula for probability of j-way collisions is 1 / (e j!); because of the factorial in the denominator, the number of expected 3-way collisions is 1/3 of the number of 2-way collisions, and so on. The “ratio” column presents the number of hash outputs counted in the previous row, divided by the count of hash outputs in the current row. It can be seen that up to 9-way collisions, the ratios agree well with the prediction.
  7. The last rows of the experimental output give three distinct measures of entropy; the Shannon entropy agrees accurately with that predicted on the basis of the collision probability formula.

These experimental results support two principal hypotheses:

• The 32-bit hash function synthesized by selecting 32 bits from MD5 output exhibits collisions (and therefore entropy loss) as expected for a random distribution.

• When the entropy of the input equals the hash width, about 1 bit of that entropy is lost in the hashing process.

Please note well that I offer the foregoing as a morsel of empirical confirmation. I do not claim it as “proof”.

MarkH August 24, 2021 11:07 PM

PS:

Drat! I used pre tags, which in “preview” showed the experimental output in a fixed-pitch font.

That formatting was obviously thrown away by the commenting forum software …

If anybody knows a way to dependably get fixed-pitch fonts displayed here, kindly tell me how!

Weather August 25, 2021 2:09 AM

@markh (
Mutex and atomic: I've seen those instruction calls; I did think you knew what you were talking about. But interrupts aren't used, per se; it's more absolute address to virtual.

MarkH August 25, 2021 3:37 AM

Trying again, as a “fenced code block” instead of using the pre tag (see my last 2 comments above):

MD5, Least Significant Word:
freq count gamma ratio
0 1580028551 19.90
1 1580051649 16.17 1.0
2 790000617 15.74 2.0
3 263321382 13.92 3.0
4 65836283 15.25 4.0
5 13174250 10.81 5.0
6 2197846 9.35 6.0
7 312357 8.10 7.0
8 39522 6.87 7.9
9 4312 6.69 9.2
10 478 3.35 9.0
11 48 2.23 10.0
13 1 -1.56
Entropy in bits:
Shannon guessing(median) guessing(mean)
31.173 30.665 30.930

Because preview is unreliable, I’ll find out when this version is published …

MarkH August 25, 2021 3:55 AM

Aargh … in the preview, all of my spacing was preserved. In publication, leading spaces are suppressed, and all multiple spaces are replaced with one space.

I’ll try to preserve spacing using two fairly brutal techniques:

MD5, Least_Significant_Word:
freq_________count______gamma___ratio
_0_______1580028551_____19.90
_1_______1580051649_____16.17____1.0
_2________790000617_____15.74____2.0
_3________263321382_____13.92____3.0
_4_________65836283_____15.25____4.0
_5_________13174250_____10.81____5.0
_6__________2197846______9.35____6.0
_7___________312357______8.10____7.0
_8____________39522______6.87____7.9
_9_____________4312______6.69____9.2
10______________478______3.35____9.0
11_______________48______2.23___10.0
13________________1_____-1.56___48.0
Entropy in bits:
Shannon guessing(median) guessing(mean)
31.173________30.665________30.930

MD5, Least Significant Word:
freq         count      gamma   ratio
 0       1580028551     19.90
 1       1580051649     16.17    1.0
 2        790000617     15.74    2.0
 3        263321382     13.92    3.0
 4         65836283     15.25    4.0
 5         13174250     10.81    5.0
 6          2197846      9.35    6.0
 7           312357      8.10    7.0
 8            39522      6.87    7.9
 9             4312      6.69    9.2
10              478      3.35    9.0
11               48      2.23   10.0
13                1     -1.56   48.0
Entropy in bits:
Shannon guessing(median) guessing(mean)
31.173        30.665        30.930

MarkH August 25, 2021 4:32 AM

For the record, I finally got the format I needed by:

(a) delimiting the text by lines consisting only of three tilde (~) characters, to make a fenced code block displayed in fixed-pitch font; and

(b) replacing every space with an html nbsp (non-breaking space) entity, to stop the suppression or condensation of spaces.

To borrow from Erik Satie, the process of figuring this out was much more pleasurable than getting bitten by a monkey.

MarkH August 25, 2021 2:02 PM

.
A Subset of Digits from a Random Distribution Has a Random Distribution

I mentioned this interesting property in a previous comment. The data I showed above are from the least significant word of MD5 hashes1.

For comparison, here’s the same computation based on each of the other three words:

MD5, Second to Least Significant Word:
freq         count      gamma   ratio
 0       1580066732     15.40
 1       1579968188     14.64    1.0
 2        790034980     15.28    2.0
 3        263336271     16.94    3.0
 4         65844241     12.74    4.0
 5         13165291     12.98    5.0
 6          2194130     12.59    6.0
 7           313145      9.79    7.0
 8            39297      8.48    8.0
 9             4521      4.71    8.7
10              451      4.80   10.0
11               46      2.62    9.8
12                3      3.47   15.3
Entropy in bits:
Shannon  guessing(median) guessing(mean)
31.173         30.665        30.930

MD5, Second to Most Significant Word:
freq         count      gamma   ratio
 0       1580042004     17.03
 1       1580031439     20.25    1.0
 2        789993593     15.17    2.0
 3        263333919     15.86    3.0
 4         65845409     12.57    4.0
 5         13168367     13.15    5.0
 6          2194510     16.50    6.0
 7           313577     11.96    7.0
 8            39500      6.97    7.9
 9             4489      5.01    8.8
10              440      6.57   10.2
11               46      2.62    9.6
12                3      3.47   15.3
Entropy in bits:
Shannon  guessing(median) guessing(mean)
31.173         30.665        30.930

MD5, Most Significant Word:
freq         count      gamma   ratio
 0       1580078756     14.99
 1       1579971311     14.71    1.0
 2        790008567     16.89    2.0
 3        263340988     16.61    3.0
 4         65842592     13.01    4.0
 5         13171984     11.34    5.0
 6          2195509     11.07    6.0
 7           313517     14.01    7.0
 8            39243      9.46    8.0
 9             4350     10.04    9.0
10              442      6.05    9.8
11               35      3.11   12.6
12                1      0.52   35.0
13                1     -1.56    1.0
Entropy in bits:
Shannon  guessing(median) guessing(mean)
31.173         30.665        30.930

The frequency histograms differ in detail, but the general patterns match, and the entropy measures agree to 5 decimal places.

  1. As it happens, I’m using these 32-bit words “byte reversed” — the GNU MD5 functions copy out the hash in “big endian”; taking a word subset on Intel architecture has the effect of reversing the order. I don’t bother to correct this, because permutations of the bits will have negligible effect on the statistical distribution. 

name.withheld.for.obvious.reasons August 25, 2021 2:49 PM

@MarkH
Here’s hoping the monkey bite wasn’t that deep or rabid.

Meeting where you are at, glad to hear that resolution and satisfaction are coterminous with your effort. Will of course leverage your hard work employing your hard fought formatting victory, a commensurate theorem, the lemmas are short and sweet–proofs not so much.

To add to your distribution space or vector (termed in either a number space w/kernel, or as classic vector) sequence unity (sums, co-product) within the space can be analyzed. Kind of like the size of the key space, data, and transform that is serialized demonstrates a randomness across the full cryptographic stream/space. I have this formalized in my hard notes, need to pull that out.

Irrespective of the clarity of the above paragraph, are you attempting to do keyspace discovery or cryptanalysis via multiple functional and computational methods, just curious.

MarkH August 25, 2021 4:30 PM

@name.withheld.for.obvious.reasons:

are you attempting to do keyspace discovery or cryptanalysis via multiple functional and computational methods?

Simply, I’m exploring the relationship between the entropy of a bit sequence, and a hash thereof.

SpaceLifeForm August 25, 2021 5:54 PM

@ MarkH, name.withheld.for.obvious.reasons, Clive

Good report.

Yes, you definitely want dispersion if dealing with a caching algorithm. You definitely do not want crypto hash for that. You want it to be as fast as possible.

Throw blake2b into your analysis. Curious.

I've dealt with a cache mechanism that had to manage data accessed via code written in Forth. I improved the cache. While I got a major speedup, ultimately there was a bottleneck. No matter what hash algorithm I threw at it, there was still a bottleneck. It turned out to be a major design problem: there were too many 'hot blocks' that had contention and had to be locked to prevent corruption. The locking mechanism required IPC. That is a major performance problem when the locking mechanism crosses processors. Tandem.

The application could never, ever, scale.

MarkH August 25, 2021 6:39 PM

@SpaceLifeForm:

You raised an important point.

Much is made of the difficulties of programming for concurrency, and with good reason — it’s easy to make a subtle mistake leading to deadlock, or invalid results.

But assessing the suitability of problems to parallel computation is another kind of challenge, and perhaps a deeper one.

I failed to think the problem through sufficiently, and felt confident that my frequency tabulation was ripe for parallel computation.

I’m lucky that running thread-unsafe gives me nearly perfect results.

Just yesterday, as a check I ran a trial thread-safe with 8 threads (on the 8-core rackmount, out of habit). I saw from my progress indicator that it was creeping like a snail … so I aborted the run, and restarted for 4 cores. Dramatically faster.

MarkH August 28, 2021 12:15 AM

.
Entropy Lost in Hashing Converges to Certain Constants

The table below presents entropy measurements for outputs of a hash function of specified width, when presented with inputs having entropy equal to the width (as described in previous comments). The hash function has a crypto-type (randomized) output distribution. Mean guessing entropy is omitted for the smallest widths, because the measurement code I’m using is not compatible with such small data sets.

bits  Shannon guess(med.) guess(mean)
====  ======= =========== ===========
  8     7.115     6.644
  9     8.137     7.600
 10     9.134     8.592
 11    10.210     9.704
 12    11.196    10.702     10.956
 13    12.169    11.664     11.926
 14    13.172    12.665     12.930
 15    14.170    13.659     13.926
 16    15.174    14.665     14.930
 17    16.176    15.667     15.933
 18    17.174    16.666     16.931
 19    18.174    17.666     17.931
 20    19.172    18.664     18.929
 21    20.173    19.665     19.930
 22    21.172    20.665     20.929
 23    22.173    21.665     21.930
 24    23.173    22.665     22.930
 25    24.173    23.665     23.930
 26    25.173    24.665     24.930
 27    26.173    25.665     25.930
 28    27.173    26.665     26.930
 29    28.173    27.665     27.930
 30    29.173    28.665     28.930
 31    30.173    29.665     29.930
 32    31.173    30.665     30.930
 

For widths greater than 22 bits, the entropy losses for each type of entropy are fixed to four significant decimal digits. The entropy loss measures are pretty near their asymptotic values by 13 bits.

The Shannon entropy loss (in bits) for the “full-width hash” condition is an infinite series listed as A193424 in the Online Encyclopedia of Integer Sequences.

To find an analytic expression for the losses of guessing entropy would need mathematical skill beyond mine. The median guessing entropy loss is very simple to derive numerically, and computing the mean guessing entropy loss is also not difficult.

All three of the entropy loss constants derive from the expected collision frequencies for randomized distributions.
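
As a numeric sketch, the asymptotic Shannon loss in the table can be reproduced from the random-oracle collision frequencies alone: a fraction k/(e·k!) of inputs lands in k-way collisions, so the expected output entropy of an n-bit full-width hash is n − Σ_{k≥2} k·log2(k)/(e·k!), a loss of about 0.827 bits (32 − 0.827 = 31.173, matching the Shannon column):

~~~
/* Asymptotic Shannon entropy loss for a full-width random-oracle hash.
 * Build: gcc shannon_loss.c -lm -o shannon_loss
 */
#include <stdio.h>
#include <math.h>

int main(void) {
    double e = exp(1.0), loss = 0.0;
    for (int k = 2; k <= 30; k++)                       /* series converges very quickly */
        loss += k * log2((double)k) / (e * tgamma(k + 1.0));
    printf("asymptotic Shannon entropy loss: %.4f bits\n", loss);
    printf("predicted 32-bit full-width hash entropy: %.3f bits\n", 32.0 - loss);
    return 0;
}
~~~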

SpaceLifeForm August 28, 2021 4:54 PM

@ MarkH, name.withheld.for.obvious.reasons

Very interesting numbers.

Are you using Linux?

Can you try blake2b instead of MD5?

We know MD5 is not trustable.

I think you got lucky when using 4 vs 8, in that the processes magically got spread.

Seriously, try more tests. Vary number of threads with or without SMT enabled.

Specifically, you want to test with 4,5,6,7,8 with SMT enabled, and then with SMT disabled. And time the tests.

I think you will be surprised.

I am pretty sure you encountered LIVELOCK at the microcode level.

MarkH August 28, 2021 9:27 PM

@SpaceLifeForm:

Are you using Linux?

Of course!

Can you try blake2b instead of MD5?

Maybe at some point, but it’s rather time consuming to add a new hash function, and I don’t expect it will be useful.

We know MD5 is not trustable.

I’m glad you brought this up because so many people share the same misunderstanding. MD5 is very insecure for applications such as signing.

However, as I have written multiple times — including above on this thread! — these properties do not affect its usefulness as a conditioning function to yield high-entropy-per-bit output from an extended input sequence with relatively low entropy per bit.

The particular vulnerability of MD5 to collision attacks might reduce output entropy microscopically, but surely not enough to measure; it does not impair the usefulness of MD5 as a conditioning function.

To put this scheme into practice, any of us would use a more modern hash (remember, Freezing_in_Brazil proposed SHA256). I am confident that anything MD5 can do in my tests, a newer approved hash will do at least as well if not better.

For my experiments, the old & fast function is suitable. But I anticipate that the conclusions are (and will continue to be) general: they do NOT depend on the specific function used.
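To make the conditioning role concrete, here is a minimal sketch of the pattern under discussion (the low-entropy source below is illustrative, not the input used in these experiments):

# Sketch of a hash used as a conditioning function: a long input whose
# bits individually carry little entropy is hashed, and only the short
# digest is kept.  The choice of hash (MD5 here, SHA-256 in practice)
# matters for security properties such as collision resistance, not for
# this entropy-gathering role.
import hashlib
import os

# Illustrative stand-in for a low-entropy-per-bit source: 64 KiB of
# bytes in which only the lowest bit of each byte varies.
raw = bytes(b & 1 for b in os.urandom(65536))

print(hashlib.md5(raw).hexdigest())      # 128-bit conditioned output
print(hashlib.sha256(raw).hexdigest())   # 256-bit conditioned output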

I think you got lucky when using 4 vs 8. That the processes magically got spread.

I’m not quite clear what you mean — in my testing, when I run threadsafe, the incremental gain from 3 to 4 cores is very small. It doesn’t surprise me at all that the slope of the running-time curve reverses with more parallelism: the cores spend most of their time in mutex wait.

Try a little thought experiment: imagine a billion-core computer with every core running a thread requiring exclusive access to the same shared memory. What would you expect to happen?

Vary number of threads with or without SMT enabled.

My cheap little computer does not have SMT/hyperthreading capability. As I was testing various mechanisms to speed thread-safe operation, I tested each of them at different thread counts (the number of threads is a command line argument). Nothing I tried gave any benefit beyond 4 cores.

I am pretty sure you encountered LIVELOCK at the microcode level.

I can’t prove that this isn’t so, but I’m reasonably certain of it. It’s not a question of conflicting operations synchronized within some fraction of a nanosecond.

At present, I have each thread running 128K hashes and collecting the outputs in an array. Each batch of outputs is then applied to the histogram en masse — there are many milliseconds between mutex calls.

I even tried pseudorandomly varying the number of results batched before updating the histogram to prevent the threads from “lock stepping” … it just slowed things down a little.
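For clarity, the batching pattern described above looks roughly like this (a minimal Python sketch of the structure only; the real test program, batch size, and hash handling are different):

# Each worker computes a local batch of hash outputs, then takes the
# shared lock once per batch to fold the batch into the histogram, so
# the mutex is touched only every few thousand hashes.
import hashlib
import threading
from collections import Counter

BATCH = 4096            # illustrative; the tests above batch 128K results
NUM_THREADS = 4
histogram = Counter()
hist_lock = threading.Lock()

def worker(start, count):
    local = []
    for i in range(start, start + count):
        h = hashlib.md5(i.to_bytes(8, "little")).digest()
        local.append(h[:4])                 # keep a 32-bit slice
        if len(local) == BATCH:
            with hist_lock:                 # one lock acquisition per batch
                histogram.update(local)
            local.clear()
    with hist_lock:                         # flush the final partial batch
        histogram.update(local)

threads = [threading.Thread(target=worker, args=(t * 100_000, 100_000))
           for t in range(NUM_THREADS)]
for t in threads: t.start()
for t in threads: t.join()
print(len(histogram), "distinct 32-bit hash values seen")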

MarkH August 28, 2021 10:07 PM

.
A Bit of Philosophy

(kindly forgive the pun)

I previously mentioned that in cryptography, “entropy” is usually applied as a measure of uncertainty. For this reason, it is not a strictly objective measure, but rather depends on the state of knowledge of some observer(s).

A specific state of an information system, from the perspective of three distinct observers, may have three different measures of entropy. In cryptography, entropy is typically a measure of what some presumed adversary is unable to know (within some set of assumptions about what the adversary can do).

It came to me today that there’s another dimension to the observer-dependence of entropy.

As originally applied to information, entropy is defined in terms of the probabilities of each of the possible outcomes of some process of selection.

I’ve made many comments in this thread concerning the entropy of hash function outputs.

For a thought experiment — as an extremal case — imagine that a hash function is used in an application where only two inputs are possible, and that they are equally probable.

Assuming that those inputs don’t (by some stroke of amazing bad luck) collide, only two hash outputs can occur in the imaginary application.

Because (within the constraints of this application) only those two outputs are possible — however long and gaudy their representation might be — literal application of the definition of Shannon entropy would measure it at one bit. The probability of each of the two codes (for any given selection) is exactly one half.

========================

Now for the tricky part … suppose that the hash is SHA256. If I’m looking at a hash output from the application — but I don’t know its constraints — then I must suppose that any number from 0 to 2^256 – 1 is a possible hash value.

Applying the literal definition of Shannon entropy to just those two codes, each taken with probability 2^-256 under that assumption, I would compute the entropy as approximately 4.4 times 10^-75 bits.

The difference here, is knowledge of what the possible selection outcomes are.

From a security perspective, it’s mandatory to apply the first measure: if an adversary doesn’t already know that there are only two possible outputs, it must be presumed that he will discover this soon enough.

Clive Robinson August 29, 2021 3:05 AM

@ MarkH,

For a thought experiment — as an extremal case — imagine that a hash function is used in an application where only two inputs are possible, and that they are equally probable.

If I understand what you have written this is a fairly normal condition.

Regard the hash as just a crypto function, which is what it is. What you are describing is the case of a user pressing Y or N in response to a message on a terminal. It matters not how big or complex the output is from the crypto function the result is still the same for each Y or each N thus it is a “simple substitution cipher” also called “being used in ECB mode”[1] (which is generally considered a No No in security advice).

I’ve mentioned this quite a few times in the past on this blog.

There are two basic solutions,

1, Use a different mode.
2, Pad with a “nonce”.

Where a nonce is a “number used once” and, if the rules are followed, turns it into a sort of “poor man’s OTP”. However the use of a “nonce” is not recommended, as programmers who implement crypto have a very bad habit of not following the rules.

It’s also why I pointed out to you originally that the “true entropy” of a crypto stream is that of the “plaintext” message, not that of the “ciphertext” an observer sees.

As I’ve pointed out, as an observer of a black box output you can not tell, by the tests we currently have, if what is in the black box is a “True Random Number Generator” (TRNG) or a deterministic algorithm of sufficient complexity such as AES256 in CTR mode.

Which is why the likes of all those alleged TRNGs on Intel and similar chips hidden behind “magic pixie dust” hash functions are highly suspect at best. As a rule of thumb, if a TRNG designer does not provide “front panel access” to the actual entropy source output, don’t use it… Because at the very least you can not test it to see if it is functioning…

[1] http://cryptowiki.net/index.php?title=Electronic_Code_Book_(ECB)

MarkH August 30, 2021 12:19 PM

.
Entropy Lost in Hashing Increases as Input Entropy Approaches the Hash Width

I suppose this is among the most straightforward truths about the effect of hashing on input entropy.

The entropy of an n-bit hash output can never exceed n, so when input entropy H > n, entropy must obviously be lost in the hashing process.

On the other end of the spectrum, if H is much smaller than n, I expect the loss due to hash collisions to be less, so that the hash entropy is very close to H.

I’ve shown above that in the “full-width” scenario of H = n, the loss of Shannon entropy is about 0.83 bits.

To illustrate the “transfer function” of input entropy to hash output entropy, I set n = 28 bits (if it were any larger, the “excess entropy” runs would get very slow).

Column headings (all measures in bits):
H_in – input entropy
Delta – H_in minus hash width
Shannon – Shannon entropy with probabilities p based on the size of the set of inputs
G_median – guessing entropy based on the median cost of a guessing attack
G_mean – guessing entropy based on the mean cost of a guessing attack

H_in Delta Shannon  G_median  G_mean
 12   -16   12.000   12.000   12.000
 13   -15   13.000   13.000   13.000
 14   -14   14.000   14.000   14.000
 15   -13   15.000   15.000   15.000
 16   -12   16.000   16.000   16.000
 17   -11   17.000   16.999   16.999
 18   -10   17.999   17.998   17.998
 19    -9   18.998   18.997   18.997
 20    -8   19.996   19.994   19.994
 21    -7   20.992   20.988   20.989
 22    -6   21.984   21.977   21.978
 23    -5   22.969   22.955   22.955
 24    -4   23.938   23.909   23.912
 25    -3   24.878   24.816   24.827
 26    -2   25.762   25.622   25.667
 27    -1   26.547   26.199   26.384
 28     0   27.173   26.665   26.930
 29    +1   27.589   27.060   27.297
 30    +2   27.808   27.347   27.531
 31    +3   27.908   27.548   27.682
 32    +4   27.954   27.688   27.782

For small input entropies, the hash entropy is practically equal to the input entropy. As input entropy increases, the entropy measures asymptotically approach the hash width (recall that this is 28 bits in these tests).

Here are the same data, with the hash measures replaced by their difference from the input entropy — in other words, all of the measures in the three right-hand columns below show how many bits of entropy were lost in hashing.

H_in Delta Shannon  G_median  G_mean
 12   -16    0.000    0.000    0.000
 13   -15    0.000    0.000    0.000
 14   -14    0.000    0.000    0.000
 15   -13    0.000    0.000    0.000
 16   -12    0.000    0.000    0.000
 17   -11    0.000    0.001    0.001
 18   -10    0.001    0.002    0.002
 19    -9    0.002    0.003    0.003
 20    -8    0.004    0.006    0.006
 21    -7    0.008    0.012    0.011
 22    -6    0.016    0.023    0.022
 23    -5    0.031    0.045    0.045
 24    -4    0.062    0.091    0.088
 25    -3    0.122    0.184    0.173
 26    -2    0.238    0.378    0.333
 27    -1    0.453    0.801    0.616
 28     0    0.827    1.335    1.070
 29    +1    1.411    1.940    1.703
 30    +2    2.192    2.653    2.469
 31    +3    3.092    3.452    3.318
 32    +4    4.046    4.312    4.218
 33    +5    5.023    5.216    5.151

That the losses are essentially zero in the first few rows is unsurprising; there were no hash collisions until H_in reached 16 bits.

The distribution of collisions for the “full-width” case of H_in = 28 bits is similar to data I showed in preceding comments.

The histograms have a whole new personality, when the input entropy exceeds the hash width. For 33 bits of input entropy (Delta of +5 bits), only 6 of the 268,435,456 possible hash outputs were missed, and there was a single 70-way collision.

One more way to see the relationship, showing the Shannon measure as a function of Delta:

Delta n-3       n-2       n-1        n
       |         |         |         |
 -3   *
 -2            *
 -1                   *
  0                          *
 +1                              *
 +2                                *
 +3                                 *
 +4                                  *
 +5                                  *

At first, output entropy is a nearly linear function of input entropy, converging to the hash width around Delta = 0.
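For readers who want to see this shape without the long runs, a minimal sketch (Python, with an illustrative 16-bit output width instead of the 28 bits used above): hash 2^H distinct inputs, keep a width-n slice of the digest, and measure the Shannon entropy of the resulting distribution.

# The exact numbers will not match the 28-bit table, but the shape
# (linear at first, then saturating just below n) should.
import hashlib
import math
from collections import Counter

n = 16                                   # output width in bits (illustrative)
for H in range(8, 21):                   # input entropy in bits
    counts = Counter()
    for i in range(1 << H):
        d = hashlib.sha256(i.to_bytes(8, "little")).digest()
        counts[int.from_bytes(d[:4], "little") & ((1 << n) - 1)] += 1
    total = 1 << H
    shannon = -sum(c / total * math.log2(c / total) for c in counts.values())
    print(f"H_in={H:2d}  Shannon(out)={shannon:7.3f}")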

MarkH August 30, 2021 4:58 PM

@JonKnowsNothing:

Your acknowledgement — and even more, your attention to this work — mean a lot.

I saw that the “graph” line-wrapped on my phone, destroying the effect. Here’s a narrower version:

 Η: n-3       n-2       n-1        n
     |         |         |         |
  Δ  |         |         |         |
 -3 *          |         |         |
 -2          *           |         |
 -1                 *    |         |
  0                        *       |
 +1                            *   |
 +2                              * |
 +3                               *
 +4                                *
 +5                                *

and a swapped-axes version which goes a little deeper:

 Δ: -5 -4 -3 -2 -1  0 +1 +2 +3 +4 +5
 Η
 n                           *  *  *
                          *
                       *
                    *
n-1

                 *

n-2
              *

n-3        *

n-4     *

n-5  *
 Δ: -5 -4 -3 -2 -1  0 +1 +2 +3 +4 +5

I noticed to my annoyance that in the first table above, I somehow managed to lop off the highest entropy run (which was, of course, the most time consuming) … here’s the table with all the rows:

H_in Delta Shannon  G_median  G_mean
 12   -16   12.000   12.000   12.000
 13   -15   13.000   13.000   13.000
 14   -14   14.000   14.000   14.000
 15   -13   15.000   15.000   15.000
 16   -12   16.000   16.000   16.000
 17   -11   17.000   16.999   16.999
 18   -10   17.999   17.998   17.998
 19    -9   18.998   18.997   18.997
 20    -8   19.996   19.994   19.994
 21    -7   20.992   20.988   20.989
 22    -6   21.984   21.977   21.978
 23    -5   22.969   22.955   22.955
 24    -4   23.938   23.909   23.912
 25    -3   24.878   24.816   24.827
 26    -2   25.762   25.622   25.667
 27    -1   26.547   26.199   26.384
 28     0   27.173   26.665   26.930
 29    +1   27.589   27.060   27.297
 30    +2   27.808   27.347   27.531
 31    +3   27.908   27.548   27.682
 32    +4   27.954   27.688   27.782
 33    +5   27.977   27.784   27.849

The last row highlights the “diminishing returns” effect of increasing the amount of input entropy.

JonKnowsNothing August 30, 2021 5:30 PM

@MarkH

The “linear to non-linear” appearance shift is really important to note. On either end of the scale one might make an assumption about the next data point and that could be very painful years later.

The old “what’s next” question 1 2 3 4 5 6 ?

sorry, I’m not able to write more, road rash vaporizes them. I’ve not had this many held up since the system was ported.

Weather August 30, 2021 5:55 PM

@markh
Thanks, that helps me. Are you sure the expansion from 5 chars of entropy doesn’t change with 6?
I noticed that low entropy didn’t mix (function) well with the hash output, and I came to the opposite conclusion, but yours fits better. Can you take a suggestion to speed up the code, so you can try 6 in a month? Threads aren’t cores; you should be able to run 7 for the program and 1 for the OS. You mentioned the histogram is computed at block sizes, so if you have something like:

for (; i < 0xffffffffffff; )
#pragma omp sections
#pragma omp section
var[i] = 2
i = i + 7
#pragma omp section
var[i + 1] = 2
i = i + 7
……

SpaceLifeForm August 30, 2021 6:12 PM

@ MarkH, Clive, JonKnowsNothing, FreezingInBrazil

At first, output entropy is a nearly linear function of input entropy, converging to the hash width around Delta = 0.

Exactly. As expected. You can not create entropy out of thin air.

The reason I mentioned Blake2b is because you can play with the output widths.

The question to investigate is whether different hash algorithms possibly leak at different rates.

I am not saying that Blake is great, but it is fast, and can provide another angle.

BTW, I had to fire the shaker inserter and the shaker dumper.

After intensive investigation, and secret cams, and outside audit, it was determined that the shaker inserter preferred the green gumballs to insert over the red gumballs. The shaker inserter was eating the red gumballs. And the shaker dumper did not like the green gumballs, so randomly would secretly trade a dumped green gumball or two with the shaker inserter in order to obtain a red gumball at the right time.

The entropy in the shaker turned pretty green over time.

SpaceLifeForm August 30, 2021 7:24 PM

@ MarkH

there are many milliseconds between mutex calls.

This points to SpinLock.

And therefore possible issues at the microcode level.

So, you are using Linux. What is the kernel version? What are your toolchain versions?

Which versions of gcc, binutils, libc?

Any other external libraries involved?

They may be an important factor.

SpaceLifeForm August 31, 2021 12:12 AM

@ MarkH, Clive, JonKnowsNothing, FreezingInBrazil

echo "0" > zero
echo "1" > one

You now have two 2-byte files that differ by one bit.

md5sum zero
897316929176464ebc9ad085f31e7284 zero

md5sum one
b026324c6904b2a9cb4b88d6d61c81d1 one

sha256sum zero
9a271f2a916b0b6ee6cecb2426f0b3206ef074578be55d9bc94f6f3fe3ab86aa zero

sha256sum one
4355a46b19d348dc2f57c046f8ef63d4538ebb936000f3c9ee954a27460dd865 one

The hashes sure look different.

It appears as though the hashes have some entropy.

But where did the extra bits come from?

Weather August 31, 2021 12:25 AM

@slf all
They have an initial constant of 8 dwords; the one or zero is mixed with an exponential value, say 1.6, which loops multiple times.

MarkH August 31, 2021 1:55 AM

@SpaceLifeForm:

The hashes sure look different. It appears as though the hashes have some entropy. But where did the extra bits come from?

First, to take the question at face value, that is the definition of a cryptographic hash function. If the hash is n bits wide, it will output an n-bit integer for any input file. Although the mapping to the hash output is fully deterministic, it is supposed to not follow any simple pattern of mappings. In fact, the set of mappings is supposed to have the statistical properties of a random distribution.

I went into this in some detail in an earlier comment, please read it carefully.

Second, there was a long “dialogue” between Clive and myself which didn’t seem to lead to any particular resolution. Clive repeatedly explained that hashing does not {add | create | multiply} entropy … even though I never yet found a statement on this thread suggesting that hashing can do ANY of those things.

The concept from Freezing_in_Brazil, to which I responded positively, was to hash a relatively long file which already has a significant quantity of entropy so that its scattered entropy would be gathered into a much shorter bit sequence.

This concept does not presuppose any addition, creation or multiplication of entropy by hashing … only the transfer of the entropy (with a slight loss) from a long sequence of symbols to a shorter one.

I’ve tried to understand why Clive and I seemed to be discussing such orthogonal questions, and came up with the hypothesis that Clive was responding (just a guess, no disrespect intended) to a common mistake of people who don’t understand crypto: they look at the many random-looking digits of a hash (or cipher), and wrongly imagine that entropy was “created” or “added” in the process of evaluating the cryptographic function. Perhaps Clive was continuing some debate from another time and place with people who’ve made this elementary mistake.

========================

Your example is actually a very useful one. Each of the two files is mapped to really long hexadecimal numerals, which look to be completely unrelated.

But if the application is limited to those two files, and they are equally probable, then the hash output has exactly one bit of entropy, whether it is represented by 1-digit numerals or billion-digit numerals.

========================

I take the liberty of quoting from my earlier comment:

Entropy is NEVER determined by the type, name, internal structure, symbolic representation, coding, etc. etc. etc. ad infinitum of the elements of that set … entropy is based only on the distribution of probabilities among them.

It doesn’t matter what the elements of that set “look” like; any non-empty set will do.

Attempts to infer the entropy of zero, or of 5A3122F29D912222382260C25D606EC8B3B8330C29A8441302C8AEC037EF5E8C, or any other element of any set by analysis of internal structure are doomed to failure, because they ask the wrong question.

One might as well enquire, “how many bicycles are needed to make water wet?”

To those who wish to better understand entropy: don’t let yourself get distracted by the symbols or other representations of outcomes; the entropy is in the probability distribution, and nowhere else.
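A tiny illustration of that point, using the two files from the example above and assuming each is equally likely to be selected:

# Only two equally probable inputs exist in this imagined application,
# so the induced distribution over SHA-256 outputs carries exactly one
# bit of Shannon entropy, however long the digests look.
import hashlib
import math

outcomes = {hashlib.sha256(b"0\n").hexdigest(): 0.5,
            hashlib.sha256(b"1\n").hexdigest(): 0.5}
H = -sum(p * math.log2(p) for p in outcomes.values())
print(H)   # 1.0 bit, regardless of the 64-hex-digit representation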

Clive Robinson August 31, 2021 2:52 AM

@ SpaceLifeForm,

But where did the extra bits come from?

As I’ve already said to @MarkH above,

“What you are describing is the case of a user pressing Y or N in response to a message on a terminal. It matters not how big or complex the output is from the crypto function the result is still the same for each Y or each N thus it is a “simple substitution cipher” also called “being used in ECB mode”[1]”

The “extra bits” are thus just an artifact of a simple mapping function. There is no extra “entropy”.

It’s a point I’ve been making for years on this blog when talking about the “magic pixie dust” thinking exhibited by various IC manufacturers like Intel, who hide their on chip TRNGs behind hash or other crypto functions…

That is, there is the same amount of true entropy at the output of the crypto function as there is at the input to the crypto function. The fact an observer sees lots and lots of bits does not change this.

In fact, if you repeatedly send just your “one” or “zero” files into the hash function, take the output, and feed it into an ideal compression function, you would expect to see a decreasing output size from the compression function.

It’s fairly easy to see that even a non-ideal “run length” compression function would fairly quickly recognize that there are only two output strings and, depending on the algorithm used, get down to just outputting a symbol for each string.

I’m really not sure what @MarkH is trying to prove, but he clearly does not remember what our previous conversation was about, and how it started from an erroneous posting he had made, some of whose errors I pointed out.

Anyway, busy day ahead as yesterday was a public holiday in the UK, I’ve got chasing up medical results etc on top of trying to get some more everyday work done in four days not five.

JonKnowsNothing August 31, 2021 11:47 AM

@MarkH Clive SpaceLifeForm All

re: What’s to prove…

What was (here) became another road rash victim and ended in the abyss bit bucket.

SpaceLifeForm August 31, 2021 4:28 PM

@ MarkH, Clive, JonKnowsNothing, FreezingInBrazil

But where did the extra bits come from?

The question was rhetorical. I know where they come from.

A better question would be:

If you measure the entropy of those hash strings, what do you see?

If you ran Ent or Dieharder on the hash strings, and compared results, would that tell you exactly that there is only one bit of difference in the entropy?

Another angle: If one was to not realize that they were hashes, and blindly treat them as random, would they be safe?

Consider a process that generates hashes from a PRNG. The output appears random, but we know that it is not.

If the secret algorithm is known to some, they may be able to determine the PRNG’s position in the sequence of hashes, and then know what the output will be on future steps.

The hash itself may leak the position.

If the output of the generator is accessible to an attacker, because the generator is a server to the entire machine, then it may be possible for the attacker to decrypt the FUTURE ciphertext created by a process on the same machine. If the attacker can leak the position over network, then the ciphertext can potentially be decrypted elsewhere.

If the generator is ‘locked in’ at the motherboard level, there is no way to trust it. Even if you mix in other sources of random, the motherboard can still see it.

This is why the crypto must be separate from the comms.

One must not allow crypto state to leak.

Clive Robinson August 31, 2021 5:46 PM

@ SpaceLifeForm, FreezingInBrazil, JonKnowsNothing, ALL,

If you measure the entropy of those hash strings, what do you see?

The honest answer is “it depends on the tools you use”. As I noted earlier, if you use a “compression function” then eventually, given enough input (thus time, memory, etc.), it will squeeze out much that is not random.

The question then is about what remains, “Is it actually random or just deterministic behaviour?”

The chances are it is more likely to be deterministic behaviour “that can not be picked up by your tools’ algorithms” than it is random. “Why?” Well, more than half a professional life has shown me two things,

Firstly, the more it looks like “ideal” randomness, the less likely that is to be the case and the more likely it is to be based on something deterministic, because nearly all natural sources of true randomness are heavily polluted by deterministic noise.

Secondly, nature does not do “ideal” randomness. Nature is either circular or exponential in character, within the scope of most practical measurement.

But as I’ve explained before, if you look at the output of a noise source you see a spectrum that broadly falls into three categories,

1, Deterministic
2, Chaotic
3, Random

Where “random” is just a tiny fraction of the total noise signal.

From which we can see, the answer to your question,

If one was to not realize that they were hashes, and blindly treat them as random, would they be safe?

Is very clearly “No” not just for the “hashes” but nearly all alleged “random” sources in all but the “quantum” environment.

The rest of your argument goes on to show why this can be so problematical.

It’s a point I’ve been making for years, long before Linus made his gaffe and apology over the Intel “on chip” alleged random bit generator.

However it still appears people want, for convenience or other reasons, to believe that such RNGs are “cryptographically secure” when in all probability they are not.

Which is where we start to move into Upton Sinclair observation territory. For which the only two likely resolutions are,

1, You walk away shaking your head.
2, You figuratively rip off their head, stuff it where the sun does not shine and throw the corpse out for the stray dogs to consume…

But then in the real world you have to run it as a process… So having taken the second option, when you talk to the next person, you will probably repeat option two…

My preferred option is the first: leave them to rot in their own cognitive bias and have no future contact with them of any consequence.

MarkH August 31, 2021 7:37 PM

@SpaceLifeForm:

If you ran Ent or Dieharder on the hash strings, and compared results, would that tell you exactly that there is only one bit of difference in the entropy?

When I first noticed your handle here a couple of years ago, you repeatedly said (mistakenly, I believe) that RSA is insecure, citing articles about how incorrect application — ignoring security guidance published many years ago — can create vulnerabilities.

Woe to him, who uses tools without understanding. Enormous harm can come from using an automotive jack, or even a simple screwdriver, if the operator doesn’t know how to do it properly.

Mindless application of statistical tools leads directly to wrong conclusions. This has happened many, many thousands of times.

My suggestion to all readers: if you are applying ANY security tech to protect something important, LEARN ABOUT IT FIRST.

Another angle: If one was to not realize that they were hashes, and blindly treat them as random, would they be safe?

It’s my impression that Clive has warned against blindly trusting any supposed random source dozens of times on this forum.

Bruce has had numerous blog posts about the danger of bad “random” sources, along with many other people who write about security.

See above: if you are applying security tech to something important, LEARN ABOUT IT FIRST.

Whoever declines to do so, is like a toddler with a loaded handgun.

It’s like the old saying about road casualties: the safety-critical component of road vehicles which engineers cannot improve, is the nut that holds the wheel.

Clive Robinson September 1, 2021 4:10 AM

@ MarkH,

is the nut that holds the wheel.

The version I first heard was “behind the wheel”.

But speaking of using tools, even though they may not be used dangerously, they can be used inappropriately, pointlessly, or wastefully. That is, they are used to make things of no worth or less, because somebody gets “a bee beneath their bonnet” which is wrong, but tries to prove otherwise. It happens so often these days we have euphemisms for it, just one of which is “going down the rabbit hole”.

But speaking of remembering back to peoples old comments, you’ve made one or two in the past that were not just wrong but inappropriate.

In past times things like that were ephemeral, and that is how society really wants them to be for good reason, because as I’ve noted before society can not move forward without most of what is said being ephemeral.

Interestingly @echo linked to a piece in the Atlantic just yesterday about one of the dangers of a non-ephemeral society. I guess you’ve not read it, you might want to though.

MarkH September 1, 2021 12:57 PM

@Clive, who addressed to me:

you’ve made one or two [comments] in the past that were not just wrong but inappropriate

You once again show your gentlemanly qualities by estimating this at one or two, as the actual number is surely greater than that.

I don’t hold myself up as an oracle, nor a paragon. My “exit line” as I’m finishing a conversation is often, “don’t do anything I would do!”

I err often. My hopeful self-image is that my path resembles a random walk, with a barely perceptible drift in the direction of fact, illumination, and understanding.

I don’t mind that my errors are durably (and in practice, indelibly) recorded. It was almost 47 years ago, that a graduate student admonished my laboratory classmates and me that we must never cover, erase or obscure writings in our lab notebooks which we realized to be in error. It was permissible to strike them through with lines, so long as the mistaken entries remained clearly readable.

Those mistakes are — and need to be — part of the record.

My particular regret is about the times when I have wounded the hearts of my brothers and sisters, and especially you, when my sloppy communication implied something which I never believed.

My sometimes Byronic harangues are about my hunger for truth, not the diminution of persons. I hope that my interlocutors will remember that I love you all, whether or not any of us (or even all of us) happen to be mistaken at any given moment.

MarkH September 3, 2021 12:00 PM

.
Chaining cannot wither her, nor custom stale
Her finite entropy

I am embarrassed by my lack of success in following Clive’s preceding reasoning concerning entropy of hashes, or even how those arguments relate to predictions I offered of hash entropy as a function of input entropy.

To the extent that I understood some of the comments, Clive referred several times to the iterative chaining of the hash map function. In case that was meant to argue that input entropy might be lost in the chaining process, I’ve run some direct tests.

The hash function I’ve been testing with has 512 bit (64 byte) input blocks. I present here some results from confining all of the input entropy to the first of multiple blocks.

As a sanity check, this experiment applied 20 bits of entropy to the first of 100 blocks for the 32-bit hash:

freq         count      ratio
 0       4293918833
 1          1048350     4095.9
 2              113     9277.4
entropies in bits:
  H       H*    G_median  G_mean
 0.008  20.000   20.000   20.000

The hash output entropy is practically equal to the input entropy, as expected.

========================

These results are from 32 bits of entropy applied to the first of 50 blocks into a 32 bit hash:

freq        count   gamma  ratio
 0      1580003056  15.83
 1      1580075585  15.09   1.0
 2       789997389  15.45   2.0
 3       263344491  15.39   3.0
 4        65829727  13.72   4.0
 5        13164517  12.42   5.0
 6         2194930  12.27   6.0
 7          313817   9.94   7.0
 8           38911   7.15   8.1
 9            4388   7.01   8.9
10             450   4.90   9.8
11              32   2.38  14.1
12               3   3.47  10.7
entropies in bits:
  H       H*    G_median  G_mean
31.173  31.173   30.665   30.930

========================

These results are from 32 bits of entropy applied to the first of 100 blocks into a 32 bit hash:

freq        count   gamma  ratio
 0      1580015213  16.69
 1      1580021985  17.56   1.0
 2       790070619  13.80   2.0
 3       263309275  13.14   3.0
 4        65833855  16.45   4.0
 5        13164460  12.39   5.0
 6         2194267  13.29   6.0
 7          313383  11.41   7.0
 8           39411   7.45   8.0
 9            4367   8.40   9.0
10             420   4.82  10.4
11              38   4.64  11.1
12               3   3.47  12.7
entropies in bits:
  H       H*    G_median  G_mean
31.173  31.173   30.665   30.930

Compared to single-block experiments presented in previous comments, the frequency table numbers are plainly different. The entropy measures, in contrast, remain constant; they precisely match those calculated by analysis of expected hash collision frequencies.

As readers might expect, multi-block full-width hash experiments at 32 bits take some time. The 100 block full-width run shown just above consumed more than 150,000 core-GHz seconds (on an x64, I don’t remember which species).
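For anyone wanting to reproduce a reduced version of this multi-block setup, here is a minimal sketch (Python; the 20-bit entropy and 100-block count mirror the sanity check above, the other details are illustrative):

# All of the variation sits in the first 64-byte block; the remaining
# 99 blocks are constant, so any entropy that survives must have been
# carried through the chaining steps.  A 32-bit slice of the digest
# stands in for a 32-bit hash.
import hashlib

BLOCK = 64                 # MD5 (and SHA-256) input block size in bytes
N_BLOCKS = 100
H_IN = 20                  # bits of input entropy, all in the first block

padding = b"\x00" * (BLOCK * (N_BLOCKS - 1))
seen = set()
for i in range(1 << H_IN):
    first_block = i.to_bytes(4, "little") + b"\x00" * (BLOCK - 4)
    seen.add(hashlib.md5(first_block + padding).digest()[:4])
print(len(seen), "distinct 32-bit outputs from", 1 << H_IN, "inputs")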

Clive Robinson September 3, 2021 7:00 PM

@ MarkH,

To the extent that I understood some the comments, Clive referred several times to the iterative chaining of the hash map function.

I think the problem is,

You see the hash as a single functional block and I do not.

I see the hash you see as being in two parts.

The first is effectively a one-to-one map N bits wide; if you could make one that big then it would be the equivalent of a ROM.

The second part is the “chaining function”: a block with two inputs N bits wide, the current input to the hash and the previous output of the one-to-one map. The chaining function combines the two inputs and produces one N-bit output that goes into the one-to-one map.

The simplest chaining function would be a latch to hold the previous map output, and N XOR gates to mix the two N-bit inputs down to one N-bit output to drive the map.

In reality there are a whole bunch of different ways you can chain a crypto function like the one to one map. Have a look at some of the DES and AES “chaining modes” to see this.

Now consider that the map output from the previous N-bit input has X bits of entropy. The fact it has gone through the map, which significantly changes the output (the avalanche requirement), can not increase the entropy, so it would still be X bits of entropy. That is latched by the chaining function to use in the next hash. The current N-bit input has Y bits of entropy, thus the chaining function has one input with X bits of entropy and a second input with Y bits of entropy.

But the chaining function only has an N-bit output, so the question arises of,

“How much entropy from X-bits and Y-bits of entropy at the input appear at the N-bit output?”

When you think about it a little while you will see it is one of those “it depends” type answers.

MarkH September 4, 2021 2:34 PM

@Clive:

We know from the basic math that if X + Y > N then some of the given entropy must be lost. Beyond that, simple math probably doesn’t offer much guidance.

Here’s my informal reasoning:

Consider a hash chaining step in which (a) X + Y is significantly less than N, and (b) the hash-state entropy E after the step has completed is less than X + Y.

Shannon entropy measures the distribution of variations among a set of alternatives. X is the distributional variation of inputs processed prior to the chaining step (minus any entropy lost along the way), and Y is the distributional variation of inputs in the current input block.

Because (by supposition) E < X + Y, some set of alternative inputs up to and including the current block have collapsed into the new hash-state. In other words, two or more distinct input sequences (among the defined distribution of inputs up to that point) will fail to lead to distinct hash output.

This lost-entropy case guarantees one or more collisions.

========================

A fundamental requirement of cryptographic hash functions is that they produce as few collisions as is mathematically possible.

I presume that practically all crypto hashes standardized in the past 25 years meet this criterion (at least, to a very good approximation).

Unlike the other requirements for a cryptographic hash (like first and second preimage resistance), which must be gauged by Herculean efforts of cryptanalysis, I suppose that the minimum collision-probability property can be verified by comparatively straightforward test and analysis.

========================

If any hash function satisfies the minimum collision probability property, then the scenario considered above:

E < X+Y < N

occurs as infrequently as possible.

Therefore, I conclude that such chaining losses are very small. This must be so — regardless of how the designers chose to implement the hash — because otherwise the hash function would fail to satisfy the minimum collision probability requirement.

========================

You might know the Alfred Hitchcock anecdote which ends with one man saying “but there are no lions in the Scottish highlands,” and the other replying “well then, that’s no MacGuffin!”

If E < X+Y < N occurs with more than vanishingly tiny probability, then we can say "that's no hash function!"

MarkH September 4, 2021 10:27 PM

@Clive et al:

CORRIGENDA

Reviewing my comment above, I see language which is wrong, without additional qualification.

Where I wrote “as few collisions as is mathematically possible,” that should be “as few collisions as is mathematically possible for a function whose output should have the statistical properties of a random distribution” (such distribution being another foundational requirement for cryptographic hash functions).

Likewise, where I wrote “E < X+Y < N occurs as infrequently as possible," that should read, "as infrequently as possible for a function whose output should have the statistical properties of a random distribution."

Clive Robinson September 5, 2021 7:03 AM

@ MarkH, ALL,

Beyond that, simple math probably doesn’t offer much guidance.

Not much, but you can use semi-logic and reason from there.

We know that X and Y are seen as integers in simple math. However you can also view them as binary arrays (vectors) of width N bits, and you can then examine things “bitwise” via logic rather than “wordwise” arithmetically and make life a “little” simpler.

You can produce a table of X by Y where each entry is a bit array of the mixing function output and a count of differing bits etc. Likewise each X or Y index location on the table can hold not just the binary array with a bit pattern that corresponds to the integer value, but also the number of “set bits” in the bit pattern etc.

You can then use such tables to deduce other information, such as when individual bits have changed or not and, importantly, why.

If you run such a chaining function where the map is not used (that is each input bit becomes the corresponding output bit without change), you can see that you have the mixing function –XOR gate for this example– with a latch acting to give “delayed feedback”.

That is the latch’s D-input is driven from the mixing function XOR gate’s output, and the latch’s Q-output goes back to one of the mixing functions XOR gate’s two inputs.

So the latch acts like a single bit of memory holding the previous mixing function output.

You can draw up the equivalent of a truth table for this circuit, thus determine what effect it has on that bit’s state over the clocking of the latch.

To save doing some of the work, you can look the circuit up; it’s treated as a universal function for all sorts of things and is one of the two simplest digital filters[1] (the other has the latch between the two inputs and is used as a von Neumann de-bias circuit in TRNGs; you clock it twice to get one bit of de-biased output).

But you can also view it as a Linear Feedback Shift Register (LFSR) or a single bit “cellular automata” both of which come up very frequently in the design of “stream generators” and similar crypto functions.

The next step in the analysis is to replace the map with an XOR gate where the extra input becomes a “control” input.

So the circuit you now have is an XOR gate acting as the mixer function, with one of its inputs being the X-bit input and the other the Y-bit feedback from the latch. The mixer function XOR gate is the first XOR gate, and its output now goes into the non-control input of the second (new) XOR gate which has replaced the map.

The output of the second XOR gate is the circuit output and drives the data input to the latch.

Which leaves the control input of the second XOR gate as a (new) second input to the circuit, which is an input from a complex logic circuit that will become part of the non-linear wordwise function that is the map (just think of it as one bit from a ROM that holds the mapping function for now).

At this point you can still do logical analysis on the circuit… After all, it only has two input bits and one bit of internal state, making it effectively the equivalent of a three-input gate in terms of the number of possible functions, which is 2^(2^n) or 2^(2^3) = 256; not that hard 😉

As for that control input map function circuit, that’s a different order of magnitude by quite some way. With an N-bit wide map you are looking at 2^(2^(N-1)) potential functions, so with a ridiculously small 32-bit width you’ll have 2^2147483648 of them, which is so big that most calculators or computers will give up and indicate infinity or overflow or similar. And… that is for just one of the 32 output bits…

It’s why removing the map makes sense when you want to start analysing things, and another reason why I treat a hash function as having two parts,

1, The map function,
2, The chaining function.

And keep the two as separate as possible whilst analysing things.

[1] Be warned, however: “digital filter” or just “filter” should act as a warning flag that “higher maths” involving ‘e’ is approaching, which is enough to send many running for cover. Thankfully, though, with just single bits involved, other techniques such as Walsh transforms can be used.
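In software form, the von Neumann de-bias step mentioned in the note above can be sketched as follows (an illustrative Python rendering, not the hardware latch circuit itself):

# Examine successive non-overlapping bit pairs; emit a bit only when
# the pair differs (01 -> 0, 10 -> 1) and discard 00 / 11 pairs.  The
# output rate varies with the bias of the source.
def von_neumann(bits):
    out = []
    for a, b in zip(bits[0::2], bits[1::2]):
        if a != b:
            out.append(a)
    return out

biased = [1, 1, 0, 1, 1, 1, 0, 0, 1, 0, 1, 1]   # illustrative input
print(von_neumann(biased))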

Clive Robinson September 5, 2021 8:41 AM

@ MarkH,

On a different note,

Shannon entropy measures the distribution of variations among a set of alternatives.

Depending on your usage of “distribution”, that statement is either true or false.

Shannon entropy is not about “objects” or “data” but “relations” between objects or data. That is, the normalised ratio of occurrence, usually by the data value. That is, probabilities in a given set of values. It’s meta-meta-data, not data.

Where “data” is technically an object’s “held value” say 73.

Without meta-data such as ft, lb, m/s, the “held value” is useless.

The object has one or more “identifiers” by which the object becomes unique.

Without meta-data such as the object’s location address, say 0xFF38, the “object” is useless.

Which means the “held value” data is not accessable for use.

As programmers we tend to take this as a given whilst most others do not even think about it.

What entropy is, is the “relationship” between two or more “objects” in a defined set of objects, expressed by the normalised ratio of occurrences of the set of “held value” data. So it’s meta-meta-data or meta-meta-meta-data[1] depending on your viewpoint…

Thus if your set of “objects” holds eleven addresses and the set of data held by those objects is just two unique “values” Shannon Entropy is the normalised “ratio” of the objects by their contained “value”.

[1] Which is more “meta” than most people can get their head around with just one reading. It was also tucked away in a book in the Victorian era by the logician and photographer Charles Dodgson; most have heard of the book, if not had it read to them when they were young, or seen a film of it, etc.

MarkH September 5, 2021 3:14 PM

@Clive:

I don’t know which writing of Mr Dodgson you had in mind, but my favorite exposition of meta-meta-meta is the White Knight’s song with four titles from “Through the Looking Glass.”

The song is “A-SITTING ON A GATE”;

the song is called “WAYS AND MEANS”;

the name of the song is “THE AGED AGED MAN”; but

the name of the song is called “HADDOCKS’ EYES.”

When I was still a squirt, and had never yet even seen a computer, I was puzzling over a manual for IBM 360 assembly language, and feeling mightily confused about semantics: when did a notation mean the identity of a register, and when did it refer to the register’s contents?

So when I later absorbed the Alice dialogue about song nomenclature, the whimsical explication of a truly deep problem held much resonance for me.

========================

While still in my teens, I read in a book of science fiction a note by a famous author (I want to say Canadian A. E. van Vogt, but don’t trust my recall) that he had participated in work on a never-completed encyclopedia, which contained the entries:

Carroll, Lewis: see Dodgson, Charles Lutwidge

Dodgson, Charles Lutwidge: see Carroll, Lewis

Having an exceedingly literal mind, I took it to mean that the SF author was humorously pointing to a mistake the (would-be) encyclopedians had made.

Much later it came to me that they had paid tribute to the noted logician, in a manner I’m sure he would have relished.

MarkH September 5, 2021 3:47 PM

@Clive, et al.

I have resolutely resisted analyzing the mechanics of hash functions for two excellent reasons.

[1] For my taste, the matter is dull, tedious, and boring. I’m deeply grateful to the accomplished cryptographers — including Bruce Schneier and his colleagues — who have done the prodigious labor to figure this out. They did the hard work, so I don’t have to!

[2] Such analysis is simply unnecessary to determine the entropy of an N-bit hash for an input with N bits of entropy:

a) If the distribution of hash function outputs is a good approximation to a random distribution, then the probability that any particular one of the 2^N possible hash outputs will occur exactly j times is accurately given by p = 1 / (e · j!), except for very small N.

b) From that probability formula, it is straightforward to compute that the hash of an input with N bits of entropy will have N – 0.83 bits of Shannon entropy, N – 1.34 bits of median guessing entropy, and N – 1.07 bits of mean guessing entropy. My experiments above show these predictions hold to within 0.01 bit of entropy for N > 11.

c) If the output entropy is less (or greater) than these figures, then the hash function fails to meet the random distribution standard.

Really, it’s not more complicated than that!
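A numerical sketch of how the two guessing-entropy constants in (b) follow from the formula in (a). This assumes guessing entropy is taken as log2 of twice the median (or mean) number of guesses, so that a uniform N-bit distribution scores exactly N bits; that convention is an assumption on my part, but it reproduces the figures quoted above.

# Under the random-distribution model a fraction exp(-1)/j! of the 2^N
# outputs is hit exactly j times, each such output has probability
# j/2^N, and an optimal guesser tries the most-collided outputs first.
import math

e1 = math.exp(-1)
J = range(30, 0, -1)                                  # multiplicities, high to low
frac = {j: e1 / math.factorial(j) for j in J}         # fraction of outputs hit j times
mass = {j: e1 / math.factorial(j - 1) for j in J}     # probability mass they carry

# Median: guesses (as a fraction of 2^N) needed to reach cumulative mass 1/2.
guessed = cum = 0.0
for j in J:
    if cum + mass[j] >= 0.5:
        guessed += (0.5 - cum) / j
        break
    cum += mass[j]
    guessed += frac[j]
print("median guessing entropy loss ~=", round(-math.log2(2 * guessed), 3))  # ~1.335

# Mean: expected guess count, accumulated group by group.
before = mean = 0.0
for j in J:
    mean += mass[j] * (before + frac[j] / 2)
    before += frac[j]
print("mean guessing entropy loss   ~=", round(-math.log2(2 * mean), 3))     # ~1.07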

I don’t worry about chaining, because it’s obviously not sufficient that the round function conform to a random distribution. If the chaining process does not also preserve the near-random distributional characteristic, then the hash function would fail to meet the requirement for multi-block inputs.

We don’t need to know how the boffins satisfied the random-like distribution requirement, in order to know the output entropy.

Either the hash function is a good approximation to a random distribution, or it ain’t.

MarkH September 9, 2021 3:18 PM

@Clive et al:

Clive has raised a question about what happens to entropy in a hash function’s chaining process.

I’m confident based on my understanding and my experiments, that as long as Σ Hi (where Hi is the entropy of the ith block) over the first n blocks is significantly less than the width of the hash function’s internal state variable, that the retained entropy will be less than that sum by a negligible amount.

However, I haven’t found an argument or experiment to adequately account for the case of entropy approaching the internal state size.

I’ve offered an indirect proof that chaining loss is negligible, summarized thus:

(a) hash collisions cause entropy loss; in the absence of collisions, no input entropy is lost

(b) a fundamental security requirement for crypto hashes is that their distribution of outputs be indistinguishable from a random distribution

(c) because of (b), standardized crypto hashes are thoroughly vetted, by analysis and statistical testing, as to whether any method can be found to distinguish their distributions from random

(d) as a consequence of (c), standardized crypto hashes have distributions sufficiently close to a random distribution, that their collision frequencies accurately follow the random model

(e) the requirement of approximation to random distribution is not dependent on input length: it inherently includes the effect of chaining

(f) from (d) and (e), hash collision frequencies accurately follow the random model even when the effects of block chaining are taken into account

(g) because entropy loss in hashing is due to collisions (a), and hash collisions follow the random distribution model (f), input entropy lost in hashing accurately conforms to the values of ~1 bit I have presented above

I could be mistaken on one or more of these points! I welcome factual corrections.

========================

As I wrote above, I don’t know how to make a more direct proof or demonstration.

For those concerned about Clive’s question — which I am taking seriously — there is a simple remedy: use a hash with an internal state significantly larger than its output.

There are two simple ways to do so.

[1] Use an SHA3 hash, which has internal state much wider than its output. (Note well that the SHA3 family of Keccak hashes uses the “sponge” construction, very different from earlier typical hash functions; and that the extra state is provided as a safeguard against preimage attacks, not to ensure that the output distribution is statistically random.)

[2] Use any old hash function wider than the number of bits you want, and truncate its output. This may seem counterintuitive, or even impossible! If I throw away half the bits from SHA512, aren’t I throwing away my entropy too?

Nonetheless, I’ve demonstrated that it works: I put 32 bits of entropy into a 160-bit (or even 64-bit) hash, and the truncated output has ~31 bits of entropy in precise accord with the math for randomized distributions.
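A minimal sketch of remedy [2] (the hash choice and truncation width below are illustrative):

# Run a wider hash and keep only the leading bytes; the experiments
# above indicate the retained entropy follows the same randomized-
# distribution math as a hash of the truncated width.
import hashlib

data = b"some input whose entropy we want to gather"
wide = hashlib.sha512(data).digest()   # 64-byte digest
narrow = wide[:16]                     # keep 128 bits of it
print(narrow.hex())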

Mowmowfi September 10, 2021 1:08 AM

@markh all
You can tell what hash function was used by the output; they are not a random distribution.
512 bits truncated to 256 on the output worked for one class of attack, but I don’t think looking at entropy is the right angle. When a byte is 0x00-0xff, a 256-byte hash needs to be used.
