New OpenSSH Vulnerability

It’s a serious one:

The vulnerability, which is a signal handler race condition in OpenSSH’s server (sshd), allows unauthenticated remote code execution (RCE) as root on glibc-based Linux systems; that presents a significant security risk. This race condition affects sshd in its default configuration.

[…]

This vulnerability, if exploited, could lead to full system compromise where an attacker can execute arbitrary code with the highest privileges, resulting in a complete system takeover, installation of malware, data manipulation, and the creation of backdoors for persistent access. It could facilitate network propagation, allowing attackers to use a compromised system as a foothold to traverse and exploit other vulnerable systems within the organization.

Moreover, gaining root access would enable attackers to bypass critical security mechanisms such as firewalls, intrusion detection systems, and logging mechanisms, further obscuring their activities. This could also result in significant data breaches and leakage, giving attackers access to all data stored on the system, including sensitive or proprietary information that could be stolen or publicly disclosed.

This vulnerability is challenging to exploit due to its remote race condition nature, requiring multiple attempts for a successful attack. This can cause memory corruption and necessitate overcoming Address Space Layout Randomization (ASLR). Advancements in deep learning may significantly increase the exploitation rate, potentially providing attackers with a substantial advantage in leveraging such security flaws.

The details. News articles. CVE data. Slashdot thread.

Posted on July 3, 2024 at 11:27 AM • 28 Comments

Comments

Leon July 3, 2024 2:52 PM

Not only a serious one, but it shows vividly how utterly pointless ASLR is.
Coders have cursed ASLR for ages – it makes debugging software more difficult, which in practice helps increase the number of bugs. It just makes things harder and actually resolves nothing.

Sadly the industry is full of those pointless and half-baked solutions that are supposed to solve important problems but actually make things more difficult and solve nothing.

JonKnowsNothing July 3, 2024 3:20 PM

MSM report indicates

  • The vulnerability is the result of a code regression introduced in 2020 that reintroduced CVE-2006-5051, a vulnerability that was fixed in 2006.

The problem was fixed in 2006

The problem was unfixed in 2020

Clive Robinson July 3, 2024 3:40 PM

@ ALL,

Hmm, and the first mitigation is, in many cases, the one that should have been done long before.

Which is,

“Pull the plug from the external wall.”

To cut off the systems communications to the outside world.

We really should be asking about the supposed benefits of external communications and how we limit the ever present harm such communications paths cause.

I hear a lot about “the benefits” but it’s arm wavery at best and when examined is usually found to be wishful thinking.

The simple fact is all our commercially available “Applications” and the “OS’s” they run on are defective and riddled with vulnerabilities just waiting to become obviously exploited. By the time sufficient people are aware and the alarm gets out very many will probably have been harmed.

As for Open Source, yes it has vulnerabilities as well, which makes the point that maybe we should not be running with scissors.

Mark July 3, 2024 6:58 PM

@Leon

This actually shows exactly the opposite: ASLR slows the attack process. Attacking i386, with its weak ASLR, takes twice as many tries as attacking a non-ASLR system. Attacking amd64’s stronger ASLR takes so much longer that the researchers only theorize that such an attack is possible — it’s expected to take a quarter-million times as many tries as a non-ASLR system would.

Clive Robinson July 3, 2024 7:55 PM

@ Morley,

“I hope for AI hardened systems. I wonder if anyone is working on that yet.”

Short and quick answers are,

“Yes” and by “starving the AI of input”.

Slightly longer answer,

When you think about it current AI systems work as “collect it all” back ends.

That is, they slurp in any “plaintext” and build their “Digital Neural Network”(DNN) weights / multipliers with it.

Logically if there is no plaintext or information with usable statistics to build the DNN with then the AI is defunct.

So anything that has two basic properties is in effect AI hardened

1, All information outside the processing of the system is suitably randomised / coded / encrypted.
2, Only properly authenticated and authorized processes have access to the internal processing.

The first is actually the hardest because there are three levels of information you have to deal with currently,

1, Actual data.
2, Meta-Data.
3, Meta-Meta-Data.

I’m not going to go into how you deal with each layer as that’s been covered fairly frequently on this blog in the past and it’s fairly lengthy.

Michael July 4, 2024 1:47 AM

Not so serious for well maintained Linux systems, because Qualys forgot to mention that an exploit is currently only available for 32-bit systems (i386).

But if you have some kind of router or other device with ssh exposed, chances are good that you might have “visitors”.

And fixing is difficult. OpenWRT has a fixed version but how to get that onto your device? Other vendors tend to neglect the care of their products.

I’m afraid pulling the plug and replacing the device might be the only option in some cases.

sitaram July 4, 2024 2:27 AM

I don’t think it’s that serious in practice. It’s almost certainly not amenable to “spray”-attacking hundreds of sites; there’s a lot of effort and hours of sending thousands of crafted packets involved, as I understand it.

And they’ve not actually managed to run it on 64-bit systems, only 32-bit systems so far.

I’ll try and dig up URLs for all this but I was reading all that on my mobile and I didn’t save/bookmark them.

JonKnowsNothing July 4, 2024 2:37 AM

@Morley, All

re: I hope for AI hardened systems

The problem is not in AI and AI cannot solve the problem.

The problem spans decades of code that is carried forward as legacy code from one iteration to another.

  • fixed in 2006

The problem is “legacy bloat” and blind trust in “outsourced code libraries”. These work fine in the first iterations but later constant regression testing of legacy bloat begins to exceed “allowed time assignment” parameters: it just takes too long to test.

At some point, some “smart fella” orders a “code clean up”, and 20 years after the fix, when no one remembers who, how, or why an extra “;” was added to a massive IF THEN ELSE code blob, things get unhooked.

  • unfixed in 2020

This is not an AI issue but AI is certainly affected by it.

AI datasets trained on different servers do not create exact reproductions of outputs. You load up the training set on N-servers; run the same dataset through each independently and you get something different on each server. It’s designed this way.

It also means the data-images in those datasets that courts and laws require to be removed cannot be removed: ever. Any attempt at removal will alter the outputs of the system, and any attempt at re-training minus the embargoed information will cause the systems to have completely different profiles.

In the first case, the reintroduced error happened because no one realized it was a fix for anything at all. To repair the damage they have to find the original code location and deal with 20 years of new code design. Repairs may or may not work, depending on how “elegant” the newer code is.

In the second case, attempting to remove any data-image will create an AI output profile that is inconsistent with the existing system. The Big AI Techs are offering a slight alteration to their weighted-averaging for these items, in an attempt to minimize the chances of a direct hit on the blocked data. It is like pushing an SEO link to the bottom of a 99-page query return list. If you know how to game the AI input line, you can retrieve the blocked data, as it is still included in the system.

Who? July 4, 2024 11:00 AM

@ Leon

ASLR helps find bugs too, at least those that would remain hidden in a static address space.

Some ASLR implementations have (had?) design weaknesses that made them less secure than expected, but I understand the development teams on the affected projects have increased the strength of their implementations over time.

Who? July 4, 2024 11:02 AM

@ JonKnowsNothing

Sadly it is usual these days; Microsoft had a lot of those reborn vulnerabilities in the nineties. It seems they reused a lot of old code, perhaps because they did not run a centralised source code repository at that time.

Carmine July 4, 2024 12:31 PM

A co-worker once managed to convince a group of us that signals are not “evil”; and, that given the historical context in which they were developed, they even kind of make sense. Still, they remain quite problematic and poorly understood, and I feel like they’re an area in which the system offers little help. Remember the early days of multi-threading on Unix systems, before POSIX decided that every thread needed its own ‘errno’ value? Kind of like that; one’s almost fighting the design to use it. Had multi-threading and thread-local storage existed when signals were developed, signals would probably look quite different.

I think system libraries can and should be more helpful. For example, if library code could know whether it’s executing from a signal handler, all of the functions that aren’t valid to call in that context—that is, almost all system functions—could refuse to run. This could be done by having the system add a dummy in-signal-handler flag to a thread’s signal mask on entry, or by using a thread-local entry/exit counter. (longjmp() and such might need special-case code to handle these: it’s sometimes valid to use them to jump out of a signal handler, and possibly into another signal handler.) Maybe we could also automatically save and restore ‘errno’—a common source of trouble—if allowed by POSIX.

Also little-known is that, if fork() is called in a multi-threaded program, only async-signal-safe functions may be called before executing a new program. I don’t know of any system that enforces that either.

Actually enforcing these documented restrictions would reveal a multitude of problems, I’m sure. An integrated system such as OpenBSD, though, might be a good place to start. And, yes, I’m aware that this specific OpenSSH code was not present on OpenBSD; that’s because OpenBSD provides syslog_r(), an async-signal-safe variant of syslog(), and OpenSSH uses it when handling signals. The manual says “syslog_r() and the other reentrant functions should only be used where reentrancy is required (for instance, in a signal handler). syslog() being not reentrant, only syslog_r() should be used here.” But why should syslog_r() not be used everywhere? They don’t say. And why can’t they just make syslog() safe, maybe by having it call syslog_r() when used from a signal handler?

POSIX, and the systems implementing it, seem way too willing to create little “traps” for the programmer. With a little more ambition, instead of saying “all hell will break loose if you make a tiny mistake and overlook this detail”, they could require the bad stuff to actually be caught. Till then, catching such things is very much allowed wherever POSIX says “undefined”.

JonKnowsNothing July 4, 2024 7:14 PM

@ Who?

re: [using] a centralised source code

This by itself will not fix the problem, because of code forks and spinoff projects. Rarely is code fully updated between code branches.

If the code is considered fixed & stable code, with no errata then having a centralized system can assist in that any changes to that code base will propagate into different branches but only if those branches request a full code update.

There are lots of projects that never request an update because the devs do not realize a change affected them; the OBJ Linker files are considered fixed-standard so they are not recreated on build day; it adds to the D2D workload of devs to do a daily pull down and it adds time to the build and compile process.

It’s really a function of being “uninformed as to the true state of the code”, and while there are timestamps and push-pull notices of code changes, in a fairly large development environment there are so many of these that they are ignored.

Clive Robinson July 5, 2024 4:06 AM

@ Carmine

Re : Am I calling you…

Still, they [signals] remain quite problematic and poorly understood, and I feel like they’re an area in which the system offers little help.

You mean the,

“There be dragons here”

Type warnings found in,

https://www.man7.org/linux/man-pages/man2/signal.2.html

(Note the change etc dates and how current they are).

A question,

‘Have you ever looked up the formal definition of “chaos”?’

You will find it difficult to find for instance,

https://mathworld.wolfram.com/Chaos.html

Says a couple of things of note. Firstly,

‘”Chaos” is a tricky thing to define. In fact, it is much easier to list properties that a system described as “chaotic” has rather than to give a precise definition of chaos.’

Hmmm that should sound a clarion of caution with red flags waving madly. Secondly,

‘Rasband (1990, p. 1) says, “The very use of the word ‘chaos’ implies some observation of a system, perhaps through measurement, and that these observations or measurements vary unpredictably. We often say observations are chaotic when there is no discernible regularity or order.”‘

Our host @Bruce has written a couple of papers and a book on the subject of “Random number Generation”. As with others who have done similar, he says you need to find a good source of entropy or as many sources as you can of asynchronous activity and use them.

Which is a more workmanlike way of saying “observation of a system, perhaps through measurement”, where the “measurements vary unpredictably” or nondeterministically.

Yes *nix signals and their issues are a consequence of both history and asynchronicity, for which there is still no solution, and nor is there ever likely to be, if “the laws of nature” we currently use hold.

Speak to designers of embedded systems for industrial control, who still write code in assembler for “ultimate control”, about how they deal with interrupts and having multiple processes that run asynchronously.

Their war stories about lack of control are legion. I’ve a few of my own to do with many apparently unrelated fields. Just two of which are “industrial robots”, and “random bit generators”. The former where you really do not want chaos ever, and the latter where you want rather more than chaos as much as possible.

Ultimately all the real world waveforms (signals) we observe can be described in the past tense by summing circular functions, or “discrete Fourier transforms”(DFTs). However all our observations are seen in “a window” of time, so we cannot know what is on either side of that window, and our knowledge is at best uncertain. So we can never make accurate predictions, just probabilistic ones that are by definition “chaotic”, and that makes life difficult from time to time.

As some of those idiot life coaches / motivational speakers say,

“Embrace the chaos”, “Make it yours!”

Which is why I take great stock in,

“Great deity I pray, grant me the serenity to accept the things I cannot change, the courage to change the things I cannot accept, and the wisdom to hide the bodies of those people I had to kill today because they hacked me off. Oh and a new chain saw under the Xmas tree as that serenity still eludes me.”

Because they say it ages you less to smile, even manically, than it does to frown 😉

Bob July 5, 2024 1:21 PM

I’ve half joked that the only thing that would make some of them happy is a domain admin account hooked up directly to unfiltered WAN. If one were to snap her fingers and magically make ASLR fully secure, everything would break in that same moment. It’s a herculean task to get devs to stop relying on undefined behaviors, let alone deprecated ones.

For those entering the infosec field today, it’s a shame that comp-sci is taught as an auxiliary skill if at all. At some point in your career, you’ll almost certainly run into situations where it’s beneficial to be able to show them (and the C-suite, god help you) where their assumptions stopped being good circa 2002.

Carmine July 5, 2024 3:37 PM

@ Clive Robinson

You mean the, “There be dragons here” Type warnings

Yes and no. As you say, asynchronous events bring a certain amount of complexity that cannot be avoided. But, in general, complex systems have a mix of “inherent” and “artificial” complexity.

If you compare how processors handle hardware interrupts, there are significant differences in ease of use. For example, some can automatically switch stacks, save registers, manage nesting and priorities, and so on. Others leave all that stuff to the operating system designer. For x86, one can consider the difficulty of handling double-faults (such as kernel stack overflows), the subtle security vulnerabilities related to SYSRET, the non-nestability of SWAPGS, and basically all the details Intel’s “FRED” is intended to fix.

There are really several classes of “dragons” relating to Unix signals. One is just arbitrary historical differences. It’s understandable given the history, and while POSIX could’ve done a better job (like by refusing to standardise ill-defined functions, including signal(), and by using a prefix for the replacements), these problems are basically avoidable: use sigaction()—and set SA_RESTART—instead of signal(), pthread_sigmask() instead of sigprocmask(), etc. But then there’s the whole idea of handlers being per-process rather than per-thread, which makes them very difficult to use in libraries (which, as a bonus, can’t assume their application set SA_RESTART). That’s just bad design, still not fixed, and not inherent in the nature of interruptibility. The same goes for the refusal to save errno; the problems could’ve been easily predicted. I saw an interesting and related story on Hacker News yesterday: the fields of siginfo_t overlap on some systems, few people know what’s valid when, and the implementors of Java screwed it up. For reference:

If the signal was not generated by one of the functions or events listed above, si_code shall be set either to one of the signal-specific values described in XBD <signal.h>, or to an implementation-defined value that is not equal to any of the values defined above.

If si_code is SI_USER or SI_QUEUE, [XSI] [Option Start] or any value less than or equal to 0, [Option End] then the signal was generated by a process and si_pid and si_uid shall be set to the process ID and the real user ID of the sender, respectively.

In addition, si_addr, si_pid, si_status, and si_uid shall be set for certain signal-specific values of si_code, as described in XBD <signal.h>.

If si_code is one of SI_QUEUE, SI_TIMER, SI_ASYNCIO, or SI_MESGQ, then si_value shall contain the application-specified signal value. Otherwise, the contents of si_value are undefined.

Got that? The separate page for signal.h says a bit more: “In addition, the following signal-specific information shall be available”, and si_addr is listed for SIGSEGV and SIGBUS, but it’s not actually true on Linux! Is that a Linux defect, or should the specification have said it’s only valid for system-generated signals of those types? The latter, I suspect, because POSIX doesn’t give any way to set si_addr when sending a signal, and it does kind of describe the opposite case:

On systems not supporting the XSI option, the si_pid and si_uid members of siginfo_t are only required to be valid when si_code is SI_USER or SI_QUEUE. On XSI-conforming systems, they are also valid for all si_code values less than or equal to 0; however, it is unspecified whether SI_USER and SI_QUEUE have values less than or equal to zero, and therefore XSI applications should check whether si_code has the value SI_USER or SI_QUEUE or is less than or equal to 0 to tell whether si_pid and si_uid are valid.

Here’s a fun one: “If the process is multi-threaded, or if the process is single-threaded and a signal handler is executed other than as the result of [any of several synchronous function calls,] the behavior is undefined if the signal handler refers to any object other than errno with static storage duration other than by assigning a value to an object declared as volatile sig_atomic_t”. Note that only assigning a value is allowed; no variable apart from errno may be read, even if it’s declared _Atomic or const. And that type may not be large enough to store a pointer. In this case, we’re lucky no compiler is so strict.

A signal-like interface will never be easy, but this is a real mess, and much of it’s fixable. Like, sigaction(SA_THREAD | signum, …) could get and set a per-thread handler, for which SIG_DFL would reference the per-process handler. Easy to implement and understand, with a very low chance of breaking anything. And we could require pid and uid in siginfo_t to be -1 when not available; the definitions of fork() and setreuid() make those values effectively reserved already (making the fields contain -1 may break binary-compatibility, thereby being about as difficult as the time64 transition; but having POSIX define siginfo_get_uid() and such would be trivial and would benefit new code).

lurker July 5, 2024 5:22 PM

https://www.qualys.com/2024/07/01/cve-2024-6387/regresshion.txt

In our experiments, it takes ~10,000 tries on average to win this race condition; i.e., with 10 connections (MaxStartups) accepted per 600 seconds (LoginGraceTime), it takes ~1 week on average to obtain a remote root shell.

This is for an 18yr old Debian. The numbers vary and TTF reduces to a few hours for modern systems.

These are new connections per try, so MaxAuthTries in sshd_config cannot mitigate. But what about fail2ban and iptables as defence in depth? We don’t yet need @Clive’s “ultimate solution.”
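For completeness, the stopgap the Qualys advisory suggested for hosts that cannot be patched immediately was to disable the grace timeout in sshd_config, which removes the vulnerable signal-handler path at the cost of a denial-of-service exposure (unauthenticated connections then never time out):

```
# /etc/ssh/sshd_config
# Mitigation only, not a fix: sshd becomes DoS-able via exhaustion of
# the MaxStartups connection slots, but the vulnerable grace-timeout
# signal handler is never triggered.
LoginGraceTime 0
```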

MarkH July 8, 2024 3:03 AM

I’ve found SSH to be extremely useful.

I was also surprised (at first) by the number of disclosed vulnerabilities in various SSH versions — all of which I learned of by reading this blog.

For years, accordingly, my operating premise has been that SSH is somewhat porous. I won’t set up a public-facing SSH server without cryptographic port knocking — surely not practical for many use cases, but suitable for mine.

ResearcherZero July 8, 2024 3:18 AM

@lurker, @Mark

fail2ban is probably the easiest solution

Most Linux distributions and many open source projects contain code or dependencies written in memory-unsafe languages. There are likely many more vulnerabilities waiting to be discovered.

There are still many long outstanding bugs in the kernel yet to be patched.

Clive Robinson July 9, 2024 6:13 AM

@ fib,

“I gather it is a 32-bit problem.”

That rather depends on how you look at it…

I’ve not given it more than a passing glance so far, but as reported, it’s a problem caused by a human failing on a known bug / vulnerability that is general, not specific.

So the fact it is only reported on some 32-bit systems and not others is down to chance or, more correctly as I understand it, probability.

The argument given by some is that ASLR on 64bit systems is different because it’s bigger thus the probability of success is way smaller.

If you think about it such an argument is not a good thing. Because it does not mean that it is “not vulnerable” just improbably so currently (a change elsewhere could blow that out of the water).

Clive Robinson July 9, 2024 7:34 AM

@ ResearcherZero, lurker, MarkH, All,

Re : Host based log tools that IP block.

“fail2ban is probably the easiest solution”

Fail2Ban is in some ways better than DenyHosts. They both use a variation of the “Garden Path” design I described some years ago now of “instrument and deny” before your front door.

However they both have issues.

Firstly, they are both written in Python, which is not the choice I would make, for a whole heap of reasons (requiring vast amounts of disk space being just one).

But the way they work is itself problematic. That is, they rely on the logging process, so quite a few CPU cycles get burned and there is significant latency.

Thus they are “slow to be proactive” and inefficient.

Which is acceptable for modern PC based systems, but not older PC or embedded systems, that get used as the “front door” systems in many cases.

MarkH July 9, 2024 9:58 PM

@ResearcherZero:

For this vulnerability — or any other requiring a large number of connections — I’d expect a tool like fail2ban to offer at least a degree of protection.

There have been vulnerabilities requiring very few connections, or even just one.

With port knocking in place, both the server and the knocking gate must be penetrated. I suppose that this considerably reduces the probability of harm.

Clive Robinson July 10, 2024 7:57 AM

@ Carmine, ALL,

Re : CVE-2024-6409

More fallout from this bug

Just remember the old saying,

“Bad news comes in threes”

Actually it’s almost always an “odd number” like 1, 3, 5 etc. The reason is tied to the same logic of

“The third wish undoes the harm of the first two wishes”.

The more up to date explanation is that when there is an issue, or other unexpected event, some humans have a habit of doing a “dive right in” on a hunch / gut feeling etc., without thinking things through or investigating properly (it’s part of “looking like a hero” machismo). Part of this is thinking “errors are singletons”; often they are not, because “humans are creatures of habit”.

Often what those diving in choose to do/select goes wrong due to side effects etc., so they “quick fix” again, which also has the habit of “going wrong” in a different way.

It’s only then that they think of sitting back and working their way through logically… and drawing graphs of what happens on white boards even timing/signal flow charts etc.

So then you get the more measured “third wish fix” effect.

Then if it goofs, haste appears again and fixes tend to come in pairs till finally “the beast is slain”; hence fixes tend to hit odd numbers, not even…

Carmine July 10, 2024 12:28 PM

Often what those diving in choose to do/select goes wrong due to side effects etc., so they “quick fix” again, which also has the habit of “going wrong” in a different way.

It’s an interesting theory, but apparently false in this case. It seems that Solar Designer, being privy to the previous “embargoed” report, looked for similar problems and found one; then agreed to delay the publication, for the convenience of the packagers.

Now that the reports are public, “everyone” will be combing through the signal handlers of OpenSSH and other important software projects, open-source or not.

It’s a common pattern in security research. Remember the original publication of the “Meltdown” and “Spectre” vulnerabilities, 6.5 years ago? They were the first-known major ones of that type that were not caused by hyper-threading; and, since then, we’ve seen a constant stream of such attacks. Older people may remember the original publication of “Smashing The Stack for Fun and Profit” and the decade of easy stack-smashing attacks that followed (which, due to compiler hardening and such, have turned into much harder—thus less fun but more profitable—stack-smashing attacks).

I’ll be surprised if this stops at any small odd number. I expect the attacks will just migrate to progressively less important software. “Sanitizers” and other system-level mitigations will come; the only question is whether the “good guys” will have them.

vas pup July 10, 2024 4:20 PM

https://nocamels.com/2024/07/google-picks-israeli-infrastructure-security-startup-for-its-ai-program/

“A Ramat Gan startup that combats cyberattacks against key institutions and
infrastructure has been selected by Google for its prestigious AI program to promote and support promising startups whose technology is based on artificial
intelligence.

The platform collects information from sensors placed around a company’s
infrastructure. It uses its algorithms to analyze millions of pieces of data every day in order to spot any anomalies that either point to suspicious activity or even just a fault in the system.

It is currently in use by Israel’s national water carrier, Mekorot.

“IXDen’s remarkable journey showcases the vibrancy of Israel’s high-tech sector,” said Google.

“We are thrilled to announce our acceptance into Google for Startups’ AI
program,” said Harel.

“This grant will be a significant catalyst for us, enabling us to advance our
ambitious goals in the global markets and strengthen our position as leaders in AI-powered critical infrastructure security.”

Bob July 15, 2024 10:19 AM

@Clive

If you think about it such an argument is not a good thing. Because it does not mean that it is “not vulnerable” just improbably so currently (a change elsewhere could blow that out of the water).

You just described encryption.
