Leaving Authentication Credentials in Public Code

Interesting article about a surprisingly common vulnerability: programmers leaving authentication credentials and other secrets in publicly accessible software code:

Researchers from security firm GitGuardian this week reported finding almost 4,000 unique secrets stashed inside a total of 450,000 projects submitted to PyPI, the official code repository for the Python programming language. Nearly 3,000 projects contained at least one unique secret. Many secrets were leaked more than once, bringing the total number of exposed secrets to almost 57,000.

[…]

The credentials exposed provided access to a range of resources, including Microsoft Active Directory servers that provision and manage accounts in enterprise networks, OAuth servers allowing single sign-on, SSH servers, and third-party services for customer communications and cryptocurrencies. Examples included:

  • Azure Active Directory API Keys
  • GitHub OAuth App Keys
  • Database credentials for providers such as MongoDB, MySQL, and PostgreSQL
  • Dropbox Key
  • Auth0 Keys
  • SSH Credentials
  • Coinbase Credentials
  • Twilio Master Credentials

Posted on November 16, 2023 at 7:10 AM

Comments

Grahame Grieve November 16, 2023 7:34 AM

Many of those will be mock credentials for unit tests. No doubt some of mine are in that count. Good luck doing something with them.

lassen November 16, 2023 8:29 AM

Human error routinely exists in any activity that one cares to examine. Unsurprisingly.

… So this article’s key lesson to all of us here is what?

(What should we do differently now?)

Clive Robinson November 16, 2023 9:09 AM

@ des,

“There must be some tool for scanning your code to detect this”

Err no.

To “detect” something you must be able to “distinguish” it.

All “roots of trust” boil down to being a “bag of bits” like many other “bags of bits”.

Whilst there may be,

1, An implicit structure in,
2, Or meta-data about,

the bag of bits, neither is required nor assured to be there.

So, “How do you distinguish?”
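
The standard heuristic answer, used by scanners such as truffleHog, is Shannon entropy over long unbroken tokens. A minimal sketch (the threshold and token pattern are illustrative guesses, not any product’s actual rules):

    import math
    import re
    from collections import Counter

    def entropy_bits_per_char(s: str) -> float:
        # Shannon entropy of the character distribution of s.
        n = len(s)
        return -sum(c / n * math.log2(c / n) for c in Counter(s).values())

    def suspicious_literals(source: str, threshold: float = 3.5):
        # Flag long unbroken tokens whose characters look "too random".
        # Only a heuristic: a hex key tops out at 4 bits per character,
        # yet short English text can score nearly as high, so these are
        # candidates for a human reviewer, not verdicts.
        for token in re.findall(r"[A-Za-z0-9+/=_\-]{20,}", source):
            if entropy_bits_per_char(token) >= threshold:
                yield token

That the two distributions overlap is rather the point: a “bag of bits” with no required structure cannot be distinguished reliably.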

The same issue has existed with malware for six decades if not more, a problem we’ve tried many ways to resolve and always failed. So why expect it to be any different for the likes of tokens that are also roots of trust?

The real issue is the assumption that “tools to go faster / better” which correct human failings can actually exist. We know from the history of compilers and interpreters that some “bad code / logic” will always get past them.

There are no magic wand or pixie dust solutions, much though we might hope otherwise.

Clive Robinson November 16, 2023 9:31 AM

@ lassen,

“So this article’s key lesson to all of us here is what?”

Well, one is “take the time to do it right”.

Another is to “engineer solutions”, not take an “artisanal pattern” approach.

Another is to take the “pilot approach” of building and using “check lists” based on history.

Which means the ICT industry has to effectively grow up, by applying the methods of science to design and construction. As happened to early industry during the Victorian era, which gave us “engineers” and relegated “artisans” to less harmful activities, in some cases by law.

The reason the ICT industry has got away with not doing what needs to be done is that mostly its harms are not instantly highlighted by blood and body parts spread over the walls or countryside (see the history of boiler explosions).

However, that is only because until fairly recently nobody had given much in the way of “physical agency” to poorly designed and constructed systems in nearly unconstrained environments.

However, that is changing, and the body count is not just rising but being noticed by legislators, who are imposing sanctions and restrictions.

Some of which are arguably as harmful, because they get perverted and weaponised into “restriction of trade” practices (for instance, I cannot call myself an Engineer in many places because I’ve not paid the tithing tax there to those who create very profitable faux-markets).

Geordie W Korper November 16, 2023 9:42 AM

Expecting people to be good at security is the real problem, in my view. GitHub has rule-based secret scanning for public repositories, and AI-based scanning in beta. More details can be found at: https://docs.github.com/en/code-security

Many (most?) of the credentials would have been flagged by those sorts of tools.
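
A minimal sketch of the rule-based side (the two patterns below are well-known public token shapes, AWS access-key IDs and GitHub personal access tokens; GitHub’s actual rule set is far larger):

    import re

    # Well-known public token shapes; production rule sets have hundreds.
    RULES = {
        "aws-access-key-id": re.compile(r"\bAKIA[0-9A-Z]{16}\b"),
        "github-pat": re.compile(r"\bghp_[A-Za-z0-9]{36}\b"),
    }

    def scan(text: str):
        # Yield (rule_name, match) for every rule that fires.
        for name, pattern in RULES.items():
            for match in pattern.findall(text):
                yield name, match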

Perhaps someday more repositories will use similar tools.

Lagrandeimage November 16, 2023 10:01 AM

Of course there are tools to flag authentication credentials in repositories.

But good security is a process, not a tool.

You need to set up a process in order to implement the tool and exploit its results. That takes time and money.

So few do it.

Morley November 16, 2023 10:43 AM

Years ago, a coworker accidentally did that on GitHub. Within minutes we had AWS charges. I guess someone had a script monitoring repo changes, probably across a large number of repositories.
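
That kind of monitoring is straightforward to build. A rough sketch of the polling side, using GitHub’s public events feed (unauthenticated and crude; a real scanner would authenticate, page through the feed, and fetch each pushed commit’s actual contents, which this does not):

    import re
    import time

    import requests

    AWS_KEY = re.compile(r"\bAKIA[0-9A-Z]{16}\b")

    # Poll GitHub's public event feed and flag anything key-shaped in
    # the event metadata (commit messages and the like).
    while True:
        events = requests.get("https://api.github.com/events").json()
        for event in (events if isinstance(events, list) else []):
            for hit in AWS_KEY.findall(str(event)):
                print(event.get("repo", {}).get("name"), hit)
        time.sleep(60)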

emily’s post November 16, 2023 11:27 AM

My take – leave the credential in the code, maybe even contrive to leave it in several places, but, here’s the kicker, make it the wrong credential! Have the real credential written by hand on paper (where the paper sat on a hard glass plate during the writing) and delivered in person by a trusted courier, with the paper locked in a briefcase chained to the wrist. Ha on you, bad guys!

Pro tip: do this for all your environment variables.

Clive Robinson November 16, 2023 12:29 PM

@ Lagrandeimage,

Re : Tools and measurands.

“Of course there are tools to flag authentication credentials in repositories.”

But they can only flag what they “know” before they are run so,

“Will not pick up that which does not match.”

Whilst it is possible to do the opposite which is,

“Eliminate what is known and flag the rest.”

The result will be a lot for someone with experience to walk their way through.

But there is a third group to consider,

“Those made to look like something else”

They will get through both the previous types of tool.

For instance, say you want to hide a binary string. Well, you can simply Base64-encode it (or similar) to an ASCII string. To obfuscate that, you can use an old malware trick of first splitting the string into two or more valid ASCII strings, where adding them “mod two” or similar gives the original back.
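
To make that concrete, here is a sketch of one such split, using addition modulo 95 over the printable range rather than a literal “mod two”, so that both shares stay printable ASCII (the scheme is illustrative, not lifted from any particular malware):

    import base64
    import secrets

    def split_printable(secret: bytes) -> tuple[str, str]:
        # Split a secret into two printable-ASCII shares. Either share
        # alone is an innocuous-looking string constant; only combining
        # them reproduces the Base64 text of the secret.
        b64 = base64.b64encode(secret).decode()
        s1, s2 = [], []
        for ch in b64:
            r = secrets.randbelow(95)
            s1.append(chr(32 + r))
            s2.append(chr(32 + (ord(ch) - 32 + r) % 95))
        return "".join(s1), "".join(s2)

    def join_printable(s1: str, s2: str) -> bytes:
        # Subtract share from share, mod 95, to recover the Base64 text.
        b64 = "".join(chr(32 + (ord(b) - ord(a)) % 95) for a, b in zip(s1, s2))
        return base64.b64decode(b64)

    s1, s2 = split_printable(b"not-a-real-key")
    assert join_printable(s1, s2) == b"not-a-real-key"

Neither share matches any known token pattern, and neither looks abnormal for string data, so both of the earlier tool types walk straight past it.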

The point is, if someone wants to hide something in the code, they will always be able to “beat the tool”. We know this from many decades of malware getting past antivirus and other anti-malware tools.

Another trick, possible on x86 and other architectures, is to use the duality of,

“Code is data.”

Whereby you take a section of code and, by using equivalent machine code, produce a “bag of bits” that is both code and the root of a security token.

There is a paper going back to the 1980s that mentions using “redundancy” to hide information in plain sight. One consequence of which is hiding an RSA-encrypted “magic number” in the upper third of the bits of a larger PubKey certificate.

As Adam Young and Moti Yung pointed out in their book, this magic number can be such that it gives a close starting point for searching for one of the P/Q primes in the PubKey, making a brute-force search very fast for the person who holds the private key that recovers the “magic number”. Thus the near-perfect “golden key / backdoor”.

The consequence of which is that,

“That takes time and money.”

In some cases there can never be enough of either.

As I’ve mentioned before, I’ve used such a method to prove a point about “Code Reviews” on an encrypted communications program.

The point being,

“If your development programmer is smarter than your code reviewer, your turkey is rather more than cooked, it’s toast.”

As many will have found, the “smart programmers” are often the product programming leads, not the code reviewers working the “check list”, because management think that is the best utilisation of a resource for short-term “shareholder value”, and thus their bonus, etc.

bl5q sw5N November 16, 2023 2:57 PM

This and other security problems, and actually all problems of every kind, seem to arise from neglect in program design.

What is the context of the program task, and what is the program intended to do ?

Does it do that, and only that, or also, potentially and haphazardly, a number of other things that weren’t intended? Is there programming in the body that is over-specific compared to the program’s intended purpose? Is there some ad hoc structure that solves a practical problem but quietly violates the intended design and introduces the potential for unplanned behavior?

It’s a tall order to completely understand the problem context and to state exactly what task is intended, but if this is done, a program free of (logical) bugs can be implemented. In the security context, bug free means secure.

t.bruce November 16, 2023 9:01 PM

@ des,

There must be some tool for scanning your code to detect this

Yeah, the very tool that GitGuardian, who published the report, are trying to sell us. GitHub also has built-in secret scanning. A common suggestion to programmers is to make their software include some structure in its secrets, maybe a fixed prefix, rather than just a bare UUID or a hexadecimal DSA key for example. I have mixed feelings about the idea that all software should change because its users might do something dumb.
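
The prefix idea in a sketch (the “acme_sk_” prefix is invented for illustration; GitHub’s “ghp_” tokens are the real-world model):

    import re
    import secrets

    PREFIX = "acme_sk_"  # hypothetical vendor prefix, modelled on "ghp_"

    def new_token() -> str:
        # Structure makes accidental leaks cheap to find...
        return PREFIX + secrets.token_hex(20)

    TOKEN = re.compile(r"\bacme_sk_[0-9a-f]{40}\b")

    def leaked_tokens(text: str) -> list[str]:
        # ...because the scanner needs only one exact, low-false-positive rule.
        return TOKEN.findall(text)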

Unfortunately, “bl5q” is right that a lot of programs are just poorly designed: it’s often really difficult not to hardcode a secret into a file. For some, I’ve written scripts that use BubbleWrap to mount a tmpfs into which I’ll write a file containing the secret, after prompting for it or decrypting it via gpg. Others I’ve hooked via $LD_PRELOAD or $PYTHONPATH. It’s absurd that people have given no thought to prompting for credentials, but there you go. I’m still not sure how one’s meant to set up, say, a Postfix mail server that relays through a password-authenticated host (like Gmail) without writing one’s password into /etc. (It’ll be running as root and need to get the password from a non-root user, which complicates things.)
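
For programs one controls, prompting is only a few lines. A minimal sketch, assuming an interactive start-up is acceptable (the names here are illustrative):

    import getpass
    import os
    import subprocess

    def run_with_secret(cmd: list[str], var: str = "APP_SECRET") -> None:
        # Ask for the credential at start-up instead of hardcoding it,
        # then hand it to the child process via its environment. (A
        # sketch: a tmpfs-backed file or a proper secret manager avoids
        # the environment's own leak paths, such as accidental logging.)
        secret = getpass.getpass(f"{var}: ")
        subprocess.run(cmd, env={**os.environ, var: secret}, check=True)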

Ismar November 16, 2023 9:53 PM

@Clive and others – any tool that might be used to detect any type of pattern can, at best, give you a list of potential matches to check manually. This is because no generic tool can be aware of the full context of the code you write, and context is the one thing critical to determining matches (matches have meaning only under a certain context).
So one way is to use an assistant tool but spend time curating its “matches”, hence increasing maintenance costs.
Other tools, like GitHub Copilot, have started to mature in generating secure code and, IMHO, they will be more successful, as they have more direct access to the context of the application you’re coding at the time they generate the code, reducing the potential for introducing these types of errors.
Nonetheless, always test your Copilot-generated code as well 😀 before pushing it to a repository.

ResearcherZero November 17, 2023 4:13 AM

@lassen

What is uploaded to PyPI stays there. Only commits containing malicious code are removed.

Your mistakes stay there, including any sensitive information. And it’s not just credentials that can be used and abused: the repo can be cloned, modified, then used with techniques like typosquatting in all manner of malicious behaviour. Supply-chain attacks, like the one used against SolarWinds, for example.
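
The cloning/typosquatting half can at least be screened for mechanically. A sketch of the similarity check a registry or CI pipeline might run (the package list and threshold are illustrative):

    import difflib

    POPULAR = {"requests", "numpy", "pandas", "urllib3", "django"}

    def likely_typosquats(name: str, threshold: float = 0.85) -> list[str]:
        # Popular names suspiciously close to the one being installed.
        return [p for p in POPULAR
                if p != name
                and difflib.SequenceMatcher(None, name, p).ratio() >= threshold]

    print(likely_typosquats("reqeusts"))  # flags "requests"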

ResearcherZero November 17, 2023 4:44 AM

Some examples…

“this attacker embedded malicious scripts deep within the package, within specific functions. This meant that the malicious code would only execute when a specific function is called during regular usage.”

‘https://checkmarx.com/blog/users-of-telegram-aws-and-alibaba-cloud-targeted-in-latest-supply-chain-attack/

The intent of this attack is to provide the expected functionality while exfiltrating access and secret cloud credential keys.
https://blog.phylum.io/cloud-provider-credentials-targeted-in-new-pypi-malware-campaign/

…Another case in which the malware tries to steal credit card information from Chrome.

‘https://jfrog.com/blog/malicious-pypi-packages-stealing-credit-cards-injecting-code/

…And in 2018, researchers discovered 12 malicious Python libraries uploaded on the official Python Package Index (PyPI).

“also contained additional functionality, including the ability to obtain boot persistence and open a reverse shell on remote workstations.”
https://www.zdnet.com/article/twelve-malicious-python-libraries-found-and-removed-from-pypi/

Clive Robinson November 17, 2023 6:00 AM

@ ResearcherZero, lassen, ALL,

Re : The official “Python Package Index” (PyPI), a garbage can?

“Your mistakes stay there, including any sensitive information. It’s not just credentials that can be used and abused, the repo can be cloned, modified then used with techniques like type squatting in all manner of malicious behaviour. Supply chain attacks, like the one used against SolarWinds for example.”

Yup, it’s just one of several reasons not to use Python in the cavalier way oh so many do.

Another is that Python packages / libraries are a mess… Oh, and there’s a little too much “see how expert I am” C.V. shining and equivalent, that is, code being “a little too clever” for its own good. Which makes it at best difficult for ordinary mortals to read and understand in a way that helps them spot “errors, omissions, faults, and maliciousness”.

So the reality is they really need to be properly,

1, Curated.
2, De-duplicated.
3, Cleaned up.

Not that any of this is really news, as this nearly two-year-old article shows,

https://www.activestate.com/blog/pypi-security-pitfalls-and-steps-towards-a-secure-python-ecosystem/

Another PyPI-related issue is that the assumptions it makes to stop some harms cause others…

One such is name clashes: PyPI is assumed to be global, and thus “the source to use”, not a local source. Thus if a malicious coder finds or guesses a private package name you use, they can put a malicious package under that name on PyPI and you are, as they say, “oh so screwed” (the “dependency confusion” attack). That is, the name is not fully qualified in ordinary usage…
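
The standard defence is to pin exactly what you expect. A hedged sketch of pip’s hash-checking mode (the package name and hash below are placeholders, not real values):

    # requirements.txt -- pin the version and hash so a same-named
    # package from another index cannot be silently substituted.
    # (Package name and hash are placeholders for illustration.)
    somepackage==1.2.3 \
        --hash=sha256:0000000000000000000000000000000000000000000000000000000000000000

    # Then install with every hash enforced, from an explicit index:
    #   pip install --require-hashes --index-url https://pypi.org/simple -r requirements.txt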

So yup, time to open the stable door and divert the rivers, but also evade the fate Hercules suffered,

https://www.perseus.tufts.edu/Herakles/stables.html

t.bruce November 17, 2023 9:28 AM

I don’t understand the concern about “Your mistakes stay there, including any sensitive information.” As opposed to what? You remove your key from PyPI, hope nobody saw it, and go on using it as if it’s secure?

“Your mistakes stay there” is how the internet’s worked for a very long time. Sure, some of us were embarrassed to see what Deja News and the Wayback Machine captured, but we got used to it by the end of the 1990s (a few years later, the Usenet archives were extended back to 1981). There are several GeoCities archives. Debian has CD images back to 2002, and mailing-list messages to 1994. Perl’s “BackPan” claims to have “A Complete History of CPAN”. PyPI is hardly unique in this.

JonKnowsNothing November 17, 2023 10:03 AM

@t.bruce, All

re: What’s the problem? I don’t see any problem….

The ostrich is not just in the code.

Not to worry. Lots and lots of folks have the same problem. Probably all of us, at some time in our careers, did something similar. It’s always “Somebody Else” who is doing the security. However, no one really knows who that Somebody Else is or what it is they are actually doing.

I cannot recall any code change that Somebody Else made to my code that involved a security issue. It isn’t that there wasn’t one; there were probably a lot of them over the years. The only code changes I saw were overall design or marketing-directed changes (button v. checkbox). I suspect that Somebody Else never existed.

Clive Robinson November 18, 2023 4:28 AM

@ JonKnowsNothing, t.bruce,

Re: Past mistakes that were not.

“I cannot recall any code change that Somebody Else made to my code that involved a security issue.”

And I’ve seen bug-logs, where all security issues were downgraded so they never got corrective attention.

The problem everyone appears to forget though is new attacks are in the future.

You write code and as far as you know it’s secure by “what is known at the time”. But some time later someone discovers a new avenue of attack and what was secure is now vulnerable.

One such was “side channels” and data leaks via time based signals.

We wrote code with conditional branches etc. that took different times based on the data, not realising that those timing differences leaked information about the data far and wide…
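
The canonical example, as a sketch: the data-dependent comparison next to the constant-time fix now in Python’s standard library:

    import hmac

    def naive_check(supplied: bytes, expected: bytes) -> bool:
        # Early-exit compare: the run time reveals how many leading
        # bytes matched, which a patient attacker can measure remotely.
        if len(supplied) != len(expected):
            return False
        for a, b in zip(supplied, expected):
            if a != b:
                return False
        return True

    def safe_check(supplied: bytes, expected: bytes) -> bool:
        # Constant-time compare: the run time does not depend on where
        # (or whether) the inputs differ.
        return hmac.compare_digest(supplied, expected)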

So such code often does not get fixed, but also with PyPI it hangs around, for good.

Which has three major downsides,

1, Vulnerable code remains active.
2, Vulnerable code gets copied.
3, It can, years later, damage a person’s reputation.

The second point is one people should think about.

A dirty little secret of the software industry is all the forms of “code reuse”. One of which is the “stack exchange effect”, where “example code” that is usually quite deficient (for brevity/clarity) becomes someone else’s production code. Likewise, any visible source code has a probability of being cut-n-pasted into other code.

Entire OS functionality has been taken this way. Have a look at Microsoft and “their” networking code… Back a long time ago there was an attack that affected nearly every computer with networking, because they had all copied the BSD networking code… Few appear to remember it or have heard about it, but it very clearly illustrated the problem. The more modern examples are entire code bases like log4j, which gave rise to,

https://xkcd.com/2347/

But… Without the random person in Nebraska…

Who in real life would get their reputation tarnished, and not be able to do anything about it…

In the EU people have “the right to be forgotten”; something tells me that PyPI and similar are not in any way compliant, and at some point someone will exercise their rights.

t.bruce November 18, 2023 12:58 PM

So such code often does not get fixed, but also with PyPI it hangs around, for good. […] Vulnerable code gets copied […] It can, years later, damage a person’s reputation.

Yes, that’s a good point, but it’s not obvious to me that deleting that code will improve the situation. Replacing it with better code probably will; simply deleting it, without comment, is probably prioritizing personal reputation above security.

Such “deleted” code may still be available via archive.org, Software Heritage, Debian, copies embedded in other projects…. Not all sources will be subject to European privacy laws, and if it’s a multi-author work, I’m not sure that “right to be forgotten” could even be used unilaterally—especially if the other author(s) have licensed it so as to require source publication.

But let’s say that someone fails to find any relevant code to use, and would’ve used the “deleted” code had they known about it. Now what? Probably they’ll write it themselves. Are they likely to do better? If they’re the type of person who would’ve used old vulnerable code without noticing its flaws, I doubt it.

I don’t know what you mean by “Vulnerable code remains active”. Windows 3.1 and 95 are vulnerable to all sorts of things. One can find both on archive.org, and I’m sure there are still businesses, somewhere, using those for “important” things. Would it be better to erase them from the internet entirely, and pretend that’s not happening? Some uses, such as in retro-computing, aren’t particularly security-sensitive.

I still haven’t seen anyone explain the benefit of removing a leaked credential. Once leaked, it needs to be considered compromised, and to be revoked. That’s true even if it was leaked only to Github which immediately said “we saw your secret so we rejected your upload”. So, again, removing such data seems more about personal reputation than security; maybe fewer people will notice one’s carelessness, and one will be spared some embarrassment.

JonKnowsNothing November 18, 2023 2:13 PM

@t.bruce, @Clive, All

re: The never expunged code problem : “deleted” code may still be available

There is no easy way for all deprecated code to be deleted across the entire spectrum of code bases. That should not justify including known errors in on-going code.

There are several reasons known errors continue in perpetuity:

  • The error condition has no easy way to be prevented or avoided
  • There are financial constraints imposed: internal (profit) & external (LEA exploitation)

LEAs have been known to prevent the repair of error conditions because they use them for their own zero-day attacks. It doesn’t matter which country or group; they withhold fixes or mandate maintaining those error conditions.

Outside of corporate and LEA influences, it really is a hallmark of what sort of person you are, and how trustworthy you are as a programmer, if you have a fixable error and the ability to fix it, but you choose not to fix it.

For every fix, there is one less item that can be exploited.

Consider: Heartbleed.
It was introduced into the software in 2012 and publicly disclosed in April 2014.

As of 20 May 2014, 1.5% of the 800,000 most popular TLS-enabled websites were still vulnerable to Heartbleed.

As of 21 June 2014, 309,197 public web servers remained vulnerable.

As of 23 January 2017, according to a report from Shodan, nearly 180,000 internet-connected devices were still vulnerable.

As of 6 July 2017, the number had dropped to 144,000.

The rhetorical question is:

  • Would you still knowingly deploy the errant Heartbleed code because there are still 144,000 devices using it?

===

1) https://en.wikipedia.org/wiki/Heartbleed

t.bruce November 18, 2023 5:08 PM

That should not justify including known errors in on-going code.

I’m not sure we’re talking about the same thing. Several people have made comments along the lines of “with PyPI [the code] hangs around, for good”. That doesn’t sound like “on-going” to me; more like a dead project whose code is still visible.

If someone deletes the code without explanation, people might just dig deeper to find it. We’re better off keeping the repository around, with the file tree maybe replaced by a big “don’t use this” warning. People could, in theory, pull code from the previous revision, but there’s only so much we can do.

Of course nobody should be shipping code (except for archival/research uses) with known security vulnerabilities that would put the users at risk. “Known errors”, though? The only realistic way to avoid shipping software with those is to stop shipping software entirely—or to use the “head-in-the-sand” approach of ensuring there’s no way for anyone to tell you about errors.

JonKnowsNothing November 18, 2023 5:27 PM

@t.bruce, All

re: Code Comments

I’m not sure your suggestion of “big “don’t use this” warning” will work very well.

In older days, when you were supposed to document and comment your code in both the source and engineering books, I placed lots of comments about tricky implementations, only later to get a bug report after “another programmer” altered the code incorrectly.

Having asked “did you read the comment?” and getting the answer “no”, it was pretty clear that even LARGE-CAP comments, text warnings, and references to design, marketing, and engineering documents were really not going to prevent anything.

It’s almost like the Pink Elephant condition: telling someone “Don’t Mess With This” is going to get the opposite reaction.

The ostrich approach doesn’t work any better, given the classic comeback: “Why didn’t you document it?”

  • There was once a programmer who didn’t write any documentation or comments at all in their code or engineering books. When asked why not, the answer was “I don’t do documentation”.
  • Then there were people who wrote lots of documents and kept full audit trails of code, testing, and deployment. When charged with a code error, they had boxes of reference material to validate the state of the code. Management was not always happy, as they wanted a quick sacrifice when things went pear-shaped.

You cannot win for losing…

Clive Robinson November 18, 2023 9:15 PM

@ t.bruce, ALL,

Re : Time moves on and attacks improve.

“That doesn’t sound like “on-going” to me; more like a dead project whose code is still visible.”

Well, the BSD networking code is certainly a “dead project”, and its code is not just visible, it’s still very much in use, as Microsoft include it along with the NT base code in all their OSs. Likewise it’s in the Apple Mac code, and also in Linux and the BSD derivatives out there. Whilst not ubiquitous, it’s in most personal computers.

Similar is AES encryption code that was effectively “back-doored” by the NSA.

I could go on, but the point is that low-level or base code especially tends to get included and forgotten. Worse, you cannot change it, because in all probability you will break somebody’s system and they will blame you rather than fix it. We had an example of this when an open-source developer got sick to death of his work being abused by corporates, so he pulled it and chaos followed,

https://dev.to/chaitanyasuvarna/how-a-developer-broke-the-internet-by-un-publishing-his-package-containing-11-lines-of-code-31ei

But it gets worse…

Some of this base code is old enough that when originally released it was not known to have vulnerabilities.

But given sufficient time vulnerabilities were discovered…

What do you do?

Leave it vulnerable, or fix it and risk breaking the Internet…

But log4j showed the “kitchen sink” issue… It started off as a simple bit of code and just kept getting added to as it tried to become all things to all men… The problem with that sort of organic growth in code is that vulnerabilities appear like mushrooms… and they did.

So as the old saying has it,

“Time and tide wait for no man”

And the vulnerabilities appear where they were not before. You can call them “black swans” if it makes them sound sexier, but I regard such new vulnerabilities occurring in old code as an inevitability.

t.bruce November 18, 2023 11:04 PM

I’m not sure your suggestion of “big “don’t use this” warning” will work very well.

Well, the suggestion was also to delete all the other files from the repository (but leave the history), the idea being that maybe it’ll still be the top search result and maybe people will read the warning before trying harder to find the old code. A “git pull” from an ancient clone will also bring in the warning and delete the code, as opposed to mysteriously failing while leaving the code. I think it’ll work about as well as such things can work, which is “not very well”; I doubt deleting from PyPI entirely would give a better result.

Clive, BSD networking isn’t a “dead project” so much as a collection of 10 or 20 projects that share a lot of code but are not always paying attention to each other’s ongoing work (you say “not ubiquitous”, but who’s not using it?). And it gives a great example of “you can not change it because in all probability you will break somebodies system”: the TCP “urgent pointer” (for “out-of-band data”) was incorrectly specified at first, such that different implementations disagreed about which byte it pointed to. Thus, the common advice was, and remains, to simply never use it, because nobody uses it (or tests it; that was the basis of “WinNuke”); if absolutely necessary, send only one urgent byte at a time.
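
For the record, the “one urgent byte” advice in code (socket.MSG_OOB is the standard API; whether the peer delivers that byte in-band or out-of-band is exactly the implementation disagreement described above):

    import socket

    def send_urgent_byte(sock: socket.socket, byte: bytes = b"!") -> None:
        # Send a single TCP "urgent" (out-of-band) byte. Sending more
        # than one at a time runs into the urgent-pointer ambiguity, so
        # the folk advice is: one byte, or preferably none at all.
        assert len(byte) == 1
        sock.send(byte, socket.MSG_OOB)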

Blame is uninteresting. Treat a software failure like a plane crash: figure out what happened and how to prevent it, not whom to scapegoat. (Arguments about that will occur online anyway, and will have about as much importance as arguments about the best Star Trek captain.)

JonKnowsNothing November 19, 2023 12:10 AM

@ t.bruce, @Clive, All

re: old v new pulls

iirc(badly)

It may be part of the current situation, but there was a problem recently where some malware code got inserted into a common project build. It was code insertion, not a certificate issue. The bad code included a redirect to a mirror site that looked legit but was in fact a malware-loading page.

When the bad code (one or two lines, iirc) was discovered and removed, it did not fix the malware hijack, because there were hidden redirects in other parts of the pull and build that redeployed the malware redirect.

It seems that perfection is reached not when there is nothing left to add, but when there is nothing left to take away.

Antoine de Saint-Exupéry

PaulBart November 20, 2023 9:49 AM

Boss man says go write Python. I write Python. He says go faster. I go faster. Tests not needed, code review not needed. Get paycheck. Company layoffs for offshoring. Next company. Don’t care.

Bigger companies just offshore security where offshore company has liability. Rinse. Repeat.

Again. Don’t care. Nod head, get pat on back and paycheck, next company.
