Entries Tagged "open source"


Fake Signal and Telegram Apps in the Google Play Store

Google removed fake Signal and Telegram apps from its Play store.

An app with the name Signal Plus Messenger was available on Play for nine months and had been downloaded from Play roughly 100 times before Google took it down last April after being tipped off by security firm ESET. It was also available in the Samsung app store and on signalplus[.]org, a dedicated website mimicking the official Signal.org. An app calling itself FlyGram, meanwhile, was created by the same threat actor and was available through the same three channels. Google removed it from Play in 2021. Both apps remain available in the Samsung store.

Both apps were built on the open-source code available from Signal and Telegram. Interwoven into that code was an espionage tool tracked as BadBazaar. The Trojan has been linked to GREF, a China-aligned hacking group, and has previously been used to target Uyghurs and other Turkic ethnic minorities. The FlyGram malware was also shared in a Uyghur Telegram group, further aligning it with the BadBazaar family’s previous targeting.

Signal Plus could monitor sent and received messages and contacts if people connected their infected device to their legitimate Signal number, as is normal when someone first installs Signal on their device. Doing so caused the malicious app to send a host of private information to the attacker, including the device IMEI number, phone number, MAC address, operator details, location data, Wi-Fi information, emails for Google accounts, contact list, and a PIN used to transfer texts in the event one was set up by the user.

This kind of thing is really scary.

Posted on September 14, 2023 at 7:05 AM

Open-Source LLMs

In February, Meta released its large language model: LLaMA. Unlike OpenAI and its ChatGPT, Meta didn’t just give the world a chat window to play with. Instead, it released the code into the open-source community, and shortly thereafter the model itself was leaked. Researchers and programmers immediately started modifying it, improving it, and getting it to do things no one else anticipated. And their results have been immediate, innovative, and an indication of how the future of this technology is going to play out. Training speeds have hugely increased, and the size of the models themselves has shrunk to the point that you can create and run them on a laptop. The world of AI research has dramatically changed.

This development hasn’t made the same splash as other corporate announcements, but its effects will be much greater. It will wrest power from the large tech corporations, resulting in both much more innovation and a much more challenging regulatory landscape. The large corporations that had controlled these models warn that this free-for-all will lead to potentially dangerous developments, and problematic uses of the open technology have already been documented. But those who are working on the open models counter that a more democratic research environment is better than having this powerful technology controlled by a small number of corporations.

The power shift comes from simplification. The LLMs built by OpenAI and Google rely on massive data sets, measured in the tens of billions of bytes, computed on by tens of thousands of powerful specialized processors producing models with billions of parameters. The received wisdom is that bigger data, bigger processing, and larger parameter sets are all needed to make a better model. Producing such a model requires the resources of a corporation with the money and computing power of a Google or Microsoft or Meta.

But building on public models like Meta’s LLaMA, the open-source community has innovated in ways that allow results nearly as good as the huge models—but run on home machines with common data sets. What was once the reserve of the resource-rich has become a playground for anyone with curiosity, coding skills, and a good laptop. Bigger may be better, but the open-source community is showing that smaller is often good enough. This opens the door to more efficient, accessible, and resource-friendly LLMs.
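One of the enabling techniques is quantization: storing model weights at reduced precision so that a multi-billion-parameter model fits in a laptop’s memory. Here is a minimal sketch of local inference under that approach, assuming the llama-cpp-python bindings and a quantized model file; the file name is hypothetical.

```python
# A minimal sketch of running a quantized LLaMA-family model on a laptop,
# assuming the llama-cpp-python package; the model file name is hypothetical.
from llama_cpp import Llama

# A 7B-parameter model quantized to 4 bits fits in a few gigabytes of RAM.
llm = Llama(model_path="llama-7b-q4.gguf", n_ctx=512)

output = llm("Q: Name one benefit of open-source software. A:", max_tokens=48)
print(output["choices"][0]["text"])
```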

More importantly, these smaller and faster LLMs are much more accessible and easier to experiment with. Rather than needing tens of thousands of machines and millions of dollars to train a new model, an existing model can now be customized on a mid-priced laptop in a few hours. This fosters rapid innovation.
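A minimal sketch of that kind of customization, assuming the Hugging Face transformers and peft libraries; the base-model name is illustrative. The common technique is low-rank adaptation (LoRA), which freezes the base model and trains only small adapter matrices, and that is what makes laptop-scale fine-tuning feasible.

```python
# A minimal sketch of parameter-efficient fine-tuning (LoRA), assuming the
# Hugging Face transformers and peft packages; the model name is illustrative.
from transformers import AutoModelForCausalLM
from peft import LoraConfig, get_peft_model

base = "openlm-research/open_llama_3b"  # an open LLaMA-style base model
model = AutoModelForCausalLM.from_pretrained(base)

# LoRA freezes the billions of base weights and trains only small low-rank
# adapter matrices injected into the attention projections.
config = LoraConfig(
    r=8,                                  # rank of the adapter matrices
    lora_alpha=16,                        # scaling factor for the adapters
    target_modules=["q_proj", "v_proj"],  # which projections to adapt
    task_type="CAUSAL_LM",
)
model = get_peft_model(model, config)
model.print_trainable_parameters()  # typically well under 1% of all weights
```

Training the adapter is then an ordinary supervised loop over a small domain-specific dataset, and the artifact it produces is a few megabytes of adapter weights rather than a full copy of the model.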

It also takes control away from large companies like Google and OpenAI. By providing access to the underlying code and encouraging collaboration, open-source initiatives empower a diverse range of developers, researchers, and organizations to shape the technology. This diversification of control helps prevent undue influence and ensures that the development and deployment of AI technologies align with a broader set of values and priorities. Much of the modern internet was built on open-source technologies from the LAMP (Linux, Apache, MySQL, and PHP/Perl/Python) stack—a suite of applications often used in web development. This enabled sophisticated websites to be easily constructed, all with open-source tools that were built by enthusiasts, not companies looking for profit. Facebook itself was originally built using open-source PHP.

But being open-source also means that there is no one to hold responsible for misuse of the technology. When vulnerabilities are discovered in obscure bits of open-source technology critical to the functioning of the internet, often there is no entity responsible for fixing the bug. Open-source communities span countries and cultures, making it difficult to ensure that any country’s laws will be respected by the community. And having the technology open-sourced means that those who wish to use it for unintended, illegal, or nefarious purposes have the same access to the technology as anyone else.

This, in turn, has significant implications for those who are looking to regulate this new and powerful technology. Now that the open-source community is remixing LLMs, it’s no longer possible to regulate the technology by dictating what research and development can be done; there are simply too many researchers doing too many different things in too many different countries. The only governance mechanism available to governments now is to regulate usage (and only for those who pay attention to the law), or to offer incentives to those (including startups, individuals, and small companies) who are now the drivers of innovation in the arena. Incentives for these communities could take the form of rewards for the production of particular uses of the technology, or hackathons to develop particularly useful applications. Sticks are hard to use—instead, we need appealing carrots.

It is important to remember that the open-source community is not always motivated by profit. The members of this community are often driven by curiosity, the desire to experiment, or the simple joys of building. While there are companies that profit from supporting software produced by open-source projects like Linux, Python, or the Apache web server, those communities are not profit driven.

And there are many open-source models to choose from. Alpaca, Cerebras-GPT, Dolly, HuggingChat, and StableLM have all been released in the past few months. Most of them are built on top of LLaMA, but some have other pedigrees. More are on their way.

The large tech monopolies that have been developing and fielding LLMs—Google, Microsoft, and Meta—are not ready for this. A few weeks ago, a Google employee leaked a memo in which an engineer tried to explain to his superiors what an open-source LLM means for their own proprietary tech. The memo concluded that the open-source community has lapped the major corporations and has an overwhelming lead on them.

This isn’t the first time companies have ignored the power of the open-source community. Sun never understood Linux. Netscape never understood the Apache web server. Open source isn’t very good at original innovations, but once an innovation is seen and picked up, the community can be a pretty overwhelming thing. The large companies may respond by retrenching and pulling their models back from the open-source community.

But it’s too late. We have entered an era of LLM democratization. By showing that smaller models can be highly effective, enabling easy experimentation, diversifying control, and providing incentives that are not profit motivated, open-source initiatives are moving us into a more dynamic and inclusive AI landscape. This doesn’t mean that some of these models won’t be biased, or wrong, or used to generate disinformation or abuse. But it does mean that controlling this technology is going to take an entirely different approach than regulating the large players.

This essay was written with Jim Waldo, and previously appeared on Slate.com.

EDITED TO ADD (6/4): Slashdot thread.

Posted on June 2, 2023 at 10:21 AM

Securing Open-Source Software

Good essay arguing that open-source software is a critical national-security asset and needs to be treated as such:

Open source is at least as important to the economy, public services, and national security as proprietary code, but it lacks the same standards and safeguards. It bears the qualities of a public good and is as indispensable as national highways. Given open source’s value as a public asset, an institutional structure must be built that sustains and secures it.

This is not a novel idea. Open-source code has been called the “roads and bridges” of the current digital infrastructure that warrants the same “focus and funding.” Eric Brewer of Google explicitly called open-source software “critical infrastructure” in a recent keynote at the Open Source Summit in Austin, Texas. Several nations have adopted regulations that recognize open-source projects as significant public assets and central to their most important systems and services. Germany wants to treat open-source software as a public good and launched a sovereign tech fund to support open-source projects “just as much as bridges and roads,” and not just when a bridge collapses. The European Union adopted a formal open-source strategy that encourages it to “explore opportunities for dedicated support services for open source solutions [it] considers critical.”

Designing an institutional framework that would secure open source requires addressing adverse incentives, ensuring efficient resource allocation, and imposing minimum standards. But not all open-source projects are made equal. The first step is to identify which projects warrant this heightened level of scrutiny—projects that are critical to society. CISA defines critical infrastructure as industry sectors “so vital to the United States that [its] incapacity or destruction would have a debilitating impact on our physical or economic security or public health or safety.” Efforts should target the open-source projects that share those features.

Posted on July 27, 2022 at 7:03 AM

Developer Sabotages Open-Source Software Package

This is a big deal:

A developer has been caught adding malicious code to a popular open-source package that wiped files on computers located in Russia and Belarus as part of a protest that has enraged many users and raised concerns about the safety of free and open source software.

The application, node-ipc, adds remote interprocess communication and neural networking capabilities to other open source code libraries. As a dependency, node-ipc is automatically downloaded and incorporated into other libraries, including ones like Vue.js CLI, which has more than 1 million weekly downloads.

[…]

The node-ipc update is just one example of what some researchers are calling protestware. Experts have begun tracking other open source projects that are also releasing updates calling out the brutality of Russia’s war. This spreadsheet lists 21 separate packages that are affected.

One such package is es5-ext, which provides code for the ECMAScript 6 scripting language specification. A new dependency named postinstall.js, which the developer added on March 7, checks to see if the user’s computer has a Russian IP address, in which case the code broadcasts a “call for peace.”
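One practical response to incidents like these is to check whether your own project pulls in an affected package, even as a transitive dependency. A minimal sketch, assuming a standard npm package-lock.json (lockfileVersion 2 or 3) in the current directory:

```python
# A minimal sketch: scan an npm lock file for a named package anywhere in the
# dependency tree. Assumes a package-lock.json with lockfileVersion 2 or 3,
# which lists every installed package under "packages", keyed by path.
import json

def find_package(lock_path, needle):
    with open(lock_path) as f:
        lock = json.load(f)
    hits = []
    for path, meta in lock.get("packages", {}).items():
        # Paths look like "node_modules/a/node_modules/b"; compare the leaf name.
        if path.rsplit("node_modules/", 1)[-1] == needle:
            hits.append("%s@%s at %s" % (needle, meta.get("version", "?"), path))
    return hits

for hit in find_package("package-lock.json", "node-ipc"):
    print(hit)
```

The same scan, with a different package name, covers any other entry on the list of affected packages.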

It constantly surprises non-computer people how much critical software is dependent on the whims of random programmers who inconsistently maintain software libraries. Between Log4j and this new protestware, it’s becoming a serious vulnerability. The White House tried to start addressing this problem last year, requiring a “software bill of materials” for government software:

…the term “Software Bill of Materials” or “SBOM” means a formal record containing the details and supply chain relationships of various components used in building software. Software developers and vendors often create products by assembling existing open source and commercial software components. The SBOM enumerates these components in a product. It is analogous to a list of ingredients on food packaging. An SBOM is useful to those who develop or manufacture software, those who select or purchase software, and those who operate software. Developers often use available open source and third-party software components to create a product; an SBOM allows the builder to make sure those components are up to date and to respond quickly to new vulnerabilities. Buyers can use an SBOM to perform vulnerability or license analysis, both of which can be used to evaluate risk in a product. Those who operate software can use SBOMs to quickly and easily determine whether they are at potential risk of a newly discovered vulnerability. A widely used, machine-readable SBOM format allows for greater benefits through automation and tool integration. The SBOMs gain greater value when collectively stored in a repository that can be easily queried by other applications and systems. Understanding the supply chain of software, obtaining an SBOM, and using it to analyze known vulnerabilities are crucial in managing risk.

It’s not a solution, but it’s a start.
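To make the quoted use case concrete, here is a minimal sketch of how a software operator might match an SBOM against a feed of known-vulnerable components. It assumes an SBOM in the CycloneDX JSON format; the file name and the vulnerability entries are illustrative.

```python
# A minimal sketch of SBOM-driven vulnerability checking. Assumes an SBOM in
# the CycloneDX JSON format; the file name and the known-vulnerable list here
# are illustrative, not a real advisory feed.
import json

KNOWN_VULNERABLE = {("log4j-core", "2.14.1")}  # hypothetical advisory entries

def check_sbom(sbom_path):
    with open(sbom_path) as f:
        sbom = json.load(f)
    # CycloneDX records each component with at least a name and a version.
    for component in sbom.get("components", []):
        key = (component.get("name"), component.get("version"))
        if key in KNOWN_VULNERABLE:
            print("vulnerable component: %s@%s" % key)

check_sbom("sbom.json")
```

A real deployment would pull the known-vulnerable list from a machine-readable advisory feed and act on matches automatically; the point is that a queryable SBOM makes the question answerable at all.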

EDITED TO ADD (3/22): Brian Krebs on protestware.

Posted on March 21, 2022 at 10:22 AM

Finding Vulnerabilities in Open Source Projects

The Open Source Security Foundation announced $10 million in funding from a pool of tech and financial companies, including $5 million from Microsoft and Google, to find vulnerabilities in open source projects:

The “Alpha” side will emphasize vulnerability testing by hand in the most popular open-source projects, developing close working relationships with a handful of the top 200 projects for testing each year. “Omega” will look more at the broader landscape of open source, running automated testing on the top 10,000.

This is an excellent idea. This code ends up in all sorts of critical applications.

Log4j would be a prototypical vulnerability that the Alpha team might look for—an unknown problem in a high-impact project that automated tools would not be able to pick up before a human discovered it. The goal is not to use the personnel engaged with Alpha to replicate dependency analysis, for example.

Posted on February 2, 2022 at 9:58 AM

Backdoor Added—But Found—in PHP

Unknown hackers attempted to add a backdoor to the PHP source code: two malicious commits, each with the subject “fix typo” and falsely attributed to known PHP developers and maintainers. They were discovered and removed before being pushed out to any users. But since 79% of the Internet’s websites use PHP, it’s scary.

Developers have moved PHP to GitHub, which has better authentication. Hopefully it will be enough—PHP is a juicy target.

Posted on April 9, 2021 at 8:54 AM

Open Source Does Not Equal Secure

Way back in 1999, I wrote about open-source software:

First, simply publishing the code does not automatically mean that people will examine it for security flaws. Security researchers are fickle and busy people. They do not have the time to examine every piece of source code that is published. So while opening up source code is a good thing, it is not a guarantee of security. I could name a dozen open source security libraries that no one has ever heard of, and no one has ever evaluated. On the other hand, the security code in Linux has been looked at by a lot of very good security engineers.

We have some new research from GitHub that bears this out. On average, vulnerabilities in the libraries it hosts go undetected for four years. From a ZDNet article:

GitHub launched a deep-dive into the state of open source security, comparing information gathered from the organization’s dependency security features and the six package ecosystems supported on the platform across October 1, 2019, to September 30, 2020, and October 1, 2018, to September 30, 2019.

Only active repositories have been included, not including forks or ‘spam’ projects. The package ecosystems analyzed are Composer, Maven, npm, NuGet, PyPi, and RubyGems.

In comparison to 2019, GitHub found that 94% of projects now rely on open source components, with close to 700 dependencies on average. Most frequently, open source dependencies are found in JavaScript—94%—as well as Ruby and .NET, at 90%, respectively.

On average, vulnerabilities can go undetected for over four years in open source projects before disclosure. A fix is then usually available in just over a month, which GitHub says “indicates clear opportunities to improve vulnerability detection.”

Open source means that the code is available for security evaluation, not that it necessarily has been evaluated by anyone. This is an important distinction.

Posted on December 3, 2020 at 11:21 AM

DARPA Is Developing an Open-Source Voting System

This sounds like a good development:

…a new $10 million contract the Defense Department’s Defense Advanced Research Projects Agency (DARPA) has launched to design and build a secure voting system that it hopes will be impervious to hacking.

The first-of-its-kind system will be designed by an Oregon-based firm called Galois, a longtime government contractor with experience in designing secure and verifiable systems. The system will use fully open source voting software, instead of the closed, proprietary software currently used in the vast majority of voting machines, which no one outside of voting machine testing labs can examine. More importantly, it will be built on secure open source hardware, made from special secure designs and techniques developed over the last year as part of a special program at DARPA. The voting system will also be designed to create fully verifiable and transparent results so that voters don’t have to blindly trust that the machines and election officials delivered correct results.

But DARPA and Galois won’t be asking people to blindly trust that their voting systems are secure—as voting machine vendors currently do. Instead they’ll be publishing source code for the software online and bringing prototypes of the systems to the Def Con Voting Village this summer and next, so that hackers and researchers will be able to freely examine the systems themselves and conduct penetration tests to gauge their security. They’ll also be working with a number of university teams over the next year to have them examine the systems in formal test environments.

Posted on March 14, 2019 at 1:20 PM

