AI and the Corporate Capture of Knowledge

More than a decade after Aaron Swartz’s death, the United States is still living inside the contradiction that destroyed him.

Swartz believed that knowledge, especially publicly funded knowledge, should be freely accessible. Acting on that belief, he downloaded thousands of academic articles from the JSTOR archive with the intention of making them publicly available. For this, the federal government charged him with a felony and threatened decades in prison. After two years of prosecutorial pressure, Swartz died by suicide on Jan. 11, 2013.

The still-unresolved questions raised by his case have resurfaced in today’s debates over artificial intelligence, copyright and the ultimate control of knowledge.

At the time of Swartz’s prosecution, vast amounts of research were funded by taxpayers, conducted at public institutions and intended to advance public understanding. But access to that research was, and still is, locked behind expensive paywalls. People are unable to read work they helped fund without paying private journals and research websites.

Swartz considered this hoarding of knowledge to be neither accidental nor inevitable. It was the result of legal, economic and political choices. His actions challenged those choices directly. And for that, the government treated him as a criminal.

Today’s AI arms race involves a far more expansive, profit-driven form of information appropriation. The tech giants ingest vast amounts of copyrighted material: books, journalism, academic papers, art, music and personal writing. This data is scraped at industrial scale, often without consent, compensation or transparency, and then used to train large AI models.

AI companies then sell their proprietary systems, built on public and private knowledge, back to the people who funded it. But this time, the government’s response has been markedly different. There are no criminal prosecutions, no threats of decades-long prison sentences. Lawsuits proceed slowly, enforcement remains uncertain and policymakers signal caution, given AI’s perceived economic and strategic importance. Copyright infringement is reframed as an unfortunate but necessary step toward “innovation.”

Recent developments underscore this imbalance. In 2025, Anthropic reached a settlement with publishers over allegations that its AI systems were trained on copyrighted books without authorization. The agreement reportedly valued infringement at roughly $3,000 per book across an estimated 500,000 works, for a total of over $1.5 billion. Plagiarism disputes between artists and accused infringers routinely settle for hundreds of thousands, or even millions, of dollars when prominent works are involved. Scholars estimate Anthropic avoided over $1 trillion in liability costs. For well-capitalized AI firms, such settlements are likely factored in as a predictable cost of doing business.

As AI becomes a larger part of America’s economy, one can see the writing on the wall. Judges will twist themselves into knots to justify an innovative technology premised on literally stealing the works of artists, poets, musicians, all of academia and the internet, and vast expanses of literature. But if Swartz’s actions were criminal, it is worth asking: What standard are we now applying to AI companies?

The question is not simply whether copyright law applies to AI. It is why the law appears to operate so differently depending on who is doing the extracting and for what purpose.

The stakes extend beyond copyright law or past injustices. They concern who controls the infrastructure of knowledge going forward and what that control means for democratic participation, accountability and public trust.

Systems trained on vast bodies of publicly funded research are increasingly becoming the primary way people learn about science, law, medicine and public policy. As search, synthesis and explanation are mediated through AI models, control over training data and infrastructure translates into control over what questions can be asked, what answers are surfaced, and whose expertise is treated as authoritative. If public knowledge is absorbed into proprietary systems that the public cannot inspect, audit or meaningfully challenge, then access to information is no longer governed by democratic norms but by corporate priorities.

Like the early internet, AI is often described as a democratizing force. But also like the internet, AI’s current trajectory suggests something closer to consolidation. Control over data, models and computational infrastructure is concentrated in the hands of a small number of powerful tech companies. They will decide who gets access to knowledge, under what conditions and at what price.

Swartz’s fight was not simply about access, but about whether knowledge should be governed by openness or corporate capture, and who that knowledge is ultimately for. He understood that access to knowledge is a prerequisite for democracy. A society cannot meaningfully debate policy, science or justice if information is locked away behind paywalls or controlled by proprietary algorithms. If we allow AI companies to profit from mass appropriation while claiming immunity, we are choosing a future in which access to knowledge is governed by corporate power rather than democratic values.

How we treat knowledge—who may access it, who may profit from it and who is punished for sharing it—has become a test of our democratic commitments. We should be honest about what those choices say about us.

This essay was written with J. B. Branch, and originally appeared in the San Francisco Chronicle.

Posted on January 16, 2026 at 9:44 AM

Comments

Rontea January 16, 2026 12:35 PM

Democracy, at its core, is an information system—a collective agreement that the wisdom of the many shapes the governance of the few. Yet somewhere along the way, we gave the keys to that knowledge to gatekeepers. We locked up the raw materials of our shared understanding—academic papers, public research, the very data of our civic life—behind paywalls and proprietary systems. And then, with a kind of quiet irony, we handed that same knowledge to machines and the corporations that own them, granting them the power to harvest and interpret it without returning it to the public.

The result is an inversion of democracy itself: instead of knowledge flowing freely to the people to inform debate, decision, and dissent, it flows upward, into black boxes controlled by a few companies. These new custodians decide what is knowable, what is profitable, and increasingly, what is true. In a system where access to knowledge determines power, the question is not just about copyright or innovation—it’s about whether democracy can survive when the people are denied the tools to understand their own world.

A democracy that defers its knowledge to private algorithms is one that risks becoming a spectator to its own governance. If knowledge is a public good, it should belong to the public—not as a privilege, but as a right.

RIP Aaron Swartz, who reminded us that the fight for open knowledge is the fight for a living democracy.

Clive Robinson January 16, 2026 3:28 PM

@ Bruce,

Some of us remember the actual details of what happened at the time…

Aaron was probably not guilty of any crime other than “theft of electricity,” which would have been a misdemeanor.

The prosecutor who forced her way in was acting under political pressure from the executive to “make an example of him”: in effect, a “witch hunt for a public burning at the stake” as a “warning to all.”

Because the prosecutor knew there was little hope of gaining a conviction in a fair court, she was going for a life-wrecking plea deal instead.

So rather than go to court, she quite deliberately delayed and delayed, ratcheting things up by repeatedly putting more threats on the table to pressure Aaron.

In the process of “rights stripping,” she was also trying to bankrupt Aaron.

We know this from still-available records.

Unfortunately, Aaron was given bad advice by others who had positioned themselves as activists for their own advantage…

So Aaron was caught between a rock and a hard place, and ended up in a state that should have been a cause of great concern for those around him. Apparently they claim not to have noticed… and so Aaron did not receive the medical help he should have.

The outcome you mention.

People need to remember that the US has the worst figures for the number of people in jail relative to population size. Because running jails can be very profitable, we know judges have taken bribes to increase not just the number of people jailed but also the length of their sentences. As for conditions, the violence levels are among the highest, and whilst it’s difficult to get figures on where America stands with regard to deaths in prison, we know the numbers are high.

The US justice system is clearly “two-tier” for those who get pulled into it.

However, there is a third, upper tier that few think about: those who “buy the legislators” so that their crimes face little if any effective legislation or regulation. Throw in politics as well, and that is why,

‘But this time, the government’s response has been markedly different. There are no criminal prosecutions, no threats of decades-long prison sentences. Lawsuits proceed slowly, enforcement remains uncertain and policymakers signal caution, given AI’s perceived economic and strategic importance. Copyright infringement is reframed as an unfortunate but necessary step toward “innovation.”’

The simple fact is that general AI via current LLM and ML systems is not going to happen, regardless of scale or how much money is thrown at them. We now also know they cannot be made either “safe or secure,” while their potential for harm is almost unlimited by comparison. Further, no matter what is currently claimed, the required if not essential “world view” is not going to happen either.

This also means that AI agents, beyond really trivial tasks, are not going to be “safe or secure”: a very real security nightmare that will prove endless and unfixable, with an arms race developing in which the attackers, not the defenders, have the upper hand.

So there really won’t be the ‘step toward “innovation”’ that investors, politicians, and the AI corporates are desperate for, currently or in the foreseeable future.

This, as I’ve noted on a number of occasions, is unfortunate, because the US economy is at best stagnant due to political stupidity and is already in an underlying recession. The only thing hiding that from being clearly visible is “AI froth” that is already turning into scum.

Do the current AI LLM and ML systems have useful “innovative” functions? The answer is actually yes, but only in very limited and specific types of applications. And there are really very few of these, so the all-important ROI that investors want is not there.

I cannot predict when or how the AI hype bubble will end (explode, implode, or deflate). But it’s now abundantly clear there really are no deliverables of worth to support it in the general-usage sense, so at some point it’s going to be over.

What I can say is that the “lost opportunity cost” will be immense, and the “Tech Sector” as we know it will be forever changed, and, as history shows, probably “not for the weavers,” be it code rather than cloth this time.

It’s almost certain the US corporates will make the same mistake they’ve made since the ’60s, in that they will let “experience and expertise in the US age out.” Then, when things get dire, they will try to “go abroad for it” if they can… which they probably won’t be able to do, for various “short-sighted” reasons most here can probably work out for themselves.

The question for other Western nations is, are they going to just,

“Rearrange the deckchairs on what is a ship in distress, or man the lifeboats and cast off, so as to get a good safe distance away whilst they still can and not get pulled down…”

Keep an eye on European defence spending: if they buy US systems, then they are probably not heading for the lifeboats… Likewise, watch what they do to resolve the energy crisis that is slowly building.
