Google Is Using Its Vast Data Stores to Train AI

No surprise, but Google just changed its privacy policy to reflect broader uses of all the surveillance data it has captured over the years:

Research and development: Google uses information to improve our services and to develop new products, features and technologies that benefit our users and the public. For example, we use publicly available information to help train Google’s AI models and build products and features like Google Translate, Bard, and Cloud AI capabilities.

(I quote the privacy policy as of today. The Mastodon link quotes the privacy policy from ten days ago. So things are changing fast.)

Posted on July 12, 2023 at 10:50 AM

22 Comments


jon w July 12, 2023 11:05 AM

Publicly available information doesn’t sound to me as if it could mean Google’s proprietary data.

Clive Robinson July 12, 2023 12:27 PM

@ jon w,

“Publicly available information doesn’t sound to me as if it could mean Google’s proprietary data.”

Google’s “proprietary data” is almost entirely derived from data that is or was not theirs, and they view all 2nd and 3rd parties the same… That is, all our data is not public, all yours is public…

They view others’ data they have as “Publicly available” by Google’s long-held belief of “anything they can get their hands on”… And if you think about their attitude to book copying and copyright, they care not a jot for legal rights either.

It’s just that people are now starting class action lawsuits over “breach of copyright” on books against LLMs, and Google, being the originators of “the transformer” behind them all, are trying the old “preemptive legal excuses” to get manoeuvring room. That is so they can argue things out in court till those bringing action “run out of resources” or negotiate Google better terms…

When you’ve got in practical terms a near unlimited legal budget and your opponent has not… Then as demonstrated by WASP Governments you can commit more or less any legal abuse you like…

Corporations have to be only a little more careful than Govs, in that it’s tort/civil legislation they can near freely trash and abuse, it is after all “only money”… But criminal, not quite so much (unless they can buy it off by paying billion dollar fines that are but a drip from their revenue streams, and are effectively tax deductible with a little creativity).

Oh and it’s not just Alphabet / Google, take a look at the other big corps…

Oh and this might make you smile,

Or if insufficient to get a chuckle,

Both hitting the news in the past day… So plenty more to come.

Post Script July 12, 2023 1:21 PM

I wonder if they have managed to resist plundering gmail and google drive contents like docs, spreadsheets and photos? There’s a LOT of tasty, human, personal, valuable data just sitting there on their servers.

Phillip July 12, 2023 2:48 PM

Cloud providers obviously feel “there is no such thing as a free lunch.” The problem is, they initially behave as though this were the case. Sure, they are happy to collect sufficient data from the user for certain purposes. In the end, any cloud user will eventually be “ordered around.”

Whether one does a data transfer owing to policy changes, or other captive data leveraging, one must always have an exit strategy. That is, these cloud services are actually contributing to the usual IT complexification of “waste-me.” I will not point a finger at any specific cloud provider, several do this.

lurker July 12, 2023 3:52 PM

@Post Script

gmail has been a standing joke since day one. Users can give you war stories of how they get pestered with ads for stuff they just bought elsewhere …

Clive Robinson July 12, 2023 9:04 PM

@ Phillip, ALL,

“… these cloud services are actually contributing to the usual IT complexification of “waste-me.” I will not point a finger at any specific cloud provider, several do this.”

I suspect it would be both quicker and easier to point at those that don’t…

Likewise those that “on ramp” organisations or individuals, that see every byte of traffic as it passes to/from the cloud.

I’ve always maintained that for most commercial organisations the cloud was a bad idea, from before it was even called “the cloud”. The reality is that you have to go to such extraordinary measures to protect yourself or your organisation that it would be less expensive to do it “in house” in all but a very few quite specialized cases.

Y July 12, 2023 10:33 PM

@Post Script

Gmail has a privacy policy targeted to its specific context, just as each Google service or product has a privacy policy specific to that product or service.

This becomes apparent when you receive notifications that such and such product or service has updated its privacy policy.

Email has its own privacy challenges, requirements and expectations that are widely different from Google search.

Ted July 12, 2023 11:10 PM

That is an intense privacy policy. Not necessarily the ‘privacy’ part, though.

(The above link also has archived versions of the policy, and policy change comparisons.)

The repeated mention of third-party integrations has me feeling like the elephant is the room. 🙃

Winter July 13, 2023 2:55 AM

I must admit that I have never expected Google to not use every bit of data they got from me.

And whatever privacy policy they publish, I am sure I do not understand what the text really means legally in terms of what Google can and will do with my data.

I will change my mind when courts really show their teeth and stop Google from doing something, anything.

K.S. July 13, 2023 8:21 AM

Poison your data, it is the only way to be sure. For lazy people, BleachBit includes a Clinton-emails auto-generator as its ‘Make chaff’ feature. I am sure there are others.

bob July 13, 2023 9:21 AM

Google just changed its privacy policy to reflect broader uses of all the surveillance data it has captured over the years:

But, but, Google does not track anyone or gather data on anyone. HOw dARe YoU sAY soMEthIng LiKE thAT…

bob July 13, 2023 9:42 AM

Email has its own privacy challenges, requirements and expectations that are widely different from Google search.

That would be the case if Alphabet actually followed their own privacy policies. Or, maybe they do but their “third-party” subsidiaries do not. The only “challenges” they have are how to maximize the use of the stuff while minimizing legal responsibility and bad PR for themselves. But of course since it is all read by automated processes (which gather it) to “protect the user” they are obviously not doing anything wrong…

Post Script July 13, 2023 9:49 AM

Maybe we’ll see future lawsuits from authors who keep their unpublished drafts in Google drive.

Clive Robinson July 13, 2023 11:43 AM

@ K.S., ALL,

Re : Fitting your hemp neck tie.

“Poison your data, it is the only way to be sure.”

It’s a very unwise thing to do.

Firstly they will know your phone has such an app on it in one of two ways,

1, All such Apps are really bad at what they do so are easily spotted by automated systems.
2, Most apps will get acquired by the average person in a way that is directly traceable.

Thus you’ve painted a target on your back that at some time in the future will probably be used against you.

If you have such an app on your phone there is no way it can be made to make you look good, only bad. In short you are one or more of,

1, Hiding something
2, A paranoid person
3, Someone interfering with honest investigators’ work.

And several more that are easy sells for a prosecutor that you are a bad / deviant / dangerous person so “guilty of something”. Which means you’ve effectively “hung yourself”.

I just wish people would stop trying to sell the “lift the noise floor to hide the signal” thinking, because signal processing will catch you out unless you really know what you are doing, and many experts don’t…

There is a 2002 paper on steganography that supposedly proves steganography can work by this method. It came up a couple of weeks back in the AI Stego thread,

The problem is the paper has flaws as I’ve mentioned…

vas pup July 13, 2023 6:05 PM

@all: Google/Alphabet is gradually moving away from its initial motto, “Don’t be evil”, to “Pecunia non olet” [Money does not stink].

Elon Musk announces new AI start-up

“Tesla boss Elon Musk has announced the formation of an artificial intelligence

The new company is called xAI, and includes several engineers that have worked at companies like OpenAI and Google.

Mr Musk has previously stated he believes developments in AI should be paused and that the sector needs regulation.

He said the start-up was created to “understand reality”.

Elon Musk was one of the original backers of OpenAI, which went on to create the popular large language model ChatGPT, which has – often controversially – become popular for uses such as assisting students with writing homework.

However, the billionaire’s relationship with the company has soured. He has criticized ChatGPT for having a liberal bias.

“What we need is TruthGPT”, Mr Musk tweeted in February.
He also disagrees with how ChatGPT has been run – and its close relationship with Microsoft. [as I pointed out on this blog before, Microsoft could screw any good technology by adding financial interest and screw privacy – e.g. Skype]

“It does seem weird that something can be a nonprofit, open source and somehow transform itself into a for-profit, closed source,” Musk said in a CNBC interview.”

vas pup July 13, 2023 7:15 PM

@all – interesting parallel of bomb and AI in this article:
Who was the real Robert Oppenheimer?

This is an extract – enjoy the whole article:

“Einstein would later say: “The trouble with Oppenheimer is that he loves [something that] doesn’t love him – the United States government.” His patriotism and desire to please clearly played a role in his recruitment. General Leslie Groves, the military leader of the Manhattan Engineer District, was the person responsible for finding a scientific director for the bomb project. According to a 2002 biography, Racing for the Bomb, when Groves proposed Oppenheimer as scientific lead, he met with opposition.

Oppenheimer’s “extreme liberal background” was a concern. But as well as noting his talent and his existing knowledge of the science, Groves also pointed out his “overweening ambition”. The Manhattan Project’s chief of security also noticed this: “I became convinced that not only was he loyal, but that he would let nothing interfere with the successful accomplishment of his task and thus his place in scientific history.”

After the war, Oppenheimer’s attitude seemed to change. He described nuclear weapons as instruments “of aggression, of surprise, and of terror” and the weapons industry as “the devil’s work”. ==>At a meeting in October 1945, he famously told President Truman: “I feel I have blood on my hands.” The President later said: “I told him the blood was on my hands – to let me worry about that.”

During the development of the bomb, Oppenheimer had used a similar argument to assuage his own and his colleagues’ ethical hesitations. He told them that, as !!!scientists, they were not responsible for decisions about how the weapon should be used – only for doing their job. The blood, if there was any, would be on the hands of the !!!politicians. However, it seems that once the deed was done, Oppenheimer’s confidence in this position was shaken. As Bird and Sherwin relate, in his role at the Atomic Energy Commission during the post-war period, he argued against the development of further weapons, including the more powerful hydrogen bomb, which his work had paved the way for.”

Same with AI and any new technology. The difference is that AI is in private hands. So, responsibility of application is undivided.

lurker July 13, 2023 10:28 PM

@vas pup
re liberal GPT

All “AI” will be biassed, because the makers will “create it in their own image.”

Who? July 14, 2023 12:39 PM

Not all information is “public.”

I have never agreed to the abominable privacy policy terms of this corporation, but sometimes my emails end up in its email accounts (sometimes because users have forwarded their corporate emails to these accounts).

Same about other services; when I publish something on the Internet (let us say, this post) it does not mean it is available for whatever use this unregulated corporation wants to do with it. Period.

Google should learn that the Internet is not its playground. Internet is not the property of Google.

K.S. July 14, 2023 3:10 PM

@Clive Robinson

I think you vastly overestimate competence of people likely to go after you. Please don’t take this criticism as an attack, but my impression that in your risk analysis you assign infinite value to yourself. So unless you are Satoshi Nakamoto or one of the Dread Pirate Roberts, it is clearly not reasonable. In my case, I only need to worry about an average-skill prosecutor going after me as a consequence of white-hat research activities. In such a scenario, exponentially increasing the difficulty of meta-data analysis on an encrypted (but, as we all know, backdoored) drive is a worthwhile risk vs. reward trade-off. Plus, I am already on the list (and likely so are you).

Clive Robinson July 14, 2023 5:29 PM

@ K.S.,

“I think you vastly overestimate competence of people likely to go after you.”

Err no… as I’ve mentioned before, I once had the Prime Minister of the United Kingdom, via the Fraud Squad of the Metropolitan Police, try to get me on false charges.

I’d done nothing wrong, other than tell the truth semi-publicly about the competence of British Telecom (BT) staff, and she was not happy as it was at that time “property of the people” and she had decided to sell it off for her gain. My honest statements would bring the sale price down, so she wanted not blood but destruction of credibility, so she decided I had to become a criminal… and her minions plotted via the Met Police.

They tried what would be entrapment in the US, but I totally refused to play unless certain conditions were met first. Which, if they had met them, would have meant I could not be charged with fraud. I was unaware that this was their game, but my boss had said if I was going to demonstrate the issue to BT staff personally they should pay me for my time, transport, a meal, etc with an agreed contract so they could not back out of paying, and to stick to my guns as being a hero does not put money in the bank.

BT stopped trying when I dug my heels in, and I thought “what a bunch of cheapskates” until just a short while later two people I knew, Steve Gold and Robert Schifreen, got grabbed by the Met Police, prosecuted and found guilty of fraud, for demonstrating BT had a major security hole in their Prestel System… Then after chatting with my boss we both realised what a near escape we both had had.

So with regards,

“but my impression that in your risk analysis you assign infinite value to yourself.”

I know that your entire life could be ruined by events beyond your control. Because back in the 1980s a conviction for fraud would mean you were unemployable for life… Even with the provisions of the “Rehabilitation of Offenders Act” (ROA74), which allegedly gave people with “spent” convictions and cautions the right not to disclose them when applying for “most” jobs, it all turned on the definition of Spent and Most, and for the sort of work I’d been doing they would be a permanent problem/disbarment…

So your notion of,

“So unless you are Satoshi Nakamoto or one of the Dread Pirate Roberts, it is clearly not reasonable.”

Is, shall we say, a little bit naive. The “Might is Right” nonsense is strong in US politics and it cares not if you are guilty or innocent; if they decide you are going to be an example then they will pursue you to death.

So look up Aaron Swartz, he was just 26, and it was just a decade ago, how easily people appear to forget,
