Schneier on Security
A blog covering security and security technology.
October 30, 2006
Privacy and Google
Mother Jones article on Google and privacy:
Google Larry Page and Sergey Brin, the two former Stanford geeks who founded the company that has become synonymous with Internet searching, and you’ll find more than a million entries each. But amid the inevitable dump of press clippings, corporate bios, and conference appearances, there’s very little about Page’s and Brin’s personal lives; it’s as if the pair had known all along that Google would change the way we acquire information, and had carefully insulated their lives -- putting their homes under other people’s names, choosing unlisted numbers, abstaining from posting anything personal on web pages.
That obsession with privacy may explain Google’s puzzling reaction last year, when Elinor Mills, a reporter with the tech news service CNET, ran a search on Google CEO Eric Schmidt and published the results: Schmidt lived with his wife in Atherton, California, was worth about $1.5 billion, had dumped about $140 million in Google shares that year, was an amateur pilot, and had been to the Burning Man festival. Google threw a fit, claimed that the information was a security threat, and announced it was blacklisting CNET’s reporters for a year. (The company eventually backed down.) It was a peculiar response, especially given that the information Mills published was far less intimate than the details easily found online on every one of us. But then, this is something of a pattern with Google: When it comes to information, it knows what’s best.
Posted on October 30, 2006 at 12:56 PM
• 50 Comments
I like how the article starts out using 'Google' as a verb- something Google wishes we wouldn't do.
I just picked up a copy of Future Shock. What happens if we all wake up one day and Google is gone or the results don't make any sense? That would be strange. We could call it Google Shock.
I think the reason you can't google more data on Brin and Page is that Google has Brin and Page filters that exclude pages based on their names and other criteria.
And these filters are buried deep in the code and tool base. Google for "Reflections on Trusting Trust".
Schmidt doesn't have his own filter because he came to the game too late, and he already had a public profile when he did.
I removed Google's toolbar when I discovered it auto-updated without my permission. I no longer trust their software. Obviously the Google founders don't.
Google's almost-mythical policy to not do any "evil" (whatever that means) was probably easier when they were a small company. It's not so easy to avoid "evil" as a large corporation.
I think there's a dark smoke-filled room somewhere with plush leather chairs for Microsoft and IBM and the like. I wonder if they're dusting off a seat for their newest member...
my favorite part of the company is also the do no evil mantra. the nice part is it is pretty flexible. for example, it is not evil to help the Chinese government control internet access by their citizens but it is ok to fight the US government when they are trying to get access to information on child molesters. one also can't fault them for their dedication to a green planet when they are building a massive solar cell installation at their headquarters while at the same time buying a gigantic passenger plane for their corporate jet.
I'm not sure the lack of "paper" trail is all that sinister. I've been online for 12+ years and if you looked me up by my proper name, you would find very little personal information.
If you knew the name I used when I posted to alt.xena.subtext, well, that's a different story...
Anonymity by obscurity helps. My first and family names are both among the most common in the world. Even my middle name is quite common. For instance, there are three of us on this campus alone with exactly the same name. Strangely enough, Googling my exact name yields only two hits from genealogy sites (neither is me). Using my first name, middle initial, and last name gets hundreds of hits, the vast majority of which have nothing to do with me. Using just my first and last name yields over 1.5 million hits. Good luck with that haystack.
I suppose if you knew more about me to begin with, you might be able to cull the haystack and find the needles. But I have yet to find anything I consider personal that I didn't personally put on the internet myself at some point.
Now there is doubtless much info about me available through the internet, but Google alone won't yield much. You'd have to be digging in some more specific databases.
Mutter, have you tried yahooing instead of googling for Brin/Page?
I wonder if I should be happy that intensive searches for myself on the Internet yield no information, or discouraged that intensive searches for myself on the Internet yield no information.
Perhaps a positive side effect of a less interesting existence.
I used to post comments to various sites under a consistent pseudonym. One day, I decided to query my pseudonym, using Google, to see how extensive Google's information was. Well, I should have known; everything was there. After reading my comments I came to realise that I had actually posted a detailed portrait of my interests and opinions; I felt as if I was looking at a sketch of my soul.
After that, I dumped the pseudonym and have made up new names for every post.
Now, I think data mining will not pick me out easily but there is a real loss here. To switch from an online "presence" to a series of unconnected comments is a backwards step; if we all did this, the web would be the worse for it.
So far, I cannot find anything at all on my real name and I am very happy for it to stay this way but what happens if somebody does decide to post information about me on the net? If you vandalise my property or my car, you may be caught and jailed. If you vandalise my privacy, there is nothing I can do.
Also, if someone decides to start posting a bunch of stuff using your name, it may be falsely attributed to you. Not such a problem if you have a very common name, but if your name is "Xavier Unknown", you may have trouble dissociating yourself from it.
Yet there is a category of persons with very common given and surnames, Robert Johnson for example, who benefit not at all from the "obscurity" of sharing names with a large class.
I am referring to the unfortunates whose names appear on the infamous no-fly list maintained by Homeland Security's TSA.
When the opprobrium of one use of the name is attributed to the whole class, as appears to be the prerogative of the government, you will rue the day you were so named by your parents.
For such an important subject area, the article was pretty light on new information.
Google's interactions with governments are of great interest, but I would also like to know what safeguards exist at Google to prevent scenarios like the AOL search database release.
Do regular Google employees have access to the database? Is the data anonymized in any way to preserve user privacy? What is the overall awareness of privacy issues amongst the Google rank and file?
The issue of individual privacy is bound to hit the fan again.
Makes you wonder what (if any) the protocol is for people who get their names legally changed, to get off the no-fly list.
>I like how the article starts out using 'Google' as a verb- something Google wishes we wouldn't do.
I wonder why they wouldn't want that? I figure it'd be a kind of mass homage - they've become more than just a household name - they're a household *verb*!
It's testament to their phenomenal success. I remember first seeing Google in high school. I loved how it was devoid of ads and all the other trash creeping in at the time, and I'm absolutely amazed that it's *still* devoid. That's a *rare* thing nowadays!
Do you put on a new identity for each trip you take to the grocery store?
Data mining is a reasonable thing to be uncomfortable about, but it isn't really that clear that gathering things that a person publishes willingly is a violation of privacy.
I am starting to get fed up with the anti-Google stuff.
Google is a search engine, which makes it easier to index all the stuff which you (or someone else) have made publicly available about you.
The most obvious conclusion is that these "stories" are being planted by Google's commercial rivals. So when you complain about Google, just remember you are a cat's paw for Uncle Bill.
Now, if we delete the word "Google" everywhere above and write "search engine" instead, there is more of a story. But note, there isn't an awful lot about Sergey Brin's personal life on Yahoo Search, either. I suspect that the reason isn't some vast Googloid conspiracy, but rather that the guy groks the web and is careful with his privacy.
As, indeed, we all need to be in this age. In that sense, discovering private facts about yourself through a search engine isn't really the search engine's fault; it is, after all, just indexing stuff that someone else leaked (most likely, you). In fact, you could say that the search engines are just Vulnerability Full Disclosure for the rest of us. You're already leaking private information; Google just warns you about it, giving you more of an incentive to fix those buggy form-filling policies.
> there’s very little about Page’s and Brin’s personal lives; ... putting their homes under other people’s names, choosing unlisted numbers,
In fact, the very first page of results returned by Google for query "Sergey Brin" includes his home page at Stanford, which includes his real world home address and phone number. Larry also gets a home page listing, but is obviously more worried about weirdos phoning him at home.
> search engines are just
> Vulnerability Full Disclosure
Well put. Well put.
Perhaps the author didn't query for "Sergey Brin"; the Google results have an image of him in drag. That is quite a bit more personal than something I would want to see about myself on the top of the search results page.
I agree, it's a (not so) startling discovery how much is tracked.
That is one of the prime benefits, IMO, of anonymous blogs. As Bruce has pointed out time and again, privacy is a necessary human right.
If I don't want my boss to know about me blogging about S&M it should be possible for me to do so, yet almost daily I see a call from one government or another to outlaw anonymous blogging.
>Do you put on a new identity for each trip you take to the grocery store?
Why would you if you pay cash for groceries and don't have a 'bonus' card....
I think the problem is one of competence. Your ISP which can correlate multiple site visits could be far more dangerous. But Google is more competent with a subset of the information an ISP could mine. What is really scary is Google's free WiFi concept, and if they scale that out. Tor!
Roger pointed out that if you google "Sergey Brin" you get his Stanford page with address and telephone number. I noticed that you also get his page on Wikipedia, which goes into his childhood and current net worth.
More importantly, I think this article is confusing separate things in order to make a charge of hypocrisy. Having a search engine, where people who decide to look for something can find it, is different than doing a mass-market publication where you shove it under people's noses.
Nowadays I use blackboxsearch.com for searches on Google, Yahoo and MSN. So my searches are never linked with my IP address.
Ahh han... interesting ...!
@X the Unknown: Good point! The govt needs to mandate that all name changes be forwarded to TSA so the list can be searched for matches; then if someone, say John Smith, changes his name to Fred Thompson and the old name is on the list; they need to create a new entry for the new name.
Otherwise Bin Laden could change his name to Bin Laaden and get on planes with no hassle. Well, I mean no EXTRA hassle. Obviously the only people who get on planes with NO hassle are the ground crew. Or homeless people wandering around on the tarmac looking for a dry place to sleep who stumble into the cargo bay by accident.
"... that in nine years of operation, it has never knowingly erased a single search query."
Really? Wow. Sounds like a challenge. Have everyone you know submit a huge long query to google every day, with 1 character changed each time. Maybe we can drive the price of hard drives down.
@Rich, do we know why Google doesn't want Google to be used as a verb? I also use "Wikipedia it" on a regular basis.
This is an interesting topic... I posted to various newsgroups/forums under the same handle for years. My question is, can a potential employer use what they find against me (prior to interviewing/appointing me)? Most of what I post is helpful; however, I do occasionally get bored and resort to `Google` or `search this forum, I answered the same question 5 times last week` or `http://www.php.net` or `wikipedia` or http://www.googleityoumoron.com/?... . All of which are valid responses.... (well, they are intended to be, I am attempting to educate the op about these tools).... but will my potential employer view them in the same light? Incidentally, if my employer doesn't see them in the same light, do I really want to work for them?
@Anonymous (October 31, 2006 03:08 AM),
You have absolute faith in blackboxsearch.com, right?
> You have absolute faith in
> blackboxsearch.com, right?
No, of course not. It is run by the FBI to catch all those trying to stop Google attaching their IP address to their search queries. :)
Don't have absolute faith in anything. Blackboxsearch just claims to be an easy step to change the IP address on your search queries. So simple that anybody can use it.
No, if you want to get *really* serious about anonymising your web usage, you have to do a lot more than just use blackboxsearch.
That should be glaringly obvious to anyone knowledgeable enough to be reading this forum.
"As, indeed, we all need to be in this age. In that sense, discovering private facts about yourself through a search engine isn't really the search engine's fault;"
Perhaps not, but the fact that Google saves your search history forever and links it to your name whenever possible **is** Google's fault. This treasure trove of discoverable information creates an X-ray of your most personal interests and proclivities, but Google won't ever erase it. As a result, it sits there awaiting a secret National Security Letter or a corporate or divorce lawsuit subpoena--or just a data breach.
Is bashing Google for something they have actually done "Google bashing?" I think not.
"...do we know why Google doesn't want Google to be used as a verb?"
They don't want their trademark to fall victim to genericide. If people start "googling" yahoo, the trademark might be weakened or lost. Trademark law is strange in that failing to act to protect your mark can weaken your legal rights to do so in the future.
Regarding in-house filtering, for a long time the #1 traffic driver to my little vanity site was msn; specifically, a query on 'bill gates' rendered this page in the top 20 or so, despite being buried 1000s deep in any other search engine. What are the odds that msn search algorithms randomly buffed up the humanity angle on BG in a way that no other engine did...
OT and can of worms:
"Trademark law is strange in that failing to act to protect your mark can weaken your legal rights to do so in the future."
It only seems strange because corporations pretend that trademarks are their property like copyright. In fact, the protection of a company's mark is designed to protect consumers from being fooled by others pretending to be associated with the company. Thus, once a mark becomes generalized it no longer represents a company in a way that needs legal protection. This is why Microsoft should never have been allowed to trademark the generic term Windows--which was already in use as referring to application windows in non-Microsoft products.
"there’s very little about Page’s and Brin’s personal lives"
Maybe, just maybe, there really isn't very much that is notable about Page and Brin's personal lives. It's certainly not a crime to be monodimensional and yet successful, no? Then again, maybe there is a goldmine of data about their charity work and favorite pets, but their publicist is not yet ready to release the Page and Brin file(s) from beta.
Frankly, I'm a little surprised at this coming from Mother Jones. I would expect them to hold up the right to privacy of individuals while hailing the value of a search engine that brings transparency to public affairs. Granted the lines get blurry, but do they really think it is a bad thing that the founders have a way to stay clean/boring/invisible while building an empire on exposing/indexing others' information?
"it’s as if the pair had known all along that Google would change the way we acquire information"
I think many people who were posting on the web and using early search engines (remember when Scooter first crawled around?) knew all along that the web would change the way we acquire information. Many folks I know cleaned up their online profiles from as early as '93, while others started engineering an online persona (in the same way politicians and movie-stars have public personas). So I wouldn't say this perspective on personal privacy was unusual or nefarious by the time Google started.
> Perhaps not, but the fact that Google saves your search history forever and links it to your name whenever possible **is** Google's fault.
OK, let's see:
1. We were talking about Google providing other people with access to personal information, not collecting it themselves, but OK, I can deal with a 180° topic change.
2. I complained that pawns^Wpeople bash Google by accusing it, and it alone, of practices which are actually common to ALL the other search engines as well. So you responded by accusing Google, and Google alone, of doing something which ALL the search engines do as well...
3. In this case your criticism is even more discriminatory because the practice of recording customer transaction information is a worrisome privacy threat not just from search engines, but from nearly all businesses of all types! The worst offenders, in fact, are credit card companies; and -- unlike Google, which has even fought off court orders -- those slimebags even blatantly sell the data to anyone who asks (including, within the last twelvemonth, the Mafia.)
4. The phrase "links it to your name whenever possible" is ambiguous. If you mean that Google tries to link searches to your real world identity, then this is an extraordinary allegation which will require quite a lot of supporting evidence or I will call "bull". If you mean that they link searches to your pseudonymous Google accounts then yes they do, but:
4a. One more time for the music: so do all the rest! Why aren't you complaining about MSN doing this too?!?
4b. Most Google users do not have a Google account anyway (I don't know how this compares to Yahoo! and MSN Search; but for MSN Search in particular I suspect most users would have an account); and
4c. They obviously don't care too much about forcing this on people who are uncomfortable with it, because it is trivial to get around. In fact with modern browsers, it takes only two clicks to swap between a log-in-to-Google mode (e.g. for reading Gmail) and accept-no-Google-cookies mode (which works perfectly for all activities not specific to an account).
Or use Scroogle: "no cookies | no search-term records | access log deleted within 48 hours" http://www.scroogle.org/scraper.html
It lacks certain features, though, and there is no way to prove statements 2 and 3. Scroogle also features a Yahoo scraper.
Ignoring the argument concerning the upcoming data retention law in the EU, it is probably best to use an EU-based search engine (as in a corporation residing there), since the EU has strict privacy laws on selling data to 3rd parties. A legitimate business risks a lot if it breaks those laws. It may be hard to verify such compliance, though. A Scroogle clone in the EU would suffice, but it'd require a lot of bandwidth, and by design that traffic would still go to and from the US (i.e. Google's servers).
Hence, your statement that all search engines do this is incorrect. Perhaps it holds for all those where this is a legal practice, and for those where it is illegal but the law is broken. It is hard to verify such broad statements anyway...
Mother Jones -- Is this not the rag of the communist party?
> Hence, your statement that all search engines do this is incorrect.
No, it isn't. Scroogle isn't a search engine, it is just a proxy to Google. There are only a handful of genuine general purpose search engines, a lot of proxies/portals to them, and a few special purpose search engines.
Daniel Brandt is widely regarded as a bit of a net loon. He became vitriolically anti-Google apparently because it ranked his web site lower than he wanted, and he is the main, if not only, driver behind Google Watch.
Despite the fact that Brandt has been a political activist for 30 years and makes his living editing and selling a database of the private details of public people, he also became vitriolically anti-Wikipedia because they insisted on creating an article on him. Later he was also blocked from editing Wikipedia for repeated vandalism:
Brandt may be a bit more than just a net loon. The well-known privacy rights activist Chip Berlet was once also a member of PIR, but long before Google even existed he resigned. He has publicly stated that this is because Brandt and another board member were associating with and supporting neo-Nazi organizations.
Sigh. Yes, it is. See my arguments and the 2 examples posted by BillK. Please provide documented evidence Ixquick and Clusty do what you claimed all search engines do.
Change "search engine" to "service providing a service similar to what search engines provide", and Scroogle fits.
That said, I did not know what you stated about Daniel Brandt. Thanks for the pointer.
"Despite the fact that Brandt has been a political activist for 30 years and makes his living editing and selling a database of the private details of public people"
Please provide documented evidence he still makes his living editing and selling databases of the private details of public people.
For if he has not done so recently, I assume you also still see IBM as a monopolist. Or Steve Jobs as an LSD user. Or *insert something you did 'wrong' or wrong 30 years ago* [...].
Every time an anti-Google article surfaces, I think of Microsoft and evil minions chuckling and throwing chairs.
Every time an anti-Microsoft article surfaces, I think of freedom and hope that maybe one day, people will resist this convicted monopoly once and for all.
> Change "search engine" to "service providing a service similar to what search engines provide", and Scroogle fits.
No, it doesn't. All Scroogle does is proxy your searches to Google, stripping out cookies on the way. It has no search capability of its own, at all.
In other words, you can achieve exactly the same effect by surfing via a multiuser web proxy or NAT proxy (some very large % of surfers already do so for performance or network reasons), and rejecting Google cookies (trivial to do in any modern browser.) Further, for various reasons that we have discussed here in the past, it is actually unlikely that Google uses IPs for analysing search data, so step a) (the proxy part) is probably superfluous; just turn off cookies.
In fact, using a multiuser web proxy or NAT proxy is much *better* than using Scroogle since it obscures your IP from all the other sites you visit as well (oddly, privacy activist Brandt is happy just blocking Google, and doesn't seem to care about the personal information you might be leaking to all those nice ethical mafia-run pr0n sites).
Oh, actually Scroogle does do one other thing: it also strips out the little text ads that Google puts on the top right of your results page. This is not surprising, since Brandt hates Google so much he probably sees this method of hitting them in the pocketbook as the main benefit. Most people, however, would regard this as seriously unethical.
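The cookie-stripping step described above can be sketched in a few lines of Python. This is only an illustration of the idea, not Scroogle's actual code; the function name and the sample header values are made up for the example.

```python
# Header names that carry cookies in either direction (compared case-insensitively).
COOKIE_HEADERS = {"cookie", "set-cookie"}


def strip_cookies(headers: dict) -> dict:
    """Return a copy of the headers with cookie-related fields removed."""
    return {
        name: value
        for name, value in headers.items()
        if name.lower() not in COOKIE_HEADERS
    }


# Hypothetical request headers a browser might send with a search.
request_headers = {
    "User-Agent": "ExampleBrowser/1.0",
    "Cookie": "PREF=ID=0123456789abcdef",
    "Accept": "text/html",
}
print(strip_cookies(request_headers))
```

A proxy built this way would apply the same function to the response headers, so that no Set-Cookie field ever reaches the browser.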
> Please provide documented evidence Ixquick and Clusty do what you claimed all search engines do.
OK, no problemo. (BTW, Ixquick and Clusty are both metasearch engines, not search engines. The difference is not insignificant but I won't go into it here.)
Cookies: Ixquick and Clusty both set cookies. The longest lived Ixquick cookie lasts for 4 years; 5 others last for one year after the last time you use Ixquick. The longest lived Clusty cookie lasts the maximum time possible, until 2038, just like Google (source: use them, and check your cookies.) Both Ixquick and Clusty note that their cookies do not contain a UUID; however, they do contain quite a lot of other information which could potentially be used for data matching.
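The 2038 ceiling is presumably the largest value a signed 32-bit Unix timestamp can hold. A minimal Python sketch of that arithmetic, assuming that is what "the maximum time possible" refers to:

```python
from datetime import datetime, timezone

# Largest value of a signed 32-bit count of seconds since 1970-01-01 UTC.
MAX_INT32 = 2**31 - 1


def latest_32bit_expiry() -> datetime:
    """Return the last moment representable as a signed 32-bit Unix time."""
    return datetime.fromtimestamp(MAX_INT32, tz=timezone.utc)


print(latest_32bit_expiry().isoformat())  # 2038-01-19T03:14:07+00:00
```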
Also, like most other metasearchers, Ixquick's revenue model is paid advertisements at the top of some pages, which have embedded referrer information (source: try some searches, "copy link location" on some of the URLs, and decode them somewhere.)
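For anyone who wants to try the "decode them somewhere" step, here is a rough Python sketch. The URL and its parameter names are invented for illustration; real ad links will use different fields.

```python
from urllib.parse import parse_qs, urlsplit


def decode_link(url: str) -> dict:
    """Split a URL's query string into percent-decoded name -> list-of-values pairs."""
    return parse_qs(urlsplit(url).query)


# Hypothetical sponsored-link URL copied from a results page.
example = (
    "http://ads.example.com/click"
    "?src=metasearch&q=privacy%20tools&dest=http%3A%2F%2Fshop.example.com%2F"
)

for name, values in decode_link(example).items():
    print(name, "=", values)
```

Anything in the decoded output that echoes your query or identifies the referring site is information passed along to the advertiser when you click.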
> Please provide documented evidence he still makes his living editing and selling databases of the private details of public people.
OK. Here's one of his order forms:
Simple facts are:
a) Google's privacy policies may not be perfect but are fairly typical for a reasonably ethical internet company, and are a heck of a lot better than the real bottom-feeders;
b) true anonymity on the internet is almost impossible to obtain without serious technical measures; however
c) from most people's point of view an adequate degree of privacy can be achieved by three simple practices and one hard one:
** surfing through a standard web proxy (a simple config setting in any browser, but you need to find the address to use first);
** accepting no third party cookies (in Firefox, a simple config setting);
** deleting all cookies at end of session (in Firefox, a simple config setting);
and the hard one:
** being very careful about entering personal information into web forms.
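For scripted access, the proxy and delete-cookies-at-end-of-session practices above might look roughly like this in Python. It is a sketch under assumptions: the proxy address is a placeholder, the function names are made up, and third-party cookie blocking is a browser setting that this example does not cover.

```python
import urllib.request
from http.cookiejar import CookieJar

# Hypothetical proxy address; substitute one you are actually entitled to use.
PROXY_URL = "http://proxy.example.net:3128"


def make_session(proxy_url: str = PROXY_URL):
    """Build an opener that goes through a proxy and keeps cookies in memory only."""
    jar = CookieJar()  # nothing is ever written to disk
    opener = urllib.request.build_opener(
        urllib.request.ProxyHandler({"http": proxy_url, "https": proxy_url}),
        urllib.request.HTTPCookieProcessor(jar),
    )
    return opener, jar


def end_session(jar: CookieJar) -> None:
    """Delete every cookie collected during the session."""
    jar.clear()


opener, jar = make_session()
# ... opener.open(...) calls would go here during the session ...
end_session(jar)
print(len(jar))  # 0
```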
1. There's a lot of hype about 2038 cookies. If you spend any time maintaining PCs for the security-clueless, you will see that there is a very low probability of any cookie lasting more than about 2 or 3 years. So in practice, there is negligible difference between a 2038 cookie and a 2010 cookie. Further, cookie expiry dates get updated every time you visit the site, so a 12 month cookie will last just as long as a 2038 cookie if you visit the site at least once a year. If you use a site where cookies are obligatory, but don't want them to link sessions, the only solution is to delete them all at the end of every session, regardless of expiry date.
2. In particular the main Clusty cookie is opaque encrypted data over 300 bytes long, while Ixquick includes 6 long lived cookies, including two timestamps accurate to the second (Ixquick claims to average 4 queries per second, so two timestamps are likely to be a unique ID just by themselves) and an encoded prefs cookie with a minimum of 5 subfields and the invitation to set 7 others, including your home postcode.
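Both footnotes above come down to simple arithmetic, sketched here in Python. The function names are made up, and the uniqueness estimate rests on modelling assumptions (stated in the docstring) rather than on anything Ixquick has published.

```python
def cookie_survives(visit_days, lifetime_days):
    """Return True if a cookie renewed on every visit is still valid at each later visit."""
    expires = visit_days[0] + lifetime_days  # set on the first visit
    for day in visit_days[1:]:
        if day > expires:
            return False  # the cookie lapsed before this visit
        expires = day + lifetime_days  # renewed: the expiry date slides forward
    return True


def expected_timestamp_twins(queries_per_second, span_seconds):
    """Rough expected number of other cookies sharing both one-second timestamps.

    Assumes about `queries_per_second` cookies share your first timestamp, and
    that each one's second timestamp falls uniformly and independently within a
    window of `span_seconds` seconds.
    """
    return queries_per_second / span_seconds


# A 12-month cookie, with a visit every 300 days for about ten years, never lapses.
print(cookie_survives(list(range(0, 3650, 300)), 365))
# 4 queries per second, second timestamps spread over a single day.
print(expected_timestamp_twins(4, 24 * 60 * 60))
```

Under those assumptions the second figure is far below one, which is the sense in which a pair of timestamps can act as a unique ID.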
Excellent advice to help anonymous searching. The earlier comments were only intended to provide quick fixes to help users.
But your hints are much more comprehensive. The problem with that is that full instructions about anonymising frighten people off, as they sound much too complex for the average user to cope with. That's why 'quick fixes' (though not perfect) provide a helpful step for users.
I like the TorPark software for anonymous browsing as it is easy to use.
(Windows only at present. Runs under Wine on Linux. A native Linux version is in development).
Is Google a Monster? created by Monsters? I don't know. Only time will tell us, till then, none of us will ever know.
One recent incident.... about a monster called Monster.com :-)
My account on monster was unavailable... when I called to find out the problem, it seemed that it had been linked to some other email address.
Guess what... the person politely told me that my problem was fixed... and then proceeded to email my user name and pwd. to her manager.
Now her manager at Monster.com knows my user name and pwd. for him to do whatever he wants with it....
I wonder how many other job web sites / web sites do this....?
Where is privacy and confidentiality when U need it... Or is it just another artifact like the bill of rights... subject to interpretation ....like it always has been
"If you vandalise my property or my car, you may be caught and jailed. If you vandalise my privacy, there is nothing I can do."
OK. Your assertion appears to be that your so-called privacy (i.e. free speech on the internet, in the context you used) can be vandalized by the common Google searcher . . . and it is a crime that should be punished by serving a jail sentence?
Until free and open speech in the public internet forum becomes *property* by some operation of law, you have no property interest and little privacy interest in the commentary you make using a common pseudonym on blogs or message boards. Most sites hosting such things have warnings to this end.
Until the fed or state legislatures or common law determines that citizens searching on Google for information (about whomever or whatever) is a wrongdoing, we're not going to jail for stumbling upon the information you give out.
My issue is that somebody could post information about me without my consent e.g. someone at work posts some photos of me on a night out with the rest of the staff when I was a bit drunk or something else I'd rather just forget about and certainly wouldn't want a future employer to see. That's what I meant by "vandalise my privacy". I accept that once the information is out, there is not a lot I can do about it apart from asking nicely to have it removed.
I agree that if I post something about myself, then I shouldn't complain if people look at it - it's the potential for information about me to leak out against my desire that causes me to use anonymous pseudonyms.
It used to be people had to follow you into the gas station bathroom with a camera, or at least know the guy who followed you into the gas station bathroom with a camera, to know what you did in that gas station bathroom.
Now, any 13 year old with a few minutes to spare and a creative knack for quoted search terms can find all that stuff within minutes. It's a shame. A real shame...
Schneier.com is a personal website. Opinions expressed are not necessarily those of BT.