Privacy of Photos.app’s Enhanced Visual Search
Initial speculation about a new Apple feature.
Clive Robinson • January 6, 2025 10:23 AM
@ ALL,
However you want to look at it, there are two things that are undeniable,
1, It’s on-device scanning of user data.
2, It phones home to the mothership.
Prior to the bare-knuckle arguments over communications crypto and E2EE between the previous Executive’s appointees in the FBI and DoJ and the more thoughtful privacy advocates, over what were by any other name enhanced CALEA “back doors”, some were already warning about “on-device access to user data” that bypassed encryption and went directly to “plaintext user data”.
Apple previously tried this with the “dog whistle” of “think of the children” and the alleged ability to spot CSAM.
Not unexpectedly, Apple’s system was found to be deficient in many ways, not least because of both the false positives and false negatives it produced.
Privacy advocates pointed out that once an on-device process for scanning user data was implemented, what it scanned for was fairly arbitrary, as at the end of the day a file of data is a “bag of bits”, and for basic search purposes against a “fuzzy hash”,
“One bag of bits is as good as any other bag of bits.”
So it did not matter if it was searching for images or text or anything else.
Now I don’t know about others, but I regard any search, be it by machine or human, as a search. The argument that a machine searching is not a search as defined by the Founding Fathers is somewhat idiotic: computers and other machines of sufficient capability did not exist at the time, and the Founding Fathers made a blanket statement about searches, not a conditional one.
The question everyone should ask is,
“Why is Apple, having already found that device-side scanning is unacceptable to the vast majority, forcing it back on people who really do not want it?”
Especially as, due to the points above, it can be seen as a very serious breach of privacy, and something that can all too easily be used as the base mechanism for illicit, unwarranted searches that many would see as illegal (and that are illegal in the EU and other jurisdictions).
Winter • January 6, 2025 4:52 PM
In the link there is a discussion of the use of homomorphic encryption which, if the claims hold, would ensure privacy and be a monumental breakthrough.
The Apple blog post mentioned there writes:
https://machinelearning.apple.com/research/homomorphic-encryption
One of the key technologies we use to do this is homomorphic encryption (HE), a form of cryptography that enables computation on encrypted data (see Figure 1). HE is designed so that a client device encrypts a query before sending it to a server, and the server operates on the encrypted query and generates an encrypted response, which the client then decrypts. The server does not decrypt the original request or even have access to the decryption key, so HE is designed to keep the client query private throughout the process.
I haven’t followed the developments of HE recently, but only a few years ago HE was a worse resource hog than AI and cryptocurrencies combined. So I really wonder how secure a practical implementation currently is.
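For intuition about what “computing on encrypted data” means, below is a minimal toy sketch in Python using the Paillier cryptosystem’s additive homomorphism. This illustrates the general idea only: Apple’s deployment uses the lattice-based BFV scheme, not Paillier, and real parameters are orders of magnitude larger.

    # Toy Paillier: additively homomorphic, so a server can combine
    # ciphertexts without ever holding the decryption key.
    # Illustration only -- Apple's scheme is BFV, not Paillier.
    import random
    from math import gcd

    def keygen(p=2741, q=3323):          # toy primes; real keys ~2048 bits
        n = p * q
        lam = (p - 1) * (q - 1) // gcd(p - 1, q - 1)   # lcm(p-1, q-1)
        mu = pow(lam, -1, n)             # valid because g = n + 1 below
        return n, (lam, mu)

    def encrypt(n, m):
        g = n + 1                        # standard choice of generator
        r = random.randrange(2, n)       # fresh randomness per ciphertext
        while gcd(r, n) != 1:
            r = random.randrange(2, n)
        return (pow(g, m, n * n) * pow(r, n, n * n)) % (n * n)

    def decrypt(n, sk, c):
        lam, mu = sk
        x = pow(c, lam, n * n)
        return (((x - 1) // n) * mu) % n

    n, sk = keygen()
    c1, c2 = encrypt(n, 17), encrypt(n, 25)   # client encrypts its query
    c_sum = (c1 * c2) % (n * n)               # server computes on ciphertexts
    assert decrypt(n, sk, c_sum) == 17 + 25   # client decrypts the response

The resource cost is visible even in the toy: all homomorphic arithmetic happens modulo n², so ciphertexts and operations are far heavier than their plaintext equivalents.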
lurker • January 7, 2025 1:41 AM
The linked blog claims that this feature is installed silently and enabled by default, with the suggestion that this might not be ethical, or in users’ best interests. This behaviour goes waaay back. Only yesterday I rediscovered a folder of music files that had mangled titles and ID3 tags because I inadvertently happened to be still online during an iTunes update – 20 years ago. It threw out my preference (Do Not Organize My Music Library) and decided it knew better than me what music I should have. Nothing seems to have changed since Steve left.
As for Apple claiming that the data is encrypted client side, and that Apple works on the encrypted data and never sees or needs the key: what I didn’t find was any reference to what happens to that encrypted data after the query has finished. Does it just pile up under a “Store it All” policy, in case somebody finds some value in it in future?
Clive Robinson • January 7, 2025 6:10 PM
@ Winter, ALL,
With regards,
“… there is a discussion about the use of homomorphic encryption that, if true, would ensure privacy and be a monumental breakthrough.”
Whilst the use of homomorphic encryption might, and I really do mean might, be a “breakthrough”, it won’t in this application ensure privacy from Apple or others.
Because in this application you are talking about interrogating a very, very large database of information to match a building or other feature. That is, the user’s phone will make a series of ever more specific queries to Apple’s database in the search for a “match”.
This process will tell Apple just about everything about the photo, including the angle the photograph was taken from.
It would not be hard from that to work out where and, in all probability, when the photograph was taken.
So no, “homomorphic encryption” won’t protect against this loss of privacy.
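To make the leak concrete, here is a hypothetical sketch in Python. The shard names and log format are invented for illustration and are not Apple’s actual partitioning; the point is that the access pattern alone profiles the photo even when every payload is encrypted.

    # Hypothetical: the server logs only which index shards an anonymised
    # client touched -- no plaintexts, no keys -- yet a run of ever
    # narrower requests still says what the photo shows.
    from collections import Counter

    observed = [
        ("client_a", "landmarks/europe"),
        ("client_a", "landmarks/europe/france"),
        ("client_a", "landmarks/europe/france/paris"),
    ]

    def infer(log, client):
        """Rank the shards a client requested."""
        return Counter(s for c, s in log if c == client).most_common()

    print(infer(observed, "client_a"))
    # Progressively narrower shards point at a Paris landmark without
    # a single ciphertext being opened.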
As I’m known to observe,
“You have to consider the whole system.”
Looking for weak links is what you should be doing, not admiring boat-anchor technical strong points that, in reality, only weigh things down needlessly and achieve no increase in the security or privacy of the whole system.
Clive Robinson • January 8, 2025 2:49 PM
@ Bruce, Winter, All,
Some years ago you were vocal about both weak and strong links in the chains that form systems.
Did you ever write up an Op Ed or similar about “balancing the links”?
That is, a chain can only ever be as strong as its weakest link.
If you cannot strengthen that link, replace it, or mitigate its weakness, then spending time and effort making other links ever stronger may well be a waste of effort and other resources.
Arguably, as a rational actor, the only time you would apparently over-strengthen other links is when a link also serves, or will serve, a role outside of the current chain that forms the system.
On the assumption Apple is behaving rationally, we have to ask the question,
“What other purpose does the ‘homomorphic encryption’ serve?”
One purpose my overly suspicious mind can see is that “homomorphic encryption”, and its need for CPU cycles, would be well suited
“To form a covert side channel by timing or power signature.”
Of which, timing would be most advantageous to those without physical access to the device, whilst the power signature would be most advantageous to those with access to it.
Either way, such a channel could leak “shared secrets” and similar without it being obvious to those watching the other covert channels.
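As a toy demonstration of the timing variant, consider textbook square-and-multiply exponentiation, the classic data-dependent-time primitive. This is generic illustration code, not anything from Apple’s stack:

    # Naive square-and-multiply: one extra multiply per 1-bit of the
    # secret exponent, so runtime correlates with the key.
    import time

    def modexp_leaky(base, exponent, modulus):
        result = 1
        for bit in bin(exponent)[2:]:                 # MSB first
            result = (result * result) % modulus      # always squares
            if bit == "1":
                result = (result * base) % modulus    # leaky extra work
        return result

    def timed(exponent, trials=20000):
        t0 = time.perf_counter()
        for _ in range(trials):
            modexp_leaky(0xBEEF, exponent, 0xFFFFFFFB)
        return time.perf_counter() - t0

    # Same bit length, different Hamming weight, different runtime:
    print(timed(0b1000000000000001))   # sparse exponent, few multiplies
    print(timed(0b1111111111111111))   # dense exponent, many multiplies

A remote observer integrates such differences over many requests; with physical access the same data-dependent work shows up directly in the power trace.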
Some years ago I did a fair amount of personal research into physical device signatures, as a side path to thinking about DRM systems and “Digital Watermarking”. One thing I found was that, unless you were prepared to take a massive efficiency hit, closing one side channel by careful but obvious design would almost always make others worse.
Hence,
“Efficiency -v- Security”
is quite a real problem, especially if implementers or testers are not sufficiently adept at certain arcane and not widely known “Domain Arts” (some of which certain governments still claim are classified).
Anon • January 9, 2025 2:25 PM
A lot of commenters are understandably annoyed by Apple’s decision to make this feature opt-out.
I suspect part of the reason for this decision is that Apple uses Wally for Private Nearest Neighbor Search (https://arxiv.org/abs/2406.06761), at least for some features, and this requires a large number of clients to use the service in order to make its differential privacy thresholds computationally feasible; Apple needs a crowd for individual users to hide in, and it’s hard to get a crowd with an opt-in feature.
I don’t think this necessarily justifies Apple’s decision, but it may help explain it.
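By way of illustration, here is a rough sketch of the “crowd” requirement, with invented numbers rather than Wally’s actual parameters: each client mixes its real query in with dummies, and the server sees only the shuffled union, so the cover traffic is only as thick as the participating crowd.

    # Invented-parameter sketch of the Wally-style idea.
    import random

    BUCKETS = 256                        # hypothetical cluster count

    def client_batch(real_bucket, n_dummies=3):
        batch = [real_bucket] + random.choices(range(BUCKETS), k=n_dummies)
        random.shuffle(batch)            # server can't tell real from dummy
        return batch

    def server_view(real_queries):
        view = [b for q in real_queries for b in client_batch(q)]
        random.shuffle(view)
        return view

    # With 3 clients there are only 9 dummies to hide among; with 10,000
    # clients each real query drowns in ~30,000 dummies.
    print(len(server_view([7, 42, 199])))
    print(len(server_view(random.choices(range(BUCKETS), k=10_000))))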
Clive Robinson • January 17, 2025 8:58 PM
@ Winter,
A post by Matt Green,
reminded me I owed you further comment on why E2EE and homomorphic encryption are going to be very bad news with ‘AI Agents’.
First off, you might want to read these two papers,
https://eprint.iacr.org/2024/2086.pdf
“End-to-end encryption (E2EE) has become the gold standard for securing communications, bringing strong confidentiality and privacy guarantees to billions of users worldwide. However, the current push towards widespread integration of artificial intelligence (AI) models, including in E2EE systems, raises some serious security concerns.”
Whilst not everyone agrees with the findings, its 40-odd pages do provide food for thought.
But back to my point about databases revealing what a “Device Side AI” is doing / searching by what it requests from a DB on a server.
Whilst in theory you can build a DB that makes such enquiries ambiguous in various ways, they are, to put it bluntly, either no good or grossly inefficient in memory or processing time.
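For a feel of why, here is the classic two-server XOR private information retrieval scheme in toy form (a standard textbook construction, not Apple’s design). The client hides which record it wants, but each server must still touch every record on every query, so cost grows linearly with the database:

    # Toy two-server XOR PIR. Each server sees only a uniformly random
    # bit vector, so it learns nothing about the target index -- but
    # answering means XOR-scanning the ENTIRE database per query.
    import secrets

    DB = [b"rec0", b"rec1", b"rec2", b"rec3"]    # each server holds a copy

    def xor_bytes(a, b):
        return bytes(x ^ y for x, y in zip(a, b))

    def make_queries(index, n):
        q1 = [secrets.randbits(1) for _ in range(n)]
        q2 = q1.copy()
        q2[index] ^= 1               # the vectors differ only at the target
        return q1, q2                # one goes to each server

    def answer(db, bits):
        acc = bytes(len(db[0]))
        for record, bit in zip(db, bits):        # touches every record
            if bit:
                acc = xor_bytes(acc, record)
        return acc

    q1, q2 = make_queries(2, len(DB))
    print(xor_bytes(answer(DB, q1), answer(DB, q2)))   # b'rec2'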
A few years back it was proposed to use a variation on homomorphic encryption to “encompass the database” as well as the in-RAM circuit. Whilst still grossly inefficient, it represented an improvement, but not a significant one.
More recently there have been other improvements. This paper is getting on for 80 pages and I’ve written smaller books 😉 Also getting your head around it means way more than making it as far as “base camp”,
https://eprint.iacr.org/2022/1703.pdf
“Building on top of our DEPIR, we construct general fully homomorphic encryption for random-access machines (RAM-FHE), which allows a server to homomorphically evaluate an arbitrary RAM program ‘P’ over a client’s encrypted input ‘x’ and the server’s preprocessed plaintext input ‘y’ to derive an encryption of the output ‘P(x,y)’ in time that scales with the RAM run-time of the computation rather than its circuit size.”
Does this allow “privacy preserving” use of ‘device side scanning’ AI with a ‘central point database’, which would normally allow the equivalent of a ‘Man In The Middle’ attack?
In theory yes, but in practice consider two points,
1, would it be efficient enough to be practical?
2, would the RAM-Circuit be side channel free?
Personally, I think not, currently, in both cases. In the future, after a lot of work, maybe; but I’m doubtful sufficient practical work will be done any time soon, if at all. That is based initially on the “Who pays v. Who gains” issue, and later on things having moved on sufficiently to make the question moot.
Winter • January 19, 2025 4:39 PM
@Clive
1, would it be efficient enough to be practical?
That is the one question that determines the future of homomorphic encryption. I, personally, have yet to see an efficient multi-party HE encrypted search procedure (where “efficient” means less than exponential growth of effort).
But I don’t follow the research well, so maybe one can still hope?