Schneier on Security
A blog covering security and security technology.
October 26, 2011
Cracking the Copiale Cipher
I don't follow historical cryptography, so all of this comes as a surprise to me. But something called the Copiale Cipher from the 18th Century has been cracked.
EDITED TO ADD (11/14): Here's the academic website.
Posted on October 26, 2011 at 6:02 AM
It's interesting how many different skill sets were required to break it. It makes me wonder what a modern 'cryptanalyst' is.
Deciphering it doesn't look too difficult to me (a native German). The hardest (or luckiest) part is finding such a document in the first place.
"It has been more than six decades since Warren Weaver, a pioneer in automated language translation, suggested applying code-breaking techniques to the challenge of interpreting a foreign language."
He also did this for biology. As a program officer at the Rockefeller Foundation, he promoted applying ideas from mathematics, physics, linguistics, information theory and cryptography to biology, forming the field we now call molecular biology, a term he coined. The scientific approach that led to 'cracking the genetic code' and beyond comes from Weaver. Of course, it's a metaphor, not a real code.
I couldn't really figure out why this is a big deal. It's basically a classic substitution cipher with a couple of neat twists (fake letters and colons). Beyond that, the techniques were no different from doing the cryptoquote in the Sunday paper.
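For readers who haven't done a cryptoquote: the usual first step is counting how often each cipher symbol occurs and matching the most frequent symbols to the most frequent plaintext letters. A minimal Python sketch, assuming the manuscript has already been transcribed into a list of symbol names (the names below are hypothetical):

```python
from collections import Counter

def symbol_frequencies(tokens):
    """Return (symbol, count) pairs, most frequent first."""
    return Counter(tokens).most_common()

# Hypothetical transcription: each string names one cipher glyph.
ciphertext = ["lip", "a", "e", "lip", "toe", "e", "nee", "e"]

for symbol, count in symbol_frequencies(ciphertext):
    print(symbol, count)
```

In a homophonic cipher, each high-frequency letter is spread over several symbols, which flattens these counts, so they are only a starting point.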
This is still incredibly cool, don't get me wrong - but I don't understand why they're making a big deal about applying a 'new technique'.
The academic website http://stp.lingfil.uu.se/~bea/copiale/ has the full-text German and English translations, as well as a table and full-page images. The cipher uses:
* normal alphabetic characters as nulls (homophones for word-break).
* graphical signs enciphering letters
* homophones for high frequency letters (vowels and key consonants)
* logotype code for key words
The logotypes for words make this a Nomenclator, a mixed code/cipher. This matches what Kahn would lead us to expect for practical use in that era, given European diplomatic practice during the Enlightenment.
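To make that structure concrete, here is a toy Python decoder for a nomenclator of this shape: homophones (several symbols for one letter), nulls that serve as word breaks, and logotypes that stand for whole words. The symbol names and key entries are invented for illustration and are not the real Copiale key.

```python
# Invented key fragments for illustration only (NOT the real Copiale key).
HOMOPHONES = {"g1": "e", "g2": "e", "g3": "n", "g4": "d", "g5": "r"}  # g1 and g2 both mean 'e'
NULLS = {"a", "b"}            # plain Roman letters, treated here as word breaks
LOGOTYPES = {"L1": "MAURER"}  # one symbol standing for a whole word

def decipher(tokens):
    """Decode a list of symbol names into space-separated words."""
    words, current = [], []
    for tok in tokens:
        if tok in NULLS:
            # A null closes the current word, if there is one.
            if current:
                words.append("".join(current))
                current = []
        elif tok in LOGOTYPES:
            # A logotype expands to a whole word; later symbols can still add a suffix.
            current.append(LOGOTYPES[tok])
        else:
            # Symbols missing from the key are bracketed so the gaps stand out.
            current.append(HOMOPHONES.get(tok, f"[{tok}]"))
    if current:
        words.append("".join(current))
    return " ".join(words)

message = ["g4", "g1", "g5", "a", "L1", "b", "g2", "g3", "g4", "g1"]
print(decipher(message))  # prints: der MAURER ende
```

Bracketing unknown symbols makes the same function usable with a partial, tentative key.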
Many such ciphers were broken by hand in the 20th century. What is novel here is the computer-assisted classification of the graphic symbols, and the subject matter, which is not diplomacy.
Preparing a KWIC (keyword-in-context) index makes it easier to recover the logotype code words, because it shows their word contacts. This is classic trench-code / nomenclator breaking.
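A KWIC index is simple to build: for every occurrence of a chosen symbol, show a few neighbouring symbols on each side, so that recurring contexts line up in a column. A minimal Python sketch; the transcription and symbol names are hypothetical, and `width` (how many symbols of context to show) is an arbitrary choice:

```python
def kwic(tokens, keyword, width=3):
    """Return (left context, right context) for each occurrence of keyword."""
    rows = []
    for i, tok in enumerate(tokens):
        if tok == keyword:
            left = " ".join(tokens[max(0, i - width):i])
            right = " ".join(tokens[i + 1:i + 1 + width])
            rows.append((left, right))
    return rows

# Hypothetical transcription, one symbol name per list element.
text = ["toe", "nee", "lip", "r", "e", "y", "a", "toe", "nee", "lip", "a"]

# Right-align the left context so every occurrence of the keyword lines up.
for left, right in kwic(text, "lip", width=2):
    print(f"{left:>20} [lip] {right}")
```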
Note that two of them, *lip* and *bigX*, take the suffix 'rey', which in modern English would be 'ry'. *toe* almost always precedes *nee*.
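An observation like "*toe* almost always precedes *nee*" can be checked by counting, for each symbol, which symbols immediately follow it (one row of an adjacency matrix). A Python sketch over a hypothetical transcription:

```python
from collections import Counter, defaultdict

def successor_counts(tokens):
    """Map each symbol to a Counter of the symbols that immediately follow it."""
    table = defaultdict(Counter)
    for current, following in zip(tokens, tokens[1:]):
        table[current][following] += 1
    return table

def follow_rate(tokens, first, second):
    """Fraction of the times `first` has a successor that the successor is `second`."""
    followers = successor_counts(tokens)[first]
    total = sum(followers.values())
    return followers[second] / total if total else 0.0

# Hypothetical transcription using symbol names from the thread.
text = ["toe", "nee", "lip", "toe", "nee", "a", "toe", "x"]
print(dict(successor_counts(text)["toe"]))
print(follow_rate(text, "toe", "nee"))
```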
My best guess so far
It reads fairly well. Encoder errors exist: in one case *lip* stands for itself, and in a few cases 'mason' is not logotyped; the same goes for 'society'.
*o* might represent a Germanic compound for a specific society name.
*tree* is a variant or error for *tri*
*tri* is used as both Lodge and Rite or Ritual.
*gate* and *bigL* stand for their shape.
Were I the encoder, I would have logotyped 'brother', 'apprentice', 'fellow', 'candidate', 'conduct', 'ceremony', 'cross', 'holy/sacred', 'Scottish' and 'St Andrews' as well. Those words gave context to the logotypes.
And the Voynich Manuscript is next?
Voynich? Nah, how about something more "interesting" and practical, like the Beale Ciphers ( http://en.wikipedia.org/wiki/Beale_ciphers ), which describe buried treasure here in Virginia?
I remember going to a talk at one of the local ACM subcommittee meetings in the '80s, given by Dr. Carl Hammer (of UNIVAC), who was trying to break the ciphers and, as a hobby, digging up the countryside to find the treasure ( http://www.angelfire.com/pro/bealeciphers/... ). I guess he never found anything (that he'd admit to, anyway). One theory was that the two unsolved documents were actually bogus, but as I recall, Dr. Hammer's analysis had concluded that their ciphertext was not random and so might actually contain encrypted text.
Re Voynich ms., the 'Copiale' technique of automatically classifying glyphs / graphics into equivalence classes to prepare digital text from images might actually be applicable.
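One simple way to sort glyph images into equivalence classes, assuming each glyph has already been reduced to a small fixed-size binary bitmap, is greedy "leader" clustering by Hamming distance. This is a sketch of the general idea, not the procedure used by the Copiale team; the bitmap format and the threshold are assumptions.

```python
def hamming(a, b):
    """Number of positions where two equal-length bitmaps differ."""
    return sum(x != y for x, y in zip(a, b))

def cluster_glyphs(bitmaps, threshold):
    """Greedy leader clustering: each glyph joins the first class whose leader
    is within `threshold` differing pixels, otherwise it starts a new class.
    Returns one class label per input bitmap."""
    leaders, labels = [], []
    for bm in bitmaps:
        for label, leader in enumerate(leaders):
            if hamming(bm, leader) <= threshold:
                labels.append(label)
                break
        else:
            # No existing class is close enough: this glyph leads a new one.
            leaders.append(bm)
            labels.append(len(leaders) - 1)
    return labels

# Hypothetical 2x2 glyph bitmaps flattened to strings of '0'/'1'.
glyphs = ["1100", "1101", "0011", "0111"]
print(cluster_glyphs(glyphs, threshold=1))  # prints [0, 0, 1, 1]
```

The result depends on input order and on the threshold, so in practice a human would still review and merge or split the proposed classes.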
The question remains whether the Voynich ms. is a known natural language in a novel script, a lost language (unlikely?), a novelty language, or nonsense.
Textual statistics and cipher breaking (with computer support) are only really useful if it is indeed a natural language in a peculiar 'disguise', as with the Copiale ms.
If it is instead utter nonsense, a sufficiently thorough statistical attack (after Copiale-style preparation of a digital text) might provide evidence that the Voynich ms. is unstructured, badly randomized text produced by an intelligence trying to be random.
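One way to sketch such a check is to measure how predictable each symbol is from the one before it (the conditional entropy H(next | current), in bits) and compare that with shuffled copies of the same text, which keep the symbol counts but destroy any order. Structured text should come out clearly lower than its shuffles. This is one illustrative statistic, not a method from the thread, and it assumes a token-list transcription; a real study would also compare against natural-language texts and would still only yield evidence, not proof.

```python
import math
import random
from collections import Counter

def conditional_entropy(tokens):
    """Estimate H(next | current) in bits from adjacent symbol pairs."""
    pairs = Counter(zip(tokens, tokens[1:]))
    firsts = Counter(tokens[:-1])  # how often each symbol has a successor
    n = len(tokens) - 1
    h = 0.0
    for (a, _b), c in pairs.items():
        # p(a, b) = c / n and p(b | a) = c / firsts[a]
        h -= (c / n) * math.log2(c / firsts[a])
    return h

def shuffle_baseline(tokens, trials=100, seed=0):
    """Mean conditional entropy of shuffled copies (same counts, no order)."""
    rng = random.Random(seed)
    values = []
    for _ in range(trials):
        copy = list(tokens)
        rng.shuffle(copy)
        values.append(conditional_entropy(copy))
    return sum(values) / trials

# A perfectly alternating sequence is fully predictable from the previous symbol.
sample = list("abababab")
print(conditional_entropy(sample), shuffle_baseline(sample))
```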
If it is a 'new' language (either a novelty language like Klingon or Elvish, or a lost natural language like Etruscan), it's more like code-book breaking, with rather more thought required.
For such work, software for KWICI and tentative substitution / annotation should help, but that is much less automation than having a computer try alternative cipher schemes.
What tools would one use? A combined KWICI and trial-annotation script is what I hacked together in minutes for my Nomenclator discussion up-thread. Nothing fancy: quick little Perl or AWK scripts, if one is used to text munging. A full assault on language-as-codebook would require more. Building adjacency matrices and discovering affixes would also be useful 'preparation' steps to automate. Perhaps machine learning techniques could extract the grammar automatically, albeit unlabelled.
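As an example of the affix-discovery step, here is a short Python sketch (in place of the Perl or AWK mentioned above) that counts recurring word endings. It assumes the text has already been split into words and that each character stands for one transcribed symbol; the sample words and the `min_stem` cutoff are hypothetical.

```python
from collections import Counter

def suffix_counts(words, max_len=3, min_stem=2):
    """Count word endings of length 1..max_len, requiring at least
    min_stem symbols to remain in front of the ending."""
    counts = Counter()
    for word in words:
        for k in range(1, max_len + 1):
            if len(word) - k >= min_stem:
                # Tuples work whether a word is a string or a list of symbol names.
                counts[tuple(word[-k:])] += 1
    return counts

# Hypothetical segmented words.
words = ["abcrey", "xyrey", "abq", "rey"]
print(suffix_counts(words).most_common(3))
```

Endings that recur across many different stems, like the 'rey' noted up-thread, are candidates for grammatical suffixes.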
One expects (hopes?) that the fantastical diagrams and imagery interspersed with the text would provide known plaintext in the form of probable words, "cribs", for entry into the lexicon. If, however, the images are unrelated doodles, mere adornment, then the utter lack of context deprives one of most of the usual entries into a code-book: there is no diplomacy or war that can be assumed to be the topic, as there is for most encoded cablegrams that aren't pro forma monthly reports.
Sadly, it could be both nonsense and a novelty language, if the hoaxer developed a grammar (which might be recoverable from adjacency matrices) but not a semantics, leaving the grammar as a trap into which the would-be cryptanalyst may pour his own meaning.
The problem with interlocking secret societies is:
One 'O' to guide them,
One 'O' to bind them,
One 'O' to rule them all.
(with apologies to the Inklings of XX)
Correction to self: on reading the full 'Copiale' academic paper, I see that the 'science writer' summary I read overstated the amount of automation in the Copiale ms. prep work.
The same technique would work for Voynich, but there's no magic.
Does centuries-old information still have the same freshness, the same savor and flavor as, say, centuries-old wine?
@ BF Skinner
Depends on the wine. I would not recommend coming anywhere near a 200-year old Beaujolais Nouveau.
What exactly is this book saying? What is it about?
@nessss "What exactly is this book saying? What is it about?"
It's a ceremonial manual for some branch of Freemasonry, from when it was more secretive. The paper's watermarks, they say, date it to the mid-1700s; the 18xx end-paper date is presumed to be just a later acquisition date.
It does not specify which Rite or branch it is from, but whatever it is recognizes the Scottish Rite as a brotherly rite, so it is not that one. It might be York, might be an antecedent of the Shriners, or, given the Germanic locale and ophthalmic imagery, it might be the germ of truth behind the legend of the Bavarian Illuminati. Or a dead-end branch with no modern descendants.
The academic paper notes that a second copy was found in a northern European library. It does not indicate how exact the match was.
The academic website has German and English translations. Up-thread is a proposed key for the remaining logotypes, or keywords, in the Nomenclator (a small mixed code and simple cipher with homophones and nulls, used by courtiers in the pre-modern period and a precursor of the trench codes of WW1).
(Although the code for one logotype is *lip*, given the ophthalmic theme of the initiation ordeal, I'd bet it is an *eye* that codes for their more illuminated branch of Freemasonry, possibly Illuminated Freemasonry?)
Interesting. After perusing the paper, the cipher appears to be very similar in concept to the one used by the Zodiac Killer a few decades ago.
After reading the English translation of the manuscript, I found that several points stand out.
(1) First, those who created this secret society appear to have gathered ritual to "see how much ceremony" they could get someone to swallow, simply for the fun of being a "secret initiate".
(2) The obsessive nature of those seeking secret ritual pornography, and its omertà, is reinforced by the statement that if the initiate "breaks the rules against disclosure", he or she will be subjected to sexual humiliation by the society, a punishment reminiscent of the Story of O.
(But all this is for the amusement of the founders, and for their seduction of, and meetings with, the initiates at masque balls. Shades of Eyes Wide Shut.)
(3) The real business of the secret society is revealed in the gang-sign and street-communication rules, which are taught to rocker novices, taught to and practiced differently by full-patch members, and conform to the practice of current street gangs; and in the cautious use of controlled Q & A, like that of con men. Further, they describe a store devoid of any markings, a precursor to the Big Store a century later.
(4) Then there is the only true test of a novice: the ability to assess and spot other native cons, as opposed to bringing in an agent of a ruler, a cop, another gang member, or someone who simply blabs. In that case both are shut out forever, as incommunicados.
Conclusion: These sound like German Mohawks and street cons.
Ah, one of the seven grails of cryptography:
This code is just a hoax!
Wired's Danger Room has an update a year later, with a report on other related documents as well as the backstory.
(I'm rather happy it confirms several of my speculations from a year ago, above.)
Schneier.com is a personal website. Opinions expressed are not necessarily those of BT.