On the Voynich Manuscript

Really interesting article on the ancient-manuscript scholars who are applying their techniques to the Voynich Manuscript.

No one has been able to decipher the writing yet, but there are some new insights:

Davis presented her findings at the medieval-studies conference and published them in 2020 in the journal Manuscript Studies. She had hardly solved the Voynich, but she’d opened it to new kinds of investigation. If five scribes had come together to write it, the manuscript was probably the work of a community, rather than of a single deranged mind or con artist. Why the community used its own language, or code, remains a mystery. Whether it was a cloister of alchemists, or mad monks, or a group like the medieval Béguines—a secluded order of Christian women—required more study. But the marks of frequent use signaled that the manuscript served some routine, perhaps daily function.

Davis’s work brought like-minded scholars out of hiding. In just the past few years, a Yale linguist named Claire Bowern had begun performing sophisticated analyses of the text, building on the efforts of earlier scholars and on methods Bowern had used with undocumented Indigenous languages in Australia. At the University of Malta, computer scientists were figuring out how to analyze the Voynich with tools for natural-language processing. Researchers found that the manuscript’s roughly 38,000 words—and 9,000-word vocabulary—had many of the statistical hallmarks of actual language. The Voynich’s most common word, whatever it meant, appeared roughly twice as often as the second-most-common word and three times as often as the third-commonest, and so on—a touchstone of natural language known as Zipf’s law. The mix of word lengths and the ratio of unique words to total words were similarly language-like. Certain words, moreover, seemed to follow one another in predictable order, a possible sign of grammar.

Finally, each of the text’s sections—as defined by the drawings of plants, stars, bathing women, and so on—had different sets of overrepresented words, just as one would expect in a real book whose chapters focused on different subjects.

Spelling was the chief aberration. The Voynich alphabet—if that’s what it was—appeared to have a conventional 20-odd letters. But compared with known languages, too many of those letters repeated in the same order, both within words and across neighboring words, like a children’s rhyme. In some places, the spellings of adjacent words so converged that a single word repeated two or three times in a row. A rough English equivalent might be something akin to “She sells sea shells by the sea shore.” Another possibility, Bowern told me, was something like pig Latin, or the Yiddishism—known as “shm-reduplication”—that begets phrases such as fancy shmancy and rules shmules.
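
As a rough illustration of the statistics described in the excerpt, the sketch below computes a rank-frequency table and a type-token ratio for a list of words. The function names (`rank_frequency`, `type_token_ratio`) and the toy word list are made up for this example; they are not taken from the article or from any Voynich transcription. Under an idealized Zipf's law, the count of a word multiplied by its rank stays roughly constant.

```python
from collections import Counter


def rank_frequency(words):
    """Return (rank, word, count) tuples, most frequent word first."""
    counts = Counter(words)
    ranked = counts.most_common()  # sorted by count, descending
    return [(rank, word, count)
            for rank, (word, count) in enumerate(ranked, start=1)]


def type_token_ratio(words):
    """Ratio of unique words (types) to total words (tokens)."""
    if not words:
        raise ValueError("need at least one word")
    return len(set(words)) / len(words)


# Toy corpus (hypothetical, not Voynich text): counts of 6, 3, 2 follow
# the idealized Zipf pattern, where count is proportional to 1 / rank.
toy_words = ["aa"] * 6 + ["bb"] * 3 + ["cc"] * 2

for rank, word, count in rank_frequency(toy_words):
    # Under Zipf's law, rank * count stays roughly constant.
    print(rank, word, count, rank * count)

print(type_token_ratio(toy_words))
```

For the figures quoted in the excerpt (roughly 38,000 words and a 9,000-word vocabulary), the same ratio works out to about 9,000 / 38,000 ≈ 0.24.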

Tags: academic papers, cryptanalysis, history of cryptography

Posted on August 13, 2024 at 7:04 AM • 9 Comments

Comments

jelo 117 • August 13, 2024 8:38 AM

If five scribes had come together to write it, the manuscript was probably the work of a community, rather than of a single deranged mind or con artist.

… was probably the work of a community of deranged minds or con artists, rather than of a single deranged mind or con artist.

Torsten Timm • August 13, 2024 9:46 AM

The article from The Atlantic does not provide an accurate description of what we know about the Voynich manuscript.

For instance, the article states “The mix of word lengths and the ratio of unique words to total words were similarly language-like.” The contrary is true. The word length distribution matches almost perfectly a binomial distribution and is therefore not language-like (see Stolfi 2000 https://www.ic.unicamp.br/~stolfi/voynich/00-12-21-word-length-distr/). Jürgen Hermes states “When looking at word lengths the text of the VMS is astonishingly uniform (hardly any words have less than 3 or more than 10 characters). Even more surprising is the similar behaviour of type lengths and token lengths. Although Voynichese tokens are also slightly shorter on average than types, the word length distributions of both, types and tokens, is almost binomial” [Jürgen Hermes 2022, p. 2 https://ceur-ws.org/Vol-3313/paper7.pdf].

The article also states: “Certain words, moreover, seemed to follow one another in predictable order, a possible sign of grammar.”
However, “one of the most puzzling features of the VMS is its weak word order” [Reddy & Knight 2011, p. 82]. D’Imperio wrote in 1978: “Also the strange lack of parallel context surrounding different occurrences of the ‘same’ word as shown by word indexes. In the words of several researchers ‘the text just doesn’t act like natural language’” [D’Imperio 1978, p. 30]. Even Claire Bowern states about the distribution of words within a line: “All of these observations lead to generalizations that seem typographical rather than linguistic in nature” [Bowern & Lindemann 2021].

The article further states “Finally, each of the text’s sections—as defined by the drawings of plants, stars, bathing women, and so on—had different sets of overrepresented words, just as one would expect in a real book whose chapters focused on different subjects.”
However in natural languages the most frequent words “are distributed equally over the entire text, the so-called function words (like conjunctions, articles etc.). They do not appear contextual, but rather serve to implement grammatical structures, and they normally do not have co-occurring similar words of comparable frequency. In the VMS frequently used tokens differ from page to page” [Timm & Schinner 2020, p. 6].

Clive Robinson • August 13, 2024 9:53 AM

@ Bruce,

Let us assume it does make sense in a spoken language form.

It’s known that before Shakespeare’s time, although spoken words were consistent, the spelling of written words, and what some might call grammar in the written form, was not, and few cared.

Going back long prior to that, written language was used in a similar way to George Orwell’s various “New Speaks” to keep people segregated.

Further we know that “hiding text within text” in simple ways was also used to form hierarchies of people.

Such documents were used in hierarchies such as religions and have been found around the world.

The use of a “base” alphabet and dictionary would have been essential to allow accurate copying.

Think how in modern times computer languages use words and punctuation, and can be copied from books onto terminals by those who in effect have no knowledge of what the program means to the computer or those adept in the programming language.

Likewise, cooking recipes long predate written science experiments, yet they formed the templates for the “repeatability” that science requires.

There are as many reasonable explanations as there are people who do crosswords etc.

Something tells me that if it does have secrets within, then some will always remain.

Chris Vail • August 13, 2024 3:27 PM

Perhaps the spelling aberrations signal words to be extracted from the text as an additional message.

JonKnowsNothing • August 14, 2024 3:03 AM

@Clive

re: make sense in a spoken language form

iirc(badly) There was a French cipher from ~1800s [Great Cipher] that was highly successful. It took a long time to break the code because part of the code was asymmetric but also contained duplicate word replacement options.

The key to the code was that it was a “spoken code” but it was also “phonetic” in spelling. The French are sticklers for the proper spelling, grammar and even their written styles but in this case they jettisoned the entire concept and created a long lasting durable code.

I’ve looked at various reproductions of the Voynich Manuscript and I am neither clever nor insightful as to the text, other than to note that there is a noticeable change in the writing style within the manuscript, and that towards the end there is a separate set of encoded characters that mimic the internal style but are not part of the original manuscript and were decoded as a chemical formula.

The problem for modern folks is that we cannot dump all the stuff we know, and as you go back further in time, our modern world view gets in the way.

Clive Robinson • August 14, 2024 11:39 AM

@ ALL,

“Windows TCP/IP Remote Code Execution Vulnerability”

I’ve a bad feeling about this vulnerability based on the history of Microsoft network code.

https://msrc.microsoft.com/update-guide/vulnerability/CVE-2024-38063

Time will tell but I hope my feeling is wrong.

Clive Robinson • August 16, 2024 5:57 AM

@ ALL

Re : “I’ve a bad feeling about this”

Looks like Microsoft agree it’s quite bad news,

https://www.bleepingcomputer.com/news/microsoft/zero-click-windows-tcp-ip-rce-impacts-all-systems-with-ipv6-enabled-patch-now/