Ewan March 9, 2016 2:09 PM

Fraud or just misunderstanding? I wouldn’t be surprised if these crosswords are created by a generator algorithm each day and the author is whoever ran the generator. The newspapers in question are all part of the same publishing group. It’s quite likely they just happened to fit the same words together; they are running off the same dictionary, of course.

Tatütata March 9, 2016 3:51 PM

How many different crossword puzzles are possible anyway? The English vocabulary consists of a finite number of words, and is but a small subset of the possibilities offered by mere N-grams of the alphabet.

I tried my hand at writing a program for finding anagrams of a given starting sequence through a combination of brute force, random permutations, frequency tables, and a dictionary. Too ugly/hacky to publish, but I found a few good ones that let me shine. [Is that cheating?]
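A minimal sketch of the dictionary part of such an anagram search, not the program described above. It assumes single-word anagrams only and uses a tiny hypothetical word list (`WORDS`) in place of a real dictionary file; the function names are illustrative. Words are indexed by their sorted letters, so every anagram of a sequence shares one key and no permutations need to be enumerated.

```python
from collections import defaultdict

# Hypothetical stand-in for a real dictionary file.
WORDS = ["listen", "silent", "enlist", "tinsel", "google", "banana"]


def signature(word):
    """Return the word's letters in sorted order; anagrams share this key."""
    return "".join(sorted(word.lower()))


def build_index(words):
    """Group dictionary words by signature."""
    index = defaultdict(list)
    for word in words:
        index[signature(word)].append(word)
    return index


def anagrams(letters, index):
    """Return all dictionary words using exactly the given letters."""
    return sorted(index.get(signature(letters), []))


INDEX = build_index(WORDS)
print(anagrams("inlets", INDEX))
```

Multi-word anagrams would need a search over ways to split the letters between several signatures, which is where brute force and frequency tables would come in.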

It’s not too difficult to find a set of words that fit the starting sequence. But finding a grammatically correct sentence is already more challenging, and a meaningful one even more so. The rules of the language limit the possibilities.

In a crossword puzzle the additional rules are provided by the intersections. I view it as a kind of error detecting/correcting code.
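The error-correcting analogy can be illustrated with a toy example. This is only a sketch under assumptions: a fully filled 3x3 grid in which every row and column must be a word, a tiny hypothetical dictionary (`WORDS`), and at most one wrong letter. A bad row and a bad column locate the error at their intersection, much like row and column parity checks, and the dictionary then fixes the letter.

```python
import string

# Hypothetical tiny dictionary; a real one would be far larger.
WORDS = {"bit", "ice", "ten", "bat", "ace", "tan"}


def bad_lines(grid):
    """Return indices of rows and columns that are not dictionary words."""
    rows = [r for r, row in enumerate(grid) if "".join(row) not in WORDS]
    cols = [c for c in range(len(grid[0]))
            if "".join(row[c] for row in grid) not in WORDS]
    return rows, cols


def correct_single_error(grid):
    """Fix one wrong letter if it can be located and repaired uniquely."""
    rows, cols = bad_lines(grid)
    if not rows and not cols:
        return grid                      # nothing to correct
    if len(rows) != 1 or len(cols) != 1:
        return None                      # not a single-cell error
    r, c = rows[0], cols[0]              # error sits at the intersection
    fixes = []
    for letter in string.ascii_lowercase:
        trial = [list(row) for row in grid]
        trial[r][c] = letter
        if bad_lines(trial) == ([], []):
            fixes.append(trial)
    return fixes[0] if len(fixes) == 1 else None   # ambiguous: give up


GOOD = [list("bit"), list("ice"), list("ten")]
corrupted = [list(row) for row in GOOD]
corrupted[1][1] = "x"                    # damage the centre cell
print(correct_single_error(corrupted) == GOOD)
```

With a larger dictionary more than one letter may satisfy both crossings, in which case the error is detected but not corrected.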

Shannon looked into these questions 70 years ago; I was already planning to look up his paper on crossword puzzles.

David Leppik March 9, 2016 5:28 PM

@Tatütata, computer generated crossword puzzles are very different from ones written by people. For human puzzles, not only do the clues need to be clever, but the puzzles have themes which tie many clues together. The themes may be jokes, long phrases, or even a series of complete sentences.

@Ewan, if you read the article you’ll find that the features in question are themes that can’t be generated algorithmically. What’s more, the copying is directional. A theme would show up in the NY Times and then be copied on a later date; themes never showed up in the NY Times second. And when they were copied, the order of the clues was preserved.

Meow March 9, 2016 6:48 PM

@Data Dog:

Alright, troll, I’ll bite: personal information can be toxic when a leak would cause you a lot of harm in reputation or legal problems, so the implication is don’t just sit on mountains of it that you don’t need…

Analyzing puzzles isn’t personal information.

@Tatütata, David Leppik:

Human-generated puzzles can easily be a mixture of hand-crafted seeds or “themes” and then computer-aided for the “filler”… And this is always what I assumed such themed crossword puzzles were! Not just rote copies!

Bruce Schneier March 9, 2016 7:25 PM

“Wait, they used data analysis to find fraud? But data is a toxic asset!”


Toxic does not equal useless.

blake March 10, 2016 4:46 AM

It was pretty toxic to the fraudster.

How many medicines are really low dosage poisons that disproportionately affect our ailments more than they affect us?

There’s probably a concept of dosage with data too: have enough to control your fraud, etc, but don’t ingest so much that you kill the host.

Tatütata March 10, 2016 7:25 AM

Some of the “plagiarised” puzzles in the link do indeed look quite suspicious, with identical skeletons of thematic phrases, but with different “filler”, albeit of the same geometry.

In these puzzles I see sub-areas connecting to other sections by just a few letters, so it might be possible to work out by brute force all the possible solutions for a given set of geometries up to a moderate dimension MxN, along with their connectors.
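A rough sketch of that brute-force idea, under strong simplifying assumptions: a square sub-area with no black squares, a tiny hypothetical word list (`WORDS`) of equal-length words, and connector letters given as fixed cells. The function name `fills` and its parameters are illustrative only. It tries every combination of row words and keeps the grids whose columns are also words.

```python
from itertools import product

# Hypothetical word list; all words must share one length.
WORDS = ["bit", "ice", "ten", "bat", "ace", "tan", "ate", "tea", "eat"]


def fills(words, fixed=None):
    """Yield every square grid whose rows and columns are all in `words`.

    `fixed` maps (row, col) -> letter for cells already determined,
    for example by a theme phrase or a connector to another section.
    """
    fixed = fixed or {}
    vocab = set(words)
    size = len(words[0])
    for rows in product(words, repeat=size):
        # Discard candidates that contradict a pre-placed letter.
        if any(rows[r][c] != ch for (r, c), ch in fixed.items()):
            continue
        cols = ("".join(row[c] for row in rows) for c in range(size))
        if all(col in vocab for col in cols):
            yield rows


# Two connector letters are fixed; everything else is free "filler".
solutions = list(fills(WORDS, fixed={(0, 0): "b", (1, 1): "c"}))
for grid in solutions:
    print(" / ".join(grid))
```

The search space grows as (number of words) to the power of the number of rows, so a real dictionary would need pruning, for example checking column prefixes after each row is placed.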

I think I found something like what I was looking for in the IEEE Information Theory Society Newsletter:

2001 Shannon Lecture, Constrained Sequences, Crossword Puzzles and Shannon; Jack Keil Wolf, University of California, San Diego and QUALCOMM Incorporated

See in particular p. 6/19, middle of the left column.

