Unintended Information Revelation
Here’s a new Internet data-mining research program with a cool name: Unintended Information Revelation:
Existing search engines process individual documents based on the number of times a key word appears in a single document, but UIR constructs a concept chain graph used to search for the best path connecting two ideas within a multitude of documents.
To develop the method, researchers used the chapters of the 9/11 Commission Report to establish concept ontologies – lists of terms of interest in the specific domains relevant to the researchers: aviation, security and anti-terrorism issues.
“A concept chain graph will show you what’s common between two seemingly unconnected things,” said Srihari. “With regular searches, the input is a set of key words, the search produces a ranked list of documents, any one of which could satisfy the query.
“UIR, on the other hand, is a composite query, not a keyword query. It is designed to find the best path, the best chain of associations between two or more ideas. It returns to you an evidence trail that says, ‘This is how these pieces are connected.’”
The hope is to develop the core algorithms exposing veiled paths through documents generated by different individuals or organisations.
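To make the "best path" idea concrete, here is a minimal sketch in Python. It is not the researchers' algorithm, which the article doesn't describe. It assumes a concept chain graph can be approximated by linking two concepts whenever they appear in the same document, and that the "best" chain is the one whose links are supported by the most documents. The function names, the 1/(supporting documents) cost, and the sample data are all made up for illustration.

```python
import heapq
from collections import defaultdict
from itertools import combinations


def build_concept_graph(docs):
    """Link every pair of concepts that co-occur in a document.

    docs maps a document id to the list of concepts found in it.
    Returns graph[a][b] = set of document ids mentioning both a and b.
    """
    graph = defaultdict(dict)
    for doc_id, concepts in docs.items():
        for a, b in combinations(sorted(set(concepts)), 2):
            graph[a].setdefault(b, set()).add(doc_id)
            graph[b].setdefault(a, set()).add(doc_id)
    return graph


def best_chain(graph, start, goal):
    """Dijkstra's shortest path; a link costs 1 / (number of supporting docs).

    This cost function is an assumption: it prefers well-attested links.
    Returns the list of concepts on the chain, or None if no chain exists.
    """
    queue = [(0.0, start, [start])]
    visited = set()
    while queue:
        cost, node, path = heapq.heappop(queue)
        if node == goal:
            return path
        if node in visited:
            continue
        visited.add(node)
        for neighbor, doc_ids in graph.get(node, {}).items():
            if neighbor not in visited:
                step = 1.0 / len(doc_ids)
                heapq.heappush(queue, (cost + step, neighbor, path + [neighbor]))
    return None


def evidence_trail(graph, path):
    """For each link in the chain, list the documents that support it."""
    return [(a, b, sorted(graph[a][b])) for a, b in zip(path, path[1:])]


# Hypothetical sample data: no single document mentions both
# "airline" and "contractor", but a chain connects them.
docs = {
    "doc1": ["airline", "airport"],
    "doc2": ["airport", "badge"],
    "doc3": ["badge", "contractor"],
    "doc4": ["airline", "catering"],
}

graph = build_concept_graph(docs)
chain = best_chain(graph, "airline", "contractor")
trail = evidence_trail(graph, chain)
```

A keyword search for "airline contractor" would return nothing here, because no document contains both terms. The chain search returns the intermediate concepts along with the documents behind each link. It also shows where false positives come from: every link is just a co-occurrence, and nothing guarantees the chain as a whole means anything.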
I’m a big fan of research, and I’m glad to see it being done. But I hope there is a lot of discussion and debate before we deploy something like this. I want to be convinced that the false positives don’t make it useless as an intelligence-gathering tool.
mark • August 30, 2005 1:53 PM
I wonder if this is at all conceptually related to Google Sets? Granted, the trails and underlying documents aren’t shown there.
http://labs.google.com/sets