Redaction or sanitization is required to declassify sensitive textual documents or to make them available for secondary use. This task, which is complex, time-consuming and error-prone, is performed manually by one or several human experts. Our technology automates the process by detecting terms and term combinations in the documents that may disclose sensitive information. Such terms are then redacted or generalized.
Our solution consists of a semantic privacy model with which users can intuitively define their privacy requirements on the document contents, that is, which topics they consider sensitive.
Then, an automated algorithm analyses the document content to detect individual terms or combinations of terms that partially or totally disclose any of the sensitive topics stated by the user. This assessment relies on the distribution of information on the Web, which represents the knowledge an attacker may use when attempting to infer sensitive data from the protected document.
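The detection step can be illustrated with a minimal sketch. It assumes that disclosure is scored with pointwise mutual information (PMI) between a term and a sensitive topic, estimated from page counts; the text above does not name the measure, so PMI is an assumption made here for illustration. The counts, the threshold and the names `HITS`, `CO_HITS`, `pmi` and `risky_terms` are all hypothetical, standing in for figures that would come from a Web search engine.

```python
import math

# Hypothetical page counts standing in for Web search hit counts.
TOTAL_PAGES = 1_000_000
HITS = {"HIV": 2_000, "antiretroviral": 500, "hospital": 50_000}
# Hypothetical counts of pages where a term and a topic co-occur.
CO_HITS = {("antiretroviral", "HIV"): 400, ("hospital", "HIV"): 300}


def pmi(term, topic):
    """Pointwise mutual information (in bits) between a term and a topic."""
    p_term = HITS[term] / TOTAL_PAGES
    p_topic = HITS[topic] / TOTAL_PAGES
    p_joint = CO_HITS.get((term, topic), 0) / TOTAL_PAGES
    if p_joint == 0:
        # The term never co-occurs with the topic: it discloses nothing.
        return float("-inf")
    return math.log2(p_joint / (p_term * p_topic))


def risky_terms(terms, topic, threshold):
    """Return the terms whose PMI with the topic reaches the threshold."""
    return [t for t in terms if pmi(t, topic) >= threshold]


print(risky_terms(["antiretroviral", "hospital"], "HIV", threshold=5.0))
```

With these toy counts, "antiretroviral" co-occurs with the topic far more often than chance would predict and is flagged, whereas the much more general "hospital" is not. A real system would also have to score combinations of terms, which this sketch omits.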
Finally, another automated algorithm redacts (suppresses) or generalizes the risky terms in accordance with the privacy requirements stated by the user.
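The difference between suppression and generalization can be sketched as follows. This is an illustrative example only: the lookup table `GENERALIZATIONS`, the function `sanitize` and the `[REDACTED]` marker are hypothetical, and a real system would more likely draw generalizations from a knowledge base or taxonomy than from a hand-written dictionary.

```python
import re

# Hypothetical mapping from risky terms to less specific replacements.
GENERALIZATIONS = {
    "antiretroviral therapy": "treatment",
    "HIV": "condition",
}


def sanitize(text, risky, generalize=True):
    """Replace each risky term with a generalization or a redaction mark."""
    if not risky:
        return text

    def replace(match):
        term = match.group(0)
        if generalize and term in GENERALIZATIONS:
            return GENERALIZATIONS[term]
        # No generalization available or requested: suppress the term.
        return "[REDACTED]"

    # Try longer terms first so multi-word terms win over their substrings.
    ordered = sorted(risky, key=len, reverse=True)
    pattern = re.compile(r"\b(?:" + "|".join(re.escape(t) for t in ordered) + r")\b")
    return pattern.sub(replace, text)


sentence = "The patient started antiretroviral therapy."
print(sanitize(sentence, ["antiretroviral therapy"]))
print(sanitize(sentence, ["antiretroviral therapy"], generalize=False))
```

Generalization keeps the sentence readable and preserves some of its usefulness, while suppression removes the term entirely; which one is appropriate depends on the privacy requirements stated by the user.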
More technical details are provided in the following papers:
Intellectual property status
Other forms of protection
Current development status
Desired business relationship
New technology applications
Adaptation of technology to other markets