Fundació URV

Automatic sanitization of textual documents

Posted by Fundació URV · Innovative Products and Technologies · Spain

Summary of the technology

Redaction, or sanitization, is required to declassify sensitive textual documents or to make them available for secondary use. This task is complex, time-consuming and error-prone, and it is performed manually by one or several human experts. Our technology automates the process by detecting terms and term combinations in the documents that may disclose sensitive information. Such terms are then redacted or generalized.


Description of the technology

Our solution consists of a semantic privacy model through which users can intuitively define their privacy requirements for the document contents, that is, which topics they consider sensitive.

Then, an automated algorithm analyses the document content to detect individual terms or combinations of terms that partially or totally disclose any of the sensitive topics stated by the user. This assessment relies on the distribution of information on the Web, which represents the knowledge an attacker may use when attempting to disclose sensitive data in the protected document.
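As an illustration only, the detection step can be sketched with pointwise mutual information (PMI), a standard measure of how strongly two terms are associated, estimated here from page counts. The counts, the threshold and the names `PAGE_COUNTS`, `JOINT_COUNTS`, `pmi` and `risky_terms` are all hypothetical; the papers listed below describe the actual model.

```python
import math

# Hypothetical page counts standing in for Web search hit counts.
TOTAL_PAGES = 1_000_000
PAGE_COUNTS = {"HIV": 2_000, "antiretroviral": 500, "hospital": 50_000}

# Hypothetical counts of pages on which both terms appear together.
JOINT_COUNTS = {
    ("HIV", "antiretroviral"): 400,
    ("HIV", "hospital"): 150,
}


def pmi(topic, term):
    """Pointwise mutual information (in bits) between a topic and a term."""
    joint = JOINT_COUNTS.get((topic, term), 0)
    if joint == 0:
        return float("-inf")  # the pair never co-occurs in the toy corpus
    p_topic = PAGE_COUNTS[topic] / TOTAL_PAGES
    p_term = PAGE_COUNTS[term] / TOTAL_PAGES
    p_joint = joint / TOTAL_PAGES
    return math.log2(p_joint / (p_topic * p_term))


def risky_terms(topic, terms, threshold):
    """Return the terms whose association with the topic reaches the threshold."""
    return [t for t in terms if pmi(topic, t) >= threshold]


# The threshold of 3.0 bits is an arbitrary value chosen for this example.
flagged = risky_terms("HIV", ["antiretroviral", "hospital"], threshold=3.0)
print(flagged)
```

With these made-up counts, "antiretroviral" is flagged because it co-occurs with the sensitive topic far more often than chance would predict, whereas "hospital" does not reach the threshold.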

Finally, another automated algorithm redacts (suppresses) or generalizes risky terms in accordance with the privacy requirements stated by the user.
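A minimal sketch of the difference between redaction and generalization follows. It assumes the risky terms are already known and that a lookup table of more general concepts is available; the `GENERALIZATIONS` table, the `[REDACTED]` placeholder and the `sanitize` function are hypothetical and do not reflect the technology's actual implementation.

```python
import re

# Hypothetical lookup table mapping a risky term (lowercase) to a more
# general concept. A real system would derive this from a knowledge base.
GENERALIZATIONS = {
    "antiretroviral therapy": "drug treatment",
    "hiv": "chronic infection",
}


def sanitize(text, risky, generalize=True):
    """Replace each risky term with a generalization or a redaction mark."""
    if not risky:
        return text
    # Longest terms first, so multiword terms win over shorter overlaps.
    terms = sorted({t.lower() for t in risky}, key=lambda t: (-len(t), t))
    pattern = re.compile(
        r"\b(?:" + "|".join(re.escape(t) for t in terms) + r")\b",
        re.IGNORECASE,
    )

    def replace(match):
        term = match.group(0).lower()
        if generalize and term in GENERALIZATIONS:
            return GENERALIZATIONS[term]
        return "[REDACTED]"  # fall back to suppression

    # A single pass, so replacement text is never matched again.
    return pattern.sub(replace, text)


sentence = "The patient receives antiretroviral therapy for HIV."
risky = ["antiretroviral therapy", "HIV"]
generalized = sanitize(sentence, risky, generalize=True)
redacted = sanitize(sentence, risky, generalize=False)
print(generalized)
print(redacted)
```

Generalization keeps more of the document's usefulness than outright suppression, at the cost of revealing a coarser version of the original term.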

More technical details are provided in the following papers:
1) https://arxiv.org/abs/1406.4285
2) https://arxiv.org/abs/1701.00436

Intellectual property status

Other forms of protection

Current development status

Experimental technologies

Desired business relationship

Technology development

New technology applications

Adaptation of technology to other markets

Technology Owner

Fundació URV

Technology Transfer Office

Related keywords

  • Data Protection, Storage Technology, Cryptography, Data Security
  • Information Technology/Informatics
  • Computer related
  • Computer Software Market
  • Applications software
  • semantics
  • sanitization
  • document redaction
  • ontologies
  • privacy

About Fundació URV

Technology Transfer Office from Spain

The Technology Transfer and Innovation Center (CTTi) responds, from within the university environment, to the technological and service needs of the productive sectors and public administration. It does so by managing technology and knowledge transfer, intellectual property, technology watch, entrepreneurship, and its technology infrastructure offering (business incubator).


Technology Offers on Innoget are posted and managed directly by its members, who also evaluate requests for information. Innoget is the trusted open innovation and science network aimed at directly connecting industry needs with professionals online.