In the project “Forensic Analysis of Scanned Text Documents” (FASTDoc), we plan to forensically analyze scanned text documents. This is increasingly relevant because more and more institutions demand electronic versions of documents, and even non-expert users can alter these documents with image-editing software such as Photoshop. The goal of FASTDoc is to restore some credibility to scanned text documents.
As a first step, we want to establish the “Tyrolean Benchmarking Database for Text Forgeries” (TBD4TF). This database will serve as an entry point for researchers who develop methods for testing the authenticity of scanned text documents. The TBD4TF will contain several hundred text images that were first printed with different printers and then scanned with several scanners. Furthermore, we want to include automatically generated forgeries of the scanned documents. In parallel, we will implement a toolbox that combines methods from optical character recognition, digital image forensics and decision theory. We propose to divide the document under consideration into regions that contain text and regions that contain only background, and to analyze them separately. By the end of the project, we plan to have a working prototype implementation. We will evaluate the accuracy of the prototype on the TBD4TF and publish the results as a benchmark for other researchers.
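The proposed split into text and background regions can be illustrated with a minimal sketch. This is not the FASTDoc toolbox. It assumes an 8-bit grayscale scan given as a list of pixel rows, and it uses a global Otsu threshold as a simple stand-in for the OCR-based segmentation the project envisions. The helper names `otsu_threshold`, `split_text_background` and `region_stats` are hypothetical and chosen for illustration only.

```python
from statistics import mean, pvariance


def otsu_threshold(pixels):
    """Return the Otsu threshold for a flat list of 8-bit intensities.

    Pixels <= threshold form the dark class, pixels > threshold the light class.
    """
    hist = [0] * 256
    for p in pixels:
        hist[p] += 1
    total = len(pixels)
    sum_all = sum(i * h for i, h in enumerate(hist))
    sum_dark, weight_dark = 0.0, 0
    best_t, best_var = 0, -1.0
    for t in range(256):
        weight_dark += hist[t]
        if weight_dark == 0:
            continue
        weight_light = total - weight_dark
        if weight_light == 0:
            break
        sum_dark += t * hist[t]
        mean_dark = sum_dark / weight_dark
        mean_light = (sum_all - sum_dark) / weight_light
        # Between-class variance; Otsu picks the threshold that maximizes it.
        between = weight_dark * weight_light * (mean_dark - mean_light) ** 2
        if between > best_var:
            best_var, best_t = between, t
    return best_t


def split_text_background(image):
    """Split a grayscale image (list of rows) into text and background pixels.

    Assumption: dark ink on light paper, so dark pixels are treated as text.
    """
    flat = [p for row in image for p in row]
    t = otsu_threshold(flat)
    text = [p for p in flat if p <= t]
    background = [p for p in flat if p > t]
    return t, text, background


def region_stats(values):
    """Simple per-region statistics; returns None for an empty region."""
    if not values:
        return None
    return {"mean": mean(values), "variance": pvariance(values)}


if __name__ == "__main__":
    scan = [[10, 12, 200, 205], [11, 198, 202, 9]]
    t, text, background = split_text_background(scan)
    print(t, region_stats(text), region_stats(background))
```

Keeping the two pixel populations apart means that, for example, noise statistics measured on the paper background are not dominated by the sharp ink edges of the characters; a real pipeline would compare such statistics across different parts of the page rather than over the whole image at once.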
The FASTDoc project will start in August 2017 and run for twelve months.