With ADR, Scan2x can examine samples of a document – e.g., an invoice from Carrefour – and learn to recognize future Carrefour invoices, even if the invoices are for different products and the overall document is not identical to others like it. By accepting different Carrefour invoices of different colours and quality, Scan2x continues to learn and becomes cleverer at document recognition as time passes.
It is important to understand that only structured documents are recognizable in this way, i.e., documents that always share the same basic structure and layout, such as invoices, purchase orders, forms and tickets. Unstructured documents such as general correspondence and emails all look the same to the Scan2x document analysis engine, because it looks for structure, not content.
Document Fingerprinting
This is a function whereby the software looks at the structure of a document rather than the content. The system does not look for specific words or patterns within the document content, but rather looks for indicators of document structure: logos and their positions, tables, footers and shading. Document fingerprinting works best for documents that are structured – forms, invoices and other documents of a relatively fixed or predictable format. Scan2x uses document fingerprinting to provide the first level of document recognition.
Document Content Analysis
Content Analysis involves extracting the text within a document image and then searching that text for patterns. This method lends itself particularly well to unstructured documents like random correspondence and email, where document format is not used for document type identification. If, however, content analysis is used in conjunction with document fingerprinting, this can result in a very high success rate for structured document identification and classification.
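As a rough illustration of the pattern-searching idea described above, the following Python sketch looks for regular-expression patterns in text that has already been extracted from a document image. It is not the Scan2x implementation: the pattern table, the document type names and the function name classify_by_content are assumptions made purely for this example.

```python
import re

# Hypothetical patterns for illustration only; these are not Scan2x settings.
PATTERNS = {
    "invoice": re.compile(r"\binvoice\s+(no|number)\b", re.IGNORECASE),
    "remittance advice": re.compile(r"\bremittance\s+advice\b", re.IGNORECASE),
}


def classify_by_content(text):
    """Return the first document type whose pattern occurs in the text, else None."""
    for doc_type, pattern in PATTERNS.items():
        if pattern.search(text):
            return doc_type
    return None
```

In this sketch the order of the pattern table decides which type wins when more than one pattern matches, so more specific patterns would normally be listed first.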
For structured document identification with Scan2x, it is therefore possible to use fingerprinting technology to provide a first-level document identification mechanism. This will allow the differentiation of documents from, for example, one supplier and another. Once a supplier has been identified, a combination of OCR zone text and a VBScript expression can then be used to identify the specific document type from that supplier.
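The two-level approach described above can be sketched in Python as follows. This is only a conceptual outline under stated assumptions: in Scan2x the second-level test is a VBScript expression applied to OCR zone text, whereas here a plain Python function stands in for that expression. The RULES table, the supplier and document type names, and identify_document are hypothetical.

```python
# Hypothetical second-level rules, keyed by the supplier found by fingerprinting.
# Each rule pairs a test on the OCR zone text with the document type it implies.
RULES = {
    "Carrefour": [
        (lambda text: "CREDIT NOTE" in text.upper(), "Carrefour credit note"),
        (lambda text: "INVOICE" in text.upper(), "Carrefour invoice"),
    ],
}


def identify_document(supplier, zone_text):
    """Second-level identification: pick a document type for a known supplier.

    `supplier` is the result of the first-level (fingerprint) step, or None
    if the fingerprint was not recognized.
    """
    if supplier is None:
        return None
    for matches, doc_type in RULES.get(supplier, []):
        if matches(zone_text):
            return doc_type
    return None
```

For example, identify_document("Carrefour", "Tax Invoice") would return "Carrefour invoice" in this sketch, while an unrecognized supplier returns None regardless of the zone text.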
To illustrate automatic document recognition using Scan2x, we will use the example of Accounts Payable to explain the ADR functionality and how to set it up. By Accounts Payable, we mean the scanning of multiple supplier invoices in batches separated by Document Separators. More information about Document Separators can be found under Adding A Document Splitting Rule Tab in the Administrator's Guide. Users will insert a Document Separator between one invoice and the next and will scan multiple documents from different suppliers as one batch. Each document will be of a different layout, and each may be of a different document type – for example, one might scan supplier invoices, delivery notes, remittance advices and other structured documentation together. Scan2x will use the first page of each document to perform the Fingerprint Recognition function and will route each document to its respective profile for indexing and processing according to that profile’s settings.
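The batch flow described above (split on separators, then recognize each document from its first page) might be modelled like this in Python. It is a simplified sketch, not Scan2x code: pages are represented as plain strings, SEPARATOR is a hypothetical marker standing in for a Document Separator sheet, and the recognize argument is a placeholder for the Fingerprint Recognition function.

```python
SEPARATOR = "<SEPARATOR>"  # hypothetical marker for a Document Separator sheet


def split_batch(pages):
    """Split a scanned batch into documents at each separator page."""
    documents, current = [], []
    for page in pages:
        if page == SEPARATOR:
            if current:  # ignore empty runs between consecutive separators
                documents.append(current)
                current = []
        else:
            current.append(page)
    if current:
        documents.append(current)
    return documents


def route(documents, recognize):
    """Pair each document with the profile recognized from its first page."""
    return [(recognize(doc[0]), doc) for doc in documents]
```

Only the first page of each document is passed to recognize, which mirrors the behaviour described in the text; the remaining pages simply travel with the document to the chosen profile.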
To set up document recognition in Scan2x, follow these steps:
1. Create a Job Button for every document to be recognized.
2. Set up an ADR Group and move all the profiles created in step 1 into it.
3. Submit document samples to the recognition engine for it to commence the Fingerprint learning process.
ADR Templates
By submitting samples of documents for each document type, we allow Scan2x to ‘learn’ about the characteristics of each document type – logo position, if any, tables, headers and footers, etc. Scan2x refers to these document samples as ADR Templates. It is possible (and advisable) to submit more than one Template to each document type as this allows Scan2x to fine-tune its internal definitions, resulting in better recognition results during production scanning.
The ADR Templates tab below shows a list of all the profiles for the ADR Group, and it is possible to add ADR Templates from this tab.
The ADR Templates added here during the creation of the ADR Group will be used initially to recognize the first batches of documents scanned. If only one Template is added for each document profile, the first scan runs may fail to recognize a proportion of the documents scanned. For this reason, it is possible to assign document types at scan time.
These assignments are used by Scan2x to add to the Template list above, thereby increasing its knowledge of each document profile.
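To make the idea of learning from additional Templates more concrete, here is a toy Python model. It does not describe Scan2x's internal algorithm, which is not documented here. It assumes each sample is reduced to a set of structural features (hypothetical labels such as "logo:top-left") and uses Jaccard similarity with an arbitrary threshold as a stand-in matching technique. The class name TemplateStore and its methods are also assumptions.

```python
def jaccard(a, b):
    """Similarity of two feature sets: size of intersection over size of union."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0


class TemplateStore:
    """Toy store of templates per profile; not the Scan2x recognition engine."""

    def __init__(self, threshold=0.6):  # threshold chosen arbitrarily
        self.threshold = threshold
        self.templates = {}  # profile name -> list of feature sets

    def add_template(self, profile, features):
        """Record one more sample (template) for a profile."""
        self.templates.setdefault(profile, []).append(frozenset(features))

    def recognize(self, features):
        """Return the best-matching profile, or None if below the threshold."""
        features = frozenset(features)
        best_profile, best_score = None, 0.0
        for profile, samples in self.templates.items():
            for sample in samples:
                score = jaccard(features, sample)
                if score > best_score:
                    best_profile, best_score = profile, score
        return best_profile if best_score >= self.threshold else None
```

In this model, a scan that recognize rejects can be assigned to a profile by calling add_template with its features, after which similar scans match the new sample. This is the sense in which more Templates per profile improve recognition in the sketch.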
Where an ADR Group has 20 or more jobs stored and a document is not recognized in the scan preview screen, the user does not need to scroll through the ADR job list to find the correct job. Instead, the user can type the name of the job in the ADR search bar, as shown in the image below.