Capture of tabular data in table format (e.g., invoice processing)
In Scan2x, the OCR functionality of a metadata field can be set to Table or Table With Headers. For details on how to do this, see the OCR Zones best practices tab.
Once a table OCR zone is defined, any tabular data found within the zone at scan time is formatted by Scan2x as a table, reflecting the structure of the document being scanned.
For example, if the document contains a table with six columns, the OCR output will also have six columns, matching the layout of the table on the document.
While this is acceptable for some operations and documents, it is often necessary to ‘normalize’ the data coming from multiple document templates of the same type into one common format.
Let’s consider an Accounts Payable example:
When scanning creditor invoices from many different suppliers, each supplier will typically use its own invoice layout. The position of data on the page varies from supplier to supplier, and tabular data varies in both position and content. For example, ‘Supplier ABC’ may list each line item over six columns, whereas ‘Supplier DEF’ may list each line item over nine columns.
If Scan2x processes these invoices to provide input to an ERP system, the ERP will expect incoming data in one common format, regardless of which supplier issued the invoice.
Scan2x handles this data normalization by allowing a metadata field to be defined as a table whose columns represent the standardized ERP format. A translation screen is then available for mapping each template’s data format to that of the ERP.
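The translation itself is configured in the Scan2x translation screen; the Python sketch below only illustrates the idea behind it and is not Scan2x code or an API. The ERP column names, the per-supplier column positions, the sample rows and the `normalize` function are all hypothetical, chosen to mirror the six-column ‘Supplier ABC’ and nine-column ‘Supplier DEF’ example above.

```python
from typing import Dict, List

# Hypothetical common ERP line-item format (column order matters).
ERP_COLUMNS = ["item_code", "description", "quantity", "unit_price", "line_total"]

# Hypothetical per-template mappings: ERP column -> index of the matching
# column in that supplier's OCR table. Source columns not listed are ignored.
TEMPLATE_MAPPINGS: Dict[str, Dict[str, int]] = {
    # Six-column layout
    "Supplier ABC": {"item_code": 0, "description": 1, "quantity": 2,
                     "unit_price": 4, "line_total": 5},
    # Nine-column layout
    "Supplier DEF": {"item_code": 2, "description": 3, "quantity": 5,
                     "unit_price": 6, "line_total": 8},
}


def normalize(template: str, rows: List[List[str]]) -> List[Dict[str, str]]:
    """Translate OCR table rows from one template into the common ERP format."""
    mapping = TEMPLATE_MAPPINGS[template]
    normalized = []
    for row in rows:
        # ERP columns with no mapping for this template are left empty.
        normalized.append({
            col: row[mapping[col]] if col in mapping else ""
            for col in ERP_COLUMNS
        })
    return normalized


if __name__ == "__main__":
    # Hypothetical OCR output: one line item from each supplier.
    abc_rows = [["A-100", "Widget", "3", "EA", "2.50", "7.50"]]
    def_rows = [["1", "PO-77", "X-9", "Gadget", "BIN4", "2", "10.00", "0%", "20.00"]]
    print(normalize("Supplier ABC", abc_rows))
    print(normalize("Supplier DEF", def_rows))
```

Because each template's mapping is kept as data, supporting a new supplier layout means adding one more mapping, while the ERP continues to receive the same columns in the same order.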