Scanning batched documents

Document Splitting

To scan multiple documents in one batch, Scan2x must be able to decide how to split the batch, in other words, where one document ends and the next begins. Several options are available, and the choice depends primarily on the type of document being captured.

Structured Documents are documents whose appearance is not random in nature, for example invoices, purchase orders, vouchers, bank cheques, delivery orders or virtually any type of form-based document. With Scan2x’s optional Automatic Document Recognition module, it is possible to scan documents of the same type (e.g., incoming invoices from multiple different suppliers) and have Scan2x recognize and process each document automatically.

More information about Scan2x’s Automatic Document Recognition can be found in the Administrator's Guide under the Automatic Document Recognition (ADR) Tab.

Unstructured Documents are documents that are completely random in nature – for example: emails, letters, contracts, newspaper or magazine articles. In these documents there is no predictable format. These types of documents can be captured by Scan2x in batches, and several methods can be used to indicate the beginning or end of a document.

When a batch of documents is scanned and split, the resulting documents are displayed as a list of separate thumbnails. Scan2x populates the pre-configured metadata automatically, while other indexes might need to be filled in by the user. Administrators can set fields as mandatory. The user can also populate data for multiple documents at once as Common Metadata, or enter individual data that applies only to the document being displayed. Document Splitting gives the user several options: merging, splitting, deleting or saving documents, individually or as a batch. Saved documents are then routed to their pre-configured destinations.

Further information can be found in the Administrator's Guide under the Document Splitting Tab.

After selecting the Jobs Manager and choosing the job name to be edited, the administrator will be presented with the Job Configuration window, showing a list of options on the left-hand side.

Once the Document Splitting option is selected, the administrator can add one or more splitting conditions and choose exactly how the documents should be split and recognised. Documents can be split in several ways, as shown below. For each condition, the administrator can also specify whether it marks the start of a new document or the end of the current one.
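The difference between a condition that marks the start of a new document and one that marks the end of the current document can be illustrated with a minimal Python sketch. This is not Scan2x code: the function split_batch, its parameters and the representation of pages as plain strings are all hypothetical and exist only for illustration.

```python
from typing import Callable, List


def split_batch(
    pages: List[str],
    is_marker: Callable[[str], bool],
    marker_starts_document: bool = True,
) -> List[List[str]]:
    """Group pages into documents around pages that satisfy a condition.

    Hypothetical helper: pages are plain strings and is_marker stands in
    for whatever splitting condition has been configured.
    """
    documents: List[List[str]] = []
    current: List[str] = []
    for page in pages:
        marked = is_marker(page)
        # Start-of-document marker: close the open document before this page.
        if marked and marker_starts_document and current:
            documents.append(current)
            current = []
        current.append(page)
        # End-of-document marker: close the document including this page.
        if marked and not marker_starts_document:
            documents.append(current)
            current = []
    if current:
        documents.append(current)
    return documents
```

With marker_starts_document=True the matching page becomes the first page of the next document; with False it becomes the last page of the current one.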

Split by text content on document

Text captured in metadata fields

It is possible to split documents based upon the presence or absence of text on a page. This text can be printed (i.e., human readable) or in the form of a 1D or 2D barcode (e.g., QR codes, data matrix codes). Select this option if the text that the document is to be split by is in a constant or predictable position on the page. Within the metadata tab, create a metadata field to hold the text and an OCR zone that will populate this field. Then select the field in the OCR’d Metadata Field dropdown and use a splitting condition to trigger a document split.

Text somewhere on the page

In situations where different document types use Splitting Rules to distinguish one document from another, it is sometimes necessary to OCR the entire page to look for the splitting condition. Using the OCR’d Metadata Field method above will cause Scan2x to OCR the page every time a new document type is checked. To prevent this, select the Full-Page OCR option, which forces Scan2x to OCR the document only once and reuse the result of that OCR process when testing the split conditions of subsequent jobs during the matching process. Documents can also be split based on results returned from Expression Metadata Fields (Regex, VBScript, etc.) and Mapped AI Fields. For more information, see Mapped AI Fields in the Administrator's Guide.
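The benefit of running OCR once and reusing the result can be sketched as a simple cache. The Python below is an illustration only and does not reflect how Scan2x is implemented: FullPageOcrCache, first_matching_job and the run_ocr callable are hypothetical names, and each job's split condition is reduced to a plain phrase search.

```python
from typing import Callable, Dict, Optional


class FullPageOcrCache:
    """Runs OCR at most once per page and keeps the recognised text."""

    def __init__(self, run_ocr: Callable[[str], str]) -> None:
        self._run_ocr = run_ocr  # hypothetical OCR function: page id -> text
        self._text_by_page: Dict[str, str] = {}
        self.ocr_runs = 0  # counts how often OCR was actually performed

    def text_for(self, page_id: str) -> str:
        if page_id not in self._text_by_page:
            self.ocr_runs += 1
            self._text_by_page[page_id] = self._run_ocr(page_id)
        return self._text_by_page[page_id]


def first_matching_job(
    page_id: str,
    split_phrases: Dict[str, str],
    cache: FullPageOcrCache,
) -> Optional[str]:
    """Return the first job whose phrase appears in the page text, if any."""
    for job_name, phrase in split_phrases.items():
        # Every job asks the cache for the text; only the first call runs OCR.
        if phrase.lower() in cache.text_for(page_id).lower():
            return job_name
    return None
```

However many jobs are tested against the same page, ocr_runs stays at one for that page, which is the saving the Full-Page OCR option is described as providing.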

More on how splitting conditions are used by Job Automation to recognise one type of document from another can be found in the Administrator's Guide under the Job Automation Tab.

Split by QR Code, Barcode, Aztec Code, or Data Matrix

This option works on the same principle as splitting by OCR: every time Scan2x recognizes a QR Code, Barcode, Aztec Code, or Data Matrix, it starts a new document. It can be pre-configured either to split the document on any such code or to recognise specific data within the code and split only in that instance.
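The two modes, splitting on any code versus splitting only on specific data, can be sketched as follows. This Python example is a hypothetical illustration, not Scan2x's implementation: it assumes the codes on each page have already been decoded into strings, and split_on_barcode and required_value are invented names.

```python
from typing import List, Optional


def split_on_barcode(
    pages: List[List[str]],
    required_value: Optional[str] = None,
) -> List[List[int]]:
    """Split a batch wherever a page carries a triggering code.

    pages holds, for each page, the list of decoded code values found on it
    (an assumption of this sketch). Documents are returned as lists of page
    indexes.
    """
    documents: List[List[int]] = []
    current: List[int] = []
    for index, codes in enumerate(pages):
        if required_value is None:
            triggers = bool(codes)  # any code at all starts a new document
        else:
            triggers = required_value in codes  # only this value splits
        if triggers and current:
            documents.append(current)
            current = []
        current.append(index)
    if current:
        documents.append(current)
    return documents
```

Passing required_value=None models the "any code" setting, while passing a string models a split that is triggered only by that specific data.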

Split by page count

This is a simple option for scanning batches in which every document has the same number of pages. Splitting by Page Count also gives you the option to omit blank or unnecessary pages once the document is scanned. The administrator may also be able to choose between duplex and simplex scanning. If duplex scanning is chosen, the administrator must count the front and back of each leaf as two separate pages.
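The page-count arithmetic can be sketched in a few lines of Python. The helper names pages_per_document and split_by_page_count are hypothetical, and the sketch assumes that blank pages are dropped after the batch has been divided, so that they still count towards the configured page count; the source does not state the order in which Scan2x applies these steps.

```python
from typing import Callable, List, Optional


def pages_per_document(leaves: int, duplex: bool) -> int:
    """In duplex mode the front and back of each leaf count as two pages."""
    return leaves * 2 if duplex else leaves


def split_by_page_count(
    pages: List[str],
    count: int,
    is_blank: Optional[Callable[[str], bool]] = None,
) -> List[List[str]]:
    """Cut the batch into consecutive groups of `count` scanned pages."""
    if count < 1:
        raise ValueError("count must be at least 1")
    documents: List[List[str]] = []
    for start in range(0, len(pages), count):
        chunk = pages[start:start + count]
        if is_blank is not None:
            # Assumption: blanks are removed only after the split is made.
            chunk = [page for page in chunk if not is_blank(page)]
        if chunk:
            documents.append(chunk)
    return documents
```

For example, a document printed on three leaves and scanned in duplex would need a page count of pages_per_document(3, duplex=True), that is, six.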

Furthermore, a splitter page can be produced by pressing the Splitter-Page Generator button. The resulting page can either be printed, so that the user can insert it manually between the documents to be separated, or e-mailed, so that the user can add it as an extra page at either end of a document when importing PDFs. The splitter page is recognized using the Barcode function described above, and the administrator has the option to remove it automatically from the final scanned result. For more information about the rest of the Splitting Options, see the Document Splitting tab in the Administrator's Guide.
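The effect of automatically removing splitter pages can be sketched as below. This is a hypothetical Python illustration rather than Scan2x's behaviour: the barcode value SPLITTER_CODE, the Page type and the function split_on_splitter_pages are invented, and the sketch assumes that a splitter page which is not removed is kept as the first page of the document that follows it.

```python
from typing import List, NamedTuple, Optional

SPLITTER_CODE = "SPLITTER-PAGE"  # hypothetical value encoded on a splitter page


class Page(NamedTuple):
    name: str
    barcode: Optional[str] = None  # decoded barcode value, if the page has one


def split_on_splitter_pages(
    pages: List[Page],
    remove_splitter: bool = True,
) -> List[List[str]]:
    """Split a batch at splitter pages, optionally dropping those pages."""
    documents: List[List[str]] = []
    current: List[str] = []
    for page in pages:
        if page.barcode == SPLITTER_CODE:
            if current:
                documents.append(current)
            # Assumption: a splitter page that is kept opens the next document.
            current = [] if remove_splitter else [page.name]
        else:
            current.append(page.name)
    if current:
        documents.append(current)
    return documents
```

With remove_splitter=True the output contains only the original pages, which corresponds to removing the splitter page from the final scanned result.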

Finally, the administrator can choose to have split documents saved automatically, without the end user being presented with the scan preview screen. This gives the administrator complete control over documents that are scanned by other users.