eQuorum SimpleIndex scan data capture software

Automating Document Capture

The two main methods for automating indexing are barcode recognition and Optical Character Recognition (OCR). Barcode recognition is faster and more accurate, but it only works if your documents carry a barcode, either on the document itself or on a cover page. OCR can read printed data directly from the page, which means most documents can be processed as-is. However, several conditions affect the practicality of OCR, and they are discussed in this section.

Using Barcode Recognition

Barcode recognition is the most efficient way to capture index data printed on documents. Some documents already have key information in barcode format. If your project is to scan new documents on an ongoing basis, you may be able to redesign the forms to include barcodes. Having a barcode with index data on the document is the best-case scenario, since all the index data is on the document at the time it is created, in a format that can be read with near 100% accuracy.

If you can't print barcodes on the document itself, an alternative is to have the person who creates the document print a barcode cover page and place it on the file before it is scanned. Barcode recognition can also be useful when you have documents with a variable number of pages that will all receive the same index values. If it is not possible to generate an indexed coversheet for these at the time they are created, a generic barcode coversheet can be used to separate the scanned images into multi-page files, one for each document. A second process can then be used to index these images one file at a time instead of one page at a time, greatly increasing throughput.
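The separation step described above can be sketched in a few lines of Python. This is only an illustration, not SimpleIndex's own logic: it assumes a barcode reader has already produced one value per scanned page (None when no barcode was found), and the `SEPARATOR` value and the `split_into_documents` function are hypothetical names chosen for the example.

```python
from typing import List, Optional

# Hypothetical value encoded on the generic barcode coversheet.
SEPARATOR = "SEPARATOR"


def split_into_documents(page_barcodes: List[Optional[str]]) -> List[List[int]]:
    """Group 1-based page numbers into documents, one per separator sheet."""
    documents: List[List[int]] = []
    current: List[int] = []
    for page_number, barcode in enumerate(page_barcodes, start=1):
        if barcode == SEPARATOR:
            # A coversheet closes the previous document and is not kept itself.
            if current:
                documents.append(current)
            current = []
        else:
            current.append(page_number)
    # Flush the pages that follow the last coversheet.
    if current:
        documents.append(current)
    return documents


# Nine scanned pages; pages 1, 4 and 6 are coversheets.
pages = ["SEPARATOR", None, None, "SEPARATOR", None, "SEPARATOR", None, None, None]
print(split_into_documents(pages))  # [[2, 3], [5], [7, 8, 9]]
```

Each inner list is one multi-page file, so the second indexing pass handles three files here instead of six individual pages.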

Using OCR

Zone OCR solutions traditionally require you to specify a region on the page where index information will be found. This region is recognized and the result is inserted into an index field. The problem with traditional zone OCR is that if the region is moved slightly due to variations in scanning, the result could contain extra neighboring characters or cut off desired characters. This limits the usefulness of traditional zone OCR to documents where the index value is in the exact same place every time and has plenty of white space around it.

SimpleIndex's OCR contains many advanced features to overcome the inherent limitations of zone OCR. It does this by providing template and dictionary matching for OCR fields. These features search the OCR results for a certain pattern or list of possible values and return only the matching data. This allows you to draw your OCR zones much larger than normal, ensuring that no matter how much the data shifts around, it will always be contained within that region.
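As a rough illustration of the template matching idea, the sketch below uses a Python regular expression to pull one value out of the text recognized in an oversized zone. It does not show SimpleIndex's own template syntax; the invoice number format (two uppercase letters, a hyphen, six digits), the sample text, and the `match_template` function are all assumptions made for the example.

```python
import re
from typing import Optional

# Hypothetical template: two uppercase letters, a hyphen, then six digits.
INVOICE_TEMPLATE = re.compile(r"\b[A-Z]{2}-\d{6}\b")


def match_template(ocr_text: str, template: re.Pattern = INVOICE_TEMPLATE) -> Optional[str]:
    """Return the first substring that fits the template, or None if none does."""
    match = template.search(ocr_text)
    return match.group(0) if match else None


# Text recognized from a large zone, including neighboring data.
zone_text = "Ship To: ACME Corp   Invoice No. IN-004217   Date 03/01/2012"
print(match_template(zone_text))  # IN-004217
```

Because only the matching substring is returned, the neighboring characters picked up by the larger zone never reach the index field.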

It is even possible to draw your zone around the entire page and find key information that is not printed in any fixed location. For example, a doctor's office may receive lab reports from many different labs. Each report is formatted differently, but each contains the patient's name somewhere on it. Using the dictionary matching feature with a patient name list, SimpleIndex can identify the correct patient for each lab report automatically.
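The lab report example can be sketched as a lookup of every known name in the full-page OCR text. This is a simplified stand-in for dictionary matching, not the product's implementation: the patient list, the sample report, and the `normalize` and `match_dictionary` helpers are hypothetical, and the comparison is exact after normalization, so it would not tolerate OCR misreads.

```python
import re
from typing import Iterable, Optional


def normalize(text: str) -> str:
    """Uppercase the text and reduce it to letter/digit runs joined by single spaces."""
    return " ".join(re.findall(r"[A-Z0-9]+", text.upper()))


def match_dictionary(ocr_text: str, dictionary: Iterable[str]) -> Optional[str]:
    """Return the one dictionary entry found on the page, or None if zero or several match."""
    page = f" {normalize(ocr_text)} "
    # Padding with spaces keeps an entry from matching inside a longer word.
    hits = [entry for entry in dictionary if f" {normalize(entry)} " in page]
    return hits[0] if len(hits) == 1 else None


patients = ["Maria Lopez", "John Smith", "Ann Lee"]
report = "LabCo Results\nPatient: JOHN  SMITH   DOB 01/02/1970\nGlucose 92 mg/dL"
print(match_dictionary(report, patients))  # John Smith
```

Returning None when several names match is a deliberate choice in this sketch: an ambiguous page is better sent to manual indexing than filed under the wrong patient.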

When implementing OCR for document automation, carefully consider the data you are trying to recognize. Is the text legible? Does it appear in a fixed location? Does it conform to a unique pattern that won't be found anywhere else on the page? Is there a list available with all the possible values for this field? Answer these questions and you will know which OCR approach is best for your application.





© Copyright 1999-2012 eQuorum Corporation, Atlanta, Georgia. All rights reserved.