What Is Document Indexing?
Document indexing is a method of making sets of scanned documents retrievable from within a document management application. A data set can contain any number of digitized pages, from a single page to an entire folder, stored together in one PDF file. The index is populated with basic information about each data set, in this case the PDF file: for example, an ID number, the document type, and the date. These fields allow the user to search for and retrieve the PDF.
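The relationship between an index entry and the PDF it describes can be sketched as a small record type plus a lookup function. This is a minimal illustration, not how any particular document management application stores its index; the field names `doc_id`, `doc_type`, `doc_date` and `pdf_path` and the sample file paths are hypothetical.

```python
from dataclasses import dataclass
from datetime import date


@dataclass(frozen=True)
class IndexRecord:
    """One index entry describing a single PDF (the data set)."""
    doc_id: str      # ID number (hypothetical field name)
    doc_type: str    # e.g. "invoice", "contract"
    doc_date: date
    pdf_path: str    # where the scanned PDF is stored (hypothetical)


def search(index, **criteria):
    """Return the PDF paths of every record whose fields equal all given criteria."""
    return [
        rec.pdf_path
        for rec in index
        if all(getattr(rec, field) == value for field, value in criteria.items())
    ]


index = [
    IndexRecord("1001", "invoice", date(2021, 3, 4), "scans/1001.pdf"),
    IndexRecord("1002", "contract", date(2021, 3, 4), "scans/1002.pdf"),
    IndexRecord("1003", "invoice", date(2022, 7, 19), "scans/1003.pdf"),
]

print(search(index, doc_type="invoice"))
# ['scans/1001.pdf', 'scans/1003.pdf']
print(search(index, doc_type="invoice", doc_date=date(2021, 3, 4)))
# ['scans/1001.pdf']
```

The user never searches the page images themselves; every query is answered from the index fields, which is why the quality of those few fields determines how retrievable the PDFs are.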
Optical Character Recognition (OCR)
Indexing large numbers of files by hand is labor intensive, and the cost rises as the number of fields per document increases. Indexing software is one way to bring these costs down. The software uses Optical Character Recognition (OCR) to populate each field with the appropriate value. When the information is located in the same part of the page on every document, the software can do this with Zonal OCR, which reads only predefined regions of the page. Another approach is Rubber Band OCR, in which an operator manually draws a box around the data to be read. The limitation is that low-quality or handwritten documents are not suitable for OCR software.
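The core idea of Zonal OCR can be sketched as follows, assuming an OCR engine has already returned each recognized word with its bounding box on the page. The `Word` type, the `INVOICE_ZONES` template and its pixel coordinates are hypothetical; real indexing products define zones through their own interfaces.

```python
from dataclasses import dataclass


@dataclass
class Word:
    """A recognized word and its bounding box, as an OCR engine might report it (assumed)."""
    text: str
    x: int   # left edge in pixels
    y: int   # top edge in pixels
    w: int   # width in pixels
    h: int   # height in pixels


# Hypothetical zone template: field name -> (left, top, right, bottom) in pixels.
# This only works if every page of this form type puts each field in the same place.
INVOICE_ZONES = {
    "doc_id": (1200, 100, 1600, 160),
    "doc_date": (1200, 170, 1600, 230),
}


def in_zone(word, zone):
    """True if the centre of the word's bounding box lies inside the zone."""
    left, top, right, bottom = zone
    cx = word.x + word.w / 2
    cy = word.y + word.h / 2
    return left <= cx <= right and top <= cy <= bottom


def extract_fields(words, zones):
    """For each zone, join the words inside it, ordered top-to-bottom then left-to-right."""
    fields = {}
    for name, zone in zones.items():
        hits = sorted((w for w in words if in_zone(w, zone)), key=lambda w: (w.y, w.x))
        fields[name] = " ".join(w.text for w in hits)
    return fields


words = [
    Word("Invoice", 100, 100, 200, 40),      # outside both zones, ignored
    Word("1001", 1300, 110, 80, 30),
    Word("2021-03-04", 1300, 180, 160, 30),
]
print(extract_fields(words, INVOICE_ZONES))
# {'doc_id': '1001', 'doc_date': '2021-03-04'}
```

Rubber Band OCR would reuse the same extraction step, except that the zone rectangle comes from the box the operator draws on each page rather than from a fixed template.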
Manual Indexing from Images
In cases where documents are not suitable for OCR or the error rate is too high, it is more cost-effective to type in each field manually, especially when the indexing is done offshore. Manual indexing is in fact more accurate than OCR when double keying is used: one operator types the data in and another checks it for accuracy.
Manual indexing from the image may be combined with the use of patch codes and barcodes during digitization. A patch code sheet may be inserted between documents, and each time the scanner reads a patch code it saves out a separate document. Barcodes are more useful when pre-existing data is available in digital form from either a spreadsheet or a database. This allows the scanner to populate multiple fields from that data, and only the remaining fields are keyed manually. This process may also be less expensive than purchasing indexing software.
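The three techniques in this section can be sketched as small functions. This is an illustration under stated assumptions, not a description of any scanner or capture product: the `PATCH` marker stands in for however the scanner reports a patch-code sheet, the barcode `lookup` table stands in for the spreadsheet or database, and double keying is modeled as the second operator re-keying the same fields blind so the system can compare the two entries. All names and sample values are hypothetical.

```python
PATCH = "PATCH"  # hypothetical marker for a page the scanner recognized as a patch-code sheet


def split_at_patch_codes(pages):
    """Split a scanned page stream into documents at each patch-code sheet.

    The patch sheet itself is dropped; an empty run between two sheets is skipped.
    """
    documents, current = [], []
    for page in pages:
        if page == PATCH:
            if current:
                documents.append(current)
            current = []
        else:
            current.append(page)
    if current:
        documents.append(current)
    return documents


def prefill_from_barcode(barcode, lookup):
    """Copy the fields already held for this barcode value in existing digital data."""
    return dict(lookup.get(barcode, {}))


def double_key(first, second):
    """Compare two operators' keying of the same document.

    Returns the values both operators agree on and the fields that need review.
    """
    accepted, mismatches = {}, []
    for field in sorted(first.keys() | second.keys()):
        a, b = first.get(field), second.get(field)
        if a == b:
            accepted[field] = a
        else:
            mismatches.append(field)
    return accepted, mismatches


pages = ["p1", "p2", PATCH, "p3", PATCH, PATCH, "p4", "p5"]
print(split_at_patch_codes(pages))
# [['p1', 'p2'], ['p3'], ['p4', 'p5']]

lookup = {"000123": {"doc_id": "1001", "customer": "Acme Ltd"}}
fields = prefill_from_barcode("000123", lookup)
fields["doc_date"] = "2021-03-04"  # remaining field keyed manually
print(fields)
# {'doc_id': '1001', 'customer': 'Acme Ltd', 'doc_date': '2021-03-04'}

print(double_key({"doc_id": "1001", "doc_date": "2021-03-04"},
                 {"doc_id": "1001", "doc_date": "2021-03-05"}))
# ({'doc_id': '1001'}, ['doc_date'])
```

Only the mismatched fields go back to an operator, which is how double keying raises accuracy without checking every field twice by eye.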
Logical Document Determination
Even inserting patch codes or barcodes takes a lot of time when the documents are not pre-sorted or the document sets are small. In these cases, Logical Document Determination may be more cost-effective. In this method, the data entry operator decides from the content where each document starts and stops. This method is used most often in legal coding.
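Once the operator has decided where documents begin, turning those decisions into page ranges is mechanical. A minimal sketch, assuming the operator records the 1-based number of each page that starts a new document in a single scanned batch; the function name is hypothetical.

```python
def ranges_from_start_pages(start_pages, page_count):
    """Turn operator-marked first pages into inclusive (start, end) page ranges.

    start_pages: 1-based page numbers the operator judged to begin a new document.
    page_count: total number of pages in the scanned batch.
    """
    starts = sorted(set(start_pages))
    if not starts or starts[0] != 1:
        raise ValueError("page 1 must begin a document")
    if starts[-1] > page_count:
        raise ValueError("start page beyond end of batch")
    # Each document ends on the page before the next one starts; the last runs to the end.
    ends = [s - 1 for s in starts[1:]] + [page_count]
    return list(zip(starts, ends))


print(ranges_from_start_pages([1, 4, 5, 9], 12))
# [(1, 3), (4, 4), (5, 8), (9, 12)]
```

No separator sheets or barcodes are needed; the operator's judgment replaces the physical markers, which is what makes the method attractive for unsorted or small document sets.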
Manual Retyping of Documents
Manual retyping is an outsourced service used for e-book conversion when OCR results are poor because of low-quality originals, fonts that the OCR software does not support, or dot matrix printer fonts. In these cases, correcting the OCR output by hand is more time-consuming than retyping the documents. Manual retyping is also suitable for spreadsheets whose data cannot be delimited into separate fields.