OCR




Optical Character Recognition (OCR) is the process of converting handwritten or printed text from case documents into searchable full-text records (TXT files). OCR typically takes place during import with Turbo Import or ED Loader, but it may also be needed later for documents whose text was not extracted properly or for documents imported from a scanner.

OCR accuracy can be reduced by skewed or uneven prints, damaged (folded, torn, etc.) pages, faded or distorted text, non-standard fonts, or illegible handwriting. Because OCR is very CPU-intensive, and the quality of the original documents can greatly affect its performance, it is generally recommended to clean up documents as much as possible before starting OCR.

Take into account the following considerations before performing OCR on documents:

OCR does not offer perfect recognition of text. Accuracy can be reduced by many conditions, including:

Text appears skewed or uneven on the page.

Pages are dusty, folded, or torn.  

Letters are faded, blurry, or otherwise distorted.

Non-standard typefaces are used in documents.
 

OCR is a CPU-intensive activity and can require significant time and computing resources for large jobs. OCR processing speed is typically between 1 and 2 pages per second. However, the actual processing rate depends on many factors, including:

The processing power and memory of the computer performing OCR.

The number of computers involved in OCR processing. You can configure OCR processing to be distributed among multiple computers. For more information, see Distributed Batch Processing.

The amount of text and other information on pages.

The quality of the original documents. Before you perform OCR on a batch of documents, consider applying filters to deskew, remove dust and lines, and clean up images in other ways.
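As a rough planning aid, the typical rate of 1 to 2 pages per second can be turned into a time estimate for a job. The sketch below is illustrative only and is not part of CloudNine LAW: the function name `estimate_ocr_hours` is hypothetical, and it assumes that each computer sustains the same rate and that work divides evenly across computers (ideal linear scaling), which real distributed jobs will not achieve exactly.

```python
def estimate_ocr_hours(pages: int, pages_per_second: float, computers: int = 1) -> float:
    """Return a rough OCR duration in hours.

    Hypothetical helper: assumes every computer sustains `pages_per_second`
    and that pages are split evenly across `computers` (ideal scaling).
    """
    if pages < 0 or pages_per_second <= 0 or computers < 1:
        raise ValueError("pages must be >= 0, rate > 0, computers >= 1")
    seconds = pages / (pages_per_second * computers)
    return seconds / 3600


# A 100,000-page job on one computer at the typical 1 to 2 pages per second.
slow = estimate_ocr_hours(100_000, 1.0)
fast = estimate_ocr_hours(100_000, 2.0)
print(f"{fast:.1f} to {slow:.1f} hours")  # prints "13.9 to 27.8 hours"
```

Treat the result as a lower bound on elapsed time: poor-quality images, dense pages, and coordination overhead between computers all push the real figure higher.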

 

Determining if a Document Contains Text

The Text display in the main interface indicates whether text is present for the current document. This text may be any of the following:

OCR text.

Text extracted during import with ED Loader or Turbo Import.

Text imported with load file records or raw images.

Text pulled during a TIFF conversion.

Text from the text file linked to the current document.

 

To choose the type of text to display

In the Text display, right-click and then select a source type from the context menu.

When an image is selected, the Text button shows a small "page" icon if text exists for that image, and the text itself is displayed below the button.

 

Note

WordPerfect and HTML formats cannot be displayed in this viewer. The Text display is for viewing only; to edit the OCR text, double-click it to open it in the editor registered for that file format. If there are multiple OCR formats for the same image, right-click the OCR text and select the format to open from the pop-up menu.