OCR Options

LAW provides several different options for performing OCR on case documents. To view/change these options:

1. From the Main User Interface, select Tools > Options... from the menu to open the LAW Options window.

2. Navigate to the OCR tab. Change the OCR options as desired, and then click OK at the bottom right when finished.

NOTICE:  OCR options change based on the OCR Engine selected. Italicized options are only for ExperVision OpenRTK, while underlined options are only for ABBYY FineReader.

OCR Engine - The software used for performing OCR within LAW. Two different engines are supported:

o ExperVision OpenRTK - Included with the LAW installer, but supports fewer Languages and is limited to one instance per machine.

o ABBYY FineReader - Installed separately from LAW (requiring a license), but supports more Languages and can run one instance per processor core.
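
The instance limits above determine how many OCR jobs one machine can run in parallel. As a rough illustration only, the following Python sketch estimates that limit. The function name max_ocr_instances is hypothetical and is not part of LAW, and the sketch assumes that the logical CPU count reported by the operating system is an acceptable stand-in for "processor cores".

```python
import os

# Hypothetical helper, not part of LAW: estimates how many OCR instances
# one machine can run, based on the engine limits described above.
def max_ocr_instances(engine, cores=None):
    if engine == "ExperVision OpenRTK":
        # Limited to one instance per machine.
        return 1
    if engine == "ABBYY FineReader":
        # Assumption: logical CPUs approximate "processor cores".
        if cores is None:
            cores = os.cpu_count() or 1
        return max(1, cores)
    raise ValueError(f"Unknown OCR engine: {engine}")


if __name__ == "__main__":
    for name in ("ExperVision OpenRTK", "ABBYY FineReader"):
        print(name, "->", max_ocr_instances(name))
```

Passing cores explicitly (for example, max_ocr_instances("ABBYY FineReader", cores=8)) makes it possible to plan for a machine other than the one running the script.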

Page Layout - Improves OCR accuracy by specifying the column layout of document pages:

oAuto Detect - Determines the layout automatically.

oSingle Column - Specifies only one column of text. Required before running Email Thread Analysis.

Quality - Specifies the printing technology used to create document pages, as well as the quality of scanned pages:

o Normal - Select for pages printed via inkjet printers, laser printers, or offset lithography.

o Normal (Degraded) - Same as Normal, but the printed pages are of poor quality due to defective printing, photocopying, heavy use, or aging.

o Type Writer - Select for pages produced on a typewriter.

o Dot Matrix - Select for pages printed via dot matrix printers, such as receipts from cash registers or ATMs.

o Dot Matrix (Degraded) - Same as Dot Matrix, but the printed pages are of poor quality due to defective printing, photocopying, heavy use, or aging.

o OCR A - Select for pages with text printed in OCR-A, a monospaced font designed specifically for OCR and used, for example, on credit/debit cards.

o OCR B - Select for pages with text printed in OCR-B, another font designed specifically for OCR.

o MICR - Select for pages printed via Magnetic Ink Character Recognition (MICR) technology, such as the routing numbers on checks.

Language - Specifies the language dictionary used by the OCR Engine. Selecting the wrong language may cause characters to be misrecognized. When using ABBYY FineReader, English is automatically used as a second language whenever a non-English language is selected. Languages that share common characters (such as English, Spanish, French, German, Dutch, and Portuguese) are also interpreted correctly when a similar language is selected. The available languages are as follows:

o ExperVision OpenRTK - Danish, Dutch, English, French, German, Italian, Norwegian, Portuguese, Spanish, Swedish.

o ABBYY FineReader - Over 200 officially recognized languages. See the full listing.
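
The language behavior described above can be summarized in a short Python sketch. This is an illustration only, not LAW's implementation: the function recognition_languages is hypothetical, the ExperVision list is copied from this page, and ABBYY selections are not validated because the full ABBYY listing is not reproduced here.

```python
# Languages listed on this page for ExperVision OpenRTK.
EXPERVISION_LANGUAGES = {
    "Danish", "Dutch", "English", "French", "German",
    "Italian", "Norwegian", "Portuguese", "Spanish", "Swedish",
}


def recognition_languages(engine, language):
    """Hypothetical helper: languages the engine would use for a selection."""
    if engine == "ExperVision OpenRTK":
        if language not in EXPERVISION_LANGUAGES:
            raise ValueError(f"{language} is not listed for ExperVision OpenRTK")
        return [language]
    if engine == "ABBYY FineReader":
        # English is automatically used as a second language
        # when a non-English language is selected.
        if language == "English":
            return ["English"]
        return [language, "English"]
    raise ValueError(f"Unknown OCR engine: {engine}")


if __name__ == "__main__":
    print(recognition_languages("ABBYY FineReader", "German"))
    print(recognition_languages("ExperVision OpenRTK", "Danish"))
```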

Output Format - Specifies the file format used for output by the selected OCR Engine. Using one of the "Text" settings is recommended if the OCR results are to be exported for search functionality. The available formats are as follows:

o ExperVision OpenRTK - Adobe PDF (Normal), Adobe PDF (w/Hidden Text), HTML, Smart Text Document, Standard Text Document, Word for Windows, WordPerfect.

o ABBYY FineReader - Adobe PDF (Normal), Adobe PDF (w/Hidden Text), Adobe PDF/A (Normal), Adobe PDF/A (w/Hidden Text), HTML, Standard Text Document, Word for Windows, Word for Windows (2007).
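
Because the two engines do not offer the same formats, it can help to know which formats remain valid if the engine is switched later. The Python sketch below is an illustration under that assumption. The format names are copied from the lists above, while the dictionary and function names are hypothetical and are not part of LAW.

```python
# Output formats per engine, as listed on this page.
OUTPUT_FORMATS = {
    "ExperVision OpenRTK": {
        "Adobe PDF (Normal)", "Adobe PDF (w/Hidden Text)", "HTML",
        "Smart Text Document", "Standard Text Document",
        "Word for Windows", "WordPerfect",
    },
    "ABBYY FineReader": {
        "Adobe PDF (Normal)", "Adobe PDF (w/Hidden Text)",
        "Adobe PDF/A (Normal)", "Adobe PDF/A (w/Hidden Text)", "HTML",
        "Standard Text Document", "Word for Windows",
        "Word for Windows (2007)",
    },
}


def is_format_available(engine, output_format):
    """Hypothetical check: does the engine list this output format?"""
    return output_format in OUTPUT_FORMATS[engine]


def formats_common_to_both():
    """Formats that stay valid regardless of which engine is selected."""
    return OUTPUT_FORMATS["ExperVision OpenRTK"] & OUTPUT_FORMATS["ABBYY FineReader"]


if __name__ == "__main__":
    for name in sorted(formats_common_to_both()):
        print(name)
```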

Page Markers - Specifies the stamp used to mark document pages when OCR is performed. You can stamp pages with a page number and/or Bates value at either the top or bottom of each page. The Bates value being used is fully customizable within the LAW50.ini file (learn more).

Auto-Rotate - Specifies when the OCR Engine should rotate document page images for the OCR output:

o Always OFF - Images are never rotated.

o Always ON - Images are automatically rotated when needed.

o Binary Images Only - Only monochrome (black and white) page images will be automatically rotated when needed. Prevents improper rotation of color or grayscale page images that contain little/no text.
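
The three Auto-Rotate settings amount to a simple decision rule, sketched below in Python. The function may_auto_rotate is hypothetical and only models whether rotation is permitted for a page image. How the OCR Engine decides that a rotation is actually "needed" is not described on this page and is not modeled here.

```python
def may_auto_rotate(mode, is_binary_image):
    """Hypothetical rule: is the engine allowed to rotate this page image?"""
    if mode == "Always OFF":
        return False
    if mode == "Always ON":
        return True
    if mode == "Binary Images Only":
        # Color and grayscale images are left alone to avoid
        # improper rotation when they contain little or no text.
        return is_binary_image
    raise ValueError(f"Unknown Auto-Rotate setting: {mode}")


if __name__ == "__main__":
    for mode in ("Always OFF", "Always ON", "Binary Images Only"):
        print(mode, may_auto_rotate(mode, True), may_auto_rotate(mode, False))
```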

Overwrite existing files - Replaces any existing OCR text in the Case Database for all documents being processed. Otherwise, the OCR Engine will skip over these documents to reduce processing time.

Retain page layout - Preserves the existing layout for all document pages. Only available with non-text output formats.

Create PDF thumbnails - Makes document page thumbnails automatically viewable in Adobe Acrobat. Otherwise, the thumbnails are hidden under the "Pages" tab. Only available with Adobe PDF output formats.

Reset text index status - Prevents LAW from re-flagging documents for full-text indexing after OCR has been performed. OCR text for these documents will not be searchable when performing full-text searches.

Auto deskew - Document page images are automatically deskewed before OCR occurs. This can lead to more accurate results for distorted images, but may lead to unexpected results with images containing graphics or angled vertical lines.

Retain pictures - Pictures contained within document pages are preserved in the OCR results. Only available for non-text output formats.

Create web optimized PDF - Enables the "Fast Web View" setting for PDF files, which allows individual page downloads from web servers instead of full-PDF downloading. Only available with Adobe PDF output formats.
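
Several of the options above are available only with certain output formats. The following Python sketch collects those rules in one place. It is an illustration only: the function names are hypothetical, and it assumes that "text" formats are those whose names end in "Text Document" and that the Adobe PDF/A formats count as Adobe PDF output formats.

```python
def is_text_format(output_format):
    # Assumption: "Smart Text Document" and "Standard Text Document"
    # are the text output formats.
    return output_format.endswith("Text Document")


def is_pdf_format(output_format):
    # Assumption: PDF/A variants are treated as Adobe PDF formats.
    return output_format.startswith("Adobe PDF")


def available_options(output_format):
    """Hypothetical helper: format-dependent options that can be enabled."""
    options = []
    if not is_text_format(output_format):
        options += ["Retain page layout", "Retain pictures"]
    if is_pdf_format(output_format):
        options += ["Create PDF thumbnails", "Create web optimized PDF"]
    return options


if __name__ == "__main__":
    for fmt in ("Standard Text Document", "HTML", "Adobe PDF (Normal)"):
        print(fmt, "->", available_options(fmt))
```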