You can specify various optical character recognition (OCR) options by doing one of the following:
•From the main window, on the Tools menu, click Options and then click the OCR tab.
Or
•From the Batch Processing utility, on the Options menu, click OCR Settings.
These options improve OCR accuracy by specifying the layout of the pages.
•Auto Detect - Automatically determines the layout of the page. This is the default option.
•Single Column - Specifies that one column of text exists on a page.
|
Specifies the type of printing technology used to create the original documents and the print quality of the scanned pages.
•Normal - Use this for pages printed with inkjet printers, laser printers, or offset lithography. This is the default.
•Normal (Degraded) - The same as Normal, except that the print quality is known to contain some distortion or blemishes due to poorly printed originals, photocopying, heavy use, or aging.
•Dot Matrix - Use this for pages printed using dot matrix printers, which include many early printer models as well as many types of printed receipts, such as those from cash registers and ATMs.
•Dot Matrix (Degraded) - The same as Dot Matrix, except that the print quality is known to contain some distortion or blemishes due to poorly printed originals, photocopying, heavy use, or aging.
Note also the following when selecting a quality option:
•When a setting other than Normal is selected, OCR engine performance may be reduced.
•The ABBYY FineReader engine supports a quality setting called Magnetic Ink Character Recognition (MICR). This is the technology used for the routing numbers on personal checks and for other documents designed to be machine readable.
•The Auto Detect setting with Xerox TextBridge OCR accommodates varying quality levels among originals.
|
The Language setting specifies the language dictionary the engine should use during the OCR process. If the correct language is not selected before the OCR process starts, characters may not be recognized properly.
If ABBYY FineReader is the selected engine, English is automatically used as a second language whenever a non-English language is selected. For example, if Greek is selected and both Greek and English exist in the source image, ABBYY FineReader differentiates the languages and performs recognition for both. However, if a document contains Greek and English is selected as the language, Greek characters will not be interpreted or rendered correctly in the text. This only pertains to documents containing Unicode characters, such as Chinese, Japanese, Korean, Greek, or Russian.
Languages whose alphabets share many characters (for example, English, Spanish, French, German, Dutch, and Portuguese) are interpreted correctly when they appear in the same document, provided any one of these languages is selected.
For the unsupported Xerox TextBridge engine, the System Default setting uses whichever language is specified by Windows as the default.
|
This feature is used to select the output format produced by the selected OCR engine. The available output formats and licensing requirements are in the following table:
The Smart Text and Standard Text formats are essentially the same; both produce standard ANSI text output. For information on using the Adobe output options to create searchable PDF files, see Creating Searchable PDFs.
|
This option allows LAW to "stamp" the resulting OCR with a Bates number or page value using information retrieved directly from the LAW database. This feature is useful for providing 100% accurate Bates values in the OCR text to aid searching in certain applications.
Page Markers can be customized via the law50.ini file located in the C:\Program Files (x86)\Law50 directory. Adding a PageStampText= key under the [OCR] section customizes the text stamped by the Page Marker feature. Currently supported fields are:
•&[Page] - Current page
•&[Pages] - Page count
•&[Page ID] - Bates number
•&[BegDoc#] - Beginning document number
•<CR> - Carriage return (new line)
Example
The following page marker:
PageStampText=###&[Page]|||Page &[Page ID]^^^
Results in a stamp of:
###1|||Page ABC0001^^^
This value increments for each OCR page stamped.
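The field substitution described above can be illustrated with a short Python sketch. This is not LAW's implementation: the function name expand_page_marker and its parameters are hypothetical, and the sketch assumes each supported field is simply replaced by its value in a single pass.

```python
import re

# Hypothetical illustration only: LAW performs this substitution internally.
# The pattern covers the fields listed as supported: &[Page], &[Pages],
# &[Page ID], &[BegDoc#], and <CR>.
_FIELD = re.compile(r"&\[(Page ID|Pages|Page|BegDoc#)\]|<CR>")


def expand_page_marker(template, page, pages, page_id, beg_doc):
    """Return the stamp text produced by a PageStampText template for one page."""
    values = {
        "Page": str(page),      # current page
        "Pages": str(pages),    # page count
        "Page ID": page_id,     # Bates number
        "BegDoc#": beg_doc,     # beginning document number
    }

    def substitute(match):
        field = match.group(1)
        # group(1) is None when the match was <CR>, which becomes a new line.
        return "\n" if field is None else values[field]

    return _FIELD.sub(substitute, template)


stamp = expand_page_marker(
    "###&[Page]|||Page &[Page ID]^^^",
    page=1, pages=10, page_id="ABC0001", beg_doc="ABC0001",
)
print(stamp)  # ###1|||Page ABC0001^^^
```

The page count of 10 and the beginning document number are made-up values for the sketch; only the template and the resulting stamp come from the example above.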
|
This option specifies whether the OCR engine should automatically rotate images for the OCR output. The three options are:
•Always ON
•Always OFF
•Binary Images Only - Auto-rotates monochrome (black and white) images. This option can help to prevent color and grayscale images that have little or no text from being improperly rotated. This setting is available with the ExperVision engine.
|
Use this setting to prevent or allow the replacing of existing OCR text. This feature is useful if some documents already contain usable OCR text files and only the documents without an existing text file should be included for processing. If an existing text file is detected for the current document, the OCR engine skips the document and moves on to the next, thus saving processing time. It may also be necessary at times to replace all existing text files; checking this option replaces the OCR for each document.
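The skip-or-replace behavior can be summarized as a small selection rule. The Python sketch below is only an illustration under assumed names (documents_to_ocr, has_text, overwrite_existing); it does not reflect how LAW stores or detects text files.

```python
# Hypothetical sketch of the selection rule described above.
def documents_to_ocr(documents, has_text, overwrite_existing):
    """Return the documents that would be sent to the OCR engine.

    documents: list of document identifiers, in processing order
    has_text: set of identifiers that already have an OCR text file
    overwrite_existing: True when the option is checked
    """
    if overwrite_existing:
        # Option checked: every document is processed and its text replaced.
        return list(documents)
    # Option cleared: documents with existing text are skipped.
    return [doc for doc in documents if doc not in has_text]


batch = ["DOC001", "DOC002", "DOC003"]
already_done = {"DOC002"}
print(documents_to_ocr(batch, already_done, overwrite_existing=False))
print(documents_to_ocr(batch, already_done, overwrite_existing=True))
```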
|
This setting determines whether the layout of the page (columns, etc.) will be preserved in the OCR results (non-text output formats only).
|
The OCR engines automatically create thumbnails during the Searchable PDF creation process. Use this setting to set the "visible" property of the thumbnails when opening the PDF file in Adobe Acrobat. If this setting is checked, the thumbnails will be viewable automatically in Adobe Acrobat; otherwise, the thumbnails will be hidden under the Pages tab in Adobe Acrobat.
|
Clearing the Reset text index status check box will prevent LAW from re-flagging the document for indexing after the OCR process is performed. This means the OCR text for affected records will not be searchable in CloudNine™ LAW. See the Full Text Indexing topic for more information.
|
Enable this option to force the OCR engine to deskew the image before OCRing the document. This can often lead to more accurate OCR (depending on the type of document). However, if the document contains graphics or angled vertical lines, the deskew feature may align to these graphics and cause unexpected results. Disabling this option will OCR the document with its current orientation. This feature is only available if the ExperVision OCR engine is selected.
|
This setting determines whether pictures in the original will be preserved in the OCR results. This setting does not affect the results if the output format is set to text. Pictures are not retained in text files.
|
This setting only applies to the ExperVision OpenRTK OCR engine and the Adobe PDF (Normal) and Adobe PDF (w/ Hidden Text) output format options. When the Create web optimized PDF check box is selected, the output PDF files have the Fast Web View setting enabled. Fast Web View allows a web server to deliver the PDF one page at a time instead of requiring the entire file to be downloaded first.
|