Take into account the following considerations before performing OCR on documents:
•OCR does not offer perfect recognition of text. Accuracy can be reduced by many conditions, including:
•Text appears skewed or uneven on the page.
•Pages are dusty, folded, or torn.
•Letters are faded, blurry, or otherwise distorted.
•Non-standard typefaces are used in documents.
•OCR is a CPU-intensive activity, and large jobs can require significant time and computing resources. OCR processing speed is typically between 1 and 2 pages per second. However, the actual processing rate depends on many factors, including:
•The processing power and memory of the computer performing OCR.
•The number of computers involved in OCR processing. You can configure OCR processing to be distributed among multiple computers. For more information, see Distributed Batch Processing.
•The amount of text and other information on pages.
•The quality of the original documents. Before you perform OCR on a batch of documents, consider applying filters to deskew, remove dust and lines, and clean up images in other ways.
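For planning purposes, the typical rate of 1 to 2 pages per second can be turned into a rough duration estimate. The following Python sketch is illustrative only and is not part of CloudNine™ LAW: the function name and parameters are hypothetical, and it assumes that throughput scales linearly with the number of computers, which real jobs may not achieve.

```python
def estimate_ocr_seconds(pages, computers=1, low_rate=1.0, high_rate=2.0):
    """Return (best_case, worst_case) OCR duration in seconds.

    low_rate and high_rate are pages per second for one computer.
    Assumption: each additional computer adds the same throughput.
    """
    if pages < 0:
        raise ValueError("pages must not be negative")
    if computers < 1:
        raise ValueError("computers must be at least 1")
    if not 0 < low_rate <= high_rate:
        raise ValueError("rates must satisfy 0 < low_rate <= high_rate")
    best_case = pages / (high_rate * computers)
    worst_case = pages / (low_rate * computers)
    return best_case, worst_case


if __name__ == "__main__":
    best, worst = estimate_ocr_seconds(7200)
    print(f"One computer: {best / 3600:.1f} to {worst / 3600:.1f} hours")
```

Under these assumptions, a 7,200-page batch on one computer takes roughly one to two hours. Page content and image quality can move the actual figure in either direction.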
With the assistance of CloudNine Technical Support, you can use a run command to make searchable text obtained by the TIFF and/or OCR processes within LAW available in CloudNine™ Explore. Please contact CloudNine Technical Support for instructions and assistance with this process.
The Text display in the main interface will indicate the presence of text for the current document, which may be OCR text, text extracted during an import using ED Loader or Turbo Import, text imported with load file records or raw images, text pulled during a TIFF conversion, or text from the text file linked to the current document.
To choose the type of text to display
•In the Text display, right-click and then select a source type from the context menu.
When an image is selected, the Text button has a small "page" icon on it if there is text for that image, with the actual text displayed below the button.
CloudNine™ LAW displays a green flag, a yellow flag, a red flag, or no icon at all for the selected record, depending on the OCR status for that image. You can flag all pages, no pages, or only selected pages for the OCR process. (See the Flagging Documents/Pages for OCR topic for more details.) The OCR indicator gives users a quick visual representation of the OCR status for the current document. One of the following icons will be displayed on the Text button when a document is selected:
•Page icon: OCR has been completed for the document (or text has been extracted using ED Loader or Turbo Import).
•Green flag: the document has one or more pages flagged for OCR.
•Yellow flag: the OCR process was cancelled while processing the document.
•Red flag: an error occurred during OCR.
If no icon is displayed, no pages for the current document have been flagged for OCR.
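The indicator rules above amount to a simple lookup from OCR status to icon. The Python sketch below restates them for readers who track OCR status outside the interface. The status labels and the function name are hypothetical and do not correspond to values stored by CloudNine™ LAW.

```python
from typing import Optional

# Hypothetical status labels mapped to the icons described above.
ICON_BY_STATUS = {
    "completed": "page icon",    # OCR finished, or text was extracted on import
    "flagged": "green flag",     # one or more pages flagged for OCR
    "cancelled": "yellow flag",  # OCR was cancelled while processing
    "error": "red flag",         # an error occurred during OCR
}


def ocr_indicator(status: Optional[str]) -> Optional[str]:
    """Return the icon name for a status, or None when no icon is shown."""
    if status is None:
        # No pages flagged for OCR: no icon is displayed.
        return None
    if status not in ICON_BY_STATUS:
        raise ValueError(f"unknown OCR status: {status!r}")
    return ICON_BY_STATUS[status]
```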
When a redaction has been applied and saved to a document in CloudNine™ LAW, the HasRedactions field is set to Y for the document. Once the annotation is merged, the IsRedacted field is set to Y for each page in the document that contains a redaction. When you perform OCR on a document containing redacted text, the OCR process automatically omits the redacted text from the extracted text. Redactions do not have to be manually merged with a document in order for the OCR process to omit redacted text.
The document-level HasRedaction field has three values:
•Y = Has a redaction (merged or unmerged)
•N = Does not have a redaction
•“ ” (empty or blank) = Unknown (Documents default to the blank value when added or when opening an existing case.)
The page-level field IsRedacted is only updated when a redaction is merged.
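Because a blank document-level value means "unknown" rather than "no redaction", anything that reads this field from exported data should treat it as three-valued. The following Python sketch shows one way to do that. It assumes the value arrives as a plain string (or is missing) in an export; the enum and function names are hypothetical and are not part of CloudNine™ LAW.

```python
from enum import Enum
from typing import Optional


class RedactionState(Enum):
    HAS_REDACTION = "Y"  # merged or unmerged redaction present
    NO_REDACTION = "N"   # no redaction
    UNKNOWN = ""         # blank: status has not been determined


def parse_has_redaction(raw: Optional[str]) -> RedactionState:
    """Interpret a document-level redaction value from an export."""
    # Treat a missing value the same as an empty or whitespace-only one.
    value = (raw or "").strip().upper()
    if value == "Y":
        return RedactionState.HAS_REDACTION
    if value == "N":
        return RedactionState.NO_REDACTION
    if value == "":
        return RedactionState.UNKNOWN
    raise ValueError(f"unexpected redaction value: {raw!r}")
```

Keeping UNKNOWN separate from NO_REDACTION avoids treating a document as safe to release when its redaction status has simply not been determined.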
Redacted text can also be omitted from documents during export without having to manually merge redactions with a document, by selecting the Enforce protection of redacted documents check box on the Advanced tab in the Export Utility dialog box. For more information, see Advanced Tab. For more information about the HasRedaction and IsRedacted fields, see Field Descriptions - LAW. For more information about redactions and other annotations, see Annotating Documents and Pages.