OCR Processing

Optical Character Recognition (OCR) processing can be performed on a single page or an entire document displayed in Concordance Native Viewer.  Initiating the OCR Single Image command scans the page or document and then writes the data to a specified field in the associated Concordance database for indexing and searching.

Note

This feature requires activation of the OCR on the Fly license for Concordance Native Viewer.

Warning

While the OCR Single Image process is running, do not navigate between records in Concordance or close Concordance.  Either action causes the process to fail to write the data to the selected record.  The corresponding Concordance database record is locked until the OCR process is complete.

Warning

When running the OCR process on a document that contains a markup that is not burned in, the text that lies under the markup will be scanned and imported to the specified database field.  If you do not want the text under the markup scanned, you must first produce the document using the Production command in Concordance, load the produced document into the database, and then scan the document.

Digital OCR processing does not offer perfect text recognition.  Accuracy may be reduced by many conditions:

Text appears skewed or uneven on the page

Pages are creased or torn

Letters are faded, blurry or otherwise distorted

Non-standard typefaces used in documents

Font style, size, and/or color

Image files with low resolution (such as .gif formatted images), background noise, inverted colors, black margins, or wrong page orientation

Multiple languages selected for processing

The OCR scanning process is CPU intensive and can require significant time and computing resources to carry out large jobs.  Typical throughput is between 1 and 2 pages per second.  This rate is greatly affected by the amount of text and other information on the pages, and by the quality of the document.
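
As a rough planning aid, the figure of 1 to 2 pages per second can be turned into a time estimate for a job.  The sketch below is illustrative only and is not part of Concordance.  The function name estimate_ocr_minutes and its parameters are hypothetical, and real jobs may fall outside this range depending on page content and image quality.

```python
def estimate_ocr_minutes(page_count, slow_rate=1.0, fast_rate=2.0):
    """Return (best_case, worst_case) minutes for a job of page_count pages.

    Rates are in pages per second. The defaults are the typical figures
    quoted in this topic and are only an assumption for any given job.
    """
    if page_count < 0:
        raise ValueError("page_count must not be negative")
    if slow_rate <= 0 or fast_rate < slow_rate:
        raise ValueError("rates must satisfy 0 < slow_rate <= fast_rate")
    best_case = page_count / fast_rate / 60
    worst_case = page_count / slow_rate / 60
    return best_case, worst_case


best, worst = estimate_ocr_minutes(6000)
print(f"6,000 pages: about {best:.0f} to {worst:.0f} minutes")
```

Because the record and the viewer stay locked for the duration of the process, an estimate like this can help decide when to start a large document.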

Database Considerations

Concordance Native Viewer is locked once the OCR process is initiated and remains locked until it is completed.  The corresponding Concordance database record is locked for all users.  The database record can be viewed but not edited.  Indexing cannot be performed on the database during the OCR process.

Concordance database OCR fields:

Must already exist in the associated Concordance database

May already contain data - you can append additional data to the next available OCR field or overlay existing content with new data

Must share the same alphabetic prefix and have numeric suffixes that start at 1 and are the same length (for example, OCR1, OCR2, and so on)

Must be paragraph type fields to ensure the data is indexed and searchable

Must be large enough and/or you must have enough fields defined to support OCR processing

Must be accessible with full read/write access by the user
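
The naming rule in the list above can be checked before starting an OCR job.  The following is a minimal sketch, not a Concordance feature: the function check_ocr_fields is hypothetical, the field names are assumed to be typed in or exported by hand, and "the same length" is interpreted here as every numeric suffix having the same number of digits.

```python
import re

# Letters followed by digits, for example "OCR1".
FIELD_PATTERN = re.compile(r"([A-Za-z]+)(\d+)")


def check_ocr_fields(names):
    """Return a list of naming problems (an empty list means none were found)."""
    problems = []
    parsed = []
    for name in names:
        match = FIELD_PATTERN.fullmatch(name)
        if match is None:
            problems.append(f"{name}: expected letters followed by digits")
        else:
            parsed.append((match.group(1), match.group(2)))
    if not parsed:
        return problems

    prefixes = {prefix for prefix, _ in parsed}
    if len(prefixes) > 1:
        problems.append(f"mixed prefixes: {sorted(prefixes)}")

    # Assumption: "same length" refers to the digit count of each suffix.
    if len({len(suffix) for _, suffix in parsed}) > 1:
        problems.append("numeric suffixes differ in length")

    if min(int(suffix) for _, suffix in parsed) != 1:
        problems.append("numbering does not start at 1")
    return problems


print(check_ocr_fields(["OCR1", "OCR2", "OCR3"]))
print(check_ocr_fields(["OCR2", "OCR3"]))
```

The check covers names only.  Field type, field size, and read/write access still need to be confirmed in the database itself.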

Language

OCR Single Image supports these language types:

Latin (for English, French, Spanish, German, etc.)

Cyrillic (for Russian, Bulgarian, etc.)

Greek

Chinese (Simplified and Traditional)

Japanese

Korean

To OCR the Text of a Document or Image

1. In Concordance Native Viewer, navigate to the document you want to scan.

2. From the File menu, click OCR Single Image.  The OCR Options dialog displays.

3. Select Append to write the OCR data at the end of any existing data in the specified field, or Overlay to replace the existing data in the specified field (the existing data in the field will be lost).

4. Select Current Page to OCR the page currently displayed in the viewer, or Entire Document to OCR all pages of the current document.

5. From the Language Options list, choose the language that corresponds to the text in the document.

6. Click Start.

7. If you chose Overlay, you are asked to confirm that you want to overwrite the database field.  Click OK to proceed.

8. A progress dialog displays.  Upon completion, you can return to Concordance to verify that the data appears in the field you specified.
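
The difference between the Append and Overlay options in step 3 can be pictured as a simple text operation.  This is a conceptual sketch only and does not show how Concordance implements the options.  The function apply_ocr_result is hypothetical, and because this topic does not say whether a separator is inserted between existing and new text when appending, none is added here.

```python
def apply_ocr_result(existing_text, ocr_text, mode):
    """Return the field content after an OCR run in the given mode."""
    if mode == "append":
        # New OCR text goes after whatever the field already holds.
        return existing_text + ocr_text
    if mode == "overlay":
        # Existing content is discarded and replaced by the new OCR text.
        return ocr_text
    raise ValueError(f"unknown mode: {mode!r}")


field = "Page 1 text. "
print(apply_ocr_result(field, "Page 2 text.", "append"))
print(apply_ocr_result(field, "Page 2 text.", "overlay"))
```

The overlay case is the one that loses data, which is why step 7 asks for confirmation before proceeding.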