Even if you select the option to extract text when you import electronic documents, text extraction can still fail for some supported documents. Missing text after import is most common for these kinds of files:
•PDF files where the format is image-only and does not contain embedded text.
•JPEGs or other image files that might contain text, but only as part of the image.
•Encrypted or locked files.
After you import, the first step is to identify files where the text extraction process failed. Depending on the type of error encountered, you can take action to extract the missing text.
If a file type is supported for text extraction but extraction fails, CloudNine™ LAW generates an error message. The message appears in the ErrorMsg field, and the TextXStatus field is marked with the value E. You can isolate text extraction errors using the grouping or searching functions of a grid display.
To find errors by grouping in a grid
1.Open either grid view.
•If TextXStatus is not a displayed column, right-click a column header to open the Field List dialog, select TextXStatus, and then close the Field List dialog. The TextXStatus column now appears.
2.In the grid, group by the TextXStatus field to see whether any E values are present.
3.Expand the E group, and then right-click an E value in the TextXStatus field. The filter options appear.
4.Choose Filter by Selection. Only records with an E value in the TextXStatus field are returned.
With larger cases, locating files with missing text by performing a search may be quicker than using the grouping method.
1.In the main user interface, on the Tools menu, click Search Records. The Database Query Builder appears.
2.Add a condition by choosing the following:
•Field Name = TextXStatus
•Operator = Equal
•Value = E
3.Click <Add Condition>. The condition appears in the clause window.
4.Click <Execute>. The search returns only the records with a TextXStatus value of E.
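If you work from an exported copy of the grid rather than inside the application, the same condition (TextXStatus equal to E) can be applied with a short script. The following is only a minimal sketch: it assumes the records have been exported to CSV with columns named DocID, TextXStatus, and ErrorMsg. The DocID column, the sample rows, and the error messages are made up for illustration and are not output from CloudNine LAW.

```python
import csv
import io


def find_extraction_errors(csv_text, status="E"):
    """Return the rows whose TextXStatus column equals `status`.

    Assumes a header row containing a TextXStatus column (hypothetical
    export layout; adjust the column names to match your own export).
    """
    reader = csv.DictReader(io.StringIO(csv_text))
    return [
        row for row in reader
        if (row.get("TextXStatus") or "").strip().upper() == status
    ]


# Illustrative sample data only; the messages below are invented.
sample = """DocID,TextXStatus,ErrorMsg
1,N,
2,E,Example error message A
3,N,
4,E,Example error message B
"""

errors = find_extraction_errors(sample)
for row in errors:
    print(row["DocID"], row["ErrorMsg"])
```

Passing status="N" instead returns the records whose file type is not supported for text extraction.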
You may still be able to obtain text files for records where text extraction failed during import. The general process for addressing text extraction errors follows these basic steps:
1.Find files where the TextXStatus is either N or E. A TextXStatus of N indicates that text could not be extracted from the file because of its file type. A TextXStatus of E indicates that the file type is supported but text extraction resulted in an error.
2.Fix these files as necessary based on the reason text could not be extracted:
•For files that are locked or encrypted, get the passwords, open the files in their native applications, and then perform the import again with text extraction.
•For files that are not locked or encrypted, use a batch process to convert them to TIFF, and extract text during TIFF conversion. Before conversion, enable the Save text with images option in the TIFF conversion settings. The TextPStatus field indicates whether text was printed to file at the time of TIFF conversion.
•For any other graphics or PDF files where text is visible but otherwise inaccessible, perform optical character recognition (OCR).
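The decision in step 2 can be summarized as a small lookup. This is a sketch of one reading of the steps above, not part of CloudNine LAW: the function name and the is_locked and is_image_only flags are hypothetical, and you would have to determine them yourself for each file, because the source does not describe fields that hold this information.

```python
def suggest_fix(status, is_locked=False, is_image_only=False):
    """Map a record's TextXStatus and file traits to a remediation step.

    status        -- TextXStatus value; only "N" and "E" are handled here
    is_locked     -- hypothetical flag: file is encrypted or password locked
    is_image_only -- hypothetical flag: text is visible but not embedded
    Returns None when the status is neither "N" nor "E".
    """
    if status not in ("N", "E"):
        return None
    if is_locked:
        return "Get the password, open in the native application, re-import with text extraction"
    if is_image_only:
        return "Perform OCR"
    return "Batch convert to TIFF with 'Save text with images' enabled"


# Example: an unlocked file whose extraction failed with an error.
print(suggest_fix("E"))
```

Checking the locked case first mirrors the order of the bullets above. Checking image-only files before falling back to TIFF conversion is an interpretation of that list, not something the steps state explicitly.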