Extracting Custom Metadata

There are two ways to extract metadata from documents in a case. Metadata can be extracted from documents during the import process when using LAW Electronic Discovery Loader to import files into a case or by using the Batch Processing utility after documents have been imported into a case.

When extracting metadata, you can customize what type of metadata is extracted from the case files by selecting the metadata extraction options you want to apply. When importing documents, the metadata extraction options are defined on the Settings tab under the Metadata category in the LAW Electronic Discovery Loader dialog box. In the Batch Processing utility, metadata extraction options are defined in the Document Analysis Options dialog box.

For more information about extracting custom metadata during the import process, see Metadata.

The Document Processing/Analysis feature in the Batch Processing utility analyzes and extracts the metadata from the selected record set, based on the metadata extraction options selected in the Document Analysis Options dialog box. The Document Processing/Analysis feature in the Batch Processing utility also provides the option to split Adobe Acrobat PDF files by bookmarks into separate PDF files.

To extract custom metadata using the Batch Processing utility

1.From the main form on the Tools menu, click Batch Process. The Batch Processing utility opens.

2.Select the documents containing the metadata you want to analyze and extract.

For more information on selecting documents, see Selecting Documents for Processing.

3.Select the Document Processing/Analysis check box.

4.On the Options menu, click Document Analysis Options.

Clicking Document Analysis Options opens the Document Analysis Options dialog box. The Document Analysis Options dialog box contains all of the metadata extraction options available in the Batch Processing utility. By default, all of the check boxes, except the Split document on bookmarks check box, are selected.

•Capture custom metadata for Adobe PDF files. Extracts metadata field names and values that were assigned to the original Adobe Acrobat PDF. Such fields might include the names of e-mail attachments, or of PDFs embedded in other files, such as a Microsoft Word document.

•Capture custom metadata for MS Office files. Extracts metadata field names and values that were assigned to Microsoft Office files, except for Publisher and Access files. Field names and data are brought into the LAW case as extended properties. See note below.

•Capture EXIF metadata for image files (TIFF/JPEG). Extracts EXIF metadata field names and values from TIFF and JPEG image files. If found, custom metadata fields are added to the case database with field names preceded by EP.

•Detect bookmarks in Adobe PDF files. If an Adobe Acrobat PDF file contains bookmarks, assigns a Y to the HasBookmarks field.

•Split document on bookmarks. Adobe Acrobat PDF files containing bookmarks will be split into separate PDF files, one for each bookmark. When this check box is selected, the bookmark names are added to the BookmarkName field.

The PDF custom metadata extraction and the PDF split operation cannot be performed simultaneously or with other batch processing jobs in a batch. Each operation must be performed in a separate batch.

•Detect comments for MS Word/Excel/PowerPoint and PDF files. If comments in Microsoft Word, Excel, and PowerPoint files or sticky notes in Adobe Acrobat PDF files are detected, assigns a Y to the HasComments field.

•Detect hidden rows and/or columns for MS Excel files. If hidden rows or columns are found, assigns a Y to the HasHiddenRow or HasHiddenColumn field, respectively.

•Detect hidden sheets for MS Excel files. If hidden sheets are found, assigns a Y to the HasHiddenSheet field.

•Detect hidden slides for MS PowerPoint files. If hidden slides are found, assigns a Y to the HiddenSlides field.

•Detect speaker notes for MS PowerPoint files. If speaker notes are found, assigns a Y to the SpeakerNotes field.

•Detect tracked changes for MS Word and Excel files. If tracked changes are found, assigns a Y to the HasTrackChanges system field.

Extended property metadata is placed in extended property fields that are created when the documents are imported or the metadata is extracted by the Batch Processing utility. The names of all extended property fields start with EP. The remainder of the field name depends on the name of the field as it exists in the source document. For example, if a Word document is imported that contains a custom metadata field called Typist, LAW will create a metadata field during the import or batch processing called EPTypist. Deleting a document will delete all corresponding extended properties for that document. For more information on extended properties, see: Extended Properties in Grid Views.

5.Make sure the check box is selected for the options you want to apply during the metadata extraction.

6.Clear the check box for the options you do not want to apply during the metadata extraction.

7.Click OK.

Clicking OK saves the option settings and closes the Document Analysis Options dialog box.

8.Optionally, select the Enable Distributed Batch Processing check box and configure the session as needed.

For more information on using distributed batch processing, see Distributed Batch Processing.

9.Click the Begin button.

When the batch process is completed, the Status dialog box is displayed.

10.Click OK to close the Status dialog box.
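The field names produced by this procedure can be illustrated with a short Python sketch. It is not part of LAW and does not call any LAW API. It only mirrors two rules stated above: extended property fields are named EP plus the source field name, and each detection option assigns a Y to its field. The function names and the record dictionary are hypothetical, and the sketch does not model how LAW handles spaces or special characters in source field names.

```python
# Illustrative sketch only; these helpers are hypothetical and not a LAW API.

# Flag fields named in the option list; each is set to "Y" when the feature is detected.
FLAG_FIELDS = (
    "HasBookmarks",
    "HasComments",
    "HasHiddenRow",
    "HasHiddenColumn",
    "HasHiddenSheet",
    "HiddenSlides",
    "SpeakerNotes",
    "HasTrackChanges",
)


def extended_property_field(source_name: str) -> str:
    """Return the extended property field name: 'EP' plus the source field name."""
    return "EP" + source_name


def detected_flags(record: dict) -> list:
    """Return the flag fields that are set to 'Y' in a record (a plain dict here)."""
    return [field for field in FLAG_FIELDS if record.get(field) == "Y"]


# Hypothetical record, as it might look after metadata extraction.
record = {"HasComments": "Y", "HasHiddenSheet": "Y", "HasTrackChanges": ""}

print(extended_property_field("Typist"))  # EPTypist
print(detected_flags(record))  # ['HasComments', 'HasHiddenSheet']
```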

Adobe Acrobat PDF files can be split into separate PDF files based on the original PDF file's bookmarks. Using the Document Processing/Analysis feature in the Batch Processing utility, you can have LAW identify the PDF files containing bookmarks, and then have these PDF files split, based on each PDF file's bookmarks.

When a PDF file is split by bookmark, the original PDF file is deleted, and the PDF file containing the pages of the first bookmark becomes the parent file. The PDF files for the remaining bookmarks become the children of the first bookmark PDF file.

For example:

If a PDF file has 15 pages with 3 bookmarks (Bookmark 1 = pages 1-5, Bookmark 2 = pages 6-10, Bookmark 3 = pages 11-15), when the PDF file is split, the PDF file generated for bookmark 1 will be the parent file, and the PDF files generated for bookmarks 2 and 3 will be the child files.
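The parent/child outcome in the example above can be sketched in Python. This only illustrates the described result and is not LAW's implementation; the function name and the data layout are hypothetical.

```python
# Illustrative sketch only; models the described outcome, not LAW's internals.

def split_by_bookmarks(bookmarks):
    """Assign a parent or child role to the PDF generated for each bookmark.

    bookmarks: list of (name, first_page, last_page) tuples in document order.
    The PDF for the first bookmark is the parent; the rest are children.
    """
    parts = []
    for index, (name, first_page, last_page) in enumerate(bookmarks):
        parts.append({
            "bookmark": name,
            "pages": (first_page, last_page),
            "role": "parent" if index == 0 else "child",
        })
    return parts


# The 15-page, 3-bookmark example from the text.
parts = split_by_bookmarks([
    ("Bookmark 1", 1, 5),
    ("Bookmark 2", 6, 10),
    ("Bookmark 3", 11, 15),
])

for part in parts:
    print(part["bookmark"], part["pages"], part["role"])
```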

1.From the main form on the Tools menu, click Batch Process. The Batch Processing utility opens.

2.Select the PDF files containing bookmarks that you want to split.

For more information on selecting documents, see Selecting Documents for Processing.

When splitting PDF files, all PDF files containing bookmarks identified by the Batch Processing utility will be split. Make sure you only select the PDF files you want to be split.

3.Select the Document Processing/Analysis check box.

4.On the Options menu, click Document Analysis Options.

Clicking Document Analysis Options opens the Document Analysis Options dialog box.

5.Select the Detect bookmarks in Adobe PDF files and Split document on bookmarks check boxes.

The PDF custom metadata extraction and the PDF split operation cannot be performed simultaneously or with other batch processing jobs in a batch. Each operation must be performed in a separate batch.

6.Click OK.

Clicking OK opens the Confirm PDF Split message.

7.Click Yes.

8.Optionally, select the Enable Distributed Batch Processing check box and configure the session as needed.

For more information on using distributed batch processing, see Distributed Batch Processing.

9.Click the Begin button.

When the batch process is completed, the Status dialog box is displayed.

10.Click OK to close the Status dialog box.

When the batch process is completed, the HasBookmarks field for the PDF files containing bookmarks is set to Y, and the BookmarkName field is populated with the name of the bookmark associated with the PDF file. If a bookmark name exceeds 251 characters, the bookmark name will be truncated.

In the tree view document list on the Index tab, the PDF files generated from the bookmarks are listed as children of the original PDF file.
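The 251-character limit on bookmark names can be illustrated with a minimal Python sketch. It assumes that truncation keeps the first 251 characters, which the text above does not state explicitly, and the function name is hypothetical.

```python
# Illustrative sketch only; assumes truncation keeps the leading characters.

MAX_BOOKMARK_NAME_LENGTH = 251  # limit stated in the text


def bookmark_field_value(bookmark_name: str) -> str:
    """Return the value stored in BookmarkName, cut off at the stated limit."""
    return bookmark_name[:MAX_BOOKMARK_NAME_LENGTH]


short_name = "Exhibit A"
long_name = "x" * 300

print(len(bookmark_field_value(short_name)))  # 9
print(len(bookmark_field_value(long_name)))   # 251
```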

Need additional help? E-mail the CloudNine™ LAW Technical Support team at lawsupport@cloudnine.com, or contact a support representative at 713-462-6464 (CloudNine™ LAW: Ext. 12; CloudNine™ Explore Support: Ext. 13). The Technical Support team is available between the hours of 9:00 A.M. and 7:00 P.M. Eastern Time, Monday through Friday.

Copyright © 2024 CloudNine™. All rights reserved.