There are two ways to extract metadata from documents in a case: during the import process, when you use LAW Electronic Discovery Loader to import files into the case, or afterward, by running the Batch Processing utility on documents that have already been imported.
When extracting metadata, you can customize what type of metadata is extracted from the case files by selecting the metadata extraction options you want to apply. When importing documents, the metadata extraction options are defined on the Settings tab under the Metadata category in the LAW Electronic Discovery Loader dialog box. In the Batch Processing utility, metadata extraction options are defined in the Document Analysis Options dialog box.
For more information about extracting custom metadata during the import process, see Metadata.
The Document Processing/Analysis feature in the Batch Processing utility analyzes the selected record set and extracts its metadata, based on the metadata extraction options selected in the Document Analysis Options dialog box. The same feature can also split Adobe Acrobat PDF files into separate PDF files by bookmark.
To extract custom metadata using the Batch Processing utility
1. From the main form, on the Tools menu, click Batch Process. The Batch Processing utility opens.
2. Select the documents containing the metadata you want to analyze and extract. For more information on selecting documents, see Selecting Documents for Processing.
3. Select the Document Processing/Analysis check box.
4. On the Options menu, click Document Analysis Options. The Document Analysis Options dialog box opens. It contains all of the metadata extraction options available in the Batch Processing utility. By default, all of the check boxes except the Split document on bookmarks check box are selected.
• Capture custom metadata for Adobe PDF files. Extracts metadata field names and values that were assigned to the original Adobe Acrobat PDF file. Such fields might include the names of e-mail attachments or of PDFs embedded in other files, such as a Microsoft Word document.
• Capture custom metadata for MS Office files. Extracts metadata field names and values that were assigned to Microsoft Office files, except Publisher and Access files. Field names and data are brought into the LAW case as extended properties. See note below.
• Capture EXIF metadata for image files (TIFF/JPEG). Extracts metadata field names and values from the EXIF data in TIFF and JPEG image files. If found, custom metadata fields are added to the case database with field names preceded by EP.
• Detect bookmarks in Adobe PDF files. If an Adobe Acrobat PDF file contains bookmarks, assigns a Y to the HasBookmarks field.
• Split document on bookmarks. Splits Adobe Acrobat PDF files that contain bookmarks into separate PDF files, one per bookmark. When this check box is selected, the bookmark names are added to the BookmarkName field.
• Detect comments for MS Word/Excel/PowerPoint and PDF files. If comments in Microsoft Word, Excel, or PowerPoint files, or sticky notes in Adobe Acrobat PDF files, are detected, assigns a Y to the HasComments field.
• Detect hidden rows and/or columns for MS Excel files. If hidden rows or columns are found, assigns a Y to the HasHiddenRow or HasHiddenColumn field, respectively.
• Detect hidden sheets for MS Excel files. If hidden sheets are found, assigns a Y to the HasHiddenSheet field.
• Detect hidden slides for MS PowerPoint files. If hidden slides are found, assigns a Y to the HiddenSlides field.
• Detect speaker notes for MS PowerPoint files. If speaker notes are found, assigns a Y to the SpeakerNotes field.
• Detect tracked changes for MS Word and Excel files. If tracked changes are found, assigns a Y to the HasTrackChanges system field.
5. Make sure the check box is selected for each option you want to apply during metadata extraction.
6. Clear the check box for each option you do not want to apply.
7. Click OK. The option settings are saved and the Document Analysis Options dialog box closes.
8. Optionally, select the Enable Distributed Batch Processing check box and configure the session as needed. For more information on using distributed batch processing, see Distributed Batch Processing.
9. Click the Begin button. When the batch process is complete, the Status dialog box is displayed.
10. Click OK to close the Status dialog box.
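The relationship between the check boxes in the Document Analysis Options dialog box and the fields they set can be summarized as a lookup table. The Python sketch below is a hypothetical model, not a LAW API: the option labels and field names are taken from the option list, while `OPTION_FIELDS`, `default_options`, and `flag_fields` are illustrative names. It assumes a field is reported as Y only when its option is selected and the condition is found; what LAW stores for other documents is not stated here, so the sketch simply omits those fields.

```python
# Hypothetical model of the Document Analysis Options dialog box.
# Labels and field names come from the documentation; the structures
# and functions are illustrative only and are not part of LAW.
SPLIT = "Split document on bookmarks"

OPTION_FIELDS = {
    # option label -> fields set to "Y" when the condition is detected
    "Capture custom metadata for Adobe PDF files": [],
    "Capture custom metadata for MS Office files": [],
    "Capture EXIF metadata for image files (TIFF/JPEG)": [],
    "Detect bookmarks in Adobe PDF files": ["HasBookmarks"],
    SPLIT: [],
    "Detect comments for MS Word/Excel/PowerPoint and PDF files": ["HasComments"],
    "Detect hidden rows and/or columns for MS Excel files": ["HasHiddenRow", "HasHiddenColumn"],
    "Detect hidden sheets for MS Excel files": ["HasHiddenSheet"],
    "Detect hidden slides for MS PowerPoint files": ["HiddenSlides"],
    "Detect speaker notes for MS PowerPoint files": ["SpeakerNotes"],
    "Detect tracked changes for MS Word and Excel files": ["HasTrackChanges"],
}


def default_options() -> dict[str, bool]:
    """Every check box is selected by default except Split document on bookmarks."""
    return {label: label != SPLIT for label in OPTION_FIELDS}


def flag_fields(options: dict[str, bool], detected: set[str]) -> dict[str, str]:
    """Return {field: "Y"} for each detected condition whose option is selected.

    `detected` holds the field names whose condition was found in one
    document, for example {"HasComments", "HasHiddenRow"}.
    """
    flags: dict[str, str] = {}
    for label, fields in OPTION_FIELDS.items():
        if not options.get(label, False):
            continue  # cleared check box: the option is not applied
        for name in fields:
            if name in detected:
                flags[name] = "Y"
    return flags


opts = default_options()
opts["Detect speaker notes for MS PowerPoint files"] = False  # step 6: clear a box
print(flag_fields(opts, {"HasComments", "SpeakerNotes"}))  # {'HasComments': 'Y'}
```

Clearing the speaker notes option in the example means the SpeakerNotes condition is ignored even though it was found, which mirrors steps 5 and 6.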
Adobe Acrobat PDF files can be split into separate PDF files based on the original PDF file's bookmarks. Using the Document Processing/Analysis feature in the Batch Processing utility, you can have LAW identify the PDF files that contain bookmarks and then split those files by bookmark. When a PDF file is split by bookmark, the original PDF file is deleted, and the PDF file containing the pages of the first bookmark becomes the parent file. The PDF files for the remaining bookmarks become children of the first-bookmark PDF file. For example, if a 15-page PDF file has 3 bookmarks (Bookmark 1 = pages 1-5, Bookmark 2 = pages 6-10, Bookmark 3 = pages 11-15), splitting it produces three PDF files: the file for Bookmark 1 is the parent, and the files for Bookmarks 2 and 3 are its children.
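The page ranges and parent/child roles in the example can be reproduced with a short Python sketch. It is an illustration under stated assumptions, not LAW's implementation: it assumes a flat list of bookmarks with distinct start pages, where each bookmark runs from its start page to the page before the next bookmark and the last one runs to the end of the document. `SplitPart` and `split_by_bookmarks` are hypothetical names, and nested bookmarks are not modeled.

```python
from dataclasses import dataclass


@dataclass
class SplitPart:
    bookmark: str
    first_page: int  # 1-based, inclusive
    last_page: int   # 1-based, inclusive
    role: str        # "parent" for the first bookmark, otherwise "child"


def split_by_bookmarks(page_count: int, bookmarks: list[tuple[str, int]]) -> list[SplitPart]:
    """Hypothetical split: (name, start_page) pairs -> one part per bookmark.

    Assumes flat bookmarks with distinct start pages; each range ends on
    the page before the next bookmark starts, and the last ends at page_count.
    """
    ordered = sorted(bookmarks, key=lambda b: b[1])
    parts = []
    for i, (name, start) in enumerate(ordered):
        end = ordered[i + 1][1] - 1 if i + 1 < len(ordered) else page_count
        role = "parent" if i == 0 else "child"
        parts.append(SplitPart(name, start, end, role))
    return parts


example = [("Bookmark 1", 1), ("Bookmark 2", 6), ("Bookmark 3", 11)]
for part in split_by_bookmarks(15, example):
    print(part.bookmark, part.first_page, part.last_page, part.role)
# Bookmark 1 1 5 parent
# Bookmark 2 6 10 child
# Bookmark 3 11 15 child
```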
To split PDF files by bookmarks
1. From the main form, on the Tools menu, click Batch Process. The Batch Processing utility opens.
2. Select the PDF files containing the bookmarks you want to split on. For more information on selecting documents, see Selecting Documents for Processing.
3. Select the Document Processing/Analysis check box.
4. On the Options menu, click Document Analysis Options. The Document Analysis Options dialog box opens.
5. Select the Detect bookmarks in Adobe PDF files and Split document on bookmarks check boxes.
6. Click OK. The Confirm PDF Split message is displayed.
7. Click Yes.
8. Optionally, select the Enable Distributed Batch Processing check box and configure the session as needed. For more information on using distributed batch processing, see Distributed Batch Processing.
9. Click the Begin button. When the batch process is complete, the Status dialog box is displayed.
10. Click OK to close the Status dialog box.
When the batch process is complete, the HasBookmarks field for the PDF files containing bookmarks is set to Y, and the BookmarkName field is populated with the name of the bookmark associated with each PDF file. Bookmark names longer than 251 characters are truncated. In the tree view document list on the Index tab, the PDF files generated from the bookmarks are listed as children of the original PDF file.
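The field values described after the procedure can be sketched as a small Python function. This is a hypothetical illustration, not how LAW writes its database: `bookmark_fields` and `MAX_BOOKMARK_NAME` are made-up names, and the sketch assumes that truncation keeps the first 251 characters, since the documentation only states that longer names are truncated.

```python
MAX_BOOKMARK_NAME = 251  # limit stated in the documentation


def bookmark_fields(bookmark_name: str) -> dict[str, str]:
    """Hypothetical field values for one PDF produced by a bookmark split.

    Assumption: a name longer than 251 characters is cut to its first
    251 characters; shorter names are stored unchanged.
    """
    return {
        "HasBookmarks": "Y",
        "BookmarkName": bookmark_name[:MAX_BOOKMARK_NAME],
    }


print(bookmark_fields("Bookmark 2"))
# {'HasBookmarks': 'Y', 'BookmarkName': 'Bookmark 2'}
print(len(bookmark_fields("x" * 300)["BookmarkName"]))  # 251
```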