Compound documents are composed of a container document and embedded documents. For example, a Word document may contain an embedded spreadsheet. The embedded spreadsheet is considered to be embedded one level down from the container document. LAW supports up to 99 levels of embedded compound documents. So if the embedded spreadsheet includes an embedded PowerPoint file (second level down) that further includes an embedded PDF (third level down), LAW can extract all four files.
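The nesting described above can be pictured as a small tree walk. The following is only an illustrative sketch, not LAW's implementation: the `Doc` class, the `walk` function, and the file names are hypothetical, and the only value taken from the text is the 99-level limit.

```python
from dataclasses import dataclass, field
from typing import Iterator, List, Tuple

MAX_EMBED_LEVELS = 99  # nesting limit stated in the text above


@dataclass
class Doc:
    """Hypothetical stand-in for a file that may hold embedded files."""
    name: str
    embedded: List["Doc"] = field(default_factory=list)


def walk(doc: Doc, level: int = 0) -> Iterator[Tuple[int, str]]:
    """Yield (level, name) for the container and every embedded file.

    Level 0 is the container. Embeds nested deeper than
    MAX_EMBED_LEVELS are not visited.
    """
    yield level, doc.name
    if level >= MAX_EMBED_LEVELS:
        return
    for child in doc.embedded:
        yield from walk(child, level + 1)


# Word document -> spreadsheet -> PowerPoint file -> PDF (made-up names)
pdf = Doc("report.pdf")
deck = Doc("slides.pptx", [pdf])
sheet = Doc("budget.xlsx", [deck])
letter = Doc("letter.docx", [sheet])

files = list(walk(letter))
for level, name in files:
    print(level, name)
```

The walk visits four files in total: the container at level 0 and one embedded file at each of levels 1 through 3.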
1. On the File menu, click Import, and then click Electronic Discovery. Clicking Electronic Discovery opens the Sources tab in the LAW Electronic Discovery Loader dialog box.
2. Click the Settings tab and then click Compound Documents. The Compound Documents options display.
3. Click one of the following options:
• Disable compound document extraction: Disables the compound document extraction feature during the import.
• Enable compound document extraction on all supported file types: Any file containing an embedded file is imported as a parent document, and any embedded file within the parent file is imported as an attachment to the parent file.
• Restrict compound document extraction to PDF portfolios: Only PDF portfolios are imported as compound documents. When this option is selected, each file in a PDF portfolio is imported as a top-level file. The PDF portfolio is not imported as a parent file with the files it contains as attachments.
If an embedded file cannot be opened, a warning is generated and logged in the Session Viewer and the ErrorMsg field.
By default, LAW does not retain the PDF portfolio container. To extract the PDF portfolio container file, open the EDLoader.case.config.ini file and set the OLE_IncludePDFPortfolioContainer property to 1. Then run ED Loader with either the Enable compound document extraction on all supported file types option or the Restrict compound document extraction to PDF portfolios option selected.
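If the property has to be changed in many case folders, the edit could be scripted. The following is a minimal sketch that assumes the file uses plain key=value lines. The `set_ini_value` helper, the `[Hypothetical]` section name, and the `Other` key are made up for illustration. The section that actually holds the property is not stated here, so the helper only updates a key that already exists and never appends one.

```python
from typing import Tuple


def set_ini_value(text: str, key: str, value: str) -> Tuple[str, bool]:
    """Return (new_text, changed) with an existing `key` set to `value`.

    Lines that do not define `key` are copied through unchanged.
    Nothing is appended when the key is missing.
    """
    out = []
    changed = False
    for line in text.splitlines(keepends=True):
        name, sep, _ = line.strip().partition("=")
        if sep and name.strip().lower() == key.lower():
            # Preserve the original line ending (if any).
            ending = line[len(line.rstrip("\r\n")):]
            out.append(f"{key}={value}{ending}")
            changed = True
        else:
            out.append(line)
    return "".join(out), changed


# Sample content; the section name and the second key are assumptions.
sample = "[Hypothetical]\nOLE_IncludePDFPortfolioContainer=0\nOther=abc\n"
updated, changed = set_ini_value(sample, "OLE_IncludePDFPortfolioContainer", "1")
print(updated)
```

In practice the text would be read from and written back to EDLoader.case.config.ini, ideally after making a backup copy of the file.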
The following file types are supported for extraction from compound documents:
• Microsoft Word/RTF
• Microsoft Excel
• Microsoft PowerPoint
• Adobe Acrobat PDF
• SnapShot
• Microsoft Visio
• Microsoft Outlook.FileAttach (Word-authored e-mail with inline attachments, generally stored in RTF)
• Microsoft Project
• Package*
*A Package is a general type of embed; it can be a text file or a zip file, for example. Any of the above types may also be embedded as a package type depending on the software installed when a user embeds the file. For example, if a user were to embed an Excel spreadsheet into a Word document, and Excel is not installed, the spreadsheet will be embedded as Package.
Supported container file types and embedded files
The following table lists common embedded file types that LAW supports for extraction:
Office 95 and earlier versions of Office are not supported with CloudNine™ LAW.
| | Description | Detection | Extraction |
| Non-Microsoft Office Formats | Adobe Acrobat (pdf) | Y | Y |
| | Rich text format (rtf) *Converted to Word format for extraction. Original file is preserved. | Y | Y |
| Office 2007 and above (with LAW versions 6.8+). See System Requirements for full listings. | Excel Spreadsheet (OpenXml) | Y | Y |
| | MS Office Data File (OpenXml) | Y | Y |
| | PowerPoint Presentation (OpenXml) *Compound documents not fully supported in this format | Y* | Y* |
| | Word (OpenXml) | Y | Y |
| Office 2013, Office 2010, Office 2007 (with LAW versions 6.7 and 6.6) | Excel Spreadsheet (OpenXml) | Y | Y |
| | MS Office Data File (OpenXml) | Y | Y |
| | PowerPoint Presentation (OpenXml) | Y | Y |
| | Word (OpenXml) | Y | Y |
| Office 2003 | Word | Y | Y |
| | Word (xml) | Y | N |
| | Excel | Y | Y |
| | Excel (xml) *Compound documents not supported in this format | *N | *N |
| | OneNote | N | N |
| | PowerPoint | Y | Y |
| | Project | Y | Y |
| | Project (xml) *Compound documents not supported in this format | *N | *N |
| | Publisher | **Y | Y |
| | Visio | N | N |
| | Visio (xml) *Currently not recognized by file engine | *Y | N |
| Office 2002/XP, Office 2000 | Word | Y | Y |
| | Excel | Y | Y |
| | PowerPoint | **Y | N |
| | Publisher | Y | Y |
| | Project | Y | Y |
| | Visio | N | N |
| Office 97 | Word | Y | Y |
| | Excel | Y | Y |
| | PowerPoint | Y | Y |
| | Project | Y | Y |
**Detection of embeds in these types is limited to the types of files supported for extraction (see above list).
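For scripted checks, the table can be treated as a lookup. The sketch below is illustrative only and is not part of LAW: it transcribes just the Office 2003 rows, drops the footnote markers (* and **), and the `Support` type and `can_extract` function are hypothetical names.

```python
from typing import Dict, NamedTuple, Optional


class Support(NamedTuple):
    detection: bool
    extraction: bool


# Office 2003 rows from the table above; footnote markers are dropped.
OFFICE_2003: Dict[str, Support] = {
    "Word": Support(True, True),
    "Word (xml)": Support(True, False),
    "Excel": Support(True, True),
    "Excel (xml)": Support(False, False),
    "OneNote": Support(False, False),
    "PowerPoint": Support(True, True),
    "Project": Support(True, True),
    "Project (xml)": Support(False, False),
    "Publisher": Support(True, True),
    "Visio": Support(False, False),
    "Visio (xml)": Support(True, False),
}


def can_extract(fmt: str) -> Optional[bool]:
    """Return the Extraction value for an Office 2003 format, or None if unlisted."""
    row = OFFICE_2003.get(fmt)
    return None if row is None else row.extraction


# Formats listed in the table whose Extraction column is N.
not_extractable = sorted(name for name, s in OFFICE_2003.items() if not s.extraction)
print(not_extractable)
```

Returning None for an unlisted format keeps "not in the table" distinct from "listed as not supported".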