Compound Documents

Compound documents consist of a container document and the documents embedded in it. For example, a Word document may contain an embedded spreadsheet. The embedded spreadsheet is considered to be one level down from the container document. LAW supports up to 99 levels of embedded compound documents. For example, if the embedded spreadsheet includes an embedded PowerPoint file (second level down) that in turn includes an embedded PDF (third level down), LAW can extract all four files.
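The nesting described above can be pictured as a tree in which each embedded file sits one level below the file that contains it. The following Python sketch is illustrative only: the Doc class, the walk function, and the file names are hypothetical and are not part of LAW. Only the 99-level limit comes from this topic.

```python
from dataclasses import dataclass, field
from typing import List

MAX_LEVELS = 99  # embedding depth limit stated in this topic


@dataclass
class Doc:
    """Hypothetical stand-in for a file and the files embedded in it."""
    name: str
    embedded: List["Doc"] = field(default_factory=list)


def walk(doc, level=0):
    """Yield (level, name) for doc and each embedded file down to MAX_LEVELS."""
    yield level, doc.name
    if level < MAX_LEVELS:
        for child in doc.embedded:
            yield from walk(child, level + 1)


# The example from the paragraph above: Word > Excel > PowerPoint > PDF.
pdf = Doc("exhibit.pdf")
deck = Doc("slides.pptx", [pdf])
sheet = Doc("budget.xlsx", [deck])
container = Doc("report.docx", [sheet])

files = list(walk(container))
for level, name in files:
    print(f"level {level}: {name}")
```

Running the sketch lists four files, with the container at level 0 and the embedded PDF at level 3. Files nested deeper than level 99 are simply not visited in this model; how LAW itself reports such files is not covered here.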

 

To Enable Extraction of Compound Documents

1. On the File menu, click Import, and then click Electronic Discovery.

Clicking Electronic Discovery opens the Sources tab in the LAW Electronic Discovery Loader dialog box.

2. Click the Settings tab, and then click Compound Documents. The Compound Documents options are displayed.

Compound Document options on the Settings tab

3. Click one of the following options:

Disable compound document extraction

Disables the compound document extraction feature during the import.

Enable compound document extraction on all supported file types

Any file containing an embedded file is imported as a parent document, and any embedded file within the parent file is imported as an attachment to the parent file.

Restrict compound document extraction to PDF portfolios

Only PDF portfolios are imported as compound documents. When this option is selected, each file in a PDF portfolio is imported as a top-level file. The PDF portfolio is not imported as a parent file with its contents as attachments.

If an embedded file cannot be opened, a warning is generated and logged in the Session Viewer and in the ErrorMsg field.

 

Note

By default, LAW does not retain the PDF portfolio container. To extract the PDF portfolio container file as well, open the EDLoader.case.config.ini file and change the OLE_IncludePDFPortfolioContainer property value to 1. Then run ED Loader with either the Enable compound document extraction on all supported file types option or the Restrict compound document extraction to PDF portfolios option selected.
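If the property needs to be changed for several cases, the edit can be scripted. The following Python sketch is one possible approach, not a LAW utility. It assumes the property is stored as a plain key=value line; the set_ini_property function, the [ExampleSection] header, and the OtherSetting line are hypothetical placeholders, because this topic does not describe the layout of the file.

```python
def set_ini_property(text, key, value):
    """Return INI text with `key` set to `value`, leaving other lines untouched.

    Assumption: the property is stored as a plain `key=value` line. If the
    key is not present, the text is returned unchanged rather than guessing
    which section it belongs in.
    """
    out = []
    for line in text.splitlines():
        name, sep, _old = line.partition("=")
        if sep and name.strip() == key:
            line = f"{name.strip()}={value}"
        out.append(line)
    return "\n".join(out)


# Hypothetical file contents; the section name is a placeholder, not taken from LAW.
sample = "[ExampleSection]\nOLE_IncludePDFPortfolioContainer=0\nOtherSetting=abc"
updated = set_ini_property(sample, "OLE_IncludePDFPortfolioContainer", 1)
print(updated)
```

To apply the change to a real file, read its contents, pass them through the function, and write the result back. Keep a backup copy of the original EDLoader.case.config.ini before overwriting it.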

 

To Support Embedded File Types

The following file types are supported for extraction from compound documents:

Microsoft Word/RTF

Microsoft Excel

Microsoft PowerPoint

Adobe Acrobat PDF

SnapShot

Microsoft Visio

Microsoft Outlook.FileAttach (Word-authored e-mail with inline attachments, generally stored in RTF)

Microsoft Project

Package*

*A Package is a general type of embed; it can be a text file or a zip file, for example. Any of the above types may also be embedded as a Package, depending on the software installed when a user embeds the file. For example, if a user embeds an Excel spreadsheet in a Word document and Excel is not installed, the spreadsheet is embedded as a Package.

Supported container file types and embedded files

The following table lists common embedded file types that LAW supports for extraction:

 

 

Description                              Detection   Extraction

Non-Microsoft Office Formats
  Adobe Acrobat (pdf)                    Y           Y
  Rich text format (rtf)                 Y           Y
    *Converted to Word format for extraction. Original file is preserved.

Office
  Excel Spreadsheet (OpenXml)            Y           Y
  MS Office Data File (OpenXml)          Y           Y
  PowerPoint Presentation (OpenXml)      Y*          Y*
    *Compound documents not fully supported in this format
  Word (OpenXml)                         Y           Y

Office
  Excel Spreadsheet (OpenXml)            Y           Y
  MS Office Data File (OpenXml)          Y           Y
  PowerPoint Presentation (OpenXml)      Y           Y
  Word (OpenXml)                         Y           Y

Office
  Word                                   Y           Y
  Word (xml)                             Y           N
  Excel                                  Y           Y
  Excel (xml)                            *N          *N
    *Compound documents not supported in this format
  OneNote                                N           N
  PowerPoint                             Y           Y
  Project                                Y           Y
  Project (xml)                          *N          *N
    *Compound documents not supported in this format
  Publisher                              **Y         Y
  Visio                                  N           N
  Visio (xml)                            *Y          N
    *Currently not recognized by file engine

Office
  Word                                   Y           Y
  Excel                                  Y           Y
  PowerPoint                             **Y         N
  Publisher                              Y           Y
  Project                                Y           Y
  Visio                                  N           N

Office
  Word                                   Y           Y
  Excel                                  Y           Y
  PowerPoint                             Y           Y
  Project                                Y           Y

 

**Detection of embeds in these types is limited to the types of files supported for extraction (see above list).