Configuring ED Loader

1.Click on the Settings tab at the top left. From here, you can configure your ED Loader Settings for this case based on the groups of options listed under the Categories pane on the left-hand side of the tab.

2.Once you have your desired settings, left-click on the Apply Settings button located at the bottom left to save these settings for the active case.

i.The Lock Settings button prevents any further changes from being made to these settings.

ii.The Set as Default button tells LAW to automatically apply these settings to all future ED Loader cases.

3.You are now ready to start Using ED Loader.

	The following Categories are available within the Settings tab of the ED Loader tool:

Settings

Compound Documents

Certain file types, known as compound documents, can contain other files embedded within them. For example, a Word document might contain embedded images and/or spreadsheets. LAW supports up to 99 levels of embedded files that are extracted and imported separately alongside their top-level parent file.

•Disable compound document extraction - Embedded files contained within compound documents will not be extracted to the Case Database or have records created in the Case Directory.

•Enable compound document extraction on all supported file types - Compound documents will be imported as top-level (parent) files, and any embedded files are separately extracted and imported as attachments to their parent file.

•Restrict compound document extraction to PDF portfolios - Only PDF portfolios will be treated as compound documents. Additionally, files embedded within PDF portfolios will be separately extracted and imported as their own top-level (parent) file instead of as attachments. LAW will not retain the original PDF portfolio.

By default, LAW does not retain the PDF portfolio container. However, you can extract a PDF portfolio container file by going to the EDLoader.case.config.ini file and changing the OLE_IncludePDFPortfolioContainer property value to 1. Then run ED Loader with either the Enable compound document extraction on all supported file types or the Restrict compound document extraction to PDF portfolios option selected.
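The property change described above can also be scripted. The following is a minimal sketch, assuming the property already appears in the file as a name=value line. The helper name set_ini_property and the [Settings] section in the sample text are hypothetical, and the actual layout of EDLoader.case.config.ini may differ.

```python
import re


def set_ini_property(ini_text: str, key: str, value: str):
    """Return (new_text, changed): set every existing `key=...` line to `key=value`.

    Only lines that already define the key are rewritten; section headers,
    comments, and other properties are left untouched.
    """
    pattern = re.compile(
        rf"^([ \t]*{re.escape(key)}[ \t]*=)[^\r\n]*", re.MULTILINE
    )
    # A function replacement avoids backslash interpretation in `value`.
    new_text, count = pattern.subn(lambda m: m.group(1) + value, ini_text)
    return new_text, count > 0


# Hypothetical file content; the section name is an assumption for illustration.
sample = "[Settings]\nOLE_IncludePDFPortfolioContainer=0\nOther=5\n"
updated, changed = set_ini_property(sample, "OLE_IncludePDFPortfolioContainer", "1")
print(updated)
```

If changed comes back False, the property was not found, and the sketch deliberately does not guess where to add it. Back up the configuration file before editing it.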

Supported Embedded File Types

The following file types are supported for extraction from compound documents:

•Microsoft Word/RTF

•Microsoft Excel

•Microsoft PowerPoint

•Adobe Acrobat PDF

•SnapShot

•Microsoft Visio

•Microsoft Outlook.FileAttach (Word-authored e-mail with inline attachments, generally stored in RTF)

•Microsoft Project

•Package*

*A Package is a general type of embed; it can be a text file or a zip file, for example. Any of the above types may also be embedded as a package type depending on the software installed when a user embeds the file. For example, if a user were to embed an Excel spreadsheet into a Word document, and Excel is not installed, the spreadsheet will be embedded as Package.

Supported container file types and embedded files

The following table lists common embedded file types that LAW supports for extraction:

Description		Detection	Extraction
Non-Microsoft Office Formats
	Adobe Acrobat (pdf)	Y	Y
	Rich text format (rtf) *Converted to Word format for extraction. Original file is preserved.	Y	Y
Office
	Excel Spreadsheet (OpenXml)	Y	Y
	MS Office Data File (OpenXml)	Y	Y
	PowerPoint Presentation (OpenXml) *Compound documents not fully supported in this format	Y*	Y*
	Word (OpenXml)	Y	Y
Office
	Excel Spreadsheet (OpenXml)	Y	Y
	MS Office Data File (OpenXml)	Y	Y
	PowerPoint Presentation (OpenXml)	Y	Y
	Word (OpenXml)	Y	Y
Office
	Word	Y	Y
	Word (xml)	Y	N
	Excel	Y	Y
	Excel (xml) *Compound documents not supported in this format	*N	*N
	OneNote	N	N
	PowerPoint	Y	Y
	Project	Y	Y
	Project (xml) *Compound documents not supported in this format	*N	*N
	Publisher	**Y	Y
	Visio	N	N
	Visio (xml) *Currently not recognized by file engine	*Y	N
Office
	Word	Y	Y
	Excel	Y	Y
	PowerPoint	**Y	N
	Publisher	Y	Y
	Project	Y	Y
	Visio	N	N
Office
	Word	Y	Y
	Excel	Y	Y
	PowerPoint	Y	Y
	Project	Y	Y

**Detection of embeds in these types is limited to the types of files supported for extraction (see above list).

Deduplication

Deduplication is a process that scans all source files for any duplicates (identical copies). This is done by subjecting all source files to a hashing process, which yields unique numerical values (hashes) for each file. Files yielding identical hash values are considered duplicates.

•Enable Duplicate Detection - Turns on deduplication.

oWorking Digest - Provides two choices for the type of hashing being used to detect duplicates: MD5 (128-bit output digest), or SHA1 (160-bit output digest).

oTest for duplicate against (Scope) - Provides two choices for the level (file hierarchy) at which files are tested against each other for duplicates:

▪Case Level (Globally) - All source files being imported are tested against each other for duplicates.

▪Custodian Level - Source files being imported are only checked for duplicates within each Custodian.

oIf record is considered duplicate then (Action) - Provides three choices for how duplicates are handled once detected:

▪Include (Log record) - Creates a separate record for the duplicate in the Case Directory and copies the source file into the Case Database.

▪Partially Exclude (Log record but do not copy file) - Creates a separate record for the duplicate but does not copy the source file into the database.

▪Exclude (Do not log record or copy file) - No record is created, no text is extracted, and no source file is copied into the database for the duplicate.

•Include attachment hashes in e-mail metadata hash - Causes hashes generated for e-mail attachments to be included in the metadata hash of the parent e-mail file. When disabled, LAW displays the names of these attached files in the Attach field instead.

•Enable hashing of non-email Outlook items - Generates hash values for non-email items within an Outlook PST file during import, including: calendar items, contacts, journal entries, notes, and tasks.

In addition to deduplicating prior to the import process, LAW also allows you to deduplicate at these other times in a post-discovery workflow:

•After the import against other records in the case by using the Deduplication Utility.

•After the import against other records in the case and other LAW cases by using Inter-Case Deduplication.
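To illustrate how the Working Digest and Scope options interact, here is a minimal sketch that hashes raw file content and flags later copies as duplicates. It is not LAW's implementation: the function names are hypothetical, and LAW's e-mail hashes are built from message metadata rather than raw bytes.

```python
import hashlib
from collections import defaultdict


def file_digest(data: bytes, algorithm: str = "md5") -> str:
    """Hash file content with MD5 (128-bit) or SHA-1 (160-bit)."""
    if algorithm not in ("md5", "sha1"):
        raise ValueError("algorithm must be 'md5' or 'sha1'")
    return hashlib.new(algorithm, data).hexdigest()


def flag_duplicates(files, scope="case", algorithm="md5"):
    """Return one boolean per file: True if an earlier file has the same hash.

    files: list of (custodian, content_bytes) in import order.
    scope: "case" compares across all files; "custodian" compares only
           within each custodian.
    """
    seen = defaultdict(set)  # scope key -> digests already encountered
    flags = []
    for custodian, content in files:
        key = custodian if scope == "custodian" else None
        digest = file_digest(content, algorithm)
        flags.append(digest in seen[key])
        seen[key].add(digest)
    return flags


files = [("alice", b"report"), ("bob", b"report"),
         ("alice", b"report"), ("bob", b"memo")]
print(flag_duplicates(files, scope="case"))       # [False, True, True, False]
print(flag_duplicates(files, scope="custodian"))  # [False, False, True, False]
```

At case level, Bob's copy of the report is a duplicate of Alice's. At custodian level, it is not, because files are compared only within each custodian.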

E-mail

•Sorting - This section allows you to determine how e-mails will be organized within the Case Directory:

oSort Key - E-mails can be sorted based on one of four attributes chosen from the drop-down menu: From (who sent it), Received (date received), SentOn (date sent), and Subject.

oSort Direction - Select either Ascending (oldest to newest) or Descending (newest to oldest) to determine the order e-mails are sorted in based on the chosen Sort Key.

•Metadata - This section contains settings that determine the format of e-mails being copied into the Case Database.

oSave Outlook messages as - All Outlook messages will be imported and saved as one of five formats:

▪HTML - Messages are saved as HTML unless they contain certain embedded items which cannot be rendered in HTML (such as calendars, contacts, tasks, etc.). In these instances, those items are instead saved as MSG files within the same folder as the converted parent message.

▪HTML/MHTML (based on format) - RTF messages, non-mail items, and HTML messages (including those with linked images) are saved as MHTML. All other messages are saved as HTML instead.

▪HTML/RTF (based on format) - RTF messages and non-mail items are saved as RTF. HTML messages are left as HTML. This is the preferred format due to the potential loss of images or embedded objects when saving RTF messages as HTML.

▪MSG - Messages are saved as MSG unless they contain extremely large recipient lists (typically over 5,000), in which case they are saved as HTML instead.

▪MHTML - All messages are saved as MHTML. MHTML files are web archives that allow the embedding of images directly within the file, eliminating the need for any linked images or dependent files. Messages are first converted to either HTML or RTF depending on their original format before being converted to MHTML. Due to this secondary conversion, processing speed is significantly slower with this setting.

oTranslate date fields to universal time (GMT) - Converts the times on all incoming e-mail messages to GMT (Greenwich Mean Time).

oPreserve X.400 addresses - X.400 is a suite of telecommunication standards for Message Handling Systems (MHS). Today, this standard has been replaced for e-mails by the Simple Mail Transfer Protocol (SMTP). This setting preserves the older X.400 addresses in e-mail messages which lack SMTP addresses for either the sender or the recipient. When this setting is disabled, any messages without an SMTP address present will have one generated automatically. These auto-generated addresses may not be accurate.

i.Messages containing an SMTP address will still use that SMTP address even with the Preserve X.400 addresses box checked.

▪Legacy Mode (produces hashes compatible with LAW 6.2 and lower) - This setting allows older cases to continue being used with newer versions of LAW. This should not be used for cases created under version 6.3 or later.

•Outlook Folder Types - This section allows the inclusion of non-mail items when importing Outlook folders and mail stores. Check the box next to each item you want to include: Calendars, Contacts, Journal, Notes, Tasks.

•Lotus Notes - This section allows you to determine how Lotus Notes files are handled:

oLog warning messages for e-mails containing RTF body content - Warning messages will be logged when RTF is found within the body of an e-mail. Formatting and/or data can be lost when converting RTF to HTML.

oExtract embedded images in RTF content as attachments - Extracts images embedded within the body or RTF of any e-mails. These images will appear as attachments to that e-mail in the Case Directory.
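As an illustration of what the Translate date fields to universal time (GMT) option does to a date value, the sketch below converts a timestamp with a known offset to GMT. The function name and the sample offset are assumptions for illustration; how LAW determines the source offset is not shown here.

```python
from datetime import datetime, timedelta, timezone


def to_gmt(local_time: datetime) -> datetime:
    """Convert a timezone-aware datetime to GMT/UTC."""
    if local_time.tzinfo is None:
        raise ValueError("an offset is required to convert to GMT")
    return local_time.astimezone(timezone.utc)


# A message sent at 09:30 in a zone five hours behind GMT (fixed offset for illustration).
sent_on = datetime(2023, 3, 1, 9, 30, tzinfo=timezone(timedelta(hours=-5)))
print(to_gmt(sent_on).isoformat())  # 2023-03-01T14:30:00+00:00
```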

Exclusions

•Predefined Exclusions - These are conditional settings that can be used to prevent specific sources from being imported into the Case Database:

oExclude mail stores from e-doc processing - When folders containing mail stores are selected for import, this prevents those mail stores from merely being treated as a single record within the Case Directory. Even when disabled, mail stores are still treated as individual sources by adding them separately to the Source Queue.

oExclude empty files (0 bytes) from processing - Prevents empty (0 byte size) files from being imported.

•E-Mail Date Range Exclusions - This section only applies to PST and OST files, and allows you to establish dates for e-mails to exclude from the import.

oExclude e-mails with sent date matching the specified criteria - Prevents e-mails falling within specified date ranges from being imported. Select a condition from the first drop-down menu, and then select a date from the second drop-down menu for which that condition should apply. You can also select And or Or instead of None to add another condition and date for these exclusion parameters.
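The way two date conditions combine through None, And, or Or can be sketched as follows. The condition names (before, after, on) are hypothetical stand-ins; the labels in ED Loader's drop-down menu may differ.

```python
from datetime import date

# Hypothetical condition names; the labels in ED Loader's drop-down may differ.
CONDITIONS = {
    "before": lambda sent, d: sent < d,
    "after": lambda sent, d: sent > d,
    "on": lambda sent, d: sent == d,
}


def is_excluded(sent, first, joiner="None", second=None):
    """Return True if an e-mail's sent date matches the exclusion criteria.

    first/second: (condition_name, date) pairs.
    joiner: "None" (use the first condition only), "And", or "Or".
    """
    result = CONDITIONS[first[0]](sent, first[1])
    if joiner == "None" or second is None:
        return result
    other = CONDITIONS[second[0]](sent, second[1])
    return (result and other) if joiner == "And" else (result or other)


# Exclude e-mails sent after 2020-01-01 and before 2020-12-31.
rule = dict(first=("after", date(2020, 1, 1)), joiner="And",
            second=("before", date(2020, 12, 31)))
print(is_excluded(date(2020, 6, 15), **rule))  # True
print(is_excluded(date(2021, 6, 15), **rule))  # False
```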

File Types

•File Type Management Database - The top field displays the database currently being used for file type management during import. Press the [...] button on the right to open a small window and select a different SQL Server or Access Database for this purpose. The Edit... button opens the File Type Manager, which is dedicated to specifying file types within the chosen database for filtering. Certain file types may be Included, Excluded, both (Exclude takes precedence), or neither (determined below). You can also assign default applications for opening each file type within LAW.

•File Type Filtering Options - This section determines how the current File Type Management Database is being used for the active case.

oEnable file type filtering - Turns file type filtering on. Source files selected will be imported depending on the scope chosen below:

▪Active list (Scope) - Determines what happens with file types defined within the File Type Manager:

•'Include' List - File types marked as Include within the File Type Manager will be imported. All other files will not be imported.

•'Exclude' List - File types marked as Exclude within the File Type Manager will not be imported. All other file types will be imported.

▪If filetype is not marked 'Include' then (Action) - Determines what happens with file types not defined within the File Type Manager:

•Include (Log record) - Creates a record in the Case Directory and imports the source file into the Case Database.

•Partially Exclude (Log record but do not copy file) - Creates a record but does not import the source file into the Case Database.

•Exclude (Do not log record or copy file) - Does not create a record or import the source file into the Case Database.

oAuto-assign suspect extensions - If the extension for a source file does not match the actual file type, then LAW will assign a new extension from the "ActiveExt" field (within the File Type Manager) to the DocEXT field during import, and record the original extension in the OrigExt field.
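The Auto-assign suspect extensions behavior can be pictured with the short sketch below. It assumes the actual file type has already been detected and its ActiveExt value is known. The function name, the case-insensitive comparison, and leaving OrigExt empty when the extension matches are all assumptions, not documented LAW behavior.

```python
from pathlib import PurePath


def resolve_extension(file_name, active_ext):
    """Return the DocExt and OrigExt values for a file whose type has been detected.

    active_ext: the ActiveExt value for the detected file type (detection
    itself is outside this sketch).
    """
    original = PurePath(file_name).suffix.lstrip(".")
    if original.lower() == active_ext.lower():
        # Extension matches the detected type: nothing to reassign.
        return {"DocExt": original, "OrigExt": None}
    # Suspect extension: use ActiveExt and keep the original for reference.
    return {"DocExt": active_ext, "OrigExt": original}


print(resolve_extension("invoice.txt", "pdf"))  # {'DocExt': 'pdf', 'OrigExt': 'txt'}
print(resolve_extension("invoice.PDF", "pdf"))  # {'DocExt': 'PDF', 'OrigExt': None}
```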

A Knowledge Base article on this subject: How do I correct "Could Not Identify Filetype" errors? (cloudnine.com)

CloudNine™ LAW supports import of all file types. Even if a file type is not supported for printing or conversion, metadata and text may still be extracted. For a full list of the file types recognized by both CloudNine™ LAW and CloudNine™ Explore during import, see Supported File Types.

General

•Source Selection - The following settings dictate how ED Loader queues items located within folders that are selected for import:

oEnsure source names are unique when they are added to the queue - Prevents duplicate file names from appearing in the Source Queue. This feature is particularly useful when importing multiple sources with identical names, such as multiple mail stores named "Personal Folders". In these instances, subsequent sources with the same file name will have numbers (001, 002, etc.) appended to the end.

oAutomatically set E-Doc folder sources to 'Recurse' when I add them via drag and drop - When using the drag and drop method to queue sources, any folders selected will have all subfolders contained within them added separately to the Source Queue as well.

oScan folder selections for supported mailstores when added to the queue - Instructs ED Loader to look for any mail stores located within folders that are selected for import. Mail stores found are automatically added to the Source Queue.

▪Scan supported archives - Archive files found within selected folders will also be scanned for any mail stores located within. If a mail store is found, the Extract Stores window will prompt you to select a new location for the extracted mail stores. These extracted mail stores will then be added to the Source Queue.

▪Add selected folder to queue when mailstore scan is active - Instructs ED Loader to add the selected folders to the queue after any mail stores located within have been queued. Otherwise, no other files/folders contained within the selected folders will be imported.

•Time Zone Selection - The following setting applies to the recorded date/time of each source imported into the Case Database:

oOverride system time zone during processing - Changes the date/time of all records to match the Time Zone selected from the drop-down menu.

•Distributed Processing - You may configure ED Loader to allow multiple computers to participate in an import session.

oEnable distributed processing - Turns on distributed processing. Click the Edit... button on the right to open the Create/Modify Job Invitation window.
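The renaming performed by Ensure source names are unique when they are added to the queue (under Source Selection above) can be sketched as follows. The function name and the separator placed before the counter are assumptions; only the 001, 002 numbering comes from the description above.

```python
from collections import Counter


def unique_source_names(names, separator="."):
    """Keep the first occurrence of each name; suffix later ones with 001, 002, ...

    The separator between the name and the counter is an assumption.
    """
    seen = Counter()
    result = []
    for name in names:
        count = seen[name]
        result.append(name if count == 0 else f"{name}{separator}{count:03d}")
        seen[name] += 1
    return result


print(unique_source_names(
    ["Personal Folders", "Archive", "Personal Folders", "Personal Folders"]
))
# ['Personal Folders', 'Archive', 'Personal Folders.001', 'Personal Folders.002']
```

This sketch does not handle the case where a generated name collides with a source that already carries that exact name.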

Metadata

Metadata extracted through these settings will populate within various Metadata Fields.

•Capture custom metadata for Adobe PDF files - Extracts metadata from all top-level (parent) and embedded (child) PDF source files.

•Capture custom metadata for MS Office files - Extracts metadata from all Office source files except for Publisher and Access files. This data populates under the Extended Properties fields.

•Capture EXIF metadata for image files (TIFF/JPEG) - This data populates under various Extended Properties fields.

•Detect comments for MS Word/Excel/PowerPoint and PDF files - If detected, LAW assigns a Y to the HasComments field.

•Detect tracked changes for MS Word and Excel files - If detected, LAW assigns a Y to the HasTrackChanges field.

•Detect hidden rows and/or columns for MS Excel files - If detected, LAW assigns a Y to the HasHiddenRow and/or the HasHiddenColumn fields.

•Detect hidden sheets for MS Excel files - If detected, LAW assigns a Y to the HasHiddenSheet field.

•Detect hidden slides for MS PowerPoint files - If detected, LAW assigns a Y to the HiddenSlides field.

•Detect speaker notes for MS PowerPoint files - If detected, LAW assigns a Y to the SpeakerNotes field.

Extended property metadata is placed in extended property fields that are created when the documents are imported or the metadata is extracted by the Batch Processing utility. The names of all extended property fields start with EP. The remainder of the field name depends on the name of the field as it exists in the source document. For example, if a Word document is imported that contains a custom metadata field called Typist, LAW will create a metadata field during the import or batch processing called EPTypist. Deleting a document will delete all corresponding extended properties for that document. For more information on extended properties, see: Extended Properties in Grid Views.
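The naming rule for extended property fields amounts to prefixing the source field name with EP. A minimal sketch, assuming the source name is used unchanged (how LAW treats spaces or special characters in custom field names is not covered here, and the function name is hypothetical):

```python
def extended_property_fields(custom_properties):
    """Map custom metadata names from a source document to EP-prefixed field names."""
    return {"EP" + name: value for name, value in custom_properties.items()}


print(extended_property_fields({"Typist": "J. Smith"}))  # {'EPTypist': 'J. Smith'}
```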

NIST (NSRL) Filter

The National Institute of Standards and Technology (NIST) maintains and publishes a database of known computer file profiles referred to as a Reference Data Set (RDS), which is compiled by the National Software Reference Library (NSRL). NIST uses this RDS to compare files against known sets of software applications. NIST filtering is used to remove file types that are unlikely to have useful data. Examples of such file types include system files, executable files, and application logic files.

•Enable NIST (NSRL) Filter - Requires a NIST database to be provided through the LAW Configuration Utility.

•If records are detected as NIST (NSRL) then (Action) - Choose one of the following options from the drop-down menu:

oInclude (Log record) - Creates a record in the Case Directory and imports the source file into the Case Database.

oPartially Exclude (Log record but do not copy file) - Creates a record but does not import the source file into the Case Database.

oExclude (Do not log record or copy file) - Does not create a record or import the source file into the Case Database.
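The three actions above can be sketched as a lookup against a set of known hashes. This is an illustration only: the function name and the action keywords are hypothetical, and matching on SHA-1 is an assumption about how a file is compared to the reference data.

```python
import hashlib

# (log_record, copy_file) for each action when a file matches the known set.
ACTIONS = {
    "include": (True, True),
    "partially_exclude": (True, False),
    "exclude": (False, False),
}


def apply_nist_action(content: bytes, known_sha1: set, action: str = "exclude"):
    """Return (log_record, copy_file) for one source file."""
    digest = hashlib.sha1(content).hexdigest().upper()
    if digest not in known_sha1:
        return (True, True)  # not a known file: import normally
    return ACTIONS[action]


# Stand-in for a reference data set: the hash of one "known" system file.
known = {hashlib.sha1(b"system file").hexdigest().upper()}
print(apply_nist_action(b"system file", known, "partially_exclude"))  # (True, False)
print(apply_nist_action(b"user memo", known, "exclude"))              # (True, True)
```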

Output

Settings established here are later reflected in the records listed within the Case Directory pane of the Main User Interface.

•Folder Output Scheme - Choose a main organizational structure for the active case from the drop-down menu to be represented in the Folder Tree portion of the Case Directory pane:

oRelative (Output path is relative to selection) (Applicable to EDoc selection only) - Folder structure is based initially on the name of the selected source folders, and is maintained for each record contained within. Mail stores are mirrored instead.

oMirrored - The entire folder structure of the selected folders is re-created identically within the Case Directory.

•Options - Additional options for organizing folders/files in the Case Directory upon import:

oCategorize output sources (E-Mails/E-Docs) - All mail stores are placed under an additional "E-Mail" folder, and all other files are placed under an additional "E-Doc" folder within the Case Directory.

oUse source name as top level folder - The Source Name of each source as it appears in the Source Queue is used as the top-level folder name for that source when recorded in the Case Directory.

oEnsure root output folder is unique in LAW - When choosing a Default Target Folder for the selected sources other than <Case Root> under the Sources tab, this ensures that each folder created during import is unique within the chosen target folder.

•Structure - Provides a preview of the resulting folder structure to be expected in the Case Directory based on the settings chosen above.

Password

This category is used to establish, and make quick changes to, the global list of passwords currently being used by LAW to access any encrypted (password protected) files that are selected as import sources for all cases. This global list is saved in your LAW install directory as "edloader.pwdfile.txt".

•Password - This top field shows the file path for the current TXT file being used as a global password list by LAW. Pressing the [...] button on the right opens a File Explorer that allows you to select a different TXT file to use instead.

•Passwords - This larger field displays all the passwords (one per line) as they appear in the TXT file established above. All listed passwords are case-sensitive. Passwords can be selected from the list by left-clicking on them here.

•Current - This section provides a user-input (text) field along with four buttons, allowing you to quickly perform specific actions with the current password list:

oAdd - This button will add any text exactly as it appears above to the password list on the left.

oUpdate - This button will change any password selected on the left to match the text above exactly as it appears.

oDelete - This button will delete any password selected on the left from the list.

oClear All - This button will delete all passwords from the list on the left.

If a password is found for a file, that password is populated into the Password field. If you obtain a password later, you can add it to this password bank so that Turbo Imager can use it.
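Because the password list is a plain TXT file with one case-sensitive password per line, it can also be maintained outside the dialog. The sketch below is a minimal illustration using a temporary stand-in file. The helper names, the UTF-8 encoding, and the choice to skip an exact duplicate on Add are assumptions.

```python
from pathlib import Path
import tempfile


def load_passwords(path):
    """Read one password per line, keeping case and skipping empty lines."""
    text = Path(path).read_text(encoding="utf-8")
    return [line for line in text.splitlines() if line]


def save_passwords(path, passwords):
    """Write the list back, one password per line."""
    Path(path).write_text("\n".join(passwords) + "\n", encoding="utf-8")


def add_password(passwords, new_password):
    """Append a password unless the exact (case-sensitive) string is already listed."""
    return passwords if new_password in passwords else passwords + [new_password]


with tempfile.TemporaryDirectory() as folder:
    pwd_file = Path(folder) / "passwords.txt"  # stand-in for the real list file
    save_passwords(pwd_file, ["Secret1", "secret1"])
    current = add_password(load_passwords(pwd_file), "Budget2020")
    save_passwords(pwd_file, current)
    print(load_passwords(pwd_file))  # ['Secret1', 'secret1', 'Budget2020']
```

Secret1 and secret1 are kept as separate entries because the list is case-sensitive.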

Post Import Actions

This category provides settings for tasks you want LAW to automatically perform once the import finishes, assuming that the computer has been left unattended long enough for the Results dialog to close.

•Convert imported documents to TIFF - It's recommended to configure your TIFF conversion settings before running an ED Loader session if you wish to use this setting. This disables the Display imported documents in grid option below.

•Display imported documents in grid - Launches and displays the Grid View for all records imported during the recent session.

•Perform full-text indexing - Applies to all sources flagged for indexing.

•Results dialog will close after [N] seconds, unattended - You can specify how long LAW waits (in seconds) before it automatically closes the Results dialog and begins performing the post import tasks enabled above.

Text Extraction

•Enable Text Extraction - Applicable files will have text extracted from them during the import.

oInclude metadata in extracted text - All available file properties will be extracted alongside the text.

oEnable binary scanning in text extraction - Forces LAW to scan all file types for text instead of only those with the "Ext. Text" flag within the File Type Manager. It is recommended to enable Validate extracted text alongside this option.

oValidate extracted text - Scans all extracted text to ensure readability. Useful for filtering out files with form feed or other control characters in text. Any extracted text deemed unreadable is automatically discarded.

oIdentify hidden text content - Detects specific forms of text hidden within Word, Excel, and PowerPoint documents. This hidden text will be bracketed within <<<Start Hidden Content>>> and <<<End Hidden Content>>> at the top of the extracted text for each file, and all associated records will have their HiddenText field tagged with a Y. The following hidden text can be identified:

▪Text hidden inside shape controls, such as text boxes.

▪Text specifically formatted as "Hidden".

▪Hidden spreadsheets, columns, and cells.

▪Hidden slides.

oIdentify language content - Identifies the first five languages used in each source file. Languages with only one occurrence are ignored, and single-word sentences are not evaluated. Enabling this may significantly decrease import speed. For more information on the languages please visit this site.

▪Limit content analysis to first [N] KB of file - Determines the initial chunk size in KB from each file that is scanned for language. Reducing this can help improve import speed. Setting this to 0 KB will cause every file to be fully scanned.

▪Restrict language identification to common languages - Limits analysis to only the most recognizable languages to help improve accuracy. If a common language is not encountered, then a value of "Unknown" will be returned. For more information on the restricted languages please visit the bottom of this site.
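When Identify hidden text content is enabled, the hidden text sits between the two markers described above, so downstream tools can separate it from the visible text. A minimal sketch, assuming one marker pair per file (the function name is hypothetical):

```python
START = "<<<Start Hidden Content>>>"
END = "<<<End Hidden Content>>>"


def split_hidden_content(extracted_text):
    """Return (hidden, visible); hidden is '' if no complete marker pair is found."""
    start = extracted_text.find(START)
    end = extracted_text.find(END, start + len(START)) if start != -1 else -1
    if start == -1 or end == -1:
        return "", extracted_text
    hidden = extracted_text[start + len(START):end].strip()
    visible = (extracted_text[:start] + extracted_text[end + len(END):]).strip()
    return hidden, visible


sample = (
    "<<<Start Hidden Content>>>\n"
    "Q3 forecast draft\n"
    "<<<End Hidden Content>>>\n"
    "Quarterly summary"
)
print(split_hidden_content(sample))  # ('Q3 forecast draft', 'Quarterly summary')
```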

Session Defaults

You can set up import session defaults, such as locking and unlocking settings, saving the current settings as the default, or saving both the current settings and the DocID seed in use as the default for future sessions.

Use the Settings tab in ED Loader to select among options for saving ED Loader settings:

•Lock Settings or Unlock Settings - Toggles between locking and unlocking import settings. When locked, settings cannot be changed.

•Apply Settings - Saves the currently selected settings and applies them if you restart ED Loader. Use this option if you want to temporarily close ED Loader before running the import session. Settings are saved to a case-level ED Loader configuration file. For more information on stopping and restarting ED Loader sessions, see Cancelling and Resuming Sessions.

•Set as Default - Saves the current Settings tab selections and the DocID seed as default for all cases started with the same LAW executable.
