Configuring Turbo Import

Any Case File with Electronic Discovery enabled and Turbo Import selected (an Import Agent License is required) will have access to the Turbo Import utility. This utility streamlines the import process for all electronic documents.

For information on enabling Electronic Discovery for a new Case File, see the Starting New Cases topic.

After you've created a Turbo Import enabled Case File, your first step before using the utility will be to configure your Turbo Import Settings for the case.

 

Opening and Configuring Turbo Import

1. Start from the Menu of the Main User Interface by selecting File > Import > Turbo Import.

2. The Main User Interface will close, and the Turbo Import utility will open with the Import Settings overlay on top.

i. This overlay can be opened again at any time by clicking the Settings button in the top-right corner of the Turbo Import utility.

3. From here, configure the Turbo Import Settings for this case using the options described below for each tab.

NOTICE: Only Passwords (in the Content tab) can be changed later. All other settings are locked once importing begins.

4. Once the desired settings are configured, click OK at the bottom right of the overlay to close it.

5. You are now ready to Start a Turbo Import Session.

 

The following six tabs of settings are available within the Import Settings overlay of the Turbo Import utility:

Content

Compound Documents - Certain file types, known as compound documents, can contain other files embedded within them. For example, a Word document might contain embedded images and/or spreadsheets. LAW supports up to 99 levels of embedded files, which are extracted and imported separately alongside their top-level parent file.

o Expand compound documents - Causes embedded files to be listed separately from their parent file as searchable attachments. This setting cannot be changed once importing begins.

Error Handling - Embedded files sometimes fail to extract during the import process. In these instances, you can have LAW do the following:

o Include native placeholders for extraction errors - Creates a text file with error details in place of any embedded file that fails to extract during the import process.

Passwords - This section allows you to enter any known passwords for password-protected files, which enables LAW to analyze and extract their metadata and content. The original files remain encrypted. Up to 500 passwords can be included, and they are listed in the larger field. Passwords are case-sensitive and can be added or edited both before and after import. To delete passwords, select them in the list and press the DELETE key on your keyboard. There are two ways to add passwords to the list:

1) Manually, by typing or pasting them into the upper text field and clicking the Add (+) button to the right.

2) Automatically, by importing lists of passwords from line-delimited .txt files with the Import... button.
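To illustrate what a line-delimited password list looks like when it is read in, here is a minimal Python sketch. It is not LAW's import code: the function names are hypothetical, and skipping empty lines, ignoring exact duplicates, assuming UTF-8 encoding, and stopping at the 500-password limit are assumptions made for the example. Note that the duplicate check is case-sensitive, matching the case-sensitivity of passwords described above.

```python
MAX_PASSWORDS = 500  # limit stated in the Passwords section above


def parse_password_list(text, existing=()):
    """Add passwords from line-delimited text to an existing list.

    Illustrative assumptions, not documented LAW behavior: empty lines are
    skipped, exact duplicates are ignored (the check is case-sensitive, so
    "Secret" and "secret" are both kept), leading/trailing spaces are kept
    as part of the password, and anything past 500 entries is dropped.
    """
    passwords = list(existing)
    seen = set(passwords)
    for line in text.splitlines():  # handles both \n and \r\n line endings
        if not line or line in seen:
            continue
        if len(passwords) >= MAX_PASSWORDS:
            break
        passwords.append(line)
        seen.add(line)
    return passwords


def load_password_file(path, existing=()):
    # UTF-8 is an assumed encoding for the .txt file (hypothetical helper).
    with open(path, encoding="utf-8") as f:
        return parse_password_list(f.read(), existing)
```

For example, a file containing `Secret`, `secret`, a blank line, and `Secret` again would yield two entries, `Secret` and `secret`.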

Language Analysis:

o Identify language content during analysis - All languages used within the files are identified and analyzed. Only the first 200 MB of each file is analyzed; files smaller than 200 MB are analyzed in full.

o Restrict language identification to common languages - Limits analysis to only the most common languages to help improve accuracy.
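The 200 MB rule above amounts to reading at most a fixed number of leading bytes from each file. The short Python sketch below shows that arithmetic; it is illustrative only, the function names are hypothetical, and treating "200 MB" as 200 × 1024 × 1024 bytes is an assumption, since the unit is not defined here.

```python
SAMPLE_LIMIT = 200 * 1024 * 1024  # "200 MB"; binary megabytes are an assumption


def bytes_analyzed(file_size, limit=SAMPLE_LIMIT):
    """Number of bytes language identification would look at for one file."""
    return min(file_size, limit)


def read_language_sample(path, limit=SAMPLE_LIMIT):
    """Read the leading bytes used for language identification.

    f.read(limit) returns the whole file when it is shorter than `limit`,
    which matches "the entire file should the overall size be less".
    """
    with open(path, "rb") as f:
        return f.read(limit)
```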

Time Zone - Select the desired time zone from the drop-down menu. The default is Coordinated Universal Time (UTC). All files and folders imported into the Case File use the selected time zone for their records.

Filters

Settings within this tab cannot be changed once importing begins.

Deduplication - A process that scans all source files for duplicates (identical copies). Each source file is run through a hashing algorithm, which yields a numerical value (hash) for that file; files that yield identical hash values are considered duplicates. In Turbo Import, deduplication can be performed either globally or within each Custodian, and it is based on family (parent/child) relationships, so embedded (child) files are not deduped against other members of the same family.

o Enable duplicate document detection - Turns on deduplication. If disabled, deduplication can still be performed later with the Deduplication Utility.

o Deduplication Mode - Two primary deduplication modes are available, each with an alternate that compares sources within each Custodian rather than across all Custodians. In most cases, either MD5 (128-bit) or SHA1 (160-bit) provides sufficient deduplication integrity.

o If a document is considered a duplicate, then - Two options are available from this drop-down menu:

Include - Creates a record for the duplicate in the Case Directory and copies the native source file to the Case Database.

Exclude - Does not create a record, no text is extracted, and the native source file is not copied into the Case Database.
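As a rough illustration of hash-based deduplication, the Python sketch below compares top-level files by MD5 or SHA1 digest, either across all Custodians or within each Custodian. It is not LAW's implementation: the `Document` record and the function names are hypothetical, and hashing only the parent file's bytes (so embedded children are never compared against one another) is a simplifying assumption about how family relationships are handled.

```python
import hashlib
from dataclasses import dataclass, field


@dataclass
class Document:
    """Hypothetical record for a top-level (parent) file; children travel with it."""
    doc_id: str
    custodian: str
    content: bytes
    children: list = field(default_factory=list)  # embedded files, not hashed here


def file_hash(data: bytes, algorithm: str = "md5") -> str:
    # hashlib.new accepts "md5" and "sha1", the two modes named above.
    return hashlib.new(algorithm, data).hexdigest()


def find_duplicates(docs, algorithm="md5", per_custodian=False):
    """Return the doc_ids of parents whose hash was already seen.

    Only top-level documents are compared, so a family is kept or
    dropped as a unit. Per-custodian mode scopes the comparison so that
    identical files held by different Custodians are not duplicates.
    """
    seen = set()
    duplicates = []
    for doc in docs:
        digest = file_hash(doc.content, algorithm)
        key = (doc.custodian, digest) if per_custodian else digest
        if key in seen:
            duplicates.append(doc.doc_id)
        else:
            seen.add(key)
    return duplicates
```

With the Exclude option, documents in the returned list would simply not be imported; with Include, they would still get records.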

NIST (NSRL) - The National Institute of Standards and Technology (NIST) maintains and publishes a database of known computer file profiles referred to as a Reference Data Set (RDS), which is compiled by the National Software Reference Library (NSRL). NIST uses this RDS to compare files against known sets of software applications. NIST filtering is used to remove file types that are unlikely to contain useful data, such as system files, executable files, and application logic files.

o Enable NIST (NSRL) detection - Requires a NIST database to be provided through the LAW Configuration Utility.

o If hashes match, then - Select either Include or Exclude from the drop-down menu to determine what happens to NIST items detected during import.

File Type - This section is for manual filtering of files based on specified file types. Filtering targets top-level (parent) files within a Case Database, so it automatically applies to any embedded (child) files they contain. LAW supports the import of all file types, and importing all of them is recommended.

o Enable file type filtering - Turns on manual file type filtering based on the settings established within the File Type Manager.

o File Type Manager - This button opens a separate window dedicated to specifying file types for filtering. Changes made here apply globally to all cases that use manual file type filtering. Each file type may be marked Included, Excluded, both (Exclude takes precedence), or neither (handled by the setting below). You can also assign default applications for opening each file type within LAW.

o Treat file types not specifically included or excluded as - Select either Include or Exclude from the drop-down menu to determine how to handle file types not specified within the File Type Manager.
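The precedence rules above can be summarized as a small decision function. The Python sketch below is illustrative only: `should_import` and its set-of-names arguments are hypothetical and do not reflect how the File Type Manager stores its settings. The result applies to a top-level file, and its embedded children follow the parent.

```python
def should_import(file_type, included, excluded, default="Include"):
    """Decide whether a top-level file of `file_type` is imported.

    `included` and `excluded` are hypothetical sets of file-type names;
    `default` mirrors "Treat file types not specifically included or
    excluded as". Embedded children inherit the parent's result.
    """
    if file_type in excluded:  # Exclude takes precedence, even if also included
        return False
    if file_type in included:
        return True
    return default == "Include"  # neither included nor excluded
```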

Date Range Filtering - This section allows for filtering files based on specified date ranges. This filtering is intentionally over-inclusive: an entire family of files is included if even a single embedded file falls within a specified range. Add date ranges by clicking the Add (+) button to the right of the first range, and remove them by clicking the Remove (-) button to the right of the unwanted range.

o From - Select a start date for each range by clicking the calendar button in this column.

o To - Select an end date for each range by clicking the calendar button in this column.
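The over-inclusive family rule can be expressed as "keep the family if any member's date falls in any range." The Python sketch below shows that logic under stated assumptions: ranges are inclusive on both ends, each family is represented by a hypothetical list of member dates, and members without a date are ignored. Which date field LAW actually compares is not specified here.

```python
from datetime import date


def in_any_range(d: date, ranges) -> bool:
    # Ranges are (start, end) pairs, treated as inclusive (an assumption).
    return any(start <= d <= end for start, end in ranges)


def family_passes(family_dates, ranges) -> bool:
    """Keep the whole family (parent plus embedded files) if ANY member's
    date falls inside ANY range, which is the over-inclusive rule above.

    `family_dates` is a hypothetical list with one date per member;
    members with no date (None) are skipped.
    """
    return any(d is not None and in_any_range(d, ranges) for d in family_dates)
```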

 

CloudNine™ LAW supports the import of all file types. Even if a file type is not supported for printing or conversion, its metadata and text may still be extracted. A full list of file types recognized by both CloudNine™ LAW and CloudNine™ Explore during import can be found in the Supported File Types topic.

Email

Message Format - Several options are available from the drop-down menu:

o Native - Imports all e-mails in their original format. Any e-mail whose number of recipients exceeds the maximum supported by the .msg format is converted to .html, or to .mhtml if it contains images.

o HTML - Imports all e-mails in .html format.

o HTML (MHTML when images are present) - Imports e-mails containing images in .mhtml format and all other e-mails in .html format.

o MHTML - Imports all e-mails in .mhtml format.
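The four Message Format options reduce to a choice of output extension per e-mail. The Python sketch below encodes that choice as described above. It is illustrative only: the function name is hypothetical, the .msg recipient maximum is passed in as `msg_recipient_limit` because this topic does not state the number, and applying the fallback only when the native form is .msg is an assumption.

```python
def email_output_extension(mode, recipient_count, has_images,
                           msg_recipient_limit, native_ext=".msg"):
    """Return the extension an e-mail would be imported as for `mode`."""
    if mode == "Native":
        # Assumption: the recipient-limit fallback only matters for .msg.
        if native_ext != ".msg" or recipient_count <= msg_recipient_limit:
            return native_ext
        return ".mhtml" if has_images else ".html"
    if mode == "HTML":
        return ".html"
    if mode == "HTML (MHTML when images are present)":
        return ".mhtml" if has_images else ".html"
    if mode == "MHTML":
        return ".mhtml"
    raise ValueError(f"unknown message format: {mode}")
```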

Metadata

Metadata Settings - For the following options, the metadata extracted during import is displayed as Extended Properties, each labeled "EP" followed by the name of the field as it exists within the source file. For example, a metadata field named "Location" in a source file appears as "EPLocation" within LAW. The options are as follows:

o Extract custom metadata properties for MS Office documents - Extracts the following metadata from files during import:

Comments in Word, Excel, and PowerPoint

Tracked Changes in Microsoft Word and Excel

Hidden rows and/or columns in Excel

Hidden and Very Hidden worksheets in Excel

Hidden Slides and Speaker Notes in PowerPoint

Publisher and Access files are not supported

o Extract EXIF metadata properties for Image documents - Exchangeable Image File Format (EXIF) is a standard that specifies the format for image, sound, and ancillary tags used by systems that handle the metadata for those files. For example, many image files have EXIF tags for geolocation embedded within them. When this option is enabled, these properties are also extracted.

o Auto-assign suspect extensions - If the file extension of a source file does not match the file type detected by LAW, selecting this option places the detected extension in the DocExt field and the source file's original extension in the OrigExt field during import.

o Identify hidden text - Detects specific forms of text hidden within Word, Excel, and PowerPoint documents. If found, the hidden text is bracketed between <<<Start Hidden Content>>> and <<<End Hidden Content>>> within the extracted text, and the associated records have the HiddenText field set to Y. These types of hidden text can be extracted:

Text hidden inside shape controls, such as text boxes.

Text specifically formatted as hidden.

Hidden spreadsheets, columns, and cells.

Hidden slides.
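To show what the bracketing described under Identify hidden text produces, here is a minimal Python sketch. The input is a hypothetical list of `(text, is_hidden)` segments; how LAW actually detects hidden text in Office files is not shown. The exact spacing around the markers and leaving HiddenText blank when nothing is hidden are assumptions.

```python
START = "<<<Start Hidden Content>>>"
END = "<<<End Hidden Content>>>"


def render_extracted_text(segments):
    """Join (text, is_hidden) segments into extracted text.

    Hidden segments are wrapped in the start/end markers. Returns the text
    and the HiddenText value: "Y" if anything hidden was found, otherwise
    an empty string (assumed; the blank value is not documented).
    """
    parts = []
    found_hidden = False
    for text, is_hidden in segments:
        if is_hidden:
            found_hidden = True
            parts.append(f"{START}{text}{END}")
        else:
            parts.append(text)
    return "".join(parts), ("Y" if found_hidden else "")
```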

Output

Settings established here are reflected within the Case Directory pane of the Main User Interface.

Numbering Seed - The seed is the starting value for the unique DocID of each record and increments for each record added. For subsequent import sessions, the seed automatically defaults to the next value after the last successfully imported source. The seed can be a mixture of numbers, letters, and symbols, and may contain up to 50 characters.

o Alpha-numeric - When this option is enabled, seeds ending in letters continue incrementing alphabetically instead of numerically.

o Number Attachments - When this option is enabled, an additional set of bracketed numbers is added to the end of each record's DocID, based on the total number of embedded or attached files within that family. Attached archives may not appear in the list in LAW, but they are still stored and numbered in the SQL Database.

o Preview - Displays the DocID to be expected for each record, based on the starting seed and the options enabled above.
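To make the seed behavior concrete, the Python sketch below computes the DocID that follows a given one. The rules are assumptions for illustration, not LAW's documented algorithm: trailing digits are incremented with their zero-padding preserved, and with the alpha-numeric option a trailing run of uppercase letters is incremented like a spreadsheet column (Z carries into the next letter). Seeds ending in letters without the alpha-numeric option are rejected in this sketch only because their handling is not described here. Attachment numbering is not modeled.

```python
import re

MAX_SEED_LENGTH = 50  # limit stated for the Numbering Seed


def next_doc_id(doc_id, alphanumeric=False):
    """Return the DocID that follows `doc_id` (hypothetical rules)."""
    if len(doc_id) > MAX_SEED_LENGTH:
        raise ValueError("seed may contain up to 50 characters")
    m = re.search(r"\d+$", doc_id)
    if m:
        digits = m.group()
        # Keep zero-padding (ABC0009 -> ABC0010); overflow widens (ABC99 -> ABC100).
        return doc_id[: m.start()] + str(int(digits) + 1).zfill(len(digits))
    m = re.search(r"[A-Z]+$", doc_id)
    if alphanumeric and m:
        letters = list(m.group())
        i = len(letters) - 1
        while i >= 0 and letters[i] == "Z":
            letters[i] = "A"  # carry into the letter to the left
            i -= 1
        if i >= 0:
            letters[i] = chr(ord(letters[i]) + 1)
        else:
            letters.insert(0, "A")  # every letter was Z: grow by one (ZZ -> AAA)
        return doc_id[: m.start()] + "".join(letters)
    raise ValueError("this sketch expects a seed ending in digits "
                     "(or uppercase letters when alphanumeric=True)")
```

For example, `next_doc_id("ABC0009")` gives `"ABC0010"`, and `next_doc_id("DOC-AAZ", alphanumeric=True)` gives `"DOC-ABA"`.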

LAW Folder Structure - The folder structure of all sources being imported is organized according to the levels selected here. Each additional level lengthens the file path recorded for each item, so it is recommended to add levels only when needed. The following levels are available: Custodian, Source, and Import Set.

Email Sort Order - Use this drop-down to determine how e-mail records are organized.

Preview - Displays the resulting folder structure to be expected in the Case Directory based on the levels established above.
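The effect of folder levels on path length can be sketched by building a path from the selected levels. In the Python example below, the level order (Custodian, Source, Import Set), the dictionary describing an item, and the function name are assumptions for illustration; `PureWindowsPath` is used only to produce Windows-style separators and does not touch the file system.

```python
from pathlib import PureWindowsPath

# Level order is an assumption based on the order the levels are listed above.
LEVEL_ORDER = ("Custodian", "Source", "Import Set")


def preview_folder(item, selected_levels, relative_path):
    """Build a Case Directory-style path for one imported file.

    `item` is a hypothetical mapping of level names to values; only the
    levels in `selected_levels` contribute folders, so each extra level
    adds one more component (and more characters) to the path.
    """
    parts = [item[level] for level in LEVEL_ORDER if level in selected_levels]
    return PureWindowsPath(*parts, relative_path)
```

For an item with Custodian `Smith` and Source `Laptop`, selecting those two levels yields `Smith\Laptop\...` ahead of the file's own relative path.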

Agents

This tab controls the number of agents assigned to a case or an ingestion.

Maximum Ingestion Agents - Sets the number of agents used in all ingestion stages except OCR, Native Extraction, and Populating the LAW case. Note that this setting is directly tied to the number of Import Agents you have purchased.

o NOTE: For this setting, each Turbo agent consumes one machine core and one Turbo Import license.

Enable Turbo OCR on Ingestion - Turns on the Turbo OCR application, which automatically queues PDFs and image-based documents for OCR. Turbo OCR uses the ABBYY FineReader engine.

o NOTE: Each workstation in the Turbo pool must have the ABBYY engine installed and licensed.

o Maximum Turbo OCR Agents on Ingestion - The OCR engine is multi-threaded and uses all the cores on a workstation. If you set a specific number here, it is best to choose it at the workstation level (in terms of whole workstations rather than individual cores).

Maximum Turbo Agents used in the Native Extraction and Populating the LAW case stages - Sets the number of agents used in the Native Extraction and Populating the LAW case stages.

o NOTE: Each agent in these stages consumes one available core, but these stages are not tied to any licenses. Because every environment is different, choose the number that works best for your environment.
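The notes in this tab imply a simple ceiling on how many agents can actually run. The Python sketch below expresses that arithmetic under stated assumptions: each agent uses one core, cores are counted across all workstations in the Turbo pool, and a license cap applies only to ingestion agents (pass `licenses=None` for the Native Extraction and Populating the LAW case stages). Turbo OCR is left out because its engine uses every core on a workstation. The function name is hypothetical.

```python
def usable_agents(requested, cores_per_workstation, licenses=None):
    """Cap a requested agent count by the resources each agent consumes.

    Assumptions for illustration: one core per agent, summed across the
    Turbo pool; `licenses` is the number of Import Agent licenses for
    ingestion stages, or None for stages not tied to licenses.
    """
    limit = min(requested, sum(cores_per_workstation))
    if licenses is not None:
        limit = min(limit, licenses)
    return max(0, limit)
```

For example, requesting 16 ingestion agents on two 8-core workstations with 4 licenses would leave 4 usable agents, while the same request for the extraction stages would allow all 16.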