Opening and Configuring Turbo Import

1.Start from the Menu of the Main User Interface by selecting File > Import > Turbo Import.

2.The Main User Interface will close, and the Turbo Import utility will open, with the Import Settings overlay on top.

	The Import Settings overlay window automatically opens the first time you launch Turbo Import for a specific case. You can return to the Import Settings overlay by clicking the Settings button located in the top-right corner of the Turbo Import utility. Most Turbo Import Settings are locked and cannot be changed during or after the initial ingestion of data.

3.From here, you can configure your Turbo Import Settings for this case based on the options shown below for each tab.

	Only Passwords (in the Content tab) can later be changed.

Tabs:

4.Once you have the desired settings configured, click on OK at the bottom right of the overlay to close it.

5.You are now ready to Start a Turbo Import Session.

There are seven settings tabs available within the Import Settings overlay of the Turbo Import utility, as shown and described below.

Content

•Compound Documents - Certain file types, known as compound documents, can contain other files embedded within them. For example, a Word document might contain embedded images and/or spreadsheets. LAW supports up to 99 levels of embedded files that are extracted and imported separately alongside their top-level parent file.

oExpand compound documents - Causes embedded files to be listed separately from their parent file as searchable attachments.

•Error Handling - Embedded files sometimes fail to extract during the import process. In these instances, you can decide to have LAW do the following:

oInclude native placeholders for extraction errors - Creates a text file with error details in place of any embedded file that fails to extract during the import process.

•Passwords - This section allows you to enter any known passwords for password-protected files (Adobe Acrobat PDFs, Pkzip, Zipx, 7zip archives, and all Office files), which enables LAW to analyze and extract their metadata and content. The original files remain encrypted, but the ingested files are unencrypted. Up to 500 passwords can be included, and they are listed in the larger field. Passwords are case-sensitive and can be added or edited both before and after import. Delete passwords by selecting them from the list and pressing the DELETE key on your keyboard. There are two ways to add passwords to the list:

1)Manually, by typing or copying them into the upper text field and clicking on the Add(+) button to the right

2)Automatically, via lists of passwords which are imported from line-delimited .txt files with the Import... button.

	The Turbo Import Password Bank is also used during Batch Process - Turbo Imager.
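
A password list for the Import... button can be prepared ahead of time. The following is a minimal sketch, not part of LAW. It assumes only what is stated above: one password per line in a .txt file, case-sensitive entries, and a maximum of 500 entries. The function names and the UTF-8 encoding are assumptions made for illustration.

```python
from pathlib import Path

MAX_PASSWORDS = 500  # limit stated above for the password list


def build_password_list(candidates):
    """Return unique, non-empty passwords in first-seen order.

    Comparison is case-sensitive, so "Secret" and "secret" are both kept.
    Raises ValueError if more than MAX_PASSWORDS unique entries remain.
    """
    seen = set()
    result = []
    for raw in candidates:
        pw = raw.rstrip("\r\n")  # strip line endings only; spaces may matter
        if pw and pw not in seen:
            seen.add(pw)
            result.append(pw)
    if len(result) > MAX_PASSWORDS:
        raise ValueError(
            f"{len(result)} passwords exceed the limit of {MAX_PASSWORDS}"
        )
    return result


def write_password_file(passwords, path):
    """Write one password per line (UTF-8 is an assumption, not documented)."""
    Path(path).write_text("\n".join(passwords) + "\n", encoding="utf-8")
```

For example, build_password_list(["Secret", "secret", "Secret"]) keeps the first two entries and drops the repeated one.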

•Language Analysis:

oIdentify language content during analysis - All languages used within the files will be identified and analyzed. The first 200 MB of each file is analyzed during this process, or the entire file should the overall size be less than 200 MB. For more information on the languages please visit this site.

oRestrict language identification to common languages - Limits analysis to only the most recognizable languages to help improve accuracy. For more information on the restricted languages please visit the bottom of this site.

•Time Zone - Select your desired time zone from the drop-down menu here. Default is Coordinated Universal Time (UTC). All files and folders imported into the Case File will use the selected time zone for their records.

Supported Embedded File Types

The following file types are supported for extraction from compound documents:

*A Package is a general type of embed; it can be a text file or a zip file, for example. Any of the above types may also be embedded as a package type depending on the software installed when a user embeds the file. For example, if a user were to embed an Excel spreadsheet into a Word document, and Excel is not installed, the spreadsheet will be embedded as Package.

Supported container file types and embedded files

The following table lists common embedded file types that LAW supports for extraction:

Description	Detection	Extraction
Non-Microsoft Office Formats
Adobe Acrobat (pdf)	Y	Y
Rich text format (rtf) *Converted to Word format for extraction. Original file is preserved.	Y	Y
Excel Spreadsheet (OpenXml, xml) & Excel *Compound documents not supported in xml format	Y	Y
MS Office Data File (OpenXml)	Y	Y
PowerPoint Presentation (OpenXml) and PowerPoint *Compound documents not fully supported in OpenXml format	Y*	Y*
Word (OpenXml, xml) and Word	Y	Y
OneNote	N	N
Project (xml) and Project *Compound documents not fully supported in xml format	Y	Y
Publisher	Y**	Y
Visio (xml) and Visio *Currently not recognized by file engine	Y*	N

**Detection of embeds in these types is limited to the types of files supported for extraction (see above list).

Filters

•Deduplication - A process that scans all source files for any duplicates (identical copies). This is done by subjecting all source files to a hashing process, which yields unique numerical values (hashes) for each file. Files yielding identical hash values are considered duplicates. In Turbo Import, this process can be performed either globally or within a Custodian, and is based upon family (parent/child) relationships, so embedded (child) files will not be deduped against other components of the same family.

oEnable duplicate document detection - Turns on deduplication. If disabled, deduping can still be performed later via the Deduplication Utility.

oDeduplication Mode - There are two primary modes for deduplication available, each with alternates for comparing sources within each Custodian rather than across all Custodians. In most cases, either MD5 (128-bit) or SHA1 (160-bit) will provide sufficient deduplication integrity.

oIf a document is considered a duplicate, then - Two options are available from this drop-down menu:

▪Include - Creates a record for the duplicate in the Case Directory and copies the native source file to the Case Database.

▪Exclude - Does not create a record, no text is extracted, and the native source file is not copied into the Case Database.

In addition to deduplicating prior to the import process, LAW also allows you to deduplicate at these other times in a post-discovery workflow:

•After the import against other records in the case by using the Deduplication Utility.

•After the import against other records in the case and other LAW cases by using Inter-Case Deduplication.
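
The hash comparison described above can be illustrated with a short sketch. This is not LAW's implementation: it hashes only the raw bytes of top-level files, ignores family handling, and uses hypothetical names (file_hash, find_duplicates) and a hypothetical input shape of (custodian, name, content) tuples.

```python
import hashlib
from collections import defaultdict


def file_hash(data: bytes, algorithm: str = "md5") -> str:
    """Hash file content with MD5 (128-bit) or SHA1 (160-bit)."""
    if algorithm not in ("md5", "sha1"):
        raise ValueError("algorithm must be 'md5' or 'sha1'")
    return hashlib.new(algorithm, data).hexdigest()


def find_duplicates(files, algorithm="md5", per_custodian=False):
    """Group files by hash and return only the groups with duplicates.

    files: iterable of (custodian, name, content_bytes) tuples.
    per_custodian=False compares across all custodians (global);
    per_custodian=True compares only within each custodian.
    """
    groups = defaultdict(list)
    for custodian, name, content in files:
        digest = file_hash(content, algorithm)
        key = (custodian, digest) if per_custodian else digest
        groups[key].append(name)
    return [names for names in groups.values() if len(names) > 1]
```

With per_custodian=True, two identical files held by different Custodians are not reported as duplicates of each other, which mirrors the difference between the global and Custodian-level modes.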

•NIST (NSRL) - The National Institute of Standards and Technology (NIST) maintains and publishes a database of known computer file profiles referred to as a Reference Data Set (RDS), which is compiled by the National Software Reference Library (NSRL). The NIST uses this RDS to compare files against known sets of software applications. NIST filtering is used to remove file types that are unlikely to have useful data. Examples of such file types include system files, executable files, and application logic files.

oEnable NIST (NSRL) detection - Requires a NIST database to be provided through the LAW Configuration Utility.

oIf hashes match, then - Select either Include or Exclude from the drop-down menu to determine what happens with NIST items detected during import.

Changes in the NIST list are global and will apply to new imports in other cases, both Turbo Import and Electronic Discovery cases.

•File Type - This section is for manual filtering of files based on specified file types. Filtering is targeted to top-level (parent) files within a Case Database, thus applying automatically to any embedded (child) files contained within. LAW supports the import of all file types (recommended).

oEnable file type filtering - Turns on manual file type filtering based on settings established within the File Type Manager.

oFile Type Manager - This button opens a separate window dedicated to specifying file types for filtering. Changes made here apply globally to all cases using manual file type filtering. Certain file types may be Included, Excluded, both (Exclude takes precedence), or neither (determined below). You can also assign default applications for opening each file type within LAW.

oTreat file types not specifically included or excluded as - Select either Include or Exclude from the drop-down menu to determine how to handle file types not specified within the File Type Manager.

At present, CloudNine LAW is unable to conduct NIST and file type filtering when processing UFDR files.

To Edit the File Type Manager

1.Select Enable file type filtering.

2.Select File Type Manager.

3.The Manage File Types window opens.

4.Configure file inclusion and exclusion lists, and other options:

oInc. selected - all documents and database records with Inc. will be written to the LAW database.

oExc. selected - the file, its metadata, and its associated text are not written to the LAW database.

oBoth Inc. and Exc. selected - exclude takes precedence over the include option and the file, its metadata, and its associated text are not written to the LAW database.

oNeither Inc. nor Exc. selected - the status is determined by the setting selected in Treat file types not specifically included or excluded as.

oAssign default source applications for each file type.

Changes in the File Type Manager are global and will apply to new imports in other cases, both Turbo Import and Electronic Discovery cases.
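
The Inc./Exc. rules above reduce to a small decision function. The sketch below is illustrative only; the function name should_import and its parameters are hypothetical and do not correspond to any LAW API.

```python
def should_import(inc: bool, exc: bool, default_action: str) -> bool:
    """Decide whether a top-level file is written to the LAW database.

    inc / exc mirror the Inc. and Exc. check boxes for the file type.
    default_action is "Include" or "Exclude", the value chosen for
    "Treat file types not specifically included or excluded as".
    """
    if default_action not in ("Include", "Exclude"):
        raise ValueError("default_action must be 'Include' or 'Exclude'")
    if exc:
        return False  # Exclude wins, even when Inc. is also selected
    if inc:
        return True
    return default_action == "Include"  # neither box selected
```

Because filtering targets top-level (parent) files, the result for a parent would also apply to its embedded (child) files.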

•Date Range Filtering - This section allows for filtering based on specified date ranges for files. This filtering is overly inclusive, so entire families of files will be included if even a single embedded file falls within the specified range. Add date ranges by clicking the Add(+) button to the right of the first range, and remove them by clicking the Remove(-) button to the right of the unwanted range.

oFrom - Select a start date for each range by clicking on the appropriate calendar button located in this column.

oTo - Select an end date for each range by clicking on the appropriate calendar button located in this column.
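
The family-level behavior of date range filtering can be sketched as follows. This is an illustration under assumptions: the From and To dates are treated as inclusive, and each family is represented simply as a list of member dates. Neither detail is stated above, and the function names are hypothetical.

```python
from datetime import date


def in_any_range(d, ranges):
    """True if d falls inside at least one (start, end) range, inclusive."""
    return any(start <= d <= end for start, end in ranges)


def keep_family(member_dates, ranges):
    """Keep the whole family if any member's date falls within a range."""
    return any(in_any_range(d, ranges) for d in member_dates)
```

A family whose parent is dated outside every range is still kept when one embedded file falls inside a range, which is why the results can contain more files than the ranges alone suggest.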

CloudNine™ LAW supports import of all file types. Even if a file type is not supported for printing or conversion, metadata and text may still be extracted. A full list of the file types recognized by both CloudNine™ LAW and CloudNine™ Explore during import can be found here: Supported File Types.

•Message Format - There are several options available from the drop-down menu:

oNative - Import all e-mails in their original format. Any e-mails containing a number of recipients exceeding the maximum supported by .msg are either converted to .html, or to .mhtml for those containing images.

oHTML - Import all e-mails in .html format.

oHTML (MHTML when images are present) - Import all e-mails containing images in .mhtml format, otherwise import them in .html format.

oMHTML - Import all e-mails in .mhtml format.
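
The four Message Format options can be summarized as a lookup. The sketch below is hypothetical: the function name is invented, and because the maximum recipient count supported by .msg is not stated here, it is passed in as a simple flag rather than computed.

```python
def email_output_format(setting, has_images, exceeds_msg_recipient_limit=False):
    """Return "native", "html", or "mhtml" for one e-mail.

    setting is the text of the Message Format drop-down option.
    exceeds_msg_recipient_limit is supplied by the caller (assumption).
    """
    if setting == "Native":
        if not exceeds_msg_recipient_limit:
            return "native"
        # Too many recipients for .msg: fall back to a converted format
        return "mhtml" if has_images else "html"
    if setting == "HTML":
        return "html"
    if setting == "HTML (MHTML when images are present)":
        return "mhtml" if has_images else "html"
    if setting == "MHTML":
        return "mhtml"
    raise ValueError(f"unknown Message Format setting: {setting!r}")
```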

Metadata

•Metadata Settings - For the following options, the metadata extracted during import is later displayed in Extended Property fields, each labeled as "EP" followed by the name of the field as it exists within the source file. For example, a source metadata field named "Location" is labeled "EPLocation" within LAW. The options are as follows:

oExtract custom metadata properties for MS Office documents - Causes the following metadata to be extracted from files during import:

▪Comments in Word, Excel, and PowerPoint

▪Tracked Changes in Microsoft Word and Excel

▪Hidden rows and/or columns in Excel

▪Hidden and Very Hidden worksheets in Excel

▪Hidden Slides and Speaker Notes in PowerPoint

▪Publisher and Access files are not supported

The metadata extracted will be populated in Extended Property fields in LAW. The extended property field names will start with EP followed by the name of the field as it exists in the source document.

•Example: if a Word document is imported that contains a custom metadata field called Typist, LAW creates a metadata field during the import called EPTypist.

oExtract EXIF metadata properties for Image documents - Exchangeable Image File Format (EXIF) is a standard that specifies the format for image, sound, and ancillary tags used by systems that handle the metadata for those files. For example, many image files have EXIF tags for geolocation embedded within them. When enabled, these properties will also be extracted.

The metadata extracted will be populated in Extended Property fields. The extended property field names will start with EP followed by the name of the field as it exists in the source document.

•Example: if a PNG file is imported that contains a custom metadata field called Colors, LAW creates a metadata field during the import called EPColors.

oAuto-assign suspect extensions - If the file extension for a source file does not match the file type detected by LAW, then selecting this option will place the detected extension in the DocExt field and the source file extension in the OrigExt field during import.

oIdentify hidden text - Detects specific forms of text hidden within Word, Excel, and PowerPoint documents. If found, the hidden text will be bracketed between <<<Start Hidden Content>>> and <<<End Hidden Content>>> within the extracted text. Associated records will also have the HiddenText field set to Y. These types of hidden text can be extracted:

▪Text hidden inside shape controls, such as text boxes.

▪Text specifically formatted as hidden.

▪Hidden spreadsheets, columns, and cells.

▪Hidden slides.

Custom and EXIF metadata extraction, as well as the detection of hidden text, can also be performed after records are ingested through Batch Process – Document Processing / Analysis.
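
When reviewing exported text outside of LAW, the hidden-content markers described above can be located with a regular expression. The sketch below assumes the markers appear exactly as written above and are not nested; the function name hidden_segments is hypothetical.

```python
import re

# Non-greedy match so each Start marker pairs with the nearest End marker
HIDDEN_BLOCK = re.compile(
    r"<<<Start Hidden Content>>>(.*?)<<<End Hidden Content>>>",
    re.DOTALL,
)


def hidden_segments(extracted_text: str):
    """Return the text found between each pair of hidden-content markers."""
    return [segment.strip() for segment in HIDDEN_BLOCK.findall(extracted_text)]
```

An empty list means no marker pairs were found in the supplied text.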

Modern Data

The Modern Data tab provides options for importing data collected from mobile devices. CloudNine LAW supports Cellebrite UFDR files, which are containers for parsed mobile device data, including chats, contacts, and images.

To ingest UFDR files into CloudNine LAW, the UFDR file must be a top-level item and cannot reside within any container file, such as an attachment or archive file (zip, rar, etc.).

Import Type

There are three Import Types: Chats, Call Logs, and Voice Mails. By default, all options are selected (checked). Uncheck any Import Type you do not want to include. At least one type must be selected.

File Attachments

Do Not Import File Attachments larger than _________ MB.

Option to exclude attachments that are over a specified size.

•If blank (empty), all file attachments are imported into LAW.

•If zero (0) is entered, no attachments are imported.

•If a value such as five (5) is entered, any file attachments larger than 5 MB are excluded from import.
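
The three cases above can be written as a single check. This is a sketch only: the function name is hypothetical, and accepting decimal values in the field is an assumption that the description above does not confirm.

```python
def import_attachment(size_mb: float, limit_field: str) -> bool:
    """Decide whether one attachment is imported.

    limit_field is the raw text typed into the "larger than ___ MB" box.
    """
    text = limit_field.strip()
    if text == "":
        return True  # blank: every attachment is imported
    limit = float(text)  # assumption: numeric input, decimals tolerated
    if limit < 0:
        raise ValueError("size limit cannot be negative")
    if limit == 0:
        return False  # zero: no attachments are imported
    return size_mb <= limit  # only attachments larger than the limit are excluded
```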

Recommendation Warning: When creating and testing RSMF files, we experienced issues when the attachment zip file exceeded 500 MB. Based on this limitation, we strongly recommend filtering out any individual attachments larger than 100 MB during the Turbo Import process. This helps to create a workable RSMF. For more information on RSMFs, please see this page.

Output

Settings established here are later reflected within the Case Directory pane of the Main User Interface.

•Numbering Seed - This seed is used as the starting value for the unique DocID of each record listed, increasing in value for each record added. The seed automatically defaults to the next incremental value for subsequent import sessions based on the last successfully imported source. This value can be a mixture of numbers, letters, and symbols, and may contain up to 50 characters.

oAlpha-numeric - With this option enabled, seeds ending in alphabetical characters will continue incrementing alphabetically instead of numerically. (ABC001a, ABC001b, ABC001c.)

oNumber Attachments - With this option enabled, an additional set of bracketed numbers is added to the end of each record based on the total number of embedded or attached files within that family. Attached archives may not appear listed in LAW, though they will still be stored and numbered in the SQL Database.

oUse nested filename for items extracted from archive - With this option enabled, the archive filename will precede the filename of each extracted item in the Filename field, as follows: archive.zip?filename.doc

oPreview - Displays the incremental DocID to be expected for each record based on the starting seed and options enabled above.
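
The incrementing behavior can be approximated with the sketch below. It covers only the two cases described above (a seed ending in digits, and a seed ending in a letter with Alpha-numeric enabled). What happens after "z", and whether the number of digits may grow past the seed's width, are not described here, so the sketch raises an error for the former and lets the width grow for the latter as an assumption. The function name next_doc_id is hypothetical.

```python
import re


def next_doc_id(current: str, alpha_numeric: bool = False) -> str:
    """Return the ID that follows `current` under the assumptions above."""
    if alpha_numeric and current and current[-1].isascii() and current[-1].isalpha():
        last = current[-1]
        if last in "zZ":
            raise ValueError("rollover past 'z' is not covered by this sketch")
        return current[:-1] + chr(ord(last) + 1)  # ABC001a -> ABC001b
    match = re.search(r"[0-9]+$", current)
    if not match:
        raise ValueError("seed does not end in digits")
    digits = match.group(0)
    # Preserve zero padding: 001 -> 002, 099 -> 100
    incremented = str(int(digits) + 1).zfill(len(digits))
    return current[: match.start()] + incremented
```

Calling the function repeatedly from the seed produces a preview list similar in spirit to the Preview field, for example ABC001, ABC002, ABC003.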

•LAW Folder Structure - The organizational folder structure of all sources being imported is grouped according to the levels selected here. Each additional level adds to the length of the file path recorded within, so it is recommended to add levels only when needed. The following levels are available: Custodian, Source, and Import Set.

For folder examples, please see this page: LAW Folder Structure Examples.

•Email and Communication Sort Order - Use the drop-down to choose how emails and communication records are organized, either Oldest to Newest or Newest to Oldest.

•Preview - Displays the resulting folder structure to be expected in the Case Directory based on the levels established above.

Agents

Here you can control the agent level to assign to an import session.

•Maximum Ingestion Agents - Controls the maximum number of agents used in all ingestion stages except OCR, Native Extraction, and Populating the LAW case.

The number of Ingestion Agents available is based on the number of Turbo Import licenses purchased. Each turbo agent uses one turbo license and one machine core.

•Enable Turbo OCR on Ingestion - Turns on the Turbo OCR application. When enabled, PDFs and image-based documents automatically go through OCR.

The Turbo OCR application uses the ABBYY FineReader engine. Each Turbo workstation in the Turbo pool must have the ABBYY engine installed and have an active license.

oMaximum Turbo OCR Agents on Ingestion - This engine is multi-threaded and will engage all the cores on a workstation. If you set this to a number, it is best to choose that number at the workstation level.

•Maximum Turbo Agents used in the Native Extraction and Populating the LAW case stages - Controls the number of agents used in the Native Extraction and Populating the LAW case stages.

This stage will consume one available core, but is not tied to any licenses. Each environment is different and should be configured accordingly.