Near-Duplicate & Email Thread Analysis

Near duplicates are documents whose content largely overlaps but that are not necessarily exact duplicates of each other. An email thread is the set of email messages belonging to the same conversation: the original message, all replies to it, and any attachments to those messages. In CloudNine™ Explore, you can run near-duplicate analysis and email thread analysis to identify the near-duplicate documents and email threads within a case. The near-duplicate analysis uses the CloudNine Near Dupe engine, which relies on the MinHash algorithm to scale to large CloudNine™ Explore cases. The email thread analysis uses the CloudNine Email Thread engine.
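
To give a sense of how MinHash estimates similarity, the following Python sketch builds a signature from word shingles and compares two signatures. It illustrates the general technique only and is not the CloudNine Near Dupe engine's implementation. The function names, the three-word shingle size, and the 64-hash signature length are all assumptions chosen for the example.

```python
import hashlib


def shingles(text, k=3):
    """Return the set of k-word shingles in text (k=3 is an assumed size)."""
    words = text.lower().split()
    if len(words) < k:
        return {" ".join(words)}
    return {" ".join(words[i:i + k]) for i in range(len(words) - k + 1)}


def minhash_signature(items, num_hashes=64):
    """For each seeded hash function, keep the minimum hash value over items."""
    signature = []
    for seed in range(num_hashes):
        prefix = seed.to_bytes(4, "big")  # the seed makes each hash function distinct
        signature.append(min(
            int.from_bytes(
                hashlib.blake2b(prefix + s.encode("utf-8"), digest_size=8).digest(),
                "big",
            )
            for s in items
        ))
    return signature


def estimated_similarity(sig_a, sig_b):
    """Fraction of signature positions that agree (estimates Jaccard similarity)."""
    matches = sum(1 for a, b in zip(sig_a, sig_b) if a == b)
    return matches / len(sig_a)


doc_a = "the quarterly report is attached please review it before the meeting on friday"
doc_b = "the quarterly report is attached please review it before the meeting on monday"
score = estimated_similarity(
    minhash_signature(shingles(doc_a)),
    minhash_signature(shingles(doc_b)),
)
print(f"Estimated similarity: {score:.2f}")
```

The fraction of matching signature positions approximates the Jaccard similarity of the two shingle sets, so short fixed-size signatures can be compared instead of full document text. A comparison threshold then decides which pairs count as near duplicates.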

Before you can view near duplicates for a CloudNine™ Explore case, you must define the comparison threshold, enable near-duplicate analysis, and then run the near-duplicate analysis in CloudNine™ Explore. Before you can view email threads for a CloudNine™ Explore case, you must enable and run email thread analysis in CloudNine™ Explore.

Once near-duplicate analysis and email thread analysis have been enabled and run in CloudNine™ Explore, you can view their results in CloudNine™ Explore Web. For more information, see the Explore Web Running Near-Duplicate & Email Thread Analysis topic.

Note

To take full advantage of the near-duplicate and email thread analysis functionality available from CloudNine products, you must have both CloudNine™ Explore and CloudNine™ Explore Web.

Note

The near-duplicate analysis and email thread analysis can only run on one machine at a time.

To use the near-duplicate analysis and email thread analysis utilities in CloudNine™ Explore, you need the Near-Duplicate & Email Thread license. This license is used by CloudNine™ LAW only on an as-needed basis, so it is displayed only in the LAW Profile Manager (Administration Mode). It is not displayed in the license list of the LAW Management Console (LMC) or in the License Information dialog box (Help > About LAW > Licenses).

 

Near-Duplicate Fields

After running the near-duplicate analysis in CloudNine™ Explore, the following fields are populated with the applicable near-duplicate data for the case's documents and are available for native and native subset exports in CloudNine™ Explore. For exports, the fields are located under the Near Duplicate category in the Select Export Fields dialog box. The fields are as follows:

ND_IsMaster

ND_Master

ND_Score

Note

If near-duplicate fields are added to a native export and the near-duplicate analysis results are out-of-date, the export will generate an error indicating that the near-duplicate analysis needs to be run before the export can proceed.

Note

If a master document in a set of near-duplicate documents is deleted, all documents assigned to the near-duplicate set are released, and will be reassigned the next time the near-duplicate analysis process runs.
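
As an illustration of how the exported near-duplicate fields might be used downstream, the following Python sketch groups export rows by ND_Master. It rests on several assumptions that this topic does not confirm: that the export is a CSV file, that a DocID column identifies each document, that ND_Master holds the identifier of the set's master document, and that ND_Score is numeric. The sample data and the group_by_master function are hypothetical. Check the CloudNine™ Explore Export Field Descriptions topic for the actual field definitions.

```python
import csv
import io
from collections import defaultdict

# Hypothetical export excerpt; the column layout and values are illustrative only.
SAMPLE_EXPORT = """DocID,ND_IsMaster,ND_Master,ND_Score
D001,Y,D001,100
D002,N,D001,93
D003,N,D001,88
D004,Y,D004,100
"""


def group_by_master(csv_text):
    """Map each ND_Master value to (DocID, ND_Score) pairs, highest score first."""
    groups = defaultdict(list)
    for row in csv.DictReader(io.StringIO(csv_text)):
        groups[row["ND_Master"]].append((row["DocID"], float(row["ND_Score"])))
    for members in groups.values():
        members.sort(key=lambda member: member[1], reverse=True)
    return dict(groups)


groups = group_by_master(SAMPLE_EXPORT)
for master, members in groups.items():
    print(master, members)
```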

 

Email Thread Fields

After running the email thread analysis in CloudNine™ Explore, the following fields are populated with the applicable email thread data for the case's documents and are available for native and native subset exports in CloudNine™ Explore. For exports, the fields are located under the Email\Threading category in the Select Export Fields dialog box. The fields are as follows:

ConversationID

ConversationIndex

ConversationTopic

ET_Inclusive

ET_InclusiveReason

ET_Indent

ET_IsMessage

ET_MessageId

ET_ParentId

ET_ThreadId

ET_ThreadIndex

ET_ThreadSize

ET_ThreadSort

Note

If email thread fields are added to a native export and the email thread analysis results are out-of-date, the export will generate an error indicating that the email thread analysis needs to be run before the export can proceed.
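
As an illustration of how the exported email thread fields might be combined, the following Python sketch rebuilds an indented outline for each thread. It assumes, without confirmation from this topic, that ET_ThreadId groups the messages of one thread, that ET_ThreadSort gives a sortable position within the thread, and that ET_Indent is the reply depth. The DocID column, the sample rows, and the thread_outlines function are hypothetical. Check the CloudNine™ Explore Export Field Descriptions topic for the actual field definitions.

```python
from collections import defaultdict

# Hypothetical export rows; values are illustrative only.
SAMPLE_ROWS = [
    {"DocID": "E003", "ET_ThreadId": "T1", "ET_ThreadSort": "0003", "ET_Indent": "2"},
    {"DocID": "E001", "ET_ThreadId": "T1", "ET_ThreadSort": "0001", "ET_Indent": "0"},
    {"DocID": "E010", "ET_ThreadId": "T2", "ET_ThreadSort": "0001", "ET_Indent": "0"},
    {"DocID": "E002", "ET_ThreadId": "T1", "ET_ThreadSort": "0002", "ET_Indent": "1"},
]


def thread_outlines(rows):
    """Return {thread id: [indented DocID lines]} ordered by ET_ThreadSort."""
    outlines = defaultdict(list)
    # Sort values are compared as strings, so this assumes zero-padded sort keys.
    for row in sorted(rows, key=lambda r: (r["ET_ThreadId"], r["ET_ThreadSort"])):
        indent = "  " * int(row["ET_Indent"])  # two spaces per assumed reply level
        outlines[row["ET_ThreadId"]].append(indent + row["DocID"])
    return dict(outlines)


for thread_id, lines in thread_outlines(SAMPLE_ROWS).items():
    print(thread_id)
    for line in lines:
        print("  " + line)
```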

 

For more information about these and other CloudNine™ Explore fields, see CloudNine™ Explore Export Field Descriptions.