Near duplicates are documents that have duplicate content, but are not necessarily exact duplicates of each other. Email threads are email messages belonging to the same email conversation thread. An email thread includes the original email message, all of the subsequent replies pertaining to the original email, and any attachments to the email messages. In CloudNine™ LAW, you can run the Near-Duplicate & Email Thread Analysis utility to identify the near-duplicate documents and email threads within a case. The Near-Duplicate & Email Thread Analysis utility uses the CloudNine Near Dupe and CloudNine Email Thread engines.
The Near-Duplicate & Email Thread Analysis utility can only run on SQL LAW cases that have been enabled for electronic discovery. The utility is not supported for Access cases.
The Near-Duplicate & Email Thread Analysis utility can only run on one machine at a time.
To use the Near-Duplicate & Email Thread Analysis utility, you need the Near-Duplicate/Email Thread license. The same license is also required to view near-duplicate results and status, and to compare near-duplicate documents, in the Duplicate Viewer. LAW uses the Near-Duplicate & Email Thread license only on an as-needed basis, so the license is displayed only in the LAW Profile Manager (Administration Mode). The license is not displayed in the license list in the LAW Management Console (LMC) or in the License Information dialog box (Help > About LAW > Licenses).
After running the near-duplicate analysis, you can view the near-duplicate results and compare near-duplicate documents from the Near-Duplicates tab in the Duplicate Viewer dialog box.
The Duplicate Viewer indicates the case's near-duplication status, the number of near-duplicate documents associated with the selected document, and the percentage of similarity between the near-duplicate documents and the original "master" document.
Near-Duplicate & Email Thread Analysis identifies documents with redundant text to help ensure you are only processing documents essential to the case. The Near-Duplicate & Email Thread Analysis results include near-duplicate documents without text. For email messages, the Near-Duplicate & Email Thread Analysis compares only the message body content between email messages. It does not compare email headers or quoted prior messages.
In near duplication, a family is a collection of one or more near-duplicate documents. The near-duplication process assigns a master document to each family. The master document is the document most representative of the family: within a family, it is the one with the greatest overall similarity to all the other documents in the family, and it must be a near-duplicate of every other document in the family. The Family Threshold setting in the Near-Duplicate & Email Thread Analysis utility determines the percentage of similarity between the master document and the documents within its document family.
A document cluster contains documents that are more loosely related to each other than the documents in a family. The documents in a cluster contain similar content. How closely the documents are related is determined by the Cluster Threshold setting. Documents assigned to a document cluster have a percentage of similarity at or above the current Cluster Threshold setting in the Near-Duplicate & Email Thread Analysis utility. Documents that have no direct relationship to each other might still belong to the same cluster, as long as some chain of relationships at or above the Cluster Threshold setting connects them. There is no limit on how long this chain can be, so two documents in the same cluster may be very distant cousins with no apparent similarity.
You can also view near-duplicate information for the case by reviewing the data in the near-duplicate fields created and populated by the utility. You can review the near-duplicate fields for individual case documents on the Index tab in the main LAW window or review near-duplicate fields for multiple documents in the grid view. For more information, see Viewing Near Duplicates.
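The difference between a family and a cluster can be illustrated with a small sketch. This is not the CloudNine Near Dupe engine's algorithm, which is not described here; it only models the definitions given above, assuming pairwise similarity percentages are already known. The names `SIM`, `build_clusters`, and `pick_master` are hypothetical.

```python
# Illustrative model only: pairwise similarity percentages between documents.
# All names and values here are hypothetical, not part of LAW.
SIM = {
    ("A", "B"): 95, ("B", "C"): 92, ("C", "D"): 91,
    ("A", "C"): 60, ("A", "D"): 40, ("B", "D"): 55,
}

def sim(similarity, a, b):
    """Look up a symmetric similarity score; unknown pairs count as 0."""
    return similarity.get((a, b), similarity.get((b, a), 0))

def build_clusters(similarity, cluster_threshold):
    """Cluster = documents linked by any chain of pairs at or above the threshold."""
    docs = sorted({d for pair in similarity for d in pair})
    parent = {d: d for d in docs}

    def find(d):
        # Union-find lookup with path halving.
        while parent[d] != d:
            parent[d] = parent[parent[d]]
            d = parent[d]
        return d

    for (a, b), score in similarity.items():
        if score >= cluster_threshold:
            parent[find(a)] = find(b)

    clusters = {}
    for d in docs:
        clusters.setdefault(find(d), []).append(d)
    return sorted(clusters.values())

def pick_master(docs, similarity, family_threshold):
    """Master = document that is a near-duplicate of every other document in
    the group and has the greatest total similarity; None if no such document."""
    best, best_total = None, -1
    for cand in docs:
        scores = [sim(similarity, cand, other) for other in docs if other != cand]
        if all(s >= family_threshold for s in scores) and sum(scores) > best_total:
            best, best_total = cand, sum(scores)
    return best

print(build_clusters(SIM, 90))               # A and D share a cluster via B and C
print(pick_master(["A", "B", "C"], SIM, 90)) # B is close to both A and C
```

In this sample data, A and D are only 40% similar, yet they land in the same cluster because the chain A-B, B-C, C-D stays at or above the threshold. No single document qualifies as master of all four, which is why a family is a tighter grouping than a cluster.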
After running the near-duplicate analysis, seven near-duplicate fields are automatically added to the case's field list and populated with the applicable near-duplicate data for the case's documents.
Near-Duplicate Fields:
•ND_ClusterID
•ND_ContentHash
•ND_FamilyID
•ND_IsMaster
•ND_ResultSet
•ND_Similarity
•ND_Sort
For more information about these and other LAW fields, see Field Descriptions.
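If the near-duplicate fields are exported from the case for review outside LAW, they can be grouped with a few lines of code. The sketch below is a rough illustration only: the field names come from the list above, but the `DocID` column, the sample rows, and the value formats (for example, `"Y"`/`"N"` for ND_IsMaster) are assumptions, so check Field Descriptions for the actual formats before relying on it.

```python
from collections import defaultdict

# Hypothetical exported rows; the value formats are assumed, not documented here.
ROWS = [
    {"DocID": "0001", "ND_FamilyID": "F1", "ND_IsMaster": "Y", "ND_Similarity": "100"},
    {"DocID": "0002", "ND_FamilyID": "F1", "ND_IsMaster": "N", "ND_Similarity": "96"},
    {"DocID": "0003", "ND_FamilyID": "F1", "ND_IsMaster": "N", "ND_Similarity": "91"},
    {"DocID": "0004", "ND_FamilyID": "F2", "ND_IsMaster": "Y", "ND_Similarity": "100"},
    {"DocID": "0005", "ND_FamilyID": "",   "ND_IsMaster": "N", "ND_Similarity": ""},
]

def summarize_families(rows):
    """Group rows by ND_FamilyID, separating the master from the other members."""
    families = defaultdict(lambda: {"master": None, "members": []})
    for row in rows:
        family_id = row["ND_FamilyID"]
        if not family_id:
            continue  # skip documents with no family assigned
        family = families[family_id]
        if row["ND_IsMaster"] == "Y":  # assumed flag format
            family["master"] = row["DocID"]
        else:
            family["members"].append((row["DocID"], int(row["ND_Similarity"])))
    return dict(families)

for family_id, info in summarize_families(ROWS).items():
    print(family_id, "master:", info["master"], "members:", info["members"])
```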
After running the email thread analysis, you can view email thread information for the case by reviewing the data in the email thread fields created and populated by the utility. You can review the email thread fields for individual case documents on the Index tab in the main LAW window or review email thread fields for multiple documents in the grid view.
For more information, see Viewing Email Threads.
After running the email thread analysis, 13 email thread fields are automatically added to the case's field list and populated with the applicable email thread data for the case's documents.
Email Thread Fields:
•ET_Conversants
•ET_Inclusive
•ET_InclusiveReason
•ET_Indent
•ET_IsMessage
•ET_MessageId
•ET_MetaUpdate
•ET_ParentId
•ET_ThreadId
•ET_ThreadIndex
•ET_ThreadModified
•ET_ThreadSize
•ET_ThreadSort
For more information about these and other LAW fields, see Field Descriptions.
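As a rough illustration of how exported email thread fields might be used outside LAW, the sketch below rebuilds an indented outline of one conversation. It assumes that ET_ThreadId is shared by every message in a thread and that ET_ParentId holds the ET_MessageId of the message being replied to (empty for the first message). Those semantics, the `Subject` column, and the sample values are assumptions made for this example; confirm the real field meanings in Field Descriptions.

```python
from collections import defaultdict

# Hypothetical exported rows; field semantics are assumed as noted above.
MESSAGES = [
    {"ET_MessageId": "m1", "ET_ParentId": "",   "ET_ThreadId": "t1", "Subject": "Budget"},
    {"ET_MessageId": "m2", "ET_ParentId": "m1", "ET_ThreadId": "t1", "Subject": "RE: Budget"},
    {"ET_MessageId": "m3", "ET_ParentId": "m2", "ET_ThreadId": "t1", "Subject": "RE: RE: Budget"},
    {"ET_MessageId": "m4", "ET_ParentId": "m1", "ET_ThreadId": "t1", "Subject": "FW: Budget"},
    {"ET_MessageId": "m5", "ET_ParentId": "",   "ET_ThreadId": "t2", "Subject": "Lunch"},
]

def thread_outline(messages, thread_id):
    """Return the subjects of one thread, indented by reply depth."""
    children = defaultdict(list)
    for message in messages:
        if message["ET_ThreadId"] == thread_id:
            children[message["ET_ParentId"]].append(message)

    lines = []

    def walk(parent_id, depth):
        # Depth-first walk so each reply appears directly under its parent.
        for message in children.get(parent_id, []):
            lines.append("  " * depth + message["Subject"])
            walk(message["ET_MessageId"], depth + 1)

    walk("", 0)  # start from messages with no parent
    return lines

print("\n".join(thread_outline(MESSAGES, "t1")))
```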