Deduplication is the process of scanning all parent documents within the Case Database and flagging any duplicates (identical copies). Each document is run through a hashing algorithm, which yields a numerical (hash) value derived from the document's content. Documents that yield identical hash values are flagged as duplicates within case records. This process typically takes place during import via Turbo Import or ED Loader, but it can also be performed later with one of the following utilities:
For internally deduplicating documents within a single Case File, use the Deduplication Utility.
For externally deduplicating documents across multiple Case Files, use the Inter-Case Deduplication Utility.
Deduplication is only performed on parent-level documents, so attachments will always inherit the DupStatus of their parent item.
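The hash-and-compare idea described above can be illustrated with a minimal Python sketch. This is not LAW's implementation: the function name `flag_duplicates`, the `(doc_id, content)` input shape, and the choice to give a "Parent" record its own ID as its `_DupID` are assumptions made for illustration only. It covers global-level duplicates only and treats the first document seen with a given hash as the parent.

```python
import hashlib

def flag_duplicates(documents, algorithm="md5"):
    """Flag duplicates among parent-level documents by comparing content hashes.

    documents: iterable of (doc_id, content_bytes) pairs in processing order.
    algorithm: "md5" or "sha1" (any name accepted by hashlib.new).
    Returns a dict mapping doc_id to its DupStatus, _DupID and hash.
    """
    first_seen = {}  # hash digest -> doc_id of the first document with that content
    results = {}
    for doc_id, content in documents:
        digest = hashlib.new(algorithm, content).hexdigest()
        parent_id = first_seen.get(digest)
        if parent_id is None:
            # First time this content appears: no duplicates known yet.
            first_seen[digest] = doc_id
            results[doc_id] = {"DupStatus": "N", "_DupID": 0, "hash": digest}
        else:
            # Same hash as an earlier document: promote that one to "Parent"
            # (assumption: a parent's _DupID is its own ID) and flag this one.
            results[parent_id]["DupStatus"] = "P"
            results[parent_id]["_DupID"] = parent_id
            results[doc_id] = {"DupStatus": "G", "_DupID": parent_id, "hash": digest}
    return results

# Hypothetical sample data: documents 1 and 3 have identical content.
docs = [(1, b"quarterly report"), (2, b"meeting notes"), (3, b"quarterly report")]
for doc_id, info in flag_duplicates(docs).items():
    print(doc_id, info["DupStatus"], info["_DupID"])
```

In this sketch, document 1 ends up as the "Parent", document 3 as its "Global" duplicate, and document 2 has no duplicates. Custodian-level deduplication would additionally key the comparison on the custodian, which is omitted here for brevity.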
Once Deduplication has been performed, LAW updates the following Metadata Fields for all records in the Case Directory:
•DupStatus - Indicates the duplicate status, with one of the following character values:
   o U - "Untested"; This document has not yet been scanned for duplicates.
   o N - "None"; No duplicates have been identified for this document.
   o P - "Parent"; One or more other documents have been identified as duplicates of this document.
   o G - "Global"; This document is a global-level duplicate.
   o C - "Custodian"; This document is a Custodian-level duplicate.
•_DupID - Used to identify and group duplicate records. "Global" and "Custodian" level duplicates will have their "Parent" ID displayed here for reference. "Parent" level duplicates are assigned an ID based on how deduplication was performed. Documents without duplicates (or yet to be tested) will display a 0 for their ID.
•DupMethod - Indicates the hashing algorithm used for scanning documents, with one of the following values:
   o 1 - MD5 hashing was used.
   o 2 - SHA-1 hashing was used.
   o 129 - MD5 hashing was used via the Inter-Case Deduplication Utility.
   o 130 - SHA-1 hashing was used via the Inter-Case Deduplication Utility.
•MD5Hash - MD5 hash values are stored here, where applicable.
•Sha1Hash - SHA-1 hash values are stored here, where applicable.
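When working with exported field values, the DupMethod codes listed above can be translated into readable descriptions with a simple lookup. The sketch below is only an illustration: the names `DUP_METHODS` and `describe_dup_method` are hypothetical and not part of LAW, and the table contains nothing beyond the four values documented here.

```python
# Lookup table built from the four documented DupMethod values.
# Each entry is (hashing algorithm, performed via Inter-Case Deduplication Utility?).
DUP_METHODS = {
    1: ("MD5", False),
    2: ("SHA-1", False),
    129: ("MD5", True),
    130: ("SHA-1", True),
}

def describe_dup_method(value):
    """Return a readable description for a DupMethod value (int or numeric string)."""
    try:
        algorithm, inter_case = DUP_METHODS[int(value)]
    except (KeyError, ValueError, TypeError):
        # Anything outside the documented values is reported rather than guessed.
        return f"Unknown DupMethod value: {value!r}"
    suffix = " (Inter-Case Deduplication Utility)" if inter_case else ""
    return f"{algorithm}{suffix}"

print(describe_dup_method(1))
print(describe_dup_method("130"))
```

Values that are not in the documented list are reported as unknown instead of being interpreted.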
Additionally, the following Metadata Fields can be populated from the Menu of the Main User Interface by selecting Tools > Apply Duplicate Relationships:
NOTICE: These fields are automatically populated when deduplicating via Turbo Import.
For P ("Parent" level) duplicates:
•DupCustNames - Displays all Custodians associated with duplicates of this record.
•DupCustPaths - Displays the file path for each duplicate of this record.
For G ("Global" level) and C ("Custodian" level) duplicates:
•DupParentName - Displays the Custodian associated with the "Parent" duplicate record.
•DupParentPath - Displays the file path for the "Parent" duplicate record.
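To make the relationship between these fields concrete, the following Python sketch derives them from records that already carry DupStatus and _DupID. It is a rough illustration under stated assumptions, not a description of how Apply Duplicate Relationships works internally: the function name, the `Custodian` and `Path` keys, and the use of Python lists for multi-valued fields are all hypothetical. It relies only on the documented behavior that duplicates carry their parent's ID in _DupID.

```python
from collections import defaultdict

def apply_duplicate_relationships(records):
    """Fill in the duplicate relationship fields on a list of record dicts.

    Each record needs "DupStatus", "_DupID", and the hypothetical keys
    "Custodian" and "Path". Records are updated in place and returned.
    """
    # Group parents and their duplicates by the shared _DupID.
    groups = defaultdict(list)
    for rec in records:
        if rec["DupStatus"] in ("P", "G", "C"):
            groups[rec["_DupID"]].append(rec)

    for members in groups.values():
        parent = next((r for r in members if r["DupStatus"] == "P"), None)
        if parent is None:
            continue  # no parent record in this set; leave the group untouched
        duplicates = [r for r in members if r is not parent]
        # Parent record: who else holds a copy, and where.
        parent["DupCustNames"] = sorted({r["Custodian"] for r in duplicates})
        parent["DupCustPaths"] = [r["Path"] for r in duplicates]
        # Duplicate records: point back at the parent.
        for dup in duplicates:
            dup["DupParentName"] = parent["Custodian"]
            dup["DupParentPath"] = parent["Path"]
    return records

# Hypothetical sample records.
records = [
    {"Custodian": "Alice", "Path": "a/report.docx", "DupStatus": "P", "_DupID": 1},
    {"Custodian": "Bob", "Path": "b/report.docx", "DupStatus": "G", "_DupID": 1},
    {"Custodian": "Alice", "Path": "a/notes.txt", "DupStatus": "N", "_DupID": 0},
]
apply_duplicate_relationships(records)
```

After the call, the first record lists Bob and his file path as its duplicate, the second record points back to Alice's copy, and the third record is left unchanged because it has no duplicates.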