The Inter-Case Deduplication utility installs with CloudNine™ LAW but runs separately from the LAW application. You use it to deduplicate documents in one case against documents in other cases. This type of deduplication is intended for cases that are already populated with documents imported via ED Loader or Turbo Import. Documents added to the cases after the inter-case deduplication process can also be deduplicated against the existing records.
Inter-Case Deduplication is not designed for a combination of Turbo Import cases and ED Loader (electronic discovery) cases, because Turbo Import cases are hashed differently than ED Loader cases. Turbo Import cases can be deduplicated against other Turbo Import cases, and ED Loader cases can be deduplicated against other ED Loader cases.
Note the following facts when considering using the Inter-Case Deduplication Utility:
•After a case is deduplicated using the Inter-Case Deduplication utility, the case should not be deduplicated using the internal Deduplication Utility, and incoming documents should not be deduplicated using ED Loader.
Doing so can lead to issues with purging, reviewing, and filtering duplicate records, caused by a mixture of internal and external duplicates existing in the case. See the Deduplication and Deduplication Utility topics for more information about these risks and associated warnings.
•In addition to creating a database for each case, LAW also creates a database external to the case database. The external deduplication database can be created as a Microsoft SQL Server or Access database (*.MDB file).
The purpose of this database is to maintain deduplication information. When you use the Inter-Case Deduplication Utility, the database registers each case added. Cases are ready for inter-case deduplication after they have been populated with records via the ED Loader or Turbo Import. Records added to these cases during subsequent imports can also be deduplicated.
It is best practice to use SQL Server databases for deduplication because Access databases are limited to 2 GB. When working with large cases, the 2 GB limit can quickly be reached and exceeded.
•Member cases are LAW cases that are added to the external deduplication database with the intention of deduplicating them against each other.
•All member cases must be ED-enabled.
•Both SQL and Access cases are supported.
1. Click Start, point to All Programs, and then click CloudNine™ LAW.
2. Click Inter-Case Deduplication Utility. The Inter-Case Deduplication dialog box appears.
When you create an external deduplication database, you can create it as a Microsoft SQL Server database (saved with a *.icd project configuration file) or as an Access (*.mdb) database.
To create an external SQL deduplication database (recommended):
1. On the File menu, click New. The Select Database Type dialog box opens. By default, the SQL Server option is selected.
2. Make sure the SQL Server option is selected, and then click OK. Clicking OK opens the Server Connection Information dialog box.
3. In the Server Name list, click or type the name of the server where SQL Server is installed. By default, the Use Windows Authentication check box is selected, and the User Name and Password fields are disabled.
4. If you do not want to use Windows authentication, clear the Use Windows Authentication check box, and then type the applicable SQL Server user name and password in the User Name and Password fields.
5. In the Database field, type the name of the new external deduplication database, and then click OK. Clicking OK opens the Save Deduplication Project Configuration dialog box.
6. In the Save Deduplication Project Configuration dialog box, browse to a location accessible to all member cases. The File name field defaults to the database name you entered in the Database field of the Server Connection Information dialog box. The Save as type field defaults to Deduplication Projects (*.icd).
7. Click Save. The Save Deduplication Project Configuration dialog box closes, and the Inter-Case Deduplication Utility dialog box now displays the path to the database with Mode set to New. When the Mode is set to New, all documents in the member cases are deduplicated against each other.
8. Set the deduplication options:
Digest - The type of hash used to determine duplicates. The hash values are obtained from metadata fields (e-mail) or by hashing the entire file (e-docs) during the ED Loader or Turbo Import process. Two options are available:
•MD5 (128-bit output digest) - This hash value is stored in the MD5Hash field in LAW for each document.
•SHA-1 (160-bit output digest) - This hash value is stored in the Sha1Hash field in LAW for each document.
Scope - The scope in which duplicates are tested. Two options are available:
•Global - Records are deduplicated against all records in all member cases.
•Custodian Level - Records are deduplicated only against other records with the same custodian assigned. Records with no custodian value set are evaluated globally.
9. Move to the Member Cases area to add LAW cases. These cases will be deduplicated against each other in the inter-case deduplication process. See the "Member Cases" section for details.
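The Digest option can be illustrated with Python's standard hashlib module. The sketch below is a hypothetical illustration, not LAW's implementation: it hashes a file's entire contents (the e-doc case only; hashing of e-mail metadata fields is not modeled), and the dictionary keys simply mirror the MD5Hash and Sha1Hash field names. It shows why the two options produce values of different lengths: 128 bits (32 hex characters) for MD5 and 160 bits (40 hex characters) for SHA-1.

```python
import hashlib
from typing import Dict


def digests(data: bytes) -> Dict[str, str]:
    """Return illustrative MD5Hash/Sha1Hash values for a blob of bytes."""
    return {
        "MD5Hash": hashlib.md5(data).hexdigest(),    # 128-bit digest, 32 hex chars
        "Sha1Hash": hashlib.sha1(data).hexdigest(),  # 160-bit digest, 40 hex chars
    }


def file_digests(path: str, chunk_size: int = 1 << 20) -> Dict[str, str]:
    """Hash a whole file in chunks so large e-docs need not fit in memory."""
    md5, sha1 = hashlib.md5(), hashlib.sha1()
    with open(path, "rb") as fh:
        for chunk in iter(lambda: fh.read(chunk_size), b""):
            md5.update(chunk)
            sha1.update(chunk)
    return {"MD5Hash": md5.hexdigest(), "Sha1Hash": sha1.hexdigest()}


if __name__ == "__main__":
    d = digests(b"hello")
    print(d["MD5Hash"])   # 5d41402abc4b2a76b9719d911017c592
    print(d["Sha1Hash"])  # aaf4c61ddcc5e8a2dabede0f3b482cd9aea9434d
```

Files with identical bytes yield identical digests under either option. Because an MD5 value and a SHA-1 value for the same file are different strings, records can only match when they are compared using the same Digest option.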
To create an external Access deduplication database:
1. On the File menu, click New. The Select Database Type dialog box opens. By default, the SQL Server option is selected.
2. In the Select Database Type dialog box, click the Access Database (*.mdb) option, and then click OK. Clicking OK opens the Create Deduplication Database dialog box.
3. In the Create Deduplication Database dialog box, browse to a location accessible to all member cases. The File name field defaults to DuplicateLog, and the Save as type field defaults to Deduplication Database (*.mdb).
4. If you want to change the database name, type the new name in the File name field.
5. Click Save. The Create Deduplication Database dialog box closes, and the Inter-Case Deduplication Utility dialog box now displays the path to the database with Mode set to New. When the Mode is set to New, all documents in the member cases are deduplicated against each other.
6. Set the deduplication options:
Digest - The type of hash used to determine duplicates. The hash values are obtained from metadata fields (e-mail) or by hashing the entire file (e-docs) during the ED Loader or Turbo Import process. Two options are available:
•MD5 (128-bit output digest) - This hash value is stored in the MD5Hash field in LAW for each document.
•SHA-1 (160-bit output digest) - This hash value is stored in the Sha1Hash field in LAW for each document.
Scope - The scope in which duplicates are tested. Two options are available:
•Global - Records are deduplicated against all records in all member cases.
•Custodian Level - Records are deduplicated only against other records with the same custodian assigned. Records with no custodian value set are evaluated globally.
7. Move to the Member Cases area to add LAW cases. These cases will be deduplicated against each other in the inter-case deduplication process. See the "Member Cases" section for details.
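The two Scope options can be modeled as a lookup rule applied to records in processing order. The following Python sketch is an illustration under stated assumptions, not LAW's internal logic: the DedupIndex class and its check method are hypothetical names, and a record that has a custodian is assumed to be compared only against earlier records with that same custodian, while a record with no custodian is compared against all earlier records.

```python
from dataclasses import dataclass, field
from typing import Dict, Optional, Tuple


@dataclass
class DedupIndex:
    """Hypothetical first-seen index for Global vs. Custodian Level scope."""
    scope: str  # "Global" or "Custodian Level"
    # digest -> doc_id of the first record seen with that digest (any custodian)
    first_by_digest: Dict[str, str] = field(default_factory=dict)
    # (custodian, digest) -> doc_id of the first record seen for that custodian
    first_by_custodian: Dict[Tuple[str, str], str] = field(default_factory=dict)

    def check(self, doc_id: str, digest: str, custodian: Optional[str] = None) -> Optional[str]:
        """Return the doc_id this record duplicates, or None if it is an original."""
        if self.scope not in ("Global", "Custodian Level"):
            raise ValueError(f"unknown scope: {self.scope!r}")
        # Global scope, or a record with no custodian, is evaluated globally.
        if self.scope == "Global" or not custodian:
            original = self.first_by_digest.get(digest)
        else:
            original = self.first_by_custodian.get((custodian, digest))
        # Register the record so later records under either rule can match it;
        # setdefault keeps the earliest record as the original.
        self.first_by_digest.setdefault(digest, doc_id)
        if custodian:
            self.first_by_custodian.setdefault((custodian, digest), doc_id)
        return original


if __name__ == "__main__":
    idx = DedupIndex("Custodian Level")
    print(idx.check("A1", "h1", "Smith"))  # None (original)
    print(idx.check("A2", "h1", "Jones"))  # None (different custodian)
    print(idx.check("A3", "h1", "Smith"))  # A1
    print(idx.check("A4", "h1", None))     # A1 (no custodian: evaluated globally)
```

Under this model, Custodian Level scope can only keep more records as originals than Global scope does, never fewer, because it narrows the set of earlier records each record is compared against.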
You can open an existing database from a previous deduplication session.
1. Click the ellipsis button located to the left of the New button, or on the File menu, click Open.
2. Browse to and select the external Access deduplication database (*.mdb) or the SQL project configuration file (*.icd), and then click Open. If the database has already been used in the inter-case deduplication process, the Mode changes to Resume/Append.
1. Click the Add button, or on the Case menu, click Add.
2. Browse to and select the project.ini file for the case, located in the root of the LAW case folder.
3. Click Open.
Cases can also be added to the list by dragging and dropping one or more root case folders (the folder containing the project.ini file) or the project.ini file itself into the Member Cases grid. Once added, the case names and paths are listed in the grid, and the number in parentheses beside Member Cases increments by one for each added case. Because cases are validated when they are added to the grid, an error occurs at that point if a case does not meet the requirements for inter-case deduplication (for example, if the case is not ED-enabled).
Once the member cases have been added, use the Up and Dn buttons (or Case > Move Up | Move Down) to specify the order in which records are deduplicated. The first case shown in the grid is processed first, the second case next, and so on. These buttons can only be used in New or Rebuild/Flush mode.
The Remove button (or Case > Remove) removes the selected LAW case from the Member Cases grid. The Clear button (or Case > Clear) removes all cases from the grid. These buttons can also only be used in New or Rebuild/Flush mode.
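Because the first case in the grid is processed first, grid order can decide which copy of a duplicated document is treated as the original. The hypothetical Python function below (dedup_in_order is not part of LAW) assumes Global scope and a first-seen-wins rule, and illustrates how reordering member cases changes which record is flagged as the duplicate.

```python
from typing import Dict, List, Tuple

# Hypothetical shape: (case_name, [(doc_id, digest), ...]) in grid order.
Case = Tuple[str, List[Tuple[str, str]]]


def dedup_in_order(cases: List[Case]) -> Dict[str, str]:
    """Return {duplicate_doc_id: original_doc_id}, processing cases in grid order."""
    first_seen: Dict[str, str] = {}  # digest -> doc_id of the first record seen
    duplicates: Dict[str, str] = {}
    for _case_name, docs in cases:
        for doc_id, digest in docs:
            if digest in first_seen:
                duplicates[doc_id] = first_seen[digest]
            else:
                first_seen[digest] = doc_id
    return duplicates


if __name__ == "__main__":
    case_a = ("CaseA", [("A-1", "h1"), ("A-2", "h2")])
    case_b = ("CaseB", [("B-1", "h1")])
    print(dedup_in_order([case_a, case_b]))  # {'B-1': 'A-1'}
    print(dedup_in_order([case_b, case_a]))  # {'A-1': 'B-1'}
```

In practice, this suggests placing the case whose copies you want to keep as originals at the top of the grid before clicking Begin.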
Once the external database has been specified, the options have been set, and the member cases have been added, click Begin to start the inter-case deduplication process. Note the following about the Inter-Case Deduplication Progress screen:
•The Progress area provides both visual (progress bar) and textual indicators of how much work remains and what has been done so far.
•The Elapsed field displays the amount of time that has passed during the process.
•The Errors value is a count of all errors that occurred during processing. The total error count is displayed in a summary screen once processing has completed.
•The Summary screen appears once the deduplication process has completed, been cancelled, or been aborted because of an error condition. For a successful process, the summary shows the selected Digest and Scope options and the total number of member cases included in the process. The summary also lists total counts for documents loaded, duplicate records, and errors at the case level. The "Documents Loaded In This Session" value is the total number of documents, not including attachment records, that were loaded in the current deduplication session. The "Duplicates" value shows the total number of duplicates in each listed case and also excludes attachment records from the count.
•If errors occurred during processing, a View Errors button is included at the bottom of the Summary screen. Click this button to view the error details. The log file that appears is stored in <drive>\Documents and Settings\<user>\Application Data\Law50\LawInterCaseDedup\LawInterCaseDedup.ErrorLog.txt.
•If an error occurs during processing that aborts the deduplication process completely, the Summary screen appears with a message stating that the process was aborted. Similarly, if the user cancels the process, the Summary screen appears with a message stating that the process was cancelled.
•To save the contents of the summary to a text file:
LAW contains a number of system fields that store deduplication-related information about each document in a case. The Inter-Case Deduplication utility writes to two of those fields (_DupID and _DupMethod) differently than ED Loader deduplication, Turbo Import deduplication, and the Deduplication Utility in CloudNine™ LAW do. See the Deduplication Information topic for a list of these fields, their descriptions, and possible values. Field information for all LAW fields can also be found in the Field Descriptions topic.
The Mode indicator shown below the External Deduplication Database path displays one of four possible mode values:
•No Database Selected - This mode is set when no external deduplication database is selected. See the "External Deduplication Database" section above for information on creating and opening existing databases.
•New - This mode is set when the selected external deduplication database has not yet been involved in the inter-case deduplication process. The New mode occurs when a new database is created or when an existing database is selected but has not yet been through the deduplication process.
•Resume/Append - This mode is set when the selected external deduplication database has already been through the deduplication process. In this mode, only documents added to member cases since the database's previous deduplication session are added and deduplicated against each other and against the existing documents in the database. When the process is run again on this database, the Summary screen shows the total number of documents added in the session (documents loaded in member cases after the previous deduplication session) and the total number of duplicates in the cases.
•Rebuild/Flush - This mode indicates that the external database was previously in Resume/Append mode, but a change was made to one of the member cases that requires the external database to be rebuilt. Functionally, this mode is the same as New mode, because the cases must be re-deduplicated once this mode has been assigned. A "Click here for 'Rebuild/Flush' details" link becomes available beside the mode value. Clicking this link opens a message box that lists the reasons the deduplication database was placed in Rebuild/Flush mode.
The following actions can cause this mode to occur:
•Cancelling the inter-case deduplication process
•Deleting a document from a member case
•Renaming, removing, or adding a custodian in a member case when the Custodian Level scope was used
•Deduplicating a member case using ED Loader or Turbo Import deduplication or the internal Deduplication Utility
The Refresh button can be used to inspect member cases for changes that may affect the mode. If changes are detected, the mode is updated to the correct state.
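The mode rules amount to a small decision procedure. The Python sketch below restates them under stated assumptions: the Mode values mirror the four documented modes, but MemberCaseChanges, rebuild_reasons, and determine_mode are hypothetical names, and how LAW actually detects each change (for example, when the Refresh button is clicked) is not modeled.

```python
from dataclasses import dataclass
from enum import Enum
from typing import List


class Mode(Enum):
    NO_DATABASE = "No Database Selected"
    NEW = "New"
    RESUME_APPEND = "Resume/Append"
    REBUILD_FLUSH = "Rebuild/Flush"


@dataclass
class MemberCaseChanges:
    """Hypothetical summary of what changed in member cases since the last session."""
    session_cancelled: bool = False
    documents_deleted: bool = False
    custodians_changed: bool = False
    internally_deduplicated: bool = False


def rebuild_reasons(changes: MemberCaseChanges, scope: str) -> List[str]:
    """Collect the documented triggers for Rebuild/Flush mode."""
    reasons: List[str] = []
    if changes.session_cancelled:
        reasons.append("The inter-case deduplication process was cancelled")
    if changes.documents_deleted:
        reasons.append("A document was deleted from a member case")
    # Custodian edits only matter when records were compared per custodian.
    if changes.custodians_changed and scope == "Custodian Level":
        reasons.append("A custodian was renamed, removed, or added")
    if changes.internally_deduplicated:
        reasons.append("A member case was deduplicated by ED Loader, Turbo Import, "
                       "or the Deduplication Utility")
    return reasons


def determine_mode(database_selected: bool, previously_deduplicated: bool,
                   reasons: List[str]) -> Mode:
    """Assumption: any earlier session, even a cancelled one, counts as 'previously deduplicated'."""
    if not database_selected:
        return Mode.NO_DATABASE
    if not previously_deduplicated:
        return Mode.NEW
    if reasons:
        return Mode.REBUILD_FLUSH
    return Mode.RESUME_APPEND
```

Keeping the reasons as a list, rather than a single flag, mirrors the "Click here for 'Rebuild/Flush' details" message box, which can report more than one cause.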
The Duplicate Viewer is a tool that can be used to review records in a case or multiple cases that have been flagged as duplicates by one of LAW's deduplication methods (ED Loader Deduplication, Turbo Import Deduplication, Deduplication Utility, or Inter-Case Deduplication). Please see the Duplicate Viewer topic for details.
To apply duplicate relationships to original files (Version 6.17+)
Once the files in a case have been imported and deduplicated, you can apply duplicate relationships to the original files associated with duplicate files using the Apply Duplicate Relationships command in CloudNine™ LAW. The Apply Duplicate Relationships command populates the following fields for the original files that have duplicate files in a case:
•DupCustNames
•DupCustPaths
•DupParentName
•DupParentPath
The fields indicate the custodian name and location of the duplicate files and the parents of the duplicate files. For more information about these fields, see Field Descriptions. For more information about the command, see Applying Duplicate Relationships.
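Conceptually, the command groups duplicate records under their originals and writes the duplicates' custodian and location details back onto each original. The Python sketch below is a hypothetical illustration only: the Doc class and apply_duplicate_relationships function are not LAW APIs, and the "; " separator used to combine multiple values is an assumption, since the actual storage format of these fields is described in the Field Descriptions topic.

```python
from collections import defaultdict
from dataclasses import dataclass
from typing import Dict, List, Optional


@dataclass
class Doc:
    """Hypothetical record; original_id is set only when the record is a duplicate."""
    doc_id: str
    custodian: str
    path: str
    parent_name: str = ""
    parent_path: str = ""
    original_id: Optional[str] = None


def apply_duplicate_relationships(docs: List[Doc], sep: str = "; ") -> Dict[str, Dict[str, str]]:
    """Return {original_doc_id: {field_name: value}} for originals that have duplicates."""
    dups_by_original: Dict[str, List[Doc]] = defaultdict(list)
    for d in docs:
        if d.original_id is not None:
            dups_by_original[d.original_id].append(d)
    result: Dict[str, Dict[str, str]] = {}
    for original_id, dups in dups_by_original.items():
        result[original_id] = {
            "DupCustNames": sep.join(d.custodian for d in dups),
            "DupCustPaths": sep.join(d.path for d in dups),
            # Only duplicates that have a parent (e.g. an attachment's e-mail) contribute.
            "DupParentName": sep.join(d.parent_name for d in dups if d.parent_name),
            "DupParentPath": sep.join(d.parent_path for d in dups if d.parent_path),
        }
    return result


if __name__ == "__main__":
    docs = [
        Doc("O1", "Smith", "/a/x.docx"),
        Doc("D1", "Jones", "/b/x.docx", original_id="O1"),
        Doc("D2", "Lee", "/c/x.docx", "Mail.msg", "/c/Mail.msg", "O1"),
    ]
    print(apply_duplicate_relationships(docs)["O1"]["DupCustNames"])  # Jones; Lee
```

Only originals that actually have duplicates appear in the result, which matches the statement that the command populates these fields for original files that have duplicate files in a case.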