Checking for Duplicate Records

After importing records, you may want to check for duplicates in Concordance. Concordance identifies duplicates by comparing the content of the fields you select, and it tags each record in a duplicate set. Consider carefully which fields you want to compare. Before you run a new duplicate check, make sure that no records still carry tags from a previous duplicate check.

Concordance checks only the records in the current query. To check for duplicates across the entire database, run the Zero Query before checking for duplicates.

Duplicate detection treats records in one of three ways:

A unique record is not tagged.

The first record in a set of duplicates is assigned the Original tag, or the tag you specify for originals.

Each subsequent record in the same set is assigned the Duplicate tag, or the tag you specify for duplicates.
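The three-way categorization above can be modeled in a few lines. The following Python sketch is hypothetical and is not Concordance's implementation: it represents each record as a dictionary, uses invented field names (DOCID, TITLE, DOCDATE), and assumes that records are processed in query order and that two records are duplicates only when every selected field matches exactly.

```python
from collections import Counter


def tag_duplicates(records, fields, original_tag="Original", duplicate_tag="Duplicate"):
    """Return {record index: tag} for records that belong to a duplicate set.

    Hypothetical model: each record is a dict of field name -> value, and
    records are duplicates when all selected fields match exactly.
    Unique records are omitted from the result (no tag is assigned).
    """
    def key(record):
        # The comparison key is the tuple of selected field values.
        return tuple(record.get(f, "") for f in fields)

    # First pass: count how many records share each comparison key.
    counts = Counter(key(r) for r in records)

    tags = {}
    seen = set()
    # Second pass, in query order: the first record of each duplicate set
    # gets the original tag; every later record with that key gets the
    # duplicate tag.
    for i, record in enumerate(records):
        k = key(record)
        if counts[k] < 2:
            continue  # unique record: no tag assigned
        if k in seen:
            tags[i] = duplicate_tag
        else:
            tags[i] = original_tag
            seen.add(k)
    return tags


records = [
    {"DOCID": "A001", "TITLE": "Memo", "DOCDATE": "20010105"},
    {"DOCID": "A002", "TITLE": "Letter", "DOCDATE": "20010106"},
    {"DOCID": "A003", "TITLE": "Memo", "DOCDATE": "20010105"},
    {"DOCID": "A004", "TITLE": "Memo", "DOCDATE": "20010105"},
]
tags = tag_duplicates(records, ["TITLE", "DOCDATE"])
print(tags)  # {0: 'Original', 2: 'Duplicate', 3: 'Duplicate'}
```

The two passes matter: a record can only be tagged as an original once the sketch knows that at least one other record shares its key, which is why the unique Letter record stays untagged.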

As a best practice, restrict access to Check for Duplicates on the Tools menu to your advanced users.

Check for Duplicate Records

1. Run a search query to locate the records you want to check for duplicates. To check the entire database, run the Zero Query.

2. On the Tools menu, click Check for duplicates. The Duplicate Detection dialog displays.

3. In the field list, select the fields Concordance should compare to identify duplicate records in the current query. To select multiple fields, use SHIFT+click or CTRL+click. You can select as many fields as you want, but the total comparison is limited to 245 characters, and each paragraph field counts as 60 characters toward that limit.

4. Enter a tag to apply to the Original (first) record in each duplicate set, and another tag to apply to the Duplicate (second and subsequent) records in each set. Make sure these tags have not already been applied to records in a previous duplicate check.

5. If you selected a paragraph field in the field list, it is best practice to select the Only use the first line of each field option.

6. Click OK to run the duplicate check. The Duplicate count updates to show the number of duplicates found, and Concordance tags the original and duplicate records with the tags you specified.

7. Click Done to exit the Duplicate Detection dialog.

8. You can now query the database for the tags you specified to review the original and duplicate records.
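Steps 3 and 5 can be illustrated with a short Python sketch of the comparison limit and the first-line option. This is a hypothetical model, not Concordance's code: the (name, field_type, width) tuples, the field names, and the assumption that non-paragraph fields count toward the limit by their defined width are all illustrative.

```python
# Hypothetical model of steps 3 and 5. Assumption: non-paragraph fields
# count toward the limit by their defined width; paragraph fields count
# as a fixed 60 characters each.

COMPARE_LIMIT = 245   # total characters that can be compared (step 3)
PARAGRAPH_COST = 60   # each paragraph field counts as 60 characters (step 3)


def comparison_length(fields):
    """Return how many characters the selected fields use toward the limit."""
    total = 0
    for _name, field_type, width in fields:
        total += PARAGRAPH_COST if field_type == "paragraph" else width
    return total


def fits_limit(fields):
    """True if the selected fields stay within the comparison limit."""
    return comparison_length(fields) <= COMPARE_LIMIT


def comparison_value(value, first_line_only=True):
    """Return the portion of a field value used for comparison.

    Models "Only use the first line of each field": when enabled, only
    the text before the first line break is compared.
    """
    if first_line_only:
        lines = value.splitlines()
        return lines[0] if lines else ""
    return value


# Hypothetical field selection: name, type, defined width.
selected = [("AUTHOR", "text", 60), ("DOCDATE", "date", 8), ("SUMMARY", "paragraph", 0)]
print(comparison_length(selected))  # 128
print(fits_limit(selected))         # True

a = "Re: Contract\nFirst draft attached."
b = "Re: Contract\nRevised draft attached."
print(comparison_value(a) == comparison_value(b))                # True
print(comparison_value(a, False) == comparison_value(b, False))  # False
```

Under this model, four paragraph fields (240 characters) fit within the limit, but a fifth (300 characters) would not.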

Tallying Duplicates

You can also use the Tally feature to help identify duplicate records in the database. The Tally feature creates an itemized list of the data values in a field, along with the number of times each value occurs. A value that occurs more than once may indicate duplicate records. See Searching by Tally for more information.
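As a rough illustration of how a tally exposes potential duplicates, the following hypothetical Python sketch counts the values in one field and lists those that occur more than once. The record format and the TITLE field are assumptions, and this sketch lists the most frequent values first, which may differ from how Concordance orders its Tally list.

```python
from collections import Counter


def tally(records, field):
    """Return (value, count) pairs for one field, most frequent first.

    Values with equal counts keep the order in which they first appear.
    """
    return Counter(r.get(field, "") for r in records).most_common()


def repeated_values(records, field):
    """Return only the values that occur more than once (possible duplicates)."""
    return [(value, n) for value, n in tally(records, field) if n > 1]


records = [
    {"TITLE": "Memo"},
    {"TITLE": "Letter"},
    {"TITLE": "Memo"},
    {"TITLE": "Invoice"},
    {"TITLE": "Memo"},
]
print(tally(records, "TITLE"))            # [('Memo', 3), ('Letter', 1), ('Invoice', 1)]
print(repeated_values(records, "TITLE"))  # [('Memo', 3)]
```

A tally looks at one field at a time, so a repeated value is only a lead: records that share a title may still differ in other fields, which is what the multi-field duplicate check compares.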