Basic Searching

Concordance Desktop search tools sort through vast amounts of information quickly using two search methodologies:

Full-text Searching - Full-text searching is fast because it looks up words in index entries created when the database is built, rather than reading the records themselves. The index provides directions to words in the database dictionary, and the matching records are gathered for you to review based on your search criteria. For more information, see Running full-text searches.

Relational Searches - Relational searches scan every word of every record in a database and take longer than full-text searches, because Concordance Desktop must read every record to locate your entries and does not use an index. Relational searches are great for locating dates and numbers, whereas full-text searches are optimal for word searches. For more information, see Running relational searches.
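The difference between the two methodologies can be illustrated with a minimal Python sketch. It is not Concordance Desktop's implementation: the record layout, the field names text and date, and the function names are all assumptions made for illustration. The full-text search looks a term up in an index built in advance, while the relational search reads every record and tests a field value.

```python
import re
from collections import defaultdict

# Hypothetical records; the "text" and "date" field names are illustrative only.
records = {
    1: {"text": "Contract signed by Smith", "date": "2004-03-15"},
    2: {"text": "Smith forwarded the contract to Jones", "date": "2004-04-02"},
    3: {"text": "Invoice paid in full", "date": "2005-01-10"},
}

def words(text):
    """Split text into lowercase runs of letters and digits."""
    return re.findall(r"[a-z0-9]+", text.lower())

def build_index(records):
    """Build the index once: map each word to the IDs of records containing it."""
    index = defaultdict(set)
    for rec_id, rec in records.items():
        for w in words(rec["text"]):
            index[w].add(rec_id)
    return index

def full_text_search(index, term):
    """Look the term up in the index; no record is read at search time."""
    return sorted(index.get(term.lower(), set()))

def relational_search(records, predicate):
    """Read every record and keep those for which the predicate holds."""
    return sorted(rec_id for rec_id, rec in records.items() if predicate(rec))

index = build_index(records)
print(full_text_search(index, "contract"))  # [1, 2]
# ISO-formatted dates compare correctly as plain strings.
print(relational_search(records, lambda r: r["date"] >= "2004-04-01"))  # [2, 3]
```

In this sketch, a record added after build_index has run would be found by the relational search but not by the full-text search until the index is rebuilt, which mirrors why reindexing matters.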

When your Concordance Desktop administrator builds your record databases, the information is indexed for full-text searching. When a record collection is indexed, a dictionary is created with additional filters that weed out unnecessary words, common English words, and even some punctuation, to improve search processing time. When reviewers edit records, apply redactions, or add new records to the database, the administrator must perform a reindex in Concordance Desktop to update your record collection.

All of this processing affects how up-to-date your full-text searches are on a database. However, this does not affect relational searches, because relational searches run off the entire database, not just the database index.

Administrators must perform indexing when reviewers are not working in Concordance Desktop. A reindex, however, can be performed while users are working in a database.

To better understand full-text searching, it is important to understand the following functionality:

Indexing and Reindexing

When Concordance Desktop databases are built, the index and dictionary are generated from your document contents. The index contains directions to every word or character string in the database. The dictionary contains a list of every word or string of characters in your record collection, except words, punctuation, and/or field content your administrator specifically excludes from the dictionary.

A word is simply defined as any string of letters and/or numbers. A word does not have to be in a published dictionary to qualify as a word in a Concordance Desktop dictionary.

Example: airforce1 or ABC000001

“Airforce1” qualifies as a single index entry because it is an unbroken string of letters and numbers. A space between characters would split it: “airforce 1” would be read as two words. The same is true for ABC000001, a numbering scheme that Concordance Desktop treats as a single searchable word.
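The word definition above can be approximated with a short Python sketch. The function name split_words and the regular expression are assumptions for illustration, not the rule Concordance Desktop actually applies internally.

```python
import re

def split_words(text):
    """Return each unbroken run of letters and/or digits as one word."""
    return re.findall(r"[A-Za-z0-9]+", text)

print(split_words("airforce1"))   # ['airforce1']
print(split_words("airforce 1"))  # ['airforce', '1']
print(split_words("ABC000001"))   # ['ABC000001']
```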

In Concordance Desktop, we refer to an index update as a reindex. Reindexing updates the database dictionary and its corresponding index. Actively used databases need frequent reindexing to keep the index and dictionary entries current for searches.  

Administrators are usually responsible for indexing or reindexing databases due to the sensitivity involved in running these processes. Concordance Desktop administrators must ensure that all data is reviewed and proper backups are made before indexing or reindexing a database.

When updates have been made to a database and your database dictionary is no longer current, Concordance Desktop indicates that the database needs reindexing. On the File menu, a check mark is displayed next to the Reindex command, and Needs Reindexing is displayed in the Databases task pane under Current Database. If you do not have reindexing privileges, the Reindex command is not visible on the File menu.

Stopwords

Concordance Desktop comes with a pre-defined list of stopwords for each database. Stopwords are words that are automatically excluded from a database's index. The stopwords list includes the most common words in the English language (for example, “and”, “but”, “is”, and “the”), which you would generally not search for. Eliminating these words from the index ensures that searches run faster and more efficiently. Stopwords lists can be modified by your database administrator. If you need a copy of a database's stopwords list, your administrator can print it out for you.

The stopwords list is maintained in the Stopwords dialog box. To access the Stopwords dialog box, on the File menu, point to Dictionaries and click Stopwords.
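As a rough illustration of how stopwords keep common words out of an index, consider the following Python sketch. The STOPWORDS set holds only the four example words mentioned above, not the actual list shipped with Concordance Desktop, and the function name indexable_words is hypothetical.

```python
# Illustrative subset only; the real stopwords list is maintained per database.
STOPWORDS = {"and", "but", "is", "the"}

def indexable_words(text, stopwords=STOPWORDS):
    """Return the words that would remain after stopwords are excluded."""
    return [w for w in text.lower().split() if w not in stopwords]

print(indexable_words("The contract is signed and filed"))
# ['contract', 'signed', 'filed']
```

Because the excluded words never reach the index in this sketch, a full-text search for one of them would find nothing, even though the word appears in the record.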

Punctuation

For full-text searching, punctuation is not indexed. Concordance Desktop treats punctuation symbols as spaces when full-text searches are conducted. All punctuation is ignored, such as periods and quotation marks, as well as symbols such as currency and percent signs. All leading and trailing punctuation on a word is also ignored.

There is one exception to this rule, which applies only when both of the following are true:

The punctuation is entered in Concordance Desktop by your administrator to be indexed.

The punctuation is embedded in a string of characters.

Select File > Properties to open the Properties dialog box and see the punctuation that is full-text searchable in a database (when it is embedded in a string of characters).

Examples of Embedded Punctuation:

AOL.COM and NETSCAPE/AOL – The period and slash are indexed so that you can search on these terms, for example because the entries relate to a case regarding Internet browsers.

D’Arcangelo – The apostrophe is embedded because of the surname spelling. Names often contain embedded apostrophes, and the apostrophe is added to the list because such names are searched often.

john.smith@organization.com – The period and the at sign (@) are both examples of embedded punctuation within an email address.
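The embedded punctuation rule can be sketched in Python as follows. This is an illustrative approximation, not Concordance Desktop's tokenizer: the function name tokenize and the indexed_punct parameter are assumptions, and the straight apostrophe stands in for whatever apostrophe character the administrator has entered. Punctuation is kept only when it is in the indexed set and sits between letters or digits; everything else acts as a space.

```python
import re

def tokenize(text, indexed_punct=""):
    """Split text into words, keeping indexed punctuation only when embedded."""
    if indexed_punct:
        joiner = "[" + re.escape(indexed_punct) + "]"
        # A word is a run of letters/digits, optionally continued by one
        # indexed punctuation character followed by more letters/digits.
        pattern = rf"[A-Za-z0-9]+(?:{joiner}[A-Za-z0-9]+)*"
    else:
        pattern = r"[A-Za-z0-9]+"
    return re.findall(pattern, text)

# No punctuation indexed: every symbol behaves like a space.
print(tokenize("john.smith@organization.com."))
# ['john', 'smith', 'organization', 'com']

# Period and at sign indexed: the address is one word, trailing period dropped.
print(tokenize("john.smith@organization.com.", ".@"))
# ['john.smith@organization.com']

print(tokenize("NETSCAPE/AOL", "/"))  # ['NETSCAPE/AOL']
print(tokenize("D'Arcangelo", "'"))   # ["D'Arcangelo"]
```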

Contact your Concordance Desktop administrator if you have questions, need a listing of the punctuation used in your databases, or would like to modify the punctuation list.

Dictionary

In Concordance Desktop, a word is any string of characters; a word can be a series of numbers or a combination of letters, numbers, and even punctuation or symbols. Reviewing or printing the entries in your database dictionary can help you streamline your search queries.

The Properties dialog box provides a quick review of the number of words included in your dictionary, and also notes punctuation that is indexed. Punctuation is entered by your database administrator. You can search on punctuation only when it is listed in the Properties dialog box and is embedded in the word.

Go to File > Dictionaries > Database Dictionary. The Dictionary dialog box provides a complete listing of all words included, the number of documents each word appears in, and the number of hits for each word.
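The two counts shown for each dictionary word differ: a word that occurs three times in one document has one document but three hits. The following Python sketch illustrates the distinction with made-up records; the function name build_dictionary and the sample data are hypothetical and do not reflect how Concordance Desktop stores its dictionary.

```python
from collections import Counter

def build_dictionary(records):
    """Map each word to (number of documents containing it, total hits)."""
    hits = Counter()
    docs = Counter()
    for text in records:
        found = text.lower().split()
        hits.update(found)       # count every occurrence
        docs.update(set(found))  # count each document at most once
    return {w: (docs[w], hits[w]) for w in sorted(hits)}

sample = ["merger merger agreement", "agreement signed", "merger closed"]
dictionary = build_dictionary(sample)
print(dictionary["merger"])     # (2, 3)
print(dictionary["agreement"])  # (2, 2)
```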

You can also access the Dictionary dialog box from the Search task pane by selecting Display the database dictionary under Options in the Advanced Search panel. This allows you to select words from the dictionary and add them directly into your search logic in the Advanced Search panel. You can also verify whether the word you are searching for is included in the dictionary. If it is not, you may want to try another spelling or run a relational search.