Text Manipulation

There are several built-in functions you can use to help manipulate text data in fields. Two of the more popular functions are match and substr.

Match function

The match function searches for a text string within another text string. You can use it to search for a specific phrase within a database field. The syntax for the match function is as follows:

int match(text target, search; int offset, length);

The following table describes the parameter values.

Parameter    Description
target       The line of text to search in, such as a text variable or text field.
search       The text to search for.
offset       The 1-based offset into the target at which to start searching.
length       Optional. The length of text after the offset to search in. If not set, the function searches from the offset to the end of the string.

Match returns the offset of the first character of the search string in the target string. Partial matches do not count: the entire search string must appear in the target. If there is no match, the function returns 0. Note that match is case-sensitive.

For example, suppose you have the following target string:

targetString = "The quick brown fox jumped over the lazy dog.";

And the following search string:

searchString = "quick";

In this case, the match function returns 5, because searchString begins at the fifth character of targetString.
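As a minimal sketch, the example above can be written out as a complete CPL program. The variable name pos, the "QUICK" search, and the length-limited search are illustrative additions. The values in the comments follow from the descriptions above rather than from a test run.

```
main()
{
   text targetString, searchString;
   int pos;

   targetString = "The quick brown fox jumped over the lazy dog.";
   searchString = "quick";

   /* Search from the first character of the target. */
   pos = match(targetString, searchString, 1);   /* expected: 5 */

   /* match is case-sensitive, so a different capitalization should not be found. */
   pos = match(targetString, "QUICK", 1);        /* expected: 0 */

   /* Limit the search to the first 9 characters ("The quick").   */
   /* "lazy" lies outside that range, so no match is expected.    */
   pos = match(targetString, "lazy", 1, 9);      /* expected: 0 */
}
```

Because 0 is reserved for "no match" and valid offsets start at 1, a result can be tested directly against 0 before it is used as an offset.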

The following example counts how many times "lazy dog" appears in an OCR field.

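main()
{
   int db, n, i;

   n = 0;   /* running total of occurrences */

   cycle(db)
   {
      /* Find the first occurrence in this record's OCR field. */
      i = match(db->OCR, "lazy dog", 1);

      while (i <> 0)
      {
         n = n + 1;

         /* Resume the search one character past the previous hit. */
         i = match(db->OCR, "lazy dog", i + 1);
      }
   }
}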

When the cycle loop finishes, n equals the total number of occurrences of the phrase "lazy dog" in the records processed. The while loop exits when i equals 0, which indicates that match found no further occurrence. Remember to advance the starting offset by one on each pass. Otherwise match keeps finding the same occurrence and the loop never ends.

 

Substr function

The substr function extracts a piece of text from another string. It makes a copy of this text and returns it as a text variable. The format for this function is as follows:

text substr(text string; int from, width);

Parameter    Description
string       The string to extract data from. Can be a text variable or database field. To specify a database field, use the pointer notation (db->OCR).
from         The starting point from which to extract the text. This is a 1-based offset, counted from the first character.
width        The number of characters to extract, starting at the from offset.

Substr returns the extracted portion of the specified string.

For example, the following code sample extracts a subset from a string.

main()
{
   text targetString, myString;

   targetString = "The quick brown fox";

   myString = substr(targetString, 11, 5);
}

After the substr call executes, the text variable myString contains the value "brown": the five characters starting at the eleventh character of targetString.
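The two functions are often used together: match locates a phrase, and substr copies it out. The following is a minimal sketch under that assumption. The variable names pos and searchString are illustrative, and the width of 5 is simply the number of characters in "brown".

```
main()
{
   text targetString, searchString, myString;
   int pos;

   targetString = "The quick brown fox";
   searchString = "brown";

   /* Locate the search string; match returns 0 if it is absent. */
   pos = match(targetString, searchString, 1);   /* expected: 11 */

   if (pos <> 0)
   {
      /* Copy 5 characters starting at the matched offset. */
      myString = substr(targetString, pos, 5);   /* expected: "brown" */
   }
}
```

Checking pos against 0 before calling substr avoids passing an invalid starting offset when the phrase is not present.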