Compiles and matches regular-expression patterns.
Note: Commands use the regcomp, regexec, regfree, and regerror subroutines for the functions described in this article.
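For comparison, the following is a minimal sketch of the regcomp family mentioned in the note. The helper name bre_matches is hypothetical and is not part of any library; the sketch assumes a POSIX <regex.h> is available and that the pattern is a basic regular expression.

```c
#include <regex.h>
#include <stdio.h>

/* Hypothetical helper: returns 1 if `text` matches the basic regular
 * expression `pattern`, 0 if it does not, and -1 if the pattern
 * cannot be compiled. */
int bre_matches(const char *pattern, const char *text)
{
    regex_t re;
    char msg[128];
    int rc;

    /* REG_NOSUB: only a yes/no answer is needed, no match offsets. */
    rc = regcomp(&re, pattern, REG_NOSUB);
    if (rc != 0) {
        regerror(rc, &re, msg, sizeof msg);   /* translate the error code */
        fprintf(stderr, "regcomp: %s\n", msg);
        return -1;
    }

    rc = regexec(&re, text, 0, NULL, 0);
    regfree(&re);                             /* release compiled pattern */
    return rc == 0 ? 1 : 0;
}
```

A call such as bre_matches("^ab*c$", "abbbc") reports a match, while bre_matches("^ab*c$", "abd") does not.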
#define INIT declarations
#define GETC( ) getc_code
#define PEEKC( ) peekc_code
#define UNGETC(c) ungetc_code
#define RETURN(pointer) return_code
#define ERROR(val) error_code

#include <regexp.h>
#include <NLregexp.h>
char *compile (InString, ExpBuffer, EndBuffer, EndOfFile)
char * ExpBuffer;
char * InString, * EndBuffer;
int EndOfFile;
int step (String, ExpBuffer)
const char * String, *ExpBuffer;
int advance (String, ExpBuffer)
const char *String, *ExpBuffer;
The /usr/include/regexp.h file contains subroutines that perform regular-expression pattern matching. Programs that perform regular-expression pattern matching use this source file. Thus, only the regexp.h file needs to be changed to maintain regular expression compatibility between programs.
The interface to this file is complex. Programs that include this file must define the six macros shown in the syntax (INIT, GETC, PEEKC, UNGETC, RETURN, and ERROR) before the #include <regexp.h> statement. The compile subroutine uses these macros.
The compile subroutine compiles the regular expression for later use. The compile subroutine never uses the InString parameter explicitly, but you can use it in your macros. For example, you can pass the string containing the pattern as the InString parameter to compile and use the INIT macro to set a pointer to the beginning of this string. The example in the Examples section uses this technique. If your macros do not use InString, call compile with a value of ((char *) 0) for this parameter.
The ExpBuffer parameter points to a character array where the compiled regular expression is to be placed. The EndBuffer parameter points to the location that immediately follows the character array where the compiled regular expression is to be placed. If the compiled expression cannot fit in (EndBuffer-ExpBuffer) bytes, the call ERROR(50) is made.
The EndOfFile parameter is the character that marks the end of the regular expression. For example, in the ed command, this character is usually / (slash).
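The parameter conventions described above can be sketched without the real header. In the following sketch, read_pattern is a hypothetical stand-in for compile: it only copies pattern characters instead of compiling them, but it reads its input through caller-style macros, treats EndBuffer as the address just past the output array, and stops at the EndOfFile character. The parameter name instring is an assumption chosen so that INIT can refer to it.

```c
/* Caller-supplied macros in the style that <regexp.h> expects.
 * INIT must end with a semicolon because it expands to a declaration. */
#define INIT       const char *sp = instring;
#define GETC()     (*sp++)
#define PEEKC()    (*sp)
#define UNGETC(c)  (--sp)

/* Hypothetical stand-in for compile: copies the pattern text up to the
 * delimiter `eof` into buf. `endbuf` points just past the last byte of
 * buf. Returns the number of characters copied, or -1 if the pattern
 * does not fit (the case in which compile would call ERROR(50)). */
int read_pattern(const char *instring, char *buf, char *endbuf, int eof)
{
    INIT                       /* sets sp to the start of the pattern */
    char *out = buf;

    for (;;) {
        int c = GETC();
        if (c == eof || c == '\0') {
            UNGETC(c);         /* leave the delimiter unread */
            break;
        }
        if (out >= endbuf - 1)
            return -1;         /* keep one byte for the terminator */
        *out++ = (char)c;
    }
    *out = '\0';
    return (int)(out - buf);
}
```

With an 8-byte buffer buf, read_pattern("ab*c/rest", buf, buf + sizeof buf, '/') stops at the slash and leaves "ab*c" in buf.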
The regexp.h file defines other subroutines that perform actual regular-expression pattern matching. One of these is the step subroutine.
The String parameter of the step subroutine is a pointer to a null-terminated string of characters to be checked for a match.
The ExpBuffer parameter points to the compiled regular expression, obtained by a call to the compile subroutine.
The step subroutine returns the value 1 if the given string matches the pattern, and 0 if it does not match. If it matches, then step also sets two global character pointers: loc1, which points to the first character that matches the pattern, and loc2, which points to the character immediately following the last character that matches the pattern. Thus, if the regular expression matches the entire string, loc1 points to the first character of the String parameter and loc2 points to the null character at the end of the String parameter.
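The loc1 and loc2 convention can be illustrated with the regexec subroutine, which reports the same two positions as byte offsets. This is a sketch that swaps in regcomp and regexec rather than calling step; the helper name find_match and its start and end parameters are hypothetical.

```c
#include <regex.h>
#include <stddef.h>

/* Hypothetical helper: finds the first match of the basic regular
 * expression `pattern` in `text`. On success it returns 1 and sets
 * *start and *end the way step sets loc1 and loc2. It returns 0 if
 * there is no match or the pattern cannot be compiled. */
int find_match(const char *pattern, const char *text,
               const char **start, const char **end)
{
    regex_t re;
    regmatch_t m[1];
    int found = 0;

    if (regcomp(&re, pattern, 0) != 0)
        return 0;

    if (regexec(&re, text, 1, m, 0) == 0) {
        *start = text + m[0].rm_so;   /* first matching character  */
        *end   = text + m[0].rm_eo;   /* one past the last matched */
        found = 1;
    }
    regfree(&re);
    return found;
}
```

When the expression matches the entire string, *start equals text and *end points to the terminating null character, which mirrors the loc1 and loc2 behavior described above.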
The step subroutine uses the global variable circf, which is set by the compile subroutine if the regular expression begins with a ^ (circumflex). If this variable is set, step only tries to match the regular expression to the beginning of the string. If you compile more than one regular expression before executing the first one, save the value of circf for each compiled expression and set circf to that saved value before each call to step.
Using the same parameters that were passed to it, the step subroutine calls a subroutine named advance. The step subroutine increments through the String parameter and calls the advance subroutine until advance returns a 1, indicating a match, or until the end of String is reached. To constrain the match to the beginning of the string in all cases, call the advance subroutine directly instead of calling the step subroutine.
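The relationship between step and advance can be sketched with a toy matcher. The names toy_step and toy_advance are hypothetical and this is not the regexp.h implementation: the pattern is plain text in which only . (period) is special, and the anchored parameter stands in for the circf global variable.

```c
/* Toy version of advance: returns 1 only if `pat` matches at the very
 * first character of `s`. A '.' in the pattern matches any character. */
int toy_advance(const char *s, const char *pat)
{
    for (; *pat != '\0'; s++, pat++) {
        if (*s == '\0' || (*pat != '.' && *pat != *s))
            return 0;
    }
    return 1;
}

/* Toy version of step: tries toy_advance at each position of `s` in
 * turn and stops at the first success. If `anchored` is nonzero (the
 * role circf plays for step), only the start of the string is tried. */
int toy_step(const char *s, const char *pat, int anchored)
{
    if (anchored)
        return toy_advance(s, pat);

    do {
        if (toy_advance(s, pat))
            return 1;
    } while (*s++ != '\0');

    return 0;
}
```

For the string "hello world" and the pattern "wor", toy_step finds a match when anchored is 0 but not when anchored is 1, and a direct call to toy_advance fails for the same reason.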
When the advance subroutine encounters an * (asterisk) or a \{ \} sequence in the regular expression, it advances its pointer to the string to be matched as far as possible and recursively calls itself, trying to match the rest of the string to the rest of the regular expression. As long as there is no match, the advance subroutine backs up along the string until it finds a match or reaches the point in the string that initially matched the * or \{ \}. You can stop this backing-up before the initial point in the string is reached. If the locs global character pointer is equal to the current point in the string at any time during the backing-up process, the advance subroutine breaks out of the loop that backs up and returns 0. This is used for global substitutions on the whole line so that expressions such as s/y*//g do not loop forever.
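The backing-up behavior can be sketched as follows. This is a toy model, not the regexp.h code: star_advance, toy_locs, and toy_loc2 are hypothetical names, the pattern language supports only literal characters, . (period), and a single character followed by *, and \{ \} is not handled.

```c
#include <stddef.h>

const char *toy_locs = NULL;  /* stand-in for locs; NULL disables it  */
const char *toy_loc2 = NULL;  /* stand-in for loc2; set on success    */

/* Returns 1 if the single pattern character pc matches at s. */
static int single_matches(const char *s, int pc)
{
    return *s != '\0' && (pc == '.' || pc == *s);
}

/* Toy advance with greedy '*': returns 1 if pat matches at the start
 * of s, and records the end of the match in toy_loc2. */
int star_advance(const char *s, const char *pat)
{
    while (*pat != '\0') {
        if (pat[1] == '*') {
            const char *start = s;

            while (single_matches(s, pat[0]))
                s++;                    /* advance as far as possible */

            for (;;) {                  /* then back up step by step  */
                if (s == toy_locs)
                    return 0;           /* caller asked us to stop    */
                if (star_advance(s, pat + 2))
                    return 1;           /* rest of pattern matched    */
                if (s == start)
                    return 0;           /* nothing left to give back  */
                s--;
            }
        }
        if (!single_matches(s, pat[0]))
            return 0;
        s++;
        pat++;
    }
    toy_loc2 = s;
    return 1;
}
```

With toy_locs left as NULL, the pattern "y*" matches the empty string at the start of "xyz". Setting toy_locs to that same position makes the call return 0, which is how a global substitution can refuse a second empty match at the place where the previous match ended.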
The following is an example of the regular expression macros and calls:
#define INIT       register char *sp=instring;
#define GETC()     (*sp++)
#define PEEKC()    (*sp)
#define UNGETC(c)  (--sp)
#define RETURN(c)  return;
#define ERROR(c)   regerr()

#include <regexp.h>
 . . .
compile (patstr, expbuf, &expbuf[ESIZE], '\0');
 . . .
if (step (linebuf, expbuf))
    succeed( );
 . . .
These subroutines are part of Base Operating System (BOS) Runtime.
The regcmp or regex subroutine, regcomp subroutine, regerror subroutine, regexec subroutine, regfree subroutine.
List of String Manipulation Services, National Language Support Overview for Programming, Subroutines Overview in AIX 5L Version 5.1 General Programming Concepts: Writing and Debugging Programs.