Technical Reference: Base Operating System and Extensions, Volume 2
Compares the null-terminated
string specified by the value of the String parameter against the
compiled basic or extended regular expression Preg, which must have
previously been compiled by a call to the regcomp
subroutine.
Standard C Library (libc.a)
#include <regex.h>
int regexec (Preg, String, NMatch, PMatch, EFlags)
const regex_t * Preg;
const char * String;
size_t NMatch;
regmatch_t * PMatch;
int EFlags;
The regexec subroutine
compares the null-terminated string in the String parameter with
the compiled basic or extended regular expression in the Preg
parameter initialized by a previous call to the regcomp
subroutine. If a match is found, the regexec subroutine
returns a value of 0. The regexec subroutine returns a
nonzero value if it finds no match or it finds an error.
If the NMatch
parameter has a value of 0, or if the REG_NOSUB flag was set on the
call to the regcomp subroutine, the regexec subroutine
ignores the PMatch parameter. Otherwise, the
PMatch parameter points to an array of at least the number of
elements specified by the NMatch parameter. The
regexec subroutine fills in the elements of the array pointed to by
the PMatch parameter with offsets of the substrings of the
String parameter. The offsets correspond to the parenthetic
subexpressions of the original pattern parameter that was specified
to the regcomp subroutine.
The pmatch.rm_so member is the byte offset of the beginning
of the substring, and the pmatch.rm_eo member is one
greater than the byte offset of the end of the substring. Subexpression
i begins at the ith matched open parenthesis, counting
from 1. The 0 element of the array corresponds to the entire
pattern. Unused elements of the PMatch parameter, up to the
value PMatch[NMatch-1], are filled with -1. If the pattern contains
more than NMatch-1 subexpressions (the pattern itself counts as a
subexpression), only the first NMatch-1 subexpressions are recorded.
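As an illustration of how the PMatch array is filled in, the following is a minimal sketch, not part of the original reference. The helper names match_offsets and print_offsets are hypothetical, and the sketch assumes an extended regular expression compiled with the REG_EXTENDED flag.

```c
#include <regex.h>
#include <stdio.h>

/* Hypothetical helper: compiles `pattern` as an extended regular
 * expression, matches it against `text`, and stores up to `nmatch`
 * offset pairs in `pm`. Returns 0 on a match, nonzero otherwise
 * (-1 if the pattern fails to compile). */
int match_offsets(const char *pattern, const char *text,
                  size_t nmatch, regmatch_t *pm)
{
    regex_t re;
    int rc;

    if (regcomp(&re, pattern, REG_EXTENDED) != 0)
        return -1;
    rc = regexec(&re, text, nmatch, pm, 0);
    regfree(&re);
    return rc;
}

/* Prints every recorded substring; elements holding -1 are skipped. */
void print_offsets(const char *text, size_t nmatch, const regmatch_t *pm)
{
    size_t i;

    for (i = 0; i < nmatch; i++) {
        if (pm[i].rm_so == -1)
            continue;               /* unused or non-participating element */
        printf("%zu: [%d,%d) \"%.*s\"\n", i,
               (int)pm[i].rm_so, (int)pm[i].rm_eo,
               (int)(pm[i].rm_eo - pm[i].rm_so), text + pm[i].rm_so);
    }
}
```

For example, matching the pattern ([a-z]+)=([0-9]+) against the string key=42; with an NMatch value of 4 records byte offsets 0 and 6 in element 0, 0 and 3 in element 1, and 4 and 6 in element 2. Element 3 is unused and holds -1.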
When a basic or extended regular
expression is being matched, any given parenthetic subexpression of the
pattern parameter might match several different substrings of the
String parameter. Alternatively, it might not match any
substring even though the pattern as a whole did match.
The following rules are used to
determine which substrings to report in the PMatch parameter when
regular expressions are matched:
- If a subexpression in a
regular expression participated in the match several times, the offset of the
last matching substring is reported in the PMatch parameter.
- If a subexpression did not
participate in a match, the byte offset in the PMatch parameter is
a value of -1. A subexpression does not participate in a match if any
of the following are true:
  - An * (asterisk) or \{\}
  (backslash, left brace, backslash, right brace) appears immediately after the
  subexpression in a basic regular expression and the subexpression did not
  match (matched 0 times).
  - An * (asterisk), ?
  (question mark), or { } (left and right braces) appears immediately after the
  subexpression in an extended regular expression and the subexpression did not
  match (matched 0 times).
  - A | (pipe) is used in an
  extended regular expression to select either the subexpression that
  did not match or another subexpression, and the other subexpression
  matched.
- If a subexpression is
contained in another subexpression, the data in the PMatch parameter
for the contained subexpression refers to its match within the substring
reported for the containing subexpression.
- If a subexpression is
contained in another subexpression and the byte offsets in the PMatch
parameter for the containing subexpression have a value of -1, the offsets
for the contained subexpression also have a value of -1.
- If a subexpression matched a
zero-length string, both offsets in the PMatch parameter refer to
the byte immediately following the zero-length string.
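The reporting rules above can be observed with a small sketch such as the following. It is an illustration only: the helper name group_span is hypothetical, it assumes extended regular expressions, and it supports at most three subexpressions.

```c
#include <regex.h>

/* Hypothetical helper: matches the extended regular expression in
 * `pattern` against `text` and copies the offsets reported for
 * subexpression `group` (0 through 3) into *so and *eo.
 * Returns the regexec result, or -1 for a bad group or pattern. */
int group_span(const char *pattern, const char *text, int group,
               long *so, long *eo)
{
    regex_t re;
    regmatch_t pm[4];
    int rc;

    if (group < 0 || group > 3 || regcomp(&re, pattern, REG_EXTENDED) != 0)
        return -1;
    rc = regexec(&re, text, 4, pm, 0);
    if (rc == 0) {
        *so = (long)pm[group].rm_so;   /* -1 if the group did not participate */
        *eo = (long)pm[group].rm_eo;
    }
    regfree(&re);
    return rc;
}
```

Under these rules, matching (a|b)+ against ab should report offsets 1 and 2 for subexpression 1, because the last iteration matched b. Matching (a)|(b) against b should report -1 for subexpression 1, which did not participate, and offsets 0 and 1 for subexpression 2.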
If the REG_NOSUB flag
was set in the cflags parameter in the call to the
regcomp subroutine, and the NMatch parameter is not
equal to 0 in the call to the regexec subroutine, the content of
the PMatch array is unspecified.
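When only a yes-or-no answer is needed, the REG_NOSUB flag can be combined with an NMatch value of 0 so that the PMatch parameter is ignored. The following sketch shows one way to do this; the helper name is_match is hypothetical and the pattern is assumed to be a basic regular expression.

```c
#include <regex.h>
#include <stddef.h>

/* Hypothetical helper: returns 1 if `text` matches the basic regular
 * expression in `pattern`, 0 if it does not (or if regexec reports a
 * run-time error), and -1 if the pattern fails to compile. */
int is_match(const char *pattern, const char *text)
{
    regex_t re;
    int rc;

    if (regcomp(&re, pattern, REG_NOSUB) != 0)
        return -1;
    /* NMatch is 0, so regexec ignores PMatch and NULL can be passed. */
    rc = regexec(&re, text, 0, NULL, 0);
    regfree(&re);
    return rc == 0 ? 1 : 0;
}
```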
If the REG_NEWLINE
flag was not set in the cflags parameter when the
regcomp subroutine was called, then a new-line character in the
pattern or String parameter is treated as an ordinary
character. If the REG_NEWLINE flag was set when the
regcomp subroutine was called, the new-line character is treated as
an ordinary character except as follows:
- A new-line character in the
String parameter is not matched by a period outside of a bracket
expression or by any form of a nonmatching list. A nonmatching list
expression begins with a ^ (circumflex) and specifies a list that matches any
character or collating element except the expressions in the list after the
leading circumflex. For example, the regular expression [^abc]
matches any character except a, b, or
c. The circumflex has this special meaning only when it is
the first character in the list, immediately following the left
bracket.
- A ^ (circumflex) in the
pattern parameter, when used to specify expression anchoring,
matches the zero-length string immediately after a new-line character in the
String parameter, regardless of the setting of the
REG_NOTBOL flag.
- A $ (dollar sign) in the
pattern parameter, when used to specify expression anchoring,
matches the zero-length string immediately before a new-line character in the
String parameter, regardless of the setting of the
REG_NOTEOL flag.
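The effect of the REG_NEWLINE flag on anchoring and on the period can be checked with a sketch like the following. The helper name first_match_offset is hypothetical; it passes its cflags argument straight to the regcomp subroutine and treats the pattern as a basic regular expression.

```c
#include <regex.h>

/* Hypothetical helper: returns the byte offset at which `pattern`
 * first matches in `text`, or -1 if there is no match or the pattern
 * does not compile. `cflags` is passed unchanged to regcomp. */
long first_match_offset(const char *pattern, const char *text, int cflags)
{
    regex_t re;
    regmatch_t pm;
    long off = -1;

    if (regcomp(&re, pattern, cflags) != 0)
        return -1;
    if (regexec(&re, text, 1, &pm, 0) == 0)
        off = (long)pm.rm_so;
    regfree(&re);
    return off;
}
```

With the string "one\ntwo", the pattern ^two matches at offset 4 only when REG_NEWLINE is passed, and the pattern e.t matches at offset 2 only when it is not, because the period then matches the new-line character.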
Preg
    Contains the compiled basic or extended regular expression to compare
    against the String parameter.
String
    Contains the data to be matched.
NMatch
    Contains the maximum number of subexpressions to report, which is the
    number of elements in the array pointed to by the PMatch parameter.
PMatch
    Contains the array of offsets into the String parameter that
    match the corresponding subexpressions in the Preg parameter.
EFlags
    Contains the bitwise inclusive OR of zero or more flags that customize
    the behavior of the regexec subroutine.
    The EFlags parameter modifies the interpretation of the contents
    of the String parameter. The following flags are defined in the
    regex.h file:
    REG_NOTBOL
        The first character of the string pointed to by the String
        parameter is not the beginning of the line. Therefore, the ^
        (circumflex), when used as a special character, does not match the
        beginning of the String parameter.
    REG_NOTEOL
        The last character of the string pointed to by the String
        parameter is not the end of the line. Therefore, the $ (dollar sign),
        when used as a special character, does not match the end of the
        String parameter.
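The following sketch illustrates how the REG_NOTBOL and REG_NOTEOL flags change the behavior of the anchors. The helper name matches_with_eflags is hypothetical, and the pattern is assumed to be a basic regular expression.

```c
#include <regex.h>
#include <stddef.h>

/* Hypothetical helper: returns 1 if the basic regular expression in
 * `pattern` matches `text` when regexec is called with `eflags`,
 * and 0 otherwise (including when the pattern does not compile). */
int matches_with_eflags(const char *pattern, const char *text, int eflags)
{
    regex_t re;
    int rc;

    if (regcomp(&re, pattern, REG_NOSUB) != 0)
        return 0;
    rc = regexec(&re, text, 0, NULL, eflags);
    regfree(&re);
    return rc == 0;
}
```

For example, ^abc matches the string abc with an EFlags value of 0 but not with REG_NOTBOL, and abc$ matches with 0 but not with REG_NOTEOL. A pattern without anchors, such as abc, is unaffected by either flag.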
On successful completion, the
regexec subroutine returns a value of 0 to indicate that the
contents of the String parameter matched the contents of the
pattern parameter, or REG_NOMATCH to indicate that no match occurred.
The REG_NOMATCH error is defined in the regex.h
file.
If the regexec
subroutine is unsuccessful, it returns a nonzero value indicating the type of
problem. The following macros for possible error codes that can be
returned are defined in the regex.h file:
REG_NOMATCH
    Indicates the basic or extended regular expression was unable to find a
    match.
REG_BADPAT
    Indicates a basic or extended regular expression that is not valid.
REG_ECOLLATE
    Indicates a collating element referenced that is not valid.
REG_ECTYPE
    Indicates a character class-type reference that is not valid.
REG_EESCAPE
    Indicates a trailing \ (backslash) in the pattern.
REG_ESUBREG
    Indicates a number in \digit is not valid or is in error.
REG_EBRACK
    Indicates a [ ] (left and right brackets) imbalance.
REG_EPAREN
    Indicates a \( \) (backslash, left parenthesis, backslash, right
    parenthesis) or ( ) (left and right parentheses) imbalance.
REG_EBRACE
    Indicates a \{ \} (backslash, left brace, backslash, right brace)
    imbalance.
REG_BADBR
    Indicates the content of \{ \} (backslash, left brace, backslash, right
    brace) is unusable (not a number, number too large, more than two
    numbers, or first number larger than second).
REG_ERANGE
    Indicates an unusable end point in range expression.
REG_ESPACE
    Indicates out of memory.
REG_BADRPT
    Indicates a ? (question mark), * (asterisk), or + (plus sign) not
    preceded by valid basic or extended regular expression.
If the value of the
Preg parameter to the regexec subroutine is not a
compiled basic or extended regular expression returned by the
regcomp subroutine, the result is undefined.
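To show how the return values and error codes above might be handled together, the following sketch separates a match, the REG_NOMATCH result, and other errors, and uses the regerror subroutine to obtain a message. The helper name try_match is hypothetical, and the pattern is assumed to be an extended regular expression.

```c
#include <regex.h>
#include <stddef.h>

/* Hypothetical helper: returns 1 on a match, 0 on REG_NOMATCH, and -1
 * on any other error from regcomp or regexec. On error, a message is
 * written to `errbuf` (at most `errlen` bytes) by regerror. */
int try_match(const char *pattern, const char *text,
              char *errbuf, size_t errlen)
{
    regex_t re;
    int rc = regcomp(&re, pattern, REG_EXTENDED | REG_NOSUB);

    if (rc != 0) {
        regerror(rc, &re, errbuf, errlen);   /* compile-time error */
        return -1;
    }
    rc = regexec(&re, text, 0, NULL, 0);
    if (rc != 0 && rc != REG_NOMATCH)
        regerror(rc, &re, errbuf, errlen);   /* run-time error */
    regfree(&re);
    if (rc == 0)
        return 1;
    return rc == REG_NOMATCH ? 0 : -1;
}
```

Treating REG_NOMATCH separately from the other codes keeps an ordinary failure to match from being reported as an error.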
The following example
demonstrates how the REG_NOTBOL flag can be used with the
regexec subroutine to find all substrings in a line that match a
pattern supplied by a user. (For simplicity, very little error-checking
is done in this example.)
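const char *p = &buffer[0];   /* p (added here) marks the unsearched text */

(void) regcomp (&re, pattern, 0);
/* This call to regexec finds the first match on the line. */
error = regexec (&re, p, 1, &pm, 0);
while (error == 0) {   /* while matches found */
    /* Substring found between p + pm.rm_so and p + pm.rm_eo. */
    if (pm.rm_eo == 0)   /* empty match at p: stop rather than loop forever */
        break;
    p += pm.rm_eo;   /* offsets are relative to the string passed in */
    /* This call to regexec finds the next match. */
    error = regexec (&re, p, 1, &pm, REG_NOTBOL);
}
regfree (&re);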
This subroutine is part of Base
Operating System (BOS) Runtime.
The regcomp subroutine, regerror subroutine, regfree subroutine.
Subroutines Overview and Understanding Internationalized Regular
Expression Subroutines in AIX 5L Version 5.1 General
Programming Concepts: Writing and Debugging Programs.