Technical Reference: Base Operating System and Extensions, Volume 2
regexec Subroutine
Purpose
Compares the null-terminated string specified by the
value of the String parameter against the compiled
basic or extended regular expression Preg, which must
have previously been compiled by a call to the regcomp
subroutine.
Library
Standard C Library (libc.a)
Syntax
#include <regex.h>
int regexec (Preg, String, NMatch, PMatch, EFlags)
const regex_t * Preg;
const char * String;
size_t NMatch;
regmatch_t * PMatch;
int EFlags;
Description
The regexec subroutine compares
the null-terminated string in the String parameter
with the compiled basic or extended regular expression in the Preg parameter initialized by a previous call to the regcomp subroutine. If a match is found, the regexec subroutine returns a value of 0. The regexec subroutine
returns a nonzero value if it finds no match or it finds an error.
If the NMatch parameter has
a value of 0, or if the REG_NOSUB flag was set on the
call to the regcomp subroutine, the regexec subroutine ignores the PMatch parameter.
Otherwise, the PMatch parameter points to an array
of at least the number of elements specified by the NMatch parameter. The regexec subroutine fills in the
elements of the array pointed to by the PMatch parameter
with offsets of the substrings of the String parameter.
The offsets correspond to the parenthetic subexpressions of the original pattern parameter that was specified to the regcomp subroutine.
The rm_so field of each array element is
the byte offset of the beginning of the substring, and the rm_eo field is one greater than the byte offset of the end
of the substring. Subexpression i begins at the ith matched open parenthesis, counting from 1. Element 0
of the array corresponds to the entire pattern. Unused elements of
the PMatch parameter, up to the value PMatch[NMatch-1], are filled with -1. If the pattern contains more
subexpressions than the number specified by the NMatch parameter (the pattern itself counts
as a subexpression), only the first NMatch-1 subexpressions
are recorded.
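The offset rules above can be illustrated with a minimal sketch. The helper name extract_group and the fixed array size of 4 are assumptions made for this illustration and are not part of the regexec interface; the pattern is compiled as an extended regular expression.

```c
#include <regex.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical helper (not part of libc): copies the text matched by
 * subexpression `group` into `out`. Returns 0 on success, or -1 if the
 * pattern does not compile, the string does not match, the group did
 * not participate in the match, or `out` is too small. */
int extract_group(const char *pattern, const char *string,
                  size_t group, char *out, size_t outsz)
{
    regex_t re;
    regmatch_t pm[4];                      /* element 0 is the whole match */
    size_t nmatch = sizeof pm / sizeof pm[0];
    int rc = -1;

    if (group >= nmatch || outsz == 0)
        return -1;
    if (regcomp(&re, pattern, REG_EXTENDED) != 0)
        return -1;

    /* rm_so == -1 marks an unused or non-participating element */
    if (regexec(&re, string, nmatch, pm, 0) == 0 && pm[group].rm_so != -1) {
        size_t len = (size_t)(pm[group].rm_eo - pm[group].rm_so);
        if (len < outsz) {
            memcpy(out, string + pm[group].rm_so, len);
            out[len] = '\0';
            rc = 0;
        }
    }
    regfree(&re);
    return rc;
}
```

With the pattern ([a-z]+)=([0-9]+) and the string key=42, group 0 yields key=42, group 1 yields key, and group 2 yields 42. Group 3 is an unused element, so the helper returns -1 for it.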
When a basic or extended regular expression is being
matched, any given parenthetic subexpression of the pattern parameter might match several different substrings of the String parameter. Alternatively, it might not match any substring even though
the pattern as a whole did match.
The following rules are used to determine which substrings
to report in the PMatch parameter when regular expressions
are matched:
- If a subexpression in a regular expression participated
in the match several times, the offset of the last matching substring is
reported in the PMatch parameter.
- If a subexpression did not participate in a match,
the byte offset in the PMatch parameter is a value
of -1. A subexpression does not participate in a match if any of the following
are true:
  - An * (asterisk) or \{\} (backslash, left
    brace, backslash, right brace) appears immediately after the subexpression
    in a basic regular expression and the subexpression did not match (matched 0
    times).
  - An * (asterisk), ? (question mark), or {
    } (left and right braces) appears immediately after the subexpression in an
    extended regular expression and the subexpression did not match (matched 0
    times).
  - A | (pipe) is used in an extended regular
    expression to select either the subexpression that did not match or another
    subexpression, and the other subexpression matched.
- If a subexpression is contained in a subexpression,
the data in the PMatch parameter refers to the last
such subexpression.
- If a subexpression is contained in another subexpression
and the byte offsets reported in the PMatch parameter for the containing
subexpression have a value of -1, the byte offsets reported for the contained
subexpression also have a value of -1.
- If a subexpression matched a zero-length string,
the offsets in the PMatch parameter refer to the byte
immediately following the matching string.
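The reporting rules above can be observed with a small sketch. The helper name group_offsets and the three-element array are assumptions for illustration only; the patterns are compiled as extended regular expressions.

```c
#include <regex.h>
#include <stddef.h>

/* Hypothetical helper (not part of libc): matches `string` against
 * `pattern` and copies the offsets recorded for subexpression `group`
 * (0, 1, or 2) into *out. Returns the regcomp or regexec result,
 * or -1 if `group` is out of range. */
int group_offsets(const char *pattern, const char *string,
                  size_t group, regmatch_t *out)
{
    regex_t re;
    regmatch_t pm[3];
    int rc;

    if (group >= 3)
        return -1;
    rc = regcomp(&re, pattern, REG_EXTENDED);
    if (rc != 0)
        return rc;
    rc = regexec(&re, string, 3, pm, 0);
    if (rc == 0)
        *out = pm[group];
    regfree(&re);
    return rc;
}
```

For the pattern (a)|(b) matched against b, subexpression 1 does not participate and both of its offsets are -1, while subexpression 2 reports offsets 0 and 1. For (a)*b matched against b, subexpression 1 matched 0 times and is reported as -1. For (a|b)*c matched against abc, subexpression 1 participated twice and the offsets of the last iteration (1 and 2) are reported.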
If the REG_NOSUB flag was set
in the cflags parameter in the call to the regcomp subroutine, and the NMatch parameter
is not equal to 0 in the call to the regexec subroutine,
the content of the PMatch array is unspecified.
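When only a yes-or-no answer is needed, one way to avoid relying on unspecified PMatch contents is to compile with REG_NOSUB and pass an NMatch value of 0. The following is a minimal sketch; the helper name string_matches is an assumption for illustration.

```c
#include <regex.h>
#include <stddef.h>

/* Hypothetical helper (not part of libc): returns 1 if `string` matches
 * the extended regular expression `pattern`, 0 if it does not, and -1
 * if the pattern cannot be compiled. */
int string_matches(const char *pattern, const char *string)
{
    regex_t re;
    int rc;

    if (regcomp(&re, pattern, REG_EXTENDED | REG_NOSUB) != 0)
        return -1;
    /* NMatch is 0, so the PMatch argument is ignored */
    rc = regexec(&re, string, 0, NULL, 0);
    regfree(&re);
    return rc == 0 ? 1 : 0;
}
```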
If the REG_NEWLINE flag was
not set in the cflags parameter when the regcomp subroutine was called, then a new-line character in the pattern or String parameter is treated as an
ordinary character. If the REG_NEWLINE flag was set
when the regcomp subroutine was called, the new-line
character is treated as an ordinary character except as follows:
- A new-line character in the String parameter is not matched by a period outside of a bracket expression
or by any form of a nonmatching list. A nonmatching list expression begins
with a ^ (circumflex) and specifies a list that matches any character or collating
element except the expressions in the list after the leading circumflex. For example,
the regular expression [^abc] matches any character
except a, b, or c. The circumflex has this special meaning only when
it is the first character in the list, immediately following the left bracket.
- A ^ (circumflex) in the pattern parameter, when used to specify expression anchoring, matches the zero-length
string immediately after a new-line character in the String parameter, regardless of the setting of the REG_NOTBOL flag.
- A $ (dollar sign) in the pattern parameter, when used to specify expression anchoring, matches the zero-length
string immediately before a new-line character in the String parameter, regardless of the setting of the REG_NOTEOL flag.
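The effect of the REG_NEWLINE flag can be sketched by compiling the same pattern with and without it. The helper name match_start and its return convention are assumptions for illustration.

```c
#include <regex.h>

/* Hypothetical helper (not part of libc): compiles `pattern` with the
 * given cflags and returns the byte offset where the match in `string`
 * begins, -1 if there is no match, or -2 if the pattern does not
 * compile. */
long match_start(const char *pattern, const char *string, int cflags)
{
    regex_t re;
    regmatch_t pm;
    long start = -1;

    if (regcomp(&re, pattern, cflags) != 0)
        return -2;
    if (regexec(&re, string, 1, &pm, 0) == 0)
        start = (long)pm.rm_so;
    regfree(&re);
    return start;
}
```

Matching against the two-line string "a\nb", the pattern ^b finds nothing when compiled with REG_EXTENDED alone, but matches at offset 2 when REG_NEWLINE is added. Conversely, a.b matches at offset 0 without REG_NEWLINE and fails with it, because the period no longer matches the new-line character.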
Parameters
Preg |
Contains the compiled basic or extended regular expression to compare
against the String parameter. |
String |
Contains the data to be matched. |
NMatch |
Contains the number of subexpressions to match. |
PMatch |
Contains the array of offsets into the String parameter that match the corresponding subexpression in the Preg parameter. |
EFlags |
Contains the bitwise inclusive OR of 0 or more flags that customize
the behavior of the regexec subroutine.
The EFlags parameter modifies the interpretation of
the contents of the String parameter. The following flags are defined in the regex.h file:
- REG_NOTBOL
- The first character of the string pointed to by the String parameter is not the beginning of the line. Therefore, the ^ (circumflex),
when used as a special character, does not match the beginning of the String parameter.
- REG_NOTEOL
- The last character of the string pointed to by the String parameter is not the end of the line. Therefore, the $ (dollar
sign), when used as a special character, does not match the end of the String parameter.
|
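The two EFlags values can be exercised with a short sketch. The helper name match_with_eflags is an assumption for illustration; it simply forwards the flags to the regexec subroutine.

```c
#include <regex.h>
#include <stddef.h>

/* Hypothetical helper (not part of libc): compiles `pattern` as an
 * extended regular expression and returns the regexec result for
 * `string` using the given eflags. If compilation fails, the regcomp
 * error code is returned instead. */
int match_with_eflags(const char *pattern, const char *string, int eflags)
{
    regex_t re;
    int rc = regcomp(&re, pattern, REG_EXTENDED | REG_NOSUB);

    if (rc != 0)
        return rc;
    rc = regexec(&re, string, 0, NULL, eflags);
    regfree(&re);
    return rc;
}
```

The pattern ^a matches abc when EFlags is 0, but the call returns REG_NOMATCH when REG_NOTBOL is set. Likewise, c$ matches abc with EFlags of 0 and returns REG_NOMATCH with REG_NOTEOL. A pattern without anchors, such as b, is unaffected by either flag.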
Return Values
On successful completion, the regexec subroutine returns a value of 0 to indicate that the contents of the String parameter matched the contents of the pattern parameter, or a value of REG_NOMATCH to indicate that no match occurred. The REG_NOMATCH error is defined in the regex.h file.
Error Codes
If the regexec subroutine is
unsuccessful, it returns a nonzero value indicating the type of problem. The
following macros for possible error codes that can be returned are defined
in the regex.h file:
REG_NOMATCH |
Indicates the basic or extended regular expression was unable to
find a match. |
REG_BADPAT |
Indicates a basic or extended regular expression that is not valid. |
REG_ECOLLATE |
Indicates a collating element referenced that is not valid. |
REG_ECTYPE |
Indicates a character class-type reference that is not valid. |
REG_EESCAPE |
Indicates a trailing \ (backslash) in the pattern. |
REG_ESUBREG |
Indicates a number in \digit is not valid
or is in error. |
REG_EBRACK |
Indicates a [ ] (left and right brackets) imbalance. |
REG_EPAREN |
Indicates a \( \) (backslash, left parenthesis, backslash, right
parenthesis) or ( ) (left and right parentheses) imbalance. |
REG_EBRACE |
Indicates a \{ \} (backslash, left brace, backslash, right brace)
imbalance. |
REG_BADBR |
Indicates the content of \{ \} (backslash, left brace, backslash,
right brace) is unusable (not a number, number too large, more than two numbers,
or first number larger than second). |
REG_ERANGE |
Indicates an unusable end point in range expression. |
REG_ESPACE |
Indicates out of memory. |
REG_BADRPT |
Indicates a ? (question mark), * (asterisk), or + (plus sign) not
preceded by valid basic or extended regular expression. |
If the value of the Preg parameter
to the regexec subroutine is not a compiled basic or
extended regular expression returned by the regcomp
subroutine, the result is undefined.
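A caller usually needs to distinguish three outcomes: a match, REG_NOMATCH, and any other error. The following sketch shows one way to do that, using the regerror subroutine to obtain message text. The helper name try_match and the "match" and "no match" strings are assumptions for illustration. Several codes in the list above describe pattern problems, which in this sketch surface from the regcomp call rather than from regexec.

```c
#include <regex.h>
#include <stddef.h>
#include <stdio.h>

/* Hypothetical helper (not part of libc): compiles and runs `pattern`
 * against `string`, writes a short description of the outcome into
 * `msg`, and returns 0, REG_NOMATCH, or another error code. */
int try_match(const char *pattern, const char *string,
              char *msg, size_t msgsz)
{
    regex_t re;
    int rc = regcomp(&re, pattern, REG_EXTENDED | REG_NOSUB);

    if (rc != 0) {
        regerror(rc, &re, msg, msgsz);   /* pattern could not be compiled */
        return rc;
    }
    rc = regexec(&re, string, 0, NULL, 0);
    if (rc == 0)
        snprintf(msg, msgsz, "match");
    else if (rc == REG_NOMATCH)
        snprintf(msg, msgsz, "no match");
    else
        regerror(rc, &re, msg, msgsz);   /* for example, REG_ESPACE */
    regfree(&re);
    return rc;
}
```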
Examples
The following example demonstrates how the REG_NOTBOL flag can be used with the regexec subroutine
to find all substrings in a line that match a pattern supplied by a user.
(For simplicity, very little error-checking is done in this example.)
(void) regcomp (&re, pattern, 0);
/* p is assumed to be a char * cursor into the line */
p = &buffer[0];
/* this call to regexec finds the first match on the line */
error = regexec (&re, p, 1, &pm, 0);
while (error == 0) {    /* while matches found */
    /* substring found between p + pm.rm_so and p + pm.rm_eo */
    p += pm.rm_eo;      /* continue after the end of this match */
    /* This call to regexec finds the next match */
    error = regexec (&re, p, 1, &pm, REG_NOTBOL);
}
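A more complete version of this loop is sketched below as a self-contained function. The helper name count_matches and the handling of zero-length matches (stepping forward one byte so the loop always advances) are assumptions added for illustration; the pattern is compiled as a basic regular expression, as in the fragment above.

```c
#include <regex.h>

/* Hypothetical helper (not part of libc): counts the non-overlapping
 * matches of the basic regular expression `pattern` in `line`.
 * Returns -1 if the pattern cannot be compiled. */
int count_matches(const char *pattern, const char *line)
{
    regex_t re;
    regmatch_t pm;
    const char *p = line;      /* start of the unsearched remainder */
    int eflags = 0;            /* first call: p is the beginning of the line */
    int count = 0;

    if (regcomp(&re, pattern, 0) != 0)
        return -1;

    while (regexec(&re, p, 1, &pm, eflags) == 0) {
        count++;
        if (pm.rm_eo == pm.rm_so) {
            /* zero-length match: step one byte so the loop advances */
            if (p[pm.rm_eo] == '\0')
                break;
            p += pm.rm_eo + 1;
        } else {
            p += pm.rm_eo;     /* offsets are relative to p */
        }
        eflags = REG_NOTBOL;   /* later calls no longer start at line start */
    }
    regfree(&re);
    return count;
}
```

For the line banana, the pattern an is counted twice and the pattern a three times. The pattern ^b is counted only once in the line bbb, because REG_NOTBOL prevents the circumflex from matching at the start of the later searches.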
Related Information
The regcomp subroutine, regerror subroutine, regfree subroutine.
Subroutines Overview and Understanding Internationalized Regular Expression
Subroutines in AIX 5L Version 5.2 General Programming Concepts: Writing and Debugging Programs.