that they are only guaranteed to be defined after a
successful match that was executed with the C</p> (preserve) modifier.
The use of these variables incurs no global performance penalty, unlike
their punctuation character equivalents; the trade-off is that you
have to tell perl when you want to use them.
X</p> X<p modifier>
=head2 Quoting (escaping) metacharacters
To cause a metacharacter to match its literal self, you precede it with
a backslash. Unlike some other regular expression languages, any
sequence consisting of a backslash followed by a non-alphanumeric
matches that non-alphanumeric, literally. So things like C<\\>, C<\(>,
C<\)>, C<\[>, C<\]>, C<\{>, or C<\}> are always interpreted as the
literal character that follows the backslash.
(That's not true when an alphanumeric character is preceded by a
backslash. There are a few such "escape sequences", like C<\w>, which have
special matching behaviors in Perl. All such are currently limited to
ASCII-range alphanumerics.)
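As a minimal sketch of the two rules above (the sample strings and variable
names are made up for illustration), a backslash before a non-alphanumeric
gives the literal character, while a backslash before an alphanumeric such as
C<\w> can select special matching behavior:

```perl
use strict;
use warnings;

# Hypothetical sample text containing several metacharacters.
my $price = 'cost: $5.00 (approx.)';

# Backslash + non-alphanumeric always means the literal character.
my $literal_hit = $price =~ /\$5\.00 \(approx\.\)/ ? 1 : 0;   # 1

# Unescaped, "." is a metacharacter that matches any character but newline.
my $dot_as_meta    = '5x00' =~ /5.00/  ? 1 : 0;   # 1
my $dot_as_literal = '5x00' =~ /5\.00/ ? 1 : 0;   # 0

# Backslash + alphanumeric can be an escape sequence:
# \w matches a word character, not the letter "w".
my $w_escape = 'x' =~ /\A\w\z/ ? 1 : 0;           # 1
```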
The best method to escape metacharacters is to use the
C<L<quotemeta()|perlfunc/quotemeta>> function, or the equivalent, but
more flexible and often more convenient, C<\Q> metaquoting escape
sequence:
    $pattern = quotemeta $pattern;
This changes C<$pattern> so that the metacharacters are quoted. You can
then do
    $string =~ s/$pattern/foo/;
and be assured that any metacharacters in C<$pattern> will match their
literal selves. If you instead use C<\Q>, like:
    $string =~ s/\Q$pattern/foo/;
you don't have to have a separate C<$pattern> variable. Further, there
is an additional escape sequence, C<\E> that can be combined with C<\Q>
to allow you to escape whatever portions of the pattern you desire:
    $string =~ s/$unquoted\Q$quoted\E$unquoted/foo/;
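The following sketch shows C<quotemeta> and C<\Q...\E> side by side. The
variable names and sample strings are assumptions chosen for illustration,
not part of any API:

```perl
use strict;
use warnings;

my $needle = 'a.b*c';            # hypothetical user-supplied text

# quotemeta returns a quoted copy of its argument.
my $quoted = quotemeta $needle;  # a\.b\*c

# \Q...\E quotes only the interpolated part; \d+ remains a pattern.
my $line = 'id=42 a.b*c end';
my ($id) = $line =~ /id=(\d+) \Q$needle\E end/;   # $id is '42'

# Unquoted, "." and "*" act as metacharacters and match too much.
my $unquoted_hit = 'aXbbbc' =~ /\A$needle\z/     ? 1 : 0;   # 1
my $quoted_hit   = 'aXbbbc' =~ /\A\Q$needle\E\z/ ? 1 : 0;   # 0
```

Quoting only the interpolated portion keeps the rest of the pattern free to
use metacharacters normally.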
Beware that if you put literal backslashes (those not inside
interpolated variables) between C<\Q> and C<\E>, double-quotish
backslash interpolation may lead to confusing results. If you
I<need> to use literal backslashes within C<\Q...\E>,
consult L<perlop/"Gory details of parsing quoted constructs">.
In older code, you may see something like this:
    $pattern =~ s/(\W)/\\$1/g;
    $string =~ s/$pattern/foo/;
This simply adds backslashes before all non-"word" characters to disable
any special meanings they might have. (If S<C<use locale>> is in
effect, the current locale can affect the results.) This paradigm is
inadequate for Unicode.
C<quotemeta()> and C<\Q> are more fully described in
L<perlfunc/quotemeta>.
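For comparison, here is a small sketch of the older idiom next to
C<quotemeta>. The sample string is an ASCII-only assumption; for such input
the two approaches produce the same result, which is why the older form
appears in legacy code:

```perl
use strict;
use warnings;

my $raw = 'a+b (c)';   # hypothetical ASCII-only sample

# Older idiom: backslash every non-"word" character by hand.
(my $by_hand = $raw) =~ s/(\W)/\\$1/g;

# Preferred: let quotemeta do the quoting.
my $by_quotemeta = quotemeta $raw;

# Both yield a\+b\ \(c\) for this ASCII input.
my $same = $by_hand eq $by_quotemeta ? 1 : 0;   # 1
```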
=head2 Extended Patterns
Perl also defines a consistent extension syntax for features not
found in standard tools like B<awk> and
B<lex>. The syntax for most of these is a
pair of parentheses with a question mark as the first thing within
the parentheses. The character after the question mark indicates
the extension.
A question mark was chosen for this and for the minimal-matching
construct because 1) question marks are rare in older regular
expressions, and 2) whenever you see one, you should stop and
"question" exactly what is going on. That's psychology....
=over 4
=item C<(?#I<text>)>
X<(?#)>
A comment. The I<text> is ignored.
Note that Perl closes
the comment as soon as it sees a C<")">, so there is no way to put a literal
C<")"> in the comment. The pattern's closing delimiter must be escaped by
a backslash if it appears in the comment.
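A minimal sketch of embedded comments, using a made-up date pattern for
illustration. The comment text contains neither C<")"> nor the pattern
delimiter, so no escaping is needed:

```perl
use strict;
use warnings;

# (?#...) embeds a comment that is ignored when matching.
my $date_re = qr/(\d{4})(?# year)-(\d{2})(?# month)/;

my ($year, $month) = '2024-05' =~ $date_re;   # '2024', '05'
```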
See L