perl5100delta (1) Linux Manual Page
NAME
perl5100delta – what is new for perl 5.10.0
DESCRIPTION
This document describes the differences between the 5.8.8 release and the 5.10.0 release.
Many of the bug fixes in 5.10.0 were already seen in the 5.8.X maintenance releases; they are not duplicated here and are documented in the set of man pages named perl58[1-8]?delta.
Core Enhancements
The feature pragma
The "feature" pragma is used to enable new syntax that would break Perl’s backwards-compatibility with older releases of the language. It’s a lexical pragma, like "strict" or "warnings".
Currently the following new features are available: "switch" (adds a switch statement), "say" (adds a "say" built-in function), and "state" (adds a "state" keyword for declaring “static” variables). Those features are described in their own sections of this document.
The "feature" pragma is also implicitly loaded when you require a minimal perl version (with the "use VERSION" construct) greater than, or equal to, 5.9.5. See feature for details.
New -E command-line switch
-E is equivalent to -e, but it implicitly enables all optional features (like "use feature ":5.10"").
Defined-or operator
A new operator "//" (defined-or) has been implemented. The following expression:
$a // $b
is merely equivalent to
defined $a ? $a : $b
and the statement
$c //= $d;
can now be used instead of
$c = $d unless defined $c;
The "//" operator has the same precedence and associativity as "||". Special care has been taken to ensure that this operator does What You Mean while not breaking old code, but some edge cases involving the empty regular expression may now parse differently. See perlop for details.
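A minimal sketch of the difference in practice, assuming a hypothetical %opt hash of settings: "||" falls back on any false value, while "//" falls back only when the left side is undef.

```perl
use strict;
use warnings;

my %opt = (verbose => 0);            # hypothetical options hash

my $with_or  = $opt{verbose} || 1;   # 0 is false, so || falls back to 1
my $with_dor = $opt{verbose} // 1;   # 0 is defined, so // keeps it

my $level;                           # undefined
$level //= 3;                        # same as: $level = 3 unless defined $level

print "$with_or $with_dor $level\n"; # prints "1 0 3"
```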
Switch and Smart Match operator
Perl 5 now has a switch statement. It’s available when "use feature 'switch'" is in effect. This feature introduces three new keywords, "given", "when", and "default":
given ($foo) {
    when (/^abc/) { $abc = 1; }
    when (/^def/) { $def = 1; }
    when (/^xyz/) { $xyz = 1; }
    default       { $nothing = 1; }
}
A more complete description of how Perl matches the switch variable against the "when" conditions is given in “Switch statements” in perlsyn.
This kind of match is called smart match, and it’s also possible to use it outside of switch statements, via the new "~~" operator. See “Smart matching in detail” in perlsyn.
This feature was contributed by Robin Houston.
Regular expressions
- Recursive Patterns
- It is now possible to write recursive patterns without using the "(??{})" construct. This new way is more efficient, and in many cases easier to read.
Each capturing parenthesis can now be treated as an independent pattern that can be entered by using the "(?PARNO)" syntax ("PARNO" standing for “parenthesis number”). For example, the following pattern will match nested balanced angle brackets:
/
 ^                    # start of line
 (                    # start capture buffer 1
    <                 #   match an opening angle bracket
    (?:               #   match one of:
        (?>           #     don't backtrack over the inside of this group
            [^<>]+    #       one or more non angle brackets
        )             #     end non backtracking group
    |                 #     ... or ...
        (?1)          #     recurse to bracket 1 and try it again
    )*                #   0 or more times.
    >                 #   match a closing angle bracket
 )                    # end capture buffer one
 $                    # end of line
/x
PCRE users should note that Perl’s recursive regex feature allows backtracking into a recursed pattern, whereas in PCRE the recursion is atomic or “possessive” in nature. As in the example above, you can add (?>) to control this selectively. (Yves Orton)
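A small sketch exercising the balanced-bracket pattern above, compiled once with qr// (the variable name $balanced and the sample strings are illustrative only; perl 5.10 or later is assumed):

```perl
use strict;
use warnings;

# The same pattern as above, stored with qr// so it can be reused.
my $balanced = qr/
    ^
    (                       # capture buffer 1
        <
        (?:
            (?> [^<>]+ )    # run of non-bracket chars, no backtracking
          |
            (?1)            # recurse into buffer 1
        )*
        >
    )
    $
/x;

for my $s ('<a<b>c>', '<<>>', '<a<b>', 'a<b>') {
    printf "%-8s %s\n", $s, ($s =~ $balanced ? 'balanced' : 'no match');
}
```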
- Named Capture Buffers
- It is now possible to name capturing parentheses in a pattern and refer to the captured contents by name. The naming syntax is "(?<NAME>....)". It’s possible to backreference to a named buffer with the "\k<NAME>" syntax. In code, the new magical hashes "%+" and "%-" can be used to access the contents of the capture buffers.
Thus, to replace all doubled chars with a single copy, one could write
s/(?<letter>.)\k<letter>/$+{letter}/g
Only buffers with defined contents will be “visible” in the "%+" hash, so it’s possible to do something like
foreach my $name (keys %+) {
    print "content of buffer '$name' is $+{$name}\n";
}
The "%-" hash is a bit more complete, since it will contain array refs holding values from all capture buffers similarly named, if there are several of them.
"%+" and "%-" are implemented as tied hashes through the new module "Tie::Hash::NamedCapture".
Users exposed to the .NET regex engine will find that the perl implementation differs in that the numerical ordering of the buffers is sequential, and not “unnamed first, then named”. Thus in the pattern
/(A)(?<B>B)(C)(?<D>D)/
$1 will be ‘A’, $2 will be ‘B’, $3 will be ‘C’ and $4 will be ‘D’, and not $1 ‘A’, $2 ‘C’, $3 ‘B’ and $4 ‘D’ as a .NET programmer would expect. This is considered a feature. :-) (Yves Orton)
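A sketch contrasting "%+" and "%-" when two alternatives reuse the same names; the two date layouts are an illustrative assumption, not something taken from the release notes:

```perl
use strict;
use warnings;

# Both alternatives define buffers named y, m and d.
my $date = qr/
      (?<y>\d{4}) - (?<m>\d\d) - (?<d>\d\d)      # 2007-12-18
    | (?<d>\d\d) \. (?<m>\d\d) \. (?<y>\d{4})    # 18.12.2007
/x;

'18.12.2007' =~ $date or die "no match";

# %+ returns the leftmost *defined* buffer for each name.
my $iso = "$+{y}-$+{m}-$+{d}";          # "2007-12-18"

# %- returns every buffer with that name, defined or not.
my @all_days      = @{ $-{d} };         # (undef, '18')
my $n_day_buffers = scalar @all_days;   # 2
```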
- Possessive Quantifiers
- Perl now supports the “possessive quantifier” syntax of the “atomic match” pattern. Basically a possessive quantifier matches as much as it can and never gives any back. Thus it can be used to control backtracking. The syntax is similar to non-greedy matching, except instead of using a ‘?’ as the modifier the ‘+’ is used. Thus "?+", "*+", "++", "{min,max}+" are now legal quantifiers. (Yves Orton)
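A short sketch of the behavioural difference: the greedy quantifier gives back a character so the trailing "a" can match, while the possessive one (and its atomic-group equivalent) does not.

```perl
use strict;
use warnings;

my $s = 'aaa';
my $greedy     = $s =~ /^a+a$/     ? 1 : 0;  # a+ gives back one 'a': matches
my $possessive = $s =~ /^a++a$/    ? 1 : 0;  # a++ keeps all three: fails
my $atomic     = $s =~ /^(?>a+)a$/ ? 1 : 0;  # same thing spelled as an atomic group
```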
- Backtracking control verbs
- The regex engine now supports a number of special-purpose backtrack control verbs: (*THEN), (*PRUNE), (*MARK), (*SKIP), (*COMMIT), (*FAIL) and (*ACCEPT). See perlre for their descriptions. (Yves Orton)
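One of these verbs, (*FAIL), can be used together with a code block to enumerate every way a pattern can match. A minimal sketch, modelled on the idiom described in perlre (the variable name @substrings is illustrative):

```perl
use strict;
use warnings;

# After the code block records the current match, (*FAIL) forces the
# engine to backtrack, so every possible \w+ match gets visited.
my @substrings;
'abc' =~ /\w+(?{ push @substrings, $& })(*FAIL)/;

print "@substrings\n";   # abc ab a bc b c
```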
- Relative backreferences
- A new syntax "\g{N}" or "\gN" where “N” is a decimal integer allows a safer form of back-reference notation as well as allowing relative backreferences. This should make it easier to generate and embed patterns that contain backreferences. See “Capture buffers” in perlre. (Yves Orton)
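A sketch of why relative references help when embedding patterns: a fragment using "\g{-1}" keeps pointing at its own group after another group is placed in front of it, whereas "\1" silently changes meaning. The fragment names are hypothetical.

```perl
use strict;
use warnings;

my $doubled_rel = qr/(\w)\g{-1}/;   # relative: "the group just before me"
my $doubled_abs = qr/(\w)\1/;       # absolute: "group number 1"

# Embedding after an extra capture group renumbers (\w) to group 2.
my $rel_ok = '12-oo' =~ /(\d+)-$doubled_rel/ ? 1 : 0;   # \g{-1} still means (\w)
my $abs_ok = '12-oo' =~ /(\d+)-$doubled_abs/ ? 1 : 0;   # \1 now means (\d+)
```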
- "\K" escape
- The functionality of Jeff Pinyan’s module Regexp::Keep has been added to the core. In regular expressions you can now use the special escape "\K" as a way to do something like floating length positive lookbehind. It is also useful in substitutions like:
s/(foo)bar/$1/g
that can now be converted to
s/foo\Kbar//g
which is much more efficient. (Yves Orton)
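A quick check that the two substitutions above produce the same result (the sample string is arbitrary):

```perl
use strict;
use warnings;

my $text = 'foobar foobaz xfoobar';

(my $captured = $text) =~ s/(foo)bar/$1/g;   # re-inserts the captured "foo"
(my $kept     = $text) =~ s/foo\Kbar//g;     # "foo" is kept; only "bar" is replaced
```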
- Vertical and horizontal whitespace, and linebreak
- Regular expressions now recognize the "\v" and "\h" escapes that match vertical and horizontal whitespace, respectively. "\V" and "\H" logically match their complements.
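A small sketch of "\h" and "\v" on an arbitrary string containing spaces, tabs and a CRLF line end: collapsing "\h" leaves line ends alone, and splitting on "\v" treats CR and LF alike.

```perl
use strict;
use warnings;

my $text = "a \t b\r\n\tc\n";

# Collapse runs of horizontal whitespace; line ends survive.
(my $squeezed = $text) =~ s/\h+/ /g;   # "a b\r\n c\n"

# Split on runs of vertical whitespace (\r and \n both count).
my @lines = split /\v+/, $text;        # ("a \t b", "\tc")
```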
"\R" matches a generic linebreak, that is, vertical whitespace, plus the multi-character sequence "\x0D\x0A".
- Optional pre-match and post-match captures with the /p flag
- There is a new flag "/p" for regular expressions. Using it makes the engine preserve the part of the matched string before the matching substring in the new special variable "${^PREMATCH}", the part after the matching substring in "${^POSTMATCH}", and the matched substring itself in "${^MATCH}".
Perl is still able to store these substrings in the special variables "$`", "$'" and "$&", but using these variables anywhere in the program adds a penalty to all regular expression matches, whereas if you use the "/p" flag and the new special variables instead, you pay only for the regular expressions where the flag is used.
For more detail on the new variables, see perlvar; for the use of the regular expression flag, see perlop and perlre.
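A minimal sketch of the "/p" flag and its three variables on an arbitrary string:

```perl
use strict;
use warnings;

my $str = 'hello world';
$str =~ /o w/p or die "no match";

my ($pre, $match, $post) = (${^PREMATCH}, ${^MATCH}, ${^POSTMATCH});
print "[$pre] [$match] [$post]\n";   # [hell] [o w] [orld]
```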
say()
say() is a new built-in, only available when "use feature 'say'" is in effect, that is similar to print(), but that implicitly appends a newline to the printed string. See “say” in perlfunc. (Robin Houston)
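A small sketch showing that say() appends a single newline after all of its arguments. Output goes to an in-memory filehandle (an illustrative choice) so the result can be inspected:

```perl
use strict;
use warnings;
use feature 'say';

open my $fh, '>', \my $buffer or die "open: $!";
say $fh 'hello';   # same as: print $fh 'hello', "\n";
say $fh 'a', 'b';  # the newline comes once, after all arguments
close $fh;
# $buffer is now "hello\nab\n"
```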
Lexical $_
The default variable $_ can now be lexicalized, by declaring it like any other lexical variable, with a simple
my $_;
The operations that default on $_ will use the lexically-scoped version of $_ when it exists, instead of the global $_.
In a "map" or a "grep" block, if $_ was previously my’ed, then the $_ inside the block is lexical as well (and scoped to the block).
In a scope where $_ has been lexicalized, you can still have access to the global version of $_ by using $::_, or, more simply, by overriding the lexical declaration with "our $_". (Rafael Garcia-Suarez)
The _ prototype
A new prototype character has been added. "_" is equivalent to "$" but defaults to $_ if the corresponding argument isn’t supplied (both "$" and "_" denote a scalar). Due to the optional nature of the argument, you can only use it at the end of a prototype, or before a semicolon.
This has a small incompatible consequence: the prototype() function has been adjusted to return "_" for some built-ins in appropriate cases (for example, "prototype('CORE::rmdir')"). (Rafael Garcia-Suarez)
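A sketch of a user-defined function with the "_" prototype; "shout" is a hypothetical name. The sub must be declared before the calls are compiled for the prototype to take effect.

```perl
use strict;
use warnings;

# With the "_" prototype, a call without an argument receives $_.
sub shout(_) { return uc($_[0]) . '!' }

my @out;
for (qw(hi there)) {
    push @out, shout();          # no argument: uses $_
}
my $explicit = shout('hey');     # an explicit argument still works
```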
UNITCHECK blocks
A new special code block, "UNITCHECK", has been introduced, in addition to "BEGIN", "CHECK", "INIT" and "END".
"CHECK" and "INIT" blocks, while useful for some specialized purposes, are always executed at the transition between the compilation and the execution of the main program, and thus are useless whenever code is loaded at runtime. On the other hand, "UNITCHECK" blocks are executed just after the unit which defined them has been compiled. See perlmod for more information. (Alex Gough)
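A sketch that records when each kind of block runs, assuming the code is run as an ordinary script. The string eval stands in for code loaded at runtime: its UNITCHECK block still fires, where a CHECK block would be too late.

```perl
use strict;
use warnings;

our @order;
BEGIN     { push @order, 'BEGIN' }
UNITCHECK { push @order, 'UNITCHECK' }
CHECK     { push @order, 'CHECK' }
INIT      { push @order, 'INIT' }

push @order, 'runtime';

# Code compiled at runtime is its own compilation unit.
eval q{ UNITCHECK { push @order, 'eval UNITCHECK' } 1 } or die $@;

print join(', ', @order), "\n";
```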
New Pragma, mro
A new pragma, "mro" (for Method Resolution Order) has been added. It lets you switch, on a per-class basis, the algorithm that perl uses to find inherited methods in a multiple inheritance hierarchy. The default MRO hasn’t changed (DFS, for Depth First Search). Another MRO is available: the C3 algorithm. See mro for more information. (Brandon Black)
Note that, due to changes in the implementation of class hierarchy search, code that used to undef the *ISA glob will most probably break. In any case, undef’ing *ISA had the side-effect of removing the magic on the @ISA array and should not have been done in the first place. Also, the cache *::ISA::CACHE:: no longer exists; to force a reset of the @ISA cache, you now need to use the "mro" API, or more simply assign to @ISA (e.g. with "@ISA = @ISA").
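A sketch of the classic "diamond" hierarchy under both orders; all package names are hypothetical. DFS visits Base before Right, so Base::hello wins; C3 places Right before their shared parent.

```perl
use strict;
use warnings;

{ package Base;       sub hello { 'Base' } }
{ package Left;       our @ISA = ('Base'); }
{ package Right;      our @ISA = ('Base'); sub hello { 'Right' } }
{ package DiamondDFS; our @ISA = ('Left', 'Right'); }                 # default DFS
{ package DiamondC3;  use mro 'c3'; our @ISA = ('Left', 'Right'); }   # C3

my @dfs = @{ mro::get_linear_isa('DiamondDFS') };  # DiamondDFS Left Base Right
my @c3  = @{ mro::get_linear_isa('DiamondC3') };   # DiamondC3 Left Right Base

print DiamondDFS->hello, ' ', DiamondC3->hello, "\n";   # Base Right
```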
readdir() may return a short filename on Windows
The readdir() function may return a “short filename” when the long filename contains characters outside the ANSI codepage. Similarly Cwd::cwd() may return a short directory name, and glob() may return short names as well. On the NTFS file system these short names can always be represented in the ANSI codepage. This will not be true for all other file system drivers; e.g. the FAT filesystem stores short filenames in the OEM codepage, so some files on FAT volumes remain inaccessible through the ANSI APIs.
Similarly, $^X, @INC, and $ENV{PATH} are preprocessed at startup to make sure all paths are valid in the ANSI codepage (if possible).
The Win32::GetLongPathName() function now returns the UTF-8 encoded correct long file name instead of using replacement characters to force the name into the ANSI codepage. The new Win32::GetANSIPathName() function can be used to turn a long pathname into a short one only if the long one cannot be represented in the ANSI codepage.
Many other functions in the "Win32" module have been improved to accept UTF-8 encoded arguments. Please see Win32 for details.
readpipe() is now overridable
The built-in function readpipe() is now overridable. Overriding it also overrides its operator counterpart, "qx//" (a.k.a. "``"). Moreover, it now defaults to $_ if no argument is provided. (Rafael Garcia-Suarez)
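A sketch of a global override through the standard CORE::GLOBAL:: hook. The stub is hypothetical: it only echoes the command instead of running it, which can be handy in tests. The override must be installed in a BEGIN block so it is in place when the backticks are compiled.

```perl
use strict;
use warnings;

BEGIN {
    no warnings 'once';
    *CORE::GLOBAL::readpipe = sub {
        my $cmd = @_ ? $_[0] : $_;   # mirror the new default-to-$_ behaviour
        return "would run: $cmd\n";
    };
}

my $out  = `echo hi`;   # backticks now call the override
my $out2 = qx{ls -l};   # and so does qx//
```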
Default argument for readline()
readline() now defaults to *ARGV if no argument is provided. (Rafael Garcia-Suarez)
state() variables
A new class of variables has been introduced. State variables are similar to "my" variables, but are declared with the "state" keyword in place of "my". They’re visible only in their lexical scope, but their value is persistent: unlike "my" variables, they’re not undefined at scope entry, but retain their previous value. (Rafael Garcia-Suarez, Nicholas Clark)
To use state variables, one needs to enable them by using
use feature 'state';
or by using the "-E" command-line switch in one-liners. See “Persistent Private Variables” in perlsub.
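A minimal sketch of a state variable used as a call counter; "next_id" is a hypothetical function name:

```perl
use strict;
use warnings;
use feature 'state';

sub next_id {
    state $counter = 0;   # initialised only once, on the first call
    return ++$counter;
}

my @ids = map { next_id() } 1 .. 3;   # (1, 2, 3)
```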
Stacked filetest operators
As a new form of syntactic sugar, it’s now possible to stack up filetest operators. You can now write "-f -w -x $file" in a row to mean "-x $file && -w _ && -f _". See “-X” in perlfunc.
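A sketch that checks a freshly created temporary file (File::Temp is used only to get a file to test against):

```perl
use strict;
use warnings;
use File::Temp qw(tempfile);

my ($fh, $path) = tempfile();   # a new, readable and writable regular file

my $stacked  = (-f -r -w $path) ? 1 : 0;
my $expanded = (-w $path && -r _ && -f _) ? 1 : 0;   # what the stack means
my $is_dir   = (-d -e $path) ? 1 : 0;                # -e passes, -d fails
```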
UNIVERSAL::DOES()
The "UNIVERSAL" class has a new method, "DOES()". It has been added to solve semantic problems with the "isa()" method. "isa()" checks for inheritance, while "DOES()" has been designed to be overridden when module authors use other types of relations between classes (in addition to inheritance). (chromatic)
See “$obj->DOES( ROLE )” in UNIVERSAL.
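A sketch of a class that claims a role without inheriting from it. "Logger" and "Role::Loggable" are hypothetical names; any other query falls through to the default UNIVERSAL::DOES, which behaves like isa().

```perl
use strict;
use warnings;

package Logger;
sub new { return bless {}, shift }

sub DOES {
    my ($self, $role) = @_;
    return 1 if $role eq 'Role::Loggable';
    return $self->UNIVERSAL::DOES($role);
}

package main;

my $log = Logger->new;
my $does_role = $log->DOES('Role::Loggable') ? 1 : 0;   # 1
my $isa_role  = $log->isa('Role::Loggable')  ? 1 : 0;   # 0: no inheritance
my $does_self = $log->DOES('Logger')         ? 1 : 0;   # 1: falls back to isa()
```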
Formats
Formats were improved in several ways. A new field, "^*", can be used for variable-width, one-line-at-a-time text. Null characters are now handled correctly in picture lines. Using "@#" and "~~" together will now produce a compile-time error, as those format fields are incompatible. perlform has been improved, and miscellaneous bugs fixed.
Byte-order modifiers for pack() and unpack()
There are two new byte-order modifiers, ">" (big-endian) and "<" (little-endian), that can be appended to most pack() and unpack() template characters and groups to force a certain byte-order for that type or group. See “pack” in perlfunc and perlpacktut for details.
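A sketch of the modifiers on single template characters and on a group, with the packed bytes shown as hex:

```perl
use strict;
use warnings;

my $be_hex = unpack('H*', pack('l>', 1));   # 32-bit, big-endian:    "00000001"
my $le_hex = unpack('H*', pack('l<', 1));   # 32-bit, little-endian: "01000000"

# A modifier after a group applies to every item inside it.
my $group = unpack('H*', pack('(sl)>', 1, 2));               # "000100000002"
my @back  = unpack('(sl)>', pack('(sl)>', -5, 70000));       # (-5, 70000)
```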
no VERSION
You can now use "no" followed by a version number to specify that you want to use a version of perl older than the specified one.
chdir, chmod and chown on filehandles
"chdir", "chmod" and "chown" can now work on filehandles as well as filenames, if the system supports "fchdir", "fchmod" and "fchown" respectively, thanks to a patch provided by Gisle Aas.
OS groups
$( and $) now return groups in the order in which the OS returns them, thanks to Gisle Aas. Previously this was not the case.
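A sketch of chmod on an open filehandle, assuming a system that provides fchmod (such as Linux); the temporary file is only there to have something to change:

```perl
use strict;
use warnings;
use File::Temp qw(tempfile);

my ($fh, $path) = tempfile();
chmod 0640, $fh or die "chmod on handle failed: $!";
my $mode = (stat $path)[2] & 07777;   # permission bits only: 0640
```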
Recursive sort subs
You can now use recursive subroutines with sort(), thanks to Robin Houston.
Exceptions in constant folding
The constant folding routine is now wrapped in an exception handler, and if folding throws an exception (such as attempting to evaluate 0/0), perl now retains the current optree, rather than aborting the whole program. Without this change, programs would not compile if they had expressions that happened to generate exceptions, even though those expressions were in code that could never be reached at runtime. (Nicholas Clark, Dave Mitchell)
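A sketch of the effect: a sub containing "1 / 0" still compiles, and the error only appears if the sub is actually called. "never_called" is a hypothetical name.

```perl
use strict;
use warnings;

my $compiled = eval q{
    sub never_called { return 1 / 0 }   # folding this would throw
    1;
} ? 1 : 0;

my $called = eval { never_called(); 1 } ? 1 : 0;   # dies at runtime instead
my $error  = $@;
```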
Source filters in @INC
It’s possible to enhance the mechanism of subroutine hooks in @INC by adding a source filter on top of the filehandle opened and returned by the hook. This feature was planned a long time ago, but wasn’t quite working until now. See “require” in perlfunc for details. (Nicholas Clark)
New internal variables
- "${^RE_DEBUG_FLAGS}"
- This variable controls what debug flags are in effect for the regular expression engine when running under "use re "debug"". See re for details.
- "${^CHILD_ERROR_NATIVE}"
- This variable gives the native status returned by the last pipe close, backtick command, successful call to wait() or waitpid(), or from the system() operator. See perlvar for details. (Contributed by Gisle Aas.)
- "${^RE_TRIE_MAXBUF}"
- See “Trie optimisation of literal string alternations”.
- "${^WIN32_SLOPPY_STAT}"
- See “Sloppy stat on Windows”.
Miscellaneous
"unpack()" now defaults to unpacking the $_ variable.
"mkdir()" without arguments now defaults to $_.
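A sketch of both defaults; the directory names are arbitrary and created under a temporary directory:

```perl
use strict;
use warnings;
use File::Temp qw(tempdir);

$_ = 'AB';
my @codes = unpack 'C*';   # unpacks $_: (65, 66)

my $root = tempdir(CLEANUP => 1);
for ("$root/one", "$root/two") {
    mkdir() or die "mkdir $_: $!";   # no argument: creates the directory named in $_
}
my $made = grep { -d } "$root/one", "$root/two";   # 2
```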
The internal dump output has been improved, so that non-printable characters such as newline and backspace are output in "
